Base64 Encoding Wiki

A comprehensive guide to understanding the technical details, history, and applications of Base64 encoding.

Introduction to Base64

Base64 is a binary-to-text encoding scheme that represents binary data as a string of printable ASCII characters. It is designed to carry binary data across channels that only reliably support text. The name comes from its 64-character alphabet: each character encodes 6 bits, since 2^6 = 64.

History

The Base64 encoding scheme was first specified in RFC 989 in 1987 for Privacy Enhanced Mail (PEM), where it represented binary message content (such as encrypted data) as printable text that could pass through text-only email systems. It was later refined in RFC 1421 and other RFCs, eventually becoming a standard component of many internet applications and protocols.

Key Historical RFCs:

  • RFC 989 (1987) - Original specification in Privacy Enhanced Mail
  • RFC 1421 (1993) - PEM refinements
  • RFC 2045 (1996) - MIME Base64 implementation
  • RFC 3548 (2003) - Base16, Base32, and Base64 Data Encodings
  • RFC 4648 (2006) - The Base16, Base32, and Base64 Data Encodings (Current)

How Base64 Encoding Works

The Base64 encoding process converts binary data into text by repeating these steps for each block of input:

  1. Take 3 bytes (24 bits) of binary data.
  2. Divide those 24 bits into 4 groups of 6 bits each.
  3. Read each 6-bit group as an unsigned integer (0-63).
  4. Use that integer as an index into the Base64 alphabet to get one output character.
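The four steps above can be sketched in Python for a single 3-byte block. This is a minimal illustration, not a production encoder: `encode_group` is a hypothetical helper name, and it handles only complete 3-byte blocks, so no padding is involved.

```python
# Standard alphabet in index order (0-63)
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_group(chunk: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Step 1: pack the 3 bytes into one 24-bit integer
    n = (chunk[0] << 16) | (chunk[1] << 8) | chunk[2]
    # Steps 2 and 3: slice the 24 bits into four 6-bit values (0-63)
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Step 4: map each 6-bit value to its alphabet character
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"Man"))  # TWFu
```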

The Base64 Alphabet

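A-Z (26 characters, indices 0-25)

ABCDEFGHIJKLM
NOPQRSTUVWXYZ

a-z (26 characters, indices 26-51)

abcdefghijklm
nopqrstuvwxyz

0-9 (10 characters, indices 52-61)

0123456789

Special (2 characters, indices 62-63)

+ /
(= is not part of the 64-character alphabet; it is used only for padding)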
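Because the alphabet order defines each character's value, it can be built directly from the ranges above. A short sketch using Python's `string` constants; `ALPHABET` and `DECODE_TABLE` are illustrative names, and the reverse table shows how a decoder maps characters back to 6-bit values.

```python
import string

# Index order: A-Z (0-25), a-z (26-51), 0-9 (52-61), '+' (62), '/' (63)
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Reverse lookup used when decoding: character -> 6-bit value
DECODE_TABLE = {ch: i for i, ch in enumerate(ALPHABET)}

print(len(ALPHABET))               # 64
print(ALPHABET[19], ALPHABET[46])  # T u
print(DECODE_TABLE["/"])           # 63
```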

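If the input length is not a multiple of 3, the missing bits of the last 6-bit group are filled with zeros, and = characters pad the output to a multiple of 4: one leftover byte produces two characters followed by ==, and two leftover bytes produce three characters followed by =.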
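The padding rule can be added to a block encoder as follows. This is a minimal sketch for illustration (`b64encode_manual` is a hypothetical name); real code should use the standard library's `base64.b64encode`, which the sketch cross-checks against.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64encode_manual(data: bytes) -> str:
    """Minimal standard Base64 encoder with '=' padding (illustrative only)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Zero-fill a short final chunk so it can be sliced into 6-bit groups
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        chars = [ALPHABET[(n >> s) & 0x3F] for s in (18, 12, 6, 0)]
        # 3 bytes -> 4 real chars, 2 bytes -> 3 (+ "="), 1 byte -> 2 (+ "==")
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


for sample in (b"Man", b"Ma", b"M"):
    # Cross-check against the standard library encoder
    assert b64encode_manual(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, b64encode_manual(sample))
# b'Man' TWFu
# b'Ma' TWE=
# b'M' TQ==
```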

Example Encoding

Let's encode the text "Man" to Base64:

Input Text

"Man"

ASCII Values

77 97 110

Binary

01001101 01100001 01101110

6-bit Groups

010011 010110 000101 101110

Decimal Values

19 22 5 46

Base64 Characters

T W F u

Final Base64 Result

TWFu
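Each intermediate row of this example can be reproduced with a few lines of Python. The sketch below simply recomputes the table using string formatting of bits; it is meant for checking the arithmetic, not as an efficient encoder.

```python
data = "Man".encode("ascii")

ascii_values = list(data)                                 # [77, 97, 110]
bits = "".join(f"{b:08b}" for b in data)                  # 24-bit string
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]  # four 6-bit groups
values = [int(g, 2) for g in groups]                      # [19, 22, 5, 46]

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
result = "".join(ALPHABET[v] for v in values)

print(ascii_values)
print(" ".join(f"{b:08b}" for b in data))
print(" ".join(groups))
print(values)
print(result)  # TWFu
```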

Base64 Variants

Several variants of Base64 encoding exist for specific use cases:

Standard Base64

Uses A-Z, a-z, 0-9, +, / with = as padding (RFC 4648, section 4). This is the most widely used variant.

URL-safe Base64

Replaces + and / with - and _ (RFC 4648, section 5, "base64url"); the other 62 characters are unchanged. This avoids percent-encoding in URLs and filenames, where + and / have special meaning.
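Python's `base64` module provides both alphabets. In this sketch the input bytes are chosen only so that the standard output contains + and /, making the substitution visible.

```python
import base64

# Chosen so every 6-bit group is 62 or 63, i.e. '+' or '/' in the standard alphabet
data = bytes([0xFB, 0xFF, 0xBF])

standard = base64.b64encode(data)
url_safe = base64.urlsafe_b64encode(data)

print(standard)  # b'+/+/'
print(url_safe)  # b'-_-_'

# Decoding must use the matching variant
assert base64.urlsafe_b64decode(url_safe) == data
```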

MIME Base64

Splits the encoded output into lines of at most 76 characters separated by CRLF, as required by the MIME specification (RFC 2045). Decoders ignore the line breaks.
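The line-length rule is easy to apply after standard encoding. A minimal sketch; `mime_base64` is a hypothetical helper. Python's `base64.encodebytes` also wraps at 76 characters, but it uses a bare `\n` and appends a trailing newline, so this sketch wraps manually to get CRLF.

```python
import base64


def mime_base64(data: bytes, line_length: int = 76) -> str:
    """Encode with standard Base64, then wrap lines using CRLF."""
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + line_length] for i in range(0, len(encoded), line_length)]
    return "\r\n".join(lines)


body = mime_base64(bytes(range(256)))
print(max(len(line) for line in body.split("\r\n")))  # 76

# b64decode (non-validating by default) discards the CR and LF characters
assert base64.b64decode(body) == bytes(range(256))
```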

Base64 without padding

Omits the trailing = characters. Because the amount of padding can be recomputed from the encoded length, it is not needed for decoding; JSON Web Tokens (JWT), for example, use unpadded Base64URL.
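Dropping and restoring padding is a one-line operation each way. In this sketch the helper names `b64url_encode_nopad` and `b64url_decode_nopad` are hypothetical; the decoder restores = until the length is a multiple of 4, then calls the standard library.

```python
import base64


def b64url_encode_nopad(data: bytes) -> str:
    # URL-safe alphabet, trailing '=' removed
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode_nopad(text: str) -> bytes:
    # Restore padding so the length is a multiple of 4, then decode
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


print(b64url_encode_nopad(b"M"))  # TQ
print(b64url_decode_nopad("TQ"))  # b'M'
```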

Common Use Cases

Email Attachments

Email systems were originally designed to handle only ASCII text. Base64 encoding allows binary files to be attached to emails by converting them to text. MIME (Multipurpose Internet Mail Extensions) uses Base64 to encode attachments like images, documents, and executables.
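In Python, the standard `email` package applies Base64 automatically when a binary attachment is added. A minimal sketch: the subject, filename and payload bytes are placeholders, and the message is only built, not sent.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("The report is attached.")

# Placeholder binary payload standing in for a real file's bytes
payload = bytes(range(256))
msg.add_attachment(payload, maintype="application", subtype="octet-stream",
                   filename="report.bin")

raw = msg.as_string()
print("Content-Transfer-Encoding: base64" in raw)  # True

# Reading the attachment back decodes the Base64 body
attachment = next(msg.iter_attachments())
print(attachment.get_content() == payload)  # True
```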

Data URIs

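Data URIs allow small files to be embedded directly in HTML or CSS as Base64 text. This avoids a separate HTTP request per file, which can improve page load times for small images; for larger files, the roughly 33% size increase and the loss of separate caching usually outweigh the benefit.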
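A data URI is the MIME type, the `;base64` marker and the encoded bytes joined together. A minimal sketch; `to_data_uri` is a hypothetical helper, and the inline SVG stands in for a file you would normally read with something like `Path(...).read_bytes()`.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a data URI with a Base64-encoded payload."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder file content: a 1x1 SVG
svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"/>'
uri = to_data_uri(svg, "image/svg+xml")
print(f'<img src="{uri}" alt="Tiny SVG" />')
```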

Example: Embedding a small PNG image in HTML

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNk+A8AAQUBAScY42YAAAAASUVORK5CYII=" alt="Tiny PNG" />

API Communication

Base64 is used in HTTP Basic Authentication to encode username and password combinations:

Example: HTTP Basic Authentication Header

Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

Note: "dXNlcm5hbWU6cGFzc3dvcmQ=" is the Base64 encoding of "username:password"
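Building the header amounts to joining the credentials with a colon and Base64-encoding the result. A minimal sketch; `basic_auth_header` is a hypothetical helper, and encoding the credentials as UTF-8 is an assumption (servers may expect a different charset). Because this is encoding rather than encryption, Basic Authentication should only be sent over HTTPS.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    # "user:pass" -> bytes (UTF-8 assumed) -> Base64 -> "Basic <token>"
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("username", "password")
print(header)  # Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```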

JSON Web Tokens (JWT)

JWTs encode each of their three components (header, payload, and signature) with Base64URL, omitting padding. This allows JWTs to be passed in URLs, cookies, and HTTP headers without encoding issues.

Example: JWT Token Structure

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

Format: header.payload.signature (each part is Base64URL encoded)
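The header and payload of the token above can be read by splitting on the dots and decoding each segment after restoring padding. This sketch only decodes; it does not verify the signature, so the decoded claims must not be trusted on their own. `b64url_decode` is a hypothetical helper name.

```python
import base64
import json

token = ("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
         ".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ"
         ".SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c")


def b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded Base64URL; restore '=' before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


header_b64, payload_b64, signature_b64 = token.split(".")
header = json.loads(b64url_decode(header_b64))
payload = json.loads(b64url_decode(payload_b64))

print(header)   # {'alg': 'HS256', 'typ': 'JWT'}
print(payload)  # {'sub': '1234567890', 'name': 'John Doe', 'iat': 1516239022}
```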

Advantages and Disadvantages

Advantages

  • Allows binary data to be transmitted as text
  • Implemented in almost all programming languages
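  • Output contains only printable ASCII, so it can be embedded in XML and JSON; the URL-safe variant can also be used in URLs and filenames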
  • Can be easily reversed to retrieve the original data
  • Standardized and widely supported

Disadvantages

  • Increases data size by about 33% (4 output characters for every 3 input bytes), plus line breaks in MIME
  • Not a form of encryption or security measure
  • Adds CPU and memory overhead, which becomes noticeable for very large payloads
  • Multiple variants can cause compatibility issues if used incorrectly
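The 33% figure follows from the encoded length: padded output for n input bytes is 4 × ⌈n / 3⌉ characters, and unpadded output is ⌈4n / 3⌉. A small sketch computing both with integer arithmetic (`encoded_length` is a hypothetical name; MIME line breaks are not counted).

```python
import base64


def encoded_length(n: int, padded: bool = True) -> int:
    """Characters of Base64 output for n input bytes (no line breaks)."""
    if padded:
        # Every started 3-byte block becomes 4 characters
        return 4 * ((n + 2) // 3)
    # Without padding, only ceil(8n / 6) = ceil(4n / 3) characters remain
    return (4 * n + 2) // 3


for n in (3, 100, 1_000_000):
    out = encoded_length(n)
    print(f"{n} bytes -> {out} chars (+{out / n - 1:.1%})")
```

For small inputs the overhead is higher than 33% because the last block is padded to a full 4 characters.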

Implementation in Programming Languages

JavaScript

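// Encoding (btoa only accepts characters in the Latin-1 range and
// throws an InvalidCharacterError otherwise; convert Unicode text to
// UTF-8 bytes first, e.g. with TextEncoder)
const encoded = btoa('Hello, World!');
// Result: SGVsbG8sIFdvcmxkIQ==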

// Decoding
const decoded = atob('SGVsbG8sIFdvcmxkIQ==');
// Result: Hello, World!

Python

import base64

# Encoding: b64encode takes bytes and returns bytes
encoded = base64.b64encode(b'Hello, World!')
# Result: b'SGVsbG8sIFdvcmxkIQ==' (use encoded.decode('ascii') for a str)

# Decoding
decoded = base64.b64decode(b'SGVsbG8sIFdvcmxkIQ==')
# Result: b'Hello, World!'

PHP

<?php
// Encoding
$encoded = base64_encode('Hello, World!');
// Result: SGVsbG8sIFdvcmxkIQ==

// Decoding
$decoded = base64_decode('SGVsbG8sIFdvcmxkIQ==');
// Result: Hello, World!
?>
