Base64 Encoding Wiki
A comprehensive guide to understanding the technical details, history, and applications of Base64 encoding.
Introduction to Base64
Base64 is a binary-to-text encoding scheme that represents binary data as an ASCII string. It is designed to carry binary data across channels that only reliably support text content. The name refers to the 64-character alphabet used for the output, in which each character carries 6 bits of the original data.
History
The Base64 encoding scheme was first specified in RFC 989 in 1987 as part of Privacy Enhanced Mail (PEM), which used it to carry protected message content through text-only mail systems. It was refined in RFC 1421, adopted by MIME in RFC 2045 for sending binary files as email attachments, and eventually became a standard component of many internet applications and protocols.
Key Historical RFCs:
- RFC 989 (1987) - Original specification in Privacy Enhanced Mail
- RFC 1421 (1993) - PEM refinements
- RFC 2045 (1996) - MIME Base64 implementation
- RFC 3548 (2003) - Base16, Base32, and Base64 Data Encodings
- RFC 4648 (2006) - The Base16, Base32, and Base64 Data Encodings (Current)
How Base64 Encoding Works
The Base64 encoding process converts binary data into text by following these steps:
- Take 3 bytes (24 bits) of binary data.
- Divide those 24 bits into 4 groups of 6 bits each.
- Convert each group of 6 bits to a decimal value (0-63).
- Map each decimal value to a character in the Base64 alphabet.
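The four steps above can be sketched in Python for a single 3-byte group. This is a minimal illustration rather than a full encoder: the names `ALPHABET` and `encode_group` are hypothetical, and inputs shorter than 3 bytes (which need padding) are deliberately rejected.

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(group: bytes) -> str:
    """Encode exactly 3 bytes as 4 Base64 characters (hypothetical helper)."""
    if len(group) != 3:
        raise ValueError("this sketch only handles complete 3-byte groups")
    # Step 1: pack the 3 bytes into one 24-bit integer.
    bits = int.from_bytes(group, "big")
    # Steps 2-3: split into four 6-bit values (0-63), most significant first.
    values = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Step 4: map each value to its character in the alphabet.
    return "".join(ALPHABET[v] for v in values)


print(encode_group(b"Man"))  # TWFu
```

In practice a library routine such as Python's `base64.b64encode` would be used; the sketch only makes the bit manipulation visible.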
The Base64 Alphabet
A-Z (26 characters, values 0-25)
ABCDEFGHIJKLMNOPQRSTUVWXYZ
a-z (26 characters, values 26-51)
abcdefghijklmnopqrstuvwxyz
0-9 (10 characters, values 52-61)
0123456789
Special (2 characters, values 62-63)
+ /
If the input data length is not divisible by 3, padding characters (=) are added to ensure the encoded output length is a multiple of 4.
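The effect of padding can be seen by encoding inputs of 1, 2, and 3 bytes with Python's standard `base64` module. The helper `padding_count` is a hypothetical name introduced here to express the rule as arithmetic; it is a sketch, not part of any library.

```python
import base64


def padding_count(n_bytes: int) -> int:
    """Number of '=' characters appended for an input of n_bytes (hypothetical helper)."""
    return (3 - n_bytes % 3) % 3


for data in (b"M", b"Ma", b"Man"):
    encoded = base64.b64encode(data).decode("ascii")
    print(data, encoded, padding_count(len(data)))
# b'M' TQ== 2
# b'Ma' TWE= 1
# b'Man' TWFu 0
```

Each output is 4 characters long: one leftover byte yields two padding characters, two leftover bytes yield one, and a complete group needs none.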
Example Encoding
Let's encode the text "Man" to Base64:
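Input Text: M, a, n
ASCII Values: 77, 97, 110
Binary: 01001101 01100001 01101110
6-bit Groups: 010011 010110 000101 101110
Decimal Values: 19, 22, 5, 46
Base64 Characters: T, W, F, u
Final Base64 Result: TWFu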
Base64 Variants
Several variants of Base64 encoding exist for specific use cases:
Standard Base64
Uses A-Z, a-z, 0-9, +, / with = as padding. This is the variant most applications use by default.
URL-safe Base64
Uses A-Z, a-z, 0-9 as in standard Base64, but replaces + and / with - and _. This avoids escaping problems in URLs, filenames, and other contexts where + and / have special meaning.
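The difference between the two alphabets only shows up when the 6-bit values 62 or 63 occur. The sketch below uses Python's standard `base64` module with a two-byte input chosen (as an assumption for illustration) so that both special characters appear.

```python
import base64

# 0xFB 0xFF splits into the 6-bit values 62, 63, 60, so both special characters appear.
raw = bytes([0xFB, 0xFF])

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=

# Each variant must be decoded with the matching function.
assert base64.urlsafe_b64decode(url_safe) == raw
```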
MIME Base64
Line length limited to 76 characters with CRLF line breaks. This variant is used in email systems to comply with MIME specifications for maximum line length.
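The line-wrapping rule can be sketched as follows. The function `mime_wrap` is a hypothetical helper written for illustration; Python's own `base64.encodebytes` also wraps at 76 characters but separates lines with a bare newline rather than CRLF, so the CRLF join is done by hand here.

```python
import base64


def mime_wrap(data: bytes, width: int = 76) -> str:
    """Base64-encode data and break the output into CRLF-separated lines (sketch)."""
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return "\r\n".join(lines)


wrapped = mime_wrap(bytes(100))  # 100 zero bytes -> 136 Base64 characters
print([len(line) for line in wrapped.split("\r\n")])  # [76, 60]

# With default settings, b64decode skips characters outside the alphabet,
# so the line breaks do not need to be removed first.
assert base64.b64decode(wrapped) == bytes(100)
```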
Base64 without padding
Omits the = padding characters. This variant is often used in contexts where the padding is not necessary for decoding, such as in JSON Web Tokens (JWT).
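Padding can be dropped because the decoder can work out how many characters are missing from the encoded length alone. A minimal sketch, assuming the standard alphabet and using the hypothetical helper names `b64encode_nopad` and `b64decode_nopad`:

```python
import base64


def b64encode_nopad(data: bytes) -> str:
    """Encode and strip trailing '=' characters (hypothetical helper)."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def b64decode_nopad(text: str) -> bytes:
    """Restore the padding implied by the length, then decode (hypothetical helper)."""
    return base64.b64decode(text + "=" * (-len(text) % 4))


print(b64encode_nopad(b"Ma"))  # TWE
print(b64decode_nopad("TWE"))  # b'Ma'
```

The expression `-len(text) % 4` gives the number of characters needed to reach the next multiple of 4.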
Common Use Cases
Email Attachments
Email systems were originally designed to handle only ASCII text. Base64 encoding allows binary files to be attached to emails by converting them to text. MIME (Multipurpose Internet Mail Extensions) uses Base64 to encode attachments like images, documents, and executables.
Data URIs
Data URIs allow embedding small files directly in HTML or CSS by encoding them in Base64. This technique can reduce HTTP requests and improve page load times for small images.
Example: Embedding a small PNG image in HTML
<img src="data:image/png;base64,..." alt="Tiny PNG" />
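A data URI is the MIME type and the Base64 payload joined in a fixed format. The sketch below builds one in Python; `make_data_uri` is a hypothetical helper, and the input is only the 8-byte PNG file signature used as stand-in bytes, not a complete image.

```python
import base64


def make_data_uri(data: bytes, mime_type: str) -> str:
    """Build a Base64 data URI for the given bytes (hypothetical helper)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# Stand-in input: the 8-byte PNG signature, NOT a displayable image.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

uri = make_data_uri(PNG_SIGNATURE, "image/png")
print(uri)  # data:image/png;base64,iVBORw0KGgo=
```

With real image bytes (for example, read from a file in binary mode), the resulting string could be placed in the `src` attribute of an `img` element.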
API Communication
Base64 is used in HTTP Basic Authentication to encode username and password combinations:
Example: HTTP Basic Authentication Header
Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
Note: "dXNlcm5hbWU6cGFzc3dvcmQ=" is the Base64 encoding of "username:password"
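The header value above can be reproduced with a few lines of Python. This is a sketch: `basic_auth_header` is a hypothetical helper, and UTF-8 is assumed as the character encoding for the credentials.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")  # assumes UTF-8 credentials
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("username", "password"))
# Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

Because the token can be decoded by anyone who sees it, the header offers no confidentiality on its own and is normally sent only over an encrypted connection.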
JSON Web Tokens (JWT)
JWTs use Base64URL encoding without padding for their header, payload, and signature components. This allows JWTs to be passed in URLs, cookies, and HTTP headers without encoding issues.
Example: JWT Token Structure
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
Format: header.payload.signature (each part is Base64URL encoded)
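The header and payload of the example token can be read back with the standard library alone. The sketch below only decodes the first two parts; it does not verify the signature, and `decode_segment` is a hypothetical helper name.

```python
import base64
import json

TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ"
    ".SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
)


def decode_segment(segment: str) -> dict:
    """Decode one unpadded Base64URL JSON segment (hypothetical helper)."""
    padded = segment + "=" * (-len(segment) % 4)  # restore the omitted padding
    return json.loads(base64.urlsafe_b64decode(padded))


header_b64, payload_b64, signature_b64 = TOKEN.split(".")
header = decode_segment(header_b64)
payload = decode_segment(payload_b64)

print(header)   # {'alg': 'HS256', 'typ': 'JWT'}
print(payload)  # {'sub': '1234567890', 'name': 'John Doe', 'iat': 1516239022}
```

Since the payload is readable by anyone holding the token, Base64URL here provides transport safety only; integrity comes from the signature, which this sketch leaves unchecked.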
Advantages and Disadvantages
Advantages
- Allows binary data to be transmitted as text
- Implemented in almost all programming languages
- Safe to embed in XML and JSON; the URL-safe variant is also safe in URLs
- Can be easily reversed to retrieve the original data
- Standardized and widely supported
Disadvantages
- Increases data size by approximately 33% (4 output characters for every 3 input bytes)
- Not a form of encryption or security measure
- Adds encoding and decoding overhead in CPU time and memory, which can become noticeable for very large data
- Multiple variants can cause compatibility issues if used incorrectly
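The size overhead listed above follows directly from the 3-bytes-to-4-characters rule. A small sketch of the arithmetic, assuming padded output without line breaks (`encoded_length` is a hypothetical helper):

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (hypothetical helper)."""
    return 4 * ((n_bytes + 2) // 3)  # one 4-character block per started 3-byte group


size = 3_000_000
print(encoded_length(size))         # 4000000
print(encoded_length(size) / size)  # about 1.33, i.e. roughly 33% larger

# Cross-check the formula against the standard library for small inputs.
assert all(len(base64.b64encode(bytes(n))) == encoded_length(n) for n in range(50))
```

Variants that insert line breaks, such as MIME Base64, add a little more on top of this figure.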
Implementation in Programming Languages
JavaScript
// Encoding
const encoded = btoa('Hello, World!');
// Result: SGVsbG8sIFdvcmxkIQ==

// Decoding
const decoded = atob('SGVsbG8sIFdvcmxkIQ==');
// Result: Hello, World!
Python
import base64

# Encoding
encoded = base64.b64encode(b'Hello, World!')
# Result: b'SGVsbG8sIFdvcmxkIQ=='

# Decoding
decoded = base64.b64decode(b'SGVsbG8sIFdvcmxkIQ==')
# Result: b'Hello, World!'
PHP
<?php
// Encoding
$encoded = base64_encode('Hello, World!');
// Result: SGVsbG8sIFdvcmxkIQ==

// Decoding
$decoded = base64_decode('SGVsbG8sIFdvcmxkIQ==');
// Result: Hello, World!
?>