How to Encode Data in Base64

2024-11-10

Introduction

Base64 is a method for encoding binary data as text. Computers store everything, from text to images, as binary data: a series of 0s and 1s. However, raw binary data isn’t easy to read, and it can be corrupted when sent through systems designed to handle text, such as email, web APIs, configuration files, or URLs. Base64 encoding solves this problem by converting binary data into a string of printable characters: uppercase and lowercase letters, digits, and two symbols (+ and /), with = reserved for padding.

The main purpose of Base64 encoding is to keep the original data intact during transport or storage. It does this by converting the binary data into a format that text-based systems can handle. Note that Base64 is an encoding, not encryption: anyone can decode it, so it provides no secrecy. In this guide, we’ll break down the Base64 encoding process from scratch, step by step, so even beginners can follow along easily.

Let’s start with encoding a simple example, then explore padding and see why and how it’s used, followed by practice examples.

Understanding 24-Bit Grouping in Base64

To encode data in Base64, we start by breaking it down into chunks of 24 bits. This is just a fancy way of saying we work with 3 bytes at a time (since each byte is 8 bits, and 3 x 8 = 24 bits).

Why do we use 24 bits? Well, Base64 divides each 24-bit chunk into four smaller groups, each 6 bits long. But why 6 bits? The Base64 alphabet has 64 characters, and 6 bits can represent 64 different values (since 2^6 = 64), which is why it's called "Base64."
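
To make the link between 6 bits and 64 characters concrete, here is a small Python sketch that builds the standard Base64 alphabet (A to Z, a to z, 0 to 9, then + and /). The name BASE64_ALPHABET is just a label chosen for this illustration.

```python
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, then "+" and "/".
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

# 6 bits can hold 2**6 = 64 distinct values, one per alphabet character.
assert len(BASE64_ALPHABET) == 2 ** 6

print(BASE64_ALPHABET[0])   # A
print(BASE64_ALPHABET[16])  # Q
print(BASE64_ALPHABET[63])  # /
```

Every 6-bit value from 0 to 63 is simply an index into this string.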

Example:

  • Take 3 bytes of data (24 bits).
  • Split them into four 6-bit groups.
  • Convert each group to a character from the Base64 set.

This way, every chunk of 3 bytes (24 bits) becomes 4 Base64 characters.
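
The 3-bytes-to-four-groups split can be sketched with bit shifts. This is one possible way to do it, not the only one; the function name split_24_bits is hypothetical and chosen for this example.

```python
def split_24_bits(chunk: bytes) -> list[int]:
    """Split exactly 3 bytes into four 6-bit values (each 0..63)."""
    if len(chunk) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the 3 bytes into one 24-bit integer, first byte most significant.
    value = int.from_bytes(chunk, "big")
    # Read 6 bits at a time, starting from the most significant end.
    return [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]


print(split_24_bits(b"CAT"))  # [16, 52, 5, 20]
```

Each returned number is then used to look up one Base64 character.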

What if the data doesn’t fit perfectly into 24 bits?

If we’re left with only 1 or 2 bytes at the end, we can’t split them evenly into four 6-bit groups. So, we fill the last incomplete group with zero bits and then add padding symbols (=) until the output block has four characters. This allows Base64 to handle data of any length reliably.

Encoding Data in Base64 - Example with "CAT"

Let’s encode the word "CAT" in Base64, step-by-step.

Steps to Encode "CAT" in Base64

  1. Convert Each Character to Binary (any ASCII table works as a reference):

    • In ASCII, each character corresponds to an 8-bit (1-byte) binary number.
    • For "CAT":
      • C → ASCII 67 → Binary 01000011
      • A → ASCII 65 → Binary 01000001
      • T → ASCII 84 → Binary 01010100
    • Combining these gives us: 01000011 01000001 01010100.
  2. Split into 6-Bit Groups:

    • Base64 uses 6-bit groups, so we split our 24-bit binary sequence:
      010000 | 110100 | 000101 | 010100
      
  3. Convert Each 6-Bit Group to Decimal:

    • Now, we convert each 6-bit binary group to a decimal number:
      • 010000 → 16
      • 110100 → 52
      • 000101 → 5
      • 010100 → 20
  4. Map Each Decimal to a Base64 Character:

    • Each decimal number corresponds to a character in the Base64 table (0 to 25 → A to Z, 26 to 51 → a to z, 52 to 61 → 0 to 9, 62 → +, 63 → /):
      • 16 → Q
      • 52 → 0
      • 5 → F
      • 20 → U
    • The Base64 encoding of "CAT" is: Q0FU.

And that’s it! Since “CAT” fits perfectly into 3 bytes, no padding is needed here. Now, let’s see what happens when the data doesn’t fit as evenly.
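
The four steps above can be reproduced in a few lines of Python. This is a minimal sketch that mirrors the manual process with bit strings rather than an efficient implementation; ALPHABET is a name chosen for this example, and the final line checks the result against Python’s standard base64 module.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

text = "CAT"

# Step 1: write each character as an 8-bit binary string and join them.
bits = "".join(format(byte, "08b") for byte in text.encode("ascii"))

# Step 2: cut the 24-bit string into 6-bit groups.
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]

# Step 3: read each group as a decimal number.
indexes = [int(group, 2) for group in groups]

# Step 4: look up each number in the Base64 alphabet.
encoded = "".join(ALPHABET[i] for i in indexes)

print(groups)   # ['010000', '110100', '000101', '010100']
print(indexes)  # [16, 52, 5, 20]
print(encoded)  # Q0FU

# Cross-check against the standard library.
assert encoded == base64.b64encode(b"CAT").decode("ascii")
```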

Understanding Padding in Base64

Base64 output is produced in blocks of four characters, one block for each 24-bit chunk of input. If the last chunk is shorter than 24 bits, we add padding using the = symbol. Padding keeps the length of the encoded text a multiple of 4, even when the input length isn’t a multiple of 3 bytes.

Note that padding is needed only if the data length isn’t a multiple of 3 bytes (24 bits).

How Padding Works:

  • If there’s 1 extra byte left over in the final chunk, it produces 2 Base64 characters, so we add two ==.
  • If there are 2 extra bytes left over, they produce 3 Base64 characters, so we add one =.
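
As a quick check of the padding rule, the sketch below computes how many = characters to expect from the input length alone and compares the result with Python’s standard base64 module. One leftover byte gives two = characters, and two leftover bytes give one. The helper names padding_count and encoded_length are hypothetical, chosen for this example.

```python
import base64


def padding_count(data: bytes) -> int:
    """Number of "=" characters Base64 appends for this input."""
    leftover = len(data) % 3    # bytes in the final, incomplete chunk
    return (3 - leftover) % 3   # 0 leftover -> 0, 1 leftover -> 2, 2 leftover -> 1


def encoded_length(data: bytes) -> int:
    """Length of the padded output: 4 characters per (possibly partial) chunk."""
    return 4 * ((len(data) + 2) // 3)


for sample in (b"CAT", b"Hi", b"A"):
    encoded = base64.b64encode(sample).decode("ascii")
    assert encoded.count("=") == padding_count(sample)
    assert len(encoded) == encoded_length(sample)
    print(sample, "->", encoded)
```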

Encoding Example with Padding – "Hi"

Let’s go through encoding "Hi" in Base64 to understand how padding works when the input isn’t a full 3 bytes.

  1. Convert Each Character to Binary:

    • "Hi" has two characters:
      • H → ASCII 72 → Binary 01001000
      • i → ASCII 105 → Binary 01101001
    • Combined, we get: 01001000 01101001 (only 16 bits instead of 24).
  2. Split into 6-Bit Groups:

    • Join the bits together: 0100100001101001.
    • When we split into 6-bit groups, we notice that 16 bits isn’t a multiple of 6. To make it compatible with Base64, we append two 0 bits at the end to bring it up to 18 bits:
      010010 | 000110 | 100100
      
    • Note: Extra zeros are placeholders only, used to maintain format. The padding = symbol indicates their presence without altering the original data.
    • Now we have only 3 groups, but Base64 requires 4 groups.
  3. Add Padding:

    • Since we’re missing one 6-bit group, we add one = at the end to complete the Base64 block.
  4. Convert Each 6-Bit Group to Decimal:

    • Convert each group to decimal:
      • 010010 → 18
      • 000110 → 6
      • 100100 → 36
  5. Map to Base64 Characters:

    • Using the Base64 table, we map each decimal:
      • 18 → S
      • 6 → G
      • 36 → k
    • Adding our padding symbol, the Base64 encoding for "Hi" is: SGk=.
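
Putting the grouping and padding steps together, a from-scratch encoder might look like the sketch below. It follows the manual bit-string process for readability rather than speed; the function name b64encode_manual and the constant ALPHABET are made up for this illustration, and the loop at the end compares the result with Python’s standard base64 module.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64encode_manual(data: bytes) -> str:
    """Encode bytes as padded Base64 by following the manual steps."""
    # Write all input bytes as one long bit string.
    bits = "".join(format(byte, "08b") for byte in data)
    # Append zero bits until the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # Turn each 6-bit group into a number, then into an alphabet character.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Append "=" until the output length is a multiple of 4.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)


print(b64encode_manual(b"Hi"))   # SGk=
print(b64encode_manual(b"CAT"))  # Q0FU

# Cross-check a few inputs against the standard library.
for sample in (b"", b"A", b"Hi", b"CAT", b"Base64!"):
    assert b64encode_manual(sample) == base64.b64encode(sample).decode("ascii")
```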

Practice Examples

To make sure everything’s clear, here are two more examples covering both padding cases: two == and one =.

Example 1: Encoding "A" (Two == Padding)

  1. Text: "A"

  2. Convert to Binary:

    • "A" → ASCII 65 → Binary 01000001
    • This gives us only 1 byte (8 bits).
  3. Split into 6-Bit Groups:

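    • The 8 bits give one full 6-bit group (010000) plus 2 leftover bits (01). We append four 0 bits to complete the second group:
      010000 | 010000
      
    • That leaves only 2 groups. We’re missing two 6-bit groups, so we’ll need two == for padding.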
  4. Convert to Decimal:

    • Convert each 6-bit group:
      • 010000 → 16
      • 010000 → 16
  5. Map to Base64 Characters:

    • Map each decimal to Base64:
      • 16 → Q
      • 16 → Q
    • With padding, the final Base64 encoding is: QQ==.

Example 2: Encoding "Go" (One = Padding)

  1. Text: "Go"

  2. Convert to Binary:

    • "G" → ASCII 71 → Binary 01000111
    • "o" → ASCII 111 → Binary 01101111
    • Combined: 01000111 01101111 (16 bits).
  3. Split into 6-Bit Groups:

    • Join these bits, append two 0 bits to reach 18 bits, then split:
      010001 | 110110 | 111100
      
    • We have only 3 groups, so we’ll add one = for padding.
  4. Convert to Decimal:

    • Convert each 6-bit group:
      • 010001 → 17
      • 110110 → 54
      • 111100 → 60
  5. Map to Base64 Characters:

    • Map each decimal to Base64:
      • 17 → R
      • 54 → 2
      • 60 → 8
    • Adding the padding gives us: R28=.
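
You can check both practice answers with Python’s standard base64 module. The sketch below also decodes each result to show that padding does not change the original data; the dictionary name expected is chosen for this example.

```python
import base64

# The answers worked out by hand in the two practice examples.
expected = {"A": "QQ==", "Go": "R28="}

for text, answer in expected.items():
    encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
    decoded = base64.b64decode(encoded).decode("ascii")
    print(f"{text!r} -> {encoded} -> {decoded!r}")
    assert encoded == answer  # matches the manual result
    assert decoded == text    # decoding returns the original text
```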

Conclusion

Base64 encoding is a simple and reliable way to convert binary data into text that can be shared across different systems. By breaking data into 24-bit chunks and dividing each chunk into four 6-bit groups, you get output that decodes back to exactly the original bytes. With padding added when needed, Base64 handles input of any length. With practice, you’ll find that these steps become second nature, making Base64 a valuable tool in your data encoding toolkit.
