URL Encoding and Base64, Explained

URL encoding and Base64 are two of the most commonly confused transformations on the web. They look superficially similar — both turn a piece of text into a longer, less readable piece of text — and they are often applied in sequence, which only deepens the confusion. Neither is encryption. Neither is compression. Both exist for the same underlying reason: to take data that contains characters a particular channel cannot safely carry, and rewrite it using only characters the channel can. The differences are in which channel and which characters, and that is where almost every bug lives.

What URL encoding actually does

URL encoding, more precisely called percent-encoding, is defined by RFC 3986. It exists because URLs are not allowed to contain arbitrary characters. A URL is parsed by software — proxies, routers, browsers, servers — and a number of characters carry structural meaning. The slash separates path segments. The question mark introduces the query string. The ampersand separates query parameters. The hash introduces a fragment. If any of those characters appear inside a value rather than as structure, the parser will misread the URL.

Percent-encoding solves this by replacing each problem character with a % followed by the two hexadecimal digits of its byte value. A space becomes %20. An ampersand becomes %26. A forward slash becomes %2F. Multi-byte characters under UTF-8 are encoded one byte at a time, so the pound sign £ (two bytes in UTF-8) becomes %C2%A3.
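
As a rough illustration of the byte-by-byte rule, the sketch below escapes every byte of a string's UTF-8 form. The helper name percentEncodeEveryByte is made up for this example, and it deliberately escapes letters too, which a real encoder would not do; it is only meant to show where the hex digits come from.

```javascript
// Hypothetical helper: percent-encodes EVERY byte of the UTF-8 form,
// purely to show the byte -> %XX mapping.
function percentEncodeEveryByte(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeEveryByte(" ")); // → %20
console.log(percentEncodeEveryByte("&")); // → %26
console.log(percentEncodeEveryByte("/")); // → %2F
console.log(percentEncodeEveryByte("£")); // → %C2%A3
console.log(encodeURIComponent("£"));     // → %C2%A3 (the built-in agrees)
```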

Reserved versus unreserved characters

RFC 3986 splits characters into three groups, and the distinction matters:

Unreserved characters are the ASCII letters, digits, and the four punctuation marks -, ., _, and ~. These are always safe and never need to be encoded.
Reserved characters are the ones that carry structural meaning: :/?#[]@ as general delimiters, and !$&'()*+,;= as sub-delimiters. These must be encoded when they appear inside a component value, but left alone when they are doing their structural job.
Everything else — control characters, spaces, non-ASCII bytes — must always be encoded.
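
The three groups can be sketched as a small classifier. The character sets are the ones listed in RFC 3986; the function name classifyUrlChar and the three labels it returns are invented for this illustration.

```javascript
// Character sets as listed in RFC 3986 (sections 2.2 and 2.3).
const GEN_DELIMS = ":/?#[]@";
const SUB_DELIMS = "!$&'()*+,;=";
const UNRESERVED = /^[A-Za-z0-9._~-]$/;

// Hypothetical helper: classifies a single character.
function classifyUrlChar(ch) {
  if (UNRESERVED.test(ch)) return "unreserved";
  if (GEN_DELIMS.includes(ch) || SUB_DELIMS.includes(ch)) return "reserved";
  return "must-encode";
}

console.log(classifyUrlChar("~")); // → unreserved
console.log(classifyUrlChar("&")); // → reserved
console.log(classifyUrlChar(" ")); // → must-encode
console.log(classifyUrlChar("£")); // → must-encode
```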

This is why "encode this whole URL" is not actually a well-defined operation. You need to know which parts of the string are structure and which are values, because the same character has to be treated differently depending on its role.

encodeURI vs encodeURIComponent in JavaScript

JavaScript ships with two URL encoding functions, and beginners reach for the wrong one with remarkable consistency. The difference is precisely the reserved-character distinction above.

encodeURI assumes its input is an entire URL and leaves the reserved characters alone (all except the square brackets, which it escapes), on the theory that they are doing structural work. encodeURIComponent assumes its input is a single component value being slotted into a URL, and encodes the reserved characters along with everything else; the only reserved characters it leaves untouched are ! ' ( ) and *.
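
Running both functions over the full reserved set makes the difference concrete. The built-ins follow the ECMAScript specification rather than RFC 3986 to the letter, so a few characters behave unexpectedly; encodeStrict is a hypothetical helper, sketched here for cases where a receiver insists on every reserved character being escaped.

```javascript
const reserved = ":/?#[]@!$&'()*+,;=";

console.log(encodeURI(reserved));
// → :/?#%5B%5D@!$&'()*+,;=
console.log(encodeURIComponent(reserved));
// → %3A%2F%3F%23%5B%5D%40!%24%26'()*%2B%2C%3B%3D

// Hypothetical stricter variant: also escape the five characters
// that encodeURIComponent leaves alone.
function encodeStrict(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeStrict("it's (fine)!*"));
// → it%27s%20%28fine%29%21%2A
```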

The rule is simple: if you are building a URL out of pieces, use encodeURIComponent on each piece, then assemble. Never use encodeURI on a value you are about to interpolate into a query string. Consider:

const query = "cats & dogs";

// Wrong
const bad = "https://example.com/search?q=" + encodeURI(query);
// → https://example.com/search?q=cats%20&%20dogs
// The unescaped & is read as a parameter separator.

// Right
const good = "https://example.com/search?q=" + encodeURIComponent(query);
// → https://example.com/search?q=cats%20%26%20dogs

encodeURI has a narrow legitimate use: cleaning up a URL that already exists but happens to contain a few illegal bytes (a stray space, a non-ASCII character). For anything else, reach for encodeURIComponent. The URL Encoder defaults to the component-level behaviour, which is what you want most of the time.
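
A short sketch of that narrow use, with a made-up example.com address. It also shows the main caveat: encodeURI escapes any % already in the input, so it should only be run on a URL that has not been encoded yet.

```javascript
// A URL that is structurally fine but contains a space and a non-ASCII letter.
const messy = "https://example.com/my file.html?q=café";
console.log(encodeURI(messy));
// → https://example.com/my%20file.html?q=caf%C3%A9

// Caveat: an existing escape sequence gets its % escaped again.
console.log(encodeURI("https://example.com/my%20file.html"));
// → https://example.com/my%2520file.html
```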

What Base64 is, and what it is not

Base64 is a way of representing arbitrary binary data using only printable ASCII characters. It uses a 64-character alphabet (the uppercase and lowercase letters, the ten digits, and the two symbols + and /) plus = as padding. Three input bytes (24 bits) are split into four 6-bit groups, and each group is mapped to one alphabet character. The padded output is therefore always a multiple of four characters long, with one or two = characters appended if the input length was not a multiple of three.
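
The grouping described above can be written out by hand in a few lines. This is a teaching sketch, not a replacement for a platform's built-in encoder; the function name base64Encode and the utf8 helper are this example's own.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes a Uint8Array (or array of byte values) as padded standard Base64.
function base64Encode(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // 1, 2, or 3+ bytes left
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const group = (b0 << 16) | (b1 << 8) | b2; // 24 bits

    // Four 6-bit slices, each mapped to one alphabet character.
    out += ALPHABET[(group >> 18) & 63];
    out += ALPHABET[(group >> 12) & 63];
    out += remaining > 1 ? ALPHABET[(group >> 6) & 63] : "=";
    out += remaining > 2 ? ALPHABET[group & 63] : "=";
  }
  return out;
}

const utf8 = (s) => new TextEncoder().encode(s);
console.log(base64Encode(utf8("Man"))); // → TWFu
console.log(base64Encode(utf8("Ma")));  // → TWE=
console.log(base64Encode(utf8("M")));   // → TQ==
```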

Base64 is not encryption. It provides no confidentiality, no integrity, and no authentication. Anyone who sees a Base64 string can decode it back to the original bytes with a single function call. Treating Base64 as a way to hide credentials, protect tokens, or obscure sensitive data is a recurring and embarrassing mistake. If you need secrecy, use real cryptography; if you need to carry binary data through a text channel, Base64 is the right tool.

Why it exists: a brief history

Base64 dates to the early days of email. SMTP and the broader internet mail infrastructure were designed for 7-bit ASCII text, with a maximum line length and a habit of mangling anything that did not fit. To send a binary attachment — an image, a program, a compressed archive — over a channel that might strip the high bit off every byte and break long lines, you first had to rewrite the data using only safe characters. MIME standardised Base64 for this purpose in the early 1990s, and the same trick has been quietly powering binary-over-text transport ever since.

Where you encounter Base64 today

Data URIs in HTML and CSS, where a small image is embedded directly in the markup as data:image/png;base64,iVBORw0KGgo... rather than loaded from a separate file.
JSON Web Tokens (JWTs), which are three segments separated by dots, each encoded with the URL-safe Base64 variant. The JWT Decoder simply splits on the dots and Base64-decodes each part.
HTTP Basic authentication, where the header value is literally the username and password joined by a colon and Base64 encoded. This is also why HTTPS is non-negotiable for Basic auth: the credentials are trivially readable on the wire.
Embedded binary in JSON, since JSON has no native binary type. Cryptographic keys, file contents, and binary message payloads all typically travel as Base64 strings.
Email attachments, still, more than thirty years later.
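
To make the Basic authentication case concrete, here is a minimal sketch with made-up credentials. It assumes an environment with the global btoa and atob functions (browsers and recent Node.js versions) and ASCII-only input, since btoa rejects characters outside Latin-1.

```javascript
// Hypothetical credentials, for illustration only.
const username = "user";
const password = "pass";

const headerValue = "Basic " + btoa(username + ":" + password);
console.log(headerValue); // → Basic dXNlcjpwYXNz

// Anyone who sees the header can reverse it with one call.
const decoded = atob(headerValue.slice("Basic ".length));
console.log(decoded); // → user:pass
```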

The 33% size overhead

Because Base64 packs three bytes into four characters, the encoded output is always about 33% larger than the input. For a small image embedded in a stylesheet, that overhead is invisible. For a multi-megabyte binary payload sent over a slow connection, it matters a great deal. The rule of thumb is: Base64-encode small things that benefit from being inline, and use a separate binary channel for anything large. Wrapping a 50 MB upload in Base64 because it is convenient is a choice you will regret.
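
The overhead follows directly from the three-into-four packing: n input bytes produce 4 × ceil(n / 3) characters of padded output. A small sketch, with base64Length as an illustrative name and 50 MB taken to mean 50 × 1024 × 1024 bytes:

```javascript
// Padded Base64 length for a given number of input bytes.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

console.log(base64Length(21)); // → 28

const upload = 50 * 1024 * 1024; // 52428800 bytes
console.log(base64Length(upload));          // → 69905068
console.log(base64Length(upload) - upload); // → 17476268 extra bytes on the wire
```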

URL-safe Base64

Standard Base64 uses + and / in its alphabet, and both have meaning inside URLs. Standard Base64 also uses = for padding, which is reserved in some contexts. The URL-safe variant defined in RFC 4648 substitutes - for + and _ for /, and often omits the padding entirely. This is the form used inside JWTs (where it is called base64url), in OAuth tokens, and in most places where a Base64 string needs to travel through a URL or filename without further encoding.
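
Converting between the two alphabets is a pair of character substitutions plus padding bookkeeping. The helper names below are this example's own, and the sketch assumes a global btoa; the two input bytes are chosen only because they happen to produce both + and /.

```javascript
// Standard Base64 -> URL-safe form without padding.
function toBase64Url(b64) {
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// URL-safe form -> standard Base64 with padding restored.
function fromBase64Url(b64url) {
  const b64 = b64url.replace(/-/g, "+").replace(/_/g, "/");
  const padLength = (4 - (b64.length % 4)) % 4;
  return b64 + "=".repeat(padLength);
}

const standard = btoa(String.fromCharCode(0xfb, 0xff));
console.log(standard);                             // → +/8=
console.log(toBase64Url(standard));                // → -_8
console.log(fromBase64Url(toBase64Url(standard))); // → +/8=
```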

The same string, both ways

Take a small string that contains characters both transformations have to deal with:

hello world & goodbye

URL-encoded (component form), it becomes:

hello%20world%20%26%20goodbye

The spaces become %20, the ampersand becomes %26, and the letters are untouched. The result is still recognisably the same string, and it is only slightly longer: each escaped character grows from one character to three. It is safe to drop into a query parameter.

Base64-encoded, the same string becomes:

aGVsbG8gd29ybGQgJiBnb29kYnll

The output is opaque. There is no visible relationship between input and output, the length has grown by roughly a third, and you cannot read the original at a glance. But — and this is the point — the output contains no spaces, no ampersands, and no characters that any text channel would object to. It is safe to email, paste into a JSON string, or stick on the end of a URL (after switching to the URL-safe alphabet).

These are different tools for different jobs. URL encoding preserves readability and is the right answer when you are building a URL. Base64 flattens any byte sequence into safe ASCII and is the right answer when you have binary or text-with-arbitrary-encoding to carry through a text-only pipe.
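
The comparison above can be reproduced in a few lines. This sketch assumes a global btoa, which is enough here only because the input is pure ASCII.

```javascript
const text = "hello world & goodbye"; // 21 characters

const urlEncoded = encodeURIComponent(text);
console.log(urlEncoded);        // → hello%20world%20%26%20goodbye
console.log(urlEncoded.length); // → 29 (four escapes, two extra characters each)

const base64 = btoa(text);
console.log(base64);        // → aGVsbG8gd29ybGQgJiBnb29kYnll
console.log(base64.length); // → 28 (21 bytes packed three into four)
```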

Common bugs

Double encoding. A value gets URL-encoded once on the way out, then URL-encoded again by a framework or middleware that did not realise it was already encoded. Spaces become %2520 (the % from %20 got encoded to %25). The fix is to stop encoding twice, not to decode twice on receipt.
Decoding twice. The mirror image: a value that was encoded once gets decoded twice, mangling any literal % characters that were genuinely in the original.
Treating Base64 as URL-safe. Standard Base64 can contain + and /. In a query string, form-style decoders read + as a space, and in a path / is a segment separator, so the value that arrives is not the value that was sent. Use the URL-safe variant, or URL-encode the Base64 output, but pick one.
Mixing the layers. A JWT is Base64-encoded JSON (plus a signature), usually carried inside an HTTP header. Each layer has its own decode step. Trying to URL-decode a JWT signature, or Base64-decode a query string, will give you nonsense at best and a crash at worst.
Forgetting padding. Some Base64 decoders are strict about the trailing = characters and refuse strings without them. Others are lenient. If you are interoperating between systems and seeing intermittent decode failures, check whether one side is stripping padding.

When debugging any of these, the trick is to identify each layer in order and undo them one at a time. Pasting a value into the URL Encoder or the Base64 Encoder and watching what comes out is much faster than reasoning about it from first principles.
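
Three of the bugs above are easy to reproduce in a console, which helps when trying to recognise them in the wild. The values are made up; URLSearchParams stands in here for whatever query-string parser the receiving side uses.

```javascript
// Double encoding: the % of %20 is itself escaped to %25.
const once = encodeURIComponent("a b");
const twice = encodeURIComponent(once);
console.log(once);  // → a%20b
console.log(twice); // → a%2520b

// Decoding twice: a literal "%41" in the original collapses to "A".
const original = "%41";
const sent = encodeURIComponent(original);            // "%2541"
const decodedOnce = decodeURIComponent(sent);         // "%41" (correct)
const decodedTwice = decodeURIComponent(decodedOnce); // "A" (mangled)
console.log(decodedOnce, decodedTwice); // → %41 A

// Standard Base64 in a query string: + is read as a space.
const raw = new URLSearchParams("token=ab+c");
console.log(raw.get("token")); // → ab c

// URL-encoding the value first preserves the +.
const safe = new URLSearchParams("token=" + encodeURIComponent("ab+c"));
console.log(safe.get("token")); // → ab+c
```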

The short version

URL encoding makes a string safe to put inside a URL by escaping the characters that would otherwise be parsed as structure. Base64 makes any byte sequence safe to put inside a text channel by re-expressing it using only safe ASCII characters. Neither offers any security. Neither saves space — both make the data larger. Pick the one that matches the channel you are carrying data through, apply it once, decode it once, and most of the problems people have with these two formats simply do not occur.
