Of Course "changeme" Is Valid Base64
Today, I came across this blog post in which the author used the string "changeme" as test data while testing Base64 decoding functionality in their application. The author expected the decoding to fail, believing that this string is not a valid Base64-encoded string. To their surprise, they found that "changeme" does in fact decode successfully.
The post did not go on to explain why "changeme" is a valid Base64-encoded string and why it can be decoded successfully into binary data. It appears that the author was using the Base64 encoding scheme as a black box.
I think it is worth noting and illustrating that any alphanumeric string with a length that is a multiple of 4 is a valid Base64-encoded string. Here are some examples that illustrate this:
$ printf AAAA | base64 --decode | od -tx1
0000000 00 00 00
0000003
$ printf AAAAAAAA | base64 --decode | od -tx1
0000000 00 00 00 00 00 00
0000006
$ printf AQEB | base64 --decode | od -tx1
0000000 01 01 01
0000003
$ printf AQID | base64 --decode | od -tx1
0000000 01 02 03
0000003
$ printf main | base64 --decode | od -tx1
0000000 99 a8 a7
0000003
$ printf scrabble | base64 --decode | od -tx1
0000000 b1 ca da 6d b9 5e
0000006
$ printf 12345678 | base64 --decode | od -tx1
0000000 d7 6d f8 e7 ae fc
0000006
Further, since + and / are also used as symbols in Base64 encoding (for binary 111110 and 111111, respectively), we also have a few more interesting examples:
$ printf 1+2+3+4+5/11 | base64 --decode | od -tx1
0000000 d7 ed be df ee 3e e7 fd 75
0000011
$ printf "\xd7\xed\xbe\xdf\xee\x3e\xe7\xfd\x75" | base64
1+2+3+4+5/11
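The same checks can be cross-checked outside the shell. Here is a minimal sketch using Python's standard base64 module; the list name samples is only illustrative, and the choice of Python is mine. Passing validate=True makes the decoder reject any character outside the Base64 alphabet rather than skip it, so a successful decode really does mean the whole string was accepted.

```python
import base64

# Illustrative sample strings: the ones used in this post, plus "changeme".
samples = ["AAAA", "AQID", "main", "scrabble", "12345678", "1+2+3+4+5/11", "changeme"]

for s in samples:
    # validate=True raises an error on any non-alphabet character
    # instead of silently discarding it.
    raw = base64.b64decode(s, validate=True)
    # Re-encoding the decoded bytes gives back the original string.
    assert base64.b64encode(raw).decode("ascii") == s
    print(f"{s:>12} -> {raw.hex()}")
```

Each string decodes without error, and encoding the resulting bytes again reproduces the original string.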
I think it is good to understand why any alphanumeric string with a length that is a multiple of 4 turns out to be a valid Base64-encoded string.
The Base64 encoding scheme encodes each group of 6 bits in the binary input as one ASCII character. Every possible 6-bit value is assigned an ASCII character that appears in the Base64-encoded string. Each output character is one of 64 carefully chosen ASCII characters: the uppercase and lowercase letters of the English alphabet, the ten Arabic numerals, the plus sign (+) and the forward slash (/). For example, the bits 000000 are encoded as A, the bits 000001 are encoded as B, and so on. The equals sign (=) is used for padding, but that is not something we will discuss in detail in this post.
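To make the mapping concrete, here is a small Python sketch of the standard alphabet in index order as given in RFC 4648. The names ALPHABET, symbol_for and bits_for are hypothetical helpers made up for this illustration; they are not part of any library.

```python
import string

# Standard Base64 alphabet in index order (RFC 4648):
# A-Z are 0-25, a-z are 26-51, 0-9 are 52-61, '+' is 62 and '/' is 63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def symbol_for(value: int) -> str:
    """Return the Base64 character assigned to a 6-bit value (0..63)."""
    return ALPHABET[value]


def bits_for(symbol: str) -> str:
    """Return the 6-bit pattern that a Base64 character stands for."""
    return format(ALPHABET.index(symbol), "06b")


print(symbol_for(0b000000), symbol_for(0b000001))  # A B
print(bits_for("+"), bits_for("/"))                # 111110 111111
```

Since the table has exactly 64 entries, every 6-bit value has a character and every character in the table has a 6-bit value.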
The smallest positive multiple of 6 that is also a multiple of 8 is 24. Thus every group of 3 bytes (24 bits) of binary data is translated to 4 ASCII characters in its Base64-encoded string. The entire input is therefore divided into groups of 3 bytes each, and each group is encoded as 4 ASCII characters. What if the last group is less than 3 bytes long? There are padding rules for such cases, but I will not discuss them in this post. For more details on the padding rules, see RFC 4648.
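The 3-bytes-to-4-characters step can be sketched in a few lines of Python. This is only an illustration under the assumption that the input is a full 3-byte group; the function name encode_group is made up for this example, and padding is deliberately left out.

```python
import string

# Standard Base64 alphabet in index order (RFC 4648).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(group: bytes) -> str:
    """Encode exactly 3 bytes as 4 Base64 characters (no padding support)."""
    if len(group) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    # Treat the 3 bytes as one 24-bit big-endian integer.
    n = int.from_bytes(group, "big")
    # Cut it into four 6-bit slices, most significant slice first,
    # and look each slice up in the alphabet.
    return "".join(ALPHABET[(n >> shift) & 0b111111] for shift in (18, 12, 6, 0))


print(encode_group(b"\x01\x02\x03"))  # AQID
```

The bytes 01 02 03 are the bits 00000001 00000010 00000011, which regroup as 000000 010000 001000 000011, that is, A, Q, I and D.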
Now as a natural result of the encoding scheme, any 4 alphanumeric characters are a valid Base64 encoding of some binary data. That's because for every alphanumeric character, there is a 6-bit value that is translated to it during Base64 encoding, so 4 such characters always correspond to exactly 24 bits, that is, 3 whole bytes. This is the reason why any alphanumeric string with a length that is a multiple of 4 is a valid Base64-encoded string and can be successfully decoded to some binary data.
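To see this argument at work on "changeme" itself, here is a hand-rolled decoder sketched in Python. It assumes unpadded input whose length is a multiple of 4, and the name decode_groups is hypothetical; a real application would use a library decoder instead.

```python
import string

# Standard Base64 alphabet in index order (RFC 4648).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_groups(text: str) -> bytes:
    """Decode an unpadded Base64 string whose length is a multiple of 4."""
    if len(text) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        n = 0
        for ch in text[i:i + 4]:
            # Append this character's 6-bit value to the 24-bit group.
            # str.index raises ValueError for characters outside the alphabet.
            n = (n << 6) | ALPHABET.index(ch)
        # 4 characters x 6 bits = 24 bits = exactly 3 bytes.
        out += n.to_bytes(3, "big")
    return bytes(out)


print(decode_groups("changeme").hex())  # 7216a781e99e
```

The only ways this decoder can fail are a length that is not a multiple of 4 or a character missing from the alphabet, and an alphanumeric string of length 8 such as "changeme" triggers neither.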