Cryptopals Challenge 1.1: Convert hex to base64

This is the first challenge of the first set of Cryptopals, and many more will follow. Before jumping into the topic, I want to briefly explain what Cryptopals is and why I do these challenges.

What is Cryptopals?

Cryptopals is a website with 8 sets of cryptography challenges. Each set contains a different number of challenges, and the difficulty increases as you go.

Why Do I Do These Challenges?

I want to deepen my knowledge of cryptography, a field I would like to specialize in in the future. It should also improve my programming skills and my mathematical way of thinking!

Why Do I Write my Solutions?

I want to show how I think my way to a solution, including the problems I run into and how I overcome them. Maybe it will motivate someone else to start getting their hands dirty with cryptography.

Another reason is that I wrote an “.el” program for Emacs that automatically converts .org files into formatted HTML and adds them to the index shown under “Latest Posts” in the “Blog” section of my website. I may write a post explaining how it works in the future.

Challenge

Now without any delay, let’s start actually solving the first challenge for Set 1.

Description

Here is what the challenge asks us to do:

The string:

49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d

Should produce:

SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

So go ahead and make that happen. You’ll need to use this code for the rest of the exercises.

Cryptopals Rule: Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing.

Alright, so basically we need to take a hex string and convert it to base64. Sounds simple enough, but that “raw bytes” rule is important. Let me explain why.

Solution

When I first approached this challenge, I needed to understand what was being asked:

  • Convert a hex-encoded string to base64
  • What did the author of this problem mean by “Always operate on raw bytes, never on encoded strings”?

Well, I knew that hex and base64 were just different representations of the same underlying raw bytes, but I didn’t really get why we should operate on raw bytes.

To fully understand the potential problem, I started researching how hex and base64 encoding work. Through reading various programming resources (aka Wikipedia :)), I learnt that:

  • Hex encoding uses two characters (0-9, a-f) to represent each byte
  • Base64 encoding uses four characters to represent every three bytes
  • Both are just encoding schemes, not the data itself.
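
The two ratios above are easy to turn into arithmetic. Here is a small sketch (the helper names hex_length and base64_length are made up for this post, and the base64 formula assumes the standard padded form) that checks the sizes involved in this challenge:

```cpp
#include <cstddef>

// Hypothetical helpers, named here only for illustration.

// Hex: every byte becomes exactly two characters.
constexpr std::size_t hex_length(std::size_t byte_count) {
    return 2 * byte_count;
}

// Padded base64: every group of up to three bytes becomes four characters,
// so the number of groups is rounded up.
constexpr std::size_t base64_length(std::size_t byte_count) {
    return 4 * ((byte_count + 2) / 3);
}

// The challenge input is 96 hex characters, i.e. 48 raw bytes. 48 is a
// multiple of 3, so the base64 form is 64 characters with no '=' padding.
static_assert(hex_length(48) == 96, "48 bytes -> 96 hex characters");
static_assert(base64_length(48) == 64, "48 bytes -> 64 base64 characters");
```

Both counts match the strings in the challenge description: the hex input has 96 characters and the expected base64 output has 64.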

So, to make the problem concrete and see why we can’t just base64-encode the hex string directly, let me give an example:

  1. If we treat the hex string as literal text (WRONG):

    Input hex: "49276d"
    (Treating as 6 ASCII characters)
    -> '4', '9', '2', '7', '6', 'd'
    base64: "NDkyNzZk"
    
  2. If we do hex decoding (CORRECT):

    Input hex: "49276d" (6 characters)
    
    Step 1: Decode hex to raw bytes
    Hex pair "49" → byte 0x49 (decimal 73)
    Hex pair "27" → byte 0x27 (decimal 39)
    Hex pair "6d" → byte 0x6d (decimal 109)
    Result: 3 raw bytes [0x49, 0x27, 0x6d]
    
    What these bytes represent in ASCII:
    0x49 = 'I'
    0x27 = '''
    0x6d = 'm'
    (spells "I'm")
    
    Step 2: Encode to base64
    Base64: "SSdt"
    

As we can see, the results are completely different.
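
To check both results by hand, here is a minimal hand-rolled base64 encoder. This is only a sketch for illustration: base64_encode and the two wrapper functions are hypothetical names, not part of any library, and the encoder assumes the standard alphabet with '=' padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hand-rolled base64 encoder (hypothetical helper for illustration).
// It encodes whatever bytes it is given, without interpreting them.
std::string base64_encode(const std::string& bytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        const std::size_t remaining = bytes.size() - i;
        // Pack up to three bytes into one 24-bit group.
        std::uint32_t group =
            static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i])) << 16;
        if (remaining > 1)
            group |= static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i + 1])) << 8;
        if (remaining > 2)
            group |= static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i + 2]));
        // Emit four 6-bit indices, using '=' where the input ran out.
        out.push_back(kAlphabet[(group >> 18) & 0x3F]);
        out.push_back(kAlphabet[(group >> 12) & 0x3F]);
        out.push_back(remaining > 1 ? kAlphabet[(group >> 6) & 0x3F] : '=');
        out.push_back(remaining > 2 ? kAlphabet[group & 0x3F] : '=');
    }
    return out;
}

// WRONG: the six hex characters themselves are treated as the data.
std::string encode_hex_text_literally() { return base64_encode("49276d"); }

// CORRECT: the three raw bytes 0x49 0x27 0x6d that the hex describes.
std::string encode_decoded_bytes() { return base64_encode("\x49\x27\x6d"); }
```

The first wrapper gives "NDkyNzZk" and the second gives "SSdt", the same two results as in the example above. The encoder is identical in both cases; only the bytes fed into it differ.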

Finding a Library

After I understood why we need to decode to raw bytes first instead of converting our hex directly into base64, I needed to find a library to use along my “adventure”.

After looking at forums, I found a library called “Crypto++”, which, as far as I understand, is a widely used C++ library for cryptography.

I read the documentation to see how to use it and found two classes called StringSource and StringSink, which chain a string input and a string output together into a pipeline and automate the procedure.

I found out that there is also a way of writing things line by line (if you are familiar with PyTorch, you may remember writing the model’s layers line by line), which may be cumbersome, so I decided to learn this StringSource logic.

Source & Sink Logic

The StringSource and StringSink pattern in Crypto++ works like a pipeline. Think of it as a water system:

  • StringSource is the water tank (input) - it pumps data through the system
  • Filters (like HexDecoder, Base64Encoder) are the pipes that transform the data
  • StringSink is the collection bucket (output) - it stores the final result

The data flows from the outermost component to the innermost:

StringSource → HexDecoder → StringSink
(input)       (transform)   (output)

The true parameter in StringSource means “pump all data through immediately” - it processes everything in one go rather than chunk by chunk.
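
To make the nesting concrete, here is a toy model of the same idea in plain C++. None of these classes are Crypto++ classes; Stage, ToySink, ToyUpperFilter and run_toy_pipeline are hypothetical names, and the uppercase filter is just a stand-in for a real transformation. As far as I understand, a Crypto++ stage takes ownership of the filter attached to it with new, which the toy mirrors with std::unique_ptr.

```cpp
#include <cctype>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for the pipeline idea; these are NOT Crypto++ classes.
struct Stage {
    virtual ~Stage() = default;
    virtual void put(char c) = 0;  // receive one byte
};

// Sink: the end of the pipeline, appends bytes to a caller-owned string.
class ToySink : public Stage {
public:
    explicit ToySink(std::string& out) : out_(out) {}
    void put(char c) override { out_.push_back(c); }
private:
    std::string& out_;
};

// Filter: transforms each byte, then forwards it to the stage it owns.
class ToyUpperFilter : public Stage {
public:
    explicit ToyUpperFilter(std::unique_ptr<Stage> next) : next_(std::move(next)) {}
    void put(char c) override {
        next_->put(static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
    }
private:
    std::unique_ptr<Stage> next_;
};

// Source: builds the chain outermost-first and pushes all input through it
// in one go, which plays the role of the "true" flag described above.
std::string run_toy_pipeline(const std::string& input) {
    std::string result;
    std::unique_ptr<Stage> sink(new ToySink(result));
    std::unique_ptr<Stage> chain(new ToyUpperFilter(std::move(sink)));
    for (char c : input) chain->put(c);
    return result;
}
```

Calling run_toy_pipeline("49276d") sends each character through the filter and into the sink, so the returned string is the transformed copy of the input.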

The Thought Process

After understanding these underlying things, the problem is quite easy to solve. First we include the headers we need. I also decided to bring in the CryptoPP (Crypto++) namespace:

#include <cryptopp/filters.h>
#include <iostream>
#include <string>
#include <cryptopp/base64.h>
#include <cryptopp/hex.h>

using namespace CryptoPP;

Then, inside main, we create a string called “usr_input” that holds our input.

std::string usr_input =
    "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69"
    "736f6e6f7573206d757368726f6f6d";

Then we create a StringSource called “ss1” that takes usr_input as input, with the true flag telling it to process all the data immediately. Inside our StringSource, we create a HexDecoder filter that will convert the hex string to raw bytes, and finally a StringSink that stores the result in raw_bytes. The data flows through this pipeline automatically.

// Hex to raw bytes
std::string raw_bytes;
StringSource ss1(usr_input, true,
         new HexDecoder(new StringSink(raw_bytes)));
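
For intuition about what this decoding step does, here is a hand-written equivalent using only the standard library. It is a sketch, not how Crypto++ implements HexDecoder: hex_digit_value and hex_decode are hypothetical helpers, and rejecting malformed input with an exception is a choice made for this sketch that may not match how the library treats such input.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Value of one hex digit; throws on anything outside 0-9, a-f, A-F.
// (Hypothetical helper for illustration.)
int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Combine each pair of hex digits into one raw byte.
std::string hex_decode(const std::string& hex) {
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("odd-length hex string");
    }
    std::string bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int high = hex_digit_value(hex[i]);      // upper 4 bits
        const int low = hex_digit_value(hex[i + 1]);   // lower 4 bits
        bytes.push_back(static_cast<char>((high << 4) | low));
    }
    return bytes;
}
```

With this sketch, hex_decode("49276d") yields the three bytes that spell "I'm".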

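Now we do the same thing for the base64 conversion. We create another StringSource called “ss2” that takes our raw_bytes as input. This time we use a Base64Encoder filter instead of a decoder, since we’re encoding the raw bytes into base64 format. The result gets stored in base64_output through the StringSink. One thing to watch for: as far as I can tell, Base64Encoder inserts line breaks by default, so base64_output ends with a newline. Passing false as its second constructor argument should turn that off.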

// Raw Bytes to Base64
std::string base64_output;
StringSource ss2(raw_bytes, true,
         new Base64Encoder(new StringSink(base64_output)));

And lastly, we are going to print out our results.

std::cout << "User Input: " << usr_input << "\n";
std::cout << "Raw Bytes: " << raw_bytes << "\n";
std::cout << "Base64: " << base64_output << "\n";
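
Printing raw_bytes directly works here because the decoded bytes happen to be readable ASCII. In later challenges the raw bytes will often not be printable, which is where the “only use hex and base64 for pretty-printing” rule comes in. A minimal sketch of a display helper (to_hex is a hypothetical name, not a Crypto++ function):

```cpp
#include <string>

// Hypothetical helper: render raw bytes as lowercase hex for display only.
std::string to_hex(const std::string& bytes) {
    static const char kDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        out.push_back(kDigits[b >> 4]);    // high nibble
        out.push_back(kDigits[b & 0x0F]);  // low nibble
    }
    return out;
}
```

It could be used as std::cout << "Raw Bytes (hex): " << to_hex(raw_bytes) << "\n"; so the terminal only ever sees printable characters.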

With that, we have finished the first challenge! This is the first of many to come.

You can find the full code on my GitHub: https://github.com/AydoganArslantash/Cryptopals-Solutions/blob/main/set1/chal1.cpp