Project Euler Problem 7: 10001st Prime

The Problem

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13. What is the 10001st prime number?

First Thoughts: Sieve of Eratosthenes

My first instinct was the Sieve of Eratosthenes. It is the classic algorithm for this kind of problem and the idea is clean. You start with all numbers marked as prime, then for each prime you find, you go through and mark all its multiples as composite. What is left unmarked at the end is your list of primes.

Let me walk through a small example to make it concrete. Say you want all primes up to 20.

Start with every number from 2 to 20 marked as prime:

2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20

Take the first prime, 2. Mark every multiple of 2 as composite, starting from \(2^2 = 4\):

2  3  [4]  5  [6]  7  [8]  9  [10]  11  [12]  13  [14]  15  [16]  17  [18]  19  [20]

Next unmarked number is 3. Mark every multiple of 3 starting from \(3^2 = 9\):

2  3  [4]  5  [6]  7  [8]  [9]  [10]  11  [12]  13  [14]  [15]  [16]  17  [18]  19  [20]

Next is 5. \(5^2 = 25\) which is already beyond 20, so nothing new gets marked. Same for 7 and beyond. What remains unmarked is your answer:

2  3  5  7  11  13  17  19

The reason you start marking from \(i^2\) instead of \(2i\) is that any smaller multiple of \(i\) would already have been marked by a previous prime. For example when you are at \(i = 5\), the multiples 10, 15, 20 were already marked by 2 and 3. So you can safely skip straight to 25.

My implementation uses a std::vector<bool> where the index represents the number itself and the value records whether that number is still considered prime:

#include <iostream>
#include <vector>

static const int UPPER_BOUND = 120000;
static const int TARGET = 10001;

int main()
{
    std::vector<bool> is_prime(UPPER_BOUND + 1, true);
    is_prime[0] = false;
    is_prime[1] = false;
 
    // Only need to sieve with primes up to sqrt(UPPER_BOUND); the long long
    // cast also keeps i * i from overflowing int for large i.
    for (int i = 2; (long long)i * i <= UPPER_BOUND; i++)
    {
        if (is_prime[i])
        {
            for (int j = i * i; j <= UPPER_BOUND; j += i)
            {
                is_prime[j] = false;
            }
        }
    }
 
    int count = 0;
    for (int i = 2; i <= UPPER_BOUND; i++)
    {
        if (is_prime[i])
        {
            count++;
            if (count == TARGET)
            {
                std::cout << "10001st prime: " << i << "\n";
                return 0;
            }
        }
    }
 
    return 0;
}

This works and is fast, but it has a fundamental problem: you need to know the upper bound ahead of time. I used 120000 here which works because the 10001st prime happens to fall below it, but that is a priori knowledge. If someone asked for the 50000th prime, this would silently give the wrong answer or find nothing.

The INT_MAX Attempt

The natural response to the hardcoded upper bound problem is to just use INT_MAX and be done with it. No prior knowledge needed, the range covers every possible int value. Here is what that looked like:

#include <iostream>
#include <vector>
#include <limits.h>

static const int UPPER_BOUND = INT_MAX;
static const int TARGET = 10001;

int main()
{
    std::vector<bool> is_prime(UPPER_BOUND + 1, true);  // core dump here
    is_prime[0] = false;
    is_prime[1] = false;
 
    for (int i = 2; (long long)i * i <= UPPER_BOUND; i++)
    {
        if (is_prime[i])
        {
            // long long j: with UPPER_BOUND = INT_MAX, j += i could
            // otherwise overflow int near the top of the range
            for (long long j = (long long)i * i; j <= UPPER_BOUND; j += i)
            {
                is_prime[j] = false;
            }
        }
    }
    // ...
}
    // ...
}

This core dumps immediately. The culprit is UPPER_BOUND + 1 in the vector constructor call. INT_MAX is \(2{,}147{,}483{,}647\), and adding 1 to it overflows a signed int, which is undefined behavior in C++; in practice it wraps around to INT_MIN, a large negative number. That value is then converted to the vector's unsigned size type, producing an absurdly huge allocation request, and the program crashes before doing anything useful.

I also tried INT_MAX - 1 to sidestep the overflow:

static const int UPPER_BOUND = INT_MAX - 1;

This avoids the overflow on UPPER_BOUND + 1, but it does not actually solve anything. You are still asking for a std::vector<bool> with 2.1 billion elements. Even though vector<bool> is bit-packed internally, that is still around 256MB just for the array. Then the outer sieve loop runs from 2 to \(2.1 \times 10^9\). This caused around 25GB of memory usage and crashed everything before it got anywhere.

The sieve is fundamentally the wrong tool when you do not know your upper bound. It is a batch algorithm that needs to know the whole search space upfront. Trying to make it general by inflating the upper bound just trades one problem for a worse one.

Switching to Trial Division

Trial division works completely differently. Instead of pre-allocating a search space and eliminating composites, you test each candidate number individually and collect primes as you go. You stop as soon as you have found enough of them. No upper bound needed at all.

Let me walk through the same small example. Say you want the first 8 primes.

Start with 2 in the list. Now test 3: check whether any prime up to \(\sqrt{3} \approx 1.7\) divides it. There are none, so 3 is prime. Test 4: \(4 / 2 = 2\) remainder 0, composite. Test 5: check primes up to \(\sqrt{5} \approx 2.2\), so just 2. \(5 / 2\) has remainder 1, so 5 is prime. Test 6: divisible by 2, composite. Test 7: check primes up to \(\sqrt{7} \approx 2.6\), so just 2. \(7 / 2\) has remainder 1, prime. Test 8: divisible by 2, composite. Test 9: check primes up to \(\sqrt{9} = 3\). \(9 / 3 = 3\) remainder 0, composite. And so on.

The \(\sqrt{n}\) cutoff is what makes this efficient. To check if 97 is prime, you only need to test divisors up to \(\sqrt{97} \approx 9.8\). So you only check 2, 3, 5, 7. If none of them divide 97, it is prime. You do not need to check 11, 13, 17 and beyond because if 97 had a factor \(f > \sqrt{97}\), the corresponding paired factor \(97/f\) would have to be less than \(\sqrt{97}\), and you would have already caught it.

And crucially, you only test against previously found primes, not all numbers. There is no point testing if 6 divides your candidate because if 6 divided it, 2 or 3 would have already caught it first.

The Final Solution

#include <iostream>
#include <cmath>
#include <vector>

static const int TARGET = 10001;

int main()
{
    std::vector<int> primes;
    primes.push_back(2);
 
    int candidate = 3;
    while ((int)primes.size() < TARGET)
    {
        bool is_prime = true;
        int root = std::sqrt(candidate);
     
        for (int p : primes)
        {
            if (p > root) break;
            if (candidate % p == 0)
            {
                is_prime = false;
                break;
            }
        }
     
        if (is_prime)
            primes.push_back(candidate);
         
        candidate += 2;
    }
 
    std::cout << TARGET << "st prime: " << primes.back() << "\n";
    return 0;
}

A few things worth pointing out. The candidate starts at 3 and increments by 2 each time, skipping all even numbers since no even number greater than 2 can be prime. This cuts the number of candidates in half. The \(\sqrt{n}\) check means the inner loop breaks early as soon as the prime being tested exceeds the square root of the candidate. The cast (int)primes.size() is there because size() returns an unsigned type and comparing it with a signed TARGET produces a compiler warning.

No upper bound, no fixed array size, no overflow, no 25GB memory usage (yay). The vector holds only confirmed primes and grows exactly as large as it needs to.

Sieve vs Trial Division: When to Use Which

These two algorithms are good at different things.

The sieve is faster when you want all primes up to some known limit. Its time complexity is \(O(n \log \log n)\) which is very close to linear. If you need every prime below a million and you know that upfront, the sieve is the right call.

Trial division is better when you want the \(n\)-th prime and do not know where it lives. It grows dynamically, uses minimal memory, and never needs to know the answer before computing it. The tradeoff is that in the worst case checking a single candidate \(n\) takes \(O(\sqrt{n})\) divisions, but for reasonable targets it is more than fast enough.

For this problem, trial division is the right tool. The 10001st prime is not that large and the algorithm finds it quickly without any prior knowledge of where to look.

Key Takeaways

The sieve of Eratosthenes is elegant and fast but it is a batch algorithm that needs to know its search space upfront. Trying to work around that by using INT_MAX just introduces integer overflow and memory allocation problems that make it completely impractical. Trial division sidesteps all of that by being inherently dynamic. Picking the right algorithm for the problem structure matters more than optimizing the wrong one.