Bruteforcing the seed in Code4Life

Introduction

CodinGame’s latest contest Code4Life is over, and I finished with an okay but disappointing 20th place. I will not elaborate on the reasons or my approach because quite frankly, there is not much interesting stuff to say on the topic, and since it did not work out, who cares right?

After estimating my chances of finishing in the top 5 to be very low on the last day, my motivation went out. So I set off on accomplishing a different objective instead. Reading this post on the CG forums, it is clear that one could gain a decisive advantage if somehow the sample composition could be known without diagnosing it. Is it even possible?

My answer is: yes, but not often enough to make a real difference. When the stars align, games like this one or that one are possible. But it might only happen 1 game out of 10, and then the bot would need to win where it would have lost before. In practice it turns out to maybe a 2-3% advantage overall, not enough to cause a significant improvement in ranking.

Nonetheless it was a fun adventure, so this article will document how it was made possible.

Analysing the referee

The referee code being available, it is easy to figure out how the samples are drawn. A simple shuffle operation is done at the beginning of the game on three lists of samples, one per rank, using the Collections.shuffle method, which implements the classic Fisher-Yates shuffle.

Knowing the random seed would therefore make it possible to predict future samples at ease. At first, this line of code in the referee looked very promising:

seed = Long.valueOf(prop.getProperty("seed", String.valueOf(new Random(System.currentTimeMillis()).nextLong())));

This is a seed generated by using the current system time in milliseconds. It is a classic mistake security-wise (e.g. srand(time(0)) in C), as it is incredibly easy to bruteforce a few thousand milliseconds in the past to figure out the real seed. It turned out to be a dead-end however, as the seed seen in the IDE is always roughly comprised of numbers ranging in 109, which is a clearly impossible output from this code. nextLong() returns a 64 bit number, so it would not always be so low in the IDE (and I did check for truncation just in case).

But why 109? I have no idea, but it never goes above this in the IDE from empirical observation. However it is not that many combinations for a computer to make… A dumb test done locally without optimization was able to test it all in under a minute. Not bad. Now the issue is, with only 1 second on the first turn and 50ms for 200 turns, that only gives 11 seconds of CPU time total. And to make the information useful, it needs to be found by mid-game at most, so practically it’s only 6 seconds. Exceptionally, most of it is available since my bot uses very little time due to the nature of the game, so any remaining time can be devoted to bruteforcing the seed.

It still looks like quite the stretch, but let’s try anyway.

Validating the seed

Unfortunately, there is little information available during a game to validate if the seed is correct. First, there are the 3 random science projects selected out of 10 at the start of the game and given on first turn. Then, since everyone draws 3 rank-1 samples at the beginning, there are 6 known samples relatively early by turn 10.

I was not sure if it would be enough, but running another dumb test on several games, it turns out only 1 seed out of 109 can give those results, so finding a seed that works with this input is practically guaranteed to be the right one. Good.

25% of the 6 seconds available is before we get to know the samples however, so to be efficient, the seeds need to be tested on the projects first and the good ones kept until the samples are available for further testing.

Accelerating the shuffle

Naively doing a shuffle of the 3 samples and projects list for every possible seed is way too slow. Rejecting a seed as early as possible is essential to make this work. So how?

The answer lies in digging into the Fisher-Yates algorithm. Starting from the last element, it will swap this element with an element in the list at random, up to the current element (in which case it is effectively doing no swap). If we are given the projects with ids 0, 9 and 3 then the final shuffled array looks like [0 9 3 x x x x x x x]. From the fact that once an element is swapped towards the end, it will never move again, we can deduce that the seven first swaps must NOT target either 0, 9 or 3, otherwise the number will never be at the beginning of the array.

So for example, if the first swap gives the index 0, it can be discarded immediately because 0 will be placed at the end instead of the beginning. So if we test from the projects known at start, on average 30% of the seeds (because we know 3 out of 10 possible entries) can be eliminated in a single operation, 33% of the remaining by the second one, and so on. Nice!

Accelerating the validation

There is another problem however. The projects are shuffled after the samples. This means 87 random numbers (39+29+19 swaps for all sample lists) must be computed before we can test the projects. Clearly unacceptable for performance.

Looking at Java’s random number generation code, it is a simple linear congruential generator, so each random number is just generated by a multiply/add operation. Given this information, you can ‘fast forward’ the seed by any number of steps using math as described in this StackOverflow post. So skipping ahead 87 steps of the seed can be done with the same cost as one step, and makes the validation start with the projects right away as necessary.

Results and source code

After squeezing in as much as I could in the little time I had, the bot was able to test on average 300 million seeds over the course of a full game. It can vary a lot according to the known inputs, since the seeds will not always be discarded with the same efficiency.

Since it did not end up making a difference and with the very low chance of this technique being usable in another game/contest, here is the full source code for your curiosity.

DISCLAIMER: This is poor quality code, even by contest standards. I cleaned and commented some of it to help your reading. Please DO NOT use this code as an indicator of my personal/professional coding standards. :) It could also be further optimized but most likely not by a significant margin.

– reCurse signing off from Hack4Life!


Reference source code:

struct Brute
{
    // Quickly test if a project id is forbidden to swap.
    int8_t projectLookup[16] = { 0 };
    // Quickly test if a sample id is forbidden to swap.
    int8_t sampleLookup[64] = { 0 };
    // Last seed tested, starting at 600 million because why not?
    // Used to be 50 million in case you wonder about the first example game.
    int32_t lastSeed = 600000000;
    // When != 0, seed was found.
    int32_t foundSeed = 0;
    // Project ids we know about.
    int8_t projects[3];
    // Sample ids we know about.
    int8_t samples[6];
    // Do we know samples yet?
    bool hasSamples = false;
    // When things don't go as expected, e.g. the opponent draws rank 2 at start.
    bool disabled = false;
    // All seeds found before we know about samples.
    Array<uint32_t, 1000000> seeds;
    // How many seeds have been tested after knowing samples.
    int seedsIdx = 0;

    // Called when project ids or sample ids are known.
    void init()
    {
        for (int i = 0; i < 3; i++)
            projectLookup[projects[i]] = 1;
        if (!hasSamples) return;
        for (int i = 0; i < 6; i++)
            sampleLookup[samples[i]] = 1;
    }

    // This method is templated for performance reasons. This allows to eliminate
    // the branches on the bound, and lets the compiler statically optimize the
    // remainder operation.
    // J = bound of the random number, J-1 = index to swap at the end
    template <int J>
    __attribute__((always_inline)) bool testProjectSwap(int64_t& seed, int8_t* curProjects)
    {
        // Advance the seed according to Java implementation.
        seed = (seed * 0x5DEECE66DLL + 0xBLL) & ((1LL << 48) - 1);
        int index = (int)(seed >> (48 - 31));

        // Java speeds up the remainder when the bound is a power of 2.
        if ((J & (J - 1)) == 0)
            index = (int)((J * (int64_t)index) >> 31);
        else
            index %= J;

        if (J > 3)
        {
            // This is the unknown part of the project array, so make sure
            // a known project id is not swapped into it.
            if (projectLookup[curProjects[index]]) return false;
        }
        else
        {
            // This is the known part of the project array, so make sure
            // the known project id is at the right index.
            if (curProjects[index] != projects[J - 1]) return false;
        }

        // Do the swap.
        swap(curProjects[index], curProjects[J - 1]);
        return true;
    }

    // This function checks if the given seed matches the known projects/samples.
    __attribute__((always_inline)) bool testSeed(int64_t currentSeed)
    {
        // The seed given to Java is first mangled in that way (see setSeed).
        int64_t seed = (currentSeed ^ 0x5DEECE66DLL) & ((1LL << 48) - 1);

        // Fast forward the seed by 87 steps to reach the project array shuffle.
        int64_t projSeed = (seed * 0x24025DA35345LL + 0x238FF343A821LL) & ((1LL << 48) - 1);

        // The first swap test of the project array is expanded manually here
        // (as done in testProjectSwap). This is to avoid initializing curProjects
        // for nothing if the seed ends up discarded on the first test.
        projSeed = (projSeed * 0x5DEECE66DLL + 0xBLL) & ((1LL << 48) - 1);
        int result = (int)(((int64_t)projSeed) >> (48 - 31));
        result %= 10;
        if (projectLookup[result]) return false;

        int8_t curProjects[10] = { 0,1,2,3,4,5,6,7,8,9 };
        swap(curProjects[result], curProjects[10 - 1]);

        // Perform the other swap tests, loop is unrolled to benefit from the
        // optimizations mentioned above in the template.
        if (//!testProjectSwap<10>(projSeed, curProjects) ||
            !testProjectSwap<9>(projSeed, curProjects) ||
            !testProjectSwap<8>(projSeed, curProjects) ||
            !testProjectSwap<7>(projSeed, curProjects) ||
            !testProjectSwap<6>(projSeed, curProjects) ||
            !testProjectSwap<5>(projSeed, curProjects) ||
            !testProjectSwap<4>(projSeed, curProjects) ||
            !testProjectSwap<3>(projSeed, curProjects) ||
            !testProjectSwap<2>(projSeed, curProjects)) return false;

        // Without known samples that's all we can test so far, store the seed to
        // test for when they are known.
        if (!hasSamples)
        {
            seeds.push_back(currentSeed);
            return false;
        }

        // Do the swap tests for the sample array. Performance is not nearly as
        // much of a concern since it rarely makes it past the project swap test.
        // This is virtually identical to testProjectSwap but with samples instead.
        int8_t curSamples[40] = { 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39 };
        for (int j = 40; j > 1; j--)
        {
            seed = (seed * 0x5DEECE66DLL + 0xBLL) & ((1LL << 48) - 1);
            int index = (int)(seed >> (48 - 31));
            if ((j & (j - 1)) == 0)
                index = (int)((j * (int64_t)index) >> 31);
            else
                index %= j;
            if (j > 6)
            {
                if (sampleLookup[curSamples[index]]) return false;
            }
            else
            {
                if (curSamples[index] != samples[j - 1]) return false;
            }
            swap(curSamples[index], curSamples[j - 1]);
        }

        foundSeed = currentSeed;
        return true;
    }

    void go(const Timeout& timeout)
    {
        // Nothing to do if we found the seed or disabled.
        if (foundSeed != 0 || disabled) return;

        // If we have stored seeds, check if they are valid when we know samples.
        if (hasSamples)
        {
            // I should have put seedsIdx in a local variable as done below.
            // Also, only the samples need to be retested for, not the projects.
            // In practice though, this path is not used often enough to make
            // a real difference, so it was left unoptimized.
            while (seedsIdx < seeds.count)
            {
                // The checks are deliberately made that way to help the compiler
                // predict the branch correctly.
                if (!testSeed(seeds[seedsIdx++]))
                {
                    // Do not check the timeout too often.
                    if ((seedsIdx & 2047) != 0) continue;
                    if (!timeout.isDone()) continue;
                    return;
                }

                cerr << "EUREKA! " << foundSeed << endl;
                lastSeed = foundSeed;
                return;
            }
        }

        // Identical as above, but with an incremental seed instead.
        int32_t seed = lastSeed;
        while (true)
        {
            seed++;
            if (!testSeed(seed))
            {
                if ((seed & 2047) != 0) continue;
                if (!timeout.isDone()) continue;
                lastSeed = seed;
                return;
            }

            cerr << "EUREKA! " << foundSeed << endl;
            lastSeed = foundSeed;
            return;
        }
    }
};