I ranked 3rd in the Wondev Woman contest, out of 2300 participants. I also managed to hold a comfortable spot in the top 2 for the first half of the competition, then in the top 3 for the second half, so I am quite happy with that! Cheers to T1024 and Agade for such a heated battle at the top throughout the whole event. The game was really fun, simple to learn yet hard to master, which makes for the best competitions.
Following feedback on my last post-mortem for Coders of the Caribbean, where I described my workflow, methods and the whole journey behind it, this one will focus more on my final game strategy. I hope you enjoy the read!
I am not 100% certain about what made my bot dominant for most of the competition. I used strategies similar to what others described, so maybe the devil is in the details. This section covers all the things that proved to work well in practice, as removing or changing any of them made my bot play noticeably worse.
Big surprise here (not): a straightforward minimax search with alpha-beta pruning was the backbone of my bot's decision making. A major obstacle to this kind of search is the ridiculous branching factor this game possesses, which can easily average 128 in the early-to-mid game. For this reason alone, I think the best algorithm for this game would be Monte Carlo tree search, which has proven to play Go at a professional level where minimax fails miserably. But given the tiny time limit (50 ms per move), I figured it would probably not be viable for this contest. Too bad, as I really wanted to try it.
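To give a rough idea of the backbone, here is a minimal negamax sketch with alpha-beta pruning; the State and Move types and evaluate() are placeholders standing in for the real game logic, not my actual implementation.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Placeholder types; the real state, move generation and evaluation are
// game-specific and omitted here.
struct Move { int unit, moveDir, buildDir; };
struct State {
    std::vector<Move> legalMoves() const { return {}; }
    State play(const Move&) const { return *this; }
    bool gameOver() const { return false; }
    double evaluate() const { return 0.0; }   // from the side to move's point of view
};

// Plain negamax with alpha-beta pruning.
double negamax(const State& s, int depth, double alpha, double beta) {
    if (depth == 0 || s.gameOver())
        return s.evaluate();
    double best = -std::numeric_limits<double>::infinity();
    for (const Move& m : s.legalMoves()) {
        double score = -negamax(s.play(m), depth - 1, -beta, -alpha);
        best = std::max(best, score);
        alpha = std::max(alpha, score);
        if (alpha >= beta)
            break;   // prune: the opponent would never allow this line
    }
    return best;
}
```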
Still, the branching factor remains an issue: analyzing every move, it is almost impossible to see 4 moves into the future (2 per player) before the end-game. But considering how easy it is to get trapped, it is essential to see this coming, either to avoid it or to make the opponent fall for it. I figured the only way to make this work would be to prune very aggressively.
Taking inspiration from beam search, at every depth I sort the list of possible moves by my evaluation function, keeping only the best ones and discarding the rest. With this, my bot was always able to reach depth 4, and often even depth 6. It could sometimes become blind to certain moves, but the extra depth made up for it, and the really good and really bad moves were mostly sorted out correctly by the evaluation function. I rarely saw an improvement when replaying the same game with all moves allowed.
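In rough terms, the pruning looks like the sketch below; the types, the Eval callable and the beam width are placeholders rather than my actual code or numbers.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Beam-style pruning sketch: score every candidate move with the evaluation
// function, sort, and keep only the best beamWidth moves for deeper search.
template <typename State, typename Move, typename Eval>
std::vector<Move> pruneMoves(const State& s, const std::vector<Move>& moves,
                             Eval evaluateAfter, int beamWidth) {
    std::vector<std::pair<double, Move>> scored;
    for (const Move& m : moves)
        scored.push_back({evaluateAfter(s, m), m});
    std::sort(scored.begin(), scored.end(),
              [](const auto& a, const auto& b) { return a.first > b.first; });
    std::vector<Move> kept;
    for (int i = 0; i < (int)scored.size() && i < beamWidth; ++i)
        kept.push_back(scored[i].second);   // only these survive to the next depth
    return kept;
}
```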
A transposition table also helped accelerate the search by eliminating redundant move sequences and giving a head start to the deeper searches.
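The idea, roughly (the hashing scheme and entry layout here are placeholders, not the exact ones I used):

```cpp
#include <cstdint>
#include <unordered_map>

// Transposition table sketch: cache the result of already-searched positions
// by a hash of the state, so the same position reached through a different
// move order is not searched again, and a shallower result from a previous
// iteration can seed the deeper one.
struct TTEntry {
    double score;
    int depth;   // remaining depth the stored score was computed with
};

std::unordered_map<uint64_t, TTEntry> transposition;

// Inside the search, before expanding a node:
//   auto it = transposition.find(stateHash);
//   if (it != transposition.end() && it->second.depth >= remainingDepth)
//       return it->second.score;               // reuse the cached result
//   ...search the node as usual...
//   transposition[stateHash] = {score, remainingDepth};
```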
As soon as I reached bronze and saw the fog of war, it became clear that, unlike the mines in CotC, knowing the enemy's position would be essential; otherwise, what are you going to run minimax against? Using a wrong position, such as the last known one, can have catastrophic results, as it makes the bot randomly blind to traps or proper territory control. Figuring this had to be right before any other meaningful work could be done, I spent half of the first Sunday getting it reliable enough.
After toying with a few ideas, I found one that worked reasonably well. My bot keeps the last known states it received as input, as well as the moves it played. Then, starting up to 20-25 turns back, a depth-first search tries to reach the current state, replaying my own moves on my turns and trying every possible enemy position and move on his turns. I needed to write a bunch of rules to detect impossible branches and back out of them, which pruned the vast majority of possibilities very quickly. For example, if a hypothesized enemy position would have been visible to me, but no piece was actually seen there at that time, the search quickly aborts that branch and tries the next one.
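A simplified sketch of that search, with placeholder types standing in for the real visibility and move rules:

```cpp
#include <vector>

// Placeholder types; the game-specific logic is omitted.
struct Move {};
struct Observation {};                            // what the referee showed me that turn
struct State {
    std::vector<Move> enemyMoves() const { return {}; }
    State applyMyMove(const Move&) const { return *this; }
    State applyEnemyMove(const Move&) const { return *this; }
    bool consistentWith(const Observation&) const { return true; }
};

// Depth-first reconstruction: replay my own recorded moves and branch over
// every legal enemy move, discarding any branch that contradicts what was
// actually observed on that turn.
void reconstruct(const State& s, size_t turn,
                 const std::vector<Move>& myMoves,
                 const std::vector<Observation>& observed,
                 std::vector<State>& candidates) {
    if (!s.consistentWith(observed[turn]))
        return;                                   // impossible branch: abort early
    if (turn + 1 == observed.size()) {
        candidates.push_back(s);                  // a plausible current state
        return;
    }
    State afterMine = s.applyMyMove(myMoves[turn]);
    for (const Move& m : afterMine.enemyMoves())
        reconstruct(afterMine.applyEnemyMove(m), turn + 1, myMoves, observed, candidates);
}
```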
Sometimes the search would end with multiple possible positions. At first it just took the first one it found, but after analyzing some losses directly caused by bad guesses, I switched to running my evaluation function on each possibility and keeping the one with the lowest score. That made the bot play better afterwards by anticipating the worst case.
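In code terms that choice is just a minimum over the candidates (placeholder types again, and assuming the candidate list is non-empty):

```cpp
#include <algorithm>
#include <vector>

// Among all candidate states that survived the reconstruction, keep the one
// my evaluation likes least, i.e. assume the worst case.
template <typename State, typename Eval>
State pickWorstCase(const std::vector<State>& candidates, Eval evaluate) {
    return *std::min_element(candidates.begin(), candidates.end(),
        [&](const State& a, const State& b) { return evaluate(a) < evaluate(b); });
}
```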
In the end there were still a few rare unhandled cases, such as pushing a unit into an unseen one, which caused funny games where my bot would spend all its time ramming into an enemy that was already defeated. Statistically speaking though, after parsing hundreds of replays, it did not seem to happen often enough to hurt my win rate.
Again, no big surprise here: this game shares similarities with Tron, where territory control is everything, so the most effective heuristic is to compute a Voronoi territory (technically inaccurate, but for lack of a better term). This is done with a breadth-first search from the unit positions to determine which cells each player can reach before the opponent.
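A sketch of that computation, assuming the Wondev Woman movement rules (8 directions, climb at most one level, height 4 or off-map is out of play); the exact representation differs from my bot.

```cpp
#include <queue>
#include <utility>
#include <vector>

struct Cell { int x, y; };

// Multi-source BFS: both players' units expand one step at a time and each
// cell is credited to whoever reaches it first.
int territory(const std::vector<std::vector<int>>& height,
              const std::vector<Cell>& myUnits,
              const std::vector<Cell>& hisUnits,
              bool iWinTies) {
    const int n = (int)height.size();
    std::vector<std::vector<int>> owner(n, std::vector<int>(n, -1));  // 0 = me, 1 = him
    std::queue<std::pair<Cell, int>> q;
    auto seed = [&](const std::vector<Cell>& units, int side) {
        for (const Cell& c : units) { owner[c.y][c.x] = side; q.push({c, side}); }
    };
    // Whoever is seeded first claims contested cells at equal distance.
    if (iWinTies) { seed(myUnits, 0); seed(hisUnits, 1); }
    else          { seed(hisUnits, 1); seed(myUnits, 0); }

    const int dx[8] = {-1, -1, -1, 0, 0, 1, 1, 1};
    const int dy[8] = {-1,  0,  1, -1, 1, -1, 0, 1};
    while (!q.empty()) {
        auto [c, side] = q.front(); q.pop();
        for (int d = 0; d < 8; ++d) {
            int nx = c.x + dx[d], ny = c.y + dy[d];
            if (nx < 0 || ny < 0 || nx >= n || ny >= n) continue;
            if (owner[ny][nx] != -1) continue;                        // already claimed
            if (height[ny][nx] < 0 || height[ny][nx] > 3) continue;   // hole / removed
            if (height[ny][nx] > height[c.y][c.x] + 1) continue;      // too high to climb
            owner[ny][nx] = side;
            q.push({Cell{nx, ny}, side});
        }
    }
    int score = 0;
    for (const auto& row : owner)
        for (int o : row) score += (o == 0) - (o == 1);
    return score;   // positive means I control more cells than the opponent
}
```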
I added one twist to it that turned out to be a big improvement. The problem with this technique is that you need to give one side the tie-breaking advantage, which turns out to be an issue in practice. If two units are separated by an empty cell, one or the other will end up being credited with controlling it. But because of the pushing dynamics, it is uncertain whether you really control those contested areas. For instance, I noticed in multiple games that the bot thought the enemy was cornered, but the enemy ended up slipping away because it could not actually be pushed away.
I ended up fixing this with what I call a “double Voronoi”: I run one version where I have the tie-breaking advantage and another where the opponent has it, then sum the two results. This further limited the zones the enemy could get to and maintained better board control.
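Reusing the territory() sketch above, the twist amounts to something like this, with contested cells roughly cancelling out instead of being optimistically counted as mine:

```cpp
// "Double Voronoi": evaluate the territory twice, once with each side winning
// the contested cells, and sum the results.
int doubleVoronoi(const std::vector<std::vector<int>>& height,
                  const std::vector<Cell>& myUnits,
                  const std::vector<Cell>& hisUnits) {
    return territory(height, myUnits, hisUnits, /*iWinTies=*/true)
         + territory(height, myUnits, hisUnits, /*iWinTies=*/false);
}
```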
Given that the whole game revolves around building on cells, it is straightforward to use as a heuristic the height of the cell a unit stands on, as well as of the neighboring cells. From there it would also be natural, as others did, to simply consider the average height of the surrounding cells. For some reason though, I did not even think of doing that until after the contest… so maybe it is one good factor I missed.
Instead, I went with classifying each neighboring cell into one of several categories: hole, wall, drop, stair, floor or goal.
Each category had a score tied to it, and part of the evaluation consists of summing the scores of all neighboring cells. Holes and walls carried a big penalty, drops and stairs a smaller one, while floors and goals gave a positive score. The overall reasoning was to set up or avoid traps, favor flattening my own area and keep control of scoring tiles, while encouraging the opposite for the opponent.
I spent a lot of time tweaking those values, as they had a dramatic effect on the bot’s playstyle. It was also very hard to figure out what worked as a better strategy overall. Some styles were great against specific opponents but terrible against others, and so on.
The middle cell (the one the unit stands on) also added a linear score based on its height, the higher the better.
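To make the shape of this part of the evaluation concrete, here is a sketch with purely illustrative weights; classify() and neighbors() stand in for the game-specific logic, and the real values took a lot of tweaking (see below).

```cpp
// Illustrative neighbor scoring; the weights below are examples only.
enum Category { HOLE, WALL, DROP, STAIR, FLOOR, GOAL };

double categoryScore(Category c) {
    switch (c) {
        case HOLE:  return -3.0;   // big penalty
        case WALL:  return -3.0;   // big penalty
        case DROP:  return -1.0;   // smaller penalty
        case STAIR: return -0.5;   // smaller penalty
        case FLOOR: return  1.0;   // positive
        case GOAL:  return  2.0;   // positive
    }
    return 0.0;
}

// Per-unit contribution: sum the category scores of all neighboring cells,
// plus a linear bonus for the height of the cell the unit stands on.
//
//   double unitScore = HEIGHT_WEIGHT * heightOf(unitCell);
//   for (Cell n : neighbors(unitCell))
//       unitScore += categoryScore(classify(unitCell, n));
```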
I am not sure whether to be impressed or disappointed by how little made it to my final version… here’s some of what ended up on the cutting room floor, and other wastes of time.
Territory height analysis: I tried applying a heuristic similar to the neighboring-cell one to the whole territory calculated by the Voronoi algorithm mentioned earlier, using absolute and relative heights. Either I did not find the right combination or it was a dead end from the start, but ultimately it was a fruitless effort.
Distributed Voronoi: I calculated territories for pairs of units on top of the global territory, to favor one unit taking more responsibility for controlling an enemy unit, freeing up the other for other tasks. This showed a lot of promise in the games I watched, but batch testing revealed a noticeable drop in win rate, and I had no time left…
Guerrilla warfare: Probably not the most accurate term but that’s what I called it. The idea was to reward stealth by remaining out of sight of the enemy. If my unit was visible before a move, but became invisible after a move, it was rewarded. The opposite was also penalized. This caused major havoc, sometimes completely fooling the opponent, and sometimes only fooling itself. Tuning it properly while staying consistent across different players proved too much for me, so I had to discard it in the end, but I loved that one.
Absolute distances: I tried favoring staying close to the enemy or close to the center, but the effects were once again really unpredictable, sometimes better, but more often worse.
Quiescence search: This is an idea borrowed from chess AI, which consists of searching beyond the depth limit with only capture moves, in order to foresee bad exchanges, such as QxP followed by KxQ. I did the same here with pushes in place of captures, to foresee whether a unit could end up pushed into a bad spot. Maybe I had bugs, or maybe it was just a bad idea.
Symmetric guessing: A rough analysis showed around 20% of the games were played with the parameter symmetric=true, which gives symmetric starting positions. I figured this could be used to get a head start on the enemy's expected position when it was not known. In practice it did not seem to help much in symmetric games, and it sometimes made things worse by guessing wrong when the game was not symmetric. In the end I preferred to focus on other things, so this went on the junk pile.
Batch testing: I developed my own batching tool (inspired by CG Spunk) to repeatedly play games against opponents and compare results. While it did weed out terrible ideas quickly, I had a really hard time seeing a clear contrast for more subtle adjustments. The same run could end with a 70% or a 40% win ratio against a good opponent, something I rarely saw in other competitions. Even worse, the same game played against the same opponent could give completely different results. My bot lost games in the arena that it would never lose in the IDE. Some bots would time out right at the start 20% of the time. This proved to be a nightmare for testing. To say it did not work at all would be an exaggeration, but it meant filtering out a great amount of noise and losing a lot of time doing so.
Arena testing: To make it worse for me, tweaking parameters in a local arena once again rarely produced something that translated well to the online arena. The only game where I had undeniable success with this technique was Fantastic Bits, which makes me suspect that either something is wrong with my methodology, or local arena testing is simply not that useful in some situations…
Matchmaking: This one is beyond my control and a bit off topic, but it is important to mention. I think CG's matchmaking system is broken when there is a large spread of points in the top 10. It was an issue during the Fantastic Bits contest, and it is still the case here. There is no reason to waste 80% of a submit's matches on players 8+ points below, and barely play 10% against the only other opponent that can give meaningful results. Some say it is because of the low number of games, but I believe the selection algorithm also needs to be addressed. It is really frustrating to have to wait until the legend league opens to get good quality submits (and sometimes, even then…).
Thanks for reading! If you have any questions or comments, feel free to poke me on the CG chat or leave me a private message on the forum. Feedback on the post-mortem content is also welcome.
– reCurse signing off with… a Kettlebell?! :p