TCO22 Marathon Finals
The TCO22 Marathon Finals was a virtual event so the rules were a little different than usual. It was a 24 hour competition, which minimised the effect of local timezones. Competitors were monitored throughout the competition by a dedicated team of Topcoder staff. They were asked to share their webcam and their screen. They were allowed to use internet resources, but of course no external help was allowed.
There were 12 competitors from 8 different countries. In brackets we show how many MM finals each competitor had prior to this competition:
Brazil: [h color="red"]wleite[/h] (12)
France: [h color="red"]sullyper[/h] (3)
Georgia: [h color="red"]nika[/h] (6)
Germany: [h color="red"]blackmath[/h] (4), [h color="red"]eulerscheZahl[/h] (0)
Hungary: [h color="yellow"]vdave[/h] (2)
Japan: [h color="red"]bowwowforeach[/h] (0), [h color="yellow"]iehn[/h] (2), [h color="red"]simanman[/h] (1)
Poland: [h color="red"]Psyho[/h] (12), [h color="red"]Stonefeang[/h] (0)
Romania: [h color="red"]mugurelionut[/h] (5)
Only two of these competitors have won the TCO MM final before: [h color="red"]Psyho[/h] with an unmatched 6 wins and [h color="yellow"]iehn[/h] who won it once in 2020.
The Problem
This year’s problem was about a game similar to Tetris. You are given an NxN grid that is partially filled with C types of fruit. You are also given T 3x3 tiles. Each tile is partially filled with a single type of fruit. Each turn, you can select a tile and either drop it (remove it from your hand) or place it somewhere on the grid such that no fruit overlap. After that you will receive a new tile. We also have the concept of lines - a line is a row or a column that is completely filled with fruit (of any colour). If your newly placed tile forms one or more lines than those lines will be cleared and you will receive points. In particular, you will receive k*m*m points for each line, where k is the total number of lines cleared on this turn and m is the maximum number of fruits of the same type in this line. This game continues for 1000 turns and your task is to collect as many points as possible. Here is a little animation of the game:
This problem had a wide range of parameters, which means that solutions had to be individually tuned for each setup. The grid size N varied from 6x6 to 20x20. The number of tiles T varied from 1 to 5. The number of fruit types varied from 1 to 5.
In general we want to maximise the number of lines that we clear. Furthermore we want to clear multiple lines in one turn as that increases the k parameter used in the scoring formula. Finally, we want to try clearing lines that are mostly composed of the same fruit type.
The Contest
The contest was full of drama and excitement. [h color="red"]wleite[/h] made the first submission, just 26 minutes after the contest launch. He was closely followed by [h color="red"]Stonefeang[/h] who took the early lead. Seven people submitted in the first 3 hours with [h color="yellow"]iehn[/h] and [h color="red"]wleite[/h] in the lead:
Meanwhile [h color="red"]Psyho[/h] slept for the first 4 hours. When he joined, he had major problems with the zoom app. It was really slowing down his computer and not showing his webcam video. Unfortunately it took 1.5 hours to fix these issues. After 5.5 hours we had 9 people submit with [h color="red"]iehn[/h] and [h color="red"]wleite[/h] still in the lead:
[h color="red"]Psyho[/h] and [h color="red"]eulerscheZahl[/h] started submitting and quickly worked their way to the top. After 10 hours we had every competitor make a submission with 4 joint leaders: [h color="red"]wleite[/h], [h color="red"]eulerscheZahl[/h], [h color="yellow"]iehn[/h] and [h color="red"]simanman[/h]:
Meanwhile [h color="red"]mugurelionut[/h] was having problems with his submissions. His code size was not compiling on the server as it contained a large amount of precomputed data. After some time, he managed to resolve these problems. At the halfway point of the competition [h color="red"]Psyho[/h] joined the leading pack of 5:
In the next 2.5 hours, [h color="red"]Psyho[/h] had a breakthrough idea and moved away from the pack:
[h color="red"]Psyho[/h] kept optimising the parameters of his solution and extended his lead. Unfortunately [h color="yellow"]iehn[/h] had to withdraw from the contest due to tiredness and a big headache. Some contestants went to sleep. Here is the leaderboard with 4.5 hours to go until the end:
All of a sudden [h color="red"]blackmath[/h] came out of nowhere and took the lead! Here is the moment when it happened:
It was now a two horse race for the top position between [h color="red"]blackmath[/h] and [h color="red"]Psyho[/h]. Both of them were frantically testing their solutions and tuning their evaluation functions one parameter at a time. The third place was a battle between [h color="red"]Stonefeang[/h], [h color="red"]simanman[/h] and [h color="red"]sullyper[/h]. After 24 hours and 201 submissions we are done! The contestants can finally move away from the keyboards and take a well-earned rest. As the dust settled, it was [h color="red"]blackmath[/h] in the lead, followed by [h color="red"]Psyho[/h], followed closely by [h color="red"]simanman[/h] and [h color="red"]sullyper[/h]. The final provisional leaderboard looked like so:
Let the system tests begin. Good luck everyone!
The Approaches
A good Marathon Match problem is not only one that is easy to understand and attractive to solve, but it is also the one that allows a broad variety of approaches. While the basic ideas of those approaches may be somewhat trivial, it is difficult to find the optimal combination of those approaches for all the potential corner-cases of the problem’s parametric space.
Here is the list of the basic ideas one could think of from the start:
S1. Greedy Fill - place the tile whenever you can
This approach is to focus on placing as many tiles on the board as one can, regardless of the fruittype of the tile, skipping as few tiles as possible. The goal is to maximise the number of rows/columns cleared, hoping that a large number of clearing events allows to compensate for lower scores per event due to mixed fruit composition.
While this approach is attractive to start, it does not produce high scores if implemented in its pure form, as a scoring return for mixed lines is much lower than for the clean (non-mixed) lines. Below is a computed statistical table for the average score per cleared line, composed of random multinomial-distributed fruit.
Number of fruit (C = 1..5) vs Size of the Board (N = 6..20)
Thus for the N=6 board and C=2 fruits, a “greedy” approach would return only 45% of the score per line on average, making it beneficial to drop half of the tiles with one of the fruits entirely, and go for clean lines with just one selected fruit - receiving a 100% score for a half number of lines instead. In fact, for all the board sizes N and all numbers C of the fruit, it is better to pick just one fruit type and go with it, skipping all the tiles with other fruit, than to mix up the fruit within a simple greedy strategy without any modifications, as each of the numbers with C>1 in the table above is lower than 1/C.
S2. Perfectionist - make as clean lines as you can
The ultimate version of the above strategy is just to go with one fruit type which, as it is shown above, is beneficial as compared to a simple greedy strategy. The disadvantage of this approach is that it is hard to improve it any further.
A more promising approach is to designate parts of the board to fruit of certain types, collecting clean lines on a few selected fruit and getting some extra points on a mixed “dumpster” of the remaining ones. One could clearly distinguish implementations of such strategy in the visualizer - due to striped layouts of the fruit:
One of the observed solutions (N=20, C=5): A two-striped strategy for a test-case with 5 fruit types.
An integrated version of this strategy with the strategy S1 - is to maximise the prevalence of one of the fruits in the rows/columns around the tile-placement.
S3. Tactics Champ - make the maximum with the tiles on hand
Instead of optimising the position of each tile individually - take into account all the possible combinations of the tiles on hand, picking the most promising outcome of them all. Also, like in poker, some promising, but risky-to-place combinations of the tiles could be kept on hand in the hope that a newly-added tile would clarify their placement on the board.
S4. Future Teller - predict the tiles
Some gaps in nearly-filled lines are easier to fill than the others, as they allow for more realisations of the tiles. Thus, one can become pickier with the types of fruit for the gaps that are easier to fill, and also strategize to leave such gaps for the rows that are intended to be clean.
S5. House Hoarder - make use of the initial fruit on the board
While the parameters of the test case may suggest an optimal long-term strategy to use, this strategy may be suboptimal for the starting placement of fruit on the board. Some starting steps, targeting early easy gains or optimal clean-ups may earn extra points on a test case.
S6. Gamble Master - make use of random flukes
Sometimes an unforeseen coincidence on the board allows for tactical gain outside of the main long-term strategy, or poses some blockers, making a direct application of the main strategy suboptimal. Picking up on those gains and finding workarounds on blockers - may add some points needed. This could be seen as a generalised and simplified version of the S5 strategy.
Summary of the strategies.
While we don’t know in advance how much score and where each of the strategies gains, we can still foresee what corner-cases could be more important for each of the strategies.
The strategy S1 should be more effective on small boards with too many fruits (N=6 and C=5), less tiles on hand (T=1), and higher fruit density per tile (P=0.5).
The strategy S2 should be more effective on large boards (N=20), with more tiles on hand (T=5), less fruit density per tile (P=0.2), and lower starting filling with fruit (F=0.0).
The strategy S3 should add the most on small boards (N=6) with many tiles on hand (T=5).
The strategy S4 is a good addition to the strategy S2, especially on smaller boards, and more filled tiles, allowing for better hits on clean rows for test cases with intermediate parameters.
The strategy S5 is relevant on high-filled (F=0.3) large boards (N=20) with low-filled tiles (P=0.2) and many fruits (C=5), as the initial setup of the board in this case may have consequences, lasting over the preset 1000 turns of the game.
The strategy S6, on contrary to S5, could be a good addition to the strategy S1 on small boards (N=6).
What is the proper mix of these strategies for a test case with intermediate parameters? Well, this is exactly the secret sauce needed for the TCO22 Champion cup!
The Final Testing Analysis
As it was discussed during the TCO broadcast, the current marathon problem is characterised by an abundance of parametric corner cases together with a large variety of potential strategies. All together, this made it hard for the competitors to come up with a single solution that would dominate the opponent’s solutions on all or the majority of the cases. Instead, nearly every of the submitted solutions had a set of test cases where it was beating the others, which made the contest very competitive, and thus exciting. Consequently, 50 test cases, offered for provisional testing, was not enough to cover the parametric space in full and at balance, thus suggesting that there might be changes in the results at the system testing.
The system test consisted of 5000 test-cases, uniformly distributed across the problem’s parametric space:
Board size N, Tiles at hand T, Number of fruit types C, Starting board density F, and tile density P.
The latter two are bucketed into 5 buckets for convenience.
After the system tests, some expected shuffling of the leaderboard happened. Which on one hand was fairly small on the absolute scale, and was mostly impacting the solutions with more unstable performance across the cases. On the other hand, it turned out to be critical for the most important places at the top. Here is the final leaderboard:
The top-provisional submissions N1 and N3 have lost a few scoring points, while the provisional submissions N4 and N6 have gained a few, dramatically changing the prize distribution. Small shuffling of the placements also happened at the bottom of the leaderboard, where the provisionally bottom submission gained a couple of points and moved up a few places.
While usually the competitors can observe only total results, averaged across all the tests, we have an opportunity to dive deep into the test cases. There are several extra insights one can draw from the scores per test:
Corner Cases. We can point out the corner-cases - areas in parametric space which may benefit from rather unique strategies, and thus require special attention and approach. A reliable sign of such a corner case is when we see an unusual scoring difference/ordering of the solutions - when the scores are determined not by the average strength of the solution, but rather by some of the contestants finding the working idea for the corner case, and some others - missing it.
Important Problem Parameters. While in advance it was hard to foresee which of the five parameters of the problem parametric space would pose the highest challenges to contestants - the score per test may provide us with such understanding. The more relative solution scores are changing across the range of the parameter - the more impact on the strategies it carries.
Approach insights. For each solution we can find the areas with relatively strong and relatively weak performance, which can give us a good sense of the solution's overall strategy - which test-cases it is the most tuned to, versus which ones it left underexplored.
Wins and Failures. We can also check the extreme cases of good and bad solution performances. A pattern among the former ones could be a sign of some working unique idea, missed by others, while the latter ones may be a sign of some bug-type imperfections, easy gains for the score, missed by the authors.
1. Size of the Board.
Score (in the % to max), averaged per board size.
Clearly, the large boards (N ≥ 14) are not only very different from the small boards (N ≤ 9), but are also highly distinguishable between the top two places on the leaderboard. While the solution from [h color="red"]blackmath[/h] is a clear winner for the large boards, beating everyone else at N = 20 by at least 18 points, it is clearly not the best for the small boards! In fact, it is so far behind that at N = 6 it succeeds to the solution from [h color="red"]Psyho[/h] by 29 points, and to the solution from [h color="red"]Stonefeang[/h] by 32 points. The solution from [h color="red"]Psyho[/h] leads on the middle-sized boards (8 ≤ N ≤ 13).
2. Number of Tiles on hand.
The picture behind this parameter turns out to be even more rich. While [h color="red"]Psyho[/h] is leading for the full deck of tiles (T = 5), his solution is not the strongest for the smaller decks. [h color="red"]Blackmath[/h]’s solution is absolutely dominating at T = 2, and beating [h color="red"]Psyho[/h] with T = 3 and 4. What is also apparent is that the case of T = 1 is clearly the trickiest in the entire problem space. With the winners being [h color="red"]Stonefeang[/h], [h color="yellow"]iehn[/h] and [h color="red"]wleite[/h]. [h color="red"]Psyho[/h] takes only 4th place, and [h color="red"]blackmath[/h] dramatically falls to the last place, missing a whooping 24 points relative to the top performers. The huge miss on this case arguably cost him the crown.
3. Number of Fruit types.
Here comes a surprise. While the test cases with multiple different fruits were expected to be very different from the test cases with just a single fruit… apparently, not as much. For the majority of the competitors, this parameter did not impact the score significantly, meaning that they have found approaches working equally well for the entire range of this parameter. The scores of the top-2 places are barely distinguishable. 3-rd place [h color="red"]sullyper[/h] and 5-th place [h color="red"]simanman[/h] solved really well the single-fruit problem, but have missed something on the multi-fruit end. 4-th place [h color="red"]Stonefeang[/h] has somehow managed to do the opposite, finding a good idea specifically for the multi-fruit cases.
4. Initial Density of Fruit.
While for most of the competitors, the initial fruit setup was not interfering with their strategies, this is not the case for the top 2 places. [h color="red"]Blackmath[/h]'s solution works well on empty boards, while [h color="red"]Psyho[/h]’s solution takes good advantage of the initial fruit distribution.
5. Density of Fruit on a Tile.
Yet, another parameter that highly distinguishes the solutions. [h color="red"]Psyho[/h] and [h color="red"]Stonefeang[/h] are dealing well with the tiles full of fruit, while [h color="red"]blackmath[/h] and [h color="red"]sullyper[/h] are dominating on the nearly-empty ones.
6. Best vs Worst Test Cases
The parameters of the top-5 best (with the scoring advantage over the closest competitor).
While there is a substantial element of randomness in which cases become outliers, yet, there is also a pattern there. [h color="red"]Psyho[/h]’s best successes are with medium-sized boards (N = 11-14) and full deck (T = 5) of well-stacked (P ≅ 0.5) tiles. While [h color="red"]blackmath[/h] has found some of the large-board cases where everyone else has near-failed (leading by 60 points over the next competitor).
The parameters of the top-5 worst (with the names of the test-winners) performance cases.
All [h color="red"]Psyho[/h]’s biggest upsets were on the largest boards with less tiles on hand (T = 2-3), more fruit types, and surprisingly withwell-stacked tiles (P 0.4). Somehow, his great strategy for medium-sized boards was reaching its limit. There is a clear pattern among [h color="red"]blackmath[/h]’s biggest upsets - something does not work in his solution when he has only one tile on hand (T = 1) and multiple fruit types (C = 4-5). Especially on large boards, which otherwise are his favourite.
Also, we can notice some cases with a zero score for [h color="red"]sullyper[/h] and [h color="red"]Stonefeang[/h], suggesting that there are cases when their solutions get stuck and don’t know what to do.
7. Psyho versus blackmath
It was hard to foresee that the fate of the first prize would be determined between two such opposite solutions as these two:
[h color="red"]Psyho[/h] is leading on small/medium boards, [h color="red"]blackmath[/h] - on large ones.
[h color="red"]Psyho[/h] is slightly better with 5 tiles on hand, but [h color="red"]blackmath[/h] is failing the 1-tile case.
[h color="red"]Psyho[/h] is slightly better on full boards, while [h color="red"]blackmath[/h] is dominating on empty ones.
[h color="red"]Psyho[/h] is much better with full tiles, while [h color="red"]blackmath[/h] is better with empty ones.
Here are the 20 biggest wins and upsets in this rivalry:
While all the biggest wins of [h color="red"]blackmath[/h] fall under the same pattern - large boards with 2-3 tiles on hand, the biggest wins of [h color="red"]Psyho[/h] roughly fall into two groups:
Large boards with 1 tile on hand and small boards with 5 tiles on hands. While the former ones are clearly a mis-performance of [h color="red"]blackmath[/h]’s solution, the latter ones - outline where [h color="red"]Psyho[/h]’s solution strengths really are. It is probably really good on tactics - efficiently combining the strategies S3 to S6!
Congratulations to our TCO 2022 MM Champion Psyho!
Thank you to everyone who participated, and see you at TCO23.