Match Editorial

SRM 361
Wednesday, August 1, 2007

Match summary

This match, with the money prizes sponsored by TopCoder itself, attracted 1109 coders, including 5 targets. They faced a tricky set of problems in both divisions. Unfortunately, due to an error in one of the problems, the results from Division 2 had to be removed from the statistics.

In Division 1, the submissions started to pop up very quickly with many competitors solving the 250 problem in less than 5 minutes. Soon after that, the solutions for the 500 started to come up, but surprisingly most of them were from blue and yellow coders, with the higher rated coders taking their time. That's often a sign that there is some non-obvious trickiness in the problem, and this time was no exception. Most of the quick submisions were later brought down in the challenge phase and system tests.

With around 30 minutes before the end of the coding phase, Petr submitted his third problem, and seemed to be in a comfortable situation to win the match. But very soon after his initial submission he spotted a small bug and was forced to resubmit, which made him lose the lead. After the eventful challenge phase and the system tests, there were three targets ACRush, Petr, and SnapDragon in the top four, but it was zhuzeyuan who claimed the win, and came very close to regaining his target status as well. Congratulations!

Even though it won't show up in their profiles, the top performers in Division 2 did a great job on this difficult set, and I'm confident it won't take them long to move up to Division 1, this time for good.

The Problems

SearchBox

Used as: Division Two - Level One:

Value 250

Submission Rate 491 / 541 (90.76%)

Success Rate 122 / 491 (24.85%)

High Score tstan436 for 245.52 points (3 mins 51 secs)

Average Score 183.32 (for 122 correct submissions)

String matching is a common, and well known problem. Many more or less sophisticated algorithms exist. But here, with a text no longer than 50 characters, the simplest naive algorithm is more than sufficient. Let's forget for a moment about the "Whole Word" part and look how the search alone can be implemented.

return text.indexOf(search, start);

Luckily Java already has a method that does exactly what we need; the String.indexOf(String, int) method will find the first occurrence of search inside text starting from the index start, and even return -1 if the match is not found.

Of course one can achieve the same with just two loops and comparing character after character, but I would recommend using the built-in functions of your language whenever it's possible for a few reasons:

You use code that is well tested, and very unlikely to contain bugs.
It's easy to tell at the first glance what a line of code does. It's very important especially when the code doesn't work as we expected. You don't want to spend precious seconds wondering if there really should be "<" or "<=", or randomly trying to add or subtract 1 somewhere to make it work.
The code gets shorter and faster to type. I would say that's a secondary advantage and writing correct code is what really saves time -- still, in TopCoder matches every second has value.

This only applies when you know, and are comfortable with, the functions you want to use. If you are not exactly sure what indexOf returns, it's probably faster just to use the two loops. Afterwards you can test it in the practice room and make sure you remember it the next time.

Going back to the original problem, we still need to take care of the "Whole Word" option. We don't want to destroy our simple and clean code by adding dozens of if-s and hacks to handle corner cases, so we will use a simple trick instead.

1 public int find(String text, String search, String wholeWord, int start) {
2         if (wholeWord.equals("Y")) {
3             search = " " + search + " ";
4             text = " " + text + " ";
5         }
6         returntext.indexOf(search, start);//the general search function discussed above
7 }

If wholeWord is "N" we don't need to change anything, the indexOf function will do the work. If it's "Y" we surround the search by spaces, to make sure it won't match a word that is a substring of another word. Additionally we add a space to the beginning and to the end of the text, to make the words touching the edges surrounded by spaces too. The one question that might not be obvious here is: how does adding these space affects the starting and resulting indices? Luckily it doesn't change a thing, by adding a space to the beginning of the text, we make the starting index point one character before the original, and that's good because now we want to match a leading space too. In the same way the returned index is the index of a space before the matched word, which is fine, as the text has one character added at the beginning.

With the very low success rate on this problem, I guess the lesson is that, even if a problem looks easy, one should always stop and think before rushing into coding, and try to come up with a way to make the implementation as simple as possible.

WhiteHats

Used as: Division Two - Level Two:

Value 500

Submission Rate 365 / 541 (67.47%)

Success Rate 155 / 365 (42.47%)

High Score rogerbdly for 470.65 points (7 mins 11 secs)

Average Score 339.86 (for 155 correct submissions)

Used as: Division One - Level One:

Value 250

Submission Rate 481 / 493 (97.57%)

Success Rate 384 / 481 (79.83%)

High Score kia for 247.98 points (2 mins 34 secs)

Average Score 207.72 (for 384 correct submissions)

Let's say there is a total N people, out of which W wear white hat. In that case, what should the count array look like? The answer is:

Exactly W elements are equal to W-1.
Exactly N-W elements are equal to W.

The first point corresponds to the people in white hats: they see all the white hats, except their own. The rest are the people in black hats and they can see all the W white hats.

All that's left to do is check if the given array has the property described above, for some value of W. One of the ways to do that is to count the number of occurrences of each element in count, and then loop through all the possible values for W. The sample code follows.

 1 public int whiteNumber(int count[]){
 2     int N = count.length;
 3     int numbers[]= new int[51]; //elements in count are between 0 and 50, inclusive.
 4     for(int x:count) numbers[x]++;
 5     if(numbers[0] == N) return 0; //All hats are black.
 6     for(int W=1;W<numbers.length;W++){
 7         if( numbers[W-1] == W && numbers[W] == N-W ) return W;
 8     }
 9     return -1;  //No W was found.
10 }

You can see that the condition inside the second loop is a direct translation of the two points stated before. The corner case of W=0 is handled separately to avoid accessing numbers[-1]. As a side note, we can avoid the second loop altogether by noticing that W must be the maximal element in count, so that's the only value we really need to check.

Election

Used as: Division Two - Level Three:

Value 1000

Submission Rate 32 / 541 (5.91%)

Success Rate 0 / 32 (0.00%)

High Score sejert for 0.00 points (47 mins 22 secs)

Average Score No correct submissions

This is the problem that caused the division 2 rating changes to be reverted. It has an incorrect greedy solution that is so appealing that myself, three problem testers, and 15 coders during the match coded it in the same, flawed way. Only after the match was it pointed out in the forum that the way it distributes the votes is not always optimal.

As of this writing, I still don't know how to write a correct solution, so I would love to hear from someone who managed that. If anyone is interested in the reference solution used during the match, here is the analysis I wrote before I knew that it didn't work. Treat finding a flaw in the logic as an exercise.

Warning: The solution below is not correct and left over only for educational purpose

Let's define an array finalOrder that will contain the final candidates ranking, according to our wishes. Since it must match the wishList, for every element k where p = wishList[k] is not -1, finalOrder[p] = k. Now, what about the candidates where wishList is equal to -1? We don't care where they end up, but since we are looking for the solution with the minimal number of additional votes, it's never optimal to change the relative order which they are already in. Take a look at the following example.

candidate name A B C D E

current position 3 1 0 4 2

wished position ? 0 ? 3 ?

final position 4 0 1 3 2

The initial ordering is (C, B, E, A, D). Now we want to have the candidate B on the place 0, and the candidate D on the place 3, which makes the ordering look like (B, _, _, D, _). Since there is no point in changing the order of candidates (C,E,A), we put them in the same order on the free spots, which finally gives (B, C, E, D, A). The pseudocode generating the finalOrder can look like this:

 1 int[] generateFinalOrder( int[] currentOrder, int[] wishList) {
 2     int[] finalOrder={-1};
 3     boolean[] used = {false}; //array used to mark the candidates that were already placed on a position.
 4     for(i=0; i<wishList.length; i++) if(wishList[i] != -1) {  //first filling the elements with fixed positions.
 5         finalOrder[wishList[i]] = i;
 6         used[i] = true;
 7     }
 8     for(i=0; i<finalOder.length; i++) { //now filling all the empty spots
 9         if( finalOrder[i] != -1 ) continue; //alredy filled
10         for(j=0; j<currentOrder.length; j++){
11             candidate = currentOrder[j];
12             if( !used[candidate] ) { //the highest ranked candidate that wasn't placed so far.
13                 finalOrder[i] = candidate;
14                 used[candidate] = true;
15                 break; // breaking the j loop.
16             }
17         }
18     }
19     return finalOrder;
20 }

The currentOrder here holds the indices of the candidates, sorted by decreasing number of votes (and a tie-breaker as in the problem statement). What this function does is first place the candidates according to the wishList and than fill the empty spots with the remaining candidates, without breaking the ordering from the currentOrder. It can be also implemented with a lesser complexity, but with no more than 50 candidates it doesn't really matter.

Now that we have determined the final ordering, computing the minimum required number of additional votes becomes quite simple. Just look at the candidates starting from the end. Obviously we don't need to add any votes to the candidate who's meant to end the last. For the second to last we need only as many votes as are needed to overcome the last one -- and so on, for all the others, every time we add the minimal number of votes needed to overcome the previous candidate (keeping the tie-breaker rule in mind). The total number of votes we added is the final answer. The code follows.

 1 public int votesNeeded(int[] votes, int[] wishList) {
 2     int result = 0;
 3     int[] currentOrder = ... //Just a sorting, so the code ommited.
 4     int[] finalOrder = generateFinalOrder( currentOrder, wishList); //the function described above.
 5     for(i = finalOrder.length - 2; i >= 0; i--){  //starting from the second to last and moving backwords
 6         int a = finalOrder[i];
 7         int prev = finalOrder[i+1];
 8         if( votes[a] < votes[prev] ){
 9             result += votes[prev] - votes[a]; //adding votes to make a and prev have the same number;
10             votes[a] = votes[prev];
11         }
12         if( votes[a] == votes[prev] && a > prev ){ //if needed adding one more vote to satissfy the tie-breaker.
13             result ++;
14             votes[a]++;
15         }
16     }
17     return result;
18 }

ReverseDistance

Used as: Division One - Level Two:

Value 500

Submission Rate 246 / 493 (49.90%)

Success Rate 38 / 246 (15.45%)

High Score Petr for 443.28 points (10 mins 26 secs)

Average Score 239.37 (for 38 correct submissions)

This problem had a short and simple statement, and as often happens a solution that was short, simple... and incorrect. Many coders went for a solution that tries every number, computes its reverse, and checks the difference between them. While this is obviously correct, the running time is what causes the problem. The key observation is that even though the input is at most 1000000, the answer can be as big as twelve digits. That makes it impossible to simply try all the possibilities.

How not to get trapped by a problem like this? There is no definite answer but there are few hints that could help you avoid it.

It is Div 1 medium. It's a vague argument, as problem difficulty is a highly subjective matter. But still most of the time D1 medium requires more than a simple loop and reversing numbers.
Take a quick peek at the division summary. As I already mentioned in the introduction, if there are a lot of submissions from lower rated coders but many reds, including targets, are still working on it, it's a hint that there is more to the problem than meets the eye.
Return value is a String. Again, it's very vague, but it might suggest that the writer is mean enough to hide the actual result range.
And the most important hint is left to the end. Read all the examples at least once. There was a case with the input "900" and the output "101001", so the output has twice as many digits as the input. If you could extrapolate it to "900000" and "100001000001" it gives you a clear answer that brute force won't work, and an excellent challenge case as a bonus!

The "divide in two" solution

Try and write your equation in a manner:

   a b c d e f g h
 - h g f e d c b a
 ------------------
   i j k l m n o p

Where each letter represents a single digit (imagine that numbers are aligned with leading zeros to have the same length). The same thing can be written as:

   i j k l m n o p  
 + h g f e d c b a
 ------------------
   a b c d e f g h

We know the ijklmnop part, as that's what we are given. Now imagine you know the dcba. From the equation above you can see the mnop + dcba = efgh -- well, not exactly but the last four digits of mnop + dcba are efgh. So if we already know mnop and we assumed that we know dcba, we can easily compute the efgh. And now, when we have efgh we can reconstruct the whole abcdefgh!

The remaining question is: where to take the first half of the number from? Just brute force it. Since it's only half of the digits, the number of options is small enough to check them all. You still need to code it carefully, to handle cases like odd number of digits and tom make sure it runs fast enough. The fairly simple, working Java solution follows:

 1 public class ReverseDistance {
 2    final long rev(long x){
 3        long r =0;
 4        while(x>0){
 5            r=r*10+x%10;
 6            x/=10;
 7        }
 8        return r;
 9    }
10 
11    int len(long x){
12        int r=0;
13        while(true){
14            r++;
15            x/=10;
16            if(x==0)return r;
17        }
18    }
19 
20    public String find(int difference) {
21        for(long p=0;p<200000;p++){
22            long rp = rev(p);
23            long q = rp+difference;
24            int len = len(p);
25           for(int totalLen = 2*len-1;totalLen<=2*len;totalLen++){
26            String qs = ""+q;
27            while(qs.length()>totalLen-len)qs = qs.substring(1);
28            while(qs.length()<totalLen-len)qs = "0"+qs;
29            long y = Long.parseLong(p+qs);
30            if(y-rev(y)==difference)return ""+y;
31           }
32        }
33        return "NONE";
34    }
35 }

The greedy solution

When I thought about this problem for the first time I wanted to allow inputs as big as 2,000,000,000, so the solution described above won't work on it. I coded up a solution that is entirely based on greedily picking digits, starting from the outmost ones and moving towards the middle. It's not that elegant, as there were quite a few cases to handle to make it work, so I'll just sketch the idea rather than giving a full solution.

Assume that a number looks like azb where a and b are single digits and z is a sequence of digits. The reverse of it looks like bya, where y is the reverse of z. Furthermore, assume that azb has the same number of digits as bya (bya can have leading zeros), and that azb > bya. That gives an equation:

  a z b
- b y a
-------
  d x f

Lets look just at the last digits b - a = f, since if azb > bya then a >= b as well. The last digit equation is actually 10 + b - a = f. We can rewrite it as
a - b = 10 - f
It can have a few results, but remember that we are looking for the smallest number that results in the difference, so we want the digit a to be as small as possible. In that case we can set it to:
a = 10 - f
b = 0
Now look at the right-most digits. Since we already know that a - b = 10 - f, then the digit d must be either 10 - f or 9 -f. If it's not, the difference can not be achieved, and you should return "NONE". If it is, we can consider a and b as computed and move to computing z and y in the same manner. As I said there are more details into it, but that should be enough to give you an idea of how to construct a fully working solution.

WeighingScale

Used as: Division One - Level Three:

Value 1000

Submission Rate 25 / 493 (5.07%)

Success Rate 9 / 25 (36.00%)

High Score zhuzeyuan for 604.25 points (27 mins 3 secs)

Average Score 503.73 (for 9 correct submissions)

The solution for this problem is quite complicated so I'll just start from the basic observations, and see how it goes from there...

Iteration 1

Removing all the difficult bits from the statement we get "bla bla bla, return the number of different pairs of weights that, bla, bla, bla." There is only around 1100 of these pairs, so it seems doable to check them one by one.

1 int[] count(String[] measures, int A, int B){
2     int[] result = {0,0,0};
3     n = measures.length;
4     for( {C, D} in {0..n-1}^2 ){ //all pairs of indices
5         if( {A,B} & {C, D} != {} ) continue; //A and B are already taken
6         if( something ) increment one position in result.
7     }
8     return result;
9 }

Iteration 2

Now we can look a bit closer at what exactly we need to check for each of the pairs.

The number of different pairs of weights you can put on the right pan to make it lighter than the left pan, the number of pairs to make both pans the same weight, and the number of pairs to make the left pan lighter.

Keeping in mind that each weight can have only one of the three values -- 10, 20, or 30 grams. We can again check all the 81 combinations and directly see which side of the scale is heavier. The difficult bit is checking if the combination of the values is possible to achieve, considering the results in measures, but for now let's just focus on the part we know how to do.

 1 //... We are inside the C,D loop
 2 possibleOutcomes = {0,0,0};
 3 for( {w1,w2,w3,w4} in {10,20,30}^4 ){
 4    //Somehow checks if weights A,B,C,D can tak values w1,w2,w3,w4 respectively:
 5    if( !possible({A,B,C,D}, {w1,w2,w3,w4}) ) continue;
 6    if( w1 + w2 > w3+ w4 ) possibleOutcomes[0] = 1; //left heavier
 7    if( w1 + w2 == w3+ w4 ) possibleOutcomes[1] = 1; //same
 8    if( w1 + w2 < w3+ w4 ) possibleOutcomes[2] = 1; //left lighter
 9 }
10 if( sum(possibleOutcomes) == 1 ){ //checking if there is only one possible outcome
11    result += possibleOutcomes;
12 }

So we loop through {w1,w2,w3,w4} and, ignoring the combinations that are not possible, compare the total weight of the left pan with the right pan. The line number 10 implements the sentence "You should only consider the pairs that make the result unambiguously predictable based on the results in measures." If we are able to predict the outcome unambiguously, that means that the left pan was always in the same relation to the right pan, thus possibleOutcomes has only a single 1 in it. The last step is to increment the result on the correct position.

Iteration 3

Now the solution is almost ready and we still haven't had to do anything difficult! The only part left is a function checking if the weights with indices {A,B,C,D} can have values {w1,w2,w3,w4}. That's the time when we need to take a closer look at the measures matrix. It should be obvious that if we have two weights x and y and measures[x][y] == '+' than for sure x doesn't weigh 10 grams, nor y weigh 30 grams. Moreover, if we have a weight z such as measures[y][z] == '+', then there is no other way to assign values than (x = 30; y = 20; z = 10). To check the possible values for all the weights we can do the following:

initialize possibleWeights[measures.length][3] to make all the elements true.
For each node x run a dfs that traverses only the '+' and '=' edges. The dfs returns a maximum number of pluses it found on a single path from x.
If dfs returns 0 it doesn't give us any new knowledge about the value of x. If it returned 1 we can set possibleWeights[x][0]=false (index zero represents 10 grams, 1 - 20 gram, etc), as it can not be 10 grams if it's heavier than something else. When the dfs returns 2 we can set both possibleWeights[x][0] and possibleWeights[x][1] to false, which leaves 30 grams as the only possible value. Note the it can't return anything above 2, as if it did there is a bug in our code.
Do the same as previously, but now run through '-' and '=' edges and update the possibleWeights matrix accordingly.

Now our code looks like this :

 1 int dfs( int p, char sign, boolean[]used) {
 2   if(used[p]) return 0;
 3   used[p] = true;
 4   int res = 0;
 5   for(int q=0; q <n ;q++){
 6       //Edges with '=' don't add anything to the result, edges '+' or '-' do.
 7     if(measures[p][q] == '=' ) res = max(res, dfs( q, sign, used));
 8     if(measures[p][q] == sign ) res = max(res, 1 + dfs( q, sign, used));
 9   }
10   return res;
11 }
12 
13 
14 int[] count(String[] measures, int A, int B){
15     int[] result = {0,0,0};
16     possibleWeights = {{true}};
17     for( x =0; x<n; x++ ){
18         int plusDepth = dfs(x, '+', new boolean[n]);
19         int minuDepth = dfs(x, '-', new boolean[n]);
20         //updating possibleWeight[x] based on the dephts; 
21     }
22    //   ...
23    //   Inside the loop with the four possible weight values 
24     if( ! possibleWeights[A][w1] ||  ! possibleWeights[B][w2] ||
25         possibleWeights[C][w3] ||  ! possibleWeights[D][w4] ) continue;
26     //the rest as before
27 }
28

And that's all. It wasn't so hard, was it?
System> slex is compiling the 1000-point problem.
System> slex is testing the 1000-point problem.
System> slex is testing the 1000-point problem.
System> slex is testing the 1000-point problem.
System> slex is testing the 1000-point problem.
Oops it disagrees on the example 3. Is it a wrong example test (again)? ...

Iteration 4

Back to the drawing board. Let's look at example 3.

3)

{"??+?", "???+", "-???", "?-??"}
0
1
Returns: {1, 0, 0 }
We know that w0 > w2 and w1 > w3 thus w0+w1 must be heavier than w2+w3.

So, we have only one pair of weights to consider {2,3}, our possibleWeights table looks like:

	10 grams	20 grams	30 grams
weight 0	false	true	true
weight 1	true	true	false
weight 2	false	true	true
weight 3	true	true	false

So in example we can have 30+20 > 20+10, which makes the left pan heavier, but we can also have 20+20 == 20+20, which makes the both pans the same weight. But wait! How can both w0 and w1 be 20 if the measures says that w1>w2? Well it can't be, it's just that our program doesn't know it yet.

The example shows that it's not only important what values a single weight can have, but also if all the four values together match the relation defined by measures. To fix that we introduce one more matrix, Relation[n][n], where each cell can have one of the 4 values:

'+' - the i-th weight is heavier than the j-th weight.
'-' - the i-th weight is lighter than the j-th weight.
'=' - both weights have the same weight.
'?' - the two weights are not correlated.

Looks familiar, doesn't it? It is a transitive closure of the measure matrix. To compute it we can again run n pairs of dfs's, just the same way as we did for computation of possibleWeights, only this time we are not interested in how deep we can go, but in what weights we can reach. After this is done, we check for every pair from {A,B,C,D} and corresponding weights from {w1,w2,w3,w4} if the values match the relation.

 1     private boolean check(int a, int b, int w1, int w2) {
 2         if (Relation[a][b] == ?)
 3             return true; //no relation
 4         if (Re[a][b] == '-' && w1 >= w2)
 5             return false;
 6         if (Re[a][b] == '=' && w1 != w2)
 7             return false;
 8         if (Re[a][b] == '+' && w1 <= w2)
 9             return false;
10         return true;
11     }
12 
13     /** This gets {A,B,C,D}, {w1,w2,w3,w4} **/
14     private boolean check(int[] ind, int[] w) {
15         for (int i = 0; i < ind.length; i++)
16             for (int j = i + 1; j < ind.length; j++) {
17                 if (ind[i] == ind[j])
18                     return false;
19                 if (!check(ind[i], ind[j], w[i], w[j]))
20                     return false;
21             }
22         return true;
23     }
24

Adding this check in the inner-most loop makes the solution pass. That was basically the reference solution, and the thought process behind it. You can see the similar structure in SnapDragon's code from the match.

By slex
TopCoder Member