Wednesday, March 14, 2007 Match summaryThis match was coincident with the ACM ICPC World Finals - not just the same week or the same day, but the SRM actually started and ended during the 5 hour event. As such, several top student competitors were out of this competition. In spite of this and the apparent widespread confusion surrounding daylight savings time changing early this year in the USA, five targets and more than 900 total competitors still showed up for the event. At least one of the coaches for the ICPC World Finals also competed, something that we found amusing in the admin room. In division 1, the match started fast with Quelloquialism finishing the first problem in just under three minutes. Six other people submitted for more than 240 points. Division two saw an even faster start, as their trivial 250 was solved by about 8 people within 2 minutes. The division 2 500 submissions followed, some as fast as the division 1 250 submissions, but most a bit slower. The Division One 500 proved challenging to solve and slow to implement for most competitors, with JongMan's impressive 13-minute submission as the only standout exception. Except for a couple solutions in the 17-minute range, everyone else took at least 20 minutes to solve it, and some took more than twice that long. The 1000-point problem was harder to figure out but reasonably quick to code for both divisions, which also meant they had a big effect on the final scores. When the dust settled, Eryx was the clear winner of division 1, with one of the fastest 250s, the fastest 1000, and 2 successful out of three challenges. ahyangyicame in next with another very impressive time on the 1000, and andrewzta was in 3rd after Petr dropped below him due to a failed challenge. In division 2, many more people solved all three problems, and tteesstt, vpj and LayCurse came out on top. The ProblemsDegreesToRadiansUsed as: Division Two - Level One:
This problem had two very simple parts. The first is calculating the number of degrees as a double rather than the degree-minute-second combination. The second is converting that number to radians. Java even has a built-in function to do this conversion, so the resulting formula is simply Math.toRadians(degrees+minutes/60.0+seconds/3600.0). TagalogDictionaryUsed as: Division Two - Level Two: Used as: Division One - Level One:
This is the second time I've written a problem based on the Tagalog language. Actually, a few people who were up on their Kawigi-trivia guessed that I had written it without even reading the whole problem statement. The first one was TaglishTranslator from SRM 220. The point of the other one was actually a bit of a poke at high-rated TopCoder veterans who complained about certain problems favoring native English-speakers, saying "here's a problem that favors native Tagalog-speakers!" Unfortunately, as far as I know, no native Tagalog-speakers got to that problem during the challenge with enough time to finish it. While native Tagalog-speakers might have had a mild advantage on this problem (a few came and told me they liked the problem later), I really picked it because it's a nice, slightly twisted sorting problem. In spite of the linguistic history I provided at the beginning of this problem, some people still didn't realize that Tagalog is a real language until I was speaking it with the couple of Tagalog-speakers that came to ask me how I knew about their language after the match. In order to write a solution to this problem quickly, it helps to know the standard libraries for sorting in your language. Otherwise, you'll need to write a whole sorting algorithm, where the built-in sorting functions allow you to just implement a comparison. Since the standard lexicographical ordering doesn't work in this case (the k is out of order and the ng messes things up), the way I implemented my comparison function was to normalize the two strings to strings that could be lexicographically compared, and comparing those normalized strings. My clever way of doing this looked like this:
Most of the faster times on this problem also had particularly clever ways to either compare the strings or normalize the strings to be compared automatically. One of the most interesting solutions used a Java library I'm not familiar with - erdal's submission used a RuleBasedCollator to specify the sorting order of substrings. I also liked the method shared by TheRaven and ruk, which involved converting k to c, shifting o and p to p and q, and changing ng to o, then sorting and reverting the changes. For comedic value, Quelloquialism's fastest submission is worth looking at - I'm sure one "z" would have been enough, but why not add 55, just to be safe? ReverseUnnaturalBaseConversionUsed as: Division Two - Level Three:
The hardest part about writing this problem for me was coming up with a clear, concise description of number bases (such that it would be obvious what they meant in the context of both positive and negative bases). My favorite part about this problem is that it is a beautiful generalization of both standard number base conversion and problems like The Moronic Cowmputer from USACO (available at the SPOJ archive, as well as other online judges). The (relatively) simple algorithm people use to convert numbers into positive number bases also works on negative bases if it is implemented carefully. This algorithm basically comes down to something like this: x : number to be converted b : number base s : return string s = "" digit : next digit to prefix s with if x = 0 return "0" if b > 0 and x < 0 addsign = true x = abs(x) while x != 0 digit = x mod abs(b) s = digit + s x = (x-dig)/b if (addsign) s = "-" + s; return s The interesting lines of code are:
Used as: Division One - Level Two:
This problem is based on a conversation with a friend of mine at Microsoft. It's a problem he had floating around in his head, and he posed the problem to me because he knew me to be someone who loves solving a good, algorithmically interesting problem. As such, I let it swim around in my head for a little while, too, and concluded that it's a very good, and practically interesting, dynamic programming problem. However, when I tried to write a solution to this problem, I started to realize that the problem is harder than I thought it would be. I think on the whole, the people who tried to solve it during the match would agree that it was harder than most div 1 500s. There are differences between the "real-world" version of this problem and the "programming contest" version. The real-world version would have to be able to handle longer input strings (probably on the order of 100-120 characters), and many more resource strings (easily thousands). The "real-world" problem would also use non-resource strings in some cases (numbers, user-assigned names for objects, or file names, for instance) and might allow appending. On the other hand, the programming challenge version has pathological and fairly degenerate resource strings. The gist of the problem was that you were given a bunch of strings as resources, created from a combination of normal characters and possibly littered with "%s" tokens, representing places where other strings are inserted, inspired by C's printf family of functions. Then you're given a string, str, which may (or may not) have been constructed by repeatedly calling sprintf with those resource strings and other strings created by using sprintf on other resource strings. The problem is to figure out how many unique ways str could have been constructed from the resources given. The intended way for people to solve the problem was dynamic programming on all substrings. This is a common pattern of dynamic programming that can be effectively done either bottom up (i.e. the filling-up-a-table way) or top-down (the recursive with memoization way). In either case, you start with an NxN table (where N is the length of str), and you try to calculate each entry i,j in the table (where i <= j) no more than once. Each entry represents the number of decompositions of the substring of str from index i to index j, inclusive. To fill the table bottom-up, you should check all length-1 strings, then all length-2 strings, and so on, because longer substrings depend on shorter ones. Then you're left with the problem of figuring out the number of decompositions of a substring given the number of decompositions of its substrings. The easy thing to recognize, perhaps, is that it's the sum of the number of ways to decompose the substring using each resource strings. When I got to this subproblem the first time I tried to write a solution to this problem, I realized that the problem was harder than I thought (and I even proposed that it might be used as a level 3 problem). The reason was that finding the number of decompositions of a substring with a given outer resource string also required dynamic programming. I'm going to have a harder time describing how I solved this subproblem in English, so I'll describe it with a gratuitously commented version of my code. In the following code, substring is the substring being analyzed, start and end are the zero-based indexes of the first and last characters of substring in the original string. resparts is the result of resources[i].split("%s", -1), which divides up the resource on instances of "%s", including an empty string at the end if it ends with "%s". MOD is a constant representing 1000000007. memo is the table I described above: // Make sure the first and last sections of the resource string match the beginning and end of this substring: if (substring.startsWith(resparts[0]) && substring.endsWith(resparts[resparts.length-1])) { // The first dimension is the number of %s's in the current // resource plus the number of literal segments between // those %s's. Even indexes here represent the points ending // with literal sections of the resource string and // odd indexes represent points ending with %s insertions. // Think of it like I've divided the resource string // chunks which are either "%s" or literal substrings with no "%s". // The second dimension is the length of the sub-substring // that has been processed so far. // so tempmemo[x][y] represents the number of ways to construct // the first y characters of substring with the // first x+1 chunks of the resource string. int[][] tempmemo = new int[resparts.length*2-1][substring.length()+1]; // We've already verified that the first substring matches. tempmemo[0][resparts[0].length()] = 1; // for each chunk of the resource string: for (int j=1; j The nested (and interleaved) nature of the dynamic programming makes this a particularly challenging DP problem, and the loads of iteration (especially in the case that the resource strings include lots of %s's) made it so we couldn't use this solution with the typical constraint of "everything can be 50". One way to specifically optimize against the worst cases was to precompute the minimum length of a substring that could match each resource (basically it was the number of %s's + the number of characters not part of a %s), and not do the inner DP unless the resource could possibly match the substring based on that. Still, that will only help on certain kinds of test cases, and in the end, we decided to reduce the constraints to allow the intended solution to pass. I think there a more efficient solution to the problem exists, particularly if one makes optimizations for the common practical cases in the real-world problem, but perhaps even in the most pathological cases. If you can think of some, feel free to share them with me and the community. The higher-scoring solutions during the challenge (like JongMan's, here) used a single dynamic programming scheme rather than the weaved algorithms that were more obvious to me - the state space was typically four-dimensional - the first two dimensions represented the current substring, and the last two represented the resource and which piece of the resource. DrivingAroundUsed as: Division One - Level Three:
The solution to this problem should be both quickly recognized and relatively easy to throw together for people experienced in competitive programming - if the graph were unweighted, that is. It's a well-known fact that we reds try and keep a secret from everyone else that the number of ways to get from point a to point b in a graph in exactly x steps, given an adjacency matrix, is element a,b of the adjacency matrix raised to the xth power. We also try to keep it a secret that the exponentiation can be done in O(n^3logx) time, where n is the size of the matrix. This kind of algorithm is the way to go if x is unreasonably large, as it is in this problem (if the graph were bigger or more general and the time were smaller, there's a much simpler O(E*x) algorithm for this particular problem). The quirk to this problem is that the edges are weighted. When I originally came up with this problem, I thought that the solution would involve expanding each edge into several more nodes (the length of the edge minus one), which makes the size of the matrix explode to unreasonable proportions rather quickly. I thought that maybe if the graph were undirected, I could reuse nodes going both ways, but one day as I was (appropriately, I might add) driving around, I realized that wouldn't work, because you couldn't just turn around any time you wanted in the middle of the road. Then it came to me that there was a different class of nodes that were equivalent - nodes in the middle of a road on the way to the same node. My solution was then to create 5 nodes for every original node - one that represented the original node, and 4 that acted as a sort of "runway" to it. Then I reconstruct the graph with the edge pointing from the first node to either the second node (if the edge length was 1) or a point on that node's runway. I exponentized that matrix, and returned the answer from the nodes in the new graph that corresponded with my start and end times. Cosmin.ro, who was one of the problem's testers, came up with a slightly different but closely related approach. Rather than converting the graph to a square n*l x n*l matrix (where n is the number of nodes and l is the maximum length of an edge), he created an n x n x l x l 4-dimensional matrix. He still expanded it by something like logarithmic exponentiation; he just represented the modified graph differently. |
|