|
Match Editorial |
SRM 174Saturday, December 13, 2003
Match summary
TopCoder served up a gamblers' special on the weekend as, improbably
enough, every Division One problem dealt with chance. The horse to back
was SnapDragon, who galloped across the finish line after less than
thirty minutes' coding. Yarin would have been another good bet,
returning from his Final Four showing at the TopCoder Open (coincidentally
held in a casino) to place a distant second in this match. Half a pace
behind came WishingBone to complete the trifecta. Probabilistic
lunacy extended to the harder problems in Division Two, though the
easiest was a tame deterministic affair. Olexiy cruised through
them all for the division win, with fellow first-timers monsoon
and pigworlds rounding out the top three in a crowded field.
The Problems
CrossWord
Used as: Division Two - Level One:
Value
|
250
|
Submission Rate
|
182 / 221 (82.35%)
|
Success Rate
|
160 / 182 (87.91%)
|
High Score
|
mpbailey for 248.04 points (2 mins 32 secs)
|
Average Score
|
202.60 (for 160 correct submissions)
|
Given an array of strings filled with the characters '.' and 'X', we are
to count the substrings of given length that consist only of '.' and
that do not adjoin a '.' on either side. We can find all substrings
fitting this description by making a single pass over each string.
Upon encountering an 'X', we consider the number of uninterrupted
'.' characters we have seen thus far. To account for cases where a
suitable substring occurs at the very end of a string, we append a
sentinel 'X' to each.
def crossword(board, size):
seen = 0
for row in board:
row += 'X'
count = 0
for ch in row:
if ch == '.':
count += 1
else:
if count == size:
seen += 1
count = 0
return seen
We must not forget to initialize the '.' count after seeing an 'X'.
BirthdayOdds
Used as: Division Two - Level Two:
Value
|
500
|
Submission Rate
|
179 / 221 (81.00%)
|
Success Rate
|
125 / 179 (69.83%)
|
High Score
|
hw_Tim for 495.43 points (2 mins 44 secs)
|
Average Score
|
356.05 (for 125 correct submissions)
|
Used as: Division One - Level One:
Value
|
250
|
Submission Rate
|
138 / 140 (98.57%)
|
Success Rate
|
106 / 138 (76.81%)
|
High Score
|
overwise for 247.15 points (3 mins 3 secs)
|
Average Score
|
217.13 (for 106 correct submissions)
|
Given the number of days in a planetary year and the number of people in
a room, we are to calculate the probability that at least two of them
share a birthday. Oops! That's not exactly what the problem says. It
asks us to calculate the number of people such that the probability of
a shared birthday reaches a given threshold. But the required number
of people in the room will not exceed the number of days in a year by more than one,
so we can start with an empty room and add people to it, calculating
the probability at each increment until we cross the threshold.
The superficial difficulty is that there are many ways for some set of
people to have a birthday in common. There may be one or more pairs of
people sharing a birthday, or several triples, even quadruples or greater.
We can dispose of this red herring by calculating the probability that
no two people share a birthday. There must be a shared birthday in all
other cases, so we subtract this probability from 1. Better yet, we can
leave it alone, aiming instead for 1 minus the threshold probability.
Just what is the probability that n people share no birthday
among m possible days? The first of these n people can be
born on any one of the m days. The second can only be born on one
of m-1 days remaining out of the m days in the year. The
third must be born on one of m-2 out of m days in the year,
and so on down to (m-n+1)/m. We multiply each of
these ratios to obtain the final odds.
def people(threshold, days):
threshold = 1.0 - threshold/100.0
prob = 1.0
i = 0
while (prob > threshold):
prob *= (1.0*days-i)/days
i += 1
return i
Our calculation is simplified by the fact that we are interested in
all possible orderings of birthdays. All cases where person x
is born on March 3rd and person y on July 4th are distinct from
those where x is born on July 4th and y on March 3rd.
ProbabilityTree
Used as: Division Two - Level Three:
Value
|
1000
|
Submission Rate
|
67 / 221 (30.32%)
|
Success Rate
|
43 / 67 (64.18%)
|
High Score
|
pigworlds for 880.76 points (10 mins 45 secs)
|
Average Score
|
580.60 (for 43 correct submissions)
|
We are asked to compute the probability of each event in something
called a probability tree and to report which ones fall within a given
interval. There's some beating around the bush in the problem statement,
but the crucial facts are as follows.
Each node in the probability tree is marked with the probability q
that its event will occur if the event associated with its parent also
occurs, and the probability r that its event will occur even if
that of its parent does not. If we know the probability p that
the parent event will occur, we can calculate this event's probability
with the formula p*q+p*r.
If we don't know the probability of the parent event, we had better
wait until we've calculated it. This suggests an iterative approach in
which we make multiple passes over the nodes of the tree to calculate the
probability for every node whose parent probability is already known. We
make at least one new calculation with each pass until, finally, all
probabilities are known.
def odds(tree, lower, upper):
n = len(tree)
indices = []
probs = [-1] * n
done = [0] * n
probs[0] = float(tree[0])/100.0
if (probs[0] > lower/100.0 and probs[0] < upper/100.0):
indices.append(0)
m = 1
while (1):
if (m == n):
break
for i in range(1, n):
subs = tree[i].split()
parent = int(subs[0])
if (probs[parent] < 0.0 or done[i]):
continue
p1 = float(subs[1])/100.0
p2 = float(subs[2])/100.0
probs[i] = p1*probs[parent] + p2*(1.0-probs[parent])
m += 1
done[i] = 1
if (probs[i] > lower/100.0 and probs[i] < upper/100.0):
indices.append(i)
indices.sort()
return indices
The counter m is incremented with each new calculation until it
reaches n, the number of nodes. We use the strictly-greater and
strictly-less comparisons because an exclusive interval is specified. Note
also that we use a sentinel array to avoid duplicate calculation, and
that we sort the indices immediately before returning them.
RangeGame
Used as: Division One - Level Two:
Value
|
600
|
Submission Rate
|
53 / 140 (37.86%)
|
Success Rate
|
33 / 53 (62.26%)
|
High Score
|
SnapDragon for 503.72 points (12 mins 56 secs)
|
Average Score
|
300.57 (for 33 correct submissions)
|
This problem is almost a generalization of the notorious brainteaser known
as Monty Hall's dilemma. Monty Hall was the host of a game show in which
the grand prize was a Mercedes Benz. If the contestant made it to the
final round of the game, she would confront three closed doors. Behind
one of them was the Mercedes, while the other two concealed a pair of
goats. The contestant would indicate her choice by pointing to one of
the doors. Instead of opening it, however, Monty Hall would open one
of the two other doors to reveal a goat. The contestant would then have
the opportunity to alter her choice of door before the final disclosure.
The question is whether the contestant stands to gain by switching to
the remaining door. One school of thought, a fallacious one, is that
the Mercedes is either behind the initially chosen door or behind the
other unopened one, so the contestant's chance of winning is one half
regardless of whether she changes her choice. Another fallacious line
of reasoning concludes that she has a two-thirds chance by staying with
her first choice, since Monty Hall can choose one of two doors if she
initially picked the Mercedes, but is constrained to only one if she is
standing before a goat.
In fact, the contestant has a two-thirds chance of winning if she
changes her selection. To see why this is so, consider the difference
between a contestant who has a policy of never switching and one who
always does. The non-switcher has a one-third chance of winning, since
she picks a door and sticks with it. The switching contestant, however,
loses only in those cases where the non-switcher would win, and wins
otherwise. An astute contestant will therefore double her chances by
switching, assuming the constraint that Monty Hall consistently reveals
a goat. In real life, the financial pressures of a television show might
make it less likely that the contestant is offered a chance to switch
when her initial choice is a door concealing a goat.
In the RangeGame problem, however, if the prize patterns are {"10", "20",
"30"} and the hint history is {"20"}, the answer is not {10, 30} but {10,
10}. Why the discrepancy? Note that on Monty Hall's game show, the host
will never open the door that is the contestant's initial choice. In RangeGame,
on the other hand, the host is as likely to open the contestant's first choice as
any other non-winning door. This changes the event
space in such a way that there is no longer any advantage to pursuing
the switching strategy.
The problem at hand is essentially asking us to identify the door whose
number is included in the greatest number of intervals. Because ties are
broken in favor of lower numbers, the optimal number will always be the
lowest number in some interval. This optimality principle can be proven by
contradiction. Suppose that the optimal number x is not the lowest
in any interval. Consider x-1, its lesser neighbor. If x-1
is included in as many intervals as x, then x is not
optimal. But if x-1 is included in fewer intervals, then x
must constitute the border of some interval and is therefore its lowest
number. Hence, it cannot be the case that the optimal number is not the
lowest in any interval.
The upshot is that we can restrict our search to the lowest number in each
interval. First, let's delegate the door-pattern parsing to a helper
function. A pattern is formed of space-separated globs. We split each glob
into a pair of consecutive integers, so that the entire pattern is
represented by an integer array. Note the special case where an interval
begins and ends with the same integer.
def intervals(patt):
ret = []
globs = patt.split(' ')
for glob in globs:
parts = glob.split('-')
ret.append(int(parts[0]))
if len(parts) == 1:
ret.append(int(parts[0]))
else:
ret.append(int(parts[1]))
return ret
It's also convenient to have a function that decides whether one sequence
of intervals intersects another at any point. We step through consecutive
pairs of integers in each array, comparing them pairwise for overlap.
def overlap(ar, br):
for ai in range(0, len(ar), 2):
for bi in range(0, len(br), 2):
if (ar[ai] < br[bi] and ar[ai+1] < br[bi]):
continue
if (ar[ai] < br[bi+1] and ar[ai+1] > br[bi+1]):
continue
return 1
return 0
In order to find the optimal door number, we take the lower number in
each interval and double it, as it were, to make a temporary interval
that we can pass to overlap.
def best(doors):
max = -1
maxlap = -1
for door in doors:
for i in intervals(door):
ii = [i, i]
lap = 0
for d in doors:
if (overlap(ii, intervals(d))):
lap += 1
if (lap > maxlap or (lap == maxlap and i < max)):
max = i
maxlap = lap
return max
Finally, we step through the hints one by one, eliminating overlapping
doors and recalculating the optimal door number with each iteration.
def guess(doors, hints):
guesses = [best(doors)]
for hint in hints:
doomed = []
for door in doors:
if overlap(intervals(hint), intervals(door)):
doomed.append(door)
for door in doomed:
doors.remove(door)
guesses.append(best(doors))
return guesses
We use the doomed array to avoid modifying doors while we're
iterating over it.
PointSystem
Used as: Division One - Level Three:
Value
|
800
|
Submission Rate
|
30 / 140 (21.43%)
|
Success Rate
|
21 / 30 (70.00%)
|
High Score
|
SnapDragon for 680.68 points (12 mins 20 secs)
|
Average Score
|
497.37 (for 21 correct submissions)
|
Given the minimum number of points and minimum lead required to
win a game, as well as the probability that the underdog wins the
point in any turn, we are to calculate the probability that the
underdog will eventually defeat his opponent. The key to solving this
problem is to envision the event space as a matrix of possible scores
(x,y) where the underdog has scored x points to
the favorite's y. We can then formulate a recurrence that uses
the probabilities of lower scores to calculate the probability of a
higher score.
If the underdog has chance s of winning any given point, the
odds that he can make the score (1,0) are exactly s, and the
odds of a (0,1) score are 1-s. What are the odds that the score
will become (1,1)? The case where the favorite catches up contributes
(1-s)*s to the probability of this event, and the case
where the underdog is the one who scores the second point contributes
s*(1-s). There are no other ways of reaching the score
(1,1) with the gain of a single point by either player.
In general, the odds of reaching a score (x,y)
are P(x,y) = s*P(x-1,y) +
(1-s)*P(x,y-1). We can put this formula directly
to use by implementing recursion with memoization. Alternatively, we
can start from the lowest scores and build our way up.
def upset(min, lead, skill):
skill = skill/100.0
prob = 0.0
odds = []
for i in range(2001):
odds.append(2001*[0.0])
odds[0][0] = 1.0
for games in range(2000):
for underdog in range(games+1):
favorite = games-underdog
if (favorite >= min and favorite-underdog >= lead):
continue
if (underdog >= min and underdog-favorite >= lead):
prob += odds[underdog][favorite]
continue
odds[underdog+1][favorite] += odds[underdog][favorite]*skill
odds[underdog][favorite+1] += odds[underdog][favorite]*(1.0-skill)
return prob
The two continue clauses are vital to the correct functioning
of this program: we must skip scores that are unattainable because
the game would have ended at a prior point spread. Observe that as
x+y increases, the probability of a true underdog, with
s < .5, attaining a score of (x,y) tends toward
zero. Experimentation shows that calculating the odds of every score up
to x+y=2000 yields sufficient precision for our purposes.
By
Eeyore
TopCoder Member