49075721883_daa84bfb7d_b
TCO22Topcoder Open (TCO)

TCO22 Round 4 Editorial

PreInPostOrder

This task generalized the traditional exercise “restore a binary tree given its, for example, pre-order and in-order printout”.

If you are given the pre-order and the in-order of the tree, the key observations are:

  • The pre-order starts with the root node, so you know the root node’s number.

  • The in-order has the left subtree before the root and the right subtree after the root, so once we know the root (which we do), we know the left and right subtree.

  • After the root, the pre-order continues with the left subtree followed by the right subtree. Once we know their sizes, we can split the pre-order of the whole tree into the pre-orders of the two subtrees.

If you are given the in-order and the post-order of the tree, the reasoning is pretty much the same. In both cases the full tree can always be fully reconstructed.

If you are given the pre-order and the post-order, the situation becomes trickier: these two orders don’t always determine the full tree in an unambiguous way. In particular, there can be multiple different trees that share both the same pre-order and the same post-order. (E.g., all 2^(N-1) trees that are downward-going paths with the same sequence of labels from top to bottom have the exact same pre-order and they also all have the same post-order.)

In our problem you are alternating through all three of the above situations. Whenever the first step was the in-order step in either of the two sequences you have, you can determine the root, determine the split into subtrees, verify whether it’s valid (in each sequence the left subtree must contain the same set of elements) and then you can recurse on each subtree separately. The total number of matching trees is then simply the number of matching left subtrees times the number of matching right subtrees.

Whenever the first step was the post-order in the first sequence and pre-order in the second, you know that the first one ends and the second one starts with the root. Those two values must therefore be the same (if they aren’t, there is no solution). After the verification of the root we can discard it. This leaves us with two sequences: the first one is PreInPost(left subtree) + PreInPost(right subtree), the second one is InPostPre(left subtree) + InPostPre(right subtree). We need to split the sequences into the codes of the left subtree and the right subtree. In order to do this, we simply iterate for all possible places for the split. For each of them we recurse on the left subtree and right subtree separately, multiply the two numbers we get, and we sum that over all choices of where to make the split. Clearly, different choices of where to make the split lead to mutually disjoint sets of valid trees, and thus we get the correct answer.

Each subproblem in the recursion described above can be described by four parameters: the step number (modulo 3) telling us where in the pre-in-post cycle we are, the length L of each of the two sequences we have and the offsets x, y within each of the input arrays at which these two sequences start.

I.e., in each recursive call we are counting the trees that match two sequences: PIP[x:x+L] and IPP[y:y+L].

This gives us O(N^3) possible recursive calls. Each of them can be evaluated in O(N) time if we don’t count the recursive calls made during the evaluation. Thus, once we add memoization / convert this into a dynamic programming solution, we will have a solution with time complexity O(N^4), which is clearly sufficient for the given constraints. 

The code below is the iterative version (DP) of the above solution.


public int reconstruct(int[] PIP, int[] IPP) {
int N = PIP.length;

// precompute the information whether PIP[x:x+L] and IPP[y:y+L]
// contain the exact same set of values

boolean[][][] compatible = new boolean[N+1][N+1][N+1];
for (int z1=0; z1<=N; ++z1) for (int z2=0; z2<=N; ++z2) {
boolean[] difference = new boolean[N];
int differentCount = 0;
compatible[z1][z2][0] = true;
for (int l=1; z1+l<=N && z2+l<=N; ++l) {
int t;
t = PIP[z1+l-1];
difference[t] = !difference[t];
if (difference[t]) ++differentCount; else --differentCount;
t = IPP[z2+l-1];
difference[t] = !difference[t];
if (difference[t]) ++differentCount; else --differentCount;
compatible[z1][z2][l] = (differentCount == 0);
}
}

long MOD = 1_000_000_007;
long[][][][] dp = new long[3][N+1][N+1][N+1];
for (int l=0; l<=N; ++l)
for (int z1=0; z1+l<=N; ++z1)
for (int z2=0; z2+l<=N; ++z2)
for (int off=0; off<3; ++off) {
int noff = (off+1) % 3;

if (!compatible[z1][z2][l]) continue;

if (l == 0) {
dp[off][z1][z2][l] = 1;
continue;
}
if (off==0) {
int root = PIP[z1];
int where2 = z2;
while (IPP[where2] != root) ++where2;
int dl = where2 - z2;
long waysLeft = dp[noff][z1+1][z2][dl];
long waysRight = dp[noff][z1+1+dl][z2+1+dl][l-1-dl];
dp[off][z1][z2][l] += waysLeft * waysRight;
dp[off][z1][z2][l] %= MOD;
}
if (off==1) {
int root = IPP[z2+l-1];
int where1 = z1;
while (PIP[where1] != root) ++where1;
int dl = where1 - z1;
long waysLeft = dp[noff][z1][z2][dl];
long waysRight = dp[noff][z1+1+dl][z2+dl][l-1-dl];
dp[off][z1][z2][l] += waysLeft * waysRight;
dp[off][z1][z2][l] %= MOD;
}
if (off==2) {
int root1 = PIP[z1+l-1];
int root2 = IPP[z2];
if (root1 != root2) continue;
for (int dl=0; dl<=l-1; ++dl) {
if (!compatible[z1][z2+1][dl]) continue;
long waysLeft = dp[noff][z1][z2+1][dl];
long waysRight = dp[noff][z1+dl][z2+1+dl][l-1-dl];
dp[off][z1][z2][l] += waysLeft * waysRight;
dp[off][z1][z2][l] %= MOD;
}
}
}
return (int)dp[0][0][0][N];
}

PoisonedApples

This is one of those problems where it’s really easy to trust your intuition and go with a promising greedy solution… that then turns out to be incorrect. Many of the contestants got lured into doing this during the round. Some managed to resubmit with a correct solution later, others were quickly challenged once the challenge phase started.

Before we get to a full solution, let’s start with some observations and a demonstration how a greedy strategy can fail.

The first observation is that for any pair (B bins, at most P of them are poisoned) there is some minimal value Amin of apples per bin for which a solution exists, and all A > Amin are then also solvable. The first bit follows from the observation that the sequence {1, 2, …, 2^(B-1)} is a valid solution for P=B and thus also for all smaller P, hence for any B and P we have Amin <= 2^(B-1). The second bit follows from the observation that whenever we are given A > Amin, we can simply use the exact same solution we would use for Amin.

The second observation is that if we make any purchase X = { X[0], X[1], …, X[B-1] }, the number of apples in that purchase is simply sum(X). Hence, the amount of coins we will pay is 10*sum(X) + the number of poisoned apples in our purchase. In order to be able to correctly identify the set of poisoned bins, each valid set of poisoned bins must correspond to a different total. Thus, X is a valid solution if and only if:

  • max(X) <= A

  • if we look at all 0-, 1-, …, P-element subsets of X, their sums must all be distinct.

As a special case, if P=B, we are looking for a sequence X such that all 2^B subsets of X have mutually distinct sums.

Now for the demonstration of some greedy strategies failing. Let’s look at the case B=4.

One common greedy strategy is to build X incrementally by always including the smallest number we can without violating the condition “all interesting subsets have mutually distinct sums”.

For P = 1 this actually works. The empty set has sum=0 and the one-element sets are simply the elements of X, so they just all need to be positive and mutually distinct. For B=4 we will end with X={1,2,3,4} which is indeed the optimal solution.

For P = 2 we also need to keep track of all 2-element subsets. These must be all mutually distinct, and also distinct from the individual elements of X. For example, we can start with X={1,2,...} but now we cannot take 3 as the next value. This is because with the information “there are 3 poisoned apples in your bags” we would not know whether these are the 1+2 apples in the first two bags or the 3 apples in the third bag. 

The smallest third element we can use is 4. For X = {1, 2, 4, ?} the 1-element subsets so far have sums 1, 2, 4 and the 2-element subsets have sums 3, 5, 6. The next element must therefore be at least 7. We can easily verify that X = {1, 2, 4, 7} is a valid solution. It still happens to be the optimal one.

The greedy strategy fails for P=3. As for P=2 we get X = {1, 2, 4, ?} but now we also have the three-element subset {1,2,4} with sum 7 and so the greedy strategy must take at least 8 as the next element. The set X = {1, 2, 4, 8} is a valid solution for B=4 and P=3, but it’s not the optimal solution.

A better solution than the one produced by the above greedy strategy is the solution X = {3, 5, 6, 7}. If A=7, i.e., the witch only has 7 apples per bin, we can use this solution but not the one shown above - that one needs at least 8 apples in one of the bins.

We can easily verify that {3, 5, 6, 7} is a valid solution not just for P=3 but also for P=4. The one-element subsets have sums 3, 5, 6, 7, the two-element subsets have sums 8, 9, 10, 11, 12, 13, the three-element subsets have sums 14, 15, 16, 18, and the only four-element subset sums to 21. These are sixteen mutually distinct sums, so each possible setting will produce a different total price of apples.

Finding these optimal sequences turns out to be hard, but luckily B<=8 is still small enough for exhaustive search to work, as long as we are a bit careful about pruning it. The two easiest way to prune the search: First, once you find a solution, you can use it as an upper bound for all the values you’ll try adding to your X in the future. Second, each time you extend X, check whether you haven’t already created two relevant subsets with the same sum. If you did, backtrack immediately.

There are many other ways to speed up the search, but ultimately it’s a game of trade-offs: how much time you spend on optimizing your code versus how long it will then run. 

Below is a sample implementation of the brute force search with only the two prunings described above. On my local machine it was able to precompute all answers (sequentially, on a single core) in about 3 minutes.


#include <bits/stdc++.h


using namespace std;

int B, P;

// size_subsets[x][y] is the list of subsets of {0,...,x-1} with popcount y
vector< vector< vector<int
size_subsets;

vector<bool
seen_sum;
vector<int
current_subset_sum;

int best_maximum;
vector<int
best_solution;

vector<int
current_prefix;


bool viem[11][1025];
int biggest_sum[11];
bool viem2[1025];

bool add_if_possible(int x) {
int have = current_prefix.size();

// try adding x to all the already-existing subsets,
// verify that no new sum hits an old sum (as the old
// sums were distinct, the new ones are too among themselves)

for (int p=0; p<=have; ++p) for (int sub : size_subsets[have][p]) {
if (seen_sum[ current_subset_sum[sub] + x ]) return false;
}

// everything is ok, proceed to add the x
for (int p=0; p<=have; ++p) for (int sub : size_subsets[have][p]) {
int newsub = sub | 1<<have;
current_subset_sum[newsub] = current_subset_sum[sub] + x;
seen_sum[ current_subset_sum[newsub] ] = true;
}
current_prefix.push_back(x);
return true;
}

void remove() {
current_prefix.pop_back();
int have = current_prefix.size();
for (int p=0; p<=have; ++p) for (int sub : size_subsets[have][p]) {
int newsub = sub | 1<<have;
seen_sum[ current_subset_sum[newsub] ] = false;
}
}

void go() {
if (int(current_prefix.size()) == B) {
if (current_prefix.back() < best_maximum) {
best_maximum = current_prefix.back();
best_solution = current_prefix;
cout << "new best " << best_maximum << endl;
for (int x : best_solution) cout << x << " ";
cout << endl;
}
return;
}

int start = 1;
if (!current_prefix.empty()) start = current_prefix.back() + 1;

for (int nxt=start; ; ++nxt) {
int remain = B - int(current_prefix.size());
if (nxt + remain-1
= best_maximum) break;

if (!add_if_possible(nxt)) continue;
go();
remove();
}
}

int main() {
cin
B P;

size_subsets.resize(B+1);
for (int b=0; b<=B; ++b) {
size_subsets[b].resize(b+1);
for (int sub=0; sub<(1<<b); ++sub) {
int pc = __builtin_popcount(sub);
size_subsets[b][pc].push_back(sub);
}
}

current_subset_sum.resize( 1<<B, 0 );
seen_sum.resize( B*(1<<B), false );
seen_sum[0] = true;

current_prefix.clear();
best_maximum = (1<<B) + 7; // a too-big placeholder
best_solution.clear();

go();
cout << best_maximum << endl;
for (int x : best_solution) cout << x << " ";
cout << endl;
}

IntruderAlert

Without the intruder’s sabotage skills this problem would boil down to what’s called the domination number. (In a graph, a dominating set of vertices is any set X such that each vertex in V-X is adjacent to some member(s) of X. This is something similar to a vertex cover, but in vertex cover we are covering all edges whereas here we are covering adjacent vertices. In general, finding the minimal dominating set is NP-hard.)

The possibility of one sabotage places our problem somewhere strictly between double and triple domination:

  • Let N[v] be the closed neighborhood of vertex v, i.e., the set containing v itself and all its neighbors.

  • A set X of vertices is called double-(or triple-)dominating if for each vertex v of the graph at least 2 (or at least 3) of the vertices in N[v] belong to X.

  • In our problem, each triple-dominating set is clearly a valid solution – if each vertex is observed by at least three sensors, two of them will agree and identify the spy even if one other sensor is corrupted.

  • Conversely, each valid solution must be at least double-dominating, because if it isn’t, a vertex v that has at most one sensor can safely be visited by the spy.

  • Double-domination is a necessary condition but clearly not a sufficient one – e.g., both sensors on a path of length 2 are double-dominating but don’t protect it from the spy.

  • Triple-domination is a sufficient condition but clearly not a necessary one – e.g., a path of length > 2 has no triple-dominating set but it still can be protected from the spy. On graphs that do have triple-dominating sets another set smaller than the minimum triple-dominating set can sometimes still protect it.

A pair of conditions that are together both necessary and sufficient for a graph to be protected by sensors placed in the set X of vertices follows:

  • For each vertex v, N[v] contains at least two vertices from X.

  • For each pair of distinct vertices u, v, the union of N[u] and N[v] contains at least three vertices from X.

Sufficience: The first condition makes it impossible for the spy to not be noticed. The second condition implies that there is no pair of vertices u, v for which the spy can make an identical sensor report. 

Necessity: We already know that the first condition is necessary. We saw in an example that if the second condition is violated for any pair u, v, the spy can enter one of them and produce a report in which the two sensors u and v have in common disagree.

Even though we formulated the second condition as “for each pair”, we really only have to check it for pairs u, v that are reasonably close. Indeed: if N[u] and N[v] overlap in at most one vertex, the first condition already implies that their union must contain at least three elements of X. This leaves us with just two relevant cases:

  • If u and v are cells that share a side and both of them are in X, they need to have at least one more neighbor that’s also in X. (In other words, each 4-connected component in X must contain at least three vertices.)

  • If u and v are cells that share a corner, neither of them is in X and both of their common neighbors are in X, they need to have at least one more neighbor who’s in X.

In all other cases we can use the first rule to deduce that the second rule is already satisfied.

We can now use single-exponential dynamic programming (i.e., dynamic programming whose time complexity is only exponential in one dimension of the grid) to find the optimal solution. We will fill the grid one column at a time and each time we’ll be asking the same question: given this partial solution, can it be finished into a full valid solution, and if yes, what is the minimum number of additional sensors that need to be placed in the part that has not been processed yet?

If we are proceeding right to left, each state of the search can be described by giving:

  • one integer specifying the number of leftmost columns that haven’t been processed yet

  • three bitmask specifying the locations of sensors in the three leftmost columns we already processed

For each of these states we ask the above question. In order to answer it, we iterate over all valid possibilities for sensors in the next column. For each of them, we have to check:

  • whether we did not create a component of fewer than three sensors (such components will touch the current column but won’t extend into it, so there are only three possibilities what they look like)

  • whether we did not create the 2x2 pattern of two empty cells u, v and two diagonally adjacent sensors between them such that u, v have no other sensors watching them (again, it’s sufficient to check all 2x2 squares that are in already processed columns and touch the current column)

Equivalently, we could check the original two conditions:

  • for each cell that now has the whole neighborhood but didn’t have it before, does it have at least two sensors?

  • for each pair of close-enough cells that now both have the whole neighborhood but at least one of them did not have it before, do they have at least three distinct sensors that see at least one of them?

The solution we just described isn’t the asymptotically fastest one, there is some clear room for optimizations. However, for the given constraints it should already be fast enough when implemented properly, and the faster versions would probably also be more complicated in terms of implementation.

Ignoring polynomial factors, the time complexity of this solution on a grid with R rows is proportional to 2^(4*R).

The very simple rule of “dominating set, given one faulty sensor” can create really interesting and non-trivial patterns. For example, below is an optimal solution for an empty 5x50 input, as found by the reference solution:


.S.S.SS..S.S.SS..S.S.SS..S.S.SS..S.S.SS..S.S.SS.S.
SS.S.S.SSS.S.S.SSS.S.S.SSS.S.S.SSS.S.S.SSS.S.S.SSS
...S..S....S..S....S..S....S..S....S..S....S..S...
SSSSS.S.SSS.S.S.SSS.S.S.SSS.S.S.SSS.S.S.SSS.S.S.SS
.S..S.S.S..SS.S.S..SS.S.S..SS.S.S..SS.S.S..SS.S.S.