SRM 733 – Cayley’s Formula Derived
SRM 733 was recently held on Topcoder. This week, we'll analyze and breakdown division one's level two problem statement.
That would be "BuildingSpanningTreesDiv1".
The summary of the problem statement is as follows -
Given a complete graph of N vertices with each pair of vertices connected by a single undirected edge, you are given a set of edges. Your task is to determine the number of spanning trees that exist, given that the provided set of edges are part of the spanning tree.
For example, if there are 5 vertices, and the edges given are {1,2}, {3,4}, {4,5}, the graph would look as follows -
Thus, the output for the above input would be 6. As any vertex among {3, 4, 5} can connect with either among {1, 2} and form a spanning tree.
Therefore 3 * 2 = 6 will be the final answer.
Now, the editorial states the following -
"If there are any cycles among the given (green) edges, the answer is zero.
Otherwise, the actual topology of the given edges doesn’t matter, what matters is the size of each connected component.
Let these sizes be s_0, s_1, …, s_{k-1}. The answer is (n)^(k – 2) * s_0 * s_1 * … * s_{k-1}.
The proof can be seen as a generalization of Cayley’s formula on the number of trees with n nodes."
Cayley's formula on the number of trees with n nodes is - (n)^(n-2)
The majority of this article will be dedicated to proving this formula, however it is easy to note how the formula changes to the required variant to solve this problem.
Consider the given problem statement. As certain special edges have to compulsorily be present in the final spanning tree, we can treat each group of nodes connected by these special edges as one "supernode" with multiple ways to connect with other normal edges.
More specifically, a cluster of nodes of size s_i, has s_i ways of connecting to a spanning tree yet to add the current group of nodes to itself. i.e. a group of 3 nodes, has 3 possible ways to have the remaining spanning tree connect with itself. Thus, k groups of nodes of size s_i (0 <= i < k) each, has s_0 * s_1 * s_2 * ... s_{k-1} ways of connecting with each other.
This, is multiplied by the total number of spanning trees that can be formed from n nodes, given there are k groups, which is n^(k-2).
The general formula is obtained when the sizes of all nodes is equal to 1, and the number of groups consequently will be the same as the number of nodes, i.e. k = n.
Having cleared how Cayley's formula can be generalized to solve our problem statement, the question still arises, how is this formula obtained in the first place?
There are many elegant proofs of Cayley's formula. Some involve set theory and functions, while others are based almost entirely on counting and combinatorics, or even constructing recurrence relations and solving them.
This article will elucidate on one of the approaches used, in proving Cayley's formula. -
Prüfer's Method
Personally, I love this approach due to its elegance and simplicity. It proves Cayley's formula by showing that there exists a bijection between a set whose cardinality is n^(n-2) with a graph of n vertices, by uniquely encoding trees generated with a "Prüfer sequence".
After generating this unique Prüfer sequence, it becomes clear that every sequence is unique to a tree and vice versa, and also, the set of all possible Prüfer sequences has a cardinality of n^(n-2), thus directly solving our problem.
So how does the Prüfer sequence uniquely encode a tree? The sequence is generated in the following manner -
All vertices are numbered, before any processing. Now, repeat the following two steps until the tree consists of only two nodes -
Step 1: Consider all leaf vertices and select the leaf node with the lowest number.
Step 2: Append it's neighbour to a set (our Prüfer sequence) and remove the leaf node from the initial tree.
Now that we've obtained our Prüfer sequence, we need to guarantee two aspects before concluding our proof -
The Prüfer sequence is a unique encoding for a given tree.
The cardinality of the set of all Prüfer sequences is n^(n-2).
Let's prove our first assertion – Every Prüfer sequence uniquely encodes a certain tree. We can prove this, if we can determine our initial tree from our final Prüfer sequence, i.e. we have no ambiguity arising from the final Prüfer sequence when backtracking our steps.
Consider the tree in the diagram above, whose Prüfer sequence is {4, 4, 4, 5}. Let us try to derive our source tree using just the sequence, using basic reasoning and observation.
As we remove the leaf node with the smallest number and append its neighbour to our Prüfer sequence, the first leaf node removed had 4 as its neighbour, and the lowest number that is not in the Prüfer sequence must have been removed, therefore we know that this node should be numbered 1.
Thus, we know 1 is connected to 4. Now, we append 1 as well, to our Prüfer sequence, and remove our Prüfer sequence's first element as we have processed it already.
Our Prüfer sequence is now {4, 4, 5, 1}.
The next element in our Prüfer sequence is again 4, and the smallest leaf node yet to be processed is 2. Thus, we know that 2 is connected to 4 as well. Again, append 2 to the end of our Prüfer sequence, remove 4 from the beginning of our Prüfer sequence.
Our Prüfer sequence is now {4, 5, 1, 2}.
Now we process the next number of our Prüfer sequence, the third 4, and we can determine this must be connected to the node numbered 3 as it is the next node that is absent in our Prüfer sequence. Thus 3 is connected to 4. Now, append 3 and remove 4.
Our Prüfer sequence is now {5, 1, 2, 3}.
Now, the smallest node missing is node 4. This must then connect with node 5. Therefore, 4 is connected to 5. Append 4 and remove 5.
Our Prüfer sequence is now {1, 2, 3, 4}.
It is evident that now all elements of our initial Prüfer sequence have been replaced with their initial neighbours. The last number that we never considered, 6, would never have
been removed, and thus would have stayed connected with the last node we processed in our Prüfer sequence, which would be 5 here. Therefore 5 is connected to 6.
And there we have it! An algorithm that allows us to trace back our initial tree given the Prüfer sequence. Note that at no step did we face an ambiguity, thus allowing us state that this is a unique encoding for the given tree.
Now let's prove our second assertion - the cardinality of the set of all Prüfer sequences is n^(n-2). This is very simple to show, as in our Prüfer sequence, we do not process the last two nodes. Thus, as two nodes are not processed, and every other node has one corresponding node in the Prüfer sequence denoting its neighbour, we have shown that the Prüfer sequence is a set of length (n-2), where each element varies from 1 to n (inclusive of both). Thus, there are n^(n-2) possible variants of the set, and thus we have proven our assertion.
Therefore we have shown that a mapping of trees to Prüfer sequences is bijective, and since we have also shown the set of all Prüfer sequences has a cardinality of n^(n-2), we have shown that the number of trees of n nodes we can create will equal n^(n-2).