Centroid of a Tree is a node which if removed from the tree would split it into a ‘forest’, such that any tree in the forest would have at most half the number of vertices in the original tree.
Suppose there are n nodes in the tree. ‘Subtree size’ for a node is the size of the tree rooted at the node.
Let S(v) be size of subtree rooted at node v S(v) = 1 + ? S(u) Here u is a child to v (adjacent and at a depth one greater than the depth of v). Centroid is a node v such that, maximum(n - S(v), S(u1, S(u2, .. S(um) <= n/2 where ui is i'th child to v.
Finding the centroid
Let T be an undirected tree with n nodes. Choose any arbitrary node v in the tree. If v satisfies the mathematical definition for the centroid, we have our centroid. Else, we know that our mathematical inequality did not hold, and from this we conclude that there exists some u adjacent to v such that S(u) > n/2. We make that u our new v and recurse.
We never revisit a node because when we decided to move away from it to a node with subtree size greater than n/2, we sort of declared that it now belongs to the component with nodes less than n/2, and we shall never find our centroid there.
In any case we are moving towards the centroid. Also, there are finitely many vertices in the tree. The process must stop, and it will, at the desired vertex.
- Select arbitrary node v
- Start a DFS from v, and setup subtree sizes
- Re-position to node v (or start at any arbitrary v that belongs to the tree)
- Check mathematical condition of centroid for v
- If condition passed, return current node as centroid
- Else move to adjacent node with ‘greatest’ subtree size, and back to step 4
Theorem: Given a tree with n nodes, the centroid alway exists.
Proof: Clear from our approach to the problem that we can always find a centroid using above steps.
- Select arbitrary node v: O(1)
- DFS: O(n)
- Reposition to v: O(1)
- Find centroid: O(n)
Finding the centroid for a tree is a part of what we are trying to achieve here. We need to think how can we organize the tree into a structure that decreases the complexity for answering certain ‘type’ of queries.
- Make the centroid as the root of a new tree (which we will call as the ‘centroid tree’)
- Recursively decompose the trees in the resulting forest
- Make the centroids of these trees as children of the centroid which last split them.
The centroid tree has depth O(lg n), and can be constructed in O(n lg n), as we can find the centroid in O(n).
Let us consider a tree with 16 nodes. The figure has subtree sizes already set up using a DFS from node 1.
We make node 6 as the root of our centroid, and recurse on the 3 trees of the forest centroid split the original tree into.
NOTE: In the figure, subtrees generated by a centroid have been surrounded by a dotted line of the same color as the color of centroid.
We make the subsequently found centroids as the children to centroid that split them last, and obtain our centroid tree.
NOTE: The trees containing only a single element have the same element as their centroid. We haven’t used color differentiation for such trees, and the leaf nodes represent them.
6 4 1 2 3 5 7 8 9 11 10 12 14 13 15 16
Consider below example problem
Given a weighted tree with N nodes, find the minimum number of edges in a path of length K, or return -1 if such a path does not exist. 1 <= N <= 200000 1 <= length(i;j) <= 1000000 (integer weights) 1 <= K <= 1000000
Brute force solution: For every node, perform DFS to find distance and number of edges to every other node
Time complexity: O(N2) Obviously inefficient because N = 200000
We can solve above problem in O(N Log N) time using Centroid Decomposition.
- Perform centroid decomposition to get a "tree of subtrees"
- Start at the root of the decomposition, solve the problem for each subtree as follows
- Solve the problem for each "child tree" of the current subtree.
- Perform DFS from the centroid on the current subtree to compute the minimum edge count for paths that include the centroid
- Two cases: centroid at the end or in the middle of path
Use a timestamped array of size 1000000 to keep track of which distances from centroid are possible and the minimum edge count for that distance
Take the minimum of the above two
Time complexity of centroid decomposition based solution is O(n log n)
Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above