Tarjan’s off-line lowest common ancestors algorithm

Prerequisites: LCA basics, Disjoint Set Union with union by rank and path compression

We are given a tree (the idea can be extended to a DAG) and many queries of the form LCA(u, v), i.e., find the lowest common ancestor of nodes 'u' and 'v'.

We can answer these queries in O(N + Q log N) time using RMQ (for example, with a segment tree over the Euler tour of the tree): O(N) time for pre-processing and O(log N) time per query, where
N = number of nodes and
Q = number of queries to be answered.

Can we do better than this? Can we do it in (almost) linear time? Yes.
This article presents an offline algorithm, i.e., one that needs all queries to be known in advance, which answers them in approximately O(N + Q) time. This is not exactly linear, because the inverse Ackermann function appears in the time complexity analysis of the disjoint set operations. As a summary, the inverse Ackermann function stays at most 4 for any input size that could be written down in the physical universe. Thus, we consider the running time almost linear.

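We consider the input tree shown below: node 1 is the root with children 2 and 3, and node 2 has children 4 and 5. We pre-process the tree and fill two arrays, child[] and sibling[], as follows: child[i] holds one of the children of node i (0 if it has none), and sibling[i] holds the right sibling of node i (0 if it has none).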
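The child[]/sibling[] layout is easier to see on a small example. The following is a minimal sketch, not the article's own routine: it assumes the tree is given as a parent array (with parent[root] = 0 and index 0 unused), and the names `ChildSibling` and `buildChildSibling` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container for the two arrays (1-based, 0 means "none").
struct ChildSibling {
    std::vector<int> child;    // child[i]   = one child of node i
    std::vector<int> sibling;  // sibling[i] = right sibling of node i
};

// Assumption: parent[v] is the parent of node v, parent[root] == 0,
// and parent[0] is an unused placeholder.
ChildSibling buildChildSibling(const std::vector<int>& parent) {
    assert(!parent.empty());
    const int n = static_cast<int>(parent.size()) - 1;
    ChildSibling cs{std::vector<int>(n + 1, 0), std::vector<int>(n + 1, 0)};
    // Visit nodes from high to low so that each child list ends up
    // in increasing order of node number.
    for (int v = n; v >= 1; --v) {
        const int p = parent[v];
        if (p == 0) continue;          // the root has no parent
        cs.sibling[v] = cs.child[p];   // previous first child becomes v's sibling
        cs.child[p] = v;               // v becomes the first child of p
    }
    return cs;
}
```

For the tree above (parent = {0, 0, 1, 1, 2, 2}) this gives child[1] = 2, sibling[2] = 3, child[2] = 4 and sibling[4] = 5, with every other entry 0. All children of a node u can then be visited with `for (int c = child[u]; c != 0; c = sibling[c])`.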

Suppose we want to process these queries: LCA(5,4), LCA(1,3), LCA(2,3).

After pre-processing, we perform an LCA walk starting from the root of the tree (here, node '1'). Before the walk, we colour all the nodes WHITE. During the walk, we use three disjoint set union functions: makeSet(), findSet() and unionSet().
These functions use union by rank and path compression to improve the running time. During the LCA walk, the queries are processed and printed, not necessarily in input order. When the walk over the whole tree has finished, every node has been coloured BLACK.
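One detail is easy to miss: with union by rank, the representative of a merged set is not necessarily the node currently being processed, which is why the algorithm stores a separate ancestor label on the representative. The sketch below illustrates this under stated assumptions; the `Dsu` struct and the extra `label` parameter of `unionSet` are illustrative and not part of the article's code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical vector-based disjoint-set forest over nodes 1..n.
struct Dsu {
    std::vector<int> parent, rank, ancestor;

    explicit Dsu(int n) : parent(n + 1), rank(n + 1, 0), ancestor(n + 1) {
        for (int i = 0; i <= n; ++i) parent[i] = ancestor[i] = i;  // makeSet
    }

    // Path compression: every node on the search path is re-pointed at the root.
    int findSet(int x) {
        if (parent[x] != x) parent[x] = findSet(parent[x]);
        return parent[x];
    }

    // Union by rank, then label the merged set's representative with `label`.
    void unionSet(int x, int y, int label) {
        int xr = findSet(x), yr = findSet(y);
        assert(xr != yr);                            // sets must be distinct
        if (rank[xr] < rank[yr]) std::swap(xr, yr);  // xr is the taller tree
        parent[yr] = xr;
        if (rank[xr] == rank[yr]) ++rank[xr];
        ancestor[xr] = label;
    }
};
```

For example, after `unionSet(2, 4, 2)`, `unionSet(2, 5, 2)` and `unionSet(1, 2, 1)` on five singleton sets, the representative of the merged set is node 2 (its tree had the higher rank), yet `ancestor[findSet(4)]` is 1, which is the value the LCA walk needs.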

The steps of Tarjan's offline LCA algorithm are taken from CLRS, Problem 21-3, p. 584, 2nd/3rd edition.

Note: the queries may not be answered in their original order. The procedure is easily modified to report the answers in input order, for example by storing each answer under the index of its query.
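One possible way to keep the input order is sketched here. It is an illustration under assumptions rather than the article's implementation: the tree is passed as hypothetical adjacency lists (`children[u]`), the function name `offlineLca` is made up, and each query is bucketed under both of its endpoints so that a node only inspects its own queries.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Assumption: nodes are numbered 1..n and children[u] lists the children of u.
// Returns ans[i] = LCA of queries[i], in input order.
std::vector<int> offlineLca(const std::vector<std::vector<int>>& children, int root,
                            const std::vector<std::pair<int, int>>& queries) {
    const int n = static_cast<int>(children.size()) - 1;
    assert(root >= 1 && root <= n);
    std::vector<int> parent(n + 1, 0), rank(n + 1, 0), ancestor(n + 1, 0);
    std::vector<bool> black(n + 1, false);
    std::vector<int> ans(queries.size(), 0);

    // at[u] holds (other endpoint, query index) for every query touching u.
    std::vector<std::vector<std::pair<int, int>>> at(n + 1);
    for (std::size_t i = 0; i < queries.size(); ++i) {
        at[queries[i].first].push_back({queries[i].second, static_cast<int>(i)});
        at[queries[i].second].push_back({queries[i].first, static_cast<int>(i)});
    }

    std::function<int(int)> findSet = [&](int x) -> int {
        if (parent[x] != x) parent[x] = findSet(parent[x]);  // path compression
        return parent[x];
    };

    std::function<void(int)> walk = [&](int u) {
        parent[u] = u;                                // makeSet(u)
        ancestor[u] = u;
        for (int c : children[u]) {
            walk(c);
            int a = findSet(u), b = findSet(c);
            if (rank[a] < rank[b]) std::swap(a, b);   // union by rank
            parent[b] = a;
            if (rank[a] == rank[b]) ++rank[a];
            ancestor[a] = u;                          // merged set is labelled u
        }
        black[u] = true;
        for (const auto& q : at[u])                   // q = (other node, index)
            if (black[q.first]) ans[q.second] = ancestor[findSet(q.first)];
    };

    walk(root);
    return ans;
}
```

For the example tree and the queries (5,4), (1,3), (2,3), the returned vector is {2, 1, 1}, i.e. the answers in input order. The walk is recursive, so a very deep tree may need an explicit stack instead.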

The pictures below depict each step of the walk. The red arrow shows the direction of travel of our recursive function LCA().





As the above pictures show, the queries are processed in the order LCA(5,4), LCA(2,3), LCA(1,3), which differs from the input order (LCA(5,4), LCA(1,3), LCA(2,3)).

Below is a C++ implementation.

// A C++ program to implement Tarjan's offline LCA algorithm
#include <cstdio>   // printf
#include <cstring>  // memset

#define V 5       // number of nodes in input tree
#define WHITE 1   // COLOUR 'WHITE' is assigned value 1
#define BLACK 2   // COLOUR 'BLACK' is assigned value 2

/* A binary tree node has data, pointer to left child
   and a pointer to right child */
struct Node
{
    int data;
    Node* left, *right;
};

/*
 subsets[i].parent-->Holds the parent of node-'i'
 subsets[i].rank-->Holds the rank of node-'i'
 subsets[i].ancestor-->Holds the ancestor label of the set
                    whose representative is node-'i'
                    (used to answer the LCA queries)
 subsets[i].child-->Holds one of the children of node-'i'
                    if present, else -'0'
 subsets[i].sibling-->Holds the right-sibling of node-'i'
                    if present, else -'0'
 subsets[i].color-->Holds the colour of node-'i'
*/
struct subset
{
    int parent, rank, ancestor, child, sibling, color;
};

// Structure to represent a query
// A query consists of (L,R) and we will process the
// queries offline according to Tarjan's offline LCA algorithm
struct Query
{
    int L, R;
};

/* Helper function that allocates a new node with the
   given data and NULL left and right pointers. */
Node* newNode(int data)
{
    Node* node = new Node;
    node->data = data;
    node->left = node->right = NULL;
    return node;
}

// A utility function to make a singleton set for node i
void makeSet(struct subset subsets[], int i)
{
    // Ignore node numbers outside 1..V
    if (i < 1 || i > V)
        return;

    subsets[i].color = WHITE;
    subsets[i].parent = i;
    subsets[i].rank = 0;
}

// A utility function to find set of an element i
// (uses path compression technique)
int findSet(struct subset subsets[], int i)
{
    // find root and make root as parent of i (path compression)
    if (subsets[i].parent != i)
        subsets[i].parent = findSet (subsets, subsets[i].parent);

    return subsets[i].parent;
}

// A function that does union of two sets of x and y
// (uses union by rank)
void unionSet(struct subset subsets[], int x, int y)
{
    int xroot = findSet (subsets, x);
    int yroot = findSet (subsets, y);

    // Attach smaller rank tree under root of high rank tree
    // (Union by Rank)
    if (subsets[xroot].rank < subsets[yroot].rank)
        subsets[xroot].parent = yroot;
    else if (subsets[xroot].rank > subsets[yroot].rank)
        subsets[yroot].parent = xroot;

    // If ranks are same, then make one as root and increment
    // its rank by one
    else
    {
        subsets[yroot].parent = xroot;
        (subsets[xroot].rank)++;
    }
}

// The main function that prints LCAs. u is root's data.
// m is size of q[]
void lcaWalk(int u, struct Query q[], int m,
             struct subset subsets[])
{
    // Make Sets
    makeSet(subsets, u);

    // Initially, each node's ancestor is the node
    // itself.
    subsets[findSet(subsets, u)].ancestor = u;

    int child = subsets[u].child;

    // This while loop doesn't run for more than 2 times
    // as there can be at max. two children of a node
    while (child != 0)
    {
        lcaWalk(child, q, m, subsets);
        unionSet (subsets, u, child);
        subsets[findSet(subsets, u)].ancestor = u;
        child = subsets[child].sibling;
    }

    subsets[u].color = BLACK;

    // Scan all queries for those that involve u (O(m) per node).
    // A query is answered once its other endpoint is BLACK.
    for (int i = 0; i < m; i++)
    {
        if (q[i].L == u)
        {
            if (subsets[q[i].R].color == BLACK)
            {
                printf("LCA(%d %d) -> %d\n",
                       q[i].L, q[i].R,
                       subsets[findSet(subsets, q[i].R)].ancestor);
            }
        }
        else if (q[i].R == u)
        {
            if (subsets[q[i].L].color == BLACK)
            {
                printf("LCA(%d %d) -> %d\n",
                       q[i].L, q[i].R,
                       subsets[findSet(subsets, q[i].L)].ancestor);
            }
        }
    }
}

// This is basically an inorder traversal and
// we preprocess the arrays-> child[]
// and sibling[] in "struct subset" with
// the tree structure using this function.
void preprocess(Node * node, struct subset subsets[])
{
    if (node == NULL)
        return;

    // Recur on left child
    preprocess(node->left, subsets);

    if (node->left != NULL && node->right != NULL)
    {
        /* Note that the below two lines can also be this-
        subsets[node->data].child = node->right->data;
        subsets[node->right->data].sibling =
                                         node->left->data;

        This is because if both left and right children of
        node-'i' are present then we can store any of them
        in subsets[i].child and correspondingly its sibling*/
        subsets[node->data].child = node->left->data;
        subsets[node->left->data].sibling =
            node->right->data;
    }
    else if ((node->left != NULL && node->right == NULL)
             || (node->left == NULL && node->right != NULL))
    {
        // Exactly one child: store it, it has no sibling
        if (node->left != NULL && node->right == NULL)
            subsets[node->data].child = node->left->data;
        else
            subsets[node->data].child = node->right->data;
    }

    // Recur on right child
    preprocess (node->right, subsets);
}

// A function to initialise prior to pre-processing and
// LCA walk
void initialise(struct subset subsets[])
{
    // Initialising the structure with 0's
    memset(subsets, 0, (V+1) * sizeof(struct subset));

    // We colour all nodes WHITE before LCA Walk.
    for (int i=1; i<=V; i++)
        subsets[i].color = WHITE;
}

// Prints LCAs for given queries q[0..m-1] in a tree
// with given root
void printLCAs(Node *root, Query q[], int m)
{
    // Allocate memory for V subsets and nodes
    // (index 0 is unused because nodes are numbered 1..V)
    struct subset * subsets = new subset[V+1];

    // Creates subsets and colors them WHITE
    initialise(subsets);

    // Preprocess the tree
    preprocess(root, subsets);

    // Perform a tree walk to process the LCA queries
    // offline
    lcaWalk(root->data , q, m, subsets);

    // Release the memory allocated above
    delete[] subsets;
}

// Driver program to test above functions
int main()
{
    /*
     We construct a binary tree :-
            1
          /   \
         2     3
       /   \
      4     5        */
    Node *root = newNode(1);
    root->left        = newNode(2);
    root->right       = newNode(3);
    root->left->left  = newNode(4);
    root->left->right = newNode(5);

    // LCA Queries to answer
    Query q[] = {{5, 4}, {1, 3}, {2, 3}};
    int m = sizeof(q)/sizeof(q[0]);

    printLCAs(root, q, m);
    return 0;
}

Output :

LCA(5 4) -> 2
LCA(2 3) -> 1
LCA(1 3) -> 1

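Time Complexity: Almost linear, i.e., barely slower than linear. The disjoint set operations take O((N + Q)·α(N)) time in total, where α is the inverse Ackermann function, which amounts to O(N) time for pre-processing and almost O(1) time per query. Note that lcaWalk() as written above scans the whole query array at every node, which costs O(N·Q); reaching the near-linear bound requires keeping, for each node, a list of the queries that involve it.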

Auxiliary Space: We use the arrays parent[], rank[] and ancestor[] for the disjoint set union operations, each of size equal to the number of nodes. We also use the arrays child[], sibling[] and color[], which are specific to this offline algorithm. Hence, the auxiliary space is O(N).
For convenience, all these arrays are kept together as fields of a single structure, struct subset.

Reference: CLRS, Problem 21-3, p. 584, 2nd/3rd edition


This article is attributed to GeeksforGeeks.org
