The Longest Common Extension (LCE) problem considers a string s and computes, for each pair (L, R), the longest substring of s that starts at both L and R. Each query LCE(L, R) asks for the length of the longest common prefix of the suffixes starting at indexes L and R.
String : “abbababba”
Queries: LCE(1, 2), LCE(1, 6) and LCE(0, 5)
Find the length of the longest common prefix of the suffixes starting at each pair of indexes: (1, 2), (1, 6) and (0, 5).
The longest common prefixes for these queries are “b”, “bba” and “abba” respectively, so the answers are 1, 3 and 4.
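To make the definition concrete, here is a minimal sketch that answers a query by comparing characters directly. The function name lce_direct is a hypothetical helper chosen for illustration, not something defined in this article.

```python
def lce_direct(s, left, right):
    """Length of the longest common prefix of s[left:] and s[right:]."""
    length = 0
    # Extend the match while both positions stay inside the string.
    while (left + length < len(s) and right + length < len(s)
           and s[left + length] == s[right + length]):
        length += 1
    return length


s = "abbababba"
for left, right in [(1, 2), (1, 6), (0, 5)]:
    n = lce_direct(s, left, right)
    # Show the query, its answer, and the shared prefix itself.
    print(f"LCE({left}, {right}) = {n}, common prefix = {s[left:left + n]!r}")
```

For the example string this reports lengths 1, 3 and 4 for the three queries.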
In Set 1, we explained the naive method for finding the length of the LCE of a string over many queries. In this set we show how an LCE problem can be reduced to an RMQ (range minimum query) problem, which lowers the asymptotic time complexity compared with the naive method.
Reduction of LCE to RMQ
Let the input string be S and let the queries be of the form LCE(L, R). Let the suffix array of S be Suff and the lcp array be lcp, where lcp[k] is the length of the longest common prefix of the suffixes at ranks k and k+1 in sorted order.
The longest common extension between two suffixes S_L and S_R of S can be obtained from the lcp array in the following way.
- Let low be the rank of S_L among the suffixes of S (that is, Suff[low] = L).
- Let high be the rank of S_R among the suffixes of S. Without loss of generality, we assume that low < high.
- Then the longest common extension of S_L and S_R is lcp(low, high) = min { lcp[k] : low <= k < high }.
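The formula can be tried on the example string “abbababba”. The sketch below hard-codes the suffix array and lcp array for that string (worked out by hand by sorting its nine suffixes) and assumes lcp[k] describes the suffixes at ranks k and k+1. The name lce_via_lcp is hypothetical, and looking up a rank with list.index is only for illustration.

```python
# Sorted suffixes of "abbababba":
# a, ababba, abba, abbababba, ba, bababba, babba, bba, bbababba
suff = [8, 3, 5, 0, 7, 2, 4, 6, 1]   # suff[rank] = starting index
lcp = [1, 2, 4, 0, 2, 3, 1, 3]       # lcp[k] = LCP of ranks k and k + 1


def lce_via_lcp(suff, lcp, left, right):
    """LCE(left, right) as a range minimum over lcp; assumes left != right."""
    # Rank of each suffix, i.e. the position where it appears in suff.
    low, high = sorted((suff.index(left), suff.index(right)))
    # Minimum of lcp[k] for low <= k < high.
    return min(lcp[low:high])


for left, right in [(1, 2), (1, 6), (0, 5)]:
    print(f"LCE({left}, {right}) = {lce_via_lcp(suff, lcp, left, right)}")
```

For LCE(1, 2) the ranks are 8 and 5, so the answer is min(lcp[5], lcp[6], lcp[7]) = min(3, 1, 3) = 1.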
Proof: Let S_L = s_L … s_{L+c} … s_n and S_R = s_R … s_{R+c} … s_n, and let c be the length of the longest common extension of S_L and S_R (i.e. s_L … s_{L+c-1} = s_R … s_{R+c-1}). We assume that the string S ends with a sentinel character, so that no suffix of S is a prefix of any other suffix of S.
- If low = high – 1, then i = low and lcp[low] = c is the longest common extension of S_L and S_R, and we are done.
- If low < high – 1, then select i such that lcp[i] is the minimum value in the interval [low, high – 1] of the lcp array. We then have two possible cases:
- If c < lcp[i], we have a contradiction, because s_L … s_{L+lcp[i]-1} = s_R … s_{R+lcp[i]-1} by the definition of the lcp array and the fact that the entries of lcp correspond to sorted suffixes of S.
- If c > lcp[i], let h = Suff[i], so that S_h is the suffix at rank i. S_h is such that s_h … s_{h+lcp[i]-1} = s_L … s_{L+lcp[i]-1} and s_h … s_{h+lcp[i]-1} = s_R … s_{R+lcp[i]-1}. But since s_L … s_{L+c-1} = s_R … s_{R+c-1}, every suffix ranked between S_L and S_R must also start with those c characters, so the suffixes at ranks i and i+1 would share more than lcp[i] characters. The suffixes would then be wrongly sorted, which is a contradiction.
Therefore we have c = lcp[i].
Thus we have reduced our longest common extension query to a range minimum query over a range in lcp.
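The claim can also be checked empirically. The following sketch compares the range minimum against a direct character comparison for every pair of positions. It builds the arrays in the simplest possible way rather than efficiently, and check_reduction and lce_direct are hypothetical names. Python orders a string before any longer string it prefixes, which plays the role of the sentinel assumed in the proof.

```python
from itertools import combinations


def lce_direct(s, left, right):
    """Reference answer: compare the two suffixes character by character."""
    length = 0
    while (left + length < len(s) and right + length < len(s)
           and s[left + length] == s[right + length]):
        length += 1
    return length


def check_reduction(s):
    """Return True if min(lcp[low:high]) equals the direct LCE for all pairs."""
    n = len(s)
    suff = sorted(range(n), key=lambda i: s[i:])      # naive suffix array
    inv_suff = [0] * n
    for rank, start in enumerate(suff):
        inv_suff[start] = rank                         # inverse suffix array
    # lcp[k] = LCP of the suffixes at ranks k and k + 1 (naive construction).
    lcp = [lce_direct(s, suff[k], suff[k + 1]) for k in range(n - 1)]
    for left, right in combinations(range(n), 2):
        low, high = sorted((inv_suff[left], inv_suff[right]))
        if min(lcp[low:high]) != lce_direct(s, left, right):
            return False
    return True


print(check_reduction("abbababba"))
```

A check like this does not replace the proof, but it is a quick way to catch an off-by-one error in the range bounds.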
- To find low and high, we first compute the suffix array and then derive the inverse suffix array from it.
- We also need the lcp array, so we use Kasai’s Algorithm to build the lcp array from the suffix array.
- Once the above is done, for each query we simply find the minimum value in the lcp array from index low to high – 1 (as proved above).
The minimum value is the length of the LCE for that query.
LCE (1, 2) = 1
LCE (1, 6) = 3
LCE (0, 5) = 4
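The steps above can be sketched end to end as follows. This is an illustrative version under stated assumptions, not the article's reference implementation: the suffix array is built by sorting the suffixes directly, which is simpler but slower than an O(N log N) construction, and all function names are hypothetical. The lcp array follows the convention that lcp[k] belongs to ranks k and k+1, with the last entry left at 0.

```python
def build_suffix_array(s):
    # Simplest construction: sort starting indexes by the suffix they begin.
    # A faster builder can be substituted without changing the rest.
    return sorted(range(len(s)), key=lambda i: s[i:])


def build_inverse(suff):
    inv_suff = [0] * len(suff)
    for rank, start in enumerate(suff):
        inv_suff[start] = rank
    return inv_suff


def kasai(s, suff, inv_suff):
    """Kasai's algorithm: lcp[k] = LCP of the suffixes at ranks k and k + 1."""
    n = len(s)
    lcp = [0] * n
    k = 0  # length of the match carried over from the previous suffix
    for i in range(n):
        if inv_suff[i] == n - 1:   # last suffix in sorted order has no successor
            k = 0
            continue
        j = suff[inv_suff[i] + 1]  # start of the next suffix in sorted order
        while i + k < n and j + k < n and s[i + k] == s[j + k]:
            k += 1
        lcp[inv_suff[i]] = k
        if k > 0:
            k -= 1                 # the next suffix loses at most one character
    return lcp


def lce(s, lcp, inv_suff, left, right):
    if left == right:
        return len(s) - left       # a suffix matches itself entirely
    low, high = sorted((inv_suff[left], inv_suff[right]))
    return min(lcp[low:high])      # linear scan over lcp[low .. high - 1]


s = "abbababba"
suff = build_suffix_array(s)
inv_suff = build_inverse(suff)
lcp = kasai(s, suff, inv_suff)
for left, right in [(1, 2), (1, 6), (0, 5)]:
    print(f"LCE({left}, {right}) = {lce(s, lcp, inv_suff, left, right)}")
```

The query scans the lcp range linearly, so its cost grows with the distance between the two ranks.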
Analysis of Reduction to RMQ method
Time Complexity :
- Constructing the lcp array and the suffix array takes O(N log N) time.
- Answering each query takes O(|invSuff[R] – invSuff[L]|) time, which is O(N) in the worst case.
Hence the overall time complexity is O(N log N + Q · |invSuff[R] – invSuff[L]|), where
Q = Number of LCE Queries.
N = Length of the input string.
invSuff = Inverse suffix array of the input string.
Although this may seem like an inefficient algorithm, in practice it generally outperforms the other algorithms for answering LCE queries.
We will give a detailed description of the performance of this method in the next set.
Auxiliary Space: We use O(N) auxiliary space to store the lcp, suffix and inverse suffix arrays.