Given a text string and a pattern string, find all occurrences of the pattern in string.
In the 1st Suffix Tree Application (Substring Check), we saw how to check whether a given pattern is substring of a text or not. It is advised to go through Substring Check 1st.
In this article, we will go a bit further on same problem. If a pattern is substring of a text, then we will find all the positions on pattern in the text.
As a prerequisite, we must know how to build a suffix tree in one or the other way.
Here we will build suffix tree using Ukkonen’s Algorithm, discussed already as below:
Ukkonen’s Suffix Tree Construction – Part 1
Ukkonen’s Suffix Tree Construction – Part 2
Ukkonen’s Suffix Tree Construction – Part 3
Ukkonen’s Suffix Tree Construction – Part 4
Ukkonen’s Suffix Tree Construction – Part 5
Ukkonen’s Suffix Tree Construction – Part 6
Lets look at following figure:
This is suffix tree for String “abcabxabcd$”, showing suffix indices and edge label indices (start, end). The (sub)string value on edges are shown only for explanatory purpose. We never store path label string in the tree.
Suffix Index of a path tells the index of a substring (starting from root) on that path.
Consider a path “bcd$” in above tree with suffix index 7. It tells that substrings b, bc, bcd, bcd$ are at index 7 in string.
Similarly path “bxabcd$” with suffix index 4 tells that substrings b, bx, bxa, bxab, bxabc, bxabcd, bxabcd$ are at index 4.
Similarly path “bcabxabcd$” with suffix index 1 tells that substrings b, bc, bca, bcab, bcabx, bcabxa, bcabxab, bcabxabc, bcabxabcd, bcabxabcd$ are at index 1.
If we see all the above three paths together, we can see that:
- Substring “b” is at indices 1, 4 and 7
- Substring “bc” is at indices 1 and 7
With above explanation, we should be able to see following:
- Substring “ab” is at indices 0, 3 and 6
- Substring “abc” is at indices 0 and 6
- Substring “c” is at indices 2 and 8
- Substring “xab” is at index 5
- Substring “d” is at index 9
- Substring “cd” is at index 8
Can you see how to find all the occurrences of a pattern in a string ?
- 1st of all, check if the given pattern really exists in string or not (As we did in Substring Check). For this, traverse the suffix tree against the pattern.
- If you find pattern in suffix tree (don’t fall off the tree), then traverse the subtree below that point and find all suffix indices on leaf nodes. All those suffix indices will be pattern indices in string