Suffix trees are very useful for many string processing problems. They
are data structures that are built up from strings and "really turn"
the string "inside out" [GK97]. This way they provide answer to a
lot of important questions in linear time (e.g. "Does text t contain
a word w?" in O(|w|), the length of the word). There are many
algorithms that can build a suffix tree in linear time.
The first algorithm was published by Weiner in 1973 [Wei73]. It reads
a string t from right to left and successively inserts suffixes
beginning with the shortest suffix, which leads to a very complex
algorithm. Following Giegerich and Kurtz [GK97] the algorithm "has no
practical value [...], but remains a true historic monument in the
area of string processing." I will therefore not bother with a deeper
study of this early algorithm.
Shortly after Weiner McCreight published his "algorithm M" in
[McC76]. The algorithm is explained in detail in section 3.
A newer idea and a different intuition lead to an on-line algorithm by
Ukkonen [Ukk95] that turns out to perform the same abstract operations
as McCreight's algorithm but with a different control structure, which
leads to the on-line ability but also to a slightly weaker performance
[GK97]. The algorithm is explained in detail in section 4.
I will begin with the explanation of the involved data structures
(2). I will mostly use the notation and definitions of [GK97]. The
next sections will explain the two algorithms (3),(4) and make some
remarks on the implementation (5).
At the end I will present two basic applications of suffix trees,
string assembling (6), and data compression (7), and present some
available alternatives to as well as some other applications for
suffix trees (8).
I will finish with a short conclusion (9).
|