Ferienakademie 1999
Kurs 2: Bäume - Algorithmik und Kombinatorik


Moritz Maaß
Suffix Trees and their Applications

Suffix trees are very useful for many string processing problems. They are data structures that are built up from strings and "really turn" the string "inside out" [GK97]. This way they provide answer to a lot of important questions in linear time (e.g. "Does text t contain a word w?" in O(|w|), the length of the word). There are many algorithms that can build a suffix tree in linear time. The first algorithm was published by Weiner in 1973 [Wei73]. It reads a string t from right to left and successively inserts suffixes beginning with the shortest suffix, which leads to a very complex algorithm. Following Giegerich and Kurtz [GK97] the algorithm "has no practical value [...], but remains a true historic monument in the area of string processing." I will therefore not bother with a deeper study of this early algorithm. Shortly after Weiner McCreight published his "algorithm M" in [McC76]. The algorithm is explained in detail in section 3. A newer idea and a different intuition lead to an on-line algorithm by Ukkonen [Ukk95] that turns out to perform the same abstract operations as McCreight's algorithm but with a different control structure, which leads to the on-line ability but also to a slightly weaker performance [GK97]. The algorithm is explained in detail in section 4. I will begin with the explanation of the involved data structures (2). I will mostly use the notation and definitions of [GK97]. The next sections will explain the two algorithms (3),(4) and make some remarks on the implementation (5). At the end I will present two basic applications of suffix trees, string assembling (6), and data compression (7), and present some available alternatives to as well as some other applications for suffix trees (8). I will finish with a short conclusion (9).

Suffix Tree Construction Applet References
  • Apo85
    Alberto Apostolico. The Myriad Virtues of Subword Trees. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, volume F12 of NATO ASI, pages 85-96. Springer, 1985.
  • GK97
    R. Giegerich and S. Kurtz. From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction. Algorithmica, 19:331-353, 1997.
  • KD95
    S. Rao Kosaraju and Arthur L. Delcher. Large-Scale Assembly of DNA Strings and Space-Efficient Construction of Suffix Trees. In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, pages 169-177. ACM, 1995.
  • McC76
    Edward M. McCreight. A Space-Economical Suffix Tree Construction Algorithm. J. ACM, 23(2):262-272, apr 1976.
  • MM93
    Udi Manber and Gene Myers. Suffix Arrays: A New Method for On-line String Searches. SIAM J. COMPUT., 22(5):935-948, oct 1993.
  • Sto95
    J. Stoye. Affixbäume. Master's thesis, Universität Bielefeld, may 1995.
  • Ukk95
    E. Ukkonen. On-Line Construction of Suffix Trees. Algorithmica, 14:249-260, 1995.
  • Wei73
    P. Weiner. Linear pattern matching. In IEEE 14th Annual Symposium on Switching and Automata Theory, pages 1-11. IEEE, 1973.