Ferienakademie 1999
Kurs 2: Bäume - Algorithmik und Kombinatorik


Carsten Isert
The Editing Distance Between Trees

The similarity between two given structures can be of vital interest in various problem definitions. Generally, there are many ways to measure the similarity - or distance - between structures like strings, trees, or graphs. In this paper we will be particularly concerned about trees where we can employ measures like the largest common subtree, the smallest common supertree, tree edit distance, transferable ratio, or tree alignment. For the problem of RNA secondary structure analysis, which is the main motivation for dealing with this problem, tree edit distance and tree alignment are good and sensible measures. The RNA secondary structure can be represented by an ordered labeled tree with the five component types: stem (S), hairpin (H), bulge (B), interior loop (I), and multi-branch loop (M). Because different sequences of RNA can produce similar secondary structures it is necessary for understanding their comparative functionality to measure the similarity at the level of the secondary, tree like structure. Additionally, the secondary structure influences translation rates from RNA to proteins. Previously, these structures were represented as parenthesized strings and string distance metrics were used. Although this works in some cases, the problem in trees is harder because ancestor relationships must be preserved and certain prefixes conditions of strings can no longer be exploited. In this paper we will describe, proof, and analyze a simple dynamic programming algorithm for solving the tree edit distance problemn in time O(|T1|*|T_2|*min(depth(T1), leaves(T1))*min(depth(T2),leaves(T2))) and space O(|T1|*|T2|). Certain extensions for improving these results are also mentioned. Furthermore, we will give a short introduction and overview of tree alignment and how it works.

  • Paper (gzipped ps-file, 95kB)
  • Slides (gzipped ps-file, 92kB)
References
  • Kaizhong Zhang and Dennis Shasha: Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput. 18:1245-1262, 1989.
  • Dennis Shasha and Kaizhong Zhang: Fast algorithms for the unit cost editing distance between trees. J. Algorithms 11:581-621, 1990.
  • Tao Jiang, Lusheng Wang, and Kaizhong Zhang: Alignment of trees - an alternative to tree edit. Theor. Comput. Sci. 143(1):137-148, 1995.