The similarity between two given structures can be of vital interest
in various problem definitions.
Generally, there are many ways to measure the similarity - or distance
- between structures like strings, trees, or graphs. In this paper we will
be particularly concerned about trees where we can employ measures like
the largest common subtree, the smallest common supertree, tree edit
distance, transferable ratio, or tree alignment.
For the problem of RNA secondary structure analysis, which is the main
motivation for dealing with this problem, tree edit distance
and tree alignment are good and sensible measures.
The RNA secondary structure can be represented by an ordered labeled
tree
with the five component types: stem (S), hairpin (H), bulge (B),
interior loop (I), and multi-branch loop (M).
Because different sequences of RNA can produce similar secondary
structures it is necessary for understanding their comparative
functionality to measure the similarity at the level of the secondary,
tree like structure.
Additionally, the secondary structure influences translation rates from
RNA to proteins.
Previously, these structures were represented as parenthesized strings
and
string distance metrics were used. Although this works in some cases,
the problem in trees is harder because ancestor relationships must be
preserved and certain prefixes conditions of strings can no longer be
exploited.
In this paper we will describe, proof, and analyze
a simple dynamic programming algorithm for solving the tree edit
distance
problemn in time
O(|T1|*|T_2|*min(depth(T1),
leaves(T1))*min(depth(T2),leaves(T2)))
and space
O(|T1|*|T2|).
Certain extensions for improving these results are also mentioned.
Furthermore, we will give a short introduction and overview of
tree alignment and how it works.
References
|