Practical Approximate Pattern Matching with Index Structures
Project
In this research project we apply the whole spectrum of algorithm engineering to index structures for approximate pattern matching. In contrast to the case of exact searching there is a big gap between theory and practice for the approximate case. We want to close this gap with our project. Some of the approaches which yield theoretical advances, seem to be not applicable in practice due to big constants.
A library of pattern matching algorithms would be of great benefit for other researches and computer scientists, providing efficient methods for index-based approximate pattern matching. The library is supposed to be an interface for the ongoing progress in the field of indexes and algorithms. The implementations form a working basis for the further algorithm engineering process, but as well can be used by e.g. biologists to solve real world problems, like analyzing biological data. The library is supposed to offer index structures and search algorithms, developed and optimized espacially considering the following aspects:
- Efficient applicability to for big, real world problem instances, while using general and flexible error models.
- Index structures and search algorithms in secondary memory.
- Distributed index structures and search algorithms.
- Cache-efficient search algorithms.
Members
- Prof. Dr. Ernst W. Mayr (project leader)
- Dr. Hanjo Täubig
- Johannes Krugel
Former Members
- Stefan Eckhardt
- Johannes Nowak
- Moritz Maaß
- Andreas Mühling
- Yuxiang Chen (student assistant)
- André Dau (student assistant)
Student Theses
- 2013-01-14, Bachelor thesis, Nobert Schmidbartl: Fehlertolerante Volltextsuche mit erweiterten Ähnlichkeitsmaßen
- 2012-09-19, Bachelor thesis, Kristina Bayer: Fehlertolerante Suche mittels Backtracking – Ausführung auf dem Enhanced Suffix Array und Abschätzung des Suchaufwands
- 2012-01-16, Bachelor thesis, Johannes Merkle: Space efficient q-gram indexes
- 2011-11-15, Bachelor thesis, Alexander Aumann: Implementation and comparison of suffix tree representations
- 2011-06-15, Master thesis, Tobias Stadler: Construction of compressed indexes for huge texts
- 2011-02-18, System development project, İlker Hatipoğlu: Construction of suffix trees for huge texts
- 2010-11-15, Bachelor thesis, Henrik Poppe: Approximate search in text indexes
- 2010-04-30, Bachelor thesis, André Dau: Analysis of the structure and statistical properties of texts and generation of random texts