Source code and tools
Our index structures and algorithms for approximate pattern matching were implemented in the software library . The source code is available on GitHub.
We designed and implemented some tools for generating and analyzing test data for approximate pattern matching:
strip_headers | Preprocesses text files from Project Gutenberg by stripping, for example, the headers and footers.
file_statistics | Efficiently computes text statistics for a given file, including the text length, alphabet size, number of distinct q-grams, and empirical entropy.
generate_patterns | Extracts substrings from a text and generates a set of search patterns for approximate pattern matching.
tt-analyze | Efficiently calculates statistical properties of texts and estimates parameters for the probability models implemented by tt-generate.
tt-generate | Generates random texts using different models (such as Markov chain, discrete autoregressive process, uniform distribution, or Fibonacci word).
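To make the statistics listed for file_statistics concrete, the following Python sketch computes the same four quantities for an in-memory string. It is an illustration only and not the tool's implementation: the function name `text_statistics`, the default `q = 3`, and the choice of zeroth-order empirical entropy (in bits per character) are assumptions made for this example.

```python
import math
from collections import Counter


def text_statistics(text: str, q: int = 3) -> dict:
    """Return simple statistics of a text (hypothetical helper, sketch only)."""
    n = len(text)
    counts = Counter(text)  # character frequencies

    # Zeroth-order empirical entropy: sum over characters of p * log2(1 / p).
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())

    # Distinct q-grams: the set of all substrings of length q.
    # For texts shorter than q the range is empty, so the set is empty.
    qgrams = {text[i:i + q] for i in range(n - q + 1)}

    return {
        "length": n,
        "alphabet_size": len(counts),
        "distinct_qgrams": len(qgrams),
        "entropy": entropy,
    }


if __name__ == "__main__":
    print(text_statistics("abracadabra", q=2))
```

Storing every q-gram in a set is simple but memory-hungry for long texts; an efficient tool would more plausibly rely on an index structure or hashing, which this sketch does not attempt.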
(Deprecated: The source code of the dissertation project of Johannes Krugel is available for download as a zip file and has to be included as a sandbox project in the folder sandbox/tum.)
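Three of the text models named for tt-generate (uniform distribution, Markov chain, and Fibonacci word) can be illustrated with a short Python sketch. It does not reproduce the tool's code or command-line interface: all function names, parameters, and the first-order form of the Markov chain are assumptions chosen for this example.

```python
import random


def fibonacci_word(length: int, alphabet: str = "ab") -> str:
    """Prefix of the infinite Fibonacci word over the first two letters of alphabet."""
    a, b = alphabet[0], alphabet[1]
    prev, cur = a, a + b  # S_1 = "a", S_2 = "ab"
    while len(cur) < length:
        prev, cur = cur, cur + prev  # S_n = S_{n-1} S_{n-2}
    return cur[:length]


def uniform_text(length: int, alphabet: str, seed=None) -> str:
    """Each character is drawn independently and uniformly from alphabet."""
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(length))


def markov_text(length: int, transitions: dict, start: str, seed=None) -> str:
    """First-order Markov chain; transitions maps state -> {next_state: weight}."""
    rng = random.Random(seed)
    out = []
    state = start
    for _ in range(length):
        out.append(state)
        nxt = transitions[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return "".join(out)


if __name__ == "__main__":
    print(fibonacci_word(13))
    print(uniform_text(20, "acgt", seed=1))
    chain = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}
    print(markov_text(20, chain, start="a", seed=1))
```

Passing a fixed `seed` makes the random models reproducible, which is useful when the generated texts serve as test data for comparing algorithms.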