Source code and tools
Our index structures and algorithms for approximate pattern matching were implemented in the software library . The source code is available on GitHub.
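The index structures themselves are not reproduced here. As a point of reference, here is a minimal sketch of the classical online dynamic-programming approach to approximate pattern matching under edit distance, commonly attributed to Sellers. It is a baseline for illustration, not the indexed algorithms implemented in the library. It scans the text once and keeps a single column of the DP matrix, reporting every 0-based text position where some substring ending at that position is within edit distance `k` of the pattern. The function name `approximate_matches` is hypothetical.

```python
def approximate_matches(text, pattern, k):
    """Return all 0-based end positions j in text such that some substring
    ending at text[j] has edit distance <= k to pattern (Sellers-style DP)."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any suffix of
    # the text read so far; before reading any text this is simply i.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # value of col[i - 1] from the previous column
        col[0] = 0          # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != c),  # match or substitution
                old + 1,                            # extra text character
                col[i - 1] + 1,                     # pattern character skipped
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approximate_matches("abcdefg", "bcd", 1))
```

The scan takes O(mn) time and O(m) space for a pattern of length m and a text of length n; for example, `approximate_matches("abcdefg", "bcd", 1)` returns `[2, 3, 4]`. Index-based methods aim to avoid this full pass over the text for every query.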
We designed and implemented the following tools for generating and analyzing test data for approximate pattern matching:
Tool | Description | Links |
strip_headers | Preprocesses text files from Project Gutenberg, for example by stripping the headers and footers. | documentation, source, Win32, Mac |
file_statistics | Efficiently computes statistics of a given file, including the text length, alphabet size, number of distinct q-grams, and empirical entropy. | documentation, source, Win32, Mac |
generate_patterns | Extracts substrings from a text to generate a set of search patterns for approximate pattern matching. | documentation, source, Win32, Mac |
tt-analyze | Efficiently computes statistical properties of texts and estimates the parameters of the probability models implemented by tt-generate. | documentation, source, Win32, Mac |
tt-generate | Generates random texts using different models, such as a Markov chain, a discrete autoregressive process, a uniform distribution, or a Fibonacci word. | documentation, Win32, Mac |
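The statistics reported by file_statistics have standard textbook definitions, sketched below in Python. This is not the tool's source. The exact conventions of file_statistics (whether a symbol is a byte or a character, and which order of empirical entropy it reports) are not specified here, so the sketch assumes one symbol per element of the input and reports the zeroth-order entropy H0 and the k-th order entropy Hk in bits per symbol, using the common definition Hk = (1/n) Σ_w |T_w| H0(T_w), where T_w collects the symbols that follow each occurrence of the context w. As sample input it builds a Fibonacci word, one of the models listed for tt-generate. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def h0(s):
    """Zeroth-order empirical entropy of s in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    # Sum of (n_c / n) * log2(n / n_c) over every symbol c occurring in s.
    return sum(c / n * math.log2(n / c) for c in Counter(s).values())


def hk(s, k):
    """k-th order empirical entropy: (1/n) * sum over contexts w of
    |T_w| * H0(T_w), where T_w lists the symbols following w in s."""
    if k == 0 or not s:
        return h0(s)
    followers = defaultdict(list)
    for i in range(len(s) - k):
        followers[s[i:i + k]].append(s[i + k])
    return sum(len(t) * h0(t) for t in followers.values()) / len(s)


def distinct_qgrams(s, q):
    """Number of distinct substrings of length q in s."""
    return len({s[i:i + q] for i in range(len(s) - q + 1)})


def text_statistics(s, q=3, k=1):
    """Collect the statistics in one dictionary (keys are illustrative)."""
    return {
        "length": len(s),
        "alphabet_size": len(set(s)),
        f"distinct_{q}grams": distinct_qgrams(s, q),
        "H0": h0(s),
        f"H{k}": hk(s, k),
    }


def fibonacci_word(n):
    """Fibonacci word S_n with S_0 = "b", S_1 = "a", S_n = S_(n-1) + S_(n-2)."""
    prev, cur = "b", "a"
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, cur + prev
    return cur


if __name__ == "__main__":
    word = fibonacci_word(6)
    print(word, text_statistics(word))
```

Here `fibonacci_word(6)` is `abaababaabaab`, which has exactly q + 1 distinct q-grams for q = 1, ..., 4, as expected for a Fibonacci word. To analyze a file, read it first, e.g. with `open(path, encoding="utf-8").read()`; passing `bytes` instead also works and treats each byte as one symbol. Collecting the q-grams in a Python set is simple but memory-hungry for large texts, which is one reason a dedicated tool is worthwhile.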
(Deprecated: The source code of Johannes Krugel's dissertation project is available for download as a zip file; it must be included as a sandbox project in the folder sandbox/tum.)