Fakultät für Informatik der Technischen Universität MünchenLehrstuhl für Effiziente Algorithmen |
Das Seminar gliedert sich in 3 Themenbereiche.
In ersten Teil sollen Verfahren zur Datengewinnung aus semi-strukturieren Dokumenten erarbeitet werden.
Der zweite Bereich befaßt sich mit der Überprüfung der gewonnenen Daten aus Teil 1 auf Richtigkeit.
Als letztes geht es um Algorithmen zur exakten und fehlertoleranten Suche in Texten.
Datengenerierung aus Dokumenten
Datenvalidierung
Algorithmen fuer Textdatenbanken
Exakte Suche
[AD98] | Brad Adelberg. NoDoSE A tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents. SIGMOD Record 27. 1998 | |
[AK97] | Naveen Ashish , Craig A. Knoblock. Wrapper generation for semi-structured Internet sources. ACM SIGMOD Record December 1997.Volume 26 Issue 4 | |
[AM98] | Gustavo O. Arocena, Alberto O. Mendelzon. WebOQL: Restructuring Documents, Databases, and Webs. Proceedings if the 14th IEEE International Conference on Data Engineering. 1998 | |
[BC92] | Ricardo Baeza-Yates , Gaston H. Gonnet. A new approach to text searching. Communications of the ACM October 1992. Volume 35 Issue 10 | |
[BM77] | Robert S. Boyer , J. Strother Moore. A fast string searching algorithm. Communications of the ACM October 1977. Volume 20 Issue 10 | |
[CM98] | Valter Crescenzi, Giansalvatore Mecca. Grammars Have Exceptions. Information Systems. 1998 | |
[EJN99] | D.W. Embley, Y.S. Jiang, Y.-K. Ng. Record-Boundary Discovery in Web Documents. Proceedings ACM SIGMOD International Conference on Management of Data. 1999. | |
[ETL] | David W. Embley , Cui Tao , Stephen W. Liddle. Automatically Extracting Ontologically Specified Data from HTML Tables of Unknown Structure. Volume 2503, Issue , pp 322-337. Lecture Notes in Computer Science | |
[FR02] | Friedl, Jeffrey E. F. Mastering Regular Expressions. O'REILLY 2002 | |
[GUS97] | Dan Gusfield. Algorithms on strings, trees, and sequences. Cambridge University Press. 1997 | |
[HE89] | Herrmann, Eric C. Mastering Perl 5. Sybex. 1989. | |
[KMP77] | Knuth, D. E., Morris, J. H., Pratt, V.R. Fast pattern matching in strings, SIAM J. Comput. 6, 2 (1977), 323-350 | |
[KU00] | Nicholas Kushmerick. Wrapper induction: Efficiency and expressiveness. Artificial Intelligence Journal 118. 2000 | |
[LM02] | Ming Li, Bin Ma. On The Closest String and Substring Problems. Journal of the ACM. 2002 | |
[LPH00] | Ling Liu, Calton Pu, Wei Han. XWRAP: An XML-Enabled Wrapper Construction System for Web Information Sources. Proceedings of the 16th IEEE International Conference on Data Engeneering. 2000 | |
[LR02] | Alberto H. F. Laender , Berthier A. Ribeiro-Neto , Altigran S. da Silva , Juliana S. Teixeira. A brief survey of web data extraction tools. ACM SIGMOD Record. June 2002 | |
[MC76] | Edward M. McCreight. A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM) April 1976. Volume 23 Issue 2 | |
[NA01] | Gonzalo Navarro. A guided tour to approximate string matching. ACM Computing Surveys (CSUR) March 2001. Volume 33 Issue 1 | |
[RLS99] | Berthier Ribeiro-Neto, Alberto H.F. Laender, Altigran S. da Silva. Extracting Semi-Structured Data Through Examples. Proceedings of the 8th ACM International Conference on Information and Knowledge Management. 1999 | |
[UK95] | E. Ukkonen. On-Line Construction of Suffix Trees. Algorithmica, 14, 1995 | |
[W3C] | World Wide Web Consortium. W3C. The Document Object Model. http://www.w3.org/DOM | |
[WC76] | C. K. Wong , Ashok K. Chandra. Bounds for the String Editing Problem. Journal of the ACM Volume 23, Issue 1. January 1976 | |
[WE73] | P. Weiner. Linear pattern matching. IEEE 14th Annual Symposium on Switching and Automata Therory. IEEE, 1973 | |
[WE93] | Wegener, Ingo. Theoretische Informatik. B.G. Teubner. Stuttgart 1993 | |
[WF74] | Robert A. Wagner , Michael J. Fischer. The String-to-String Correction Problem. Journal of the ACM Volume 21, Issue 1. January 1974 | |
[WM92] | Wu, S.. and Manber. U. Agrep - A fast approximate pattern-matching tool. Usenix Winter 1992 Technical Conference (San Francisco Jan. 1992), pp. 153-162 | |
[WM92-1] | Sun Wu , Udi Manber. Fast text searching allowing errors. Communications of the ACM October 1992. Volume 35 Issue 10 | |
Die ausführliche Themenliste ist auch als PostScript-Datei verfügbar.
Stand: 15.01.2003 (Stefan Pfingstl)