Efficient Learning of One-Variable Pattern Languages from Positive Data

Thomas Erlebach, Peter Rossmanith, Hans Stadtherr, Angelika Steger, and Thomas Zeugmann

DOI Technical Report DOI-TR-128
Department of Informatics
Kyushu University
Fukuoka, Japan, December 1996


Abstract. A pattern is a finite string of constant and variable symbols. The language generated by a pattern is the set of all strings of constant symbols that can be obtained from the pattern by substituting non-empty strings of constants for the variables. Descriptive patterns are a key concept for the inductive inference of pattern languages. A pattern p is descriptive for a given sample if the sample is contained in the language L(p) generated by p and no other pattern having this property generates a proper subset of L(p). The best previously known algorithm for computing descriptive one-variable patterns requires time O(n^4 log n), where n is the size of the sample. We present a simpler and more efficient algorithm that solves the same problem in time O(n^2 log n). In addition, we give a parallel version of this algorithm that runs in O(log n) time using O(n^3 / log n) processors on an EREW-PRAM. No parallel algorithm was previously known for this problem.
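
To make these definitions concrete, the following Python sketch decides whether a string belongs to the language of a given one-variable pattern. It is only a minimal brute-force illustration, assuming the pattern is given as a list of symbols in which the variable is written 'x' and every other entry is a single constant character; it is not one of the algorithms discussed in the report.

    def matches(pattern, s):
        # Non-erasing semantics: every occurrence of 'x' must be replaced
        # by the same non-empty string of constants.
        k = sum(1 for sym in pattern if sym == 'x')   # occurrences of the variable
        c = len(pattern) - k                          # number of constant symbols
        if k == 0:
            return s == ''.join(pattern)
        if (len(s) - c) % k != 0 or (len(s) - c) // k < 1:
            return False
        m = (len(s) - c) // k                         # forced length of the substituted word
        w, i = None, 0
        for sym in pattern:
            if sym == 'x':
                if w is None:
                    w = s[i:i + m]                    # candidate word from the first occurrence
                elif s[i:i + m] != w:
                    return False
                i += m
            else:
                if i >= len(s) or s[i] != sym:
                    return False
                i += 1
        return i == len(s)

For example, matches(['a', 'x', 'b', 'x'], 'acbbcb') is True via the substitution x = cb, whereas matches(['a', 'x', 'b', 'x'], 'acbcb') is False because no single non-empty word fits both occurrences of x.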

Using a modified version of the algorithm computing descriptive one-variable patterns as a subroutine, we devise a learning algorithm for one-variable patterns whose expected total learning time is O(l^2 log l), provided the sample strings are drawn from the target language according to a probability distribution with expected string length l. The probability distribution must be such that strings of equal length have equal probability, but can be arbitrary otherwise. Furthermore, we show how the algorithm for descriptive one-variable patterns can be used for learning one-variable patterns with a polynomial number of superset queries.
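
As an illustration of the learning-from-positive-data setting, the following sketch (reusing matches from above) outputs, after each example, a one-variable pattern consistent with all strings seen so far. It is only a naive stand-in, assumed for illustration: it enumerates a heuristic candidate set derived from the shortest example and uses the number of constant symbols as a crude proxy for descriptiveness; it does not implement the O(l^2 log l) algorithm of the report.

    def candidate_patterns(s):
        # Heuristic candidate set: the constant pattern s itself, plus every
        # pattern obtained from s by replacing all non-overlapping occurrences
        # of some non-empty substring w by 'x'.  Every candidate generates s.
        yield list(s)
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                w = s[i:j]
                pattern, k = [], 0
                while k < len(s):
                    if s.startswith(w, k):
                        pattern.append('x')
                        k += len(w)
                    else:
                        pattern.append(s[k])
                        k += 1
                yield pattern

    def learn(positive_examples):
        # Gold-style learner: after each positive example, output some
        # one-variable pattern consistent with everything seen so far.
        sample = []
        for s in positive_examples:
            sample.append(s)
            shortest = min(sample, key=len)           # a consistent pattern is never longer
            consistent = [p for p in candidate_patterns(shortest)
                          if all(matches(p, t) for t in sample)]
            # crude proxy for descriptiveness: prefer the most constant symbols
            hypothesis = max(consistent, key=lambda p: sum(sym != 'x' for sym in p))
            yield ''.join(hypothesis)

On the example stream "aba", "abba", "abbba", this learner outputs aba, axa, axa, settling on the pattern axa after the second example.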

