ILK - Publications with abstracts 

2000 | 1999 | 1998 | 1997 | 1996 | 1995 | 1994 | 1993 | 1992

2000 

ILK-0002

Integrating seed names and n-grams for a named entity list and classifier

Author(s): Sabine Buchholz and Antal van den Bosch
Reference: In: Proceedings of LREC-2000, Athens, Greece, June 2000, pp. 1215-1221.

[Postscript, with corrections]

We present a method for building a named-entity list and machine-learned named-entity classifier from a corpus of Dutch newspaper text, a rule-based named-entity recognizer, and labeled seed name lists taken from the internet. The seed names, labeled either as PERSON, LOCATION, ORGANIZATION, or ADJECTIVAL name, are looked up in an 83-million-word corpus, and their immediate contexts are stored as instances of their label. These context 8-grams are used by a decision-tree learning algorithm that, after training, (i) can produce high-precision labeling of instances to be added to the seed lists, and (ii) more generally labels new, unseen names. Unlabeled named-entity types are labeled with a precision of 61% and a recall of 56%; when optimizing for precision, an overall precision of 83% can be obtained (a top precision of 88% on PERSON). On free text, named-entity token labeling accuracy is 71%.
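
The instance-extraction step described above admits a minimal sketch, under simplifying assumptions (single-token seed names, whitespace tokenization; the function and data below are illustrative, not the authors' code):

    def collect_contexts(tokens, seeds, width=4):
        # For each occurrence of a seed name, store its context window
        # (width tokens left + width right, i.e. an 8-gram for width=4)
        # as a training instance labeled with the seed's class.
        instances = []
        for i, token in enumerate(tokens):
            label = seeds.get(token)
            if label is None:
                continue
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            left = ["_"] * (width - len(left)) + left       # pad at edges
            right = right + ["_"] * (width - len(right))
            instances.append((left + right, label))
        return instances

    seeds = {"Amsterdam": "LOCATION", "Philips": "ORGANIZATION"}   # toy seed list
    tokens = "de burgemeester van Amsterdam opende het Philips museum".split()
    for features, label in collect_contexts(tokens, seeds):
        print(label, features)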

ILK-0004

Unpacking multi-valued symbolic features and classes in memory-based language learning

Author(s): Antal van den Bosch and Jakub Zavrel
Reference: In P. Langley (Ed.), Proceedings of the Seventeenth International Conference on Machine Learning, pp. 1055-1062. San Francisco, CA: Morgan Kaufmann, 2000.

[Postscript]

In supervised machine-learning applications to natural language processing, tasks are typically formulated as classification tasks mapping multi-valued features to multi-valued classes. Memory-based or instance-based learning algorithms are suited for such representations, but they are not restricted to them; both features and classes may be unpacked into binary values. We demonstrate in a matrix of empirical tests on a range of natural language learning tasks that when using k=1 in the k-NN classifier kernel, binary unpacking of features and classes tends to be harmful to generalization accuracy. Unpacking features and classes causes the kernel classifier to rely on smaller sets of nearest neighbors, which generally leads to more misclassifications; only when the data is not sparse in the multi-valued case (when the average number of equidistant nearest neighbors is well above a handful) can unpacking lead to improved generalization accuracy.
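
To make the terminology concrete, here is a minimal sketch of "unpacking" (my illustration, with made-up data): each multi-valued symbolic feature becomes a bank of binary indicator features, one per attested value.

    def unpack(instances):
        # Enumerate every (position, value) pair seen in the data ...
        values = sorted({(j, v) for feats, _ in instances
                                for j, v in enumerate(feats)})
        index = {fv: k for k, fv in enumerate(values)}
        # ... and turn each instance into a 0/1 indicator vector.
        unpacked = []
        for feats, label in instances:
            vec = [0] * len(index)
            for j, v in enumerate(feats):
                vec[index[(j, v)]] = 1
            unpacked.append((vec, label))
        return unpacked

    data = [(("the", "NN"), "B-NP"), (("a", "NN"), "B-NP"), (("ran", "VB"), "O")]
    for vec, label in unpack(data):
        print(label, vec)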

ILK-0006

A distributed, yet symbolic model of text-to-speech processing

Author(s): Antal van den Bosch and Walter Daelemans
Reference: In P. Broeder and J.M.J. Murre (Eds.), Models of Language Acquisition: inductive and deductive approaches. Oxford University Press, 76-99, 2000.

[Postscript of preprint]

In this paper, a data-oriented model of text-to-speech processing is described. On the basis of a large text-to-speech corpus, the model automatically gathers a distributed, yet symbolic representation of subword-phoneme association knowledge, representing this knowledge in the form of paths in a decision tree. Paths represent context-sensitive rewrite rules which unambiguously map strings of letters onto single phonemes. The more ambiguous the mapping is, the larger the stored context. The knowledge needed for converting a spelling word to its phonemic transcription is thus represented in a distributed fashion: many different paths contribute to the phonemisation of a word, and a single path may contribute to phonemisations of many words. Some intrinsic properties of the data-oriented model are shown to have relations with psycholinguistic concepts such as a language's orthographic depth, and word pronunciation consistency.  
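
The "larger context for more ambiguous mappings" idea can be sketched roughly as follows (a hypothetical reconstruction: the pattern table and function are mine, and the real system stores such patterns as decision-tree paths rather than a flat dictionary):

    def minimal_context(patterns, word, i, max_width=5):
        # Widen the window around letter i until the training data maps
        # the resulting pattern to exactly one phoneme.
        for w in range(max_width + 1):
            pat = (word[max(0, i - w):i], word[i], word[i + 1:i + 1 + w])
            phonemes = patterns.get(pat, set())
            if len(phonemes) == 1:
                return pat, next(iter(phonemes))
        return None

    patterns = {
        ("", "c", ""): {"k", "s"},   # 'c' alone is ambiguous ...
        ("", "c", "e"): {"s"},       # ... but 'c' before 'e' maps to /s/
    }
    print(minimal_context(patterns, "cell", 0))   # (('', 'c', 'e'), 's')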

1999 

ILK-9902

Forgetting exceptions is harmful in language learning

Author(s): Walter Daelemans, Antal van den Bosch, and Jakub Zavrel.
Reference: Machine Learning, special issue on natural language learning, 34, pp. 11-43, 1999.

Preprint postscript

We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
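
One of the editing criteria, class prediction strength, can be sketched as follows (a crude approximation rather than the exact definition used in the paper: here, the fraction of an instance's own nearest neighbours that share its class):

    def overlap(a, b):
        # Number of feature positions on which two instances agree.
        return sum(x == y for x, y in zip(a, b))

    def prediction_strength(data, i, k=3):
        # Among instance i's k nearest neighbours (overlap similarity),
        # the fraction sharing its class. Editing removes instances
        # scoring below some threshold.
        feats_i, label_i = data[i]
        neighbours = sorted((j for j in range(len(data)) if j != i),
                            key=lambda j: -overlap(feats_i, data[j][0]))[:k]
        return sum(data[j][1] == label_i for j in neighbours) / k

    data = [(("a", "b"), "X"), (("a", "b"), "X"),
            (("a", "c"), "X"), (("a", "b"), "Y")]
    print(prediction_strength(data, 3))   # 0.0: a lone exception

The paper's point is that discarding such low-strength instances, intuitive as it may seem, tends to hurt accuracy on language data.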

ILK-9903

Recent Advances in Memory-Based Part-of-Speech Tagging

Author(s): Jakub Zavrel and Walter Daelemans.
Reference: VI Simposio Internacional de Comunicacion Social, Santiago de Cuba, pp. 590-597, 1999.

Postscript, MS-Word

Memory-based learning algorithms are lazy learners. Examples of a task are stored in memory and processing is largely postponed to the time when new instances of the task need to be solved. This is then done by extrapolating directly from those remembered instances which are most similar to the present ones. Using memory-based learning for Part-of-Speech tagging has a number of advantages over traditional statistical POS taggers: (i) there is no need for an additional smoothing component for sparse data, (ii) even low-frequency or exceptional patterns can contribute to generalization, (iii) the use of a weighted similarity metric allows for an easy integration of different information sources, and (iv) both development time and processing speed are very fast (on the order of hours and thousands of words/sec, respectively). In recent work, we have applied the Memory-Based tagger (MBT) to a number of different languages and corpora (English, Dutch, Czech, Swedish, and Spanish). Furthermore, we have performed a controlled experimental comparison of MBT with several other POS tagging algorithms.
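
The classification step behind (iii) can be sketched in a few lines (an illustrative 1-NN classifier with weighted overlap similarity; the data and weights below are invented):

    def classify(memory, weights, query):
        # Return the tag of the stored instance most similar to the
        # query, where similarity is the sum of per-position weights
        # (e.g. information gain) over matching feature values.
        def sim(feats):
            return sum(w for w, a, b in zip(weights, feats, query) if a == b)
        return max(memory, key=lambda inst: sim(inst[0]))[1]

    memory = [(("the", "dog", "barks"), "VBZ"),
              (("a", "dog", "bit"), "VBD")]
    weights = (0.2, 0.5, 1.0)
    print(classify(memory, weights, ("the", "cat", "barks")))   # VBZ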

ILK-9904

Interpreting knowledge representations in BP-SOM

Author(s): Ton Weijters and Antal van den Bosch.
Reference: Behaviormetrika, 26:1, pp. 107-128, 1999.

Artificial Neural Networks (ANNs) are able, in general and in principle, to learn complex tasks. Interpretation of models induced by ANNs, however, is often extremely difficult due to the non-linear and non-symbolic nature of the models. To enable better interpretation of the way knowledge is represented in ANNs, we present BP-SOM, a neural network architecture and learning algorithm. BP-SOM is a combination of a multi-layered feed-forward network (MFN) trained with the back-propagation learning rule (BP), and Kohonen's self-organising maps (SOMs). The involvement of the SOM in learning leads to highly structured knowledge representations both at the hidden layer and on the SOMs. We focus on a particular phenomenon within trained BP-SOM networks, viz. that the SOM part acts as an organiser of the learning material into instance subsets that tend to be homogeneous with respect to both class labelling and subsets of attribute values. We show that the structured knowledge representation can either be exploited directly for rule extraction, or be used to explain a generic type of checksum solution found by the network for learning M-of-N tasks.

ILK-9906

Toward an exemplar-based computational model for cognitive grammar

Author(s): Walter Daelemans.
Reference: In Johan van der Auwera, Frank Durieux, and Ludo Lejeune (Eds.) English as a Human Language. To honour Louis Goossens. Munchen: LINCOM Europa, 73-82, 1998.

An exemplar-based computational framework is presented which is compatible with Cognitive Grammar. In an exemplar-based approach, language acquisition is modeled as the incremental, data-oriented storage of experiential patterns, and language performance as the extrapolation of information from those stored patterns on the basis of a language-independent information-theoretic similarity metric. We show that this simple architecture works for many aspects of phonological, morphological, and morphosyntactic acquisition and processing. Furthermore, we sketch how the approach may also work for syntactic processing. A central insight of the approach, based on the results of computational modeling experiments, is that abstraction of representations is not only unnecessary to achieve generalization (i.e. to make the system productive, and to make it go `beyond' the learned patterns), but even harmful, and that useful language-independent metrics can be found for defining similarity in the context of language processing.

ILK-9907

Memory-Based Shallow Parsing

Author(s): Walter Daelemans, Sabine Buchholz, Jorn Veenstra.
Reference: To appear in: Proceedings of CoNLL-99, Bergen, Norway, June 12, 1999.

Postscript

We present a memory-based learning (MBL) approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules. The experiments reported in this paper show competitive results; the F-beta values for the Wall Street Journal (WSJ) treebank are 93.8% for NP chunking, 94.7% for VP chunking, 77.1% for subject detection, and 79.0% for object detection.
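
For reference, the F-beta measure combines precision P and recall R, with beta = 1 weighing them equally; a one-line sketch (the P/R figures in the example are the NP-chunk scores reported under ILK-9807 below):

    def f_beta(precision, recall, beta=1.0):
        # Harmonic-style combination of precision and recall.
        return (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)

    print(round(f_beta(0.890, 0.943), 3))   # 0.916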

ILK-9908

Cascaded Grammatical Relation Assignment

Author(s): Sabine Buchholz, Jorn Veenstra, Walter Daelemans.
Reference: To appear in: Proceedings of EMNLP/VLC-99, University of Maryland, USA, June 21-22, 1999.

Postscript

In this paper we discuss cascaded memory-based grammatical relation assignment. In the first stages of the cascade, we find chunks of several types (NP, VP, ADJP, ADVP, PP) and label them with their adverbial function (e.g. local, temporal). In the last stage, we assign grammatical relations to pairs of chunks. We studied the effect of adding several levels to this cascaded classifier, and found that even the lower-performing chunkers enhanced the performance of the relation finder.

ILK-9909

Memory-based morphological analysis

Author(s): Antal van den Bosch and Walter Daelemans.
Reference: In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, ACL'99, University of Maryland, USA, June 20-26, 1999, pp. 285-292.

Postscript

We present a general architecture for efficient and deterministic morphological analysis based on memory-based learning, and apply it to morphological analysis of Dutch. The system makes direct mappings from letters in context to rich categories that encode morphological boundaries, syntactic class labels, and spelling changes. Both precision and recall of labeled morphemes are over 84% on held-out dictionary test words and estimated to be over 93% in free text.
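
The "letters in context" formulation can be illustrated with a small windowing sketch (window width and padding symbol are my assumptions; the class labels each window would receive are omitted):

    def letter_windows(word, left=3, right=3, pad="_"):
        # One classification case per letter: the focus letter plus a
        # fixed number of neighbours on either side, padded at the edges.
        padded = pad * left + word + pad * right
        for i in range(len(word)):
            j = i + left
            yield padded[j - left:j], padded[j], padded[j + 1:j + 1 + right]

    for l, focus, r in letter_windows("boeken"):
        print(l, focus, r)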

ILK-9910

Instance-family abstraction in memory-based language learning

Author(s): Antal van den Bosch.
Reference: In: I. Bratko and S. Dzeroski (Eds.), Machine Learning: Proceedings of the Sixteenth International Conference, ICML'99, Bled, Slovenia, June 27-30, 1999, pp. 39-48.

Postscript

Memory-based learning appears relatively successful when the learning data is highly disjunct, i.e., when classes are scattered over many small families of instances in instance space, as in many language learning tasks. Abstraction over borders of disjuncts tends to harm generalization performance. However, careful abstraction in memory-based learning may be harmless when it preserves the disjunctivity of the learning data. We investigate the effect of careful abstraction in a series of language-learning task studies, and a small benchmark-task study. We find that when combined with feature weighting or value-distance metrics, careful abstraction, as implemented in the new FAMBL algorithm, can equal the generalization accuracies of pure memory-based learning, while attaining fair levels of memory compression.
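
At sketch level, a "family" can be thought of as a generalized exemplar obtained by merging same-class neighbours (my reconstruction of the idea only; FAMBL's actual merging procedure is more involved):

    def merge_family(instances):
        # Merge same-class instances into one family exemplar whose
        # features hold the set of attested values per position.
        label = instances[0][1]
        merged = [set(vals) for vals in zip(*(f for f, _ in instances))]
        return merged, label

    family, label = merge_family([(("a", "b", "c"), "X"),
                                  (("a", "d", "c"), "X")])
    print(label, family)   # X [{'a'}, {'b', 'd'}, {'c'}] (set order may vary)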

ILK-9911

Machine learning of word pronunciation: the case against abstraction

Author(s): Bertjan Busser, Walter Daelemans, Antal van den Bosch.
Reference: In Proceedings of the Sixth European Conference on Speech Communication and Technology, Eurospeech99, Budapest, Hungary, Sept. 5-10, 1999, pp. 2123-2126.

Postscript

Word pronunciation can be learned by inductive machine learning algorithms when it is represented as a classification task: classify a letter within its local word context as mapping to its pronunciation. On the basis of generalization accuracy results from empirical studies, we argue that word pronunciation, particularly in complex spelling systems such as that of English, should not be modelled in a way that abstracts from exceptions. Learning methods such as decision tree and backpropagation learning, while trying to abstract from noise, also throw away a large number of useful exceptional cases. Our empirical results suggest that a memory-based approach which stores all available word-pronunciation knowledge as cases in memory, and generalises from this lexicon via analogical reasoning, is at all times the optimal modelling method.

ILK-9912

Memory-based language processing

Editor: Walter Daelemans.
Reference: Journal for Experimental and Theoretical Artificial Intelligence, special issue. 11:3, pp. 287-467.

Website

Memory-Based Language Processing (MBLP) views language processing as being based on the direct reuse of previous experience rather than on the use of abstractions extracted from that experience. In such a framework, language acquisition is modeled as the storage of exemplars, and language processing as similarity-based reasoning.

MBLP derives from work in Artificial Intelligence (case-based reasoning, memory-based reasoning, instance-based learning, lazy learning), Linguistics (analogical modeling), Computational Linguistics (example-based machine translation, case-based language processing, data-oriented parsing), and Statistical Pattern Recognition (k-nn models). In recent research, it has been shown that the application of algorithms based on this framework leads to accurate and efficient language models in diverse language processing areas (phonology, morphology, syntax, semantics, discourse).

The idea for this special issue originated at the Corsendonk workshop on memory-based language processing organized by Walter Daelemans and Steven Gillis, December 1997.

ILK-9913

Careful abstraction from instance families in memory-based language learning

Author(s): Antal van den Bosch.
Reference: Journal for Experimental and Theoretical Artificial Intelligence, 11:3, pp. 339-368.

Website, postscript of preprint

Empirical studies in inductive language learning point at pure memory-based learning as a successful approach to many language learning tasks, often performing better than learning methods that abstract from the learning material. The possibility is left open, however, that limited, careful abstraction in memory-based learning may be harmless to generalization, as long as the disjunctivity of language data is preserved. We test this hypothesis, by comparing empirically a range of careful abstraction methods, focusing particularly on methods that (i) generalize instances and (ii) perform oblivious (partial) decision-tree abstraction. These methods are applied to a selection of language learning tasks, and their generalization performance as well as memory item compression rates are collected. On the basis of the results we conclude that when combined with feature weighting or value distance metrics, careful abstraction equals or outperforms pure memory-based learning, yet mainly on small data sets. In the concluding case study involving large data sets, we find that the FAMBL algorithm, a new careful abstractor which merges families of instances, performs close to pure memory-based learning, though it equals it only on three of the six tasks. On the basis of the gathered empirical results, we discuss the incorporation of the notion of instance families, i.e. carefully generalized instances, in memory-based language learning.

ILK-9914

Memory-Based Word Sense Disambiguation

Author(s): Jorn Veenstra, Antal van den Bosch, Sabine Buchholz, Walter Daelemans, Jakub Zavrel.
Reference: Computers and the Humanities, special issue on Senseval, Word Sense Disambiguation, edited by: Adam Kilgarriff and Martha Palmer, 34:1-2, 2000.

postscript of preprint

We describe a memory-based classification architecture for word sense disambiguation and its application to the Senseval evaluation task. For each ambiguous word, a semantic word expert is automatically trained using a memory-based approach. In each expert, selecting the correct sense of a word in a new context is achieved by finding the closest match to stored examples of this task. Advantages of the approach include (i) fast development time for word experts, (ii) easy and elegant automatic integration of information sources, (iii) use of all available data for training the experts, and (iv) relatively high accuracy with minimal linguistic engineering.

 
1998 

ILK-9801

Modularity in inductively-learned word pronunciation systems

Author(s): Van den Bosch, A., Weijters, A., and Daelemans, W.
Reference: In D.M.W. Powers (Ed.), Proceedings of NeMLaP3/CoNLL98, Sydney, Australia, pp. 185-194.

Postscript

In leading morpho-phonological theories and state-of-the-art text-to-speech systems it is assumed that word pronunciation cannot be learned or performed without in-between analyses at several abstraction levels (e.g., morphological, graphemic, phonemic, syllabic, and stress levels). We challenge this assumption for the case of English word pronunciation. Using IGTree, an inductive decision-tree learning algorithm, we train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one. The latter system, classifying letter strings directly as mapping to phonemes with stress markers, yields significantly better generalisation accuracies than the two multi-module systems. Analyses of empirical results indicate that positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.


ILK-9802

Do not forget: Full memory in memory-based learning of word pronunciation

Author(s): Van den Bosch, A., and Daelemans, W.
Reference: In D.M.W. Powers (Ed.), Proceedings of NeMLaP3/CoNLL98, Sydney, Australia, pp. 195-204.

Postscript

Memory-based learning, keeping full memory of learning material, appears a viable approach to learning NLP tasks, and is often superior in generalisation accuracy to eager learning approaches that abstract from learning material. Here we investigate three partial memory-based learning approaches which remove from memory specific task instance types estimated to be exceptional. The three approaches each implement one heuristic function for estimating exceptionality of instance types: (i) typicality, (ii) class prediction strength, and (iii) friendly-neighbourhood size. Experiments are performed with the memory-based learning algorithm IB1-IG trained on English word pronunciation. We find that removing instance types with low prediction strength (ii) is the only tested method which does not seriously harm generalisation accuracy. We conclude that keeping full memory of types rather than tokens, and excluding minority ambiguities appear to be the only performance-preserving optimisations of memory-based learning.


ILK-9804

Rapid development of NLP modules with memory-based learning

Author(s): Walter Daelemans, Antal van den Bosch, Jakub Zavrel, Jorn Veenstra, Sabine Buchholz, and Bertjan Busser.
Reference: In Proceedings of ELSNET in Wonderland, pp. 105-113. Utrecht: ELSNET, 1998. Also in R. Basili and M.T. Pazienza (Eds.), ECML-98 TANLPS Workshop Notes, Technische Universitaet Chemnitz, 1998, pp. 1-17.

Postscript

The need for software modules performing natural language processing (NLP) tasks is growing. These modules should perform efficiently and accurately, while at the same time rapid development is often mandatory. Recent work has indicated that machine learning techniques in general, and memory-based learning (MBL) in particular, offer the tools to meet both demands. We present examples of modules trained with MBL on three NLP tasks: (i) text-to-speech conversion, (ii) part-of-speech tagging, and (iii) phrase chunking. We demonstrate that the three modules display high generalization accuracy, and argue why MBL is applicable similarly well to a large class of other NLP tasks.


ILK-9805

Interpretable neural networks with BP-SOM

Author(s): Ton Weijters, Antal van den Bosch, and Jaap van den Herik.
Reference: In C. Nedellec and C. Rouveirol (Eds.), Machine Learning: ECML-98, Lecture Notes in Artificial Intelligence 1398, Berlin: Springer, 406-411, 1998.

Postscript

Interpretation of models induced by artificial neural networks is often a difficult task. In this paper we focus on a relatively novel neural network architecture and learning algorithm, BP-SOM, that offers possibilities to overcome this difficulty. It is shown that networks trained with BP-SOM show interesting regularities, in that hidden-unit activations become restricted to discrete values, and that the BP-SOM part can be exploited for automatic rule extraction.


ILK-9806

Toward inductive lexicons: a case study

Author(s): Walter Daelemans, Gert Durieux, and Antal van den Bosch.
Reference: In: P. Velardi (ed.), Proceedings LREC Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications, Granada, Spain, pp. 29-35.

Postscript

Machine learning techniques can be used to make lexicons adaptive. The main problems in adaptation are the addition of lexical material to an existing lexical database, and the recomputation of sublanguage-dependent lexical information when porting the lexicon to a new domain or application. Inductive lexicons combine available lexical information and corpus data to alleviate these tasks. In this paper, we introduce the general methodology for the construction of inductive lexicons, and discuss empirical results on a case study using the approach: prediction of the gender of nouns in Dutch.
 


ILK-9807

Fast NP Chunking Using Memory-Based Learning Techniques

Author(s): Jorn Veenstra.
Reference: In F. Verdenius and W. van den Broek (Eds), Proceedings of Benelearn 1998, Wageningen, the Netherlands, pp. 71-79, 1998.

Postscript

In this paper we discuss the application of Memory-Based Learning (MBL) to fast NP chunking. We first discuss the application of a fast decision-tree variant of MBL (IGTree) on the dataset described in (Ramshaw & Marcus 1995), which consists of roughly 50,000 test and 200,000 training items. In a second series of experiments we used an architecture of two cascaded IGTrees. In the second level of this cascaded classifier we added context predictions as extra features so that incorrect predictions from the first level can be corrected, yielding a 97.2% generalisation accuracy with training and testing times in the order of seconds to minutes. The recall and precision for predicting NP chunks are 94.3% and 89.0%, respectively.
 


ILK-9808

Improving data driven wordclass tagging by system combination

Author(s): Hans van Halteren, Jakub Zavrel, and Walter Daelemans.
Reference: In Proceedings of COLING-ACL '98, August 1998, Montreal, Canada, pp. 491-497.

Postscript

In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second stage classifiers. All combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger.
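
The simplest of the voting strategies admits a compact sketch (illustrative only; the paper also evaluates weighted votes and second-stage classifiers):

    from collections import Counter

    def majority_vote(tag_sequences):
        # tag_sequences: one tag list per component tagger, all aligned
        # to the same tokens; ties fall to the earliest-listed tagger.
        combined = []
        for tags in zip(*tag_sequences):
            combined.append(Counter(tags).most_common(1)[0][0])
        return combined

    print(majority_vote([["DT", "NN"], ["DT", "VB"], ["DT", "NN"]]))   # ['DT', 'NN']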


ILK-9809

Distinguishing complements from adjuncts using memory-based learning

Author(s): Sabine Buchholz
Reference: In B. Keller (Ed.), Proceedings of the ESSLLI-98 Workshop on Automated Acquisition of Syntax and Parsing, pp. 41-48.

Postscript

The automatic distinction between complements and adjuncts, i.e. between subcategorized and non-subcategorized constituents, is crucial for the automatic acquisition of subcategorization lexicons from corpora. In this paper we present memory-based learning experiments for the task of distinguishing complements from adjuncts. Data is extracted from the part-of-speech tagged and parsed version of the Wall Street Journal Corpus. Memory-based learning algorithms classify test instances by using the class of the most similar training instance. By providing the algorithm with different subsets of features in the data, we can explore the importance of different features. By using only syntactic information about the category itself and its neighboring constituents, we achieve an accuracy of 91.6% for the complement-adjunct distinction, which corresponds to 89.7% correctly classified subcategorization frames. The error analysis shows that whereas at the level of constituents, PPs are most difficult to classify (23% errors), at the level of frames it is the ditransitive frame that has the highest error rate (97%).


ILK-9810

TreeTalk-D: a machine learning approach to Dutch word pronunciation

Author(s): Bertjan Busser.
Reference: In P. Sojka, V. Matousek, K. Pala, and I. Kopecek (Eds.) (1998) Proceedings TSD Conference, pp. 3-8, Masaryk University, Czech Republic.

Postscript

We present experimental results concerning the application of the IGTree decision-tree learning algorithm to Dutch word pronunciation. We evaluate four different Dutch word pronunciation systems configured to test the utility of modularization of grapheme-to-phoneme transcription (G) and stress prediction (S). Both training and testing data are extracted from the CELEX II lexical database. Experiments yield full word transcription accuracies (stressed and syllabified phonetic transcription) of roughly 75%, and 97% accuracy on G at the letter level. The best system performs G and S in sequence, using a context of four letters left and right per grapheme-phoneme mapping.


ILK-9811

Unsupervised learning of subcategorisation information and its application in a parsing subtask

Author(s): Sabine Buchholz.
Reference: In H. La Poutre and H.J. van den Herik (Eds.) (1998), Proceedings of the Tenth Netherlands/Belgium Conference on Artificial Intelligence (NAIC'98), CWI, Amsterdam, pp. 7-16.

Postscript

This paper is about two aspects of subcategorisation in NLP. First, it is about the automatic extraction of subcategorisation information from corpora. More specifically, we are concerned with unsupervised learning of subcategorisation information from tagged text by means of hierarchical clustering. The second aspect of the paper is the use of this subcategorisation information for parsing, especially for the distinction between complements and adjuncts. We show that the information learned by unsupervised clustering can be exploited by a memory-based learner to improve upon the complement-adjunct distinction. We compare the improvement gained by the use of this unsupervised information (1%) to that of different representations of subcategorisation information extracted from the treebank annotation (maximum 1.5%). The unsupervised information thus achieves two thirds of the improvement that can be obtained from the hand-crafted treebank information.

1997 

ILK-9701

IGTree: Using Trees for Compression and Classification in Lazy Learning Algorithms.

Author(s): Walter Daelemans, Antal van den Bosch, Ton Weijters
Reference: D. Aha (ed.) Artificial Intelligence Review, special issue on Lazy Learning, 1996.

Postscript

We describe the IGTree learning algorithm, which compresses an instance base into a tree structure. The concept of information gain is used as a heuristic function for performing this compression. IGTree produces trees that, compared to other lazy learning approaches, reduce storage requirements and the time required to compute classifications. Furthermore, we obtained similar or better generalization accuracy with IGTree when trained on two complex linguistic tasks, viz. letter-phoneme transliteration and part-of-speech tagging, when compared to alternative lazy learning and decision tree approaches (viz., IB1, information-gain-weighted IB1, and C4.5). A third experiment, with the task of word hyphenation, demonstrates that when the mutual differences in information gain of features are too small, IGTree as well as information-gain-weighted IB1 perform worse than IB1. These results indicate that IGTree is a useful algorithm for problems characterized by the availability of a large number of training instances described by symbolic features with sufficiently differing information gain values.
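
The gain heuristic that IGTree orders its tests on can be sketched directly (standard information gain over symbolic features; a minimal illustration rather than the ILK implementation):

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(instances, feature):
        # Gain = class entropy minus the value-weighted entropy that
        # remains after splitting on the given feature position.
        labels = [label for _, label in instances]
        by_value = {}
        for feats, label in instances:
            by_value.setdefault(feats[feature], []).append(label)
        remainder = sum(len(ls) / len(labels) * entropy(ls)
                        for ls in by_value.values())
        return entropy(labels) - remainder

    data = [(("a", "x"), "1"), (("a", "y"), "1"),
            (("b", "x"), "0"), (("b", "y"), "0")]
    print(information_gain(data, 0), information_gain(data, 1))   # 1.0 0.0

Features are then examined in order of decreasing gain, so the most informative feature forms the top level of the tree.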


ILK-9702

Memory-Based Learning: Using Similarity for Smoothing

Author(s): Jakub Zavrel & Walter Daelemans
Reference: to appear in: Proc. of 35th Annual Meeting of the ACL, Madrid, July 1997
Postscript

This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS-tagging. Our method achieves state-of-the-art performance in both domains, and allows the easy integration of diverse information sources, such as rich lexical representations. 


ILK-9703

Resolving PP Attachment Ambiguities with Memory-Based Learning

Author(s): Jakub Zavrel, Walter Daelemans & Jorn Veenstra
Reference: submitted to: Proc. of Computational Linguistics in the Netherlands 1996 (CLIN96)

Postscript

In this paper we describe the application of Memory-Based Learning to the problem of Prepositional Phrase attachment disambiguation. We compare the Memory-Based Learning method, which keeps examples in memory and generalizes by using intelligent similarity metrics, with a number of recently proposed statistical methods that are well suited for large numbers of features. We evaluate our methods on a common benchmark dataset that was first used in Ratnaparkhi et al. (1994). Our method compares favorably to previous methods, and is well-suited for incorporating various unconventional representations for word patterns such as value difference metrics and Lexical Space.


ILK-9704

TreeTalk-D: a Machine Learning Approach to Dutch Grapheme-to-Phoneme Conversion

Author: G.J.Busser.

This paper focuses on applying the IGTree learning algorithm to Grapheme-to-Phoneme Conversion (GPC) for Dutch. The architecture of the implemented system will be discussed with respect to linguistic theory. The system exhibits state-of-the-art performance.
Contact the author for the current draft version. 


ILK-9706

Empirical Learning of Natural Language Processing Tasks.

Author(s): Daelemans, W., A. van den Bosch, and T. Weijters.
Reference: M. van Someren and G. Widmer (eds.) Machine Learning: ECML-97, Lecture Notes in Artificial Intelligence 1224, Berlin: Springer, 337-344, 1997.

Postscript

Language learning has thus far not been a hot application for machine-learning (ML) research. This limited attention to work on empirical learning of language knowledge and behaviour from text and speech data seems unjustified. After all, it is becoming apparent that empirical learning of Natural Language Processing (NLP) can alleviate NLP's all-time main problem, viz. the knowledge acquisition bottleneck: empirical ML methods such as rule induction, top-down induction of decision trees, lazy learning, inductive logic programming, and some types of neural network learning seem to be excellently suited to automatically induce exactly that knowledge that is hard to gather by hand. In this paper we address the question why NLP is an interesting application for empirical ML, and provide a brief overview of current work in this area.


ILK-9707

Data Mining as a Method for Linguistic Analysis: Dutch Diminutives.

Author(s): Daelemans, W., P. Berck, & S. Gillis.
Reference: Folia Linguistica , XXXI/1-2, 57-75, 1997.

[see early workshop proceedings version]

 We propose to use data mining techniques (inductive techniques for the automatic acquisition of comprehensible knowledge from data) as a method in linguistic analysis. In the past, such techniques have mainly been used in linguistic engineering applications to solve knowledge acquisition bottlenecks. In this paper we show that they can also assist in linguistic theory formation by providing a new tool for the evaluation of linguistic hypotheses, for the extraction of rules from corpora, and for the discovery of useful linguistic categories. By applying a rule induction method to a particular linguistic task (diminutive formation in Dutch) we show that data mining techniques can be used to test linguistic hypotheses about this morphological process, and to discover interesting morphological and phonological rules and categories. 


ILK-9708

A Feature-Relevance Heuristic for Indexing and Compressing Large Case Bases.

Author(s): Daelemans, W., A. van den Bosch, and J. Zavrel.
Reference: M. van Someren and G. Widmer (eds.) 9th European Conference on Machine Learning -- Poster Papers. Prague: Laboratory of Intelligent Systems, 29-38, 1997.

 Postscript

 This paper reports results with IGTree, a formalism for indexing and compressing large case bases in Instance-Based Learning (IBL) and other lazy-learning techniques. The concept of information gain (entropy minimisation) is used as a heuristic feature-relevance function for performing the compression of the case base into a tree. IGTree reduces storage requirements and the time required to compute classifications considerably for problems where current IBL approaches fail for complexity reasons. Moreover, generalisation accuracy is often similar, for the tasks studied, to that obtained with information-gain-weighted variants of lazy learning, and alternative approaches such as C4.5. Although IGTree was designed for a specific class of problems --linguistic disambiguation problems with symbolic (nominal) features, huge case bases, and a complex interaction between (sub)regularities and exceptions-- we show in this paper that the approach has a wider applicability when generalising it to TRIBL, a hybrid combination of IGTree and IBL. 


ILK-9709

Skousen's Analogical Modeling Algorithm: A comparison with Lazy Learning.

Author(s): Daelemans, W., S. Gillis, and G. Durieux.
Reference: D. Jones and H. Somers (eds.) New Methods in Language Processing., London: University College Press, 3-15, 1997.

 [see workshop proceedings version]

We provide a qualitative and empirical comparison of Skousen's Analogical Modeling algorithm (AM) with Lazy Learning (LL) on a typical Natural Language Processing task. AM incorporates an original approach to feature selection and to the handling of symbolic, unordered feature values. More specifically, it provides a method to dynamically compute an optimally-sized set of nearest neighbours (the analogical set) for each test item, on the basis of which the most plausible category can be selected. We investigate the algorithm's generalisation accuracy and its tolerance to noise and compare it to Lazy Learning techniques on a primary stress assignment task in Dutch. The latter problem is typical for a large amount of classification problems in Natural Language Processing. It is shown that AM is highly successful in performing the task: it outperforms Lazy Learning in its basic scheme. However, LL can be augmented so that it performs at least as well as AM and becomes equally noise-tolerant.

1996 

Language-Independent Data-Oriented Grapheme-to-Phoneme Conversion

Author(s): Walter Daelemans, Antal van den Bosch
Reference: Van Santen, J., R. Sproat, J. Olive, and J. Hirschberg (eds.) Progress in Speech Synthesis. New York: Springer Verlag, 77-90, 1996.

Postscript

 We describe an approach to grapheme-to-phoneme conversion which is both language-independent and data-oriented. Given a set of examples (spelling words with their associated phonetic representation) in a language, a grapheme-to-phoneme conversion system is automatically produced for that language which takes as its input the spelling of words, and produces as its output the phonetic transcription according to the rules implicit in the training data. We describe the design of the system, and compare its performance to knowledge-based and alternative data-oriented approaches.


Abstraction Considered Harmful: Lazy Learning of Language Processing.

Author(s): Walter Daelemans
Reference: van den Herik, J. and T. Weijters (eds.) Benelearn-96. Proceedings of the 6th Belgian-Dutch Conference on Machine Learning. MATRIKS: Maastricht, The Netherlands, 3-12, 1996.

Postscript

No Abstract 


Morphological Analysis as Classification: an Inductive-Learning Approach.

Author(s): Antal van den Bosch, Walter Daelemans and Ton Weijters
Reference: Oflazer, K. and H. Somers (eds.) NeMLaP-2. Proceedings of the Second International Conference on New Methods in Language Processing, Ankara, Turkey, 79-89, 1996.

Postscript

Morphological analysis is an important subtask in text-to-speech conversion, hyphenation, and other language engineering tasks. The traditional approach to performing morphological analysis is to combine a morpheme lexicon, sets of (linguistic) rules, and heuristics to find a most probable analysis. In contrast we present an inductive learning approach in which morphological analysis is reformulated as a segmentation task. We report on a number of experiments in which five inductive learning algorithms are applied to three variations of the task of morphological analysis. Results show (i) that the generalisation performance of the algorithms is good, and (ii) that the lazy learning algorithm ib1-ig performs best on all three tasks. We conclude that lazy learning of morphological analysis as a classification task is indeed a viable approach; moreover, it has the strong advantages over the traditional approach of avoiding the knowledge-acquisition bottleneck, being fast and deterministic in learning and processing, and being language-independent.


Unsupervised Discovery of Phonological Categories through Supervised Learning of Morphological Rules

Author(s): Walter Daelemans, Peter Berck and Steven Gillis.
Reference: Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), Copenhagen, Denmark, 95-100, 1996.

Postscript

 We describe a case study in the application of symbolic machine learning techniques for the discovery of linguistic rules and categories. A supervised rule induction algorithm is used to learn to predict the correct diminutive suffix given the phonological representation of Dutch nouns. The system produces rules which are comparable to rules proposed by linguists. Furthermore, in the process of learning this morphological task, the phonemes used are grouped into phonologically relevant categories. We discuss the relevance of our method for linguistics and language technology.


Artificial Intelligence Models of Language Processing

Author(s): Walter Daelemans and Koen De Smedt.
Reference: In T. Dijkstra and K. De Smedt (eds.), Computational Psycholinguistics: AI and Connectionist models of human language processing. London: Taylor & Francis, 24-48, 1996.

No abstract 


MBT: A Memory-Based Part of Speech Tagger-Generator

Author(s): Walter Daelemans, Jakub Zavrel, Peter Berck and Steven Gillis.
Reference: E. Ejerhed and I. Dagan (eds.) Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen, Denmark, 14-27, 1996.

Postscript

We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, diminishing development time for the construction of a tagger considerably. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus that suffices for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has as additional advantage that optimal context size for disambiguation is dynamically computed.
 
1995 

A Computational Model of P&P: Dresher and Kaye (1990) revisited.

Author(s): Steven Gillis, Gert Durieux, Walter Daelemans.
Reference: M. Verrips & F. Wijnen (eds.) Approaches to Parameter Setting. Amsterdam Studies in Child Language Development, vol 4, 135-173, 1995.

Postscript

 Language acquisition research in the Universal Grammar tradition has witnessed a wealth of studies focusing on various aspects of phonology and syntax. The concept of parameter setting as the core of acquisition is at the heart of these studies. As a methodology, computational modeling has hardly given rise to experimental studies that actually implement the theoretical constructs invoked by and utilized in acquisition studies. Nevertheless, computer modeling is a powerful tool for studying highly complex phenomena such as the intricate interactions between the language acquisition data and the process of parameter setting. A notable exception to this situation is Dresher & Kaye's (1990) computational model YOUPIE that incorporates a UG approach to the acquisition of a phonological subsystem, i.e. stress assignment as it is treated in metrical phonology. We analyze this model focusing mainly on the learning theory incorporated in the model, i.e. the way in which UG mediates between the data and the grammar constructed by the learner. This investigation will focus on two aspects of the learning theory. First of all, the requirements formulated with respect to the learning theory will be evaluated against their implementation in the actual model. We will note several mismatches between the two. Secondly, we present an empirical test of the model. The model's production component is used to generate a 'language' for each possible parameter setting. Then, the model's learning component is used to acquire the grammar of each individual language. The outcome of the experiment reveals several problems in empirical coverage of the model, and relates some of them to inherent design choices.


The Profit of Learning Exceptions.

Author(s): Antal van den Bosch, Ton Weijters, Jaap van den Herik, Walter Daelemans
Reference: Proceedings of the 5th Belgian-Dutch Conference on Machine Learning, BENELEARN'95, p. 118-126, 1995.

Postscript

For many classification tasks, the set of available task instances can be roughly divided into regular instances and exceptions. We investigate three learning algorithms that apply a different method of learning with respect to regularities and exceptions, viz. (i) back-propagation, (ii) cascade back-propagation (a constructive version of back-propagation), and (iii) information-gain tree (an inductive decision-tree algorithm). We compare the bias of the algorithms towards learning regularities and exceptions, using a task-independent metric for the typicality of instances. We have found that information-gain tree is best capable of learning exceptions. However, it outperforms back-propagation and cascade back-propagation only when trained on very large training sets.


Linguistics as Data Mining: Dutch Diminutives

Author(s): Walter Daelemans, Peter Berck and Steven Gillis
Reference: Andernach, T., M. Moll, and A. Nijholt (eds). CLIN V, Papers from the Fifth CLIN Meeting, 59-72, 1995.

Postscript

There are several different ways data mining (the induction of knowledge from data) can be applied to the problem of natural language processing. In the past, data mining techniques have mainly been used in linguistic engineering applications to solve knowledge acquisition bottlenecks. In this paper, we show that they can also assist in linguistic theory formation by providing a new tool for the evaluation of linguistic hypotheses, for the extraction of rules from corpora, and for the discovery of useful linguistic categories. Applying Quinlan's C4.5 inductive machine learning method to a particular linguistic task (diminutive formation in Dutch) we show that data mining techniques can be used (i) to test linguistic hypotheses about this process, and (ii) to discover interesting linguistic rules and categories.


Memory-Based Lexical Acquisition and Processing.

Author(s): Walter Daelemans
Reference: P. Steffens (ed.) Machine Translation and the Lexicon, Springer Lecture Notes in Artificial Intelligence 898, 85-98, 1995.

Postscript

Current approaches to computational lexicology are knowledge-based (competence-oriented) and try to abstract away from specific formalisms, domains, and applications. This results in severe complexity, acquisition and reusability bottlenecks. As an alternative, we propose a particular performance-oriented approach to Natural Language Processing based on automatic memory-based learning of linguistic (lexical) tasks. The consequences of the approach for computational lexicology are discussed, and the application of the approach to a number of lexical acquisition and disambiguation tasks in phonology, morphology and syntax is described.

1994 

Measuring the Complexity of Writing Systems

Author(s): Antal van den Bosch, Alain Content, Walter Daelemans, and Beatrice De Gelder
Reference: Preprint of paper in Journal of Quantitative Linguistics, 1, 3, 178-188, 1994.

Postscript

We propose a quantitative operationalisation of the complexity of a writing system. This complexity, also referred to as orthographic depth, plays a crucial role in psycholinguistic modelling of reading aloud (and learning to read aloud) in several languages. The complexity of a writing system is expressed by two measures: that of the complexity of letter-phoneme alignment and that of the complexity of grapheme-phoneme correspondences. We present the alignment problem and the correspondence problem as tasks to three different data-oriented learning algorithms, and submit them to English, French and Dutch learning and testing material. Generalisation performance metrics are used to propose for each corpus a two-dimensional writing system complexity value.


Default Inheritance in an Object-Oriented Representation of Linguistic Categories.

Author(s): Walter Daelemans and Koen De Smedt
Reference: International Journal Human-Computer Studies 41, 149-177, 1994.

We describe an object-oriented approach to the representation of linguistic knowledge. Rather than devising a dedicated grammar formalism, we explore the use of powerful but domain-independent object-oriented languages. We use default inheritance to organize regular and exceptional behavior of linguistic categories. Examples from our work in the areas of morphology, syntax and the lexicon are provided. Special attention is given to multiple inheritance, which is used for the composition of new categories out of existing ones, and to structured inheritance, which is used to predict, among other things, to which rule domain a word form belongs.
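
As a hedged illustration of the core idea (rendered here in Python rather than the object-oriented language of the paper, with toy morphology): regular behaviour lives on a general category, and exceptional subcategories override, i.e. defeat, the inherited default.

    class Noun:
        def plural(self, stem):
            return stem + "s"          # default rule for the category

    class EnNoun(Noun):                # exceptional subcategory
        def plural(self, stem):
            return stem + "en"         # overrides the inherited default

    print(Noun().plural("cat"))        # cats (default applies)
    print(EnNoun().plural("ox"))       # oxen (exception wins)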


A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion.

Author(s): Walter Daelemans and Antal van den Bosch
Reference: Proceedings of the ESCA-IEEE conference on Speech Synthesis, New York, 199-203, 1994.

Postscript

We report on an implemented grapheme-to-phoneme conversion architecture. Given a set of examples (spelling words with their associated phonetic representation) in a language, a grapheme-to-phoneme conversion system is automatically produced for that language which takes as its input the spelling of words, and produces as its output the phonetic transcription according to the rules implicit in the training data. This paper describes the architecture and focuses on our solution to the alignment problem: given the spelling and the phonetic transcription of a word (often differing in length), these two representations have to be aligned in such a way that grapheme symbols or strings of grapheme symbols are consistently associated with the same phonetic symbol. If this alignment has to be done by hand, it is extremely labour-intensive.
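
The alignment problem admits a small worked sketch (a toy dynamic program of my own, not the solution described in the paper; it assumes the spelling is at least as long as the transcription): insert null phonemes so both strings get equal length, preferring alignments where letter and phoneme happen to be the same symbol.

    def align(letters, phonemes):
        # score[i][j]: max number of identical letter/phoneme pairs when
        # aligning the first i letters with the first j phonemes.
        n, m = len(letters), len(phonemes)
        score = [[-1] * (m + 1) for _ in range(n + 1)]
        score[0][0] = 0
        for i in range(n):
            for j in range(min(i, m) + 1):
                if score[i][j] < 0:
                    continue
                if j < m:   # letter i realised as phoneme j
                    s = score[i][j] + (letters[i] == phonemes[j])
                    score[i + 1][j + 1] = max(score[i + 1][j + 1], s)
                # letter i realised as a null phoneme
                score[i + 1][j] = max(score[i + 1][j], score[i][j])
        aligned, i, j = [], n, m   # backtrace
        while i > 0:
            if j > 0 and score[i][j] == score[i - 1][j - 1] + (letters[i - 1] == phonemes[j - 1]):
                aligned.append(phonemes[j - 1])
                i, j = i - 1, j - 1
            else:
                aligned.append("-")
                i -= 1
        return list(reversed(aligned))

    print(align("booking", "bukIN"))   # ['b', '-', 'u', 'k', '-', 'I', 'N']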

 


Are Children 'Lazy Learners'? A Comparison of Natural and Machine Learning of Stress.

Author(s): Steven Gillis, Walter Daelemans and Gert Durieux
Reference: Ram, A. and Eiselt, K. (eds.) Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, Georgia Institute of Technology, Atlanta, USA, Hillsdale: Lawrence Erlbaum Associates, 369-374, 1994.

Postscript

Do children acquire rules for main stress assignment or do they learn stress in an exemplar-based way? In the language acquisition literature, the former approach has been advocated without exception: although they hear most words produced with their appropriate stress pattern, children are taken to extract rules and do not store stress patterns lexically. The evidence for a rule-based approach is investigated and it will be argued that in the literature this approach is preferred due to an extremely simplified interpretation of exemplar-based models. We will report experiments showing that Instance-Based Learning, an exemplar-based model, makes the same kinds of stress-related errors in production that children make:

  1. the number of production errors is related to metrical markedness, and
  2. stress shifts and errors with respect to the segmental and syllabic structure of words typically take the form of a regularization of stress patterns.
Instance-Based Learning belongs to a class of Lazy Learning algorithms. In these algorithms, no explicit abstractions in the form of decision trees or rules are derived; abstraction is driven by similarity during performance. Our results indicate that at least for this domain, this kind of lazy learning is a valid alternative to rule-based learning. Moreover, the results plead for a reanalysis of language acquisition data in terms of exemplar-based models.

 


Skousen's Analogical Modeling Algorithm: a Comparison with Lazy Learning

Author(s): Walter Daelemans, Steven Gillis and Gert Durieux.
Reference: Jones, D. (ed.) Proceedings of the International Conference on New Methods in Language Processing (NeMLaP), UMIST: Manchester, 1-7, 1994.

Postscript

We provide a qualitative and empirical comparison of Skousen's Analogical Modeling algorithm (AM) with Lazy Learning (LL) on a typical Natural Language Processing task. AM incorporates an original approach to feature selection and to the handling of symbolic, unordered feature values. More specifically, it provides a method to dynamically compute an optimally-sized set of nearest neighbours (the analogical set) for each test item, on the basis of which the most plausible category can be selected. We investigate the algorithm's generalisation accuracy and its tolerance to noise and compare it to Lazy Learning techniques on a primary stress assignment task in Dutch. The latter problem is typical for a large amount of classification problems in Natural Language Processing. It is shown that AM is highly successful in performing the task: it outperforms Lazy Learning in its basic scheme. However, LL can be augmented so that it performs at least as well as AM and becomes equally noise-tolerant.

 


The Acquisition of Stress: a Data-Oriented Approach.

Author(s): Walter Daelemans, Steven Gillis and Gert Durieux.
Reference: Computational Linguistics 20 (3), special issue on Computational Phonology (Steven Bird guest ed.), 421-451, 1994.

A data-oriented (empiricist) alternative to the currently pervasive (nativist) Principles and Parameters approach to the acquisition of stress assignment is investigated. A similarity based algorithm, viz. an augmented version of Instance Based Learning is used to learn the system of main stress assignment in Dutch. In this non-trivial task a comprehensive lexicon of Dutch monomorphemes is used instead of the idealized and highly simplified description of the empirical data used in previous approaches. It is demonstrated that a similarity-based learning method is effective in learning the complex stress system of Dutch. The task is accomplished without the a priori knowledge assumed to pre-exist in the learner in a Principles and Parameters framework. A comparison of the system's behavior with a consensus linguistic analysis (in the framework of Metrical Phonology) shows that ease of learning correlates with decreasing degrees of markedness of metrical phenomena. It is also shown that the learning algorithm captures subregularities within the stress system of Dutch which cannot be described without going beyond some of the theoretical assumptions of metrical phonology.

 


Learnability and Markedness: Dutch Stress Assignment.

Author(s): Steven Gillis, Walter Daelemans, Gert Durieux and Antal van den Bosch.
Reference: Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, Boulder Colorado, USA, Hillsdale: Lawrence Erlbaum Associates, 452-457, 1993.

Postscript

 This paper investigates the computational grounding of learning theories developed within a metrical phonology approach to stress assignment. In current research the Principles and Parameters approach to learning stress is pervasive. We point out some inherent problems associated with this approach in learning the stress system of Dutch. The paper focuses on two specific aspects of the learning task: we empirically investigate the effect of input encodings on learnability, and we examine the possibility of a data-oriented approach as an alternative to the Principles and Parameters approach. We show that a data-oriented similarity-based machine learning technique (Instance-Based Learning), working on phonemic input encodings is able to learn metrical phonology abstractions based on concepts like syllable weight, and that in addition, it is able to extract generalizations which cannot be expressed within a metrical framework.

 


Tabtalk: Reusability in Data-Oriented Grapheme-to-Phoneme Conversion.

Author(s): Walter Daelemans and Antal van den Bosch.
Reference: Proceedings of Eurospeech, Berlin, 1459-1466, 1993.

Postscript

 In the traditional (knowledge-based) approach to the design of grapheme-to-phoneme modules in text-to-speech systems, it is claimed that various explicitly coded, language-specific, linguistic knowledge sources are necessary for a good performance. Due to knowledge acquisition bottlenecks, this implies long development cycles. As an alternative, we propose to use inductive methods from machine learning in a simple combined Trie Search and Similarity-Based Reasoning approach and show that, for Dutch, its performance is better than that of the knowledge-based approach and backpropagation learning. Furthermore, we show that our approach is reusable for any language for which a training corpus exists.

1993 

Data-Oriented Methods for Grapheme-to-Phoneme Conversion

Author(s): Antal van den Bosch and Walter Daelemans.
Reference: Proceedings of the Sixth conference of the European chapter of the ACL, ACL, 45-53, 1993.

Postscript

It is traditionally assumed that various sources of linguistic knowledge and their interaction should be formalised in order to be able to convert words into their phonemic representations with reasonable accuracy. We show that using supervised learning techniques, based on a corpus of transcribed words, the same and even better performance can be achieved, without explicit modeling of linguistic knowledge. In this paper we present two instances of this approach. A first model implements a variant of instance-based learning, in which a weighted similarity metric and a database of prototypical exemplars are used to predict new mappings. In the second model, grapheme-to-phoneme mappings are looked up in a compressed text-to-speech lexicon (table lookup) enriched with default mappings. We compare performance and accuracy of these approaches to a connectionist (backpropagation) approach and to the linguistic knowledge-based approach.

 


Learnability and Markedness in Data-Driven Acquisition of Stress

Author(s): Walter Daelemans, Steven Gillis, Gert Durieux and Antal van den Bosch.
Reference: T. Mark Ellison and James M. Scobbie (eds) Computational Phonology. Edinburgh Working Papers in Cognitive Science 8, 1993, 157-178.

Postscript

 This paper investigates the computational grounding of learning theories developed within a metrical phonology approach to stress assignment. In current research, the Principles and Parameters approach to learning stress is pervasive. We point out some inherent problems associated with this approach in learning the stress system of a particular language by setting parameters (the case of Dutch), which is shown to be an inherently noisy problem. The paper focuses on two aspects of this problem: we empirically examine the effect of input encodings on learnability, and we investigate the possibility of a data-oriented approach as an alternative to the principles and parameters approach. We show that data-oriented similarity-based machine learning techniques like Backpropagation Learning, Instance-Based Learning and Analogical Modeling working on phonemic input encodings

  1. are able to learn metrical phonology abstractions based on concepts like syllable weight,
  2. that their performance can be related to various degrees of markedness of metrical phenomena, and
  3. that in addition, they are able to extract generalizations which cannot be expressed within the metrical framework without recourse to lexical marking.
We also provide a quantitative comparison of the performance of the three algorithms investigated.

1992 

Generalization Performance of Backpropagation Learning on a Syllabification Task.

Author(s): Walter Daelemans and Antal van den Bosch.
Reference: M.F.J. Drossaers and A. Nijholt (eds.) Connectionism and Natural Language Processing. Proceedings Third Twente Workshop on Language Technology, 27-38, 1992.

Postscript

We investigated the generalization capabilities of backpropagation learning in feed-forward and recurrent feed-forward connectionist networks on the assignment of syllable boundaries to orthographic representations in Dutch (hyphenation). This is a difficult task because phonological and morphological constraints interact, leading to ambiguity in the input patterns. We compared the results to different symbolic pattern matching approaches, and to an exemplar-based generalization scheme, related to a k-nearest neighbour approach, but using a similarity metric weighted by the relative information entropy of positions in the training patterns. Our results indicate that the generalization performance of backpropagation learning for this task is not better than that of the best symbolic pattern matching approaches, and of exemplar-based generalization.

Last update: 18 Aug 99