Entwicklung einer lexikographischen Datenbank für die Verben des Deutschen (Development of a lexicographic database for German verbs)

Sabine Buchholz

MA thesis in computational linguistics; University of the Saarland, July 1996; written in German (205 pages, including 51 pages of programs and 10 screenshots)

This MA thesis describes a database that stores syntactic information on German verbs in a form that allows it to be reused in NLP systems. The thesis consists of a theoretical and a practical part. The theoretical part surveys what is considered syntactic information in the literature on this topic from different backgrounds (theoretical linguistics, computational linguistics, lexicography). Apart from the choice of auxiliary in the perfect tense and the possibilities for passivization, the main focus is on the valency of a verb. To ensure theory-neutrality, this concept is interpreted rather broadly: it comprises NPs in all four cases, PPs, adverbs and verbal complements; phenomena such as expletives, reflexives and predicatives; and more semantic features, e.g. the different kinds of "free" datives, the distinction between temporal, local and other adverbs, and the control properties of control verbs. The practical part describes the design and implementation of the resulting database in the database management system Microsoft Visual FoxPro. To populate the new database, all the syntactic information from the older German NLP lexicon SADAW (which contains about 12,000 verbs in addition to other parts of speech) is imported into it; the difficulties of this step are described in detail. Finally, a graphical user interface to the database is implemented in Visual FoxPro. It presents the information stored for the selected verb in the form of standard German sentences. This frees the user from having to know the internal representation of the data and lets her draw on her native-speaker competence in German. At the same time, the sentences act as tests that define how the information in the database is to be interpreted, thereby guaranteeing its consistency and enabling its reuse.
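The idea of storing auxiliary choice, passivizability and a valency frame per verb, and of showing that information to the user as German test sentences, can be illustrated with a small sketch. This is not the thesis's Visual FoxPro schema or interface; it is a hypothetical Python illustration in which the names Slot, VerbEntry, perfect_test and passive_test are invented, only NP complements are covered, and the placeholder words (jemand, jemandem, etwas, einer Sache) are an assumed lexicographic convention.

```python
from dataclasses import dataclass, field
from enum import Enum


class Slot(Enum):
    """Hypothetical complement types: only the four NP cases, a small subset
    of the broad valency notion described in the abstract."""
    NOM = "jemand"       # nominative NP (subject)
    AKK = "etwas"        # accusative NP
    DAT = "jemandem"     # dative NP
    GEN = "einer Sache"  # genitive NP


@dataclass
class VerbEntry:
    """Hypothetical lexical entry for one verb."""
    lemma: str
    participle: str        # past participle used in the test sentences
    auxiliary: str         # perfect-tense auxiliary: "haben" or "sein"
    passivizable: bool     # whether a werden-passive is possible
    frame: list = field(default_factory=list)  # ordered Slots, subject first


# Third-person singular present forms of the two perfect auxiliaries.
FINITE_AUX = {"haben": "hat", "sein": "ist"}


def _sentence(words):
    """Join words, capitalize the first letter and add a full stop."""
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


def perfect_test(entry):
    """Render the frame as a perfect-tense sentence using the stored auxiliary."""
    subject, *objects = entry.frame
    words = [subject.value, FINITE_AUX[entry.auxiliary]]
    words += [slot.value for slot in objects]
    words.append(entry.participle)
    return _sentence(words)


def passive_test(entry):
    """Render a werden-passive test sentence, or None if the verb does not passivize."""
    if not entry.passivizable:
        return None
    objects = entry.frame[1:]  # the active subject is suppressed in the passive
    if Slot.AKK in objects:
        # Personal passive: the accusative object is promoted to subject.
        rest = [slot.value for slot in objects if slot is not Slot.AKK]
        words = [Slot.AKK.value, "wird", *rest, entry.participle]
    else:
        # Impersonal passive with the expletive "es".
        words = ["es", "wird", *[slot.value for slot in objects], entry.participle]
    return _sentence(words)
```

For example, an entry for geben with the frame [Slot.NOM, Slot.DAT, Slot.AKK], auxiliary "haben" and passivizable set to True yields "Jemand hat jemandem etwas gegeben." and "Etwas wird jemandem gegeben.", while helfen with [Slot.NOM, Slot.DAT] yields the impersonal passive "Es wird jemandem geholfen." A native speaker can judge such sentences directly, which is the point the abstract makes about the interface: a wrong auxiliary or frame shows up as an ungrammatical sentence rather than as an opaque database value.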