Possible approaches
Ideally, participants of the shared task will be people who have already worked in dependency parsing (possibly even have a suitable parser) as well as those who do not have practical experience yet. To help especially the latter group, this page tries to give an overview of how the task of dependency parsing could be approached by using machine learning. We also provide references and (still to be done) pointers to available software.
Caveat: This text was written to the best of my knowledge. Please let me know any errors or omissions. Sabine.
Nivre
The algorithm is a dependency version of a shift-reduce parser. The shift step is similar to that in a shift reduce parser for context free grammars: the leftmost token of the remaining input is shifted onto the stack. There is also a reduce, which just pops one token from the stack. In addition, there are two ways to introduce labelled dependency relations. Either the dependent is the token on top of the stack and its head is the next input token. After recording the dependency, the dependent is reduced. Or the dependent is the next input token and its head is on top of the stack. Then, after recording the dependency, the dependent is shifted. The final parsing result will be a labelled projective dependency structure. Non-projective structures have to be projectivized for training, and additional information is encoded in the dependency labels to allow de-projectivization after predicting structures for test sentences. The algorithm is implemented in MaltParser.
Update In the latest version (0.3), input and output formats are compatible with the CoNLL-X shared task format, with fields separated by a single tab (not blanks). Note, however, that currently only the fields FORM, POSTAG and DEPREL can be used as features in the current version. In a later version, the plan is to incorporate LEMMA, CPOSTAG and FEATS as well.
Covington
An incremental parsing algorithm with backtracking, assuming a grammar is given. Can derive non-projective structures but can also be restricted to projective structures only. This algorithm is also implemented in MaltParser.
Matsumoto and colleagues
In this approach, dependency structures are constructed bottom-up by deciding repeatedly (from left to right) for each pair of adjacent remaining words whether one of these words is the dependent of the other (and which one). If so, a dependency relation is recorded and the dependent is deleted from the input. The original algorithm produces unlabelled dependencies but it can easily be modified to predict a label together with the dependency decision. For non-projective structures, the same projectivization/de-projectivization approach as for Nivre's parser can be used. The parser described in the Kudo & Matsumoto papers is called CaboCha and can be downloaded.
McDonald et al. plus labelling
For each pair of a word x and another word y in the sentence, a score is computed of how likely it is that x has y as its head. These scores are then used in a dynamic programming parser to find the most likely unlabelled, projective or non-projective dependency structure of the sentence. The output of that step could then form the input to a function labeller, which assigns a function to each word. The algorithm is implemented in MSTParser.
Eisner
Eisner's Model C is similar to the generative model used by Collins and Charniak. However, he also proposes some alternative models (A, B, D). The original description is for unlabelled dependency parsing, but later papers (??) describe an extension for labelled dependencies. As with Nivre's and Matsumoto's approach, projectivization/de-projectivization can be used to cope with non-projective structures.
inspired by Eckhard Bick's constraint grammar (CG) approach
Constraint Grammar uses hand-written rules to determine the syntactic functions of words. Functions often incorporate the direction of the dependency, indicated by the '>' and '<' symbols. E.g. >N could mean a premodifier in an NP, and P< an argument of a preposition (in a language where arguments follow heads). The output of the CG parser is then used as input into either a phrase structure parser or a dependency specifier. The latter decides for each word which other word is its head (see http://corp.hum.sdu.dk/arboretum_about.html). So a machine learner could be trained to first predict the function and the direction. The output of that step would then form the input to a head finder, which determines the actual head of each word. It depends on how that second steps is implemented whether the result is always projective or can be non-projective.
Collins et al. plus labelling
Collins' parser is a well-known parser for treebanks that have a phrase structure annotation. To train that parser on the Czech Dependency Treebank, Collins et al. transformed the dependency structure into a phrase structure by creating non-terminals and labelling them based on the coarse-grained part-of-speech of the head (e.g. if the head is a noun, the non-terminal label becomes NP, if the head is an adjective, the label becomes AP). For scoring, the phrase structure trees predicted for the test set have to be converted back to dependencies. This will only result in unlabelled dependencies, or in labels based on the phrase structure annotation (such as: PP to noun head in NP). Proper dependency functions would have to be assigned in an additional labelling step. To derive non-projective structures, Collins' model 3 would have to be used to predict gaps.
References
Bick
-
More information forthcoming; for now see http://beta.visl.sdu.dk/~eckhard/Artikeloversigt.html
-
Bikel
on English, Chinese; software description: Daniel M. Bikel. Design of a Multi-lingual, Parallel-processing Statistical Parsing Engine, HLT 2002.
Charniak
on English, technical report: Eugene Charniak, A Maximum-Entropy-Inspired parser. Technical Report CS-99-12, Brown University, August. 1999.
on English, paper: Eugene Charniak, A Maximum-Entropy-Inspired Parser, NAACL 2000
Chiang
on Chinese: Daniel M. Bikel and David Chiang. 2000. Two Statistical Parsing Models Applied to the Chinese Treebank in the proceedings of the Second Chinese Language Processing Workshop
Collins
on English, paper: Michael John Collins, A New Statistical Parser Based on Bigram Lexical Dependencies, ACL 1996
on English, paper: Michael Collins, Three Generative, Lexicalised Models for Statistical Parsing, ACL 1997
on English, thesis: Head-Driven Statistical Models for Natural Language Parsing. PhD Dissertation, University of Pennsylvania, 1999.
on Czech; after transforming dependency trees to phrase structure trees; using case morphology: Michael Collins; Jan Hajic; Lance Ramshaw; Christoph Tillmann, A Statistical Parser for Czech, ACL 1999
on German; alternative probability model: sister head: Amit Dubey; Frank Keller, Probabilistic Parsing for German Using Sister-Head Dependencies, ACL 2003
on Spanish; using some morphology: Brooke Cowan and Michael Collins. Morphology and Reranking for the Statistical Parsing of Spanish, EMNLP 2005.
on French: Abhishek Arun, Frank Keller, Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French, ACL 2005
Covington
Michael A. Covington, Parsing discontinuous constituents with dependency grammar, Computational Linguistics 16:234-236, 1990
Michael A. Covington, A fundamental Algorithm for Dependency Parsing, Proceedings of the 39th Annual ACM Southeast Conference, ed. John A. Miller and Jeffrey W. Smith, pp. 95-102, ACM, 2001
Michael A. Covington, A free-word-order dependency parser in Prolog, 2003
Eisner
on English; paper: Eisner, Jason M, Three new probabilistic models for dependency parsing: An exploration. Proceedings of COLING-96, pp. 340-345, Copenhagen, August. (1996).
on English; technical report: Eisner, Jason M, An empirical comparison of probability models for dependency grammar. Technical report IRCS-96-11, Institute for Research in Cognitive Science, Univ. of Pennsylvania. 18 pp. 1996
algorithm; book chapter: Eisner, Jason (2000). Bilexical grammars and their cubic-time parsing algorithms. In Harry Bunt and Anton Nijholt (eds.), Advances in Probabilistic and Other Parsing Technologies, pp. 29-62. Kluwer, October.
Matsumoto
on Japanese, probabilistic: Taku Kudo; Yuji Matsumoto, Japanese Dependency Structure Analysis Based on Support Vector Machines, EMNLP/VLC 2000
on Japanese, deterministic: Taku Kudo, Yuji Matsumoto, Japanese Dependency Analysis using Cascaded Chunking, CoNLL 2002
on English, deterministic: Hiroyasu Yamada and Yuji Matsumoto, Statistical dependency analysis with Support Vector Machines, In Proc. 8th International Workshop on Parsing Technologies (IWPT), pp.195-206, April 2003.
McDonald
on English and Czech, projective: Ryan McDonald, Koby Crammer and Fernando Pereira, Online Large-Margin Training of Dependency Parsers, 43rd Annual Meeting of the Association for Computational Linguistics, 2005.
on Czech, non-projective: R. McDonald, F. Pereira, K. Ribarov and J. Hajic, Non-Projective Dependency Parsing using Spanning Tree Algorithms, HLT-EMNLP, 2005.
Nivre
on English, projective: Nivre, J. and Scholz, M., Deterministic Dependency Parsing of English Text. In Proceedings of COLING 2004, August 23-27, 2004, Geneva, Switzerland, pp. 64-70, 2004.
Scholz, M., Deterministic Dependency Parsing of English Text. Master's thesis, Växjö University, February 2005, Reports from MSI, Report 05011.on Swedish, projective: Nivre, J., Hall, J. and Nilsson, J., Memory-Based Dependency Parsing. In Ng, H. T. and Riloff, E. (eds.) Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL), May 6-7, 2004, Boston, Massachusetts, pp. 49-56, 2004.
on Czech, pseudo-projective: Nivre, J. and Nilsson, J., Pseudo-Projective Dependency Parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 99-106, 2005.
Ratnaparkhi
on English: Adwait Ratnaparkhi, A Linear Observed Time Statistical Parser Based on Maximum Entropy Models, EMNLP 1997
Other Relevant Papers
Peter Neuhaus; Norbert Bröker, The Complexity of Recognition of Linguistically Adequate Dependency Grammars, ACL 1997
D. Lin. A Dependency-based Method for Evaluating Broad-Coverage Parsers. IJCAI 1995
also: Lin, 2000
More information
Provided by Masayuki ASAHARA (many thanks!) from NAIST in Japan. It is listed here "as is" until I find time to format it properly:> - Yuchang Cheng's work > Currently, we are developping a Chinese dependency analysis. > A recent work is in last year's SIGHAN workshop. > The parsing technique is based on Nivre's work: > > Yuchang Cheng, Masayuki Asahara, Yuji Matsumoto > Chinese Deterministic Dependency Analyzer: Examining Effects > of Global > Features and Root Node Finder > SIGHAN-2005 > http://acl.ldc.upenn.edu/I/I05/I05-3003.pdf > > - Taku Kudo's work: CaboCha > his dependency analyzer released in the following web page > (unfortunately in Japanese): > http://www.chasen.org/~taku/software/cabocha/ > > His article in ACL-2003 is also dependency analyzer related work: > > Taku Kudo, Yuji Matsumoto > Fast Methods for Kernel-based Text Analysis > ACL-2003 > > - Isozaki's work > > Prof. Isozaki developed an English dependency analyzer. > The parsing technique is based on Yamada & Matsumoto's work: > > Hideki Isozaki, Hideto Kazawa, Tsutomu Hirao > A Deterministic Word Dependency Analyzer Enhanced With > Preference Learning COLING-2004 > > - Sassano's work > Sassano developed several Japanese dependency analyzers. > His parsing algorithm is similar with Nivre's model. > I heard that he found the algorithm without seeing Nivre's work. > > Manabu Sassano > Using a Partially Annotated Corpus to Build a Dependency Parser for > Japanese > IJCNLP-2005 > > Manabu Sassano > Linear-Time Dependency Analysis for Japanese > Coling-2004 > > - Uchimoto & Sekine's work > They also developed Japanese dependency analyzers. > > Kiyotaka Uchimoto, Satoshi Sekine, Hiroshi Isahara > Japanese dependency structure analysis based on maximum > entropy models EACL-1999 > > Satoshi Sekine, Kiyotaka Uchimoto, Hitoshi Isahara > Backward Beam Search Algorithm for Dependency Analysis of > Japanese COLING-2000 > > Satoshi Sekine > Japanese Dependency Analysis using a Deterministic Finite > State Transducer COLING-2000 > > - KNP > Participants in CoNLL-X shared task may by interested only in machine > learning-based parser. > But, there is a famous Japanese dependency analyzer --KNP-- > developed in > Univ. of Tokyo. > The parser is based on hand crafted rules. > http://www.kc.t.u-tokyo.ac.jp/nl-resource/knp.html