Obtain data
We had agreements with many treebank providers allowing us to distribute versions of the treebanks in our data format to shared task participants. Now that the shared task is over, this is no longer possible. Those who wish to use the shared task data in the future will have to acquire it from a variety of sources. This page explains how. If anything is unclear or you experience problems with any of the steps, please contact us (conll06st@uvt.nl). We hope that the resources we created will continue to be accessible and useful for multilingual dependency parsing research.
- Danish, Dutch, Portuguese, Swedish: These data sets can be downloaded here. That page also explains our overall data organisation.
- Arabic, Czech: These data sets are distributed by the Linguistic Data Consortium (LDC). Please contact them and ask for catalogue numbers LDC2006E01 (CoNLL Shared Task Training Data) and LDC2006E02 (CoNLL Shared Task Test Data). Our contact person at LDC for the shared task was Tony Castelletto. Note: These data sets require paying a license fee or being an LDC member in the relevant years.
- Bulgarian: Please contact Kiril Simov of the BulTreebank and ask for the following files: conll06_data_bulgarian_bultreebank_train.tar.bz2, conll06_data_bulgarian_bultreebank_test_blind.tar.bz2, conll06_data_bulgarian_bultreebank_test.tar.bz2
- Spanish: Please contact Ma Antonia Martí (amarti at ub.edu) of the Cast3LB treebank and ask for the following files: conll06_data_spanish_cast3lb_train.tar.bz2, conll06_data_spanish_cast3lb_test_blind.tar.bz2, conll06_data_spanish_cast3lb_test.tar.bz2
- German: If you have a valid TIGER treebank license, go to the password-protected treebank download page and download the file conll06_data_german_tiger_train+test.tar.bz2.
- Japanese: Verbmobil treebank: Please acquire the license as explained on http://www.sfs.uni-tuebingen.de/en/tuebajs.shtml and download the following files: conll06_data_japanese_verbmobil_train.tar.bz2, conll06_data_japanese_verbmobil_test_blind.tar.bz2, conll06_data_japanese_verbmobil_test.tar.bz2
- Slovene: Please contact Tomaz Erjavec (tomaz.erjavec at ijs.si) of the SDT and ask for the following files: conll06_data_slovene_sdt_1.0_train.tar.bz2, conll06_data_slovene_sdt_1.0_test_blind.tar.bz2, conll06_data_slovene_sdt_1.0_test.tar.bz2
- Chinese: Sinica treebank: Please contact Academia Sinica and ask for the following files: conll06_data_chinese_sinica_train_v1.2.tar.bz2, conll06_data_chinese_sinica_test.tar.bz2. Our contact persons at Academia Sinica for the shared task were Yu-Ming Hsieh and Keh-Jiann Chen. Note: The Sinica treebank requires paying a license fee.
- Turkish: If you have a valid METU-Sabanci treebank license, go to the password-protected treebank download page and download the files: conll06_data_turkish_metu_sabanci_1.0_test_blind.tar.bz2, conll06_data_turkish_metu_sabanci_1.0_test.tar.bz2, conll06_data_turkish_metu_sabanci_1.0_train.tar.bz2
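All of the files listed above are bzip2-compressed tar archives. Once you have collected them, they can be unpacked in one go; the following is a minimal Python sketch, not part of the shared task software. The function name and the directory names `downloads` and `conll06_data` are hypothetical, and no assumption is made about the directory layout inside each archive.

```python
import tarfile
from pathlib import Path


def unpack_archives(archive_dir, target_dir):
    """Extract every .tar.bz2 archive found in archive_dir into target_dir.

    Returns the sorted file names of the archives that were unpacked.
    """
    archive_dir = Path(archive_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Use the safer "data" extraction filter where the Python version has it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

    unpacked = []
    for archive in sorted(archive_dir.glob("*.tar.bz2")):
        with tarfile.open(archive, mode="r:bz2") as tar:
            tar.extractall(path=target_dir, **extra)
        unpacked.append(archive.name)
    return unpacked


if __name__ == "__main__":
    # Hypothetical directory names; adjust to where you saved the archives.
    for name in unpack_archives("downloads", "conll06_data"):
        print("unpacked", name)
```

The function only touches files ending in .tar.bz2, so other files saved in the same directory are left alone.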
Still to do: Release software and instructions for creating these data sets yourself from the original treebanks.