Identifying Named Entities in Text Databases from the Natural History Domain.

Author(s): Caroline Sporleder, Marieke van Erp, Tijn Porcelijn, Antal van den Bosch and Pim Arntzen

Reference: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC-06), pp. 1742-1745, Genoa, Italy, 2006.

Abstract: In this paper, we investigate whether it is possible to bootstrap a named entity tagger for textual databases by exploiting the database structure to automatically generate domain- and database-specific gazetteer lists. We compare three tagging strategies: (i) using the extracted gazetteers in a look-up tagger, (ii) using the gazetteers to automatically extract training data for a database-specific tagger, and (iii) using a generic named entity tagger. Our results suggest that automatically built gazetteers combined with a look-up tagger yield relatively good performance, and that generic taggers do not perform particularly well on this type of data.
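To make strategy (i) concrete, here is a minimal sketch of how gazetteers might be harvested from database columns and applied by a look-up tagger. The abstract does not say how entries are matched. This sketch makes several assumptions: each database column maps to a single entity type, and matching is greedy, longest-first, and case-insensitive over whitespace tokens, with BIO labels. That is a common choice, but it is not necessarily the authors' method. The function names `build_gazetteers` and `lookup_tag`, the column names, and the entity labels are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Gazetteer = Dict[Tuple[str, ...], str]


def build_gazetteers(records: Iterable[Dict[str, str]],
                     column_to_type: Dict[str, str]) -> Gazetteer:
    """Collect every non-empty value of a typed column as a gazetteer entry.

    Assumption (hypothetical): each column holds entities of exactly one type.
    If the same string occurs in two columns, the last one seen wins.
    """
    gazetteer: Gazetteer = {}
    for record in records:
        for column, entity_type in column_to_type.items():
            value = record.get(column, "").strip()
            if value:
                # Store entries as lower-cased token tuples for case-insensitive matching.
                gazetteer[tuple(value.lower().split())] = entity_type
    return gazetteer


def lookup_tag(tokens: List[str], gazetteer: Gazetteer) -> List[str]:
    """Greedy longest-match look-up tagging that emits BIO labels."""
    max_len = max((len(entry) for entry in gazetteer), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest possible span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in gazetteer:
                match = (n, gazetteer[key])
                break
        if match is None:
            i += 1
            continue
        n, entity_type = match
        tags[i] = "B-" + entity_type
        for j in range(i + 1, i + n):
            tags[j] = "I-" + entity_type
        i += n  # Skip past the matched span so matches never overlap.
    return tags


if __name__ == "__main__":
    # Toy records invented for illustration; not data from the paper.
    records = [{"collector": "Jan de Vries", "location": "Borneo"}]
    gaz = build_gazetteers(records, {"collector": "PER", "location": "LOC"})
    print(lookup_tag("Collected by Jan de Vries on Borneo".split(), gaz))
```

Running the script prints `['O', 'O', 'B-PER', 'I-PER', 'I-PER', 'O', 'B-LOC']`. A pure look-up tagger like this can only recognize entities already present in the database, which is why strategy (ii) uses the gazetteer matches as automatically labeled training data for a learned tagger instead.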
