Identifying Named Entities in Text Databases from the Natural History Domain.
Author(s): Caroline Sporleder, Marieke van Erp, Tijn Porcelijn, Antal van den Bosch and Pim Arntzen
Reference: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC-06), pp. 1742-1745, Genoa, Italy, 2006.
In this paper, we investigate whether it is possible to bootstrap a named
entity tagger for textual databases by exploiting the database structure to
automatically generate domain- and database-specific gazetteer lists. We
compare three tagging strategies: (i) using the extracted gazetteers in a
look-up tagger, (ii) using the gazetteers to automatically extract training
data to train a database-specific tagger, and (iii) using a generic named
entity tagger. Our results suggest that automatically built gazetteers in
combination with a look-up tagger yield relatively good performance, and
that generic taggers do not perform particularly well on this type of data.
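
To make strategy (i) concrete, here is a minimal sketch of how gazetteers might be derived from database columns and applied with a look-up tagger. It is an illustration under stated assumptions, not the implementation used in the paper: the column names, entity types, toy records, the `normalize` helper, and the greedy longest-match policy are all hypothetical choices made for this example.

```python
from collections import defaultdict


def normalize(text):
    """Split on whitespace, strip surrounding punctuation, lowercase each token."""
    return tuple(tok.strip(".,;:()").lower() for tok in text.split())


def build_gazetteers(records, column_to_type):
    """Collect the values of selected columns into one gazetteer per entity type.

    Assumption: each listed column holds exactly one entity per cell.
    """
    gazetteers = defaultdict(set)
    for record in records:
        for column, entity_type in column_to_type.items():
            value = (record.get(column) or "").strip()
            if value:
                gazetteers[entity_type].add(normalize(value))
    return gazetteers


def lookup_tag(text, gazetteers, max_len=4):
    """Greedy longest-match look-up; returns (phrase, entity_type) pairs."""
    tokens = text.split()
    normalized = normalize(text)  # same length as tokens
    tagged = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = normalized[i:i + n]
            for entity_type, entries in gazetteers.items():
                if candidate in entries:
                    match = (n, entity_type)
                    break
            if match:
                break
        if match:
            n, entity_type = match
            phrase = " ".join(tokens[i:i + n]).strip(".,;:()")
            tagged.append((phrase, entity_type))
            i += n
        else:
            i += 1
    return tagged


# Invented toy records; the column names are hypothetical.
records = [
    {"collector": "J. Smith", "place": "Green River"},
    {"collector": "A. Jones", "place": "Blue Lake"},
]
column_to_type = {"collector": "PERSON", "place": "LOCATION"}

gazetteers = build_gazetteers(records, column_to_type)
tags = lookup_tag("Collected near Green River by J. Smith.", gazetteers)
print(tags)
```

The same tagged output could serve as automatically labelled training data for strategy (ii): free-text fields are annotated by look-up, and a learner is then trained on those annotations.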