Spotting the 'Odd-one-out': Data-Driven Error Detection and Correction in Textual Databases.
Author(s): Caroline Sporleder, Marieke van Erp, Tijn Porcelijn and Antal van den Bosch
Reference: Proceedings of the EACL 2006 Workshop on Adaptive Text Extraction and Mining (ATEM-06), Trento, Italy, 2006.
We present two methods for semi-automatic detection and
correction of errors in textual databases.
The first method (horizontal correction) aims at
correcting inconsistent values within a database
record, while the second (vertical correction) focuses on
values which were entered in the wrong column. Both methods
are data-driven.
We utilise supervised machine learning, but the training data is
obtained automatically from the database; no manual annotation is
required. Our experiments show that a significant proportion of errors
can be detected by the two methods. Furthermore,
both methods achieve a precision that is high
enough to make semi-automatic error correction feasible.
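
The idea behind vertical correction, with training labels taken from the database itself, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the learner nor the features, so the naive Bayes scoring, the whitespace tokens, the leave-one-cell-out step, and the toy records with their column names are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokens(value):
    # Assumed feature set: lower-cased whitespace tokens.
    return value.lower().split()


def train(records):
    # Each cell is labelled with the column it currently sits in,
    # so no manual annotation is needed.
    counts = defaultdict(Counter)  # column -> token counts
    for record in records:
        for column, value in record.items():
            counts[column].update(tokens(value))
    vocab = {t for c in counts.values() for t in c}
    return counts, vocab


def log_score(value, column, model, held_out):
    # Naive Bayes log-likelihood with add-one smoothing and a uniform
    # prior over columns. When held_out is true, the cell's own tokens
    # are subtracted so that it cannot vote for its current column.
    counts, vocab = model
    own = Counter(tokens(value)) if held_out else Counter()
    total = sum(counts[column].values()) - sum(own.values())
    return sum(
        math.log((counts[column][t] - own[t] + 1) / (total + len(vocab)))
        for t in tokens(value)
    )


def suspects(records, model):
    # Flag every cell whose most likely column is not its current one.
    counts, _ = model
    flagged = []
    for i, record in enumerate(records):
        for column, value in record.items():
            if not value:
                continue
            best = max(
                counts,
                key=lambda c: log_score(value, c, model, held_out=(c == column)),
            )
            if best != column:
                flagged.append((i, column, value, best))
    return flagged


# Hypothetical toy database; the last record has its two values swapped.
records = [
    {"species": "rana temporaria", "location": "amsterdam netherlands"},
    {"species": "rana arvalis", "location": "utrecht netherlands"},
    {"species": "bufo bufo", "location": "leiden netherlands"},
    {"species": "bufo calamita", "location": "trento italy"},
    {"species": "hyla arborea", "location": "rome italy"},
    {"species": "milan italy", "location": "rana esculenta"},
]

flagged = suspects(records, train(records))
for record_index, column, value, suggestion in flagged:
    print(f"record {record_index}: {value!r} in {column!r}, suggested {suggestion!r}")
```

Each flagged cell is only a suggestion for a human to confirm or reject, in keeping with the semi-automatic setting described above. On a real database one would expect a stronger learner and richer features than this sketch uses.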