Spotting the 'Odd-one-out': Data-Driven Error Detection and Correction in Textual Databases.

Author(s): Caroline Sporleder, Marieke van Erp, Tijn Porcelijn and Antal van den Bosch

Reference: Proceedings of the EACL 2006 Workshop on Adaptive Text Extraction and Mining (ATEM-06), Trento, Italy, 2006.

Abstract: We present two methods for semi-automatic detection and correction of errors in textual databases. The first method (horizontal correction) targets inconsistent values within a database record, while the second (vertical correction) targets values that were entered in the wrong column. Both methods are data-driven and language-independent. We utilise supervised machine learning, but the training data is obtained automatically from the database itself, so no manual annotation is required. Our experiments show that the two methods detect a significant proportion of errors, and that both achieve a precision high enough to make semi-automatic error correction feasible.
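As a rough illustration of the vertical-correction idea, the sketch treats every non-empty cell as a training instance labelled with the column it appears in. The training data therefore comes from the database itself, with no manual annotation. It is a minimal sketch under our own assumptions. The learner (leave-one-out k-nearest neighbours over character-trigram Jaccard similarity), the function names `char_ngrams`, `jaccard` and `flag_misplaced`, the parameter `k`, and the toy records are all hypothetical. None of them are taken from the paper.

```python
from collections import Counter


def char_ngrams(value, n=3):
    """Set of character n-grams of a cell value, with ^/$ boundary markers."""
    padded = f"^{value.lower()}$"
    return {padded[i:i + n] for i in range(max(1, len(padded) - n + 1))}


def jaccard(a, b):
    """Jaccard similarity of two n-gram sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_misplaced(records, columns, k=3):
    """Hypothetical vertical-correction sketch: leave-one-out k-NN.

    Every non-empty cell is a training instance whose label is the column
    it sits in. A cell whose k most similar other cells mostly come from a
    different column is reported as (row, column, value, suggested column).
    """
    cells = [(row, col, rec[col], char_ngrams(rec[col]))
             for row, rec in enumerate(records)
             for col in columns if rec.get(col)]
    suspects = []
    for idx, (row, col, value, grams) in enumerate(cells):
        # Compare against every *other* cell; the cell itself is held out.
        neighbours = [(jaccard(grams, other[3]), other[1])
                      for j, other in enumerate(cells) if j != idx]
        # Drop cells sharing no n-gram; sorted() is stable, so ties keep cell order.
        neighbours = sorted((nb for nb in neighbours if nb[0] > 0),
                            key=lambda nb: nb[0], reverse=True)
        if not neighbours:
            continue  # nothing similar anywhere: no evidence either way
        votes = Counter(label for _, label in neighbours[:k])
        predicted = votes.most_common(1)[0][0]
        if predicted != col:
            suspects.append((row, col, value, predicted))
    return suspects


COLUMNS = ["species", "collector", "date"]
records = [
    {"species": "Rana temporaria", "collector": "J. Smith", "date": "1952-03-14"},
    {"species": "Bufo bufo", "collector": "A. de Vries", "date": "1961-07-02"},
    {"species": "Rana esculenta", "collector": "J. Smith", "date": "1949-11-30"},
    {"species": "Hyla arborea", "collector": "P. Jansen", "date": "1970-05-21"},
    # A date typed into the species field, with the date field left empty:
    {"species": "1983-08-17", "collector": "P. Jansen", "date": ""},
]

print(flag_misplaced(records, COLUMNS))
# [(4, 'species', '1983-08-17', 'date')]
```

Holding each cell out of its own neighbourhood matters. Otherwise a misplaced value would always find itself as its nearest neighbour and vote for the column it was wrongly entered in. Flagged cells are only suggestions for a human to confirm, in keeping with the semi-automatic setting described in the abstract.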
