Timpute
Data correction through imputation

Timpute is a Perl package based on the TiMBL software that self-corrects the contents of each cell in a database based on the rest of the database. Timpute is essentially a wrapper: it processes the database, passes it piece by piece to TiMBL, and parses TiMBL's output back into a CSV file.
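The wrapper idea can be illustrated with a small sketch. It is written in Python for brevity and is not Timpute's actual Perl code. It assumes that each "piece" is one column treated as the class to predict, with the remaining columns as features, written one instance per line with the class last. The function name instances_for_column and the sample values are hypothetical.

```python
def instances_for_column(header, rows, target):
    """Build comma-separated instance lines in which `target` is the class.

    Each output line lists the values of all other columns (the features)
    followed by the value of the target column (the class label).
    Assumption for this sketch: cell values contain no commas.
    """
    idx = header.index(target)
    lines = []
    for row in rows:
        features = row[:idx] + row[idx + 1:]   # every column except the target
        lines.append(",".join(features + [row[idx]]))
    return lines


if __name__ == "__main__":
    header = ["name", "legs", "class"]          # hypothetical columns
    rows = [["gecko", "4", "lizard"], ["python", "0", "snake"]]
    for line in instances_for_column(header, rows, "legs"):
        print(line)
```

Repeating this for every column in turn would give one prediction task per column, which is one plausible reading of "piece by piece".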

Features

  • Accepts CSV databases
  • Auto-detects zero- or maximal-entropy features
  • Choice between the generation of corrected columns vs. "arrogant" feature value replacement
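As an illustration of the entropy-based detection listed above, here is a minimal Python sketch (not Timpute's own implementation). It assumes that a zero-entropy feature is a column holding a single value in every row, and that a maximal-entropy feature is a column in which every row has a different value. The function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy, in bits, of the value distribution in one column."""
    counts = Counter(values)
    total = len(values)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def classify_column(values):
    """Label a column 'zero', 'maximal', or 'informative'.

    Assumption: 'zero' means at most one distinct value; 'maximal' means
    all values are distinct, so the entropy equals log2(number of rows).
    """
    distinct = len(set(values))
    if distinct <= 1:
        return "zero"
    if distinct == len(values):
        return "maximal"
    return "informative"


if __name__ == "__main__":
    print(classify_column(["x", "x", "x"]))      # constant column
    print(classify_column(["1", "2", "3"]))      # unique identifier column
    print(column_entropy(["a", "a", "b", "b"]))  # two equally likely values
```

Columns of either extreme kind carry no usable signal for predicting other cells: a constant column never varies, and an all-unique column (such as a record ID) never repeats.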

Timpute is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation.

About

Timpute is being developed within the MITCH (Mining for Information in Texts from Cultural Heritage) project, as part of ongoing research into automated cleaning and enrichment of textual databases. Timpute was originally conceived by Antal van den Bosch and Caroline Sporleder, and is programmed by Steve Hunt. The MITCH project is funded by NWO, the Netherlands Organisation for Scientific Research, as part of the CATCH programme.

You can read about Timpute in the following paper:

  

Download and installation

To install, please follow these basic instructions:

  • Timpute requires an installed copy of TiMBL version 6.1 (preferably 6.1.2).
  • The tarball unpacks ('tar zxvf timpute-0.3.tar.gz') into a directory called 'timpute-0.3'.

The easiest way to invoke Timpute is to run it with default settings and have it replace every cell in every column with its corrections. The required file format at this time is comma-separated values (CSV), with the first row containing the column names and every cell enclosed in double quotes. To run Timpute on the included sample file, invoke
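If your data is not yet in this format, a file with a header row and fully quoted cells can be produced with a few lines of code. The following is a minimal Python sketch, independent of Timpute itself. The function name to_timpute_csv and the sample column names are hypothetical, and doubling embedded quotes (the csv module's default) is an assumption about what Timpute accepts.

```python
import csv
import io


def to_timpute_csv(header, rows):
    """Serialise a table as CSV with a header row and every cell quoted."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field, including numbers, in double quotes.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    text = to_timpute_csv(["name", "class"], [["gecko", "lizard"]])
    print(text, end="")
```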

    ./timpute.pl -f reptile.csv -o reptile_timputed.csv -p

The command above specifies reptile.csv as the input file and reptile_timputed.csv as the output file, which will contain the data with the cells that Timpute has changed. The -p option tells Timpute to replace the contents of a cell whenever its prediction disagrees with the original value.
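The difference between replacing values and generating corrected columns can be pictured with a small Python sketch. It is an illustration only, not Timpute's code: the function name apply_predictions, the "_corrected" column suffix, and the sample data are all hypothetical, and Timpute's actual output column names may differ.

```python
def apply_predictions(rows, column, predictions, replace=False):
    """Return new rows with predictions replacing or accompanying a column.

    rows        -- list of dicts, one per database row
    column      -- name of the column that was predicted
    predictions -- one predicted value per row, in row order
    replace     -- True overwrites the original values (replacement mode);
                   False keeps them and adds a separate corrected column
    """
    target = column if replace else column + "_corrected"  # hypothetical suffix
    result = []
    for row, predicted in zip(rows, predictions):
        updated = dict(row)          # copy, so the input rows stay untouched
        updated[target] = predicted
        result.append(updated)
    return result


if __name__ == "__main__":
    rows = [{"name": "gecko", "class": "bird"}]
    print(apply_predictions(rows, "class", ["reptile"], replace=True))
    print(apply_predictions(rows, "class", ["reptile"]))
```

Keeping the original value next to the prediction makes it easy to review disagreements by hand, whereas replacement yields a cleaned file directly at the cost of losing the original cell contents.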

More options are listed by typing ./timpute.pl --help. See also the following files included in the package:

This is very much a beta version and as such may contain bugs or improperly working features. Comments or bug reports are welcome at: s.j.hunt@uvt.nl

Archived versions

http://software.ticc.uvt.nl/

Antal.vdnBosch@uvt.nl