Paramsearch: Wrapped progressive sampling for algorithmic parameter optimization
Download

Paramsearch is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License (version 3) as published by the Free Software Foundation.

Documents

What does it do

Given a data file of examples from a classification task, where each example consists of a list of feature values and a class label, paramsearch searches for a combination of algorithmic parameter settings of a machine learning algorithm that it estimates will do well on unseen material from the same source as the input data. Paramsearch implements two heuristics for searching multi-dimensional algorithmic parameter spaces:

  1. cross-validated classifier wrapping, recombining parameter settings pseudo-exhaustively, for small data sets (fewer than 1000 instances);
  2. wrapped progressive sampling for larger data sets (>=1000 instances).
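
The choice between the two heuristics, and the idea of recombining parameter settings, can be sketched as follows. This is a minimal illustration and not paramsearch's own code: the function names, the grid format, and the parameter names "k" and "metric" are hypothetical, and a full Cartesian product is assumed here as a stand-in for the pseudo-exhaustive recombination.

```python
from itertools import product

# Boundary between "small" and "larger" data sets, as stated in the text.
SMALL_DATA_THRESHOLD = 1000


def choose_heuristic(n_instances):
    """Return the name of the search heuristic for a data set of this size."""
    if n_instances < SMALL_DATA_THRESHOLD:
        return "cross-validated classifier wrapping"
    return "wrapped progressive sampling"


def enumerate_settings(grid):
    """Yield every combination of parameter values in a grid (name -> list of values).

    Assumption: a plain Cartesian product approximates the recombination step.
    """
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical parameter grid, for illustration only.
example_grid = {"k": [1, 3, 5], "metric": ["overlap", "mvdm"]}
settings = list(enumerate_settings(example_grid))
```

Each dictionary in `settings` would then be evaluated by the chosen heuristic; the number of combinations grows multiplicatively with every parameter added to the grid.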

Contact

Comments and questions are welcome; please direct them to Antal.vdnBosch (at) uvt.nl.

Paramsearch works with

TiMBL       TiMBL algorithms: IB1, IGTree, TRIBL2
Fambl       Family-based learning
Ripper      Rule learning, by William Cohen
C4.5        Decision tree induction, by J. Ross Quinlan
SNoW        Sparse Networks of Winnows, by Dan Roth and colleagues. Also works with the perceptron implemented in SNoW
Maxent      Maximum entropy toolkit, by Zhang Le
SVM-Light   Support vector machines, by Thorsten Joachims

Graphical illustration of paramsearch learning curves

Paramsearch draws multiple learning curves in a metaphorical mountaineering competition. Each curve represents the generalization performance on held-out data of one combination of parameter settings at increasing amounts of training data. In the end, one setting wins, because lower-scoring competing combinations are removed from the competition at regular intervals.
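
The competition described above can be sketched roughly as a race over growing training samples. This is a simplified illustration under stated assumptions, not paramsearch's actual selection procedure: the function name, the `evaluate` interface, the sample sizes, and the rule of keeping a fixed fraction of the best settings per round are all assumptions made for the example.

```python
import math


def progressive_sampling_race(settings, evaluate, sample_sizes, keep_fraction=0.5):
    """Race settings over growing training samples, pruning the weakest each round.

    settings      -- list of candidate parameter settings
    evaluate      -- callable (setting, n_train) -> held-out score (assumed interface)
    sample_sizes  -- increasing training-set sizes, one per round
    keep_fraction -- share of settings kept per round (assumed pruning rule)

    Returns the surviving settings; more than one may remain if the
    sample sizes run out before a single winner emerges.
    """
    survivors = list(settings)
    for n_train in sample_sizes:
        if len(survivors) == 1:
            break
        # Rank survivors by their held-out score at this point on the curve.
        ranked = sorted(survivors, key=lambda s: evaluate(s, n_train), reverse=True)
        n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
        survivors = ranked[:n_keep]
    return survivors


def toy_evaluate(k, n_train):
    """Synthetic learning curve: improves with more data and is best at k == 5."""
    return 1.0 - abs(k - 5) * 0.05 - 1.0 / math.sqrt(n_train)


winners = progressive_sampling_race([1, 3, 5, 7, 9], toy_evaluate, [500, 1000, 2000, 4000])
```

In a real run, `evaluate` would train the wrapped learner on `n_train` instances and score it on held-out data; the synthetic curve here only stands in for that step so the pruning logic can be followed in isolation.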

Antal.vdnBosch@uvt.nl | Last update: Tue Dec 5 2006