knngraph is a set of rather trivial command-line tools, written in Python, to visualize the nearest-neighbour relations in a Timbl instance base.

Timbl (Tilburg Memory-Based Learner) is a machine learning program based on the principle of k-nearest-neighbour (knn) classification. That is, it infers the class of a new, unseen instance by looking for similar instances seen earlier, its so-called nearest neighbours. I have played around for some time with ways to visualize the nearest-neighbour relations among a given set of instances.

Below is an example of the type of graphics that can be generated with the knngraph tools (with some manual tweaking of the colors, though). This graph is based on the training instances of the diminutive example, which comes with the Timbl distribution. The nodes represent the instances, labeled according to the five classes (E, J, K, P, T), which occur as diminutive suffixes of Dutch nouns. The arcs represent the nearest-neighbour relations among the instances. These were obtained by both training and testing Timbl on the diminutive set (using default settings, except for k=2), and extracting the nearest neighbours from the output file (using Timbl's +vn option). The layout and drawing of the graph were done by dot, which is part of the Graphviz package.
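To give an idea of the last step, here is a minimal sketch of how nearest-neighbour relations could be written out as input for dot. It is not the actual knngraph code. It assumes the neighbours have already been extracted from the Timbl output into plain Python data, and the function name neighbours_to_dot, the integer instance ids and the color table are all made up for illustration.

```python
# Hypothetical color per diminutive class; any Graphviz color names would do.
CLASS_COLORS = {"E": "red", "J": "blue", "K": "green", "P": "orange", "T": "purple"}


def neighbours_to_dot(classes, neighbours):
    """Return DOT source for a nearest-neighbour graph.

    classes    -- dict mapping an instance id to its class label
    neighbours -- iterable of (instance id, neighbour id, distance) triples
    """
    lines = ["digraph knn {", "  node [style=filled];"]
    # One node per instance, labeled and colored by its class.
    for node_id in sorted(classes):
        label = classes[node_id]
        color = CLASS_COLORS.get(label, "gray")
        lines.append(f'  n{node_id} [label="{label}", fillcolor="{color}"];')
    # One arc per nearest-neighbour relation, labeled with the distance.
    for source, target, distance in neighbours:
        lines.append(f'  n{source} -> n{target} [label="{distance:g}"];')
    lines.append("}")
    return "\n".join(lines)
```

Saving the returned text as, say, knn.dot and running "dot -Tpng knn.dot -o knn.png" would then produce the picture.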

WARNING: LARGE FILE! (7.8 MB) YOUR BROWSER MAY NOT BE ABLE TO RENDER THIS.

Another nice idea I have is to make the graph clickable. Graphviz can, in addition to images, also produce image maps for use in HTML, allowing you to associate arbitrary URLs with predefined regions of the image. This means that, in principle, you can set things up in such a way that clicking on a node shows the features of the corresponding instance in another frame, and clicking on an arc shows the corresponding distance. This seems to be a really nice way to explore an instance base. Unfortunately, the browsers I've tried so far (Safari, Mozilla, IE) cannot handle images and image maps this large. One alternative may be to chop up the graph into smaller, unconnected subgraphs. If anyone is interested in writing a special-purpose tool for inspecting instance bases along these lines, please contact me.
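As a rough sketch of the chopping-up idea, the code below groups the instances into unconnected subgraphs and writes each one as a separate DOT graph with a URL attribute on every node. This is only an illustration under assumptions: the function names, the integer instance ids and the instance_N.html URL scheme are hypothetical, and arc direction is ignored when deciding what is connected.

```python
from collections import defaultdict


def connected_components(edges):
    """Group node ids into weakly connected components (arc direction ignored).

    Only nodes that occur in at least one edge are considered.
    """
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


def component_to_dot(component, edges, url_template="instance_{}.html"):
    """Return DOT source for one component, with a (hypothetical) URL per node."""
    members = set(component)
    lines = ["digraph knn_part {"]
    for node in component:
        lines.append(f'  n{node} [URL="{url_template.format(node)}"];')
    for source, target in edges:
        if source in members and target in members:
            lines.append(f"  n{source} -> n{target};")
    lines.append("}")
    return "\n".join(lines)
```

Each resulting file can be rendered twice, once with "dot -Tpng" for the image and once with "dot -Tcmapx" for the matching client-side image map. Graphviz also ships a ccomps tool that splits a graph into connected components, which may make the first function unnecessary.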