  

Introduction

The UvT Expert Collection is a test collection for use in 'enterprise search' experiments, similar to the TREC W3C collection. It was harvested from the Webwijs system developed at Tilburg University (UvT) in the Netherlands. Webwijs is a publicly accessible database of UvT employees who are involved in research or teaching; currently, Webwijs contains information about 1168 experts, each of whom has a page with contact information and, if made available by the expert, a research description and publications list. In addition, each expert can select expertise areas from a list of 1491 topics and can suggest new topics that need to be approved by the Webwijs editor. The majority of the collection was crawled in October 2006.

The UvT Expert Collection differs from the W3C collection in several interesting ways:

  • it is clean, heterogeneous, structured, and focused, but comprises a limited number of documents
  • it contains hierarchical information from an organization
  • it is bilingual (English and Dutch)
  • each employee's expertise areas are selected by the employees themselves
See the documentation for more detailed information about the contents and structure of the collection.

Obtaining the collection

Where possible, the collection was converted from proprietary formats to XML documents in a way that exactly preserves document structure and the inter-relationships within the collection (in accordance with Hawking's recommendations). The compressed collection is 87 MB in size (316 MB uncompressed). Researchers can gain download access to the collection by registering here. By downloading the corpus you agree to the disclaimer. If you publish results obtained using the resources made available here, please include the following citation:

  • Broad Expertise Retrieval in Sparse Data Environments, K. Balog, T. Bogers, L. Azzopardi, M. de Rijke, and A. van den Bosch. In SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 551-558, 2007.
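Because the collection is distributed as XML documents, one quick sanity check after unpacking the archive is to parse every file and tally the root element types. The following is a minimal Python sketch using only the standard library. The directory path and the function name `summarize_collection` are hypothetical, and the sketch makes no assumptions about the collection's element names or directory layout beyond files ending in `.xml`.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def summarize_collection(root_dir):
    """Recursively parse every *.xml file under root_dir.

    Returns a Counter of root element tags and a list of files that
    failed to parse. root_dir is a hypothetical path to the unpacked
    collection; no particular layout or schema is assumed.
    """
    tags = Counter()
    failed = []
    for path in sorted(Path(root_dir).rglob("*.xml")):
        try:
            # Only the root tag is recorded; document contents are not inspected.
            tags[ET.parse(path).getroot().tag] += 1
        except ET.ParseError:
            # Record malformed files instead of aborting the whole scan.
            failed.append(path)
    return tags, failed
```

For example, `tags, failed = summarize_collection("/path/to/uvt-collection")` (a placeholder path) reports how many documents share each root element and lists any files that are not well-formed XML.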

 

© 2006, 2007 Tilburg University, Toine Bogers | Last update: Fri May 4 2007