================================================================================
DE VRIES SUMMARIES
================================================================================

Version 1.0
October 2007

Sandra de Vries & Erwin Marsi
Dept of Communication & Information Sciences
Tilburg University
The Netherlands
e.c.marsi@uvt.nl

*** If you use this data, we would be interested to hear about it!  ***


--------------------------------------------------------------------------------
Abstract
--------------------------------------------------------------------------------

This is a collection of Dutch summarization data. The source material
consists of six short texts. For each text, the "clauses" were ranked on a
three-point scale (very important, less important, unimportant) by 25
participants. In addition, the participants wrote a summary of each text.


--------------------------------------------------------------------------------
Thesis directory
--------------------------------------------------------------------------------

This readme file provides an overview of the data. Details (e.g. the
exact instructions given to participants) are in the file
thesis/scriptie.pdf, which contains the thesis (in Dutch). If you are not a
member of the select group of speakers of this endangered language, we
are happy to answer your questions :-)


--------------------------------------------------------------------------------
Texts directory
--------------------------------------------------------------------------------

This directory contains the six source texts, each in a separate
file. Character encoding is UTF-8. The texts are segmented into
Elementary Discourse Units (EDUs) according to Rhetorical Structure
Theory. Each line consists of a segment number, a tab character, and
the segment's text.
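As an illustration of the line format above, the following is a minimal
sketch for reading one source text in Python. It assumes that segment
numbers are plain integers and that the text of a segment follows the
first tab; the function names parse_segments and read_text_file are
hypothetical and not part of the dataset.

```python
from pathlib import Path


def parse_segments(lines):
    """Parse lines of the form '<segment number>\t<segment text>'.

    Returns a list of (segment_number, text) tuples in file order.
    Assumption: segment numbers are integers. Blank lines are skipped.
    """
    segments = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore empty lines, e.g. a trailing newline
        # Split on the first tab only, so the segment text is kept whole.
        number, text = line.split("\t", 1)
        segments.append((int(number), text))
    return segments


def read_text_file(path):
    """Read one source text (path is hypothetical) as UTF-8 and parse it."""
    with Path(path).open(encoding="utf-8") as f:
        return parse_segments(f)
```

For example, parse_segments(["1\tfirst segment", "2\tsecond segment"])
yields [(1, "first segment"), (2, "second segment")]. Opening the file
explicitly as UTF-8 avoids depending on the platform's default encoding.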


--------------------------------------------------------------------------------
Rankings directory
--------------------------------------------------------------------------------

The units were ranked on a three-point scale (very important, less
important, unimportant) by 25 participants. All participants were
native speakers of Dutch, varying in sex, age and level of
education. The rankings for each text are in a separate file. Each
line starts with the participant's number (1 to 25), followed by N
rankings, one per segment, so N depends on the length of the source
text. Columns are tab-delimited.
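To make the tab-delimited layout concrete, here is a minimal Python
sketch that reads one ranking file and counts, per segment position, how
often each value was given. It keeps ranking values as strings because
the exact codes used for "very important", "less important" and
"unimportant" are not stated in this readme (check the files or the
thesis). It also assumes that column i of the rankings corresponds to
segment i of the source text. The function names are hypothetical.

```python
from collections import Counter


def parse_rankings(lines):
    """Parse ranking lines: participant number, then one ranking per segment.

    Returns {participant_number: [ranking, ...]}. Rankings are kept as
    strings because their coding is an assumption here. Raises ValueError
    if lines contain different numbers of rankings.
    """
    rankings = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.strip().split("\t")
        rankings[int(fields[0])] = fields[1:]
    lengths = {len(values) for values in rankings.values()}
    if len(lengths) > 1:
        raise ValueError(f"inconsistent number of rankings: {sorted(lengths)}")
    return rankings


def count_per_segment(rankings):
    """Return one Counter per segment position with the values participants gave."""
    if not rankings:
        return []
    n_segments = len(next(iter(rankings.values())))
    counts = [Counter() for _ in range(n_segments)]
    for values in rankings.values():
        for position, value in enumerate(values):
            counts[position][value] += 1
    return counts


def read_rankings_file(path):
    """Read one ranking file (path is hypothetical); UTF-8 is assumed."""
    with open(path, encoding="utf-8") as f:
        return parse_rankings(f)
```

The consistency check reflects the description above: every line in a
file refers to the same source text, so every participant should supply
the same number N of rankings.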


--------------------------------------------------------------------------------
Abstracts directory
--------------------------------------------------------------------------------

This directory contains the raw texts (i.e. not tokenized and without
marked sentence boundaries) of the summaries written by the same 25
participants. Character encoding is UTF-8.
