Thiago Castro Ferreira
PhD student at:
TiCC - University of Tilburg
Supervised by:
Emiel Krahmer /
Sander Wubben
Crucial for the coherence of the produced text
Given a context with two players: John and Benner
He had a good game at the plate for Hamilton A’s-Forcini.
He went 2-3, drove in one and scored one run.
vs.
Benner had a good game at the plate for Hamilton A’s-Forcini.
He went 2-3, drove in one and scored one run.
Which text is more coherent?
First decision of REG models
Goal: Make choices similar to human ones
Limitation: Available corpora contain only one referring expression per situation
Is this choice wrong? Depends...
The use of a description does not necessarily mean that the use of a proper name is wrong.
Collection of more than one referring expression per situation
Link to the experiment
36 texts
12 news texts, 12 product reviews and 12 encyclopedic texts
78 participants
~ 20 per text
9588 referring expressions collected
Annotated according to 5 referential forms
Higher variation when the referent is...
Old in the text, and new in the sentence
Object of the sentence
Distant from its previous reference
Naive Bayes
$P(r_{k} \mid f) \propto P(r_{k}) \prod\limits_{i=1}^{|f|} P(f_{i} \mid r_{k})$
computed for each referential form $r_{k}$
PoS: $f$ consists of the part-of-speech tags of the preceding and following words
NB: $f$ consists of the referential statuses, syntactic position and categorical recency
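The Naive Bayes formula above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the toy samples, the feature values, the function names `train_nb` and `predict_distribution`, and the add-one smoothing (`alpha`) are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math

def train_nb(samples, alpha=1.0):
    """Count priors and per-feature likelihoods.

    samples: list of (features, form), where features is a tuple of
    categorical values and form is a referential form label.
    alpha: add-alpha smoothing constant (an assumption of this sketch).
    """
    form_counts = Counter(form for _, form in samples)
    feat_counts = defaultdict(Counter)  # (form, feature index) -> value counts
    values = defaultdict(set)           # feature index -> values seen in training
    for feats, form in samples:
        for i, v in enumerate(feats):
            feat_counts[(form, i)][v] += 1
            values[i].add(v)
    return form_counts, feat_counts, values, alpha

def predict_distribution(model, feats):
    """Return P(r_k | f) for every referential form r_k, normalised to sum to 1."""
    form_counts, feat_counts, values, alpha = model
    total = sum(form_counts.values())
    log_scores = {}
    for form, n in form_counts.items():
        score = math.log(n / total)  # log P(r_k)
        for i, v in enumerate(feats):
            num = feat_counts[(form, i)][v] + alpha
            den = n + alpha * len(values[i])
            score += math.log(num / den)  # log P(f_i | r_k)
        log_scores[form] = score
    top = max(log_scores.values())
    unnorm = {f: math.exp(s - top) for f, s in log_scores.items()}
    z = sum(unnorm.values())
    return {f: u / z for f, u in unnorm.items()}

# Hypothetical toy data: (referential status, syntactic position) -> form
samples = [
    (("new", "subject"), "proper_name"),
    (("new", "subject"), "proper_name"),
    (("old", "subject"), "pronoun"),
    (("old", "object"), "pronoun"),
    (("old", "object"), "description"),
]
model = train_nb(samples)
dist = predict_distribution(model, ("old", "subject"))
```

Here `dist` is a full distribution over referential forms rather than a single label, which is what allows a comparison with the distribution of human choices.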
Model | JSD | Accuracy | F-Score |
---|---|---|---|
Random | 0.64 | 0.19 | 0.26 |
PoS | 0.36 | 0.67 | 0.66 |
NB | 0.31 | 0.76 | 0.74 |
NB+PoS | 0.33 | 0.72 | 0.73 |
JSD: Jensen–Shannon divergence
Accuracy and F-Score: computed against the majority (most frequent) referential form of each situation
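As a rough sketch of how these measures could be computed for one situation, the snippet below compares a human distribution of referential forms with a predicted one. The two distributions are invented for illustration, the helper names `jsd` and `majority_form` are hypothetical, and the base-2 logarithm (which bounds the divergence between 0 and 1) is an assumption, since the log base is not stated here.

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions.

    p, q: dicts mapping referential form -> probability.
    """
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL(a || m); terms with zero probability contribute nothing
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def majority_form(dist):
    """Most probable referential form, used as the label for accuracy and F-score."""
    return max(dist, key=dist.get)

# Hypothetical distributions for a single situation
human = {"pronoun": 0.6, "proper_name": 0.3, "description": 0.1}
predicted = {"pronoun": 0.5, "proper_name": 0.4, "description": 0.1}

divergence = jsd(human, predicted)
correct = majority_form(human) == majority_form(predicted)
```

A prediction can match the majority form (counting as correct for accuracy) while still diverging from the human distribution, which is why both kinds of measure are reported.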
Considerable amount of individual variation in the choice of referential form
Linguistic factors can distinguish situations with similar distributions of referential forms.
Future work: Besides referential statuses, syntactic position and recency, are there other factors that influence the choice?