Saturday, December 29, 2012

ParaMetric: An Automatic Evaluation Metric for Paraphrasing

PARAPHRASING TECHNIQUES

There are a number of established methods for
extracting paraphrases from data. We describe
the following methods in this section and evaluate
them in the next:
• Pang et al. (2003) used syntactic alignment to
merge parse trees of multiple translations,
• Quirk et al. (2004) treated paraphrasing as
monolingual statistical machine translation,
• Bannard and Callison-Burch (2005) used
bilingual parallel corpora to extract paraphrases.
[Figure 2 (image not reproduced). Tree 1: "12 persons were killed". Tree 2: "twelve people died". Tree 1 + Tree 2 -> Merge -> Parse Forest -> Linearize -> Word Lattice. The word lattice runs from BEG to END through the alternatives 12 / twelve, people / persons, and died / were killed.]
Figure 2: Pang et al. (2003) created word graphs
by merging parse trees. Paths with the same start
and end nodes are treated as paraphrases.
Pang et al. (2003) use multiple translations to
learn paraphrases using a syntax-based alignment
algorithm, illustrated in Figure 2. Parse trees were
merged into forests by grouping constituents of the
same type (for example, the two NPs and two VPs
are grouped). Parse forests were mapped onto finite
state word graphs by creating alternative paths
for every group of merged nodes. Different paths
within the resulting word lattice are treated as paraphrases
of each other. For example, in the word lattice
in Figure 2, people were killed, persons died,
persons were killed, and people died are all possible
paraphrases of each other.
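
The path-based reading of the lattice can be sketched in a few lines of Python. This is a minimal illustration and not Pang et al.'s implementation: it assumes a simplified linear lattice in which every slot between two consecutive states holds a list of alternative word sequences, and the names `LATTICE_SLOTS`, `paths_between`, and `paraphrase_pairs` are hypothetical.

```python
from itertools import product

# Hypothetical lattice for the Figure 2 example. Each slot lists the
# alternative word sequences allowed between two consecutive states.
# Assumption: the lattice is linear (no skipping or branching states).
LATTICE_SLOTS = [
    ["12", "twelve"],
    ["people", "persons"],
    ["died", "were killed"],
]


def paths_between(slots, start, end):
    """Return every word sequence that spans slots[start:end]."""
    return [" ".join(choice) for choice in product(*slots[start:end])]


def paraphrase_pairs(slots, start, end):
    """Pair up the distinct paths that share the same start and end states."""
    paths = paths_between(slots, start, end)
    return [(a, b) for i, a in enumerate(paths) for b in paths[i + 1:]]


# Paths over the last two slots reproduce the four phrases named in the text.
for path in paths_between(LATTICE_SLOTS, 1, 3):
    print(path)
```

Taking the paths over the last two slots yields the four phrases listed above, and `paraphrase_pairs` treats any two of them as paraphrases of each other. A general word graph would need a proper graph traversal instead of a Cartesian product.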
Quirk et al. (2004) treated paraphrasing as
“monolingual statistical machine translation.”
They created a “parallel corpus” containing pairs
of English sentences by drawing sentences with a
low edit distance from news articles that were written
about the same topic on the same date, but published
by different newspapers. They formulated
the problem of paraphrasing in probabilistic terms
in the same way it had been defined in the statistical
machine translation literature:
\[
\begin{aligned}
\hat{e}_2 &= \arg\max_{e_2} \; p(e_2 \mid e_1) \\
          &= \arg\max_{e_2} \; p(e_1 \mid e_2)\, p(e_2)
\end{aligned}
\]
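
The step between the two lines is Bayes' rule. As a short worked derivation using the same symbols, where e1 is the input sentence and e2 a candidate paraphrase:

```latex
\begin{align*}
\hat{e}_2 &= \arg\max_{e_2} \; p(e_2 \mid e_1) \\
          &= \arg\max_{e_2} \; \frac{p(e_1 \mid e_2)\, p(e_2)}{p(e_1)} \\
          &= \arg\max_{e_2} \; p(e_1 \mid e_2)\, p(e_2)
\end{align*}
```

The denominator p(e1) does not depend on e2, so dropping it leaves the maximizing e2 unchanged. In the usual machine translation reading, p(e1|e2) plays the role of the translation model and p(e2) that of the language model.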




Challenges for Evaluating Paraphrases Automatically
There are several problems inherent to automatically
evaluating paraphrases. First and foremost,
developing an exhaustive list of paraphrases for
any given phrase is difficult. Lin and Pantel (2001)
illustrate the difficulties that people have generating
a complete list of paraphrases, reporting that
they missed many examples generated by a system
that were subsequently judged to be correct. If
a list of reference paraphrases is incomplete, then
using it to calculate precision will give inaccurate
numbers. Precision will be falsely low if the system
produces correct paraphrases which are not in
the reference list. Additionally, recall is indeterminable
because there is no way of knowing how
many correct paraphrases exist.
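
A small numeric illustration of the precision problem, using invented data: the phrase lists below are hypothetical and only serve to show how an incomplete reference list deflates the measured score.

```python
def precision(system_output, reference):
    """Fraction of system paraphrases that appear in the reference list."""
    if not system_output:
        return 0.0
    hits = sum(1 for phrase in system_output if phrase in reference)
    return hits / len(system_output)


# Hypothetical data: what is actually correct vs. what annotators listed.
truly_correct = {"were killed", "died", "perished", "lost their lives"}
incomplete_reference = {"were killed", "died"}

# Hypothetical system output: three correct paraphrases and one error.
system_output = ["died", "perished", "lost their lives", "were born"]

measured = precision(system_output, incomplete_reference)
actual = precision(system_output, truly_correct)
print(measured, actual)  # 0.25 0.75
```

The system is right three times out of four, but the incomplete list credits only one of its outputs. Recall cannot be computed at all in the real setting, because the full set corresponding to `truly_correct` is never known.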
There are further impediments to automatically
evaluating paraphrases. Even if we were able to
come up with a reasonably exhaustive list of paraphrases
for a phrase, the acceptability of each paraphrase
would vary depending on the context of
the original phrase (Szpektor et al., 2007). While
lexical and phrasal paraphrases can be evaluated
by comparing them against a list of known paraphrases
(perhaps customized for particular contexts),
this cannot be naturally done for structural
paraphrases which may transform whole sentences.
We attempt to resolve these problems by having
annotators indicate correspondences in pairs
of equivalent sentences. Rather than having people
enumerate paraphrases, we asked that they perform
the simpler task of aligning paraphrases. After
developing these manual “gold standard alignments,”
we can gauge how well different automatic
paraphrasing techniques align paraphrases within
equivalent sentences. By evaluating the performance
of paraphrasing techniques at alignment,
rather than at matching a list of reference paraphrases,
we obviate the need to have a complete
set of paraphrases.
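
One way to score a technique against gold standard alignments is set-based precision and recall over alignment links. The sketch below is an assumption for illustration, since this excerpt does not give the metric's exact definition: the function `alignment_scores`, the link representation as (index in sentence A, index in sentence B) pairs, and the example links are all hypothetical.

```python
def alignment_scores(predicted_links, gold_links):
    """Return (precision, recall) of predicted links against gold links."""
    predicted, gold = set(predicted_links), set(gold_links)
    correct = predicted & gold
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Sentence A: "twelve people died"; sentence B: "12 persons were killed".
# Hypothetical gold links: twelve-12, people-persons, died-were, died-killed.
gold = [(0, 0), (1, 1), (2, 2), (2, 3)]

# Hypothetical system links: the died-were link is missed.
predicted = [(0, 0), (1, 1), (2, 3)]

p, r = alignment_scores(predicted, gold)
print(p, r)  # 1.0 0.75
```

Both scores are well defined here because the gold links cover only the sentence pair at hand, so no exhaustive list of paraphrases is needed.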
We describe how sets of reference paraphrases
can be extracted from the gold standard alignments.
While these sets will obviously be fragmentary,
we attempt to make them more complete
by aligning groups of equivalent sentences rather
than only pairs. The paraphrase sets that we extract
are appropriate for the particular contexts. Moreover,
they could be used to study structural
paraphrases, although we do not examine that
