We apply the <term>boosting method</term> to parsing the <term>Wall Street Journal treebank</term>.
#8159
measure(ment),14-8-J05-1003,ak
The new <term>model</term> achieved 89.75% <term>F-measure</term>, a 13% relative decrease in <term>F-measure error</term> over the <term>baseline model's</term> score of 88.2%.
#8214
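The relative-error arithmetic in this sentence can be checked directly; a minimal sketch, using only the 88.2% baseline and the 13% relative reduction quoted in the record above:

    # Sanity check of the reported relative reduction in F-measure error.
    baseline_f = 88.2
    baseline_error = 100.0 - baseline_f        # 11.8 points of error
    new_error = baseline_error * (1.0 - 0.13)  # 13% relative reduction -> 10.266
    new_f = 100.0 - new_error                  # 89.73, matching the reported 89.75% up to rounding
    print(f"implied new F-measure: {new_f:.2f}%")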
measure(ment),6-8-J05-1003,ak
The new <term>model</term> achieved 89.75% <term>F-measure</term>, a 13% relative decrease in <term>F-measure error</term> over the <term>baseline model's</term> score of 88.2%.
#8206
model,18-8-J05-1003,ak
The new <term>model</term> achieved 89.75% <term>F-measure</term>, a 13% relative decrease in <term>F-measure error</term> over the <term>baseline model's</term> score of 88.2%.
#8218
model,2-3-J05-1003,ak
A second <term>model</term> then attempts to improve upon this initial <term>ranking</term>, using additional <term>features</term> of the <term>tree</term> as evidence.
#8056
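The reranking step this sentence describes can be sketched as follows; this is an illustration rather than the article's actual model, and the feature extractor and weight table are hypothetical stand-ins:

    # Rerank an n-best list: add a weighted feature score to each candidate's
    # base log-probability and re-sort. `extract_features` and `weights` are
    # hypothetical stand-ins for the second model's additional tree features.
    def rerank(candidates, extract_features, weights):
        def score(cand):
            tree, base_logprob = cand
            feats = extract_features(tree)  # features of the candidate tree
            return base_logprob + sum(weights.get(f, 0.0) for f in feats)
        return sorted(candidates, key=score, reverse=True)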
model,2-8-J05-1003,ak
The new <term>model</term> achieved 89.75% <term>F-measure</term>, a 13% relative decrease in <term>F-measure error</term> over the <term>baseline model's</term> score of 88.2%.
#8202
model,25-11-J05-1003,ak
We argue that the method is an appealing alternative, in terms of both simplicity and efficiency, to work on <term>feature selection methods</term> within <term>log-linear (maximum-entropy) models</term>.
#8295
model,34-7-J05-1003,ak
The method combined the <term>log-likelihood under a baseline model</term> (that of Collins [1999]) with evidence from an additional 500,000 <term>features</term> over <term>parse trees</term> that were not included in the original <term>model</term>.
#8198
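The combination described here, a baseline log-likelihood plus a large set of indicator features, has the shape of a linear score. A hedged sketch; the names `alpha`, `alpha0`, and the index-set representation of active features are invented for illustration, not the article's notation:

    # Linear combination of the baseline model's log-likelihood with sparse
    # indicator features. `active` is the set of feature indices that fire on
    # a parse tree; `alpha` maps indices to learned weights; `alpha0` scales
    # the baseline term.
    def combined_score(base_loglik, active, alpha, alpha0=1.0):
        return alpha0 * base_loglik + sum(alpha.get(k, 0.0) for k in active)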
model,40-4-J05-1003,ak
The strength of our approach is that it allows a <term>tree</term> to be represented as an arbitrary set of <term>features</term>, without concerns about how these <term>features</term> interact or overlap and without the need to define a <term>derivation</term> or a <term>generative model</term> which takes these <term>features</term> into account.
#8115
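One way to picture the "arbitrary set of features" point is a recursive extractor that emits overlapping indicators from a tree; the Tree class and the two feature templates below are assumptions for illustration only:

    from dataclasses import dataclass, field

    @dataclass
    class Tree:
        label: str
        children: list = field(default_factory=list)

    # Represent a tree as an arbitrary set of overlapping indicator features,
    # e.g. context-free rules and grandparent-annotated rules. The templates
    # deliberately overlap; nothing requires them to be independent.
    def tree_features(tree, parent_label=None):
        feats = set()
        if tree.children:
            rule = f"{tree.label} -> " + " ".join(c.label for c in tree.children)
            feats.add("rule:" + rule)
            if parent_label is not None:
                feats.add(f"grandparent:{parent_label}^{rule}")  # overlaps with rule:
            for child in tree.children:
                feats |= tree_features(child, tree.label)
        return feats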
other,10-3-J05-1003,ak
A second <term>model</term> then attempts to improve upon this initial <term>ranking</term>, using additional <term>features</term> of the <term>tree</term> as evidence.
#8064
other,10-4-J05-1003,ak
The strength of our approach is that it allows a <term>tree</term> to be represented as an arbitrary set of <term>features</term>, without concerns about how these <term>features</term> interact or overlap and without the need to define a <term>derivation</term> or a <term>generative model</term> which takes these <term>features</term> into account.
#8085
other,11-2-J05-1003,ak
The <term>base parser</term> produces a set of <term>candidate parses</term> for each <term>input sentence</term>, with associated <term>probabilities</term> that define an initial <term>ranking</term> of these <term>parses</term>.
#8039
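The base parser's output described here is just a ranked n-best list; a minimal sketch of that structure, with the type names assumed for illustration:

    from typing import NamedTuple

    class Candidate(NamedTuple):
        tree: object   # a candidate parse tree
        prob: float    # its probability under the base parser

    # The base model's probabilities define the initial ranking.
    def initial_ranking(candidates):
        return sorted(candidates, key=lambda c: c.prob, reverse=True)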
other,14-3-J05-1003,ak
A second <term>model</term> then attempts to improve upon this initial <term>ranking</term>, using additional <term>features</term> of the <term>tree</term> as evidence.
#8068
other,16-2-J05-1003,ak
The <term>base parser</term> produces a set of <term>candidate parses</term> for each <term>input sentence</term>, with associated <term>probabilities</term> that define an initial <term>ranking</term> of these <term>parses</term>.
#8044
other,16-9-J05-1003,ak
The article also introduces a new <term>algorithm</term> for the <term>boosting approach</term> which takes advantage of the <term>sparsity</term> of the <term>feature space</term> in the <term>parsing data</term>.
#8242
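The sparsity point can be illustrated with a simplified boosting-style feature-selection round that only touches features actually occurring in the data, never sweeping the full feature space; this is a sketch in the spirit of such algorithms, not the article's exact procedure, and the `examples` representation is assumed:

    from collections import defaultdict

    # One boosting-style round over sparse data: only features that fire in
    # some example are ever examined. `examples` is a hypothetical list of
    # (active_feature_set, example_weight, is_correct_parse) triples.
    def best_feature(examples):
        w_plus, w_minus = defaultdict(float), defaultdict(float)
        for active, weight, is_correct in examples:
            for f in active:  # visit only the features that actually occur
                (w_plus if is_correct else w_minus)[f] += weight
        # Score each seen feature by how unevenly its weight mass splits
        # between correct and incorrect parses.
        seen = set(w_plus) | set(w_minus)
        return max(seen, key=lambda f: abs(w_plus[f] ** 0.5 - w_minus[f] ** 0.5))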
other,17-3-J05-1003,ak
A second <term>model</term> then attempts to improve upon this initial <term>ranking</term>, using additional <term>features</term> of the <term>tree</term> as evidence.
#8071
other,19-4-J05-1003,ak
The strength of our approach is that it allows a <term>tree</term> to be represented as an arbitrary set of <term>features</term>, without concerns about how these <term>features</term> interact or overlap and without the need to define a <term>derivation</term> or a <term>generative model</term> which takes these <term>features</term> into account.
#8094
other,19-9-J05-1003,ak
The article also introduces a new <term>algorithm</term> for the <term>boosting approach</term> which takes advantage of the <term>sparsity</term> of the <term>feature space</term> in the <term>parsing data</term>.
#8245
other,21-2-J05-1003,ak
The <term>base parser</term> produces a set of <term>candidate parses</term> for each <term>input sentence</term>, with associated <term>probabilities</term> that define an initial <term>ranking</term> of these <term>parses</term>.
#8049
other,23-7-J05-1003,ak
The method combined the <term>log-likelihood under a baseline model</term> (that of Collins [1999]) with evidence from an additional 500,000 <term>features</term> over <term>parse trees</term> that were not included in the original <term>model</term>.
#8187