W11-1212 |
Collection of Application to Parallel
|
Article Extraction
|
in Wikipedia . Alexandre
|
N04-3001 |
problem by incorporating a new
|
article extraction
|
module using machine learning
|
N03-4008 |
translated for clustering after the
|
article extraction
|
phase . We use simple and fast
|
N04-3001 |
between the articles . If the
|
article extraction
|
component finds a title it is
|
W15-3701 |
results of annotation for people
|
article extraction
|
and matching . System segmentation
|
W15-3701 |
article segmentation , people
|
article extraction
|
, and article matching . For
|
N04-3001 |
six major phases : crawling ,
|
article extraction
|
, clustering , summarization
|
N04-3001 |
Title and date extraction The
|
article extraction
|
component also determines a title
|
P15-1084 |
Wikipedia . 3.1 Content Retrieval
|
Article Extraction
|
: Wikipedia provides an API
|
W15-3701 |
article segmentation , people
|
article extraction
|
, and article matching was evaluated
|
W15-3701 |
segmentation evaluation . 4.2 Person
|
Article Extraction
|
We estimated the number of person
|
N04-3001 |
learning techniques . The new
|
article extraction
|
module parses HTML into blocks
|
W15-3701 |
segmentation and evaluation . 3.2 People
|
Article Extraction
|
We use the 15th edition gender
|
W15-3701 |
recall and precision for person
|
article extraction
|
for each edition are computed
|
N03-4008 |
Caption " , or " Other " . The
|
article extraction
|
component has been trained and
|
W15-3701 |
identified by the annotator . son
|
article extraction
|
, and matching . Note that the
|
N03-4008 |
the HTML pages , we use a new
|
article extraction
|
component using language-independent
|