other,11-1-P03-1051,bq We approximate <term> Arabic 's rich morphology </term> by a <term> model </term> that a <term> word </term> consists of a sequence of <term> morphemes </term> in the <term> pattern </term> <term> prefix*-stem-suffix* </term> ( * denotes zero or more occurrences of a <term> morpheme </term> ) .
other,20-5-P03-1051,bq To improve the <term> segmentation </term><term> accuracy </term> , we use an <term> unsupervised algorithm </term> for automatically acquiring new <term> stems </term> from a 155 million <term> word </term> <term> unsegmented corpus </term> , and re-estimate the <term> model parameters </term> with the expanded <term> vocabulary </term> and <term> training corpus </term> .
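The prefix*-stem-suffix* pattern in the first record can be illustrated with a minimal sketch. It enumerates every way to split a word into zero or more prefixes, exactly one stem, and zero or more suffixes. The lexica PREFIXES, STEMS, and SUFFIXES and the function names are hypothetical placeholders chosen for illustration; they are not taken from the source, and the sketch does not include the model scoring or the unsupervised stem acquisition described in the second record.

```python
from typing import List, Set, Tuple

# Hypothetical toy lexica (placeholder strings, not from the source).
PREFIXES: Set[str] = {"w", "al", "b"}
STEMS: Set[str] = {"ktab", "qlm"}
SUFFIXES: Set[str] = {"h", "at", "hm"}

Segmentation = Tuple[List[str], str, List[str]]


def split_all(s: str, vocab: Set[str]) -> List[List[str]]:
    """Return every way to write s as zero or more items drawn from vocab."""
    if not s:
        return [[]]  # the empty string is covered by zero morphemes
    results: List[List[str]] = []
    for i in range(1, len(s) + 1):
        head = s[:i]
        if head in vocab:
            for rest in split_all(s[i:], vocab):
                results.append([head] + rest)
    return results


def segmentations(word: str) -> List[Segmentation]:
    """Enumerate all prefix*-stem-suffix* analyses of word under the toy lexica."""
    out: List[Segmentation] = []
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            stem = word[i:j]
            if stem not in STEMS:
                continue
            # Everything before the stem must decompose into prefixes,
            # everything after it into suffixes.
            for pre in split_all(word[:i], PREFIXES):
                for suf in split_all(word[j:], SUFFIXES):
                    out.append((pre, stem, suf))
    return out


if __name__ == "__main__":
    for analysis in segmentations("walktabhm"):
        print(analysis)
```

A word with no stem from the lexicon yields an empty list, which is the case that acquiring new stems from an unsegmented corpus would be meant to reduce.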