lr,34-5-P03-1051,bq | expanded <term> vocabulary </term> and <term> | training corpus | </term> . The resulting <term> Arabic word | #4740 To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus . |