Use of linguistic features for improving English-Persian SMT

Zakieh Shakeri, Neda Noormohammadi, Shahram Khadivi, Noushin Riahi; Proceedings of KONVENS 2012 (Main track: oral presentations), pp. 165-173, September 2012.


In this paper, we investigate the effect of using linguistic information to improve statistical machine translation for the English-Persian language pair. We choose POS tags as the auxiliary linguistic feature. A monolingual Persian corpus with POS tags is prepared, and the tag set is kept small. Using a POS tagger trained on this corpus, we apply a factored translation model. We also create manual reordering rules that try to harmonize the word order of Persian and English. In the experiments, the factored translation model performs better than the unfactored model. In addition, the manual rules, which comprise only a few local reordering rules, increase the BLEU score compared to a monotone distortion model.
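
To make the idea of a local, POS-based reordering rule concrete, the following is a minimal sketch and not the rule set used in the paper. It assumes a single hypothetical rule (move an English adjective after the noun it precedes, since Persian normally places adjectives after nouns) and a hypothetical tag set (`DET`, `ADJ`, `NOUN`). The `word|tag` output mimics the pipe-separated notation commonly used for factored input in toolkits such as Moses; the abstract does not name the toolkit, so that is also an assumption.

```python
from typing import List, Tuple

# A token is a (word, POS tag) pair. The tag names are illustrative only.
Token = Tuple[str, str]


def reorder_adj_noun(tagged: List[Token]) -> List[Token]:
    """Swap each adjacent (ADJ, NOUN) pair so the noun comes first.

    Hypothetical example of one local reordering rule applied to the
    English side before translation.
    """
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


def to_factored(tagged: List[Token]) -> str:
    """Render tokens as word|tag, one surface factor plus one POS factor."""
    return " ".join(f"{word}|{tag}" for word, tag in tagged)


sentence = [("the", "DET"), ("red", "ADJ"), ("car", "NOUN")]
reordered = reorder_adj_noun(sentence)
print(to_factored(reordered))
```

A rule like this only touches neighboring words, which is what "local" means here; long-range differences such as verb position would need a different mechanism.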
