The Role of Large Monolingual Corpora in Improving Machine Translation Quality

Authors

  • Tayebeh Mosavi Miangah

Abstract

Collocations, recurrent combinations of words that co-occur more often than chance would predict, are frequent in natural languages. Because bilingual dictionaries do not offer proper equivalents for most such collocations, the majority of machine translation systems handle them poorly, and their output quality drops considerably as a result. Monolingual corpora have recently been used to address problems in various areas, including natural language processing, statistical machine translation, and language teaching. The present study describes the process of creating and exploiting a large monolingual corpus of Persian. This corpus is used to resolve the ambiguity of English collocations when translating into Persian with an English-Persian machine translation system. Using it as a target-language corpus together with an English-Persian bilingual dictionary, we study how effectively the corpus helps find the most appropriate Persian equivalents for English collocations, with the aim of enhancing the output quality of a machine translation system. The results of the experiment on a test corpus were very encouraging, with a success rate of 90.83%.
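The general idea in the abstract, choosing among dictionary equivalents by consulting a target-language corpus, can be illustrated with a minimal sketch. This is not the author's implementation: the co-occurrence window, the raw-frequency scoring, the function names (count_pairs, best_equivalent), and the placeholder tokens standing in for Persian words are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def count_pairs(sentences, window=3):
    """Count unordered word pairs that co-occur within `window` tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                counts[frozenset((left, right))] += 1
    return counts


def best_equivalent(collocation, dictionary, pair_counts):
    """Return the candidate translation pair seen most often in the corpus.

    `collocation` is a pair of source words; `dictionary` maps each source
    word to its candidate target words. Returns None if no candidate pair
    occurs in the corpus at all.
    """
    candidates = product(*(dictionary[word] for word in collocation))
    scored = [(pair_counts[frozenset(pair)], pair) for pair in candidates]
    count, best = max(scored, key=lambda item: item[0])
    return best if count > 0 else None


# Hypothetical toy data: "tgt_*" tokens are placeholders, not real Persian.
dictionary = {
    "strong": ["tgt_powerful", "tgt_thick"],
    "tea": ["tgt_tea"],
}
corpus = [
    "x tgt_thick tgt_tea y",
    "tgt_thick tgt_tea",
    "tgt_powerful z w q v tgt_tea",  # too far apart to count as co-occurring
]

pair_counts = count_pairs(corpus)
choice = best_equivalent(("strong", "tea"), dictionary, pair_counts)
print(choice)
```

In this toy setup the literal equivalent of "strong" never appears near the word for "tea" in the corpus, so the corpus-attested pair is preferred. A real system would likely need tokenization and morphological handling suited to Persian and a more robust association score than raw counts.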

Published

2010-07-27

How to Cite

Mosavi Miangah, T. (2010). The Role of Large Monolingual Corpora in Improving Machine Translation Quality. Iranian Journal of Translation Studies, 8(29). Retrieved from https://journal.translationstudies.ir/ts/article/view/223

Section

Academic Research Paper