The Role of Large Monolingual Corpora in Improving Machine Translation Quality
Abstract
Collocations, recurrent combinations of words that co-occur more often than chance would predict, are frequent in natural languages. Because bilingual dictionaries do not offer proper equivalents for most collocations, the majority of machine translation systems handle them poorly, and their output quality drops considerably as a result. Monolingual corpora have recently been used to address a variety of problems in natural language processing, statistical machine translation, language teaching, and related fields. The present study describes the creation and exploitation of a large monolingual corpus of Persian. This corpus makes it possible to resolve the ambiguity of English collocations when translating into Persian with an English-Persian machine translation system. Using this corpus as the target-language corpus, together with an English-Persian bilingual dictionary, we study how effectively it identifies the most appropriate Persian equivalents for English collocations, with the aim of enhancing the output quality of a machine translation system. The results of the experiment on a test corpus were very encouraging, with a success rate of 90.83%.
Published
2010-07-27
How to Cite
Mosavi Miangah, T. (2010). The Role of Large Monolingual Corpora in Improving Machine Translation Quality. Iranian Journal of Translation Studies, 8(29). Retrieved from https://journal.translationstudies.ir/ts/article/view/223
Issue
Section
Academic Research Paper
License
Copyright Licensee: Iranian Journal of Translation Studies. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) license.