Automatic Translation of Source Language Noun Phrases Using a Raw Corpus of Target Language

Tayebeh Mosavi Miangah


A common problem in translation between any two languages is the existence of sequences of nouns and adjectives for which several translations are possible, or of noun phrases that are lexically or structurally ambiguous. The situation becomes even more critical when some of the nouns or adjectives in such a sequence are also ambiguous with respect to their part of speech, that is, when an adjective may also be a noun or vice versa. This paper presents an approach for analyzing noun phrase structure based on large unannotated corpora. The proposed method is language-independent and automatic: it relies on co-occurrence frequencies in raw corpora and can therefore be considered an unsupervised learning method. Its performance was evaluated in an experiment in which 280 lexically ambiguous words (ambiguous between noun and adjective) occurring within noun phrases were tested. The results show a total error rate of 7.2% across all types of ambiguous noun phrases, corresponding to an overall accuracy of 92.8%, which is very promising.
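The abstract does not specify how co-occurrence frequencies are turned into a decision, so the following is only a minimal sketch of the general idea, not the author's method. It assumes that candidate translations for each source word (each tagged with the noun or adjective reading it stands for) are already available, for example from a bilingual dictionary, and are listed in target-language word order. It also assumes that a candidate sequence is scored by summing the counts of adjacent word pairs (bigrams) in a raw, tokenized target-language corpus. The function names `bigram_counts` and `best_translation`, the scoring scheme, and the toy corpus are all hypothetical illustrations.

```python
from collections import Counter
from itertools import product


def bigram_counts(sentences):
    """Count adjacent word pairs in a tokenized, unannotated target-language corpus."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def best_translation(candidates, counts):
    """Pick one candidate per source word so that adjacent target words co-occur most often.

    candidates: one list per source word (assumed already in target word order) of
    (target_word, pos) pairs; the pos tag records which reading the candidate encodes,
    so choosing a candidate also resolves the noun/adjective ambiguity.
    Hypothetical scoring: sum of bigram counts over adjacent chosen words.
    """
    best, best_score = None, -1
    for combo in product(*candidates):  # exhaustive search over all combinations
        words = [word for word, _ in combo]
        score = sum(counts[pair] for pair in zip(words, words[1:]))
        if score > best_score:
            best, best_score = combo, score
    return best, best_score


# Toy usage with an invented English "target" corpus and invented candidates.
corpus = [
    "she carried a light bag".split(),
    "a light bag is easy to carry".split(),
    "the lamp is on".split(),
]
candidates = [
    [("light", "adj"), ("lamp", "noun")],   # hypothetical readings of source word 1
    [("bag", "noun"), ("pocket", "noun")],  # hypothetical readings of source word 2
]
result = best_translation(candidates, bigram_counts(corpus))
print(result)  # ((('light', 'adj'), ('bag', 'noun')), 2)
```

Because the search enumerates every combination, its cost grows exponentially with phrase length; this is tolerable for the short noun phrases the paper targets, but a longer-phrase variant would need dynamic programming over adjacent pairs.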