Automatic Translation of Source Language Noun Phrases Using a Raw Corpus of Target Language
Abstract
A common problem in translation between any two languages is the existence of sequences of nouns and adjectives for which several translations are possible, or noun phrases that are lexically or structurally ambiguous. The situation becomes more critical when some of the nouns or adjectives in the sequence are also ambiguous with respect to their part of speech: an adjective may also be a noun, or vice versa. This paper presents an approach for analyzing noun phrase structure based on large unannotated corpora. The proposed method is language independent and automatic; it relies on co-occurrence frequencies in a raw corpus and can therefore be regarded as an unsupervised learning method. The method was evaluated in an experiment on a sample of 280 lexically ambiguous words (ambiguous between noun and adjective) occurring within noun phrases. The results show a total error rate of 7.2% across all types of ambiguous noun phrases, that is, an overall accuracy of 92.8%, which is very promising.
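To make the idea of selecting a translation from co-occurrence frequencies concrete, the following is a minimal sketch and not the authors' algorithm, whose details the abstract does not give. It assumes each source word already has a list of candidate target-language translations given in target word order, and it scores each candidate phrase by how often it occurs as a contiguous sequence in a tokenized raw corpus. The names `ngram_counts` and `best_translation`, and the toy corpus, are hypothetical.

```python
from collections import Counter
from itertools import product


def ngram_counts(tokens, n):
    """Count every contiguous n-token sequence in a raw, unannotated corpus."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def best_translation(candidates_per_word, corpus_tokens):
    """Return (phrase, count) for the candidate phrase seen most often in the corpus.

    candidates_per_word: one list of candidate translations per position,
    already arranged in target-language order (an assumption of this sketch).
    Ties are resolved in favor of the first combination generated.
    """
    n = len(candidates_per_word)
    counts = ngram_counts(corpus_tokens, n)
    scored = [(cand, counts[cand]) for cand in product(*candidates_per_word)]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy corpus for illustration only; not data from the paper.
    corpus = "a cold war era policy and a cold war story and a cool war".split()
    candidates = [["cold", "cool"], ["war", "battle"]]
    print(best_translation(candidates, corpus))
```

A realistic system would also need to handle reordering between source and target, unseen combinations, and part-of-speech ambiguity; this sketch covers only the frequency-based selection step.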