An Efficient Method to Add Chunker Rules in Persian to English Rule-based Apertium Machine Translation System

Authors

  • Pariya Razmdideh Vali-e-Asr University of Rafsanjan
  • Abbas Ali Ahangar University of Sistan and Baluchestan
  • Seyed Mojtaba Sabbagh-Jafari Vali-e-Asr University of Rafsanjan
  • Gholamreza Haffari Monash University

Abstract

Rule-based machine translation (RBMT) captures linguistic information about the source and target languages. This information is drawn from (bilingual) dictionaries and grammar rules. This paper proposes an active learning (AL) method for growing the set of structural transfer rules at the chunker level. To this end, two sets of experiments are performed on two types of sentences extracted from the Mizan English-Persian Parallel Corpus: sentences selected manually and sentences selected randomly. The results show that adding newly written chunker rules to the transformation file using a pool-based AL technique improves the translation system more than a baseline that selects chunker rules at random.
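The contrast between pool-based selection and a random baseline can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so everything here is an assumption made for illustration: the rule names, the `TOY_GAIN` values, and the additive `toy_score` function are hypothetical stand-ins for whatever translation-quality measure is actually used, and the greedy loop is only one simple way to realise pool-based selection.

```python
import random
from typing import Callable, List, Set, Tuple

Score = Callable[[Set[str]], float]


def pool_based_selection(pool: List[str], score: Score, budget: int) -> List[Tuple[str, float]]:
    """Each round, take the pool item whose addition gives the highest score."""
    selected: Set[str] = set()
    remaining = list(pool)
    history: List[Tuple[str, float]] = []
    for _ in range(min(budget, len(remaining))):
        best = max(remaining, key=lambda rule: score(selected | {rule}))
        selected.add(best)
        remaining.remove(best)
        history.append((best, score(selected)))
    return history


def random_selection(pool: List[str], score: Score, budget: int, seed: int = 0) -> List[Tuple[str, float]]:
    """Baseline: add pool items in a random order, ignoring their usefulness."""
    rng = random.Random(seed)
    order = rng.sample(pool, min(budget, len(pool)))
    selected: Set[str] = set()
    history: List[Tuple[str, float]] = []
    for rule in order:
        selected.add(rule)
        history.append((rule, score(selected)))
    return history


# Hypothetical per-rule quality gains (not taken from the paper).
TOY_GAIN = {
    "np_adj_noun": 0.04,
    "vp_aux_verb": 0.02,
    "pp_prep_np": 0.01,
    "np_ezafe": 0.05,
}


def toy_score(rules: Set[str]) -> float:
    """Assumed additive score: the sum of the toy gains of the chosen rules."""
    return sum(TOY_GAIN[r] for r in rules)


active = pool_based_selection(list(TOY_GAIN), toy_score, budget=2)
baseline = random_selection(list(TOY_GAIN), toy_score, budget=2)
print("pool-based:", [name for name, _ in active])
print("random    :", [name for name, _ in baseline])
```

Under the additive toy score, the pool-based loop always ends with a total at least as high as the random baseline for the same budget. A real evaluation would replace `toy_score` with a measured translation-quality metric.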

Keywords:

Pool-based active learning, Rule-based machine translation, Apertium, Chunker rules

Author Biographies

Pariya Razmdideh, Vali-e-Asr University of Rafsanjan

Assistant Professor of Linguistics, Vali-e-Asr University of Rafsanjan, Iran;

Abbas Ali Ahangar, University of Sistan and Baluchestan

Associate Professor of Linguistics, University of Sistan and Baluchestan, Iran;

Seyed Mojtaba Sabbagh-Jafari, Vali-e-Asr University of Rafsanjan

Assistant Professor of Computer Engineering, Vali-e-Asr University of Rafsanjan, Iran;

Gholamreza Haffari, Monash University

Associate Professor at Faculty of Information Technology, Monash University, Australia;

References

Anvari, H., & Ahmadi Givi, H. (2016). Persian Language Grammar (1). Fifth edition. Fatemi Publication.

Chen, A., Schein, L., & Ungar, M. (2006). An empirical study of the behaviour of active learning for word sense disambiguation. In Proceedings of HLT-NAACL06.

Esplà-Gomis, M., Carrasco, R. C., Sánchez-Cartagena, V. M., & Forcada, M. L. (2016). Assisting non-expert speakers of under-resourced languages in assigning stems and inflectional paradigms to new word entries of morphological dictionaries. Language Research, 1-29.

Esplà-Gomis, M., Sánchez-Cartagena, V. M., Pérez-Ortiz, J. A., Sánchez-Martínez, F., Forcada, M. L., & Carrasco, R. C. (2014). An efficient method to assist non-expert users in extending dictionaries by assigning stems and inflectional paradigms to unknown words. In Proceedings of the 17th Annual Conference of the EAMT. Dubrovnik, Croatia, 19-29.

Esplà-Gomis, M., Sánchez-Cartagena, V. M., & Pérez-Ortiz, J. A. (2011a). Enlarging monolingual dictionaries for machine translation with active learning and non-expert users. In Proceedings of Recent Advances in NLP. Hissar, Bulgaria, 339–346.

Farshidvard, Kh. (2005). Today detailed grammar: based on new linguistics including novel researches about phonetics, morphology and contemporary Persian syntax and comparing it with English and French grammatical rules. Sokhan publication.

Forcada, M. L., Bonev, B. I., Ortiz-Rojas, S., Pérez-Ortiz, J. A., Ramírez-Sánchez, G., Sánchez-Martínez, F., Armentano-Oller, C., Montava, M. A., & Tyers, F. M. (2010). Documentation of the Open-Source Shallow-Transfer Machine Translation Platform Apertium. Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant.

Forcada, M. L., Ginestí-Rosell, M., Nordfalk, J., O’Regan, J., Ortiz-Rojas, S., Pérez-Ortiz, J. A., Sánchez-Martínez, F., & Tyers, F. M. (2011). Apertium: a free/open-source platform for rule-based machine translation. Machine Translation, 127-144.

Haffari, Gh., & Sarkar, A. (2009). Active learning for multilingual statistical machine translation. In Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP. Suntec, Singapore, 181–189.

Kamyar, T., & Omrani, G. (2006). Persian Language Grammar. Samt publication.

Khanlari, P. (1972). Persian Language Grammar. Tous Publication.

Lewis, D., & Gale, W. (1994). A sequential algorithm for training text classifiers. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval: ACM/Springer, 3–12.

Mahootian, S. (1997). Persian (Descriptive Grammars). London: Routledge.

McCallum, A., & Nigam, K. (1998). Employing EM in pool-based active learning for text classification. In Proceedings of ICML, 359–367.

Meshkatadini, M. (2013). Persian Language Grammar based on Transformational Theory. Ferdowsi University of Mashhad Press (FUMP).

Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on ACL. Philadelphia, Pennsylvania, USA, 311–318.

Popović, M., & Ney, H. (2007). Word error rates: Decomposition over POS classes and applications for error analysis. In Proceedings of Workshop on ACL.

Sánchez-Cartagena, V. M., Esplà-Gomis, M., Sánchez-Martínez, F., & Pérez-Ortiz, J. A. (2012a). Choosing the correct paradigm for unknown words in rule-based machine translation systems. In Proceedings of the Third International Workshop on Free/Open-Source Rule-Based Machine Translation. Gothenburg, Sweden, 27–39.

Sánchez-Cartagena, V. M., Esplà-Gomis, M., & Pérez-Ortiz, J. A. (2012). Source Language Dictionaries Help Non-Expert Users to Enlarge Target-Language Dictionaries for Machine Translation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation. Istanbul, Turkey, 3422–3429.

Santner, T. J., Williams, B. J., & Notz, W. I. (2003). The Design and Analysis of Computer Experiments. Springer Series in Statistics.

Settles, B. (2010). Active Learning Literature Survey. Computer Science Technical Report 1648. University of Wisconsin-Madison.

Shen, D., Zhang, J., Zhou, G., Su, J., & Tan, C. (2003). Effective adaptation of a hidden Markov model-based named entity recognizer for biomedical domain. In Proceedings of the ACL Workshop on Natural Language Processing in Biomedicine.

Supreme Council of Information and Communication Technology. (2013). Mizan English-Persian Parallel Corpus. Tehran, I.R. Iran. Retrieved from the website: http://dadegan.ir/catalog/mizan. Accessed 20 February 2016.

Thompson, C. A., Califf, M. E., & Mooney, R. J. (1999). Active Learning for Natural Language Parsing and Information Extraction. In Proceedings of the Sixteenth International Machine Learning Conference. Bled, Slovenia, 406-414.

https://stackoverflow.com/questions/40542523/nltk-corpus-level-bleu-vs-sentence-level-bleu-score. Accessed 12 March 2017.

https://svn.code.sf.net/p/apertium/svn/incubator/apertium-pes-eng. Accessed 6 July 2017.

Published

2019-07-05

How to Cite

Razmdideh, P., Ahangar, A. A., Sabbagh-Jafari, S. M., & Haffari, G. (2019). An Efficient Method to Add Chunker Rules in Persian to English Rule-based Apertium Machine Translation System. Iranian Journal of Translation Studies, 17(65), 54–73. Retrieved from https://journal.translationstudies.ir/ts/article/view/629

Section

Academic Research Paper