An Efficient Method to Add Chunker Rules in Persian to English Rule-based Apertium Machine Translation System

Pariya Razmdideh; Abbas Ali Ahangar; Seyed Mojtaba Sabbagh-Jafari; Gholamreza Haffari

Authors

Pariya Razmdideh, Vali-e-Asr University of Rafsanjan
Abbas Ali Ahangar, University of Sistan and Baluchestan
Seyed Mojtaba Sabbagh-Jafari, Vali-e-Asr University of Rafsanjan
Gholamreza Haffari, Monash University

Abstract

Rule-based machine translation captures linguistic information about the source and target languages. This information is retrieved from (bilingual) dictionaries and grammatical rules. This study proposes a pool-based active learning method for adding phrase-level structural transfer rules. To this end, two sets of experiments are carried out on two types of sentences drawn from the Mizan English-Persian parallel corpus: sentences selected manually and sentences selected at random. The results show that adding newly written chunker rules to the existing rule file by means of active learning strengthens the present machine translation system more than adding chunker rules at random.
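The contrast the abstract draws between active and random selection can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' procedure: each sentence is modelled as a hypothetical set of chunk-pattern names, `coverage` is an assumed stand-in for a translation-quality score, and "writing rules" is simulated by adding a sentence's patterns to the rule set.

```python
import random
from typing import FrozenSet, List, Set

# Assumption: a sentence is reduced to the set of chunk patterns it needs.
Sentence = FrozenSet[str]


def coverage(sentence: Sentence, rules: Set[str]) -> float:
    """Toy quality proxy: share of the sentence's patterns the rules cover."""
    if not sentence:
        return 1.0
    return len(sentence & rules) / len(sentence)


def mean_coverage(sentences: List[Sentence], rules: Set[str]) -> float:
    """Average coverage over a list of sentences."""
    return sum(coverage(s, rules) for s in sentences) / len(sentences)


def add_rules_active(pool: List[Sentence], rules: Set[str], budget: int) -> Set[str]:
    """Pool-based selection: always pick the currently worst-covered sentence."""
    pool = list(pool)
    rules = set(rules)
    for _ in range(min(budget, len(pool))):
        worst = min(pool, key=lambda s: coverage(s, rules))
        pool.remove(worst)
        rules |= worst  # stand-in for a linguist writing the missing rules
    return rules


def add_rules_random(pool: List[Sentence], rules: Set[str], budget: int,
                     seed: int = 0) -> Set[str]:
    """Baseline: pick sentences uniformly at random from the pool."""
    rng = random.Random(seed)
    rules = set(rules)
    for s in rng.sample(list(pool), min(budget, len(pool))):
        rules |= s
    return rules


pool = [frozenset({"NP"}), frozenset({"NP", "PP"}), frozenset({"VP", "AdjP"})]
active_rules = add_rules_active(pool, {"NP"}, budget=1)
random_rules = add_rules_random(pool, {"NP"}, budget=1)
print(mean_coverage(pool, active_rules), mean_coverage(pool, random_rules))
```

In a real setting the score would come from an automatic metric computed on the system's output for each pool sentence, and the rules would be written by hand; the sketch only shows why targeting the weakest sentences can use a fixed annotation budget more effectively than random choice.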

Keywords:

pool-based active learning, rule-based machine translation, Apertium, chunker rules

Author Biographies

Pariya Razmdideh, Vali-e-Asr University of Rafsanjan

Assistant Professor of Linguistics, Vali-e-Asr University of Rafsanjan, Iran

Abbas Ali Ahangar, University of Sistan and Baluchestan

Associate Professor of Linguistics, University of Sistan and Baluchestan, Iran

Seyed Mojtaba Sabbagh-Jafari, Vali-e-Asr University of Rafsanjan

Assistant Professor of Computer Science, Vali-e-Asr University of Rafsanjan, Iran

Gholamreza Haffari, Monash University

Associate Professor, Faculty of Information Technology, Monash University, Australia
