An Efficient Method for Adding Chunker Rules to the Apertium Persian-to-English Rule-Based Machine Translation System
Abstract
Rule-based machine translation encodes linguistic information about the source and target languages. This information is drawn from (bilingual) dictionaries and grammatical rules. This study proposes a pool-based active learning method for adding phrase-level structural transfer rules. To this end, two sets of experiments are carried out on two types of sentences from the Mizan English-Persian parallel corpus: sentences selected manually and sentences selected at random. The results show that adding newly written chunker rules to the existing rule file by means of the active learning method strengthens the current machine translation system more than adding chunker rules at random.
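The contrast between pool-based selection and random selection described in the abstract can be illustrated with a minimal sketch. The abstract does not state how candidate sentences are ranked, so the scoring function below (`uncovered_ratio`, the share of tokens not yet handled by existing rules) is purely an assumption made for illustration, as are all function names and the `covered_words` set. It is not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence, Set


def uncovered_ratio(sentence: str, covered_words: Set[str]) -> float:
    """Hypothetical informativeness score (an assumption, not from the paper):
    the fraction of whitespace-separated tokens absent from `covered_words`."""
    tokens = sentence.split()
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in covered_words)
    return missing / len(tokens)


def select_active(pool: Sequence[str],
                  score: Callable[[str], float],
                  k: int) -> List[str]:
    """Pool-based selection: score every sentence in the pool and
    return the k highest-scoring ones as candidates for rule writing."""
    # sorted() is stable, so sentences with equal scores keep pool order.
    return sorted(pool, key=score, reverse=True)[:k]


def select_random(pool: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Random baseline: draw k sentences from the pool without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(pool), k)
```

In such a setup, a linguist would write new chunker rules for the sentences returned by each selector, and the two resulting rule files would then be compared on a held-out test set.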
Keywords:
pool-based active learning, rule-based machine translation, Apertium, chunker rules
License
Copyright Licensee: Iranian Journal of Translation Studies. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.