Validation of Automated Scores of Human Translation of Legal Texts against Non-test Indicators of Translation Competence

Somayeh Karami; Dariush Nejadansari; Akbar Hesabi

Authors

Somayeh Karami, University of Isfahan
Dariush Nejadansari, University of Isfahan
Akbar Hesabi, University of Isfahan

Abstract

This study examines how valid automated translation quality evaluation tools are for scoring the official translator examination of the Islamic Republic of Iran, taken as an example of legal texts. To this end, non-test indicators of the examinees' translation competence were used: their scores in the courses Translation of Deeds and Documents 1 and 2, their mean score across all practical translation courses in the bachelor's program, and their overall grade point average in the bachelor's program in English translation. Although the correlation between the automated scores and the non-test indicators of translation competence was very low, the correlation between the scores given by expert human raters and the automated scores for the sample texts taken from the official translator examination was very high and significant. Based on all the data collected and the analyses carried out, it can therefore be concluded that the metric set "-PER, -TERp-A, BLEU-1, NIST-1, ROUGE-1, GTM-1" can be regarded as the optimal meta-evaluation set for scoring official translator examinations. This set covers every type of technique based on lexical similarity between the examinees' translations and the reference translations, including edit distance, precision, recall, and F-measure.
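The lexical-similarity techniques named in the abstract (edit distance, precision, recall, and F-measure) can be illustrated with a minimal sketch. The code below is not the toolkit or the metric implementations used in the study. It is a simplified, assumed version working on whitespace-separated, lowercased tokens, and the function names `unigram_prf` and `word_edit_distance` as well as the example sentences are hypothetical.

```python
from collections import Counter


def unigram_prf(candidate: str, reference: str) -> tuple[float, float, float]:
    """Clipped unigram precision, recall and F1 for two whitespace-tokenised strings."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Multiset intersection: each reference token can be matched at most once.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand) if cand else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def word_edit_distance(candidate: str, reference: str) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # prev[j] holds the distance between the candidate prefix seen so far and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, c in enumerate(cand, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if c == r else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Invented example: an examinee's translation versus a reference translation.
examinee = "the lessee shall pay the rent"
reference = "the tenant shall pay the rent monthly"
precision, recall, f1 = unigram_prf(examinee, reference)
distance = word_edit_distance(examinee, reference)
```

In this example five of the six examinee tokens also occur in the seven-token reference, so precision is 5/6 and recall is 5/7, while the edit distance counts one substitution and one insertion. Error-rate metrics such as PER and TER build on counts of this kind, normalised by reference length; the leading minus sign in "-PER" and "-TERp-A" presumably indicates that the error rate is negated so that higher values mean better translations.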

Keywords:

Automated scoring, translation quality assessment, validity, lexical-similarity-based automated translation quality evaluation tools, Judiciary's official translator examination in the Islamic Republic of Iran

Author Biographies

Somayeh Karami, University of Isfahan

PhD student in Translation, Department of English Language and Literature, Faculty of Foreign Languages, University of Isfahan, Isfahan, Iran

Dariush Nejadansari, University of Isfahan

Faculty member, Department of English Language and Literature, Faculty of Foreign Languages, University of Isfahan, Isfahan, Iran

Akbar Hesabi, University of Isfahan

Faculty member, Department of English Language and Literature, Faculty of Foreign Languages, University of Isfahan, Isfahan, Iran
