11. Pre-translation processing units
•Tokenization
•Compounding, clitics, and segmentation
[https://www.ibm.com/…]
Compounding: برف پاک کن
1- wiper
2- Snow clean do
3- Snow eraser
Clitics:
جان+ا (vocative, "O beloved")
دعوا+یمان ("our quarrel")
چهار+م ("fourth")
He’d -> he would | he had
She’s -> she is | she has
[Zhou, et al, 2012]
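The compounding and clitic examples above can be illustrated with a toy segmenter. This is a minimal sketch, not the method of [Zhou, et al, 2012]: the lexicon, stem list, clitic list, and contraction table are all hypothetical, and it assumes the compound is written with spaces so that a whitespace tokenizer splits it into three tokens. Greedy longest-match lookup recovers "wiper", while word-by-word lookup produces the literal "snow clean do"; English contractions are returned as a list of candidates because choosing between "he would" and "he had" needs context.

```python
# Hypothetical toy lexicon: keys are token tuples, so multiword compounds
# such as "برف پاک کن" (wiper) can be stored alongside single words.
LEXICON = {
    ("برف", "پاک", "کن"): "wiper",
    ("برف",): "snow",
    ("پاک",): "clean",
    ("کن",): "do",
}


def translate_word_by_word(tokens):
    """Look up every token on its own; compounds are broken apart."""
    return [LEXICON.get((t,), t) for t in tokens]


def translate_longest_match(tokens, max_len=3):
    """Greedy longest match: prefer the longest lexicon entry at each position."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                out.append(LEXICON[key])
                i += n
                break
        else:
            # No entry of any length starts here: keep the token untranslated.
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical stems and clitic suffixes (longest suffix first).
STEMS = {"جان", "دعوا", "چهار"}
CLITICS = ["یمان", "م", "ا"]


def split_clitic(token):
    """Split off one known clitic if the remaining stem is in STEMS."""
    if token in STEMS:
        return (token,)
    for suffix in CLITICS:
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and stem in STEMS:
            return (stem, suffix)
    return (token,)


# English contractions map to every possible expansion; picking one needs context.
CONTRACTIONS = {
    "he'd": ["he would", "he had"],
    "she's": ["she is", "she has"],
}


def expand_contraction(token):
    """Return all candidate expansions (curly apostrophes normalized)."""
    key = token.lower().replace("\u2019", "'")
    return CONTRACTIONS.get(key, [token])


if __name__ == "__main__":
    tokens = "برف پاک کن".split()
    print(translate_word_by_word(tokens))   # ['snow', 'clean', 'do']
    print(translate_longest_match(tokens))  # ['wiper']
    print(split_clitic("دعوایمان"))
    print(expand_contraction("He\u2019d"))   # ['he would', 'he had']
```

Longest-match is only one simple segmentation strategy; it fails when a shorter split is the intended reading, which is why real systems often score alternative segmentations instead.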
14. Pre-translation processing units
•Tokenization
•Compounding, clitics, and segmentation
•Stop word removal
•Stemming or lemmatization
•Term expansion
[https://www.ibm.com/…]
[Zhou, et al, 2012]
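The steps on this slide can be chained into one query-preprocessing function. The sketch below is a minimal, English-only illustration under stated assumptions: the stop list and the one-entry expansion table are hypothetical, and the suffix-stripping rules are a crude stand-in for a real stemmer such as Porter's (this is neither a faithful stemmer nor a lemmatizer). Compounding and clitic handling are left out to keep it short.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "for", "and", "to", "is"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude suffix stripping, not Porter
EXPANSIONS = {"car": ["automobile"]}  # hypothetical related-term table


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop high-frequency function words that carry little query meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token, min_stem=3):
    """Strip the first matching suffix if a stem of min_stem letters remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def expand_terms(tokens):
    """Keep each term and append any related terms from EXPANSIONS."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(EXPANSIONS.get(t, []))
    return out


def preprocess(text):
    """Tokenize, remove stop words, stem, then expand, in that order."""
    tokens = tokenize(text)
    tokens = remove_stop_words(tokens)
    tokens = [stem(t) for t in tokens]
    return expand_terms(tokens)


if __name__ == "__main__":
    print(preprocess("Searching for cars in the documents"))
    # ['search', 'car', 'automobile', 'document']
```

Running expansion after stemming means the expansion table must be keyed on stems; an alternative design expands surface words first and stems the added terms afterwards.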
39. Machine translation
•Machine translation (MT) systems
•The most popular approach in recent years
•Approaches:
•Neural MT, statistical MT, hybrid MT, rule-based MT
•CLEF 2009: 99% accuracy using Google's translation API
•“Can we take this as meaning that Google is going to solve the cross-language
translation resource quandary?”
[Brown, et al, 1990; Lopez, 2008]
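Brown et al. (1990) frame statistical MT as a noisy channel: for a source sentence f, choose the target sentence e that maximizes P(e) · P(f | e), a language model times a translation model. The toy below reuses the compounding example from the pre-translation slide to show why both factors matter; every probability in it is invented for illustration, and real systems estimate these models from large corpora and search a vast candidate space rather than scoring three fixed strings.

```python
import math

# Hypothetical probabilities for translating the Persian compound f = "برف پاک کن".
# P(e): language model, how plausible e is as English text.
LANGUAGE_MODEL = {"wiper": 0.02, "snow eraser": 1e-4, "snow clean do": 1e-6}
# P(f | e): translation model, how well e explains the source words.
TRANSLATION_MODEL = {"wiper": 0.3, "snow eraser": 0.2, "snow clean do": 0.6}


def noisy_channel_score(e):
    """log P(e) + log P(f | e); logs avoid underflow on longer inputs."""
    return math.log(LANGUAGE_MODEL[e]) + math.log(TRANSLATION_MODEL[e])


def decode(candidates):
    """Return the candidate e that maximizes P(e) * P(f | e)."""
    return max(candidates, key=noisy_channel_score)


if __name__ == "__main__":
    candidates = list(LANGUAGE_MODEL)
    # The translation model alone favours the word-by-word rendering...
    print(max(candidates, key=TRANSLATION_MODEL.get))  # snow clean do
    # ...but the language model pulls the decision toward fluent English.
    print(decode(candidates))  # wiper
```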
72. References
[1] D. Zhou, T. Brailsford, M. Truran, V. Wade, and H. Ashman, “Translation Techniques in Cross-Language Information Retrieval,” ACM Comput. Surv., vol. 45, no. 1, 2012.
[2] A. Shakery and C. Zhai, “Leveraging Comparable Corpora for Cross-Lingual Information Retrieval in Resource-Lean Language Pairs,” Inf. Retr., vol. 16, no. 1, pp. 1–29, Feb. 2013.
[3] E. Agirre, G. M. Di Nunzio, N. Ferro, T. Mandl, and C. Peters, “CLEF 2008: Ad Hoc Track Overview,” 2008.
[4] L. Ballesteros and W. B. Croft, “Phrasal Translation and Query Expansion Techniques for Cross-Language Information Retrieval,” in Proc. ACM SIGIR, 1997, pp. 84–91.
[5] P. F. Brown, J. Cocke, S. A. Della Pietra, V. J. Della Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S. Roossin, “A Statistical Approach to Machine Translation,” Comput. Linguist., vol. 16, no. 2, pp. 79–85, 1990.
73. References
[6] D. A. Hull, “Using Structured Queries for Disambiguation in Cross-Language Information Retrieval,” 1997.
[7] S. Karimi, A. Turpin, and F. Scholer, “English to Persian Transliteration,” pp. 255–266, 2006.
[8] A. Lopez, “Statistical Machine Translation,” ACM Comput. Surv., vol. 40, no. 3, pp. 1–49, 2008.
[9] D. A. Hull and G. Grefenstette, “Querying Across Languages: A Dictionary-Based Approach to Multilingual Information Retrieval,” pp. 49–57, 1996.
[10] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, Ł. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean, “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,” pp. 1–23, 2016.