SlideShare a Scribd company logo
Bilingual Semantic Text Similarity
MT Evaluation
(Bilingual Evaluation)
Submitted By
Binod Kumar Mishra, Pursotam Kumar, Vaibhavi Kailash Kothadi
Mentor by
Ms. Ananya Mukherjee, Dr. Manish Shrivastava
11th Advanced Summer School on NLP
(IASNLP-2022)
IIIT Hyderabad
Contents
• Introduction
• Problem Statement
• Data
• Experiments
• Result & Analysis
• Conclusion & Future Work
1. Introduction
• Bilingual Semantic Text similarity [1]
• Measures the equivalence of meanings between two languages.
• With the bilingual distributed representation of words we try to capture semantic meaning.
• It can be divided into three phases, Preprocessing, Similarity Measure, and Score estimation
• It can be used in various NLP applications like Question-Answer, Machine Translation, and Information Extraction.
• Word level embedding and Sentence level embedding
• Word Level Embedding: Word2Vec, Glove, Fasttext, Elmo, BERT [2-8]
• Sentence Level Embedding: SBERT, LaBSE, LASER [9-11]
• Fine Tune COMET machine translation evaluation model[12-13]
• Pearson correlation using Normalized score and other sentence embedding approaches
2. Problem Statement
• Word level embedding vs Sentence level embedding
• Pre-trained COMET model vs Fine-tune COMET model
• Compare various sentence encoders using Pearson Correlation
3. Data
• We have collected dataset from [14] containing 1,00,000 parallel English and translated Hindi sentences to
fine-tune COMET machine translation model .
4. Experiment
Fig 1: Fine-tune COMET machine translation evaluation model
5. Result & Analysis
5. Result & Analysis (Cont.)
5. Result & Analysis (Cont.)
6. Conclusion & Future Work
In this report, we focus on three important things
• Embedding approach
• Fine tune COMET machine translation evaluation model
• Compare Correlation Pearson value between LaBSE, LASER, SBERT, and COMET. And concluded that LaBSE and SBERT give the
best result.
• The future work is that if we Fine-tune models with huge data then it may be possible that we get a better
result.
References
1. Shajalal, M. and M. Aono, Semantic textual similarity between sentences using bilingual word semantics. Progress in Artificial Intelligence, 2019. 8(2): p. 263-
272.
2. Mikolov, T., et al., Efficient estimation of word representations in vector space. 2013.
3. Mikolov, T., et al. Distributed representations of words and phrases and their compositionality. in Advances in neural information processing systems. 2013.
4. Pennington, J., R. Socher, and C.D. Manning. Glove: Global vectors for word representation. in Proceedings of the 2014 conference on empirical methods in
natural language processing (EMNLP). 2014.
5. Joulin, A., et al., Fasttext. zip: Compressing text classification models. 2016.
6. Bojanowski, P., et al., Enriching word vectors with subword information. 2017. 5: p. 135-146.
7. Ulčar, M. and M.J.a.p.a. Robnik-Šikonja, High quality ELMo embeddings for seven less-resourced languages. 2019.
8. Devlin, J., et al., Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
9. Reimers, N. and I.J.a.p.a. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks. 2019.
10. Feng, F., et al., Language-agnostic bert sentence embedding. 2020.
11. Artetxe, M. and H.J.T.o.t.A.f.C.L. Schwenk, Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. 2019. 7: p. 597-610.
12. Rei, R., et al., COMET: A neural framework for MT evaluation. 2020.
13. Agirre, E., et al. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. 2015. Denver, Colorado: Association for
Computational Linguistics.
14. Kunchukuttan, A., et al., Ai4bharat-indicnlp corpus: Monolingual corpora and word embeddings for indic languages. 2020.
THANK YOU

More Related Content

Similar to ppt for IASNLP.pptx

Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment AnalysisClassification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
SHAILENDRA KUMAR SINGH
 
Sentiment analysis on Bangla conversation using machine learning approach
Sentiment analysis on Bangla conversation using machine  learning approachSentiment analysis on Bangla conversation using machine  learning approach
Sentiment analysis on Bangla conversation using machine learning approach
IJECEIAES
 
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
IJECEIAES
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET Journal
 
Text Mining for Lexicography
Text Mining for LexicographyText Mining for Lexicography
Text Mining for Lexicography
Leiden University
 
Cross-domain sentiment analysis of the natural Romanian language
Cross-domain sentiment analysis of the natural Romanian languageCross-domain sentiment analysis of the natural Romanian language
Cross-domain sentiment analysis of the natural Romanian language
ICDEcCnferenece
 
Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...
Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...
Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...
IJECEIAES
 
A Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsA Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And Applications
Lisa Graves
 
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
Using Decision Tree for Automatic Identification of Bengali Noun-Noun CompoundsUsing Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
idescitation
 
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfContinuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdf
devangmittal4
 
French machine reading for question answering
French machine reading for question answeringFrench machine reading for question answering
French machine reading for question answering
Ali Kabbadj
 
Journal of Computer Science Research | Vol.1, Iss.3
Journal of Computer Science Research | Vol.1, Iss.3Journal of Computer Science Research | Vol.1, Iss.3
Journal of Computer Science Research | Vol.1, Iss.3
Bilingual Publishing Group
 
Multi-modal NLP Systems in Healthcare
Multi-modal NLP Systems in HealthcareMulti-modal NLP Systems in Healthcare
Multi-modal NLP Systems in Healthcare
Jekaterina Novikova, PhD
 
INFT2060 Applied Artificial Intelligence.docx
INFT2060 Applied Artificial Intelligence.docxINFT2060 Applied Artificial Intelligence.docx
INFT2060 Applied Artificial Intelligence.docx
write4
 
Machine Learning Techniques with Ontology for Subjective Answer Evaluation
Machine Learning Techniques with Ontology for Subjective Answer EvaluationMachine Learning Techniques with Ontology for Subjective Answer Evaluation
Machine Learning Techniques with Ontology for Subjective Answer Evaluation
ijnlc
 
Eurocall 2014 titova
Eurocall 2014 titovaEurocall 2014 titova
Eurocall 2014 titova
Moscow State University
 
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET Journal
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learning
csandit
 
Evaluating sentiment analysis and word embedding techniques on Brexit
Evaluating sentiment analysis and word embedding techniques on BrexitEvaluating sentiment analysis and word embedding techniques on Brexit
Evaluating sentiment analysis and word embedding techniques on Brexit
IAESIJAI
 
Cross language information retrieval in indian
Cross language information retrieval in indianCross language information retrieval in indian
Cross language information retrieval in indian
eSAT Publishing House
 

Similar to ppt for IASNLP.pptx (20)

Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment AnalysisClassification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
 
Sentiment analysis on Bangla conversation using machine learning approach
Sentiment analysis on Bangla conversation using machine  learning approachSentiment analysis on Bangla conversation using machine  learning approach
Sentiment analysis on Bangla conversation using machine learning approach
 
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
 
Text Mining for Lexicography
Text Mining for LexicographyText Mining for Lexicography
Text Mining for Lexicography
 
Cross-domain sentiment analysis of the natural Romanian language
Cross-domain sentiment analysis of the natural Romanian languageCross-domain sentiment analysis of the natural Romanian language
Cross-domain sentiment analysis of the natural Romanian language
 
Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...
Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...
Decision-Making Model for Student Assessment by Unifying Numerical and Lingui...
 
A Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsA Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And Applications
 
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
Using Decision Tree for Automatic Identification of Bengali Noun-Noun CompoundsUsing Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
 
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfContinuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdf
 
French machine reading for question answering
French machine reading for question answeringFrench machine reading for question answering
French machine reading for question answering
 
Journal of Computer Science Research | Vol.1, Iss.3
Journal of Computer Science Research | Vol.1, Iss.3Journal of Computer Science Research | Vol.1, Iss.3
Journal of Computer Science Research | Vol.1, Iss.3
 
Multi-modal NLP Systems in Healthcare
Multi-modal NLP Systems in HealthcareMulti-modal NLP Systems in Healthcare
Multi-modal NLP Systems in Healthcare
 
INFT2060 Applied Artificial Intelligence.docx
INFT2060 Applied Artificial Intelligence.docxINFT2060 Applied Artificial Intelligence.docx
INFT2060 Applied Artificial Intelligence.docx
 
Machine Learning Techniques with Ontology for Subjective Answer Evaluation
Machine Learning Techniques with Ontology for Subjective Answer EvaluationMachine Learning Techniques with Ontology for Subjective Answer Evaluation
Machine Learning Techniques with Ontology for Subjective Answer Evaluation
 
Eurocall 2014 titova
Eurocall 2014 titovaEurocall 2014 titova
Eurocall 2014 titova
 
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learning
 
Evaluating sentiment analysis and word embedding techniques on Brexit
Evaluating sentiment analysis and word embedding techniques on BrexitEvaluating sentiment analysis and word embedding techniques on Brexit
Evaluating sentiment analysis and word embedding techniques on Brexit
 
Cross language information retrieval in indian
Cross language information retrieval in indianCross language information retrieval in indian
Cross language information retrieval in indian
 

Recently uploaded

MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
RAHUL
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
TechSoup
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
IreneSebastianRueco1
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
eBook.com.bd (প্রয়োজনীয় বাংলা বই)
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
Celine George
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 

Recently uploaded (20)

MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 

ppt for IASNLP.pptx

  • 1. Bilingual Semantic Text Similarity MT Evaluation (Bilingual Evaluation) Submitted By Binod Kumar Mishra, Pursotam Kumar, Vaibhavi Kailash Kothadi Mentor by Ms. Ananya Mukherjee, Dr. Manish Shrivastava 11th Advanced Summer School on NLP (IASNLP-2022) IIIT Hyderabad
  • 2. Contents • Introduction • Problem Statement • Data • Experiments • Result & Analysis • Conclusion & Future Work
  • 3. 1. Introduction • Bilingual Semantic Text similarity [1] • Measures the equivalence of meanings between two languages. • With the bilingual distributed representation of words we try to capture semantic meaning. • It can be divided into three phases, Preprocessing, Similarity Measure, and Score estimation • It can be used in various NLP applications like Question-Answer, Machine Translation, and Information Extraction. • Word level embedding and Sentence level embedding • Word Level Embedding: Word2Vec, Glove, Fasttext, Elmo, BERT [2-8] • Sentence Level Embedding: SBERT, LaBSE, LASER [9-11] • Fine Tune COMET machine translation evaluation model[12-13] • Pearson correlation using Normalized score and other sentence embedding approaches
  • 4. 2. Problem Statement • Word level embedding vs Sentence level embedding • Pre-trained COMET model vs Fine-tune COMET model • Compare various sentence encoders using Pearson Correlation
  • 5. 3. Data • We have collected dataset from [14] containing 1,00,000 parallel English and translated Hindi sentences to fine-tune COMET machine translation model .
  • 6. 4. Experiment Fig 1: Fine-tune COMET machine translation evaluation model
  • 7. 5. Result & Analysis
  • 8. 5. Result & Analysis (Cont.)
  • 9. 5. Result & Analysis (Cont.)
  • 10. 6. Conclusion & Future Work In this report, we focus on three important things • Embedding approach • Fine tune COMET machine translation evaluation model • Compare Correlation Pearson value between LaBSE, LASER, SBERT, and COMET. And concluded that LaBSE and SBERT give the best result. • The future work is that if we Fine-tune models with huge data then it may be possible that we get a better result.
  • 11. References 1. Shajalal, M. and M. Aono, Semantic textual similarity between sentences using bilingual word semantics. Progress in Artificial Intelligence, 2019. 8(2): p. 263- 272. 2. Mikolov, T., et al., Efficient estimation of word representations in vector space. 2013. 3. Mikolov, T., et al. Distributed representations of words and phrases and their compositionality. in Advances in neural information processing systems. 2013. 4. Pennington, J., R. Socher, and C.D. Manning. Glove: Global vectors for word representation. in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014. 5. Joulin, A., et al., Fasttext. zip: Compressing text classification models. 2016. 6. Bojanowski, P., et al., Enriching word vectors with subword information. 2017. 5: p. 135-146. 7. Ulčar, M. and M.J.a.p.a. Robnik-Šikonja, High quality ELMo embeddings for seven less-resourced languages. 2019. 8. Devlin, J., et al., Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. 9. Reimers, N. and I.J.a.p.a. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks. 2019. 10. Feng, F., et al., Language-agnostic bert sentence embedding. 2020. 11. Artetxe, M. and H.J.T.o.t.A.f.C.L. Schwenk, Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. 2019. 7: p. 597-610. 12. Rei, R., et al., COMET: A neural framework for MT evaluation. 2020. 13. Agirre, E., et al. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. 2015. Denver, Colorado: Association for Computational Linguistics. 14. Kunchukuttan, A., et al., Ai4bharat-indicnlp corpus: Monolingual corpora and word embeddings for indic languages. 2020.