English To Hindi Statistical Machine
Translation System
Presented By
Nakul Sharma
Assistant Professor (IT)
SAE
Pune
Agenda

Introduction

Machine Translation Approaches

My Experiences

Design and Implementation of Statistical Machine
Translation System.

Conclusion

Q/A Session

Practical Sessions for demonstrating the SMT concepts
Introduction

Machine Translation employs machines in
converting one natural language to another.

Statistical Machine Translation (SMT) involves
using statistical methods (Bayes' Theorem,
probability, etc.) in undertaking this conversion.
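The slide names Bayes' Theorem without writing it out. A common way to state it for SMT is the noisy-channel formulation below. The symbols are assumptions chosen to match the later slides: S is the source (English) sentence, T is a candidate target (Hindi) sentence, P(T) is the language model and P(S | T) is the translation model.

```latex
\hat{T} \;=\; \arg\max_{T} P(T \mid S)
        \;=\; \arg\max_{T} \frac{P(T)\, P(S \mid T)}{P(S)}
        \;=\; \arg\max_{T} P(T)\, P(S \mid T)
```

P(S) can be dropped in the last step because it is the same for every candidate T.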
Machine Translation Approaches

Direct MT

Rule-Based MT

Corpus-Based MT

Knowledge-Based MT
Why Only Linux for SMT?
1. There are no license issues.
2. The software is available for free.
3. The software can be used directly for research.
Basic Prerequisites
Linux commands
File structure
Installation of software on Linux
Knowledge of computational linguistics and
machine translation tasks
My Experiences with Linux (Ubuntu)
Installing Software

1. sudo apt-get install <Name_of_Software>, provided the
package is available in the repository.
2. Downloading the binaries, reading the README files,
and completing the installation by hand.
3. Using the Synaptic package manager (for certain
dependencies). Ubuntu versions above 10 do not include
Synaptic by default, so install it separately.
4. Using the Ubuntu Software Center.
My Experiences With SMT
Software

Statistical Machine Translation relies heavily on
the following:
1. Development of a high-quality parallel corpus
2. Algorithms applied in the language model (LM), translation model (TM), and decoder
Transition from Windows

Linux is quite different.

Errors are sometimes difficult to understand
(especially for Windows users).

Errors can occur during installation or they can
occur while training/running the software.
General Issues with Open Source
Software

It sometimes gives unexpected results.

It is quite dynamic in nature.

Documentation is often vague, and when the
software or OS version is updated it is difficult to
find out what needs to be done.

It is sometimes a challenge to install or run the
software, as one gets lost in the
dependencies.
SMT Software for Windows?

It is not so easy, but some generic solutions do
exist:
1. Using VMware to install and run an Ubuntu system.
Updating the system is slow (unless you are using a
high-configuration machine).
2. Moses for Windows 7 (online support exists for running
Moses on Windows 7).
3. Cygwin for Windows.
Latest Updates
A lot has changed since 2011 (the time of the ME thesis).
Moses has undergone a revamp of its major
functionality.
SRILM is now available in version 1.7.
GIZA++ has not changed much.
Effect of the Changes in
Implementation
The older methods may no longer work as written.
For example, if the directory
structure changes, the same command may not run
properly.
However, the generic steps remain the same. These
change only after a major version is released.
Design and Implementation of
Statistical Machine
Translation System
LM
TM
Decoder
SMT System Involves
1. Downloading the software
2. Installing the software
3. Preparing the corpus
4. Training the software
5. Testing the software
6. Developing applications using the software
7. Deploying the applications
Language Model
SRI's LM toolkit (SRILM)
Predicts the probability of a target sentence
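To make "probability of a target sentence" concrete, here is a minimal sketch of a trigram language model. It is not SRILM's implementation: it uses add-one smoothing purely for brevity, and the romanized Hindi sentences, function names, and variable names are all hypothetical.

```python
import math
from collections import Counter


def train_trigram_counts(sentences):
    """Count trigrams and their two-word histories, padding with <s> and </s>."""
    trigrams, histories, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for i in range(2, len(tokens)):
            history = (tokens[i - 2], tokens[i - 1])
            trigrams[history + (tokens[i],)] += 1
            histories[history] += 1
    return trigrams, histories, vocab


def sentence_logprob(sentence, trigrams, histories, vocab):
    """Log probability of a sentence under an add-one smoothed trigram model."""
    tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    logprob = 0.0
    for i in range(2, len(tokens)):
        history = (tokens[i - 2], tokens[i - 1])
        count = trigrams[history + (tokens[i],)]
        logprob += math.log((count + 1) / (histories[history] + len(vocab)))
    return logprob


# Hypothetical two-sentence target-side corpus (romanized Hindi).
corpus = ["main ghar jata hoon", "main school jata hoon"]
trigrams, histories, vocab = train_trigram_counts(corpus)

seen_score = sentence_logprob("main ghar jata hoon", trigrams, histories, vocab)
shuffled_score = sentence_logprob("ghar main hoon jata", trigrams, histories, vocab)
print(seen_score, shuffled_score)
```

The sentence in natural word order scores higher than the shuffled one, which is the property the decoder relies on to prefer fluent output.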
Translation Model
The Translation Model (TM) computes the
probability of source sentence ‘S’, for a given
target sentence ‘T’. (conditional probability)
GIZA++ for TM
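GIZA++ trains the IBM word-alignment models. The sketch below is a heavily simplified IBM Model 1 trainer, not GIZA++ itself: it estimates word translation probabilities t(s | t) with a few EM iterations and leaves out the NULL word. The toy English/romanized-Hindi pairs and all names are hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(s | t) from (source_tokens, target_tokens) pairs using EM."""
    source_vocab = {s for src, _ in pairs for s in src}
    uniform = 1.0 / len(source_vocab)
    # Start uniform over every source/target word pair that co-occurs.
    prob = {(s, t): uniform for src, tgt in pairs for s in src for t in tgt}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of s aligned to t
        total = defaultdict(float)  # expected count of t being aligned at all
        # E-step: distribute each source word over the target words in its pair.
        for src, tgt in pairs:
            for s in src:
                norm = sum(prob[(s, t)] for t in tgt)
                for t in tgt:
                    fraction = prob[(s, t)] / norm
                    count[(s, t)] += fraction
                    total[t] += fraction
        # M-step: renormalize the expected counts into probabilities.
        prob = {(s, t): c / total[t] for (s, t), c in count.items()}
    return prob


# Hypothetical parallel data: English source, romanized Hindi target.
pairs = [
    ("red house".split(), "lal ghar".split()),
    ("red book".split(), "lal kitab".split()),
    ("big book".split(), "badi kitab".split()),
]
prob = train_ibm_model1(pairs)
print(prob[("house", "ghar")], prob[("red", "ghar")])
```

Even though "red" and "house" both co-occur with "ghar" once, EM shifts probability toward "house" because "red" is better explained by "lal".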
Decoder
The decoder searches for the target sentence that
maximizes the combined LM and TM probability.
Moses is used as the decoder.
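As a rough illustration of what "maximizing" means here, the sketch below picks the best target sentence from a small fixed list by adding LM and TM log probabilities. This is only a toy: a real decoder such as Moses searches a very large hypothesis space instead of enumerating candidates. The probabilities and names are made up for the example.

```python
import math


def decode(source, candidates, lm_logprob, tm_logprob):
    """Return the candidate T that maximizes log P(T) + log P(S | T)."""
    return max(candidates, key=lambda t: lm_logprob(t) + tm_logprob(source, t))


# Hypothetical scores: both candidates explain the source equally well,
# but the language model prefers the fluent word order.
toy_lm = {"main ghar jata hoon": 0.02, "ghar main jata hoon": 0.001}
toy_tm = {
    ("i go home", "main ghar jata hoon"): 0.3,
    ("i go home", "ghar main jata hoon"): 0.3,
}

best = decode(
    "i go home",
    list(toy_lm),
    lambda t: math.log(toy_lm[t]),
    lambda s, t: math.log(toy_tm[(s, t)]),
)
print(best)
```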
Preparation of Data
Development of a parallel corpus with the
following properties:
1. One sentence per line.
2. All sentences of the parallel corpus need to be
lowercased.
3. Try to include simple sentences instead of long,
complicated sentences, at least initially.
Preparation of Data
1. Tokenizing the corpus
2. Filtering out long sentences
3. Lowercasing the data
All of the above is done using the training scripts available in
the Moses folder.
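The three steps can be sketched in Python as below. This is a rough stand-in for illustration, not the Moses scripts: the punctuation set, the length limit, and the function names are assumptions. A pair is dropped as a whole so that the English and Hindi files stay aligned line by line.

```python
import re

DANDA = "\u0964"  # Devanagari full stop
PUNCTUATION = re.compile("([.,!?;:()" + DANDA + "])")


def tokenize(line):
    """Naive tokenizer: put spaces around punctuation, then split on whitespace."""
    return PUNCTUATION.sub(r" \1 ", line).split()


def prepare(pairs, max_len=40):
    """Tokenize, lowercase, and drop empty or overlong (english, hindi) pairs."""
    cleaned = []
    for english, hindi in pairs:
        en_tokens = tokenize(english.lower())
        hi_tokens = tokenize(hindi.lower())  # lower() leaves Devanagari unchanged
        if not en_tokens or not hi_tokens:
            continue
        if len(en_tokens) > max_len or len(hi_tokens) > max_len:
            continue
        cleaned.append((" ".join(en_tokens), " ".join(hi_tokens)))
    return cleaned


# Hypothetical raw corpus: one good pair and one overlong pair.
raw = [
    ("I go home.", "मैं घर जाता हूँ" + DANDA),
    ("This sentence is far too long " * 10, "x"),
]
cleaned = prepare(raw, max_len=10)
print(cleaned)
```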
Language Model (the command will
change according to the software
being used)
./ngram-count -order 3 -text corpus_new4.lowercased.hi -lm hindi.lm -write count.cnt
(According to the ME thesis; for the
latest usage, see the LM documentation.)
Translation Model
GIZA++ Training as given in Moses Manual
Moses
./train-factored-phrase-model.perl \
  -scripts-root-dir /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/ \
  -root-dir . --corpus corpus_new5.loweredcased -f en -e hi \
  -lm 0:3:/home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/hindi_lm5.lm \
  >& training_new5.out &
A Few Stalwarts in SMT
Philipp Koehn
Hieu Hoang (Moses specific)
Christopher D. Manning
Acknowledgements
Prof. Dr. Prateek Bhatia (ME Thesis Guide)
Mother and Father
References

My own experience in developing SMT system

Prateek Bhatia, Nakul Sharma, “English to Hindi
Statistical Machine Translation System”, ME
Thesis, Thapar University, Patiala, Punjab.

Software (Moses, GIZA++, SRILM).
Thank You
Any Questions or Comments...
Please feel free to drop me an email at
nakul777@gmail.com.

Session on machine translation batu 19 march2016

  • 1. English To Hindi Statistical Machine Translation System Presented By Nakul Sharma Assistant Professor (IT) SAE Pune
  • 2. Agenda  Introduction  Machine Translation Approaches  My Experiences  Design and Implementation of Statistical Machine Translation System.  Conclusion  Q/A Session  Practical Sessions for demonstating the SMT concepts
  • 3. Introduction  Machine Translation employs machines in converting one natural language to another.  Statistical Machine Translation (SMT) involves using statistical methods (Bayers Theorem, Probability, etc.) in undertaking this conversion.
  • 4. Machine Translation Approaches  Direct Based MT  Rule Based MT  Corpus Based MT  Knowledge Based MT
  • 5. Why only Linux for SMT 1. There is no license issues 2. Softwares are available for free 3. Softwares can be used directly for research.
  • 6. Basic Prerequisites: Linux commands; file structure; installation of software on Linux; knowledge of computational linguistics and machine translation tasks.
  • 7. My Experiences with Linux (Ubuntu): Installing Software. Using sudo apt-get install <Name_of_Software>, provided it is available in the repository. Downloading the binaries, reading the README files, and completing the installation. Using Synaptic Package Manager (for certain dependencies); Ubuntu versions above 10 do not include Synaptic Package Manager, so install it separately. Using Ubuntu Software Center.
  • 8. My Experiences With SMT Software: Statistical Machine Translation relies heavily on the following: development of a high-quality parallel corpus; algorithms applied in the LM/TM/Decoder.
  • 9. Transition from Windows: Linux is quite different. Errors are sometimes difficult to understand (especially for Windows users). Errors can occur during installation or while training/running the software.
  • 10. General Issues with Open Source Software: It sometimes gives unexpected results. It is quite dynamic in nature. Documentation is often vague, and when the software or OS version is updated it is difficult to find out what needs to be done. Running or installing the software is sometimes a challenge, as one gets lost in the dependencies.
  • 11. SMT Software for Windows? Well, not so easy, but some generic solutions do exist: using VMware to install and run an Ubuntu system (updating the system is slow unless you are using a high-configuration machine); Moses for Windows 7 (online support exists for running Moses on Windows 7); Cygwin for Windows.
  • 12. Latest Updates: A lot has changed since 2011 (the time of the ME thesis). Moses has undergone a revamp of its major functionality. SRILM is now available in version 1.7. GIZA++ has not changed much.
  • 13. Effect of the Changes in Implementation: The older methods may not work exactly as before. For example, if the directory structure changes, the same command may not run properly. However, the generic steps remain the same; they change only after a major version is released.
  • 14. Design and Implementation of Statistical Machine Translation System: Language Model (LM), Translation Model (TM), Decoder.
  • 15.
  • 16. SMT System Involves: 1. Downloading the software 2. Installing the software 3. Preparing the corpus 4. Training the software 5. Testing the software 6. Developing applications using the software 7. Deploying the applications
  • 17. Language Model: SRI's LM toolkit (SRILM) predicts the probability of a target sentence.
  • 18. Translation Model: The Translation Model (TM) computes the conditional probability of a source sentence ‘S’ given a target sentence ‘T’. GIZA++ is used for the TM.
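The conditional probability on slide 18 can be illustrated with a minimal sketch of IBM Model 1, the simplest of the word-alignment models that GIZA++ implements. This is not GIZA++ code: the function name `train_ibm_model1`, the toy romanized sentence pairs, and the omission of the NULL word are all assumptions made for illustration.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities P(s | t) with EM.

    pairs: non-empty list of (source_tokens, target_tokens) tuples.
    Returns a dict-like mapping (s, t) -> probability of source word s
    given target word t. Simplification: no NULL word, no smoothing.
    """
    source_vocab = {s for src, _ in pairs for s in src}
    uniform = 1.0 / len(source_vocab)
    # Start from a uniform distribution over source words.
    t_prob = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (s, t) alignments
        total = defaultdict(float)   # expected count of t being aligned

        # E-step: distribute each source word over the target words
        # of its sentence pair, in proportion to the current P(s | t).
        for src, tgt in pairs:
            for s in src:
                norm = sum(t_prob[(s, t)] for t in tgt)
                for t in tgt:
                    fraction = t_prob[(s, t)] / norm
                    count[(s, t)] += fraction
                    total[t] += fraction

        # M-step: renormalise so that P(. | t) sums to 1 for every t.
        t_prob = defaultdict(
            float, {(s, t): c / total[t] for (s, t), c in count.items()}
        )
    return t_prob


# Toy parallel corpus (English source, romanized Hindi target), hypothetical.
toy_pairs = [
    ("red house".split(), "lal ghar".split()),
    ("red book".split(), "lal kitab".split()),
    ("big book".split(), "badi kitab".split()),
]
model = train_ibm_model1(toy_pairs)
print(model[("house", "ghar")], model[("red", "ghar")])
```

With more iterations the probability mass for "ghar" moves towards "house", because "red" is better explained by "lal", which co-occurs with it in two sentence pairs.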
  • 19. Decoder: The decoder maximizes the probability of the generated sentence. Moses is used as the decoder.
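Slides 17 to 19 fit together through Bayes' theorem. As a sketch in the usual noisy-channel notation (S and T follow slide 18; the symbol \hat{T} for the decoder output is an assumption introduced here):

```latex
\hat{T} = \arg\max_{T} P(T \mid S)
        = \arg\max_{T} \frac{P(S \mid T)\, P(T)}{P(S)}
        = \arg\max_{T} \underbrace{P(S \mid T)}_{\text{TM}} \; \underbrace{P(T)}_{\text{LM}}
```

P(S) can be dropped because it does not depend on T, so the decoder only needs the Translation Model score and the Language Model score.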
  • 20. Preparation of Data: Develop a parallel corpus with the following properties: 1. One sentence per line. 2. All sentences of the parallel corpus must be lowercased. 3. Prefer simple sentences over long, complicated ones, at least initially.
  • 21. Preparation of Data: 1. Tokenizing the corpus 2. Filtering out long sentences 3. Lowercasing the data. All of the above is done using the training scripts available in the Moses folder.
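The three preparation steps on slide 21 can be sketched in a few lines of Python. This is only a stand-in to show the idea, not the behaviour of the Moses scripts: the punctuation set, the function names, and the `max_len=40` limit are assumptions chosen for illustration.

```python
# Punctuation to split off as separate tokens (includes the Hindi danda).
PUNCTUATION = ".,!?;:\"()।"


def tokenize(sentence):
    """Crude tokenizer: put spaces around punctuation, then split."""
    spaced = "".join(f" {ch} " if ch in PUNCTUATION else ch for ch in sentence)
    return spaced.split()


def prepare_pair(src, tgt, max_len=40):
    """Tokenize and lowercase one sentence pair.

    Returns None if either side is empty or longer than max_len tokens,
    so that the two sides of the corpus stay aligned line by line.
    """
    src_tokens = [tok.lower() for tok in tokenize(src)]
    tgt_tokens = [tok.lower() for tok in tokenize(tgt)]
    if not src_tokens or not tgt_tokens:
        return None
    if len(src_tokens) > max_len or len(tgt_tokens) > max_len:
        return None
    return " ".join(src_tokens), " ".join(tgt_tokens)


def prepare_corpus(pairs, max_len=40):
    """Apply prepare_pair to every pair and drop the filtered ones."""
    prepared = (prepare_pair(src, tgt, max_len) for src, tgt in pairs)
    return [pair for pair in prepared if pair is not None]


print(prepare_pair("Ram went home.", "राम घर गया।"))
```

Filtering is done per pair rather than per file: if a long sentence were removed from only one side, every following line of the parallel corpus would be misaligned. Lowercasing has no effect on Devanagari text, which has no letter case.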
  • 22. Language Model (the command will change according to the software being used): ./ngram-count -order 3 -text corpus_new4.lowercased.hi -lm hindi.lm -write count.cnt (according to the ME thesis; for the latest, see the LM documentation).
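To make the order-3 setting on slide 22 concrete, here is a minimal sketch of trigram counting and an unsmoothed maximum-likelihood estimate. It is not SRILM's implementation: a real toolkit also applies smoothing and backoff, which this sketch omits, and the function names and the toy romanized corpus are assumptions.

```python
from collections import Counter


def count_ngrams(sentences, order=3):
    """Count all n-grams up to the given order, with sentence markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def mle_probability(counts, ngram):
    """Unsmoothed P(last word | preceding words) = count(ngram) / count(history)."""
    ngram = tuple(ngram)
    history = ngram[:-1]
    if not history:
        # Unigram case: normalise by the total number of unigram tokens.
        total = sum(c for gram, c in counts.items() if len(gram) == 1)
        return counts[ngram] / total if total else 0.0
    return counts[ngram] / counts[history] if counts[history] else 0.0


# Toy lowercased, tokenized corpus (romanized Hindi), one sentence per line.
toy_corpus = ["ram ghar gaya", "ram ghar aaya"]
counts = count_ngrams(toy_corpus, order=3)
print(mle_probability(counts, ("ram", "ghar", "gaya")))
```

An unseen trigram gets probability zero under this estimate, which is the practical reason language-model toolkits smooth their counts.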
  • 23. Translation Model: GIZA++ training as given in the Moses manual.
  • 24. Moses: ./train-factored-phrase-model.perl -scripts-root-dir /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/ -root-dir . --corpus corpus_new5.loweredcased -f en -e hi -lm 0:3:/home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/hindi_lm5.lm >& training_new5.out &
  • 25. A Few Names of Stalwarts in SMT: Philipp Koehn; Hieu Hoang (Moses specific); Christopher D. Manning.
  • 26. Acknowledgements: Prof. Dr. Prateek Bhatia (ME Thesis Guide); Mother and Father.
  • 27. References: My own experience in developing an SMT system. Prateek Bhatia, Nakul Sharma, “English to Hindi Statistical Machine Translation System”, ME Thesis, Thapar University, Patiala, Punjab. Software (Moses, GIZA++, SRILM).
  • 28. Thank You. Any questions or comments? Please feel free to email me at nakul777@gmail.com.