PurePos – an open source morphological disambiguator
György Orosz, Attila Novák
{oroszgy, novak.attila}@itk.ppke.hu

Pázmány Péter Catholic University, Faculty of Information Technology
MTA-PPKE Language Technology Research Group

This work was partially supported by TÁMOP: 4.2.2/B – 10/1–2010–0014
Outline

  PurePos
    – Full morphological disambiguation (tag + lemma)
    – Integrated morphological analyzer




1) Need for a tagger with an integrated MA
2) Implementation, Contribution
3) Evaluation
Problems with agglutinating languages
• Small word coverage of the corpus
• Even 1000+ possible forms of a word
• Possibly huge tagset
  – absent tags
  – absent tag sequences
• Standalone lemmatization is not a good
  solution
Less-resourced languages
• Morphologically complex
• Lack of annotated corpora



Building an annotated corpus:
  1) Manually disambiguate/correct
  2) Train the tagger
  3) Tag some text
Web service scenario
• Need for a high-precision tagging tool
• Noisy and unseen data
• Incremental training
What do we need?
• Full morphological disambiguation
    – Including lemmatization
• Integrated morphological analyzer
• Incremental training
• Unicode support
• Fast to train
• Open source
• Easy to use
Where to start?
• From scratch?
• Modifying an existing tool?
  –   TriTagger
  –   IceMorphy
  –   Apertium tagger
  –   HunPos
  –   OpenNLP
  –   ...
HunPos
Pros:
  – Trigram tagger (TnT)
  – Beam search
  – Clever tricks
  – Contains a suffix guesser
  – Employing a morphological table
  – Fast to train and decode
Cons:
  – Only POS tagging (no lemmatization)
  – Implemented in OCaml
  – No support for Unicode
  – No real MA
Using the analyzer



• Reducing the search space
• Generating lemma candidates
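
The two roles of the analyzer listed above can be illustrated with a small sketch. This is not PurePos code: the `ANALYSES` lexicon, the tag names, `FULL_TAGSET`, and the helpers `candidate_tags` and `candidate_lemmas` are hypothetical stand-ins for a real morphological analyzer.

```python
# Hypothetical toy analyzer output: word form -> list of (lemma, tag) analyses.
ANALYSES = {
    "várnak": [("vár", "NOUN.DAT"), ("vár", "VERB.PL3")],
    "házban": [("ház", "NOUN.INE")],
}

# Hypothetical full tagset the tagger would otherwise have to consider.
FULL_TAGSET = {"NOUN.DAT", "NOUN.INE", "NOUN.NOM", "VERB.PL3", "VERB.SG3", "ADJ"}


def candidate_tags(word):
    """Return the tags allowed for `word`.

    If the analyzer knows the word, only its analyses are considered
    (reduced search space); otherwise fall back to the full tagset.
    """
    analyses = ANALYSES.get(word)
    if analyses:
        return {tag for _, tag in analyses}
    return set(FULL_TAGSET)


def candidate_lemmas(word, tag):
    """Return the lemmas the analyzer proposes for `word` under `tag`."""
    return [lemma for lemma, t in ANALYSES.get(word, []) if t == tag]
```

For a word the analyzer knows, the tagger only has to choose among the analyzer's tags, and the lemma follows from the chosen analysis; for unknown words a guesser would have to supply the candidates instead.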
Lemmatization

Morphological guesser
  1) Generating candidates
  2) Filter by POS tag
  3) Select the most probable one

E.g.: Facebookjukba
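
The three guesser steps can be sketched as follows. This is a simplified, count-based stand-in and not the actual PurePos guesser: the rule representation, the `MAX_SUFFIX` cap, the length-weighted scoring, the fallback to the word form, and the tag names in the usage example are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

MAX_SUFFIX = 6  # assumed cap on the suffix length the guesser looks at


def train_guesser(triples):
    """Count, for each word suffix, which (tag, cut, ending) rules it supports.

    A rule means: remove the last `cut` characters of the word and append
    `ending` to obtain the lemma.
    """
    rules = defaultdict(Counter)
    for word, lemma, tag in triples:
        # Length of the longest common prefix of word and lemma.
        p = 0
        while p < min(len(word), len(lemma)) and word[p] == lemma[p]:
            p += 1
        cut, ending = len(word) - p, lemma[p:]
        for n in range(1, min(MAX_SUFFIX, len(word)) + 1):
            rules[word[-n:]][(tag, cut, ending)] += 1
    return rules


def guess(rules, word):
    """Step 1: generate scored (lemma, tag) candidates for an unseen word."""
    scores = Counter()
    for n in range(1, min(MAX_SUFFIX, len(word)) + 1):
        for (tag, cut, ending), count in rules.get(word[-n:], {}).items():
            if cut <= len(word):
                lemma = word[: len(word) - cut] + ending
                scores[(lemma, tag)] += count * n  # longer suffixes weigh more
    return scores


def lemmatize(rules, word, tag):
    """Steps 2 and 3: keep candidates matching `tag`, return the best lemma."""
    matching = {lemma: s for (lemma, t), s in guess(rules, word).items() if t == tag}
    if not matching:
        return word  # assumed fallback: the word form itself
    return max(matching, key=matching.get)


# Tiny hypothetical training set of (word, lemma, tag) triples.
RULES = train_guesser([
    ("kalapjukba", "kalap", "N.PxP3.ILL"),
    ("autójukba", "autó", "N.PxP3.ILL"),
    ("házukba", "ház", "N.PxP3.ILL"),
    ("házba", "ház", "N.ILL"),
])
print(lemmatize(RULES, "Facebookjukba", "N.PxP3.ILL"))
```

With this toy training set, the unseen form Facebookjukba is mapped to the lemma Facebook once the tagger has selected the possessive illative tag; under a different tag the same guesser proposes a different lemma, which is why the filtering step depends on the POS tag.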
Incremental training
Training
  1) Train the tagger
  2) Save the model
  3) Load the model
  4) Add training data to the model
  5) Save the model
Tagging
  1) Load the model
  2) Compile the model
  3) Use the model for tagging
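
One way to make such a workflow possible is to store raw counts and derive probabilities only at tagging time. The sketch below assumes that design; the `CountModel` class, its JSON file format, and the unigram `tag` function are hypothetical and far simpler than a trigram tagger.

```python
import json
from collections import Counter, defaultdict


class CountModel:
    """Toy tagger model that keeps raw counts so it can be extended later."""

    def __init__(self):
        self.emissions = defaultdict(Counter)  # word -> tag -> count

    def train(self, sentences):
        """Add tagged sentences (lists of (word, tag) pairs) to the counts."""
        for sentence in sentences:
            for word, tag in sentence:
                self.emissions[word][tag] += 1

    def save(self, path):
        """Write the raw counts to a JSON file."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.emissions, f, ensure_ascii=False)

    @classmethod
    def load(cls, path):
        """Read raw counts back; the result can be trained further."""
        model = cls()
        with open(path, encoding="utf-8") as f:
            for word, tags in json.load(f).items():
                model.emissions[word].update(tags)
        return model

    def compile(self):
        """Turn counts into relative frequencies used at tagging time."""
        return {
            word: {tag: n / sum(tags.values()) for tag, n in tags.items()}
            for word, tags in self.emissions.items()
        }


def tag(compiled, words):
    """Pick the most frequent tag per word; unseen words get 'UNK'."""
    return [
        max(compiled[w], key=compiled[w].get) if w in compiled else "UNK"
        for w in words
    ]
```

Because the saved file holds counts rather than probabilities, loading it, calling `train` with new sentences, and saving again gives the same model as training on all the data at once; `compile` is run only when the model is about to be used for tagging.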
Evaluation

POS tagging accuracy
                       Accuracy
OpenNLP (perceptron)   97.16%
OpenNLP (maxent)       96.45%
PurePos (without MA)   98.14%
PurePos (with MA)      98.99%

Full disambiguation accuracy of PurePos
               Accuracy
Guesser        89.79%
Guesser + MT   90.35%
Guesser + MA   98.35%
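
The accuracy figures above can also be read as relative error reduction. The helper below is not from the slides; it is a small illustrative calculation that uses only the reported accuracies.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Share of the baseline's errors removed; both accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err * 100.0


# PurePos without MA vs. with MA (POS tagging accuracy).
pos = relative_error_reduction(98.14, 98.99)
# Guesser only vs. Guesser + MA (full disambiguation accuracy).
full = relative_error_reduction(89.79, 98.35)

print(f"POS tagging: {pos:.1f}% fewer errors")
print(f"Full disambiguation: {full:.1f}% fewer errors")
```

On these numbers, adding the MA removes roughly 46% of the remaining POS tagging errors and roughly 84% of the full disambiguation errors.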
Evaluation

[Figure: POS tagging accuracy]
Evaluation

[Figure: Full disambiguation accuracy]
Evaluation

Performance as a web service

               Lemmatization   Tagging   Combined
Baseline       90.58%          98.14%    89.79%
MT-10k         90.58%          98.14%    89.79%
MT-30k         90.58%          98.17%    89.81%
MT-100k        90.64%          98.30%    89.90%
MT-100k*       90.72%          98.39%    89.97%
PurePos        99.07%          98.99%    98.35%
PurePos
•   Reimplementation of HunPos
•   Deeply integrated MA
•   Full disambiguation
•   State-of-the-art accuracy
•   Full Unicode support
•   Incremental training
•   Open source
•   Easily extensible
Thank you!

http://nlpg.itk.ppke.hu/software/purepos
