SlideShare a Scribd company logo
1 of 41
Download to read offline
Reciprocal Enrichment 
    between Wikipedia and 
     Machine Translators
         OpenMT­2 project

             Mikel Iturbe
           Wikimania 2010 
           Gdańsk, Poland 



                   
languages in 
      wikipedia

           
Distribution of wikipedia
      articles by language
                                English
                                German
                                French
                                Polish
                                Italian
                                Japanese
                                Spanish
                                Dutch
                                Other




                 
Less than 1% of 
     languages have 
    more than 50% of 
         articles 
             
Can we ease good 
    article creation?  

              
How can we boost 
    article creation in 
          minority 
       languages?
              
OpenMT­2 project
     http://ixa.si.ehu.es/openmt2/



                    
What is it?

          
EHU, UPC and 
Basque wikipedians

         
Funded by the 
      Spanish 
     government
           
Free    

        
Hybrid Machine 
      Translation and 
    advanced evaluation 
          system
              
Hybrid?

        
Rule-based MT
                +
    Statistical post-editing

                
The aim: To teach the 
     existing MT to correct 
    it's own mistakes when 
           translating 
                
Using wikipedia

            
How?

       
(1)

      
Translate using 
      rule­based 
    Matxin­Opentrad
       http://opentrad.com/

                 
100 long articles
      es         eu

             
(2)

      
Correct Basque 
    output manually
            
(3)

      
Analyze logs

          
(4)

      
Make 
    improvements to 
     the MT system
            
     
Final test and 
       results

            
Tools

       
Google translator 
        toolkit

             
Specific help for wikipedia
            Not Free Software



                     
OmegaT
    http://omegat.org



             
Suitable to do the job
           Free software



                
What's in?

          
100 new and good 
      articles for the 
    Basque Wikipedia
              
Provide research 
        material

             
Walk towards a MT 
     system that can be 
    used in our wikipedia

               
Thank you.

         
Aurélio A. Heckert (source), David Vignoni (source), 
    Wilfredor (source), Tango project & Arkanosis (source) 
    , OmegaT project (source)




                      Image credits 
                               
e­mail: mikel@hamahiru.org

    User page: http://eu.wikipedia.org/wiki/Lankide:Janfri

    Address: http://hamahiru.org/media/wikimania2010.pdf



                                              contact 
                                    
Text licensed under
       cc­by­sa 3.0
    images maintain their original licenses


                        

More Related Content

Similar to Reciprocal Enrichment between Wikipedia and Machine Translators

Learning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology EngineeringLearning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology Engineering
butest
 
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
Media & Learning Conference
 
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Anja Jentzsch
 
Anton Kasyanov, Introduction to Python, Lecture1
Anton Kasyanov, Introduction to Python, Lecture1Anton Kasyanov, Introduction to Python, Lecture1
Anton Kasyanov, Introduction to Python, Lecture1
Anton Kasyanov
 
Tools for developers to ensure legal integrity of their code - Antelink OWF
Tools for developers to ensure legal integrity of their code - Antelink OWFTools for developers to ensure legal integrity of their code - Antelink OWF
Tools for developers to ensure legal integrity of their code - Antelink OWF
Antelink
 
Wikipedia : Workshop
Wikipedia : WorkshopWikipedia : Workshop
Wikipedia : Workshop
NIFT
 

Similar to Reciprocal Enrichment between Wikipedia and Machine Translators (20)

TraduXio project - Cosi10
TraduXio project - Cosi10TraduXio project - Cosi10
TraduXio project - Cosi10
 
Learning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology EngineeringLearning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology Engineering
 
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
 
The META-NET Strategic Research Agenda for Multilingual Europe 2020
The META-NET Strategic Research Agenda for Multilingual Europe 2020The META-NET Strategic Research Agenda for Multilingual Europe 2020
The META-NET Strategic Research Agenda for Multilingual Europe 2020
 
Organising a GLAM wiki
Organising a GLAM wikiOrganising a GLAM wiki
Organising a GLAM wiki
 
Niatalk24jan10
Niatalk24jan10Niatalk24jan10
Niatalk24jan10
 
LIASCD_carriero
LIASCD_carrieroLIASCD_carriero
LIASCD_carriero
 
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
 
eLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in LinguisticseLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in Linguistics
 
Olf2016
Olf2016Olf2016
Olf2016
 
Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...
Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...
Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...
 
Anton Kasyanov, Introduction to Python, Lecture1
Anton Kasyanov, Introduction to Python, Lecture1Anton Kasyanov, Introduction to Python, Lecture1
Anton Kasyanov, Introduction to Python, Lecture1
 
Tools for developers to ensure legal integrity of their code - Antelink OWF
Tools for developers to ensure legal integrity of their code - Antelink OWFTools for developers to ensure legal integrity of their code - Antelink OWF
Tools for developers to ensure legal integrity of their code - Antelink OWF
 
Wikipedia : Workshop
Wikipedia : WorkshopWikipedia : Workshop
Wikipedia : Workshop
 
Why to Choose Python for Data Science Master.pptx
Why to Choose Python for Data Science Master.pptxWhy to Choose Python for Data Science Master.pptx
Why to Choose Python for Data Science Master.pptx
 
Presentation OntoCommons Workshop March 2021
Presentation OntoCommons Workshop March 2021Presentation OntoCommons Workshop March 2021
Presentation OntoCommons Workshop March 2021
 
Traduco: A collaborative web-based CAT environment for the interpretation and...
Traduco: A collaborative web-based CAT environment for the interpretation and...Traduco: A collaborative web-based CAT environment for the interpretation and...
Traduco: A collaborative web-based CAT environment for the interpretation and...
 
Improving writing aids, the community way
Improving writing aids, the community wayImproving writing aids, the community way
Improving writing aids, the community way
 
Models and tools for aggregating and annotating content on ECLAP
Models and tools for aggregating and annotating content on ECLAPModels and tools for aggregating and annotating content on ECLAP
Models and tools for aggregating and annotating content on ECLAP
 
EMMA presentation - Alfons Juan - Language technologies for Education: recent...
EMMA presentation - Alfons Juan - Language technologies for Education: recent...EMMA presentation - Alfons Juan - Language technologies for Education: recent...
EMMA presentation - Alfons Juan - Language technologies for Education: recent...
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

Reciprocal Enrichment between Wikipedia and Machine Translators