Pre and post editing
environment for Apertium




                                   Lluís Villarejo
                           Learning Technologies
                                     March 2012



                 What is GSoC?
• It's a global program that offers student developers stipends
  to write code for various open source software projects.
• Since 2005

• Inspire young developers to participate in OSS projects.
• Give students more exposure to real-world software development
  scenarios.
• Get more open source code created and released.
• Help open source projects identify and bring in new developers.



             Some participants
•   Apache Soft. Found.   •   Sakai Foundation
•   Debian                •   Mozilla
•   Facebook              •   Inclusive Design Inst.
•   Drupal                •   The Linux Foundation
•   Creative Commons      •   The GNU project
•   DocBook project       •   Wikimedia Foundation
•   GCC                   •   WordPress
•   Gnome                 •   Inclusive Design Inst.
•   ...                   •   ...



                How does it work?
•   Orgs present themselves as mentoring agents.
•   Orgs present a list of potential projects and mentors.
•   Accepted orgs should try to attract students' interest.
•   Students build project proposals.
•   Google finances slots for each org (5,000 + 500 USD).
•   The project community decides the student-slot assignment.
•   Between end of May and end of August.



               GSoC'11 statistics
• $7.2M budget

• 1115 students accepted from 68 countries

• 2096 mentors and co-mentors from 55 countries

• 175 Open Source organizations

• 18.1% of students had participated in previous years

• 97 countries with student applicants

• 88% overall success rate



Accepted Students GSoC'11



Why participate with Apertium?
• Strategically:
   – Apertium is a strategic agent inside UOC.
   – Developing Apertium means further developing
     internationalization aids for UOC.
   – Attract and onboard new developers for Apertium.
   – Collaboration with Google's Open Source initiatives.

• Functionally:
   – Opportunity to address specific UOC needs with
     external funding.
   – Capitalize on specific user feedback on translation quality.



              The Apertium case
• 20 proposed tasks
• 17 tasks got interest from students [1-9]
   – Pre and post-editing environment gets 11 students
     interested.

• Apertium community ranks the 17 tasks
   – Pre and post-editing environment ranks 4th

• Google assigns 9 slots to Apertium (49,500 USD)
  – Our task goes through and Camille Mougey, from the
    Grenoble Institute of Technology, is selected.



      Pre and post-editing, why?
• A significant share of the errors you get when translating a
  document is due to deficiencies in the original.
• The integration of existing resources can help to ease this
  burden:
   – Digital knowledge sources (digital dictionaries... )
   – Automatic tools (spell-checker, grammar checker, translation
     memory generation, search & replace...)
• These processes should be integrated naturally in the
  translation workflow → the need for an integrated web interface
  to Apertium.
• To improve the system we need to have access to the human
  post-editing process.



     Pre and post-editing, features
•   Pre and Post-editing web interface integrated with Apertium translation toolbox.
•   Spell checking on source and target languages. Integration with Aspell
•   Grammar checking on source and target languages. Integration with
    LanguageTool
•   Integration with several external dictionaries.
•   Search & replace functionalities on source and target languages.
•   Ability to deal with formatted text.
•   Logging system. All events are logged as they happen, i.e., at the very moment
    the user inserts or deletes text. This allows a later data mining process to
    be run on the logs to detect commonly modified structures or vocabulary.
•   Translation memory generation. Integration of Maligna.
•   PDF translation through pdftohtml
•   Image translation through Tesseract.
                                                                        Final report 2010
                                                                        Final report 2011
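
The logging feature listed above can be illustrated with a minimal sketch. This is not the environment's actual code: the names EditEvent, EditLog and most_deleted_words are hypothetical, and the slides do not describe the real log format. It only shows the idea of recording each insert or delete as it happens and later mining the log for frequently modified vocabulary.

```python
from collections import Counter
from dataclasses import dataclass
import time


@dataclass
class EditEvent:
    """One post-editing event (hypothetical record layout)."""
    timestamp: float   # when the edit happened
    action: str        # "insert" or "delete"
    position: int      # character offset in the edited text
    text: str          # the text that was inserted or deleted


class EditLog:
    """In-memory log; a real system would persist events as they occur."""

    def __init__(self):
        self.events = []

    def record(self, action, position, text):
        # Log the event at the moment the user inserts or deletes text.
        if action not in ("insert", "delete"):
            raise ValueError(f"unknown action: {action!r}")
        self.events.append(EditEvent(time.time(), action, position, text))


def most_deleted_words(log, n=3):
    """Count the words that post-editors delete most often."""
    counts = Counter()
    for event in log.events:
        if event.action == "delete":
            counts.update(word.lower() for word in event.text.split())
    return counts.most_common(n)


if __name__ == "__main__":
    log = EditLog()
    log.record("delete", 0, "the house")
    log.record("insert", 0, "a home")
    log.record("delete", 10, "The car")
    print(most_deleted_words(log, 1))
```

Words that are deleted repeatedly across many sessions are candidates for review in the translation dictionaries or rules; counting whole words is only the simplest possible mining step.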



        Results & lessons learned
• Fully functional environment, goals accomplished.
• Automatic availability of feedback on post-editing human
  behaviour.

•   Jointly defined task (flexible framework provided).
•   Interest in developing great empathy with the student.
•   Motivated and pro-active student.
•   Student engagement.
•   Very frequent feedback.
•   Mentoring team with access to ABSOLUTELY ALL the
    information regarding the project.



                   Further work
• Proof of concept accomplished.
• Base platform developed so further work can be easily
  added.
• Integration of other resources (more external dictionaries).
• Extension of currently used resources (additional
  grammar rules, dictionary improvements, wider format
  support).
• Mining the logged information to gain deeper knowledge of
  the human post-editing process.
• Use of this mining process to improve the Apertium translation
  engine.



                    GSoC 2012




• Mining the logged information to gain deeper knowledge of
  the human post-editing process.
• Use of this mining process to improve the Apertium translation
  engine.
• Post-editing over formatted text.




   Thanks
Questions & answers

Google Summer of Code 2011: UOC & Apertium
