Gathering and Organizing System for PErsonal Language Skills
G.O.S.PE.L.S.

Student: Enrico Zanardo
Supervisor: Prof. Vittore Casarosa
Free University of Bolzano-Bozen
8th October 2010
Goal

Provide users with documents appropriate to their language
skills in English (EN), German (DE) and Italian (IT), as
assessed according to the guidelines of the European
Language Portfolio.
Outline
●   Problems
●   Proposed Solution
●   Prototype & Results
●   Conclusion
Objective
[Figure: documents labelled by language and level. Labels shown: EN-C1, EN-B1, IT-A2, IT-C2, DE-B2, DE-A2, IT-C2, EN-B2, IT-B1, DE-B2, DE-A2]
Problems
1. Classify documents according to the “GOSPELS rating
system” and match that rating to the levels of the European
Language Portfolio (A1, A2, ..., C1, C2).

2. Determine the user's skills in each of the three languages
supported by the system (English, Italian and German).

3. Provide results in the three languages according to the
user's skill in each language.
Solution to step 1
 (Classify documents)
[Diagram]
●   Inputs to the Algorithm: Docs; Frequency of most common words; Part of Speech of the word
●   Output of the Algorithm: Level of complexity of the document
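The slide names the inputs (word frequency and part of speech) but not the formula, so the following is only a minimal sketch under stated assumptions. It assumes the rating is the weighted share of tokens that are not in a list of common words, with a hypothetical weight per part-of-speech tag. The word list, the weights and the function name are invented for illustration and are not taken from the GOSPELS implementation. Tokens are assumed to arrive already tagged (the prototype uses TreeTagger for that step).

```python
from typing import Iterable, Tuple

# Hypothetical data: a tiny "most common words" list and per-PoS weights.
COMMON_WORDS = {"the", "a", "is", "cat", "on", "mat"}
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 1.0, "ADJ": 1.5, "ADV": 1.5}
DEFAULT_WEIGHT = 0.5  # assumed weight for any tag not listed above


def complexity_rating(tagged_tokens: Iterable[Tuple[str, str]]) -> float:
    """Return the weighted percentage of tokens outside COMMON_WORDS.

    tagged_tokens is an iterable of (word, pos_tag) pairs.
    An empty document gets a rating of 0.0.
    """
    total = 0.0
    uncommon = 0.0
    for word, pos in tagged_tokens:
        weight = POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)
        total += weight
        if word.lower() not in COMMON_WORDS:
            uncommon += weight
    return 100.0 * uncommon / total if total else 0.0


if __name__ == "__main__":
    easy = [("The", "DET"), ("cat", "NOUN"), ("is", "VERB"),
            ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")]
    hard = [("The", "DET"), ("feline", "NOUN"), ("reclines", "VERB")]
    print(complexity_rating(easy), complexity_rating(hard))
```

A higher rating means more weighted vocabulary falls outside the common-word list, which is one plausible reading of "level of complexity of the document".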
Solution to step 2
 (user's language skills)
Match between Gospels Algorithm & ELP
[Diagram]
●   Inputs to the Algorithm: Docs; Template Documents; Frequency of most common words; Part of Speech of the word
●   Outputs: Level of complexity of the document; Range Language Levels
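One way to read the diagram is that template documents of known ELP level are rated by the algorithm, and the resulting ratings define a range per level. The sketch below follows that reading and should be taken as an assumption, not as the author's method. It places each boundary at the midpoint between the mean ratings of adjacent levels, and it assumes those means increase from A1 to C2. The template ratings in the usage example are invented.

```python
from bisect import bisect_right
from statistics import mean
from typing import Dict, List

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def level_boundaries(template_ratings: Dict[str, List[float]]) -> List[float]:
    """Return the five cut points between the six levels.

    Each cut point is the midpoint between the mean template ratings of
    two adjacent levels (an assumed rule, chosen for simplicity).
    """
    means = [mean(template_ratings[level]) for level in LEVELS]
    if any(a >= b for a, b in zip(means, means[1:])):
        raise ValueError("mean ratings must increase from A1 to C2")
    return [(a + b) / 2 for a, b in zip(means, means[1:])]


def rating_to_level(rating: float, boundaries: List[float]) -> str:
    """Map a document rating to the level whose range contains it."""
    return LEVELS[bisect_right(boundaries, rating)]


if __name__ == "__main__":
    # Invented ratings for illustration only.
    templates = {"A1": [10, 12], "A2": [20, 22], "B1": [26, 28],
                 "B2": [31, 33], "C1": [34, 36], "C2": [38, 40]}
    cuts = level_boundaries(templates)
    for rating in (5, 30, 50):
        print(rating, rating_to_level(rating, cuts))
```

Midpoints are the simplest choice; with more template documents per level, the ranges could instead be tuned on held-out documents, which matches the further testing called for in the conclusions.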
Example Results
[Chart: Italian, Gospels Algorithm. Horizontal axis: levels A1, A2, B1, B2, C1, C2. Left axis: 0 to 4500. Right axis: 0.00 to 40.00. Series: Rating, Known words, Words.]
Data labels shown on the chart, by level:
A1 12.66    A2 23.94    B1 25.51    B2 31.88    C1 34.09    C2 35.72
Solution to step 3
 (three language results)
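The slide gives no detail for this step, so the following is a sketch of one possible selection rule: keep a document only if its level is at or below the user's level in that document's language. The "at or below" rule, the profile layout and the field names are assumptions made for illustration.

```python
from typing import Dict, List

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
RANK = {level: i for i, level in enumerate(LEVELS)}


def is_suitable(doc: Dict[str, str], profile: Dict[str, str]) -> bool:
    """True if the user knows the document's language well enough."""
    user_level = profile.get(doc["lang"])
    if user_level is None:  # language not in the user's profile
        return False
    return RANK[doc["level"]] <= RANK[user_level]


def filter_results(results: List[Dict[str, str]],
                   profile: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep only suitable documents, preserving the search ranking order."""
    return [doc for doc in results if is_suitable(doc, profile)]


if __name__ == "__main__":
    profile = {"EN": "B2", "IT": "C2", "DE": "A2"}  # hypothetical user
    results = [
        {"id": "1", "lang": "EN", "level": "C1"},
        {"id": "2", "lang": "IT", "level": "C2"},
        {"id": "3", "lang": "DE", "level": "A2"},
        {"id": "4", "lang": "DE", "level": "B2"},
    ]
    print([doc["id"] for doc in filter_results(results, profile)])
```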
Prototype
[Architecture diagram] Components shown:
●   Crawler: Apache Nutch 1.1 with the LanguageLevel plug-in, TreeTagger and Wiktionary; source: Internet (“unibz.org”)
●   Indexer and searcher: Apache Solr 1.4 (Apache Lucene)
●   Web GUI: J2EE, Google Translator API
●   Apache Tomcat 6.0
●   DB: PostgreSQL 8.4.4 (user profile)
●   Arch Linux 2010.05
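Since the searcher is Solr, the level restriction could be passed as filter queries (the standard `fq` parameter) next to the user's query (`q`). The sketch below only builds the query string and does not contact a server. The field names `lang` and `level_rank` are hypothetical, because the slides do not show the index schema written by the LanguageLevel plug-in.

```python
from urllib.parse import urlencode

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def build_solr_query(text: str, lang: str, max_level: str) -> str:
    """Build a Solr query string limited to one language and level range.

    Assumes (hypothetically) that each indexed document stores its
    language in `lang` and its level as an integer 0..5 in `level_rank`.
    """
    max_rank = LEVELS.index(max_level)
    params = [
        ("q", text),
        ("fq", f"lang:{lang}"),                    # restrict the language
        ("fq", f"level_rank:[0 TO {max_rank}]"),   # at or below the user's level
        ("wt", "json"),                            # ask for a JSON response
    ]
    return urlencode(params)


if __name__ == "__main__":
    print(build_solr_query("energia solare", "it", "B1"))
```

One such query per language in the user's profile would return results in all three languages, each limited to the user's level in that language.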
Conclusions and possible extensions
●   The prototype is stable and seems to work well.
    ●   Further testing is required to improve and tune the algorithm
    ●   Further testing is required to improve the matching with the ELP

●   The architecture can easily support other languages
    ●   It needs the frequency of words in the new language
    ●   It needs the PoS tagger for the new language

●   The prototype can be easily modified to become an additional
    function of an existing digital library
    ●   It has to be embedded in the indexer
Thank you / Danke / Grazie

QUESTIONS?

Demo?
