Gathering and Organizing System for PErsonal Language Skills - GOSPELS

Provide appropriate documents to users based on their
language skills in English, Italian and German as
determined in accordance with guidelines provided by
the European Language Portfolio.

  1. Gathering and Organizing System for PErsonal Language Skills (G.O.S.PE.L.S.)
     Student: Enrico Zanardo
     Supervisor: Prof. Vittore Casarosa
     Free University of Bolzano-Bozen, 8th October 2010
  2. Goal
     Provide appropriate documents to users based on their language skills in English, Italian and German as determined in accordance with guidelines provided by the European Language Portfolio.
     (EN, DE, IT)
  3. Outline
     ● Problems
     ● Proposed Solution
     ● Prototype & Results
     ● Conclusion
  4. Objective
     [Diagram labels (language and ELP level): EN-C1, EN-B1, IT-A2, IT-C2, DE-B2, DE-A2, IT-C2, EN-B2, IT-B1, DE-B2, DE-A2]
  5. Problems
     1. Classify documents according to the “GOSPELS rating system” and match it to the rating of the European Language Portfolio (A1, A2, ..., C1, C2).
     2. Know the users' language skills for the three languages supported by the system (English, Italian and German).
     3. Provide results in the three different languages according to the users' language skills in each language.
  6. Solution to step 1 (Classify documents)
     [Diagram labels: Docs; Frequency of most common words; Part of Speech of the word; Algorithm; Level of complexity of the document]
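The labels on this slide suggest that the algorithm combines a list of the most common words with part-of-speech information to estimate how complex a document is. The slides do not give the formula, so the following is only a minimal sketch under stated assumptions: the score is taken to be the percentage of content words that are missing from the common-word list, and `CONTENT_TAGS`, `complexity_score` and the tag names are hypothetical, not part of GOSPELS.

```python
from typing import Iterable, Set, Tuple

# Hypothetical tag set: parts of speech treated as content words (assumption).
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def complexity_score(tagged_tokens: Iterable[Tuple[str, str]],
                     common_words: Set[str]) -> float:
    """Return the percentage of content words NOT found in the common-word list.

    This is an illustrative measure, not the GOSPELS formula.
    """
    # Keep only content words, lower-cased for lookup.
    content = [word.lower() for word, tag in tagged_tokens if tag in CONTENT_TAGS]
    if not content:
        return 0.0
    rare = sum(1 for word in content if word not in common_words)
    return 100.0 * rare / len(content)


# Toy data: a tiny common-word list and a PoS-tagged sentence.
common = {"cat", "sit", "big"}
doc = [("The", "DET"), ("big", "ADJ"), ("cat", "NOUN"),
       ("ponders", "VERB"), ("metaphysics", "NOUN")]
score = complexity_score(doc, common)
print(score)
```

In a real setup the tagged tokens would come from a PoS tagger and the common-word list from a frequency list for the language; both are stubbed with toy data here.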
  7. Solution to step 2 (users' language skills)
  8. Match between Gospels Algorithm & ELP
     [Diagram labels: Docs; Frequency of most common words; Part of Speech of the word; Algorithm; Level of complexity of the document; Template Documents; Range of Language Levels]
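One way to read the "Template Documents" and "Range of Language Levels" labels is that documents of known ELP level are scored first, and the resulting score ranges are then used to place new documents. The sketch below illustrates that idea only. It assumes a numeric complexity score that grows with difficulty, and it places the boundary between two adjacent levels at the midpoint of their mean template scores. The names `bounds_from_templates` and `elp_level` and all numbers are hypothetical.

```python
from bisect import bisect_right
from statistics import mean
from typing import Dict, List

ELP_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def bounds_from_templates(template_scores: Dict[str, List[float]]) -> List[float]:
    """Derive five boundaries from template documents of known level.

    Assumption: mean scores increase from A1 to C2, and each boundary is the
    midpoint between the means of two adjacent levels.
    """
    means = [mean(template_scores[level]) for level in ELP_LEVELS]
    return [(low + high) / 2 for low, high in zip(means, means[1:])]


def elp_level(score: float, bounds: List[float]) -> str:
    """Map a complexity score to an ELP level using sorted boundaries."""
    return ELP_LEVELS[bisect_right(bounds, score)]


# Made-up template scores, for illustration only.
templates = {"A1": [4, 6], "A2": [14, 16], "B1": [25],
             "B2": [35], "C1": [45], "C2": [55]}
bounds = bounds_from_templates(templates)
print(bounds)
print(elp_level(27.5, bounds))
```

A score that falls exactly on a boundary is assigned to the higher level here; that tie-breaking rule is a design choice of the sketch.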
  9. Example Results: Italian, Gospels Algorithm
     [Chart: Rating, Known words and Words for each ELP level (A1, A2, B1, B2, C1, C2). Data labels shown: 35.72, 34.09, 31.88, 25.51, 23.94, 12.66]
  10. Solution to step 3 (three-language results)
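The slides do not describe how results are selected for each language, so the following is a sketch under an explicit assumption: a document counts as appropriate when its ELP level is at or below the user's level in that document's language. The function `results_for_user`, the profile values and the document titles are all hypothetical.

```python
from typing import Dict, List, Tuple

# Rank of each ELP level, from lowest (A1) to highest (C2).
ELP_ORDER = {level: rank
             for rank, level in enumerate(["A1", "A2", "B1", "B2", "C1", "C2"])}


def results_for_user(docs: List[Tuple[str, str, str]],
                     profile: Dict[str, str]) -> Dict[str, List[str]]:
    """Group document titles by language, keeping only documents whose level
    does not exceed the user's level in that language (assumed rule)."""
    grouped: Dict[str, List[str]] = {lang: [] for lang in profile}
    for title, lang, level in docs:
        user_level = profile.get(lang)
        if user_level is not None and ELP_ORDER[level] <= ELP_ORDER[user_level]:
            grouped[lang].append(title)
    return grouped


# Example profile: one ELP level per supported language.
profile = {"EN": "C1", "IT": "A2", "DE": "B2"}
# Example index entries: (title, language, ELP level).
docs = [("doc-en-1", "EN", "B1"), ("doc-it-1", "IT", "C2"),
        ("doc-de-1", "DE", "B2"), ("doc-de-2", "DE", "A2")]
results = results_for_user(docs, profile)
print(results)
```

With this rule the Italian C2 document is withheld from a user whose Italian level is A2, while both German documents are returned.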
  11. Prototype
      [Architecture diagram. Components:]
      ● Apache Nutch 1.1
      ● Apache Solr 1.4
      ● Apache Lucene
      ● LanguageLevel plug-in
      ● TreeTagger
      ● Wiktionary
      ● Google Translator API
      ● Apache Tomcat 6.0
      ● DB PostgreSQL 8.4.4
      ● Arch Linux 2010.05
      [Role labels: Crawler; Indexer; Searcher; Web GUI (J2EE); User Profile; Internet (“unibz.org”)]
  12. Conclusions and possible extensions
      ● The prototype is stable and seems to work well.
        ● Further testing required to improve and tune the algorithm
        ● Further testing required to improve the matching with ELP
      ● The architecture can easily support other languages
        ● It needs the frequency of words in the new language
        ● It needs the PoS tagger for the new language
      ● The prototype can be easily modified to become an additional function of an existing digital library
        ● It has to be embedded in the indexer
  13. Thank you / Danke / Grazie
      Questions? Demo?
