Gathering and Organizing System for PErsonal Language Skills - GOSPELS
1. Gathering and Organizing System for PErsonal Language Skills (G.O.S.PE.L.S.)
Student: Enrico Zanardo
Supervisor: Prof. Vittore Casarosa
Free University of Bolzano-Bozen
8th October 2010
2. Goal
Provide appropriate documents to users based on their
language skills in English, Italian and German as
determined in accordance with guidelines provided by
the European Language Portfolio.
[Figure: the three supported languages, EN, DE and IT]
5. Problems
1. Classify documents according to the “GOSPELS rating
system” and match each rating to a level of the European Language
Portfolio (A1, A2, ..., C1, C2).
2. Know the user's language skills in the three languages
supported by the system (English, Italian and German).
3. Provide results in the three languages
according to the user's skills in each language.
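Problem 3 can be illustrated with a minimal sketch. It assumes that a document is appropriate when its ELP level is at or below the user's level in the document's language; the slides do not state the exact rule, and the names `Doc` and `suitable_documents` are hypothetical.

```python
from dataclasses import dataclass

# ELP levels in increasing order of difficulty.
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


@dataclass(frozen=True)
class Doc:
    """Hypothetical search result: a title, a language code and an ELP level."""
    title: str
    language: str  # "en", "it" or "de"
    level: str     # one of LEVELS


def suitable_documents(docs, user_levels):
    """Keep the documents whose level does not exceed the user's level.

    `user_levels` maps a language code to the user's ELP level in that
    language. Documents in a language missing from the profile are dropped.
    The "at or below" rule is an assumption, not taken from the slides.
    """
    result = []
    for doc in docs:
        user_level = user_levels.get(doc.language)
        if user_level is None:
            continue
        if LEVELS.index(doc.level) <= LEVELS.index(user_level):
            result.append(doc)
    return result
```

With a profile such as `{"en": "B1", "de": "A1"}`, an English A2 document passes the filter, an English C1 document does not, and Italian documents are skipped because the profile has no Italian level.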
6. Solution to step 1 (Classify documents)
[Figure: the documents (Docs), the frequency of the most common words and the part of speech of each word feed the Algorithm, which outputs the level of complexity of the document]
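The slides name the inputs of the algorithm but not its formula. The sketch below is therefore only one possible reading, under stated assumptions: tokens arrive already tagged with a part of speech, tokens with certain tags are ignored, and the complexity is the percentage of remaining words that are not in the list of most common words. The function name, the tag names and the scoring rule are all hypothetical.

```python
# Hypothetical part-of-speech tags that do not count as vocabulary.
SKIPPED_TAGS = frozenset({"PUNCT", "NUM", "PROPN"})


def complexity_score(tagged_tokens, common_words, skipped_tags=SKIPPED_TAGS):
    """Return the percentage of counted words that are NOT common words.

    `tagged_tokens` is an iterable of (word, tag) pairs, as a PoS tagger
    might produce. `common_words` is a set of lower-case frequent words.
    A higher score means a harder document. This rule is an assumption,
    not the GOSPELS formula.
    """
    total = 0
    known = 0
    for word, tag in tagged_tokens:
        if tag in skipped_tags:
            continue  # punctuation, numbers and names are not vocabulary
        total += 1
        if word.lower() in common_words:
            known += 1
    if total == 0:
        return 0.0
    return 100.0 * (total - known) / total
```

Skipping proper nouns and punctuation is a design choice made here so that names and symbols do not inflate the score; a real implementation might weight parts of speech differently.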
8. Match between Gospels Algorithm & ELP
[Figure: template documents (Docs), the frequency of the most common words and the part of speech of each word feed the Algorithm; the resulting level of complexity of the document is matched to a range and then to the language levels]
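One way to picture the matching is sketched below. It assumes that template documents with known ELP levels are scored first, that the scores grow with the level, and that the boundary between two adjacent levels is the midpoint of their mean template scores. How GOSPELS actually derives its ranges is not stated in the slides; `build_boundaries` and `level_for` are hypothetical names.

```python
from bisect import bisect_right
from statistics import mean

# ELP levels in increasing order of difficulty.
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def build_boundaries(template_scores):
    """Derive five score boundaries from template documents.

    `template_scores` maps each ELP level to the complexity scores of its
    template documents. Midpoints between adjacent level means are used as
    boundaries, which is an assumption made for this sketch.
    """
    means = [mean(template_scores[level]) for level in LEVELS]
    if any(a >= b for a, b in zip(means, means[1:])):
        raise ValueError("template scores must increase with the level")
    return [(a + b) / 2 for a, b in zip(means, means[1:])]


def level_for(score, boundaries):
    """Return the ELP level whose range contains `score`."""
    return LEVELS[bisect_right(boundaries, score)]
```

Scores below the first boundary map to A1 and scores above the last one map to C2, so every document receives a level even when it falls outside the span covered by the templates.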
9. Example Results
Italian, Gospels Algorithm
[Figure: chart over the levels A1, A2, B1, B2, C1, C2 with the series Rating, Known words and Words; one axis from 0 to 4500, the other from 0.00 to 40.00; data labels 35.72, 34.09, 31.88, 25.51, 23.94, 12.66]
11. Prototype
[Figure: architecture of the prototype, with the following components]
● Apache Nutch 1.1
● Apache Solr 1.4
● LanguageLevel plug-in
● Apache Lucene
● Indexer
● Searcher
● Crawler
● TreeTagger
● Wiktionary
● Internet
● Web GUI “unibz.org”
● J2EE
● Google Translator API
● Apache Tomcat 6.0
● DB: PostgreSQL 8.4.4
● User profile
● Arch Linux 2010.05
12. Conclusions and possible extensions
● The prototype is stable and seems to work well.
● Further testing is required to improve and tune the algorithm.
● Further testing is required to improve the matching with the ELP.
● The architecture can easily support other languages:
   ● it needs the frequency of words in the new language;
   ● it needs the PoS tagger for the new language.
● The prototype can easily be modified to become an additional
function of an existing digital library:
   ● it has to be embedded in the indexer.
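The two requirements for supporting a new language can be pictured as a small registry that refuses a language unless both resources are supplied. This is only a sketch: `LanguageRegistry`, `LanguageResources` and the tagger signature are hypothetical and do not describe the prototype's code.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical tagger interface: text in, (word, tag) pairs out.
Tagger = Callable[[str], List[Tuple[str, str]]]


@dataclass(frozen=True)
class LanguageResources:
    """The two resources named in the conclusions for each language."""
    common_words: FrozenSet[str]
    tagger: Tagger


class LanguageRegistry:
    """Keeps the per-language resources, keyed by language code."""

    def __init__(self):
        self._languages = {}

    def register(self, code, common_words, tagger):
        """Add a language; both a word list and a tagger are mandatory."""
        words = frozenset(w.lower() for w in common_words)
        if not words:
            raise ValueError("a word frequency list is required")
        if not callable(tagger):
            raise TypeError("a PoS tagger is required")
        self._languages[code] = LanguageResources(words, tagger)

    def resources(self, code):
        """Return the resources registered for `code`."""
        return self._languages[code]

    def supported(self):
        """Return the registered language codes in sorted order."""
        return sorted(self._languages)
```

Under this design, adding a language is a single `register` call with a word list and a tagger callable, which mirrors the claim that no other part of the architecture needs to change.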