Gathering and Organizing System for PErsonal Language Skills - GOSPELS

G.O.S.PE.L.S.

Student: Enrico Zanardo
Supervisor: Prof. Vittore Casarosa
Free University of Bolzano-Bozen
8th October 2010

Goal

Provide appropriate documents to users based on their
language skills in English, Italian and German as
determined in accordance with guidelines provided by
the European Language Portfolio.

Outline
● Problems;
● Proposed Solution;
● Prototype & Results;
● Conclusion;

Objective
[Figure: items labelled with a language and an ELP level. Labels shown: EN-C1, EN-B1, IT-A2, IT-C2, DE-B2, DE-A2; IT-C2, EN-B2, IT-B1, DE-B2, DE-A2.]

Problems
1. Classify documents according to the “GOSPELS rating
system” and match that rating to the levels of the European Language
Portfolio (A1, A2, ..., C1, C2).

2. Know the user's language skills in the three languages
supported by the system (English, Italian and German).

3. Provide results in the three languages
according to the user's skills in each language.

Solution to step 1
(Classify documents)

[Diagram: Docs, Frequency of most common words, and Part of Speech of the word feed the Algorithm, which outputs the Level of complexity of the document.]
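The slide names the inputs of the algorithm (frequency of the most common words, part of speech of each word) but not its formula. The following is therefore only a minimal sketch of one way such a score could be computed. The rank table, the part-of-speech weights, the fallback rank, and the function name `complexity_score` are all illustrative assumptions, not the GOSPELS algorithm itself.

```python
# Hypothetical frequency ranks: 1 = most common word in the language.
FREQUENCY_RANK = {"the": 1, "be": 2, "house": 150, "nevertheless": 2400}

# Hypothetical weights per part-of-speech tag (assumed, not from the slides).
POS_WEIGHT = {"DET": 0.5, "NOUN": 1.0, "VERB": 1.2, "ADV": 1.5}

# Assumed rank for words that are missing from the frequency list.
UNKNOWN_RANK = 5000


def complexity_score(tagged_words):
    """Return the mean of (frequency rank * PoS weight) over (word, tag) pairs.

    Rarer words and heavier-weighted tags push the score up, so a higher
    score stands for a more complex document in this sketch.
    """
    if not tagged_words:
        return 0.0
    total = 0.0
    for word, tag in tagged_words:
        rank = FREQUENCY_RANK.get(word.lower(), UNKNOWN_RANK)
        total += rank * POS_WEIGHT.get(tag, 1.0)
    return total / len(tagged_words)
```

In the prototype the (word, tag) pairs would come from a PoS tagger such as TreeTagger; here they are passed in directly so the sketch stays self-contained.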

Solution to step 2
(user's language skills)

Match between Gospels Algorithm & ELP

[Diagram: Docs, Frequency of most common words, and Part of Speech of the word feed the Algorithm, which outputs the Level of complexity of the document. Additional labels: Template Documents; Range Language Levels.]
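One plausible reading of the match between the algorithm and the ELP is that template documents of known level are scored first, and the resulting score ranges are then used to label new documents. The sketch below illustrates that idea under stated assumptions: boundaries are placed at the midpoint between the mean scores of adjacent levels, and scores are assumed to grow with the level. The function names and the calibration rule are hypothetical.

```python
from bisect import bisect_right
from statistics import mean

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def calibrate(template_scores):
    """Derive five boundaries from template documents of known ELP level.

    template_scores maps each level to a list of scores. Each boundary is
    the midpoint between the mean scores of two adjacent levels (an assumed
    rule; it requires the means to increase from A1 to C2).
    """
    means = [mean(template_scores[level]) for level in LEVELS]
    return [(low + high) / 2 for low, high in zip(means, means[1:])]


def to_elp_level(score, boundaries):
    """Map a document score to the ELP level whose range contains it."""
    return LEVELS[bisect_right(boundaries, score)]
```

With made-up template scores whose means are 12, 20, 30, 40, 50 and 60, the boundaries come out as 16, 25, 35, 45 and 55, so a document scoring 33 would be labelled B1.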

Example Results
Italian
Gospels Algorithm

[Chart: values per ELP level (A1, A2, B1, B2, C1, C2). Left axis: 0 to 4500; right axis: 0.00 to 40.00. Legend: Rating, Known words, Words. Data labels: 12.66, 23.94, 25.51, 31.88, 34.09, 35.72.]

Solution to step 3
(three language results)
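The slide for this step carries no detail, so the following is a minimal sketch of how results could be grouped per language according to a user profile. It assumes that a document is suitable when its level is at or below the user's level in that language; this policy, the dictionary layout, and the function names are assumptions for illustration.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def is_readable(doc_level, user_level):
    """Assumed policy: a document fits if it is not above the user's level."""
    return LEVELS.index(doc_level) <= LEVELS.index(user_level)


def results_for_user(documents, profile):
    """Group suitable document titles by language.

    documents: list of dicts with the keys "title", "lang" and "level".
    profile:   dict mapping a language code to the user's ELP level.
    """
    grouped = {lang: [] for lang in profile}
    for doc in documents:
        user_level = profile.get(doc["lang"])
        if user_level is not None and is_readable(doc["level"], user_level):
            grouped[doc["lang"]].append(doc["title"])
    return grouped
```

For a profile such as `{"EN": "B2", "IT": "C2", "DE": "A2"}`, an EN-C1 document would be left out while an EN-B1 document would be returned.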

Prototype
[Architecture diagram. Components shown:]
● Apache Nutch 1.1 (CRAWLER)
● Apache Solr 1.4
● LanguageLevel plug-in
● Apache Lucene (INDEXER, SEARCHER)
● TreeTagger
● Wiktionary
● Internet
● WEB-GUI
● “unibz.org”
● J2EE
● Google Translator API
● Apache Tomcat 6.0
● DB: PostgreSQL 8.4.4 (USER Profile)
● Arch Linux 2010.05
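Since the prototype searches through Solr, the web GUI could restrict results by language and level with Solr filter queries. The sketch below builds such a request URL. The `q`, `fq` and `wt` parameters are standard Solr query parameters, but the field names `lang` and `level`, the base URL, and the function name are assumptions; the slides do not show the actual schema or how the LanguageLevel plug-in stores its output.

```python
from urllib.parse import urlencode

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def build_select_url(base_url, text, lang, max_level):
    """Build a Solr select URL limited to one language and a maximum level.

    `lang` and `level` are hypothetical index fields used for illustration.
    """
    allowed = LEVELS[: LEVELS.index(max_level) + 1]
    params = [
        ("q", text),                                     # the user's query
        ("fq", f"lang:{lang}"),                          # filter by language
        ("fq", "level:(" + " OR ".join(allowed) + ")"),  # filter by level
        ("wt", "json"),                                  # response format
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

One such request per language in the user profile would yield the three result lists shown side by side in the GUI.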

Conclusions and possible extensions
● The prototype is stable and seems to work well.
    ● Further testing required to improve and tune the algorithm
    ● Further testing required to improve the matching with ELP

● The architecture can easily support other languages
    ● It needs the frequency of words in the new language
    ● It needs the PoS tagger for the new language

● The prototype can be easily modified to become an additional
function of an existing digital library
    ● It has to be embedded in the indexer
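The two requirements for a new language (a word frequency list and a PoS tagger) can be pictured as a small registry keyed by language code. This is a sketch only: the `LanguageResources` class, the `REGISTRY` dictionary and `register_language` are hypothetical names, and the real prototype may wire these resources differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class LanguageResources:
    """The two resources the slides list as required per language."""
    frequency_rank: Dict[str, int]                       # word -> rank
    pos_tagger: Callable[[str], List[Tuple[str, str]]]   # text -> (word, tag)


# Hypothetical registry of supported languages, keyed by language code.
REGISTRY: Dict[str, LanguageResources] = {}


def register_language(code, resources):
    """Add a language, rejecting it if the frequency list is empty."""
    if not resources.frequency_rank:
        raise ValueError(f"no word frequencies supplied for {code!r}")
    REGISTRY[code] = resources
```

Supporting a further language would then amount to one `register_language` call with that language's frequency list and a wrapper around its tagger.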

Thank-you
Danke Grazie

QUESTIONS?

demo?
