EVALITA 2018
EVALUATION OF NLP AND SPEECH TOOLS FOR ITALIAN
Overview of the EVALITA 2018 Solving
language games (NLP4FUN) Task
Pierpaolo Basile, Marco de Gemmis
Lucia Siciliani, Giovanni Semeraro
Dipartimento di Informatica
Università degli Studi di Bari Aldo Moro, Italy
EVALITA 2018 Workshop
December 12-13 2018, Turin
“La Ghigliottina”
The solution is pacco:
✓ “Pacco, doppio pacco e contropaccotto” (movie)
✓ Carta da pacco
✓ Pacco di soldi
✓ Pacco di pasta
✓ Pacco regalo
Motivation
● Language Games have attracted the attention of
researchers in the fields of AI and NLP
○ Jeopardy!, crossword puzzles
● “La Ghigliottina” is a challenging language
game which demands knowledge covering a
broad range of topics
○ solvers can take advantage of the availability of open
repositories and the web
○ cultural and linguistic background are
necessary to understand clues
Task and dataset
● The task: given a set of five words (the
clues), each linked in some way to a
specific word that represents the unique
solution of the game, find that word
○ clues are unrelated to each other
○ the player has one minute to find the
solution!!!
● Dataset: set of games taken from
○ the TV show “L’Eredità”
○ the board game “L’Eredità”
Data format
<games>
  <game>
    <id>3fc953bd...</id>
    <clue>uomo</clue>
    <clue>cane</clue>
    <clue>musica</clue>
    <clue>casa</clue>
    <clue>pietra</clue>
    <solution>chiesa</solution>
    <type>TV</type>
  </game>
  ...
</games>
● XML format
● a root element games which contains several game elements
● each game has five clue elements and one solution
● the element type specifies the type of the game: TV or board game
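The data format above can be read with a few lines of Python. This is a minimal sketch using only the standard library; the `Game` class, the `parse_games` function and the id `game-001` are illustrative names chosen here, not part of the task distribution, and treating `solution` and `type` as optional is an assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Game:
    game_id: str
    clues: List[str]
    solution: Optional[str]   # assumed optional: may be missing in unlabeled data
    game_type: Optional[str]  # the content of the <type> element, e.g. "TV"


def parse_games(xml_text: str) -> List[Game]:
    """Parse a <games> document into a list of Game records."""
    root = ET.fromstring(xml_text)
    games = []
    for node in root.findall("game"):
        games.append(
            Game(
                game_id=(node.findtext("id") or "").strip(),
                clues=[(c.text or "").strip() for c in node.findall("clue")],
                solution=node.findtext("solution"),
                game_type=node.findtext("type"),
            )
        )
    return games


# Sample built from the slide; "game-001" is a placeholder id (hypothetical).
SAMPLE = """<games>
  <game>
    <id>game-001</id>
    <clue>uomo</clue>
    <clue>cane</clue>
    <clue>musica</clue>
    <clue>casa</clue>
    <clue>pietra</clue>
    <solution>chiesa</solution>
    <type>TV</type>
  </game>
</games>"""

games = parse_games(SAMPLE)
print(games[0].clues, "->", games[0].solution)
```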
Output
Participants must return a ranked list of
solutions in a plain text file, one candidate per line:
id solution score rank time
For example:
3fc953bd-... porta 0.978 1 3459
3fc953bd-... chiesa 0.932 2 3251
3fc953bd-... santo 0.897 3 4321
...
3fc953bd-... carta 0.321 100 2343
● Maximum of 100 candidate solutions for each game
● The time taken by the system to compute the solution is reported in milliseconds
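As a rough sketch of how a system might produce this output, the snippet below sorts candidates by score, keeps at most 100 and emits one line per candidate in the order id, solution, score, rank, time. The function name `format_ranked_list`, the placeholder id `game-001`, the single-space separator and the three-decimal score formatting are assumptions made for illustration; the slide does not specify them.

```python
from typing import Iterable, List, Tuple

MAX_CANDIDATES = 100  # limit stated on the slide


def format_ranked_list(
    game_id: str,
    candidates: Iterable[Tuple[str, float, int]],
    sep: str = " ",  # assumed separator; not specified on the slide
) -> List[str]:
    """Build output lines from (solution, score, time_ms) tuples.

    Candidates are sorted by descending score and ranks start at 1.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:MAX_CANDIDATES]
    lines = []
    for rank, (solution, score, time_ms) in enumerate(ranked, start=1):
        fields = [game_id, solution, f"{score:.3f}", str(rank), str(time_ms)]
        lines.append(sep.join(fields))
    return lines


# Values taken from the example on the slide; the id is a placeholder.
lines = format_ranked_list(
    "game-001",
    [("chiesa", 0.932, 3251), ("porta", 0.978, 3459), ("santo", 0.897, 4321)],
)
print("\n".join(lines))
```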
Dataset: statistics
● Games have different levels of difficulty
○ instances taken both from the TV game and
from the official board game
● Training set: 315 instances of the game
○ 64.8% (TV game), 35.2% (board game)
● Test set: 105 instances of the game
○ 62.9% (TV game)
○ 37.1% (board game)
● 300 fake games (automatically created)
added to the evaluation data
Evaluation
● a (time) weighted version of Mean
Reciprocal Rank (MRR)
● G is the set of games
● r_g is the rank of the solution for game g
● t_g denotes the minutes taken by the system
to give the solution for game g
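The exact time weighting is not reproduced in this text, so the sketch below only shows the general shape of such a metric: a Mean Reciprocal Rank in which each game's 1/r_g term is multiplied by a weight computed from t_g. The function `weighted_mrr` and its `time_weight` parameter are names introduced here; the default weight of 1 gives the standard MRR, and the decay used in the second call is purely illustrative and is not the official task formula.

```python
from typing import Callable, Dict, Optional


def weighted_mrr(
    ranks: Dict[str, Optional[int]],
    minutes: Dict[str, float],
    time_weight: Callable[[float], float] = lambda t: 1.0,
) -> float:
    """Average of time_weight(t_g) / r_g over all games in G.

    ranks maps each game id to the rank of the correct solution,
    or None when the solution is not in the returned list
    (such a game contributes 0 to the sum).
    """
    if not ranks:
        return 0.0
    total = 0.0
    for game_id, rank in ranks.items():
        if rank is None:
            continue
        total += time_weight(minutes.get(game_id, 0.0)) / rank
    return total / len(ranks)


ranks = {"g1": 1, "g2": 2, "g3": None}
minutes = {"g1": 1.0, "g2": 0.0, "g3": 0.5}

# Standard MRR: (1/1 + 1/2 + 0) / 3
standard = weighted_mrr(ranks, minutes)

# Illustrative decay only (hypothetical, NOT the task's weighting).
decayed = weighted_mrr(ranks, minutes, time_weight=lambda t: 1.0 / (1.0 + t))

print(standard, decayed)
```

Keeping the weight as a parameter makes it easy to plug in the official definition once it is taken from the task guidelines.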
Participants
● 12 registered teams
● only 2 teams submitted results
○ UNIOR4NLP: the idea is that clue words and
the corresponding solution are often part of a
multiword expression (multiword expressions
are filtered by linguistic patterns)
○ LucaSquadrone: co-occurrences of clues and
candidate solutions
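To make the co-occurrence idea concrete, here is a toy sketch. It is not either participant's system: the function `rank_by_cooccurrence`, the tiny phrase list (based on the "pacco" expressions shown earlier plus one invented distractor) and the clue list are all assumptions for illustration. Each candidate word is scored by the number of distinct clues it shares a phrase with.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_by_cooccurrence(
    clues: Iterable[str], phrases: Iterable[str]
) -> List[Tuple[str, int]]:
    """Rank non-clue words by how many distinct clues they co-occur with."""
    clue_set = {c.lower() for c in clues}
    seen_with: Dict[str, Set[str]] = defaultdict(set)
    for phrase in phrases:
        tokens = set(phrase.lower().split())
        matched = tokens & clue_set
        for token in tokens - clue_set:
            seen_with[token] |= matched
    scored = [(word, len(cs)) for word, cs in seen_with.items() if cs]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy corpus: four expressions from the slides plus one made-up distractor.
PHRASES = [
    "carta da pacco",
    "pacco di soldi",
    "pacco di pasta",
    "pacco regalo",
    "carta di credito",  # hypothetical distractor
]
CLUES = ["carta", "soldi", "pasta", "regalo"]  # assumed clue set

ranking = rank_by_cooccurrence(CLUES, PHRASES)
print(ranking[:3])
```

In this toy run the function word "di" scores almost as high as "pacco", which suggests why a practical system would need stopword filtering or linguistic patterns on top of raw counts.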
Results
● UNIOR4NLP reports a very high MRR: the
system is able to place the solution in the
first positions
● The Squadrone system takes more time to
solve games, so its time-weighted MRR differs
from the standard MRR, MRR (std)

System      MRR     MRR (std)  Solved
UNIOR4NLP   0.6428  0.6428     81.90%
Squadrone   0.0134  0.0350     25.71%
Comments
Reported results are remarkable but some
difficult games requiring inference are
unsolved:
● uno, notte, la trippa, auto, palazzo → portiere
○ uno is the number generally assigned to the
role of the goalkeeper (portiere)
○ “La Trippa” is the surname of “Antonio La
Trippa”, a character in the Italian movie “Gli
onorevoli”, who works as the porter (portiere) of
a building
Conclusions
● Challenging task
● Good results when the solution is a
multiword expression
○ inference is hard to tackle
● Few participants
○ Is the task too difficult?
○ Do non-classification tasks attract few
participants?
● Mobile app “Ghigliottiniamo”
○ integrate your artificial player through a REST API;
contact support@quiztime.io
Thank you!
Download our dataset from the EVALITA 2018
GitHub repository:
https://github.com/evalita2018/data
