EVALITA 2018 NLP4FUN - Solving language games

This paper describes the first edition of the “Solving language games” (NLP4FUN) task at the EVALITA 2018 campaign. The task consists of designing an artificial player for “The Guillotine” (La Ghigliottina, in Italian), a challenging language game that demands knowledge covering a broad range of topics. The game consists of finding a word that is semantically correlated with a set of five words called clues. Artificial players for the game can take advantage of open repositories on the web, such as Wikipedia, which provide the system with the cultural and linguistic background needed to find the solution.

  1. EVALITA 2018: Evaluation of NLP and Speech Tools for Italian. Overview of the EVALITA 2018 Solving language games (NLP4FUN) Task. Pierpaolo Basile, Marco de Gemmis, Lucia Siciliani, Giovanni Semeraro, Department of Computer Science (Dipartimento di Informatica), Università degli Studi di Bari Aldo Moro, Italy
  2. EVALITA 2018 Workshop December 12-13 2018, Turin: “La Ghigliottina”
  3. “La Ghigliottina”: the solution is pacco (“parcel”, “packet”)
     ✓ “Pacco, doppio pacco e contropaccotto” (movie)
     ✓ carta da pacco (wrapping paper)
     ✓ pacco di soldi (a wad of money)
     ✓ pacco di pasta (a packet of pasta)
     ✓ pacco regalo (gift parcel)
  4. Motivation
     ● Language games have attracted the attention of researchers in the fields of AI and NLP
       ○ e.g. Jeopardy!, crossword puzzles
     ● “La Ghigliottina” is a challenging language game that demands knowledge covering a broad range of topics
       ○ players can take advantage of open repositories and the web
       ○ cultural and linguistic background is necessary to understand the clues
  5. Task and dataset
     ● The task: given a set of five words (the clues), each linked in some way to a specific word that is the unique solution of the game, find that solution
       ○ the clues are unrelated to each other
       ○ the player has one minute to find the solution!
     ● Dataset: a set of games taken from
       ○ the TV show “L’Eredità”
       ○ the board game “L’Eredità”
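The task can be framed as ranking candidate words against the five clues within the one-minute limit. The following is a minimal sketch, not a participant's system. It assumes a hypothetical `association(clue, candidate)` scoring function and a precomputed candidate vocabulary, neither of which the task specifies. Each candidate's score is the sum of its associations with the clues, and scoring stops when the time budget runs out.

```python
import time
from typing import Callable, Iterable


def rank_candidates(
    clues: list[str],
    candidates: Iterable[str],
    association: Callable[[str, str], float],  # hypothetical clue-candidate scorer
    budget_s: float = 60.0,  # one minute, as in the game
    max_results: int = 100,
) -> list[tuple[str, float]]:
    """Rank candidates by their summed association with all clues.

    Stops scoring new candidates once the time budget is exhausted and
    returns the best `max_results` (candidate, score) pairs found so far.
    """
    deadline = time.monotonic() + budget_s
    scored: list[tuple[str, float]] = []
    for cand in candidates:
        if time.monotonic() >= deadline:
            break  # out of time: rank whatever has been scored so far
        score = sum(association(clue, cand) for clue in clues)
        scored.append((cand, score))
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_results]
```

Summing rewards candidates linked to many clues. A product of scores or a count of linked clues would be equally plausible design choices.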
  6. Data format

         <games>
           <game>
             <id>3fc953bd...</id>
             <clue>uomo</clue>
             <clue>cane</clue>
             <clue>musica</clue>
             <clue>casa</clue>
             <clue>pietra</clue>
             <solution>chiesa</solution>
             <type>TV</type>
           </game>
           ...
         </games>

     ● XML format
     ● a root element games, which contains several game elements
     ● each game has five clue elements and one solution element
     ● the element type specifies the type of the game: TV or board game
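Files in the format above can be read with Python's standard `xml.etree.ElementTree`. The following is a minimal sketch. The `Game` record and its field names are illustrative and are not part of the task distribution. The `type` value is kept as a raw string because this slide shows only the TV value.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Game:  # illustrative record, not defined by the task
    game_id: str
    clues: list[str]
    solution: Optional[str]  # None if a file omits <solution> (assumption)
    game_type: Optional[str]  # raw <type> text, e.g. "TV"


def parse_games(xml_text: str) -> list[Game]:
    """Parse a <games> document into one Game per <game> element."""
    root = ET.fromstring(xml_text)  # root element: <games>
    games = []
    for g in root.findall("game"):
        games.append(
            Game(
                game_id=g.findtext("id", default="").strip(),
                clues=[c.text.strip() for c in g.findall("clue") if c.text],
                solution=(g.findtext("solution") or "").strip() or None,
                game_type=g.findtext("type"),
            )
        )
    return games
```

For a whole file, `ET.parse(path).getroot()` gives the same root element as `ET.fromstring`.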
  7. Output
     The participants must return a ranked list of solutions in a plain text file, one candidate per line, with the fields id solution score rank time. For example:

         3fc953bd-... porta 0.978 1 3459
         3fc953bd-... chiesa 0.932 2 3251
         3fc953bd-... santo 0.897 3 4321
         ...
         3fc953bd-... carta 0.321 100 2343

     At most 100 candidate solutions are allowed for each game.
  8. Output (continued): in the id solution score rank time format of slide 7, the time field reports, in milliseconds, the time taken by the system to compute the solution.
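A run file in the id solution score rank time format can be produced as in the following minimal sketch. The example output lists a different time on each line, so this sketch stores one time per candidate. Writing the score with three decimals only mirrors the example and is an assumption. `Candidate` and `format_run` are hypothetical names.

```python
from dataclasses import dataclass

MAX_CANDIDATES = 100  # the task accepts at most 100 candidates per game


@dataclass
class Candidate:  # hypothetical record for one proposed solution
    word: str
    score: float
    elapsed_ms: int  # time taken to compute this solution, in milliseconds


def format_run(game_id: str, candidates: list[Candidate]) -> list[str]:
    """Return 'id solution score rank time' lines, best score first."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return [
        f"{game_id} {c.word} {c.score:.3f} {rank} {c.elapsed_ms}"
        for rank, c in enumerate(ranked[:MAX_CANDIDATES], start=1)
    ]
```

The lines for all games can then be written to the plain text file with `"\n".join(...)`.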
  9. Dataset: statistics
     ● Games have different levels of difficulty
       ○ instances are taken both from the TV game and from the official board game
     ● Training set: 315 instances of the game
       ○ 64.8% TV game, 35.2% board game
     ● Test set: 105 instances of the game
       ○ 62.9% TV game, 37.1% board game
     ● 300 fake games (automatically created) were added to the evaluation data
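The slide gives only percentages. Assuming they are rounded to one decimal place, the absolute number of TV and board games in each split can be recovered, because only one count rounds to each stated percentage. The following is a small worked check.

```python
def split_counts(total: int, pct: float) -> tuple[int, int]:
    """Return (count, total - count) for the unique count whose share of
    `total`, rounded to one decimal place, equals `pct` percent."""
    matches = [n for n in range(total + 1) if round(100 * n / total, 1) == pct]
    if len(matches) != 1:
        raise ValueError(f"{pct}% of {total} does not identify a unique count")
    return matches[0], total - matches[0]


print(split_counts(315, 64.8))  # training set, TV / board: (204, 111)
print(split_counts(105, 62.9))  # test set, TV / board: (66, 39)
```

Both complements agree with the slide: 111/315 rounds to 35.2% and 39/105 rounds to 37.1%.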
  10. Evaluation
      ● a (time-)weighted version of the Mean Reciprocal Rank (MRR)
      ● G is the set of games
      ● r_g is the rank of the solution of game g
      ● t_g denotes the minutes taken by the system to give the solution
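The standard MRR averages 1/r_g over all games in G, and a game contributes 0 when its solution is not among the returned candidates. The exact time weighting used by the task is not reproduced in this transcript. The following sketch therefore treats it as a pluggable, hypothetical `weight(t_g)` factor that multiplies 1/r_g. With the default weight of 1.0 it reduces to the standard MRR.

```python
from typing import Callable, Optional, Sequence


def weighted_mrr(
    ranks: Sequence[Optional[int]],
    minutes: Sequence[float],
    weight: Callable[[float], float] = lambda t: 1.0,  # placeholder time weight
) -> float:
    """Mean of weight(t_g) / r_g over all games g.

    ranks[i] is the 1-based rank of the correct solution for game i, or None
    if the solution is missing from the returned list (contributes 0).
    minutes[i] is the time t_g, in minutes, taken for game i.
    """
    if len(ranks) != len(minutes):
        raise ValueError("ranks and minutes must have the same length")
    if not ranks:
        return 0.0
    total = 0.0
    for r, t in zip(ranks, minutes):
        if r is not None:
            total += weight(t) / r
    return total / len(ranks)
```

For example, `weighted_mrr([1, 2, None], [0.1, 0.5, 1.0])` is (1 + 1/2 + 0) / 3 = 0.5. A decreasing weight such as the hypothetical `lambda t: max(0.0, 1.0 - t)` lowers the score of slower systems, which is the effect the task's time weighting aims for.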
  11. Participants
      ● 12 registered teams
      ● only 2 teams submitted results
        ○ UNIOR4NLP: the idea is that clue words and the corresponding solution are often part of a multiword expression (multiword expressions are filtered by linguistic patterns)
        ○ LucaSquadrone: co-occurrences of clues and candidate solutions
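Both approaches on this slide link a clue to a candidate when the two words occur together, either inside a multiword expression or more generally. The following is a minimal sketch of that shared intuition, not either team's system. It ranks words by how many distinct clues they share an expression with. The expression list is a toy built from the phrases on slide 3 plus the distractor “carta di credito”. The clue set and the `STOPWORDS` filter are also hypothetical.

```python
from collections import defaultdict

STOPWORDS = {"da", "di", "e"}  # hypothetical: function words never proposed as solutions


def score_by_expressions(clues: list[str], expressions: list[str]) -> list[tuple[str, int]]:
    """Rank words by how many distinct clues they co-occur with in an expression."""
    clue_set = {c.lower() for c in clues}
    linked: dict[str, set[str]] = defaultdict(set)  # candidate -> linked clues
    for expr in expressions:
        words = set(expr.lower().split())
        for clue in clue_set & words:
            for w in words - clue_set - STOPWORDS:
                linked[w].add(clue)
    # Most linked clues first; ties broken alphabetically.
    return sorted(((w, len(cs)) for w, cs in linked.items()), key=lambda p: (-p[1], p[0]))


toy_expressions = ["carta da pacco", "pacco di soldi", "pacco di pasta", "pacco regalo", "carta di credito"]
print(score_by_expressions(["carta", "soldi", "pasta", "regalo"], toy_expressions))
# [('pacco', 4), ('credito', 1)]
```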
  12. Results
      ● UNIOR4NLP reports a very high MRR: the system is able to place the solution in the first positions
      ● the Squadrone system takes more time to solve games, so its time-weighted MRR is lower than its standard MRR (std)

        System      MRR      MRR (std)   Solved
        UNIOR4NLP   0.6428   0.6428      81.90%
        Squadrone   0.0134   0.0350      25.71%
  13. Comments
      Reported results are remarkable, but some difficult games that require inference remain unsolved, e.g.:
      ● uno (one), notte (night), la trippa, auto (car), palazzo (building) → portiere (which means both goalkeeper and porter)
        ○ uno is the number generally assigned to the goalkeeper (portiere)
        ○ “La Trippa” is the surname of “Antonio La Trippa”, a character in the Italian movie “Gli onorevoli”, who works as the porter (portiere) of a building
  14. Conclusions
      ● Challenging task
      ● Good results when the solution forms a multiword expression with the clues
        ○ inference is hard to tackle
      ● Few participants
        ○ Is the task too difficult?
        ○ Do non-classification tasks attract fewer participants?
      ● Mobile app “Ghigliottiniamo”
        ○ integrate your artificial player through a REST API: contact support@quiztime.io
  15. Thank you! Download our dataset from the EVALITA 2018 GitHub repository: https://github.com/evalita2018/data
