
Evalita2018 iListen - itaLIan Speech acT labEliNg

Task overview

Published in: Technology

  1. EVALITA 2018
     EVALUATION OF NLP AND SPEECH TOOLS FOR ITALIAN
     iLISTEN: itaLIan Speech acT labEliNg
     https://ilisten2018.github.io/
     Pierpaolo Basile and Nicole Novielli
     University of Bari Aldo Moro, Dipartimento di Informatica
     {pierpaolo.basile, nicole.novielli}@uniba.it
     @NicoleNovielli @basilepp
  2. EVALITA 2018 Workshop, December 12-13 2018, Turin
     Task Description
     • Goal
       o Annotating dialogue turns with speech act labels
     • Speech acts
       o Labels define the communicative intention of the speaker
       o e.g. statement, request for information, agreement, opinion expression, general answer
     • Who is telling what to whom?
       o Speech acts as a coding standard for natural dialogue tasks
     J. L. Austin. 1962. How to Do Things with Words. William James Lectures. Oxford University Press.
     J. R. Searle. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, London.
  3. Motivation
     • Conversational access to information
       o Chat-oriented dialogue systems
       o Simulation of natural dialogues with embodied conversational agents or chatbots
       o Conversational interfaces for smart devices and IoT
     • Dialogue analysis
       o Chatlog analysis
       o Interaction on social media
       o Extraction of information with long-lasting value from technical discussions
     • Dedicated venues
  4. Development and Test Data
     • Transcripts of 60 dialogues
       o 30 speech-based + 30 text-based
       o 1,576 user dialogue turns
       o 1,611 system turns
       o ~22k words
     • Development set: 40 dialogues
       o 20 speech-based + 20 text-based
     • Test set: 20 dialogues
       o 10 speech-based + 10 text-based
  5. Development and Test Data
     • Corpus of persuasion dialogues with an ECA (embodied conversational agent)
       o Valentina plays the role of an advisor in the healthy eating domain
       o Wizard of Oz studies: the ECA’s moves are predefined
     G. Clarizio, I. Mazzotta, N. Novielli, and F. De Rosis. 2006. Social attitude towards a conversational character. In Proc. of IEEE International Workshop on Robot and Human Interactive Communication, pp. 2–7.
  6. Speech Acts: User’s Moves
  7. Speech Acts: User’s Moves (target of classification)
  8. Speech Acts: System’s Moves
  9. Speech Acts: System’s Moves (provided as context)
  10. Speech Act Annotation: an excerpt from a dialogue
  11. Speech Act Annotation: an excerpt from a dialogue. The turn ID provides an indication of the speaker and the input modality.
  12. Distribution and Format
  13. Evaluation
      • Ranking: classification of user dialogue acts
        o F1-score (macro-averaging)
      • Precision and Recall are also computed
        o Both micro- and macro-averaging
      • Baseline: trivial classifier predicting the majority class
        o STATEMENT (33%)
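The evaluation scheme on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the official iLISTEN scorer: the function names `evaluate` and `majority_baseline` are hypothetical, and computing the macro F-score as the harmonic mean of macro precision and macro recall is an assumption (averaging per-class F-scores is another common definition).

```python
from collections import Counter


def evaluate(gold, pred):
    """Micro- and macro-averaged (precision, recall, F) for single-label data."""
    labels = sorted(set(gold))  # macro-average over classes in the gold standard
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[g] += 1  # gold class gets a false negative

    def ratio(num, den):
        return num / den if den else 0.0  # undefined scores count as 0

    def f_score(p, r):
        return ratio(2 * p * r, p + r)

    # Micro-averaging pools the counts of all classes before dividing.
    total_tp = sum(tp.values())
    micro_p = ratio(total_tp, total_tp + sum(fp.values()))
    micro_r = ratio(total_tp, total_tp + sum(fn.values()))

    # Macro-averaging computes per-class scores first, then takes their mean.
    macro_p = sum(ratio(tp[c], tp[c] + fp[c]) for c in labels) / len(labels)
    macro_r = sum(ratio(tp[c], tp[c] + fn[c]) for c in labels) / len(labels)

    return {
        "micro": (micro_p, micro_r, f_score(micro_p, micro_r)),
        # Assumption: macro F is the harmonic mean of macro P and macro R.
        "macro": (macro_p, macro_r, f_score(macro_p, macro_r)),
    }


def majority_baseline(train_labels, n_test):
    """Trivial classifier: always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test
```

With one label per turn, every error is both a false positive and a false negative, so micro-averaged precision, recall and F coincide. Macro-averaging weights all classes equally, which penalizes a classifier that ignores rare classes.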
  14. Participants
      • Task open to everyone from industry and academia
      • Sixteen participants registered, but only two teams actually submitted results
        o UNITOR (Academia)
          - Supervised system based on a structured kernel-based Support Vector Machine
          - Exploits the parse tree and the cosine similarity between the word vectors in a distributional semantics model
        o X2Check (Industry)
          - Report not submitted
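For readers unfamiliar with the similarity measure named above, here is a minimal sketch of cosine similarity between word vectors. It is not the UNITOR implementation: the function name and the three-dimensional toy vectors are hypothetical, chosen only to illustrate the formula.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; a real distributional model has hundreds of dimensions.
toy_vectors = {
    "ciao": [0.9, 0.1, 0.0],
    "salve": [0.8, 0.2, 0.1],
    "dieta": [0.0, 0.3, 0.9],
}

greeting_similarity = cosine_similarity(toy_vectors["ciao"], toy_vectors["salve"])
unrelated_similarity = cosine_similarity(toy_vectors["ciao"], toy_vectors["dieta"])
```

In this toy setup the two greetings point in nearly the same direction, so `greeting_similarity` is close to 1, while `unrelated_similarity` is close to 0.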
  16. Results
                  Micro-averaged             Macro-averaged
      System      Prec    Rec     F          Prec    Rec     F
      Unitor      0.7328  0.7328  0.7328     0.6810  0.6274  0.6531
      X2Check     0.6848  0.6848  0.6848     0.6076  0.5844  0.5957
      Baseline    0.3403  0.3403  0.3403     0.0378  0.1111  0.0564
      Danilo Croce and Roberto Basili. A Markovian Kernel-based Approach for itaLIan Speech acT labEliNg.
  17. Results
      • Both systems outperform the baseline
      • Some classes are harder to predict
        o Low number of examples in the training data
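The baseline row can be checked by hand, which also helps tell the micro and macro columns apart. The sketch below assumes nine user speech-act classes (the number of rows in the per-class slide) and assumes that the macro F is the harmonic mean of macro precision and macro recall. The variable names are illustrative.

```python
def harmonic_mean(p, r):
    """F-score of a precision/recall pair (0 when both are 0)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


N_CLASSES = 9  # assumption: one class per row of the per-class results
MICRO_BASELINE = 0.3403  # reported micro score = share of the majority class

# A majority-class predictor is right only on STATEMENT turns:
# precision is 0.3403 for STATEMENT and counted as 0 for the other classes.
macro_precision = MICRO_BASELINE / N_CLASSES
# Recall is 1 for STATEMENT and 0 for every other class.
macro_recall = 1.0 / N_CLASSES
macro_f = harmonic_mean(macro_precision, macro_recall)

baseline_macro = tuple(round(x, 4) for x in (macro_precision, macro_recall, macro_f))
# baseline_macro -> (0.0378, 0.1111, 0.0564), matching the reported baseline row

# The same formula applied to Unitor's reported macro precision and recall.
unitor_macro_f = round(harmonic_mean(0.6810, 0.6274), 4)
# unitor_macro_f -> 0.6531
```

The derived values agree with the three-column group that contains 0.0378, 0.1111 and 0.0564, which is consistent with that group holding the macro-averaged scores.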
  18. Performance by class
                                       Unitor              X2Check
      Class                     Freq   Prec  Rec   F       Prec  Rec   F
      OPENING                    2%    1.00  1.00  1.00    1.00  0.73  0.84
      CLOSING                    2%    0.78  0.70  0.74    0.82  0.90  0.86
      INFO-REQUEST              25%    0.78  0.83  0.80    0.74  0.79  0.76
      SOLICITATION-REQ-CLARIF    7%    0.40  0.33  0.36    0.44  0.33  0.38
      STATEMENT                 33%    0.75  0.94  0.84    0.67  0.89  0.76
      GENERIC-ANSWER            10%    0.86  0.92  0.89    0.76  0.90  0.82
      AGREE-ACCEPT               5%    0.65  0.46  0.54    0.57  0.50  0.53
      REJECT                     5%    0.43  0.08  0.13    0.00  0.00  0.00
      KIND-ATT-SMALLTALK        11%    0.50  0.39  0.44    0.47  0.20  0.29
      Some classes are harder to predict
      - low number of examples in the training data
      - the main cause of error is the misclassification as STATEMENT
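As a consistency check, the per-class figures can be averaged to recover the overall macro scores. The sketch below copies the values from the slide. The names `PER_CLASS` and `macro_average` are illustrative, and because the per-class scores are rounded to two decimals, agreement is only expected to within about 0.005.

```python
# class -> ((Unitor P, R, F), (X2Check P, R, F)), copied from the slide
PER_CLASS = {
    "OPENING": ((1.00, 1.00, 1.00), (1.00, 0.73, 0.84)),
    "CLOSING": ((0.78, 0.70, 0.74), (0.82, 0.90, 0.86)),
    "INFO-REQUEST": ((0.78, 0.83, 0.80), (0.74, 0.79, 0.76)),
    "SOLICITATION-REQ-CLARIF": ((0.40, 0.33, 0.36), (0.44, 0.33, 0.38)),
    "STATEMENT": ((0.75, 0.94, 0.84), (0.67, 0.89, 0.76)),
    "GENERIC-ANSWER": ((0.86, 0.92, 0.89), (0.76, 0.90, 0.82)),
    "AGREE-ACCEPT": ((0.65, 0.46, 0.54), (0.57, 0.50, 0.53)),
    "REJECT": ((0.43, 0.08, 0.13), (0.00, 0.00, 0.00)),
    "KIND-ATT-SMALLTALK": ((0.50, 0.39, 0.44), (0.47, 0.20, 0.29)),
}


def macro_average(system_index):
    """Unweighted mean of per-class (P, R, F); 0 = Unitor, 1 = X2Check."""
    rows = [scores[system_index] for scores in PER_CLASS.values()]
    return tuple(sum(row[i] for row in rows) / len(rows) for i in range(3))


unitor_p, unitor_r, unitor_mean_f = macro_average(0)
x2check_p, x2check_r, x2check_mean_f = macro_average(1)
```

The averaged precision and recall land within rounding distance of the reported macro scores (0.6810 and 0.6274 for Unitor, 0.6076 and 0.5844 for X2Check). The mean of the per-class F-scores comes out lower than the reported macro F (about 0.64 against 0.6531 for Unitor), which suggests the reported F was derived from the macro precision and recall and not by averaging per-class F-scores.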
  19. Ideas for future editions
      • The best performing system leverages syntactic features
        o Task-related features are not defined
        o Follow-up: extending the benchmark with dialogues from different domains
      • Is the task inherently dependent on the language?
        o To what extent do the approaches generalize beyond Italian?
        o Dialogues in other languages might be included in the gold standard, as in AMI
  20. Have fun!
      • Download our dataset from the GitHub EVALITA 2018 repository
        https://github.com/evalita2018/data
