EVALITA 2018 presentation: FBK SUGAR

Evalita2018 sugar

  1. The Perfect Recipe: Add SUGAR, Add Data
     Simone Magnolini, Vevake Balaraman, Marco Guerini, Bernardo Magnini
     Fondazione Bruno Kessler - Italy; University of Trento - Italy; AdeptMind Scholar - Canada
     EVALITA 2018, 6th evaluation campaign of NLP and Speech Tools for Italian – Torino, December 12th and 13th 2018
  2. From Speech to Commands: a complex task
     The SUGAR task's goal is to train a voice-controlled agent that transforms spoken utterances into commands. This scenario has several complex aspects:
     - Small dataset: 3 recipes
     - No transcription of the audio files
     - There is no single obvious way to model the problem:
       - Classification to detect actions?
       - Sequence labeling for parameters?
       - ...
     These aspects are realistic and challenging!
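The input/output pairing the slide describes can be made concrete with a toy sketch: each ASR-transcribed utterance must be mapped to an action label plus the argument tokens it operates on. All names below (`Command`, `interpret`) are hypothetical, and the first-token rule is only a stand-in for the learned models described on the later slides.

```python
# Toy sketch of the SUGAR target structure: an utterance maps to an
# action plus its arguments. The rule here is illustrative, not the
# authors' method.
from dataclasses import dataclass


@dataclass
class Command:
    action: str         # an action label, e.g. "versa" (pour)
    arguments: list     # tokens copied from the utterance


def interpret(utterance: str) -> Command:
    # Stand-in rule: treat the first token as the action and the
    # remaining tokens as its arguments.
    tokens = utterance.split()
    return Command(action=tokens[0], arguments=tokens[1:])


cmd = interpret("versa lo uovo nella padella")
print(cmd.action)      # "versa"
print(cmd.arguments)   # ['lo', 'uovo', 'nella', 'padella']
```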
  3. Preprocessing
     The first step is ASR, i.e., transcription from audio to text. We used the Google API.
     We split the dataset into a training set, a development set, and a test set:
     - 80/20 split
     - Two recipes for training and development, one recipe for test
     The second option seems to create a more realistic dataset.
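The two split strategies on this slide can be sketched as follows (a minimal sketch: the fractions come from the slide, while the function names and toy data structures are assumptions):

```python
import random


def random_split(lines, train_frac=0.8, seed=0):
    # Strategy 1: a plain 80/20 shuffle split over all utterances.
    rng = random.Random(seed)
    shuffled = lines[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def recipe_split(lines_by_recipe, test_recipe):
    # Strategy 2: hold out one whole recipe for test, so the test set
    # contains interactions the model has never seen in any form --
    # the "more realistic" setting mentioned on the slide.
    train = [line for recipe, lines in lines_by_recipe.items()
             if recipe != test_recipe for line in lines]
    return train, lines_by_recipe[test_recipe]
```

The recipe-level split is stricter: a random 80/20 split leaks near-duplicate commands from the same recipe into both sides.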
  7. Preprocessing
     We expanded every elided form (marked by an apostrophe) in the corpus into its full form:
     - D' -> Di
     - L' -> Lo
     - Un' -> Una
     In order to take advantage of the structure of the dialogue in the dataset, we prefixed every line of the corpus with up to three previous interactions:
     un filo di olio nella padella # e poi verso lo uovo nella padella # gira la frittata # togli la frittata dal fuoco
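Both preprocessing steps on this slide amount to simple string operations. The expansion table and the `#` separator come from the slide itself; the function names and the naive substring replacement are assumptions (a real implementation would need to handle word boundaries and casing more carefully):

```python
# Expansion table from the slide: elided forms are rewritten to their
# full forms so the models see one consistent surface string.
EXPANSIONS = {"Un'": "Una ", "un'": "una ",
              "D'": "Di ", "d'": "di ",
              "L'": "Lo ", "l'": "lo "}


def expand_elisions(text: str) -> str:
    # Naive substring replacement; a sketch only (it would also fire
    # inside longer elided words such as "dell'").
    for elided, full in EXPANSIONS.items():
        text = text.replace(elided, full)
    return text


def add_history(turns, history=3, sep=" # "):
    # Prefix each line with up to `history` previous interactions,
    # joined by '#' as in the slide's example.
    out = []
    for i, turn in enumerate(turns):
        context = turns[max(0, i - history):i]
        out.append(sep.join(context + [turn]))
    return out
```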
  8. Two Different Approaches
     As mentioned, there is no unique way to deal with SUGAR. We tried two different approaches:
     - Classification for actions (memory network) and labeling for parameters (pointer network)
     - Sequence to sequence (fairseq)
     But we still have the problem of a small dataset...
  11. Memory + Pointer Networks
      The utterance is encoded using a Gated Recurrent Unit (GRU). The final state is used by two decoders:
      - A memory network to select the action
      - A pointer network that gives every token in the utterance a probability of appearing in the output
      With this dataset it works, so maybe the problem is solved...
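The pointer decoder on this slide can be sketched as an attention distribution over the encoder states: one score per input token, normalized by a softmax, is the token's probability of being copied into the output. The bilinear scoring below is one common variant and all shapes are illustrative; the slide does not specify the authors' exact scoring function.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a vector of scores.
    e = np.exp(x - x.max())
    return e / e.sum()


def pointer_scores(encoder_states, decoder_state, W):
    # Pointer-network step (bilinear variant, a sketch): score each
    # GRU encoder state against the decoder state, then normalize so
    # every input token gets a probability of being in the output.
    scores = encoder_states @ W @ decoder_state   # one score per token
    return softmax(scores)
```

Because the output distribution ranges over the input tokens themselves, the model can emit recipe-specific words (ingredients, tools) it never saw at training time, which matters with a dataset this small.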
  13. Fairseq
      Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for:
      - Translation
      - Summarization
      - Language modeling
      - Other text generation tasks (are we here?)
      It provides reference implementations of various sequence-to-sequence models; we used Convolutional Neural Networks (CNN).
      But with this dataset it cannot train properly...
  14. Data Augmentation
      We generate new data from the original dataset with three strategies:
      - Most-similar token substitution: based on a similarity mechanism (i.e., embeddings)
      - Synonym token substitution: synonymy relations taken from an online dictionary and applied to specific tokens
      - Entity substitution: replace entities in the examples with random entities of the same type, taken from available gazetteers
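The three strategies can be sketched as token-level rewrites. Everything below is illustrative: the embedding vectors, synonym dictionary, and gazetteers are toy stand-ins for the real resources the slide mentions.

```python
import random
import numpy as np


def most_similar(token, vectors):
    # Most-similar token substitution: pick the nearest embedding
    # neighbour by cosine similarity, excluding the token itself.
    v = vectors[token]
    best, best_sim = token, -1.0
    for other, u in vectors.items():
        if other == token:
            continue
        sim = float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))
        if sim > best_sim:
            best, best_sim = other, sim
    return best


def synonym_substitution(tokens, synonyms, rng):
    # Synonym token substitution: swap listed tokens for a synonym
    # drawn from a dictionary.
    return [rng.choice(synonyms[t]) if t in synonyms else t for t in tokens]


def entity_substitution(tokens, entity_type, gazetteers, rng):
    # Entity substitution: replace a known entity with a random entity
    # of the same type from a gazetteer.
    return [rng.choice(gazetteers[entity_type[t]]) if t in entity_type else t
            for t in tokens]
```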
  15. Data Augmentation (examples)
  16. Experiments and Evaluation

                                          Actions   Arguments
      Memory + Pointer Networks
        - Data Augmentation                65.091      30.856
        + Data Augmentation                65.396      35.786
        Fine Tuning                        66.158      36.102
      Fairseq
        + Data Augmentation                66.361      46.221
  17. Thanks for your attention! And remember...
