Simulating the Usage Acquisition of
Two-Word Sentences with a First- or
Second-Person Subject and Verb
ARAKAWA, Naoya
Dwango AI Laboratory
2017-08-04
BICA 2017
Outline
1. Introduction
2. The Experiment
3. Discussion
4. Conclusion
Introduction
The paper shows:
A simplistic simulated agent can learn
the use of ‘I’ & ‘you’
in two-word (subject-verb) sentences
by interacting with a caretaker agent,
while babbling,
observing utterances & behavior,
and obtaining rewards.
Background
• Previous Works
Learning 1st & 2nd person pronouns by observing
the language use of more than one caretaker
E.g., Oshima-Takane et al., Gold & Scassellati
• Question: Can one learn them from a
single caretaker?
• Answer: Yes (from this experiment)
The Experiment
1. The World
2. The Language
3. The Caretaker
4. The Learner
5. Results
[Image: “Luca gira.” (“Luca turns.”)]
The World of the Experiment
Two rambling agents
– A Caretaker: uses the language of the experiment
– A Language Learner: learns the language
Each knows its own & the other’s utterances/actions,
given in symbolic form.
(No symbol-grounding issue is involved here.)
Three kinds of action: {come, go, turn}
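As a rough picture of this setup (my own framing in Python, not the paper’s code; all names are illustrative), the world can be thought of as a shared symbolic event log that both agents read:

# Rough sketch (my framing, not the paper's) of the world as a shared
# symbolic event log that both agents can read.
from dataclasses import dataclass, field
from typing import Optional, Tuple, List

ACTIONS = ("come", "go", "turn")

@dataclass
class Event:
    agent: str                                   # "Mario" (Caretaker) or "Luca" (Learner)
    action: Optional[str] = None                 # one of ACTIONS, or None
    utterance: Optional[Tuple[str, str]] = None  # (subject, verb), or None

@dataclass
class World:
    log: List[Event] = field(default_factory=list)

    def record(self, event: Event) -> None:
        # Both agents observe events in symbolic form; nothing has to be grounded.
        self.log.append(event)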
The Language
• Two-word Sentences: Subject+Verb
• Subject: {I, You, Luca, Mario}
– Luca: Language Learner
– Mario: Caretaker
• Verb: {come, go, turn}
• A sentence is used:
– To describe
• Utterer’s own action
• The other’s action
– To ‘give instruction’ to the other.
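A minimal sketch of the two-word language, assuming a (subject, verb) pair with speaker-relative resolution of ‘I’ and ‘You’; the function below is illustrative, not taken from the paper:

# Illustrative sketch: a (subject, verb) language where 'I' and 'You'
# are resolved relative to the speaker.
SUBJECTS = ["I", "You", "Luca", "Mario"]   # Luca = Learner, Mario = Caretaker
VERBS = ["come", "go", "turn"]

def referent(subject, speaker, hearer):
    """Resolve a subject word to the agent it denotes for a given speaker."""
    if subject == "I":
        return speaker
    if subject == "You":
        return hearer
    return subject  # proper names refer directly

# When Mario says ("You", "turn") to Luca, the sentence is about Luca turning,
# i.e. it can serve as an instruction to the Learner.
assert referent("You", speaker="Mario", hearer="Luca") == "Luca"
assert referent("I", speaker="Luca", hearer="Mario") == "Luca"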
The Caretaker
• Executes action {come, go, turn} randomly
• Describes its action in 2-word sentences
• Or, instructs the learner to act
{come, go, turn} with a 2-word sentence
• Rewards the Learner when:
– Learner describes its own or caretaker’s action
correctly.
– Learner acts following instruction.
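A hedged sketch of the Caretaker’s turn as this slide describes it; the 50/50 split between describing and instructing, and all names, are assumptions of mine:

# Hedged sketch of the Caretaker's behaviour (not the authors' code).
import random

VERBS = ["come", "go", "turn"]

def caretaker_turn(p_describe=0.5):
    """Either act at random and describe the own action, or instruct the Learner."""
    if random.random() < p_describe:
        action = random.choice(VERBS)
        return {"action": action, "utterance": ("I", action)}            # e.g. "I turn."
    return {"action": None, "utterance": ("You", random.choice(VERBS))}  # e.g. "You come."

def description_reward(learner_sentence, actor, action):
    """Reward 1 if the Learner's sentence correctly describes `actor` doing `action`.
    The Learner is the speaker, so here 'I' -> Luca and 'You' -> Mario."""
    subject, verb = learner_sentence
    who = {"I": "Luca", "You": "Mario"}.get(subject, subject)
    return int(who == actor and verb == action)

def instruction_reward(learner_action, instructed_action):
    """Reward 1 if the Learner acted as instructed."""
    return int(learner_action == instructed_action)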
Language Learner
Three Modes:
Reaction Mode / Spontaneous Action Mode / Direction Mode
Has the Caretaker acted or uttered?
– Acted → Reaction Mode: acts & utters
– Only uttered → Reaction Mode: utters
(based on an ‘internal representation’ of the Caretaker’s action)
– No → Random choice:
• Spontaneous Action Mode: acts & utters
• Direction Mode: utters
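Read as code, the decision above might look like this sketch (the even random split between the two no-observation modes is my assumption; the slide only says “Random”):

# Sketch of the Learner's mode selection shown in the diagram above.
import random

def choose_mode(caretaker_acted, caretaker_uttered):
    """Pick the Learner's mode for this turn."""
    if caretaker_acted:
        return "reaction: act & utter"
    if caretaker_uttered:
        return "reaction: utter"
    # Nothing observed from the Caretaker: behave spontaneously.
    return random.choice(["spontaneous action: act & utter",
                          "direction: utter"])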
Learner’s Utterance/Action
• Produced with information:
– Mode
– Its own action (in spontaneous action mode)
– Caretaker’s action/utterance (in reaction mode)
• Choice of Subjects, Verbs & Actions
– Reinforced by Rewards
• Given by Caretaker
• Internal Reward: when Caretaker follows direction
– Random choice: Babbling
• Naïve Bayes + Dirichlet Dist.
(dice throwing based on reward average)
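One way to read “Naïve Bayes + Dirichlet Dist. (dice throwing based on reward average)” is as sampling each option in proportion to a smoothed average reward; the sketch below implements that reading and is an assumption about the mechanism, not the authors’ implementation:

# Hedged sketch: sample options in proportion to smoothed reward averages
# ("dice throwing"); the Dirichlet-style pseudo-count keeps babbling alive.
import random
from collections import defaultdict

class RewardedChoice:
    def __init__(self, options, prior=1.0):
        self.options = list(options)
        self.prior = prior                      # pseudo-count (Dirichlet-like smoothing)
        self.reward_sum = defaultdict(float)
        self.count = defaultdict(int)

    def _weight(self, option):
        # Smoothed average reward; unseen options keep a non-zero weight (babbling).
        return (self.reward_sum[option] + self.prior) / (self.count[option] + 1)

    def sample(self):
        weights = [self._weight(o) for o in self.options]
        return random.choices(self.options, weights=weights, k=1)[0]

    def update(self, option, reward):
        self.count[option] += 1
        self.reward_sum[option] += reward

# Usage: one chooser per decision, e.g. the subject uttered in reaction mode.
subject_choice = RewardedChoice(["I", "You", "Luca", "Mario"])
s = subject_choice.sample()           # random at first (babbling), then biased by rewards
subject_choice.update(s, reward=1)    # reward as given by the Caretaker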
Results
• 2,500 interactions between Caretaker &
Learner
• Success rate = reward rate
• After 1,200 interactions, the Learner had learned
to utter & act with 90% correctness.
Success rate of Subject Selection
The success rate of the reaction mode was higher,
since it had more choices than the other modes.
[Plot: success rates of subject selection for S react, S sp. act., and S direction]
Success rate of Action Selection
Example Interaction
LL: Language Learner (Luca), CT: Caretaker (Mario)
Utt.: Utterance, Rew.: Reward
The language for utterances is Interlingua (ia).
Discussion & Conclusion
1. The World
2. The Language
3. Learning
Conclusion
Discussion – The Result
The experiment showed:
• One can learn 1st & 2nd person pronouns
from a single caretaker.
• Playing a minimal language game
• Without grounded concepts: object, other, etc.
Discussion – The Language
• Semantics
– Programmed in Caretaker’s Language Use
• Human Language Acquisition
– Learners are only presented with examples in
interactions with Caretakers
• Two-word Sentences
– cf. the one- or two-word sentence period in infants’
language acquisition
(not always subject-verb, though)
Discussion – Learning
• Approval as Reward
– In human learning: Smiling, etc.
• Internal Reward
– When Caretaker follows Learner’s direction
⇔ Goal Achieved
• Babbling (random choice) was necessary
& reinforced.
• The modes {reaction, spontaneous action, and
direction} could themselves be learned
– but this is beyond the scope of the current experiment
Conclusion
• Related Research
– Language Emergence with Artificial Agents
• Steels, Vogt, Sugita, et al.
• The current experiment is rather about learning an existing language.
• Further directions
– More realistic experiments would require reference to
actual human language acquisition.
– Symbol grounding problem
– Learning language models
• E.g., LSTM
• Language use as a system of choices
cf. Systemic Functional Grammar (M. A. K. Halliday)
EOP
Thank you very much for your attention!
