Simulating the Usage Acquisition of Two-Word Sentences with a First- or Second-Person Subject and Verb
2017-08 Dwango AI Lab.
ARAKAWA, Naoya
Dwango AI Laboratory
2017-08-04
BICA 2017
Outline
1. Introduction
2. The Experiment
3. Discussion
4. Conclusion
Introduction
The paper shows:
A simplistic simulated agent can learn
the use of ‘I’ & ‘you’
in two-word (subject-verb) sentences
interacting with a caretaker agent,
while babbling,
observing utterances & behavior,
and obtaining rewards.
Background
• Previous Works
Learning 1st & 2nd person pronouns by observing
the language use of more than one caretaker
E.g., Oshima-Takane+, Gold & Scassellati
• Question: Can one learn them from a
single caretaker?
• Answer: Yes (from this experiment)
The Experiment
1. The World
2. The Language
3. The Caretaker
4. The Learner
5. Results
[Figure: an image with the example utterance “Luca gira.”]
The World of the Experiment
Two rambling agents
–A Caretaker
Uses the language of the experiment
–A Language Learner
Learns the language
Each knows its own & the other’s utterances/actions,
given in symbolic form.
(No symbol grounding issue involved here.)
Three kinds of action: {come, go, turn}
The Language
• Two-word Sentences: Subject+Verb
• Subject: {I, You, Luca, Mario}
– Luca: Language Learner
– Mario: Caretaker
• Verb: {come, go, turn}
• A sentence is used:
– To describe
• Utterer’s own action
• The other’s action
– To ‘give instruction’ to the other.
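The language above is small enough to enumerate. The following is a minimal sketch, not the author's implementation: the names `all_sentences`, `referent`, and the agent labels `"learner"` / `"caretaker"` are hypothetical, and sentences are assumed to be plain "Subject verb" strings.

```python
from itertools import product

SUBJECTS = ("I", "You", "Luca", "Mario")
VERBS = ("come", "go", "turn")
# Proper names as given on the slide: Luca is the Learner, Mario the Caretaker.
NAMES = {"learner": "Luca", "caretaker": "Mario"}


def all_sentences():
    """Enumerate every Subject+Verb sentence of the language."""
    return [f"{subject} {verb}" for subject, verb in product(SUBJECTS, VERBS)]


def referent(subject, speaker):
    """Return the agent ("learner" or "caretaker") that `subject` refers to
    when uttered by `speaker`. 'I' and 'You' depend on who is speaking;
    the proper names do not."""
    other = "caretaker" if speaker == "learner" else "learner"
    if subject == "I":
        return speaker
    if subject == "You":
        return other
    for agent, name in NAMES.items():
        if subject == name:
            return agent
    raise ValueError(f"unknown subject: {subject!r}")
```

The point of the sketch is that 'I' and 'You' change referent with the speaker, which is what the Learner has to pick up from interaction.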
The Caretaker
• Executes action {come, go, turn} randomly
• Describes its action in 2-word sentences
• Or, instructs the learner to act
{come, go, turn} with a 2-word sentence
• Rewards the Learner when:
– Learner describes its own or caretaker’s action
correctly.
– Learner acts following instruction.
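The reward conditions above could be sketched as follows. This is an illustration under assumptions, not the author's code: the function names are hypothetical, the reward is assumed to be 1 or 0, and a Learner utterance with 'You' or 'Mario' is treated as a description of the Caretaker's action.

```python
# Subjects as interpreted when the Learner (Luca) is the speaker.
SELF_SUBJECTS = {"I", "Luca"}      # refer to the Learner
OTHER_SUBJECTS = {"You", "Mario"}  # refer to the Caretaker


def describes_correctly(utterance, learner_action, caretaker_action):
    """True if the Learner's 'Subject verb' utterance matches an observed action."""
    subject, verb = utterance.split()
    if subject in SELF_SUBJECTS:
        return verb == learner_action
    if subject in OTHER_SUBJECTS:
        return verb == caretaker_action
    return False


def caretaker_reward(utterance=None, learner_action=None,
                     caretaker_action=None, instructed_verb=None):
    """Return 1 if either reward condition holds, else 0 (assumed reward scale)."""
    if utterance is not None and describes_correctly(
            utterance, learner_action, caretaker_action):
        return 1
    if instructed_verb is not None and learner_action == instructed_verb:
        return 1
    return 0
```

For example, `caretaker_reward(utterance="You turn", caretaker_action="turn")` is rewarded, while `"I turn"` in the same situation is not.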
Language Learner
Three Modes:
Reaction Mode / Spontaneous Action Mode / Direction Mode
Has Caretaker acted or uttered?
– Only Uttered → Reaction Mode: Acts & Utters
– Acted → Reaction Mode: Utters
– No → Random choice between:
• Spontaneous Action Mode: Acts & Utters
• Direction Mode: Utters, based on ‘internal representation’ of Caretaker’s action
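The top-level mode decision of the flowchart could be sketched as below. This is a sketch under assumptions: `select_mode` and the mode labels are hypothetical names, and a uniform random choice between the two non-reaction modes is assumed, since the slide only says "Random".

```python
import random


def select_mode(caretaker_acted, caretaker_uttered, rng=random):
    """Pick the Learner's mode for one interaction step.

    If the Caretaker has acted or uttered, the Learner reacts; otherwise it
    chooses at random (assumed uniform) between acting spontaneously and
    giving a direction to the Caretaker.
    """
    if caretaker_acted or caretaker_uttered:
        return "reaction"
    return rng.choice(["spontaneous_action", "direction"])
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.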
Learner’s Utterance/Action
• Produced with information:
– Mode
– Its own action (in spontaneous action mode)
– Caretaker’s action/utterance (in reaction mode)
• Choice of Subjects, Verbs & Actions
– Reinforced by Rewards
• Given by Caretaker
• Internal Reward: when Caretaker follows direction
– Random choice: Babbling
• Naïve Bayes + Dirichlet Dist.
(dice throwing based on reward average)
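One possible reading of "dice throwing based on reward average" is sketched below. It is an assumption-laden illustration, not the method of the paper: the class name, the `prior` pseudo-count, and the use of the average reward as the Dirichlet concentration are all hypothetical, rewards are assumed non-negative, and the Naïve Bayes combination of several information sources is left out.

```python
import random
from collections import defaultdict


class RewardAverageChooser:
    """Choose among options with a 'die' whose face weights are drawn from a
    Dirichlet distribution parameterised by each option's average reward."""

    def __init__(self, options, prior=0.1, rng=None):
        self.options = list(options)
        self.prior = prior                    # keeps every option possible (babbling)
        self.rng = rng or random.Random()
        self.reward_sum = defaultdict(float)  # (context, option) -> total reward
        self.count = defaultdict(int)         # (context, option) -> times chosen

    def _average(self, context, option):
        n = self.count[(context, option)]
        return self.reward_sum[(context, option)] / n if n else 0.0

    def choose(self, context):
        # A Dirichlet sample is a vector of Gamma draws, normalised to sum to 1.
        alphas = [self._average(context, o) + self.prior for o in self.options]
        draws = [self.rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(draws)
        if total == 0.0:                      # numerical corner case: fall back to uniform
            return self.rng.choice(self.options)
        weights = [d / total for d in draws]
        return self.rng.choices(self.options, weights=weights, k=1)[0]

    def update(self, context, option, reward):
        """Record the reward obtained after choosing `option` in `context`."""
        self.reward_sum[(context, option)] += reward
        self.count[(context, option)] += 1
```

A context here could be, for instance, the pair (mode, observed action); options with a higher average reward are drawn more often, while the prior leaves room for random babbling.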
Results
• 2,500 interactions between Caretaker &
Learner
• Success rate = reward rate
• After 1,200 interactions, Learner learned
to utter & act at a 90% rate of correctness.
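Since the success rate is defined as the reward rate, it can be tracked over a run with a sliding window. The sketch below is illustrative only: the function name and the window size are assumptions, as the slides do not state how the rate was averaged.

```python
from collections import deque


def windowed_reward_rate(rewards, window=100):
    """Return, for each interaction, the fraction of rewarded interactions
    among the most recent `window` interactions (window size is an assumption)."""
    recent = deque(maxlen=window)  # automatically drops the oldest entry
    rates = []
    for reward in rewards:
        recent.append(1 if reward > 0 else 0)
        rates.append(sum(recent) / len(recent))
    return rates
```

With such a curve, "a 90% rate of correctness" corresponds to the value staying at or above 0.9.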
Success rate of Subject Selection
The success rate of the reaction mode was better
since it had more choices than the other modes.
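[Plot: success rate of subject selection per mode. Series: S react (reaction mode), S sp. act. (spontaneous action mode), S direction (direction mode)]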
Example Interaction
LL: Language Learner (Luca), CT: Caretaker (Mario)
Utt.: Utterance, Rew.: Reward
The language for utterances is Interlingua (ia).
Discussion & Conclusion
1. The World
2. The Language
3. Learning
Conclusion
Discussion – The Result
The experiment showed:
• One can learn 1st & 2nd person pronouns
from a single caretaker.
• Playing a minimal language game
• Without grounded concepts (object, other, etc.)
Discussion – The Language
• Semantics
– Programmed in Caretaker’s Language Use
• Human Language Acquisition
– Learners are only presented examples in
interactions with Caretakers
• Two-word Sentences
– cf. the one- or two-word sentence period in infants’
language acquisition
(not always subject-verb, though)
Discussion – Learning
• Approval as Reward
– In human learning: Smiling, etc.
• Internal Reward
– When Caretaker follows Learner’s direction
⇔ Goal Achieved
• Babbling (random choice) was necessary
& reinforced.
• Modes {reaction, spontaneous action, and
direction} could be learned
– But not in the scope of the current experiment
Conclusion
• Related Research
– Language Emergence with Artificial Agents
• Steels, Vogt, Sugita, et al.
• The current experiment is rather about learning an existing language.
• Further directions
– More realistic experiments would require reference to
actual human language acquisition.
– Symbol grounding problem
– Learning language models
• E.g., LSTM
• Language use as a system of choices
cf. Functional Grammar (M.A.K. Halliday)
EOP
Thank you very much for your attention!