(Seminar presentation) Grounding words in perception and action: computational model. TRENDS in Cognitive Sciences 2005 - Deb Roy / 최유진, 2011 autumn
Presentation Transcript

  • 1. 최유진
  • 2. Grounding words in perception and action: computational model. Deb Roy. TRENDS in Cognitive Sciences Vol. 9 No. 8, August 2005. (Presented Thursday, October 13, 2011)
  • 3. Language: English, Russian, Korean, French, Chinese, Japanese, Portuguese, Indian, German, Spanish, Arabic
  • 4. One's language = one's perspective on the world. The goal is to align the language of machines with that of humans: Human - communicates with - Machine.
  • 5. Deb Roy, Associate Professor of Media Arts and Sciences; Director, Cognitive Machines. Roy studies how children learn language, and designs machines that learn to communicate in human-like ways. To enable this work, he has pioneered new data-driven methods for analyzing and modeling human linguistic and social behavior. Research areas: artificial intelligence, cognitive modeling, human-machine interaction, data mining and information visualization. http://www.ted.com/talks/deb_roy_the_birth_of_a_word.html
  • 6. 0. Research background. We use words to communicate about things and kinds of things, their properties, relations and actions. Analogy between human and machine: research in robotics and simulated systems grounds language in machine perception and action, mirroring human abilities. The research tradition in computational modeling has moved from the purely symbolic level toward connecting symbols to the physical realm of real-world referents: from purely symbolic models to context-dependent ones.
    Index:
    1. Words about the physical world.
    2. Association between words and perceptual categories.
    3. Modeling context-dependent word use.
    4. Models of infant word learning that process 'first-person-perspective' sensory data.
    5. Richer representational structures: grounding verbs in physical action.
    6. Integration of action and perception in grounding nouns.
    7. Conclusions.
  • 7. 1. Words about the physical world. • Is human language like a dictionary? In a purely symbolic computational model, what are the real-world referents? • Computational models and the embodied nature of language involve complex crossmodal phenomena, which makes them particularly useful in situated language acquisition (physical environments, objects and activities). • Implication of the study: the possibility of machines autonomously acquiring and verifying beliefs about the world, and communicating in natural language about their beliefs. [Figure: ROUND - visual feature; PUSH - motor-control feature; HEAVY - haptic feature]
  • 8. 2. Words - Perceptual Categories: Salient Linguistic Features. 2.1 Language grounding systems and categorization: translating sensory input into natural-language descriptions, i.e. mapping continuous sensor input (vectors) onto linguistic categories, e.g. with generative and discriminative models of categorization (a). Two prototypes can 'compete' (b), leading to a category boundary along points of equal distance from both prototypes (if non-Euclidean distance measures are used, non-linear boundaries may emerge). Categories may also be modeled by explicitly representing categorical boundaries: in (c), a linear model, f(height) = A*width + B, encodes the same categorical distinction as the prototypes in (b).
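The two categorization strategies on this slide can be sketched in a few lines. This is a minimal illustration, not the model from the paper: the category names, prototype coordinates, and boundary parameters are all hypothetical.

```python
# Sketch of prototype-based vs. explicit-boundary categorization over
# 2-D feature vectors (width, height). All names/values are illustrative.
import math

prototypes = {"tall": (1.0, 3.0), "wide": (3.0, 1.0)}  # hypothetical prototypes

def classify_by_prototype(x):
    """Assign x to the nearest prototype (Euclidean distance); the implied
    boundary lies along points equidistant from the two prototypes."""
    return min(prototypes, key=lambda c: math.dist(x, prototypes[c]))

def classify_by_boundary(x, a=1.0, b=0.0):
    """Explicit linear boundary (height vs. a*width + b), encoding the
    same categorical distinction as the two prototypes above."""
    width, height = x
    return "tall" if height > a * width + b else "wide"

print(classify_by_prototype((1.2, 2.8)))  # nearest prototype: "tall"
print(classify_by_boundary((1.2, 2.8)))   # above the line: "tall"
```

With a non-Euclidean distance in `classify_by_prototype`, the implicit boundary between the two prototypes would no longer be a straight line, which is the point made in (b).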
  • 9. 2. Words - Perceptual Categories: Salient Linguistic Features. 2.2 Models of color naming: is the perceptual model fixed? Mojsilović's early model assigns color names to regions of color space, but the same name can pick out different colors in different contexts. [Figure: 'Purple', 'Red', 'Red wine']
  • 10. 3. Words - Perceptual Categories: Context-dependent Word Use. 3.1 Gärdenfors's model: color distance. How linguistic convention and visual perception combine to determine word meanings: arbitrary linguistic convention operates within perceptual color constraints. e.g. 'red wine' in Spanish: 'vino tinto' (literally, colored wine); in Catalan: 'vino negro' (black wine). The choice between red (tinto) and black (negro) is an arbitrary linguistic convention. Gärdenfors: the distance between white wine and red (dark) wine is greater than the distance between white and white (light) wine, relative to the context-independent prototypes.
  • 11. 3. Words - Perceptual Categories: Context-dependent Word Use. 3.2 Regier: spatial distance. Studied graded acceptability judgments of 1) spatial terms: for English speakers, how they judge the term 'above' in conjunction with the physical context. 'The circle is above the block': how acceptable is this for configurations (a), (b), (c)? Two distance measures for 'above': L1 connects the centers of mass of the regions; L2 connects the closest points between the regions. L1 of (b) = L1 of (c); L2 of (a) = L2 of (b). Both measures contribute to graded judgments of terms such as 'above' and 'near'.
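The two distance measures can be made concrete with a small sketch. This assumes the regions are axis-aligned boxes `(x0, y0, x1, y1)`; the box coordinates and variable names are illustrative, not from Regier's stimuli.

```python
# Sketch of the two distance measures behind graded 'above' judgments.
# Regions are axis-aligned boxes (x0, y0, x1, y1); values are illustrative.
import math

def center_distance(a, b):
    """L1: distance between the centers of mass of the two regions."""
    ca = ((a[0] + a[2]) / 2, (a[1] + a[3]) / 2)
    cb = ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    return math.dist(ca, cb)

def closest_point_distance(a, b):
    """L2: distance between the closest points of the two regions
    (zero if the boxes overlap on an axis)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)

block = (0, 0, 4, 1)      # reference object
circle = (1, 2, 2, 3)     # region roughly above the block
print(center_distance(block, circle), closest_point_distance(block, circle))
```

Two configurations can share one measure while differing on the other (as in the slide's L1 of (b) = L1 of (c) but L2 of (a) = L2 of (b)), which is why a single distance cannot predict the graded judgments on its own.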
  • 12. 3. Words - Perceptual Categories: Context-dependent Word Use. 3.2 Regier (cont.): 2) movements: simple movies of objects moving relative to one another were used to visually ground words such as 'through' and 'into'. e.g. 'putting a key into a lock' vs. 'removing a key from a lock': events distinguished by their initial points vs. their end points. 3.3 Limitations in spatial semantics, and further studies: lack of functional contexts. e.g. 'clean behind the couch' vs. 'hide behind the couch': the region denoted by 'behind' differs with the functional context.
  • 13. 4. Models of infant word learning that process 'first-person-perspective' sensory data. 4.1 Cross-channel early lexical learning (CELL): 'step into the shoes' of humans and learn from natural sensory data. Recordings from natural human environments can be processed directly, without manual transcription. The CELL computational model learns to associate visual categories with spoken words: a model of learning words from sights and sounds. CELL vs. a 'blinded' system: a gap in accuracy of about 50%!
  • 14. 4. Models of infant word learning that process 'first-person-perspective' sensory data. 4.1 Cross-channel early lexical learning (CELL). Method: lexical learning analysis.
    1) STM: utterance-context pairs from audio-visual input. Audio: phonetic representations of spoken sequences (linguistic units). Video: context, i.e. visually observable objects and motions (semantic/contextual units).
    2) LTM: lexical candidates. Utterances are decomposed into a set of hypothesized linguistic-unit prototypes; contexts are decomposed into a set of hypothesized semantic-category prototypes. e.g. bounce - ball, ruf-ruf - dog, vrrooom - car... shoes, truck.
    Limitations: 1) noise from the sensory processes; 2) semantically inappropriate candidates, e.g. 'yeah'.
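The STM-to-LTM step above, promoting word-context pairs that reliably co-occur, can be sketched with a simple association score. This is a toy illustration, assuming the audio and visual channels are already segmented into discrete units; the real CELL model works on raw sensor data, and the pairs below are invented.

```python
# Toy sketch of CELL-style cross-modal association: score how strongly a
# hypothesized linguistic unit predicts a visual category, via pointwise
# mutual information over utterance-context pairs. Data is illustrative.
from collections import Counter
from math import log2

# Hypothetical short-term-memory contents: (linguistic unit, visual context)
pairs = [("ball", "round"), ("ball", "round"), ("dog", "furry"),
         ("yeah", "round"), ("yeah", "furry"), ("ball", "round")]

words = Counter(w for w, _ in pairs)
contexts = Counter(c for _, c in pairs)
joint = Counter(pairs)
n = len(pairs)

def association(word, ctx):
    """Pointwise mutual information between a word and a visual category;
    high-scoring pairs become lexical candidates in long-term memory."""
    return log2((joint[(word, ctx)] / n) / ((words[word] / n) * (contexts[ctx] / n)))

print(association("ball", "round"))   # positive: consistent co-occurrence
print(association("yeah", "round"))   # negative: 'yeah' occurs everywhere
```

A score like this also shows why 'yeah' is a hard case: it co-occurs with many contexts, so no single visual category stands out, but noisy data can still let it slip through.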
  • 15. Interim summary:
    1. word - perception: indirect processing; purely symbolic; context-dependent.
    2. first-person perspective: direct processing; CELL (a single object at a time); eye-gaze (multiple objects at a time).
    3. What's next? VERB = ACTION.
  • 16. 5. Richer representational structures: grounding verbs in physical action. Verbs that refer to physical actions are naturally grounded in representations that encode the temporal flow of events. 5.1 Siskind: a perceptually grounded model of verbs, trained on video-recorded sequences of human hands moving colored blocks. Events are described with force-dynamic primitives such as contact, support and attachment [Talmy's theory of force dynamics]. The semantics of a basic verb = a temporal schema, an expected sequence of force-dynamic interactions. e.g. 'Hand picks up block' (subject - verb - object): 1. table-supports-block; 2. hand-contacts-block; 3. hand-attached-block; 4. hand-supports-block. (*Allen relations: 13 logical relations between time intervals A and B.) 5.2 Bailey et al. developed a system that learns verb semantics and action-control structures, 'X-schemas'. e.g. the difference between 'push' and 'shove'.
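The temporal-schema idea for 'pick up' can be sketched as a check that the expected force-dynamic states occur in order. The predicate names follow the example on this slide; the subsequence-matching logic is an illustrative simplification of Siskind's event logic, not his implementation.

```python
# Sketch of a verb as a temporal schema: an expected ordered sequence of
# force-dynamic states. Matching logic is an illustrative simplification.

PICK_UP = ["table-supports-block", "hand-contacts-block",
           "hand-attached-block", "hand-supports-block"]

def matches_schema(observed, schema):
    """True if the schema's states occur in order (as a subsequence)
    within the observed sequence of force-dynamic states."""
    it = iter(observed)
    return all(state in it for state in schema)  # 'in' consumes the iterator

event = ["table-supports-block", "hand-contacts-block",
         "hand-attached-block", "hand-supports-block"]
print(matches_schema(event, PICK_UP))        # True: event realizes 'pick up'
print(matches_schema(event[::-1], PICK_UP))  # False: reversed order is not 'pick up'
```

Distinguishing 'putting a key into a lock' from 'removing a key from a lock' earlier in the document works the same way: the two verbs share states but differ in their initial vs. end points, i.e. in the order of the schema.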
  • 17. 6. Integration of action and perception in grounding nouns. 6.1 Roy: structured networks of motor and sensor primitives: a conversational robot named Ripley. 'Hand me the blue one on your right.' - Ripley maintains a dynamic mental model, a three-dimensional model of the physical environment. - The contents of the robot's mental model may be updated based on linguistic, visual, or haptic input (Ripley remembers the position of an object even when it is out of its sensory field). - Multimodal sensory expectations (what Ripley does / what its visual system expects): looks at the location - finds the visual region; reaches to the location - touches and grasps the object; grasps the object - gains control over the object's location, and the location information is updated.
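The object-permanence aspect of Ripley's mental model can be sketched as a store of last-known positions that any input channel may update. The class and method names here are hypothetical, a minimal sketch rather than Roy's implementation.

```python
# Minimal sketch of a Ripley-style mental model: objects persist with
# their last known 3-D positions even when out of the sensory field.
# Class/method names and the object ids are hypothetical.

class MentalModel:
    def __init__(self):
        self.objects = {}  # object id -> last known (x, y, z) position

    def update(self, object_id, position):
        """Linguistic, visual, or haptic input updates the model."""
        self.objects[object_id] = position

    def recall(self, object_id):
        """Out-of-view objects are still remembered (None if never seen)."""
        return self.objects.get(object_id)

model = MentalModel()
model.update("blue-ball", (0.4, 0.1, 0.0))
# The ball leaves the field of view; the model still remembers it.
print(model.recall("blue-ball"))  # (0.4, 0.1, 0.0)
```

The multimodal expectations in the table above fit the same loop: each action (look, reach, grasp) produces sensory input that feeds `update`, keeping the model consistent with the world.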
  • 18. 6. Integration of action and perception in grounding nouns. 6.1 Roy (cont.): Ripley's representations and algorithms ground the meanings of verbs, adjectives and nouns in a unified representational system.
    VERBS: motor-control-like X-schemas (actions).
    ADJECTIVES: all perceptual properties correspond to actions; 'red' = color categories linked to motor programs; 'heavy' = haptic categories linked to specific actions.
    NOUNS: objects linked with locations; 'ball' - round (or color, size, ...) - with all of the actions involved.
  • 19. 7. Conclusions. - Interaction between word use, perception and action. - Further research (Box 3): other aspects of language, such as grammatical composition and functional use in social contexts. - Re-uniting the sub-fields of AI: computer vision, parsing, information retrieval, machine learning and planning. - The drop in the cost of sensor and robotic technology, together with ubiquitous situated computing, will create new forms of situated human-machine communication.