
Voices 2015 - Spatial Temporal Reasoning Over Play-Scripts for Artificially Intelligent Characters

  1. Spatio-Temporal Reasoning Over Play-Scripts for Artificially Intelligent Characters Christine Talbot Richard Burton in Hamlet, directed by Sir Gielgud http://www.youtube.com/watch?v=XRU5yLgs0zw&feature=player_detailpage
  2. Background 2
  3. of 49 Virtual Character Positioning 3 Background EMA: A process model of appraisal dynamics (Stacy C. Marsella, Jonathan Gratch), In Journal of Cognitive Systems Research, volume 10, 2009. Ada and Grace: Toward Realistic and Engaging Virtual Museum Guides (William Swartout, David Traum, Ron Artstein, Dan Noren, Paul Debevec, Kerry Bronnenkant, Josh Williams, Anton Leuski, Shrikanth Narayanan, Diane Piepol, H. Chad Lane, Jacquelyn Morie, Priti Aggarwal, Matt Liewer, Jen-Yuan Chiang, Jillian Gerten, Selina Chu, Kyle White), In Proceedings of the 10th International Conference on Intelligent Virtual Agents (IVA 2010), 2010.
  4. of 49 Mocap Files and Hand-Coding 4 Background Discovery News – Avatar: Motion Capture Mirrors Emotions http://news.discovery.com/videos/avatar-making-the-movie/
  5. of 49 BML and BML Realizers 5 Background SmartBody Path Planning http://smartbody.ict.usc.edu Hamlet played by robots Unity using SmartBody MindMakers Wiki http://www.mindmakers.org/projects/bml-1-0/wiki/Wiki
  6. of 49 Still a Lot of Work… 6 Background
4 hours & 12 minutes for a 10 minute scene!! Point / Speak / Move
<act><participant id="GRAVEDIGGER2" role="actor" /><bml><gesture lexeme="POINT" target="GRAVEDIGGER1" /></bml></act>
<act><participant id="GRAVEDIGGER1" role="actor" /><bml><speech id="sp1" ref="" type="application/ssml+xml">Give me leave!</speech></bml></act>
<act><participant id="GRAVEDIGGER1" role="actor" /><bml><gesture lexeme="POINT" target="GRAVEDIGGER2" /></bml></act>
<act><participant id="GRAVEDIGGER1" role="actor" /><bml><gesture lexeme="POINT" target="GRAVE" /></bml></act>
<act><participant id="GRAVEDIGGER1" role="actor" /><bml><speech id="sp1" ref="" type="application/ssml+xml">Here lies the water -- good?</speech></bml></act>
<act><participant id="GRAVEDIGGER1" role="actor" /><bml><gesture lexeme="POINT" target="GRAVE" /></bml></act>
<act><participant id="GRAVEDIGGER1" role="actor" /><bml><speech id="sp1" ref="" type="application/ssml+xml">Here stands the man -- good!</speech></bml></act>
<act><participant id="GRAVEDIGGER1" role="actor" /><bml><speech id="sp1" ref="" type="application/ssml+xml">If the man go to this water and drown himself, it is willynilly he goes, mark you that! But,</speech></bml></act>
<act><participant id="GRAVEDIGGER1" role="actor" /><bml><gesture lexeme="POINT" target="GRAVE" /></bml></act>
<act><participant id="GRAVEDIGGER1" role="actor" /><bml><speech id="sp1" ref="" type="application/ssml+xml">if the water come to HIM and drown him, he drowns not him-self; Argal, he that is not guilty of his own death shortens not his own life!</speech></bml></act>
<act><participant id="GRAVEDIGGER1" role="actor" /><bml><locomotion target="GRAVE" type="basic" manner="walk" /></bml></act>
<act><participant id="GRAVEDIGGER2" role="actor" /><bml><speech id="sp1" ref="" type="application/ssml+xml">But is this LAW?</speech></bml></act>
  7. of 49 So How Do We Do It? 7 Background
A: Excuse me…
B: Can I help you?
A: Yes, where is the post office?
B: Go straight and turn left.
A: Where do I turn left?
B: Turn left at the bus stop - you can’t miss it.
A: Thank you very much!
B: No problem.
  8. Play-Scripts 8
  9. of 49 Play-Scripts 9 Play-Scripts
GRAVEDIGGER1
Give me leave!
(GRAVEDIGGER2 sits on the side steps)
(Pointing down into the grave)
Here lies the water--good?
(Pointing to the table ledge)
Here stands the man--good!
(Illustrating each point literally with his hands)
If the man go to this water and drown himself, it is willy-nilly he goes, mark you that! But,
(Pointing first to the grave, then to the ledge)
if the water come to HIM and drown him, he drowns not him-self;
(Greatly pleased with his own logic)
Argal, he that is not guilty of his own death shortens not his own life!
(He goes behind the barricade down into the grave and prepares to dig)
GRAVEDIGGER2
(Trying to disprove him)
But is this LAW?
(Slide callouts label the parenthetical lines as Stage Directions and Character Directions.)
  10. of 49 The Baseline 10 Play-Scripts. Hamlet Act 5, Scene 1 (Graveyard Scene). Richard Burton in Hamlet, directed by Sir Gielgud (http://www.youtube.com/watch?v=XRU5yLgs0zw&feature=player_detailpage). 400 BML Commands; 4 hours & 12 minutes; 10 minute scene.
  11. of 49 Annotation Parsing 11 Play-Scripts
Parse structure: Sentence -> Subject NP (Actor/Noun) + VP; VP -> Action/Verb + NP (Target/Noun)
Example Nouns: GRAVEDIGGER1, GRAVEDIGGER2, HAMLET, HORATIO, Steps, Grave, Audience, Center stage, Stage left
Example Verbs: Move to, Follow, Look at, Pick up, Put down, Speak, Point to
Example: (Pointing down into the grave) -> Actor = current speaker, Verb = point, Target = grave
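To make the parsing concrete, here is a minimal sketch of the actor-verb-target extraction, assuming NLTK (with its tokenizer and tagger data installed); the entity list, verb lexicon, and crude stemming below are illustrative assumptions, not the parser used in this work.

    # Hypothetical sketch: extract (actor, action, target) from a stage annotation
    # such as "(Pointing down into the grave)" using NLTK tokenization and tagging.
    import nltk

    ACTION_VERBS = {"point": "POINT", "move": "MOVE", "look": "GAZE",
                    "pick": "PICKUP", "put": "PUTDOWN", "follow": "FOLLOW"}
    KNOWN_ENTITIES = {"grave", "steps", "ledge", "audience", "lantern",
                      "gravedigger1", "gravedigger2", "hamlet", "horatio"}

    def stem(word):
        w = word.lower()
        return w[:-3] if w.endswith("ing") else w   # crude: "pointing" -> "point"

    def parse_annotation(annotation, current_speaker):
        """Extract (actor, action, target); the actor defaults to the current speaker."""
        tokens = nltk.word_tokenize(annotation.strip("()"))
        tagged = nltk.pos_tag(tokens)   # POS tags kept for finer filtering if needed
        action = next((ACTION_VERBS[stem(w)] for w, _ in tagged
                       if stem(w) in ACTION_VERBS), None)
        target = next((w.upper() for w, _ in tagged
                       if w.lower() in KNOWN_ENTITIES), None)
        return current_speaker, action, target

    # parse_annotation("(Pointing down into the grave)", "GRAVEDIGGER1")
    #   -> ("GRAVEDIGGER1", "POINT", "GRAVE")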
  12. of 49 What did it look like? 12 Play-Scripts
  13. of 49 How Did We Do? 13 Play-Scripts (Figure: character traces over time for the entire Graveyard scene, Ground Truth vs. Simple NLP Method, shown for Hamlet and GraveDigger1) C. Talbot and G. M. Youngblood. Spatial Cues in Hamlet. In Proceedings of the 12th International Conference on Intelligent Virtual Agents, IVA '12, pages 252-259, Berlin, Heidelberg, 2012. Springer-Verlag.
  14. Spatial Rules 14
  15. of 49 What’s Next? 15 Spatial Rules: Applying Spatial Rules; Conversational Spatial Rules; Grouping Spatial Rules; Theatre Rules; General Rules. E. Sundstrom and I. Altman. Interpersonal Relationships and Personal Space: Research Review and Theoretical Model. 1976. (Figure: Counter-Crossing)
  16. of 49 Architecture 16 Spatial Rules BML
  17. of 49 Architecture 17 Spatial Rules
  18. of 49 Rules Engine Logic 18 Spatial Rules
  19. of 49 Position Results 19 Spatial Rules C. Talbot and G. M. Youngblood. Shakespearean Spatial Rules. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '13, pages 587-594, Richland, SC, 2013.
  20. of 49 Position Results 20 Spatial Rules C. Talbot and G. M. Youngblood. Shakespearean Spatial Rules. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '13, pages 587-594, Richland, SC, 2013.
  21. Implied Movements 21
  22. of 49 Grave Digger 1 Initiative 22 Implied Mvmt
  23. of 49 Implied Motion 23 Implied Mvmt To be, or not to be— that is the question: Whether 'tis nobler in the mind to suffer …. I should move towards the audience for my monologue
  24. of 49 Information Captured 24 Implied Mvmt
For Each Line of Speech: Movement by Speaker or Other Character; Number of Lines Spoken Before; Number of Lines Spoken After; Annotation Before; Annotation After; Number of Lines since Last Movement; Number of Repeated Words; Number of Upper Case Words; Punctuation Counts; Parts of Speech Counts
Type of Movements: Fighting; Jumping; Gestures; Object Manipulations; Locomotion; Pointing; Posture; Gaze
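A small sketch of what this per-line feature capture could look like, assuming NLTK for tokenization and part-of-speech tagging; the feature names and exact definitions are illustrative, not those used in the study.

    from collections import Counter
    import string
    import nltk

    def speech_line_features(line, lines_before, lines_after, lines_since_last_move,
                             annotation_before=None, annotation_after=None):
        """Build a feature dictionary for one line of speech (illustrative feature set)."""
        tokens = nltk.word_tokenize(line)
        words = [t for t in tokens if any(c.isalpha() for c in t)]
        word_counts = Counter(w.lower() for w in words)
        pos_counts = Counter(tag for _, tag in nltk.pos_tag(tokens))

        features = {
            "lines_spoken_before": lines_before,
            "lines_spoken_after": lines_after,
            "lines_since_last_movement": lines_since_last_move,
            "has_annotation_before": annotation_before is not None,
            "has_annotation_after": annotation_after is not None,
            "num_repeated_words": sum(1 for c in word_counts.values() if c > 1),
            "num_uppercase_words": sum(1 for w in words if w.isupper() and len(w) > 1),
            "num_punctuation": sum(1 for ch in line if ch in string.punctuation),
        }
        # One feature per part-of-speech tag, e.g. "pos_NN", "pos_VB", ...
        features.update({f"pos_{tag}": n for tag, n in pos_counts.items()})
        return features

    # speech_line_features("But is this LAW?", lines_before=9, lines_after=0,
    #                      lines_since_last_move=2)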
  25. of 49 Machine Learning 25 Implied Mvmt
RTextTools in R: Maximum Entropy; Random Forests; Boosting; SVM
Movement classifications: Specific Movements; General Movements; Any Movement By Speaker; Any Movement at All
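The experiments here used RTextTools in R; purely as an illustration, a roughly equivalent setup with the same four learner families can be sketched in Python with scikit-learn (toy data and default-ish parameters, not the original configuration).

    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression   # stands in for Maximum Entropy
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    lines = ["Give me leave!", "Here stands the man -- good!",
             "But is this LAW?", "Mark you that!"]          # toy speech lines
    moved = [1, 1, 0, 0]                                    # 1 = any movement by speaker

    learners = {
        "MaxEnt": LogisticRegression(max_iter=1000),
        "RandomForest": RandomForestClassifier(n_estimators=200),
        "Boosting": GradientBoostingClassifier(),
        "SVM": LinearSVC(),
    }
    for name, clf in learners.items():
        model = make_pipeline(CountVectorizer(ngram_range=(1, 1)), clf)  # unigrams
        model.fit(lines, moved)   # in practice: balanced train/test splits, then evaluate
        print(name, model.predict(["Here lies the water -- good?"]))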
  26. of 49 Learning Combinations 26 Implied Mvmt
Movement Classifications: Specific Movements; Movement High-Level Categories; Big Movements; Any Movements
N-Gram Sizes: Unigrams; Bigrams; Trigrams; 4-grams; 5-grams
Training Sizes: Even Split Training / Testing; Even Split of Positive Examples for Training / Testing
Feature Sets: Text Only; POS Counts Only; POS Counts & Text; POS Counts & Contextual Features; All Features
  27. of 49 Evaluation Criteria 27 Implied Mvmt: Overall Accuracy; Recall; Precision; F1 score; F0.5 score; Matthews Correlation Coefficient; ROC curves
  28. of 49 Evaluation Criteria 28 Implied Mvmt: Overall Accuracy; Recall; Precision; F1 score; F0.5 score; Matthews Correlation Coefficient; ROC curves
  29. of 49 Evaluation Criteria 29 Implied Mvmt: Overall Accuracy; Recall; Precision; F1 score; F0.5 score; Matthews Correlation Coefficient; ROC curves
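For reference, the less common of these measures can be computed directly from binary confusion-matrix counts; a small sketch using the standard textbook definitions (not code from the original work).

    import math

    def classification_metrics(tp, fp, fn, tn):
        """Accuracy, precision, recall, F1, F0.5, and Matthews Correlation Coefficient."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0

        def f_beta(beta):
            b2 = beta * beta
            denom = b2 * precision + recall
            return (1 + b2) * precision * recall / denom if denom else 0.0

        mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

        return {"accuracy": (tp + tn) / (tp + fp + fn + tn),
                "precision": precision, "recall": recall,
                "f1": f_beta(1.0), "f0.5": f_beta(0.5), "mcc": mcc}

    # A classifier that always predicts "no movement" on skewed data can score high
    # accuracy yet has MCC = 0, which is why MCC is reported alongside accuracy.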
  30. of 49 Best Performing 30 Implied Mvmt Boosting SVM MaxEnt RandForest Any Mvmt, POS, Unigrams Any Mvmt, No Text, Unigrams Gestures, All, 4-grams Any Mvmt, Text, Unigrams C. Talbot and G. M. Youngblood. Lack of Spatial Indicators in Hamlet. In Florida Artificial Intelligence Research Society Conference, FLAIRS '13, pages 154-159. Association for the Advancement of Artificial Intelligence, 2013.
  31. of 49 Best Performing 31 Implied Mvmt Boosting SVM MaxEnt RandForest Random Any Mvmt, POS, Unigrams Any Mvmt, No Text, Unigrams Gestures, All, 4-grams Any Mvmt, Text, Unigrams C. Talbot and G. M. Youngblood. Lack of Spatial Indicators in Hamlet. In Florida Artificial Intelligence Research Society Conference, FLAIRS '13, pages 154-159. Association for the Advancement of Artificial Intelligence, 2013.
  32. Incorporating Human Characters 32
  33. of 49 So Far… 33 Humans
Pipeline recap: play-script speech and movement -> NLP parse (Sentence -> Subject NP (Actor/Noun) + VP (Action/Verb + Target NP)) -> spatial rules (Grouping, Conversational, Theatre, General) -> BML (MindMakers Wiki http://www.mindmakers.org/projects/bml-1-0/wiki/Wiki) -> SmartBody Path Planning (http://smartbody.ict.usc.edu)
  34. of 49 Adding a Human 34 Humans: Move correctly, on-time; Move correctly, wrong time; Move incorrectly, on-time; Move incorrectly, wrong time; Don’t move at all
  35. of 49 Force-Directed Graphs (FDGs) 35 Humans: Equilibrium of Forces; Aesthetically Balanced; Easy to See Nodes; Crossings-Free (some); Fixed Nodes; Varying Relationships Based on Data; Can be Arranged in Pre-defined Shapes (some). T. M. J. Fruchterman and E. M. Reingold. Graph Drawing by Force-Directed Placement. Software: Practice and Experience, 21(11):1129-1164, 1991.
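For orientation, a bare-bones sketch of the Fruchterman-Reingold iteration referenced above, using the textbook attractive (d^2/k) and repulsive (k^2/d) forces with simple geometric cooling; the stage-specific nodes and forces used in this work are described on the following slides and differ from this generic version.

    import math, random

    def fruchterman_reingold(nodes, edges, width, height, iterations=50):
        """Generic FR layout: attraction d^2/k along edges, repulsion k^2/d between
        all pairs, displacement capped by a cooling temperature."""
        k = math.sqrt((width * height) / len(nodes))       # ideal edge length
        pos = {v: [random.uniform(0, width), random.uniform(0, height)] for v in nodes}
        temp = width / 10.0

        for _ in range(iterations):
            disp = {v: [0.0, 0.0] for v in nodes}
            for v in nodes:                                # repulsion between all pairs
                for u in nodes:
                    if u == v:
                        continue
                    dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                    d = max(math.hypot(dx, dy), 1e-6)
                    f = k * k / d
                    disp[v][0] += dx / d * f
                    disp[v][1] += dy / d * f
            for u, v in edges:                             # attraction along edges
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-6)
                f = d * d / k
                for node, sign in ((v, -1), (u, +1)):
                    disp[node][0] += sign * dx / d * f
                    disp[node][1] += sign * dy / d * f
            for v in nodes:                                # move, limited by temperature
                dx, dy = disp[v]
                d = max(math.hypot(dx, dy), 1e-6)
                pos[v][0] = min(width, max(0.0, pos[v][0] + dx / d * min(d, temp)))
                pos[v][1] = min(height, max(0.0, pos[v][1] + dy / d * min(d, temp)))
            temp *= 0.9                                    # geometric cooling (simplified)
        return pos

    # pos = fruchterman_reingold({"A", "B", "C"}, [("A", "B"), ("B", "C")], 40.0, 30.0)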
  36. of 49 Force-Directed Graph Structure 36 Humans. Node Representations: Characters; Human; Target/Marks/Pawns; Audience; Central Grouping Point. (Diagram nodes labeled A, H, T)
  37. of 49 Force-Directed Graph Structure 37 Humans. Node Representations: Characters; Human; Target/Marks/Pawns; Audience; Central Grouping Point. Linkages: Characters – Humans/Characters; Characters – Targets/Marks/Pawns; Characters – Audience; Characters – Central Grouping Point; Humans – Central Grouping Point; Central Grouping Point – Audience; Humans – Audience. (Diagram nodes labeled A, H, T)
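A sketch of how the node and linkage types listed above might be assembled into a graph; the node names and link-type labels are illustrative assumptions, not identifiers from the work.

    AUDIENCE = "AUDIENCE"
    CENTER = "CENTER"            # central grouping point

    def build_stage_graph(ai_characters, human, marks):
        """Return (nodes, edges) for the force-directed graph.
        `marks` maps each AI character to its current target/mark/pawn node;
        the human gets no mark, since we cannot move where the human stands."""
        nodes = set(ai_characters) | {human, AUDIENCE, CENTER} | set(marks.values())
        everyone = list(ai_characters) + [human]
        edges = []

        # Characters <-> other characters and the human (conversational distance)
        for i, a in enumerate(everyone):
            for b in everyone[i + 1:]:
                edges.append((a, b, "conversation"))
        # Characters <-> their targets/marks/pawns
        for character, mark in marks.items():
            edges.append((character, mark, "mark"))
        # Everyone <-> audience, everyone <-> central grouping point
        for c in everyone:
            edges.append((c, AUDIENCE, "audience"))
            edges.append((c, CENTER, "grouping"))
        # Central grouping point <-> audience keeps groups facing front
        edges.append((CENTER, AUDIENCE, "audience"))
        return nodes, edges

    # build_stage_graph(["GRAVEDIGGER1", "GRAVEDIGGER2"], "HAMLET",
    #                   {"GRAVEDIGGER1": "GRAVE", "GRAVEDIGGER2": "STEPS"})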
  38. of 49 Force-Directed Graph Functions 38 Humans: Adding Characters; Characters Leaving; Moving Characters; Human Moves (diagram nodes: B, H, T, T, A)
  39. of 49 Force-Directed Graph Functions 39 Humans: Adding Characters; Characters Leaving; Moving Characters; Human Moves (diagram nodes: A, B, H, T, T)
  40. of 49 Force-Directed Graph Functions 40 Humans: Adding Characters; Characters Leaving; Moving Characters; Human Moves (diagram nodes: B, H, T, T, A)
  41. of 49 Force-Directed Graph Functions 41 Humans: Adding Characters; Characters Leaving; Moving Characters; Human Moves (diagram nodes: B, H, T)
  42. of 49 Forces and Time 42 Humans. (Force equations shown as a figure; symbols: δ = distance between nodes, L = length of stage depth, α = constant.) C. Talbot and G. M. Youngblood. Positioning Characters Using Forces. In Proceedings of the Cognitive Agents for Virtual Environments Workshop (CAVE 2013) collocated with AAMAS (W08). IFAMAAS (International Foundation for Autonomous Agents and Multi-agent Systems), 2013.
  43. of 49 Evaluation Approaches 43 Humans: Optimal arrangement based on current relationships; Time-based / sequential arrangement through entire scene; User evaluation of appropriate positioning
  44. of 49 Humans Arrangement Based Upon Relationships 44
  45. of 49 Humans Arrangement Based Upon Relationships 45
  46. of 49 Humans Arrangement Based Upon Relationships 46
  47. of 49 Evaluation Criteria 47 Humans. Appropriate arrangement based on current relationships: Even Vertex Distribution (measure character distances); Small Number of Vertices (count number of vertices); Fixed Vertices (measure distance from targets/marks); Centering and Encircling of Groups (comparison to semi-circular shape)
  48. of 49 Results 48 Humans. 100s of Random Relationship Scenarios: Even Vertex Distribution (3.14 feet, SD=1.54, between characters); Small Number of Vertices (at most 40 vertices in graph, with 12 characters); Fixed Vertices (3.30 feet, SD=1.52, from target); Centering and Encircling of Groups (characters formed nice semi-circles). C. Talbot and G. M. Youngblood. Application of Force-Directed Graphs on Character Positioning. In Proceedings of the Spatial Computing Workshop (SCW 2013) collocated with AAMAS (W09), pages 53-58. IFAMAAS (International Foundation for Autonomous Agents and Multi-agent Systems), 2013.
  49. of 49 Incorporating Forces for Time-Based Arrangements 49 Humans
  50. of 49 Evaluation Criteria 50 Humans: Occlusion; Clustering
  51. of 49 Results 51 Humans
  Case#  Case Description     Avg Occlusion  Avg Clustering X  Avg Clustering Y
  0      Baseline, All AI     3.60%          19.50%            14.60%
  1      Baseline, Human 90%  3.60%          19.10%            15.40%
  2      Baseline, Human 50%  2.90%          20.00%            14.70%
  3      Baseline, Human 10%  4.40%          30.90%            28.70%
  4      Forces, All AI       2.40%          16.80%            14.60%
  5      Forces, Human 90%    2.40%          16.80%            14.60%
  6      Forces, Human 50%    1.60%          20.40%            13.80%
  7      Forces, Human 10%    2.40%          20.80%            14.00%
  C. Talbot and G. M. Youngblood. Scene Blocking Utilizing Forces. In Florida Artificial Intelligence Research Society Conference, FLAIRS '14, pages 91-96. Association for the Advancement of Artificial Intelligence, 2014.
  52. of 49 Results 52 Humans (same results table as the previous slide) C. Talbot and G. M. Youngblood. Scene Blocking Utilizing Forces. In Florida Artificial Intelligence Research Society Conference, FLAIRS '14, pages 91-96. Association for the Advancement of Artificial Intelligence, 2014.
  53. User Studies 53
  54. of 49 Block World 3D Representation 54 User Studies
  55. of 49 Survey Questions 55 User Studies
1. Characters showed evidence of engaged listening
2. Characters appeared to perform suitable movements on cue
3. The pace of the performance was too fast
4. The pace of the performance was too slow
5. The use of the space on stage was appropriate
6. The blocking (positioning and timing of the characters) was appropriate
7. There was adequate variety in the staging positions of the characters
8. The characters’ movement onstage during the performance was believable in the context of the performance
9. The performance is free from distracting behavior that does not contribute to the scene
10. The arrangement of the performers appropriately conveys the mood of the scene
11. The character movements provide appropriate dramatic emphasis
12. There is adequate variety and balance in the use of the performance space
13. All visible behaviors appear to be motivated and coordinated within the scene
14. The characters were grouped to give proper emphasis to the right characters at the right time
15. The characters frequently covered or blocked each other from your point of view
16. The movements of the characters were consistent with the play
17. There was a great deal of random movement
18. The characters’ reactions to other characters were believable
19. Characters showed a lack of engagement when listening
20. The arrangement of the performers contradicts the mood of the scene
21. The more prominent characters in the scene were hidden or masked from your view
22. The characters were too close together
23. The characters were too far apart
24. The stage space was not utilized to its full potential
25. All characters were visible from your point of view throughout the scene
  56. of 49 Results 56 User Studies (chart: mean response per question on a Strongly Disagree / Neutral / Strongly Agree scale)
  57. of 49 Results 57 User Studies (chart: mean response per question on a Strongly Disagree / Neutral / Strongly Agree scale)
  58. Planned Future Work 58
  59. of 49 Planned Future Work 59 Planned
Additional User Studies (shortened): Random; Baseline; NLP; NLP + Rules; NLP + Rules + FDGs
Human Interaction User Studies: Baseline; NLP + Rules + FDGs
Generalization: Identify play-types based on organization; Apply & evaluate techniques for up to 10 of these
  60. Summary 60
  61. of 49 Summary 61 Summary: Proposed Play-Scripts; Applied NLP; Added Rules Engine; Evaluated Speech for Implied Movement; Incorporated Human-Controlled Characters; Added FDGs and Algorithms; Created Spatial Performance Evaluation; Initial User Study
  62. of 49 Questions? 62 Questions
Christine Talbot ctalbot1@uncc.edu
Selected Bibliography Highlighting This Work:
C. Talbot. Creating an Artificially Intelligent Director (AID) for Theatre and Virtual Environments. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '13, pages 1457-1458, Richland, SC, 2013. International Foundation for Autonomous Agents and Multi-agent Systems.
C. Talbot and G. M. Youngblood. Spatial Cues in Hamlet. In Proceedings of the 12th International Conference on Intelligent Virtual Agents, IVA '12, pages 252-259, Berlin, Heidelberg, 2012. Springer-Verlag.
C. Talbot and G. M. Youngblood. Application of Force-Directed Graphs on Character Positioning. In Proceedings of the Spatial Computing Workshop (SCW 2013) collocated with AAMAS (W09), pages 53-58. IFAMAAS (International Foundation for Autonomous Agents and Multi-agent Systems), 2013.
C. Talbot and G. M. Youngblood. Lack of Spatial Indicators in Hamlet. In Florida Artificial Intelligence Research Society Conference, FLAIRS '13, pages 154-159. Association for the Advancement of Artificial Intelligence, 2013.
C. Talbot and G. M. Youngblood. Positioning Characters Using Forces. In Proceedings of the Cognitive Agents for Virtual Environments Workshop (CAVE 2013) collocated with AAMAS (W08). IFAMAAS (International Foundation for Autonomous Agents and Multi-agent Systems), 2013.
C. Talbot and G. M. Youngblood. Shakespearean Spatial Rules. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '13, pages 587-594, Richland, SC, 2013. International Foundation for Autonomous Agents and Multi-agent Systems.
C. Talbot and G. M. Youngblood. Scene Blocking Utilizing Forces. In Florida Artificial Intelligence Research Society Conference, FLAIRS '14, pages 91-96. Association for the Advancement of Artificial Intelligence, 2014.

Editor's Notes

  1. Hi, my name is Christine Talbot. I have been working with Dr. Youngblood on this work, entitled “Directing Virtual Humans Using Play-script Spatiotemporal Reasoning”.
  2. Current state of the art technology with virtual characters focuses on emoting, nonverbal behaviors, and interacting with humans. However, these characters typically stand in place or side by side, performing little to no spatial movements. For instance, Ada & Grace interact with museum goers and answer questions about themselves and virtual characters. Below, the Gunslinger prototype incorporates emotional reactions and behavior generation within an interactive application. Other solutions include automated algorithms for spatial navigation or the use of waypoints, however these are less scripted and more reactive. We want the process to be scripted and on cue, but not too rigid either.
  3. Many games utilize modularized low-level code to move characters about in an environment. This requires extensive technical skill to translate basic high-level actions, as well as extensive time to write all that code. Everything is hard-coded: what can be done and when it will occur. Another option in use today is mocap files. These can capture a lot of detail, be very realistic, and include facial emotions. However, this method requires expensive tools, appropriate actors, and relevant “environments”. It is also not very dynamic: every scenario you want to simulate must be recorded exactly, and it is very hard to pre-record every scenario that could occur. There are other hybrid techniques being researched at universities using mocap files, but these still require human intervention to solve the problem.
  4. Another, newer option utilizes BML and BML Realizers. These also require some lower-level coding, but they abstract and parameterize a lot of this motion. This allows things to be a little more dynamic, since the motions are repeatable and can be reused.
  5. The problem is that this method still requires a game-writer to write very specific and detailed steps. For instance, here you can see some basic BML which controls two characters near the beginning of the Graveyard scene in Hamlet. There are two gravediggers, one points at the other one, then proceeds with a speech that includes a lot of gesturing and pointing, followed by moving towards the grave. With BML, you must specify where the character looks, when they look there, how they move, when they move, and when they should pick up / put down objects. This one ten-minute scene of Hamlet took over 4 hours to hand-map the BML. Very time-consuming. Again, not everyone is doing this by hand, but most of those people are focusing on generating characters that stand side by side and talk to each other. We’re more interested in where they will be when they talk to each other, as well as their movement throughout a scene.
  6. So what do humans do? Well, we often give directions to other people on how to get somewhere or how to do something. These directions are often vague, like the ones shown here. They don’t specify exactly how far to go, which way to walk, which direction to look, or whether the bus stop is on your left or your right. So why can’t we do the same for our characters? (Stand in center of room)
  7. Well, we do use something like this in plays. We give high-level directions to the actors for where to go, what to do, and what to say. These are called play-scripts, as seen here. We don’t have to invent a new way to write spatial directions. Play-scripts are used in almost all theatre productions today. Why couldn’t something like this work for us? Wouldn’t this save us a ton of time in writing new scripts for characters in virtual worlds? We believe so, and have based our approach on utilizing such play-scripts.
  8. So what did we do? Well, we took a very famous production of the play Hamlet starring Richard Burton and directed by Sir Gielgud on Broadway in 1964. <click> We hand-mapped the movements of all the characters in the Graveyard scene, seen here, and translated them into specific BML commands in 2D. This means we made sure we captured each character’s position, where they were looking, what they pointed at, what they picked up or put down, and when throughout the scene. This took about 400 BML commands and 4 hours and 12 minutes to do for a 10 minute scene from Hamlet. This was our baseline.
  9. We also had access to a detailed annotated version of Gielgud’s Hamlet production (a great find for any Shakespeare play). Shakespeare plays are notorious for not providing any blocking commands, which has led to the wide array of interpretations of his scripts on stage. We took this script, annotated by Sir Gielgud and Richard Sterne for the Graveyard scene, along with the Natural Language Toolkit to parse the character blocking into 2D-mappable motions. We worked with a simplistic approach that assumed that most statements in a play-script are actor-verb-target and not necessarily compound or complete sentences. For example, we took “Pointing down into the grave” and parsed it into the current speaker as the actor, the action as point, and the target as the grave. This approach of part-of-speech tagging with named entity recognition worked well, translating approximately 75% of the spatial cues in the script.
  10. So we created a 2D BML realizer to visualize our play. We mapped it out to the same dimensions as the original stage used on Broadway. This is what it looked like for the same scene we saw recorded from the Broadway show… Talk about the pointing & facing direction before moving on
  11. So we compared the character traces over time for each character through the entire scene. We wanted to be in the right section of the stage, looking in the right general direction, pointing in the right general direction, and be carrying objects or placing them in the right section of the stage – all at the right time. Looking at the traces, we did pretty well. We were able to position the characters in the right place at the right time about 75% of the time. The discrepancies you see from the left vs the right are due to the un-annotated actor’s interpretation of the script which remained as part of the baseline but was not available in the annotations for the script. We managed to save over 4 hours of time by using this simplistic natural language procedure, and still got these characters doing the right thing at the right time most of the time. Yes, it took about the same amount of time to write the code to do this, but we can now use this code for other scenes without additional cost.
  12. So now that we have a reasonable starting point for positioning our characters, how do we make it better? How do we incorporate the actor’s interpretations to what is explicitly written in the script? One key addition is to better understand some of the implied spatial rules that exist in the theatre. For instance, psychology research has shown that the more comfortable you are with someone, the closer you will stand to them. Also, when looking at larger groups, it has been shown that people will gravitate towards a circular formation to provide a sense of inclusion for all participants. Next, we look towards theatre rules which include the avoidance of upstaging other characters, and results in counter-crossing movements. Finally, we have to include some common sense rules, such as looking at who is speaking, or what someone is pointing to.
  13. Previously, we showed our input as a play-script which we applied natural language processing to in order to provide BML commands to control our characters onstage.
  14. With our new rules engine that covers grouping, conversational, theatre, and general rules embedded between our natural language processor and our BML generation, we are able to apply better spatial positioning on the stage.
  15. So how does the rules engine work? After interpreting the annotations from the natural language processor, we are able to tell what type of actions should be performed by the characters. Based on that action, we add additional BML commands, such as gaze or locomotion, or make adjustments based on our rules about those movements. For instance, when a character points at something, we add additional BML commands to have the other characters look at what is being pointed at. To avoid the upstaging, we check a character’s end target position for a locomotion command to adjust other characters’ positions onstage, as appropriate. All of these commands (the original ones, as well as the added/modified ones) are then sent on to the BML Realizer to control the character.
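As an illustration of the rule expansion described in this note (not the actual rules engine), a pointing act can fan out into gaze acts for the other characters, and a locomotion act can trigger a counter-crossing check; the upstaging predicate below is a hypothetical placeholder, and the BML tags follow the snippets shown earlier in the deck.

    def gaze_act(character, target):
        return (f'<act><participant id="{character}" role="actor" />'
                f'<bml><gaze target="{target}" /></bml></act>')

    def locomotion_act(character, target):
        return (f'<act><participant id="{character}" role="actor" />'
                f'<bml><locomotion target="{target}" type="basic" manner="walk" /></bml></act>')

    def apply_spatial_rules(act, all_characters,
                            upstaged_by_move=lambda mover, end, other: False):
        """Expand one parsed act {actor, action, target, bml} with rule-generated acts."""
        out = [act["bml"]]
        others = [c for c in all_characters if c != act["actor"]]
        if act["action"] == "POINT":
            # General rule: the other characters look at what is being pointed at.
            out.extend(gaze_act(c, act["target"]) for c in others)
        elif act["action"] == "MOVE":
            # Theatre rule: characters who would be upstaged counter-cross to a new mark
            # ("COUNTER_MARK" is a placeholder for a computed stage position).
            out.extend(locomotion_act(c, "COUNTER_MARK") for c in others
                       if upstaged_by_move(act["actor"], act["target"], c))
        return out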
  16. As you can see here, during our same 10 minute scene of Hamlet, we were able to positionally block the characters very accurately. To accommodate characters being “close enough” on the stage, we split the stage up into 9 segments: combinations of upstage, downstage, stage left, and stage right.
  17. We still see the one blip <click> with GraveDigger1, where the actor playing Gravedigger1 walks towards the audience, then turns around and heads back towards the grave. This movement was not annotated in the play-script and therefore was not performed by our rules-based characters. It highlights one aspect of the actor's initiative to improvise despite the directions provided by the script, and confirms that actors do embellish beyond the play-script. However, we were still able to position the characters correctly 89% of the time, up from our previous 75%.
  18. This made us ask the question of whether there was something else within the play-script that might influence an actor to perform specific actions during a performance. For example, the movement seen here was not annotated in the play-script and therefore was not performed by our rules-based characters. It highlights one aspect of the actor's initiative to improvise despite the directions provided by the script, and confirms that actors do embellish beyond the play-script. Notice the gravedigger is holding the lantern, and the play-script says to put it down, however the gravedigger decides to walk towards the audience and put it down on the other side of the steps, instead of placing it where he is currently. There is nothing in the play-script that instructed him to make that movement before putting the lantern down.
  19. So we decided to look to see if there was some implied motion because of what the character was saying at the time. We were not looking for someone giving instructions, but more whether there was some sort of movement implied because of what was said, how long the speech was, where they were onstage when they were saying it, and so forth. Therefore, we decided to try some machine learning on the actual speech of the play. We took the 1964 production of Hamlet on Broadway – longest running at 138 performances, and hand mapped all the movement through the entire 3 hour play.
  20. We decided to capture this information about the speech: length of speech, word counts, repeated word counts, punctuation counts, time since last movement, and part-of-speech counts. We captured this information for every line of the script in the Hamlet play (over 3000 lines of speech & 3 hours of video), along with the actual movements performed with each line of speech. These movements were captured at as low a level as possible, but were grouped into the high-level categories you see here. They were captured per line of speech, not as full sentences, to encourage the identification of particular phrases that imply movement, as well as to provide information on actor movements at as granular a level as possible.
  21. Then, we tested it out with these machine learning methods: SVM, Random Forests, MaxEnt, and Boosting. We saw that the majority of speech phrases did not incur any movement within the play, and some of the categories of movement did not occur very often, even with our higher-level categorization. The starred actions are the higher categories we focused on, and the chart does not include many of the movements which occurred less than 40 times within the 3 hour play. Based on what we saw here, we could tell that a simple classifier would just guess no movement and be correct more than half the time, regardless of any features used.
  22. We also looked at what might help us tell the difference between these instances. We tried different types of movement from very low-level specific movements such as running versus walking, as well as higher level categories of movement such as Locomotion, Big Movements, or Any movement at all. We also looked at different n-grams to see if it helped with capturing phrases of particular lengths that might imply movements. Because of the skewness of the movement versus no movement speech lines, we played with the training set to ensure it had a more evenly balanced movement vs no movement dataset to learn on, and avoid learning to guess no movement. Finally, we tried different combinations of feature sets from the information we captured. We focused on the speech itself, parts of speech, contextual features (like when moved last, length of speech), and varying combinations of these sets.
  23. Using these different combinations, we attempted to evaluate their success in learning movements. We started with overall accuracy <click> which showed how many were classified correctly. These numbers look good, but in many cases, it was merely because of the classifiers guessing no movement.
  24. Since we are more concerned with ensuring that if we classify a line as an implied movement that there really should be a movement with that line, we should look at Precision next <click>. You can see we only recognize a movement correctly about half the time.
  25. Ultimately, we want to differentiate how much we were outperforming the random classifier. Therefore we looked at the Matthews Correlation Coefficient (MCC)<click>. This measure is useful when class distributions are very skewed like ours is, and returns a value between -1 and +1. +1 is a perfect prediction, 0 is the same as random, -1 is perfect incorrect prediction. This, along with some ROC curves, which we’ll look at next, helped us to identify that we were able to do slightly better than a random classifier in many instances, although not much better.
  26. These were some of the best performing combinations, however they barely did any better than random as you can see with these brown lines which represent the random classifier <click>
  27. Since we were able to accommodate approximately 90% of the positioning utilizing play-scripts, NLP, and rules, we did not expect to find much here. It is obvious from these results that there is not a signal related to a person’s speech that can account for the remaining 10% of the positioning. These other movements are likely due to an actor’s improvisation during a performance. Any Speaker Movement, POS only, 1807 split training, 1-grams – top left Any Speaker Movement, No Words, 1807 split training, 1-grams – top right Gesture Speaker Movements, All Features, 154 split training, 4-grams – bottom left Any Speaker Movement, Only Words, 1807 split training, 1-grams – bottom right Blue = Boosting Red = SVM Green = MaxEnt Magenta = Random Forests
  28. Returning to where we left off before our tangent on implied movements… our focus started with a simple play-script for providing basic high-level movement instructions for characters within a scene. <click> Utilizing this basic input, we extracted the spatial movements from the scripts via basic natural language processing techniques such as named entity recognition and part of speech tagging. <click> These movements were then fed into a rules engine to incorporate grouping, conversational, theatre, and other general rules to these movements. <click> Each movement was then parsed into BML commands to move the characters within the scene,<click> providing relatively low-level control of the characters while utilizing high-level instructions from play-scripts.
  29. So that’s great, but what happens when we introduce a human-controlled character? They may not choose to follow the script at all, or even if they do, they may make mistakes or even improvise their own interpretations. We cannot expect them to follow the script perfectly every time. Sometimes their movement may be done at the wrong time or not at all. So how do we handle these adjustments? We still need to make sure our characters follow appropriate blocking despite the human’s incorrect position. But, we want to make it look like the human’s movement is correct by making sure the AI characters adjust, all while avoiding losing the original integrity of the script.
  30. Fruchterman & Reingold 1991 algorithm So here is where we look to force-directed graphs to assist us. Force-directed graphs have been used for years for presenting aesthetically pleasing representations of complex graphs of data, such as networks and relationships. They utilize forces that interact on the position of each node in the graph and attempt to stabilize these forces between nodes. Several approaches exist, however our focus was on Fruchterman and Reingold’s algorithm from 1991 because of its use of forces not only between connected nodes, but also between all nodes in the graph, and its introduction of a cooling function to minimize oscillations.
  31. So how can we make FDGs work for character positioning? First, we need to understand how to map our needs into a graph representation. We need to represent all the characters in the scene by a node. <click> Of course, the human needs to be included in this mix as another node as well, but we won’t move this node. <click> Only the human’s movement will allow this node to change locations. Then, we must consider the relationships between each character and their mark on the stage (where they are supposed to be per the play-script). <click> So we must create nodes to represent these targets, including the pawns (such as a lantern, bench, etc) as well. However, these nodes aren’t going to be moveable, but should still have attractive forces with the character(s) that should be at that mark. Then we must consider the audience as an attractive force for the characters. <click>We want to make sure characters are drawn towards the audience instead of away from them. The trick here is that we don’t want to make this point unmoveable since it’s equally good to be 5 feet from the audience on the left side of the stage as in the center. So we’ll make this node a special case – fixed location in the y direction, but the x direction will keep the connection as a vertical line between the character and the audience. Finally, we know there is a force that pulls multiple characters together into a circle. <click> To accomplish this, we introduce a central grouping point that the characters can be drawn to. This point will have an attraction to the audience as well, which should help to enforce a more semi-circular shape than a circular shape, thereby preventing characters from turning their back on the audience.
  32. Each node will have specific relationships with each other. Relationships between the human character and an AI character (or multiple AI characters) need to enable a typical speaking distance of about 3 feet <click> Each character should be tightly tied to their target or mark on stage, so we will need a relationship between these nodes as well <click> We want the characters to be pulled towards the audience for greater visibility, so we will add a linkage there as well <click>. To centralize groups of characters into semi-circles, we will need to create linkages with the centroid <click>which will balance itself centrally between the characters in the x direction, but be pulled towards the audience in the y direction <click>. Finally, for the humans, we will need the same linkages to this central grouping point<click> and the audience <click>.
  33. Whenever a new character enters the stage, we need to create the new node and all its linkages.
  34. Therefore, we must create a connection to the audience <click> and to the character’s target, whether it is a pawn, or just a mark on the stage. <click> If they are the only character on the stage, we do not need to do anything else. If they are not the only person on the stage, then we need to create connections between the new character and each character onstage. <click> If they are the second person entering the stage, we also need to create our central grouping point and connect both the new character and the other character to this point as well. <click> If they are not the second person onstage, then the central grouping point will already be there, and we merely need to add a connection to it for our new character. If more than one character is entering the stage at the same time, then we need to introduce a stronger connection between the two characters as they are more likely to remain closer to each other initially.
  35. When a character leaves the stage, some basic cleanups are needed <click>, such as removing of their audience node <click>, removal of their target node and connection <click>, removal of their node and all connections to other characters <click>, and if only one person remains onstage, the removal of the central grouping point is needed as well.
  36. When a character moves, we need to remove any remaining linkage to previous marks and create a new linkage to their new mark on the stage. Initially these target connections will be strong, but will weaken over time to simulate the lowering importance of that mark as time progresses. The character’s audience node / connection will also need to be updated to ensure appropriate forces being applied. We will need to re-check the arrangements due to any human positioning with respect to the requested movement of the AI Character, and make any adjustments as needed. The special case here is when the human moves. At this point, we need to update their audience node, just like the AI characters. However, this will signal a reorganizing of the characters to ensure appropriate positioning due to the human’s location. All forces will be re-evaluated for a new equilibrium state, and AI Characters will adjust while the human character is moving. This will not create a connection to a target / mark location for the human since we cannot move where the human resides on the stage.
  37. So we’ve talked about how we structure the graph and how we modify the graph when certain events occur. Here, we start to talk about the actual forces that will play with each of these nodes and connections within the graph, as well as how they interact over time. <click> First, we have the AI Characters’ connections to each other. Since our typical conversation space is about 3 feet (from research done within psychology), we look to have both attractive forces to pull them together, and repellent forces to ensure they don’t get too close to each other. The attractive forces are made stronger between two characters whenever they enter the stage together. However, this extra attraction force will reduce over time to the standard AI Character forces. <click> With the humans, we want the characters pulled towards the human, but not as strongly as to each other, just in case the human is not following the script very well. <click> We want the characters to primarily reside within the front part of the stage, but not too close to the edge either. Therefore, we will have a force to pull them towards the audience, but keep them in the front half/quarter of the stage when possible. The center point will also be pulled towards the audience (but more strongly) to ensure groups are facing forward, and form a semi-circle. <click> The center point will also pull each character towards an even spacing around this point, <click> and the character’s marks will pull strongest of all (without any repelling).
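The relative force design described here can be written down as a small profile table; the rest lengths and strength constants below are invented for the sketch (only the ordering follows the description: marks pull strongest and never repel, character-to-character conversation space sits near 3 feet, the pull toward the human is weaker, and the extra bond between characters who entered together decays back to normal).

    FORCE_PROFILE = {
        # link type           rest length (ft)   attract   repel
        "conversation":     {"rest": 3.0, "attract": 1.0, "repel": 1.0},
        "entered_together": {"rest": 3.0, "attract": 2.0, "repel": 1.0},  # decays over time
        "human":            {"rest": 3.0, "attract": 0.5, "repel": 1.0},
        "mark":             {"rest": 0.0, "attract": 3.0, "repel": 0.0},  # strongest, never repels
        "audience":         {"rest": 5.0, "attract": 0.8, "repel": 0.0},
        "grouping":         {"rest": 3.0, "attract": 0.7, "repel": 0.0},
    }

    def entry_bond_strength(step, steps_to_normal=30):
        """Extra attraction for characters who entered the stage together,
        fading back to the normal conversation strength as the scene progresses."""
        base = FORCE_PROFILE["conversation"]["attract"]
        extra = FORCE_PROFILE["entered_together"]["attract"] - base
        return base + extra * max(0.0, 1.0 - step / steps_to_normal)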
  38. This leads us to the different aspects that need to be evaluated to fully vet these techniques with force-directed graphs. First, we need to determine if the graphs can produce an optimal arrangement based purely on the relationships between the characters, targets, and audience. Secondly, we need to evaluate how this optimal arrangement works over time through an entire scene. In other words, do the characters make too many movements because of the human factor? Do the characters oscillate between positions? How do the characters’ relationships adjust over time within a scene? Finally, we need to evaluate how realistic and appropriate the positioning is from a user’s perspective. This could include a director, a theatre-goer, a gamer, and so forth. This will be the ultimate evaluation of whether this approach, in conjunction with our prior work, can position our characters in a realistic and reasonable manner.
  39. Here we see a random positioning of characters and their relationships on stage. <click>This does not necessarily represent actual locations on the stage for the characters, but is mostly an initial representation of their relationships so we can verify what our force-directed graph will do with those relationships.
  40. To trigger the graph’s repositioning, we will have the human character move to this location here <click>.
  41. By doing this, we trigger the repositioning of the characters<click> and we end up with a nice arrangement of the characters that is balanced, facing the audience, and in a semi-circle. All this while maintaining a reasonable distance between character C’s mark onstage. Again, here we are focusing on optimal positioning of characters based solely on their relationships.
  42. Going back to our first evaluation, we look at whether the graphs can produce an optimal arrangement based purely on the current relationships. To determine this, we have established some key criteria that can be measured. These include: Even Vertex Distribution – are the characters reasonably spaced from each other? Is it balanced? Small Number of Vertices – force-directed graphs work best with small numbers of vertices, so we want to measure how many vertices we have with a maximum number of characters onstage, say 12. Fixed Vertices – are the characters remaining close to their target locations? Or are they being pulled away? We want them to be as close as we can be to their marks since there are key reasons within the playscript for the characters to position themselves there. Centering and Encircling of Groups – are the characters forming semi-circles facing the audience? Does the central point pull them towards the audience?
  43. To validate the basic positioning with force-directed graphs, we generated hundreds of random relationships and random numbers of characters. We calculated the distances, number of vertices, and so forth for each scenario. Based on these numbers, we used our evaluation criteria to determine the force-directed graph approach’s effectiveness. We saw that characters were evenly spaced on the stage, we had up to 40 vertices in our graphs for 12 characters, characters remained close to their marks, and formed semi-circles for the characters.
  44. Next, we want to incorporate these force-directed graphs into our previous architecture to determine their effectiveness through an entire scene. As you can see here, we start with the play-script, apply some natural language processing, add on some rule-based movements, then apply our force-directed graphs for any positional changes before going to the BML Realizer. It is important to note that each component relies on the previous component’s output to perform its adjustments to the character positioning. These positional changes can also come from the human-controlled character performing a movement, which triggers a recalculation for the AI-controlled characters.
  45. So how do we evaluate how well the force-directed graphs work within a scene? First, we have to look at occlusion. One of the key components of theatre is not occluding other actors on stage. Therefore, we will look to minimize the amount of occlusion of one character by another throughout the scene. Secondly, we want to make sure that characters have some sort of grouping, or clustering, on the stage. The goal is not necessarily to utilize the entire stage, but to ensure that all characters appear to be part of a cohesive group. We do not want one character to be hanging out in one corner and the rest at the other side of the stage. Again, we are looking to minimize the amount of space the characters occupy, keeping in mind the need for conversational space.
  46. To evaluate this, we applied our force-based approach to an entire scene of Hamlet. We calculated the amount of occlusion and clustering found with various cases. We started with a baseline run which utilized the hand-mapped blocking from the 1964 Hamlet production on Broadway. We can see that we have some minor occlusions of the characters on the stage with that production, as well as a fair amount of clustering in both dimensions of the stage as well. <click> When we made one of the characters a “human” by randomizing their movements, we see less clustering & more occlusion of the characters the greater our human error was. <click> Considering the scene we utilized has at most 3 characters onstage at any time, we expect to see normal clustering at approximately 28% if we utilized only conversational space for positioning the characters side-by-side in a row. So we see that there is less clustering of characters (or inclusion of the human) when nothing is done to compensate for the human’s incorrect movement. Remember, in this scenario, the AI characters move to their specified locations, regardless of what the human is doing. <click>
  47. When we take a look at our method of controlling all the characters to follow a play-script using rules and FDGs, we see that we are able to reduce the frequency of characters being occluded on the stage. We are able to better maintain our clustering of the characters, occupying less space than we saw with the baseline measurements. This shows that the force-directed graphs not only help to include the human, but is also able to maintain the integrity of the script.
  48. Now that we’ve performed our quantitative evaluations, we need to perform some qualitative evaluations. To do this, we performed a user study to compare the baseline production to our NLP production. The study requested participants to view a 15 minute video enacting the Graveyard scene from Hamlet within our blockworld, seen here. Some were presented the representation of the hand-mapped performance from 1964, while others were presented the NLP representation of the same play-script. The blockworld was used to avoid any bias of character representation and to enforce a greater focus on the spatial aspects of the performance.
  49. Questions came from judging criteria used in one-act plays at both the high school and collegiate level. Questions intentionally included both positive and negative versions of the questions, along with different wording to avoid any bias on question phrasing. Shown here is a listing of the expected correlations of the different questions. However, we found very little significant correlation between the questions during our user study. Additional questions were presented to the participants to help identify appropriate demographics, as well as open-ended questions to assist with other information not captured with the likert questions.
  50. These questions were presented in a randomized order after viewing the video, only if they knew the color of the intermission screen which appeared halfway through the performance, and had been on the video page for at least as long as the video was. This ensured that participants actually viewed the video before completing the survey. Over 748 people attempted the survey, but only 214 completed it due to these checks.
  51. There were some very small but significant differences, such as in these two categories of questions: Emphasis (Baseline viewers more strongly agreed there was proper emphasis of characters onstage) and Movement (NLP viewers more strongly agreed there was consistent movement of characters onstage). We also saw significant differences on some questions when taking participant demographics into account, such as region, age, gender, culture, gaming frequency, theatre familiarity, and Hamlet familiarity. These covered 8 of the 11 categories of questions and revealed expected differences between NLP and Baseline, such as better grouping and space usage with the Baseline video, but only for a subset of the participant population. Overall, however, the results showed that users were unable to differentiate between the real production and our NLP production. Some of this is likely due to participant fatigue: the video was 15 minutes long and did not include a lot of action, and by the time viewers got to the survey, many seemed to answer most questions consistently, which implies not much thought was involved. Open-ended responses confirmed this with comments such as “boring” and “slow”. Based on this, we will need to reduce the length of the video to ensure less fatigue and more accurate responses in our future surveys.
  52. So where is this work headed? What is left to be done? Obviously, a continuation of our qualitative analysis is required to compare each of our components to a baseline performance, as well as a random performance. This should help to ensure our survey can detect a really bad performance, as well as determine if we can exceed the human-perceived threshold of a quality performance. It will also build upon what we learned from our first user study regarding fatigue. Next, we will want to qualitatively evaluate how our architecture can perform with a human-controlled character. We will want to compare both the baseline scene and the FDG scene, having the user control one of the characters, and then evaluate the AI characters’ performances. Finally, we will need to determine some level of generalization of these techniques. We will identify the different types of plays/scenes, based on their organization, and apply our techniques to several of these play-types.
  53. In summary… It is challenging to position characters in virtual environments. Current work requires a lot of time or technical expertise to produce realistic positioning of characters. <click> We introduced the use of play-scripts for a high-level instruction set for virtual human characters. <click> We next applied some basic part of speech tagging and named entity recognition to extract the spatial movements from the play-scripts. <click> We introduced some basic rules to adjust the character positioning, thereby allowing us to replicate 89% of an actor’s movement. <click> We looked to identify a signal within the character’s speech that might imply movement within the scene. <click> Next, we incorporated human-controlled characters. To do this, we introduced algorithms, structures, and forces for incorporating a human-controlled character within a scene. <click> To qualitatively evaluate the performances, we created a spatiotemporal survey based on one-act play judging criteria. <click> Then, we performed our first user study to evaluate both our questionnaire and our NLP approach. And finally, we reviewed the planned work to complete this dissertation.
  54. Thank you! – Any questions?