Kinect krishna kumar-itkan

Kinect presentation by Krishna Kumar, published 8/11/11 in Technology.
  • I’d like to introduce to you Kinect for Xbox 360, where YOU are the controller. No gadgets, no gizmos, just you! Kinect brings games and entertainment to life in extraordinary new ways without using a controller. Imagine controlling movies and music with the wave of a hand or the sound of your voice. With Kinect, technology evaporates, letting the natural magic in all of us shine. http://www.xbox.com/en-US/kinect
  • A few inspiration points from the creators of Kinect.
  • So who likes playing video games? Who thinks gaming controllers are really easy to use? How long do you think it would take to become an expert at all of these buttons and win games? If you could just turn on the game, play, and be pretty good at it, do you think you’d probably play more video games? The purpose of Kinect is to make Xbox more accessible to a broader audience. The Kinect team focused on making Xbox so easy to use that anyone could jump in and play without having to read any instructions or learn all the different controller buttons and permutations to be great at the game. They wanted to make beginners feel like experts. Kinect is designed so anyone can play: kid or adult, no matter how much gaming experience you have or how old you are, you can jump in and play right away. Imagine your little brother or sister, or your grandparents, playing an Xbox game without having to learn which button does what.
  • So, as we said on the last slide, instead of learning all the right buttons to press on the controller, make the game understand YOU. That’s Kinect! Make gaming more accessible. Open up gaming to others. Use what you already know; there’s nothing new to learn. But there’s also another unique element to Kinect, and that is making gaming more social. Traditionally you would have your hard-core gamers sitting alone in front of their console for hours, firing away at the next alien or racing away in their own world. With Kinect, gaming actually brings people together in a fun, collaborative way, where watching your friends and family play is genuinely entertaining. And playing with others over Xbox Live is a very social gaming experience. People laugh and join in even when they aren’t playing, so much that they want to get up and play themselves.
  • What is Kinect? Let’s start with the name. Where did the name Kinect come from? From “kinetic,” which means to be in motion, and “connect,” because it “connects you to the friends and entertainment you love”! Kinect has Voice Recognition: it uses four strategically placed microphones within the sensor to recognize and separate your voice from the other noises in the room, so you can control movies and more with your voice. Kinect has Gesture Recognition, through a motion sensor that tracks your entire body. So when you play, it’s not only about your hands and wrists. It’s about all of you: arms, legs, knees, waist, hips and so on. It also includes Skeletal Tracking: as you play, Kinect creates a digital skeleton of you based on depth data. So when you move left or right or jump around, the sensor will capture it and put you in the game. Kinect has Facial Recognition: Kinect ID remembers who you are by collecting physical data that’s stored in your profile. So when you want to play again, Kinect will know it’s you, making it easy to jump in whenever you want. In a nutshell – YOU Recognition!
  • Build out this slide: Kinect knows what to do. The camera captures you and your movements, voice, etc. It’s programmed to analyze images, look for the basic human form and identify about 32 essential body parts such as your head, torso, hips, knees, elbows and thighs. Create your avatar, and you’re ready to play!
  • Let’s have a look at the Kinect Sensor. What are those things on the sensor? There’s a RGB camera, a depth sensor and a multi-array microphone. When you first start up Kinect, it reads the layout of your room and configures the play space you'll be moving in. Then, Kinect detects and tracks 32 points on each player's body, mapping them to a digital reproduction of that player's body shape and skeletal structure, including facial detail. Let’s take a look at each component separately to help you understand how it all works together… [next few slides go into more detail]
  • An infrared projector combined with a monochrome CMOS sensor allows Kinect to see the room in 3-D (as opposed to inferring the room from a 2-D image) under any lighting conditions. Depth is determined by projecting invisible infrared (IR) dots into a room. Let’s see how that might look…(next slide)
  • Source: www.ros.org. Depth is recovered by projecting invisible infrared (IR) dots into a room. The way the optical system works, on a hardware level, is fairly basic. A Class 1 laser is projected into the room, and the sensor detects what’s going on based on what’s reflected back at it. Together, the projector and sensor create a depth map. You can see in this picture that the couch is further from the Kinect sensor than the player’s hand, so the infrared dots on the couch aren’t as bright white as those on the person. This is also very helpful when there are others in the room watching the game: the depth sensor can determine that the person sitting on the couch in the distance isn’t playing, so their movements won’t interfere with the player’s. (320×240 depth stream.)
  • Source: http://nuit-blanche.blogspot.com/2010/11/unsing-kinect-for-compressive-sensing.html
  • There’s also an RGB camera. Does anyone know what RGB means? This video camera aids in facial recognition and other detection features by detecting three color components: Red, Green and Blue; the “RGB camera” is named for the color components it detects. It’s similar to the web cam you see on computers and laptops today, and it’s used for the sharing-memories feature of Kinect, which captures pictures while you’re playing! It is also used for Video Kinect, which we’ll talk about a little later. What else do you think is part of the Kinect sensor?
  • The sensor also has EARS!! The multi-array microphone is an array of four microphones that can isolate the voices of the players from the noise in the room. This allows the player to be a few feet away from the microphone and still use voice controls. These microphones focus on the sound we care about and throw away the noise. When you first plug in Kinect it steps through an acoustic setup: Kinect bounces sound around the room and listens to how it returns in order to acoustically map your space. There is also a voice recognition component of Kinect. Most voice recognition available today is push-to-talk. No buttons with Kinect – you can just talk to the sensor and it recognizes speech!
  • There’s also a motorized tilt. The Kinect sensor will adjust using this motorized tilt so it can recognize all shapes and sizes of players. When you first turn on Kinect, you’ll see the sensor move up and down to find the players.
  • Color VGA video camera – this video camera aids in facial recognition and other detection features by detecting three color components: red, green and blue. Microsoft calls this an “RGB camera,” referring to the color components it detects. Depth sensor – an infrared projector and a monochrome CMOS (complementary metal-oxide-semiconductor) sensor work together to “see” the room in 3-D regardless of the lighting conditions. CMOS (pronounced /ˈsiːmɒs/) is a technology for constructing integrated circuits; it is used in microprocessors, microcontrollers, static RAM and other digital logic circuits, and also in several analog circuits such as image sensors, data converters, and highly integrated transceivers for many types of communication. Multi-array microphone – an array of four microphones that can isolate the voices of the players from the noise in the room, allowing the player to be a few feet away from the microphone and still use voice controls. What comes in the box: Kinect sensor for Xbox 360, power supply cable, user’s manual, Wi-Fi extension cable, Kinect Adventures game. Specs: color VGA motion camera, 640×480-pixel resolution at 30 FPS; depth camera, 640×480-pixel resolution at 30 FPS; array of 4 microphones supporting single-speaker voice recognition. Put it all together with a VERY IMPORTANT piece that makes it all possible – SOFTWARE!! Kinect’s software layer is the essential component that adds meaning to what the hardware detects. When you first start up Kinect, it reads the layout of your room and configures the play space you’ll be moving in. Then Kinect detects and tracks 32 points on each player’s body, mapping them to a digital reproduction of that player’s body shape and skeletal structure, including facial details.
http://electronics.howstuffworks.com/microsoft-kinect3.htm http://www.popsci.com/gadgets/article/2010-01/exclusive-inside-microsofts-project-natal Kinect software learns from “experience.” Kinect’s software layer is the essential component that adds meaning to what the hardware detects. When you first start up Kinect, it reads the layout of your room and configures the play space you’ll be moving in. Then Kinect detects and tracks 48 points on each player’s body, mapping them to a digital reproduction of that player’s body shape and skeletal structure, including facial details [source: Rule]. In an interview with Scientific American, Alex Kipman, Microsoft’s Director of Incubation for Xbox 360, explains Project Natal’s approach to developing the Kinect software. Kipman explains, “Every single motion of the body is an input,” which creates seemingly endless combinations of actions [source: Kuchinskas]. Knowing this, developers decided not to program that seemingly endless combination into pre-established actions and reactions in the software. Instead, they would “teach” the system how to react based on how humans learn: by classifying the gestures of people in the real world. To start the teaching process, Kinect developers gathered massive amounts of motion-capture data from real-life scenarios. Then they processed that data using a machine-learning algorithm by Jamie Shotton, a researcher at Microsoft Research Cambridge in England. Ultimately, the developers were able to map the data to models representing people of different ages, body types, genders and clothing. With select data, developers were able to teach the system to classify the skeletal movements of each model, emphasizing the joints and the distances between those joints. An article in Popular Science describes the four steps Kinect’s “brain” goes through 30 times per second to read and respond to your movements [source: Duffy].
The Kinect software goes a step further than just detecting and reacting to what it can "see." Kinect can also distinguish players and their movements even if they're partially hidden. Kinect extrapolates what the rest of your body is doing as long as it can detect some parts of it. This allows players to jump in front of each other during a game or to stand behind pieces of furniture in the room.
  • http://research.microsoft.com/apps/video/default.aspx?id=139295
  • http://research.microsoft.com/apps/video/default.aspx?id=139295 © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION
  • So, where did this idea of a system where you are the controller come from? Where did the technology behind the system get its start? Let me share a bit of background about the technology behind Kinect. Microsoft Research (MSR) did a lot of research back in 2007 on human body tracking. They spent a lot of time and effort and ended up producing the video that you see here. While it seems pretty accurate, it was really quite limited in the range of motion it could track, it wasn’t real time, and it couldn’t work with multiple people/players. It was a start – and then some gamers from Xbox gave MSR a call…
  • In 2008 someone from Xbox called Microsoft Research. They had seen the published human-body-tracking work highlighted on the previous slide, and they said they needed a computer body tracker for one of their new Xbox games. They talked about all of the other things they wanted this tracker to be able to do: it needed to track all body motions, it needed to be 10 times faster than real time, it had to support multiple players, and it had to be 3D. They asked if MSR could help them build it. Well, Microsoft Research said it couldn’t be done. But the Xbox team had some game programmers who had already been trying to develop a system that could do human body tracking. They sent a video to Microsoft Research of what they had developed, and the research team was truly inspired by what they saw. So they teamed up and decided to make this work! Imagine those teams getting together – PhDs from Microsoft Research meet Xbox gaming developers… those must have been some awkward first meetings!!
  • The first thing they did was collect a lot of data. Xbox sent a team of people to households in about 10 countries, where they went into living rooms and asked people to pretend they were playing, capturing it all on video. They captured terabytes of information. That gave them data on different sizes of living rooms, different backgrounds, and different sizes of people. They then went to a Hollywood motion-capture studio and asked them to generate billions of computer-generated images of humans based on the many different hairstyles, clothing, poses, lighting, shapes and sizes the team had collected across the globe. They took all of this data and used it to teach the computer. See examples of the training data on the next slide. (Details highlighted in this article: http://www.popsci.com/gadgets/article/2010-01/exclusive-inside-microsofts-project-natal)
  • Here are some examples of the training data (images of different human poses). The idea was this – if they can feed the computer enough data—in this case, millions of images of people—it can learn for itself how to understand it. That saves programmers the near-impossible task of coding rules that describe all the zillions of possible movements a body can make.
  • http://research.microsoft.com/en-us/projects/DryadLINQ/ DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data parallel applications running on large PC clusters. So, the painstaking task of the Xbox team (the gathering of pictures of people in many different poses) generated the massive amounts of training data. They ran this data through huge clusters of computers (shown here) where the learning “brain” of Kinect resides to “learn” the many different human body movements.
  • The part of Kinect that the player sees looks like a webcam, but it’s the software inside – which Microsoft casually refers to as “the Brain” – that makes sense of the images captured by the camera. It’s programmed to analyze images, look for the basic human form and identify about 30 essential body parts such as your head, torso, hips, knees, elbows and thighs. What’s the brain thinking as it watches you jump around, swinging imaginary bats or head-butting imaginary soccer balls? As you stand in front of the camera, it judges the distance to different points on your body. Then the brain guesses which parts of your body are which. So you can see here in this image that the bold colored boxes are the probable guesses: the green square is the player’s head, the pink and light blue squares are the player’s hands, and so on.
  • Once Kinect has determined that it’s certain enough about enough body parts to pick the most probable skeletal structure, it outputs that shape to a simplified 3D avatar (you can see the avatar images on the bottom right). Then it does this all over again – 30 times a second! As you move, the Kinect “brain” generates all possible skeletal structures at each frame, eventually deciding on, and outputting, the one that is most probable. This thought process takes just a few milliseconds, so there’s plenty of time for the Xbox to take the info and use it to control the game. Here’s the programmer’s view of the different images and the probabilistic matching going on to eventually give you your Kinect avatar!
  • The end result = the game platform is born!
  • Before we start playing, let’s see what type of play space is recommended for Kinect. Kinect needs to be able to see your entire body. Clear the area between the sensor and the players. If there is only one player, stand back 6 feet (1.8 m). If there are two players, stand back 8 feet (2.4 m). Make sure that the play space is at least 6 feet (1.8 m) wide, and not wider or longer than 12 feet (3.6 m).
  • You’ll also need to be sure that the lighting in the room is good enough to detect the players. Good lighting: make sure your room has enough light so that your face is clearly visible and evenly lit; try to minimize side or back lighting, especially from a window; illuminate players from the front, as if you were taking a picture of them; make sure the room is brightly lit. Poor lighting: some lighting conditions can make it difficult for Kinect to identify you or track your movements; for best results, avoid positioning either the players or the sensor in direct sunlight.
  • There are also some clothing considerations to keep in mind. As we learned earlier, the sensor detects points on each player’s body. If clothing is hiding any points on the body – for example, a skirt may be hiding your knees – then the player may have difficulty playing. [review other bullets above]
  • Kinect with more than just games: With Xbox LIVE, a whole world of extraordinary entertainment experiences awaits, including streaming music, HD movies, live sporting events, Facebook, Twitter, Video chat and more. Use your voice or a wave of your hand to: - Video Kinect with others* - Manage your media gallery - Music with Last.fm* - HD movies with Zune - Get in the game with ESPN*
  • Here’s an example of Video Kinect. Two families: one in LA, one in Dallas talking over Kinect using Video.
  • The families watching a video together.
  • You can also navigate through HD movies with Kinect and Zune.
  • Can you think of other great uses for Kinect?
  • Source: iFixit

    1. 1. Where you are the controller
    2. 2. Krishna Kumar, Sr. Developer Evangelist - Academic [email_address]
    3. 3. Started as a $30,000 prototype. Vision: Shift the world from thinking “We need to understand technology” to “Technology needs to understand us”
    4. 4. Why Kinect? • Option A:
    5. 5. Why Kinect? • Option You:
    6. 6. What is Kinect?
    7. 7. What is Kinect? • An extraordinary new way to play, where you are the controller • Voice Recognition • Face Recognition • You Recognition • Gesture Recognition • “Xbox”
    8. 8. Kinect knows what to do! “Xbox?!” “Let’s Play!”
    9. 9. “What are those things?” ① ③ ②
    10. 10. “What are those things?” 3D Depth Sensors ① ③
    11. 11. Projected Invisible IR Pattern
    12. 12. Depth Computation
    13. 13. Depth Map
    14. 14. “What are those things?” RGB Camera ②
    15. 15. “What are those things?” Multi-array Microphone
    16. 16. “What are those things?” Motorized Tilt
    17. 17.
      • Combination of RGB camera, depth sensor and multi-array microphone
        • RGB camera delivers three basic color components
        • Depth sensor “sees” the room in 3-D
        • Microphone locates voices by sound and extracts ambient noise
      • Software makes all the magic possible
        • Skeletal Tracking
        • Face, Gesture Recognition
        • Audio Echo Cancellation
        • Audio Beam Forming
        • Speech Recognition
    18. 19. Scope of Microsoft Research
      • Significant investment: investing > $9B in R&D (MSR & product dev)
      • Staff of over 850 in 55 research areas
      • International research lab locations:
        • Redmond, Washington (Sept. 1991)
        • San Francisco, California (1995)
        • Cambridge, United Kingdom (July 1997)
        • Beijing, People’s Republic of China (Nov. 1998)
        • Mountain View, California (July 2001)
        • Bangalore, India (January 2005)
        • Cambridge, Massachusetts (February 2008)
      • Turning ideas into reality. research.microsoft.com
    19. 20. Scope of Microsoft Research • Research Areas • research.microsoft.com
    20. 21. How does Kinect know what I do? “Xbox?!” “Let’s Play!”
    21. 22. Microsoft Research: Object Recognition. J. Shotton, J. Winn, C. Rother, A. Criminisi. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. European Conference on Computer Vision, 2006
    22. 23. Microsoft Research: Human Body Tracking
      • Wide range of motion
      • But limited agility
      • And not real-time
      • Infinite number of movements
    R. Navaratnam, A. Fitzgibbon, R. Cipolla. The Joint Manifold Model for Semi-supervised Multi-valued Regression. IEEE Intl Conf on Computer Vision, 2007
    23. 24. Xbox calls MSR: September 2008
      • “We need a body tracker with
        • All body motions…
        • All agilities…
        • 10x real-time…
        • For multiple players…
        • …and it has to be 3D”
      • MSR’s response?
    24. 25. Teach the Computer / Machine Learning
      • Step 1: Collect A LOT of data
        • Teams visit households across the globe, filming real users
        • Hollywood motion capture studio generates billions of CG images
    25. 26. Training Data
    26. 27. Training
      • Millions of training images -> millions of classifier parameters
        • Very far from “embarrassingly parallel”
        • New algorithm for distributed decision-tree training
        • Major use of DryadLINQ (available for download)
    Distributed Data-Parallel Computing Using a High-Level Programming Language. M. Isard, Y. Yu. International Conference on Management of Data (SIGMOD), July 2009
    27. 28. Recognize Joint Angles
      • Classify each pixel’s probability of being each of 32 body parts
      • Determine probabilistic cluster of body configurations consistent with those parts
      • Present the most probable to the user
    t=1 t=2 t=3
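The first step above, per-pixel body-part classification, can be sketched as follows. This is a toy illustration only: the real system uses randomized decision forests over depth features, and the probability values and three-part labeling here are made up for demonstration.

```python
def classify_pixels(prob_maps):
    """For each pixel, pick the body-part index with the highest probability.

    prob_maps: list of per-pixel probability lists (one entry per body part).
    Returns the most probable body-part index for each pixel.
    """
    return [max(range(len(p)), key=p.__getitem__) for p in prob_maps]

# Two pixels, three hypothetical body parts (e.g. head=0, hand=1, torso=2):
pixels = [
    [0.1, 0.7, 0.2],   # this pixel is most likely part 1 (hand)
    [0.6, 0.3, 0.1],   # this pixel is most likely part 0 (head)
]
print(classify_pixels(pixels))  # [1, 0]
```

The later stages (clustering consistent body configurations, picking the most probable skeleton) build on exactly this kind of per-pixel labeling, repeated 30 times per second.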
    28. 29. Programmers View
    29. 30. Programmers View
    30. 31. A Platform is Born
    31. 32. Consumer Technologies Push the Envelope • Price: $6000 vs. Price: $150
    32. 33. Play Space: Field of View and Operational Area
      • Play space: ideally you need 12 ft × 12 ft of play space, though you can make do with 10 ft × 10 ft
      • Player position: ideally 6–10 feet away from the camera
    33. 34. Lighting and Environment
      • Fluorescent or LED lighting is recommended
      • No direct light on the player
      • No direct light into the sensor lens
      • In a stage environment, all lights need to be infrared-filtered
      • To avoid lighting noise, do not intersect sensor lens fields of view
      • Avoid playing in or next to reflective surfaces
    34. 35. Clothing Considerations
      • Avoid anything that conceals your arms or legs
      • Avoid wearing flowing clothing such as scarves or long dresses and skirts
        • Long skirts hide the legs, and scarves are often mistaken for arms
      • Avoid baggy jackets or overly baggy clothing
      • Generally, anything that hides the human form should be removed for optimal game play
      • If players with long hair are having difficulty playing, encourage them to pull their hair back and try playing again
    35. 36. Kinect with more than just games
      • Use your voice or a wave of your hand to:
        • Video Kinect with others*
        • Manage your media gallery
          • Music with Last.fm*
          • HD movies with Zune
        • Get in the game with ESPN*
    * with Xbox LIVE Gold membership
    36. 37. Xbox LIVE: More Ways to Connect with Family and Friends
      • Video Kinect
        • Connect with family and far-away friends, all from the comfort of your living room, with Xbox LIVE Video Chat
        • Experience the ease and convenience of chat on the big screen with Kinect-enabled auto camera zoom and pan
      • Family Center
        • Family Center makes it easy to manage multiple user accounts and edit privacy settings from a single location
        • Ensure safe, secure fun for the whole family
      • Social Networks
        • Connect with friends, share photos and updates through Facebook and Twitter
    37. 43. ESPN: Home-field advantage in your living room
      • Access over 3,500 live global events from ESPN3.com, including out-of-market programming plus fresh video clips from ESPN.com
      • Enjoy features like HD programming and on-demand viewing; participate in polls, predictions and trivia
      • See what the Xbox LIVE community is watching and declare what team you’re rooting for
      • With Kinect™, control the action right from your couch with just your voice or the wave of your hand
      • Featured content: NCAA Football, NCAA Basketball, College Bowl Games, NBA, MLB, soccer, golf and tennis majors
    38. 48. Where can Kinect go? “Xbox?”
      • Air Guitar Hero?
      • Shopping in 3D?
      • Remote replacement?
      • Dance instructor?
      • Education?
      • Personal trainer?
      • Physical therapy?
    39. 51. The Kinect SDK
      • Provides both unmanaged and managed APIs
        • Unmanaged API – concepts work in C++
        • Managed API – concepts work in both VB and C#
      • Samples & documentation to get you started
      • Assumes some programming experience
      • http://research.microsoft.com/kinectsdk/
    40. 52. The Kinect Sensor
      • A hybrid device containing the following input devices:
        • A color (RGB) camera
        • A depth sensor
        • A microphone array
        • A tilt sensor
      • Play space control is done through a tilt motor
        • Pitch ±27 degrees
    41. 53. RGB CAMERA MULTI-ARRAY MIC MOTORIZED TILT 3D DEPTH SENSORS
    42. 54. Kinect USB cable
    43. 55. The Innards
    44. 56. The Vision System IR laser projector IR camera RGB camera
    45. 57. Kinect video output • 30 Hz frame rate; 57° field of view • 8-bit VGA RGB, 640 × 480 • 12-bit monochrome, 320 × 240
    46. 58. The Audio System
    47. 59. Demo: Multichannel Echo Cancellation Input Stream (What the mic array hears) Post-MEC (What APIs present) MEC
    48. 60. The Kinect SDK
      • Provides access to:
        • RGB feed
        • Depth feed
        • Skeletal tracking capabilities
        • Audio beam data
        • Speech recognition
    49. 61. Data Streams
      • Color stream at 640×480 resolution; 32 BPP
      • Depth stream at 320×240 resolution; 16 BPP
      • Skeletal joint positions
      • Frame #s, timestamps, tilt sensor data
      • Echo-canceled audio
      • Higher-level systems
        • Speech recognition
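As a rough sanity check on these stream specs, the uncompressed data rates work out as below. This is simple arithmetic over the stated resolutions and bit depths; the actual traffic over the wire is lower, since the sensor does not ship fully uncompressed frames.

```python
def stream_rate(width, height, bits_per_pixel, fps):
    """Raw bytes per second for an uncompressed video stream."""
    return width * height * (bits_per_pixel // 8) * fps

color = stream_rate(640, 480, 32, 30)   # 36,864,000 bytes/s, roughly 35 MB/s
depth = stream_rate(320, 240, 16, 30)   #  4,608,000 bytes/s, roughly 4.4 MB/s
print(color, depth)
```

The color stream alone dominates the bandwidth budget, which is one reason the depth stream runs at a quarter of the color resolution.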
    50. 62. RGB Camera Fundamentals
    51. 63. Camera Data
    52. 64. RGB Stream Format
      • Up to 640×480 resolution
      • Up to 32 bits per pixel
      • Data contained in ImageFrame.Image.Bits
      • Array of bytes: public byte[] Bits;
      • Array
        • Starts at top left of image
        • Moves left to right, then top to bottom
    53. 65. Stride • Stride – # of bytes from one row of pixels in memory to the next
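How stride indexes into the flat byte array can be sketched as follows. This assumes a tightly packed buffer (stride = width × bytes per pixel, no row padding); real image buffers sometimes pad rows, in which case the reported stride must be used instead of computing it.

```python
def pixel_offset(x, y, width, bytes_per_pixel):
    """Byte offset of pixel (x, y) in a top-left-origin, row-major buffer."""
    stride = width * bytes_per_pixel   # bytes from one row of pixels to the next
    return y * stride + x * bytes_per_pixel

# 640-wide frame at 32 bits (4 bytes) per pixel: stride is 2560 bytes,
# so pixel (10, 2) starts at 2*2560 + 10*4 = 5160.
print(pixel_offset(10, 2, 640, 4))  # 5160
```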
    54. 66. Demos::RGB Camera
    55. 67. Depth Camera Fundamentals
    56. 68. Camera Data
    57. 69. Depth Map Format
      • 320×240 resolution
      • 16 bits per pixel
        • Upper 13 bits: depth in mm (800 mm to 4000 mm range)
        • Lower 3 bits: segmentation mask
      • Depth value 0 means unknown
        • Shadows, low reflectivity, and high reflectivity are among the reasons
      • Segmentation index
        • 0 – no player
        • 1 – skeleton 0
        • 2 – skeleton 1
        • …
58. Depth Byte Buffer <ul><li>ImageFrame.Image.Bits </li></ul><ul><li>Array of bytes: public byte [] Bits; </li></ul><ul><li>Array </li></ul><ul><ul><li>Starts at top left of image </li></ul></ul><ul><ul><li>Moves left to right, then top to bottom </li></ul></ul><ul><ul><li>Each two-byte pair represents the distance for one pixel </li></ul></ul>
59. Calculating Distance <ul><li>2 bytes per pixel (16 bits) </li></ul><ul><li>Depth – distance per pixel </li></ul><ul><ul><li>Shift the second byte left by 8 </li></ul></ul><ul><ul><li>Distance (0,0) = ( int )(Bits[0] | Bits[1] << 8 ); </li></ul></ul><ul><li>DepthAndPlayerIndex – includes the player index </li></ul><ul><ul><li>Shift the first byte right by 3 (dropping the player index) and the second byte left by 5 </li></ul></ul><ul><ul><li>Distance (0,0) = ( int )(Bits[0] >> 3 | Bits[1] << 5 ); </li></ul></ul>
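The two bit-shifting formulas above can be checked with a small language-neutral sketch (Python here for brevity; the SDK code is C#). The byte values in the example are made up to exercise the decoding:

```python
# Decoding 16-bit little-endian depth values, per the formats described above.

def depth_only(b0, b1):
    # Depth format: all 16 bits are distance in millimetres.
    return b0 | (b1 << 8)

def depth_and_player(b0, b1):
    # DepthAndPlayerIndex format: lower 3 bits hold the player index,
    # upper 13 bits hold distance in millimetres.
    player = b0 & 0x07
    distance = (b0 >> 3) | (b1 << 5)
    return distance, player

# 1234 mm tagged with player index 2: raw = (1234 << 3) | 2 = 0x2692
assert depth_and_player(0x92, 0x26) == (1234, 2)
```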
60. Demos::Depth Camera
61. Skeletal Tracking Fundamentals
62. Human Depth Sensing. Pattern similarity between the two eyes' views of an object determines disparity.
63. Kinect Depth Sensing. Similarity between the projected IR pattern (IR Projector) and the observed pattern (IR Camera) determines disparity.
64. Provided Data
65. Pipeline Architecture
66. Skeleton API
67. Joints <ul><li>Maximum of two players tracked at once </li></ul><ul><ul><li>Six player proposals </li></ul></ul><ul><li>Each player has a set of <x, y, z> joints, in meters </li></ul><ul><li>Each joint has an associated state </li></ul><ul><ul><li>Tracked, Not Tracked, or Inferred </li></ul></ul><ul><li>Inferred – occluded, clipped, or low-confidence joints </li></ul><ul><li>Not Tracked – rare, but your code must check for this state </li></ul>
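A hedged sketch of the "check the joint state" advice above. This is not the SDK's API (which is .NET); the joint names and data shapes here are invented purely for illustration:

```python
# Sketch: defensively filtering joints by tracking state before use.
# States mirror the three described above; structures are hypothetical.
TRACKED, INFERRED, NOT_TRACKED = "Tracked", "Inferred", "NotTracked"

def usable_joints(joints):
    """Keep joints whose positions can be used. Inferred joints (occluded,
    clipped, or low confidence) are kept here, but a caller could be
    stricter; NotTracked joints are always discarded."""
    return {name: (x, y, z)
            for name, (state, (x, y, z)) in joints.items()
            if state in (TRACKED, INFERRED)}

skeleton = {
    "Head":      (TRACKED,     (0.1, 0.9, 2.0)),
    "HandLeft":  (INFERRED,    (0.3, 0.5, 2.1)),
    "FootRight": (NOT_TRACKED, (0.0, 0.0, 0.0)),
}
assert "FootRight" not in usable_joints(skeleton)
```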
68. Provided Data <ul><li>Depth and segmentation map </li></ul>
69. Depth Map Format <ul><li>320 x 240 resolution </li></ul><ul><li>16 bits per pixel </li></ul><ul><ul><li>Upper 13 bits: depth in mm; 800 mm to 4000 mm range </li></ul></ul><ul><ul><li>Lower 3 bits: segmentation mask </li></ul></ul><ul><li>Depth value 0 means unknown </li></ul><ul><ul><li>Shadows, low reflectivity, and high reflectivity are among the causes </li></ul></ul><ul><li>Segmentation index </li></ul><ul><ul><li>0 – no player </li></ul></ul><ul><ul><li>1 – skeleton 0 </li></ul></ul><ul><ul><li>2 – skeleton 1 </li></ul></ul><ul><ul><li>… </li></ul></ul>
70. Demos::Skeletal Tracking
71. Audio Fundamentals
72. Going Inside the Kinect <ul><li>Four-microphone array with hardware-based audio processing </li></ul><ul><ul><li>Multichannel echo cancellation (MEC) </li></ul></ul><ul><ul><li>Sound position tracking </li></ul></ul><ul><ul><li>Other digital signal processing (noise suppression and reduction) </li></ul></ul>
73. Audio Data
74. Speech Recognition <ul><li>Grammar – what we are listening for </li></ul><ul><ul><li>Code – GrammarBuilder, Choices </li></ul></ul><ul><ul><li>Speech Recognition Grammar Specification (SRGS) </li></ul></ul><ul><ul><ul><li>C:\Program Files (x86)\Microsoft Speech Platform SDK\Samples\Sample Grammars </li></ul></ul></ul><ul><li>Note: Set AutomaticGainControl = false </li></ul>
75. Grammar <ul><li><!-- Confirmation_YesNo._value: string [&quot;Yes&quot;, &quot;No&quot;] --> </li></ul><ul><li>< rule id =&quot;Confirmation_YesNo&quot; scope =&quot;public&quot;> </li></ul><ul><li>< example > yes </ example > </li></ul><ul><li>< example > no </ example > </li></ul><ul><li>< one-of > </li></ul><ul><li>< item >< ruleref uri =&quot;#Confirmation_Yes&quot; /></ item > </li></ul><ul><li>< item >< ruleref uri =&quot;#Confirmation_No&quot; /></ item > </li></ul><ul><li></ one-of > </li></ul><ul><li>< tag > out = rules.latest() </ tag > </li></ul><ul><li></ rule > </li></ul><!-- Confirmation_Yes._value: string [&quot;Yes&quot;] --> < rule id =&quot;Confirmation_Yes&quot; scope =&quot;public&quot;> < example > yes </ example > < example > yes please </ example > < one-of > < item > yes </ item > < item > yeah </ item > < item > yep </ item > < item > ok </ item > </ one-of > < item repeat =&quot;0-1&quot;> please </ item > < tag > out._value = &quot;Yes&quot;; </ tag > </ rule >
76. Demos::Audio
77. [email_address]
