2011 05 23 Kinect for Windows SDK
  • http://research.microsoft.com/apps/video/default.aspx?id=139295
  • http://research.microsoft.com/en-us/projects/DryadLINQ/ DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data parallel applications running on large PC clusters.
  • The name blends "kinetic," which means to be in motion, and "connect," which means it "connects you to the friends and entertainment you love." Natural User Interface. Making Beginners Feel Like Experts.
  • Color VGA video camera: This video camera aids in facial recognition and other detection features by detecting three color components: red, green and blue. Microsoft calls this an "RGB camera," referring to the color components it detects.
    Depth sensor: An infrared projector and a monochrome CMOS (complementary metal-oxide semiconductor) sensor work together to "see" the room in 3-D regardless of the lighting conditions. CMOS (pronounced /ˈsiːmɒs/) is a technology for constructing integrated circuits. It is used in microprocessors, microcontrollers, static RAM, and other digital logic circuits, and also in several analog circuits such as image sensors, data converters, and highly integrated transceivers for many types of communication.
    Multi-array microphone: This is an array of four microphones that can isolate the voices of the players from the noise in the room. This allows the player to be a few feet away from the microphone and still use voice controls.
    What comes in the box: Kinect sensor for Xbox 360; power supply cable; user's manual; Wi-Fi extension cable; Kinect Adventures game.
    Color VGA motion camera: 640 x 480 pixel resolution at 30 FPS. Depth camera: 640 x 480 pixel resolution at 30 FPS. Array of 4 microphones supporting single-speaker voice recognition.
    http://electronics.howstuffworks.com/microsoft-kinect3.htm
    http://www.popsci.com/gadgets/article/2010-01/exclusive-inside-microsofts-project-natal
    Kinect Software Learns from "Experience"
    Kinect's software layer is the essential component to add meaning to what the hardware detects. When you first start up Kinect, it reads the layout of your room and configures the play space you'll be moving in. Then, Kinect detects and tracks 48 points on each player's body, mapping them to a digital reproduction of that player's body shape and skeletal structure, including facial details [source: Rule].
    In an interview with Scientific American, Alex Kipman, Microsoft's Director of Incubation for Xbox 360, explains Project Natal's approach to developing the Kinect software. Kipman explains, "Every single motion of the body is an input," which creates seemingly endless combinations of actions [source: Kuchinskas]. Knowing this, developers decided not to program that seemingly endless combination into pre-established actions and reactions in the software. Instead, they would "teach" the system how to react based on how humans learn: by classifying the gestures of people in the real world.
    To start the teaching process, Kinect developers gathered massive amounts of data from motion capture in real-life scenarios. Then, they processed that data using a machine-learning algorithm by Jamie Shotton, a researcher at Microsoft Research Cambridge in England. Ultimately, the developers were able to map the data to models representing people of different ages, body types, genders and clothing. With select data, developers were able to teach the system to classify the skeletal movements of each model, emphasizing the joints and distances between those joints. An article in Popular Science describes the four steps Kinect's "brain" goes through 30 times per second to read and respond to your movements [source: Duffy].
    The Kinect software goes a step further than just detecting and reacting to what it can "see." Kinect can also distinguish players and their movements even if they're partially hidden. Kinect extrapolates what the rest of your body is doing as long as it can detect some parts of it. This allows players to jump in front of each other during a game or to stand behind pieces of furniture in the room.
  • Depth sensor. An infrared projector combined with a monochrome CMOS sensor allows Kinect to see the room in 3-D (as opposed to inferring the room from a 2-D image) under any lighting conditions.
  • A 320×240 depth stream. Depth is recovered by projecting invisible infrared (IR) dots into a room. On a hardware level, the way the optical system works is fairly basic: a class 1 laser is projected into the room, and the sensor detects what is going on based on what is reflected back at it. Together, the projector and sensor create a depth map. The regular video camera is held at a specific distance from the 3D part of the optical system in precise alignment, so that Kinect can blend together the depth map and RGB picture for dynamic, on-the-fly green screening.
  • RGB camera. Kinect has a video camera that delivers the three basic color components. As part of the Kinect sensor, the RGB camera helps enable facial recognition and more.
  • Four different microphones allow Kinect to figure out where the sound is coming from.
  • Multiarray microphone. Kinect has a microphone that is able to locate voices by sound and extract ambient noise. The multiarray microphone enables headset-free Xbox LIVE party chat and more.
  • Microsoft software. A proprietary software layer makes the magic of Kinect possible. This layer differentiates Kinect from any other technology on the market through its ability to enable human body recognition and extract other visual noise.
  • Micron-scale tolerances on large components. Manufacturing process to yield ~1 device / 1.5 seconds.
  • 2011 05 23 Kinect for Windows SDK

    1. 1. Bruno Capuano<br />MVP – Visual Studio ALM<br />b.capuano@avanade.com<br />Avanade<br />www.elbruno.com<br />Kinect for Windows SDK<br />Spring 2011<br />Not yet :(<br />
    2. 2. Xbox calls MSR: September 2008<br />We need a body tracker with<br />All body motions…<br />All agilities…<br />10x real-time…<br />For multiple players…<br />… and it has to be 3D<br />
    3. 3. Letter to the Three Wise Men (our wish list)<br />No calibration<br /><ul><li>No pose to start/pause</li><li>No background calibration</li><li>No body calibration</li></ul>Minimal CPU usage<br />Independent of lighting<br />
    6. 6. Testing: the test matrix<br />body size<br />hair<br />FOV<br />body type<br />clothes<br />angle<br />pets<br />furniture<br />
    7. 7. MSR & Xbox: Machine Learning<br />Step 1: Data collection<br />The team visits different locations and films real Xbox users<br />A Hollywood motion capture studio generates billions of CG images<br />
    8. 8. MS Research: Object Recognition<br />J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. European Conference on Computer Vision, 2006<br />
    9. 9. Training<br />Millions of reference images -> millions of classification parameters<br />Very far from “embarrassingly parallel”<br />New algorithm for solving distributed decision trees<br />Heavy use of DryadLINQ<br />Available for download<br />Distributed Data-Parallel Computing Using a High-Level Programming Language<br />M. Isard, Y. Yu<br />International Conference on Management of Data (SIGMOD), July 2009<br />
    10. 10. Programmers View<br />
    11. 11. What is Kinect?<br />
    12. 12. What is Kinect?<br />A new way to play, where YOU are the controller<br />Voice Recognition<br />Gesture Recognition<br />Face Recognition<br />You Recognition<br />
    13. 13. Why Kinect?<br />Option A:<br />
    14. 14. Why Kinect?<br />Option YOU:<br />
    15. 15. What is Kinect?<br />A device that combines an RGB camera, a depth sensor and a microphone array<br />RGB camera that captures the three basic color components<br />Depth sensor that lets it “see a room in 3D”<br />The microphone array detects voices and isolates them from ambient noise<br />A software black box that ties everything together and does all the magic<br />
    16. 16. What is Kinect?<br />①<br />③<br />②<br />
    17. 17. What is Kinect?<br />Source: iFixit<br />
    18. 18. What is Kinect?<br />3D Depth Sensors<br />①<br />③<br />
    19. 19. Invisible Infrared (IR) Dots <br />320x240<br />
    20. 20. What is Kinect?<br />RGB Camera<br />②<br />
    21. 21. What is Kinect?<br />IR laser projector<br />IR camera<br />RGB camera<br />Source: iFixit<br />
    22. 22. RGB Camera<br />Used for facial recognition<br />Facial recognition requires a “training” phase<br />Needs good lighting<br />
    23. 23. What is Kinect?<br />Multi-array Microphone<br />
    24. 24. Sound sensors<br /><ul><li>4-channel multi-array microphone</li><li>Synchronized with the console to cancel out game audio</li></ul>
    25. 25. What is Kinect?<br />Motorized Tilt<br />
    26. 26. And the “black box”<br />Software<br />Research<br />Testing<br />Data collection<br />
    27. 27. PrimeSense Chip<br />Xbox Hardware Engineering significantly improved quality and speed, building on PrimeSense's designs<br />
    28. 28. Projected IR pattern<br />Source: www.ros.org<br />
    29. 29. Depth computation<br />Source: http://j.mp/eXsCiE<br />
    30. 30. Depth map<br />Source: www.insidekinect.com<br />
    31. 31. Kinect video output<br />30 Hz frame rate<br />57° field of view<br />8-bit VGA RGB, 640 x 480<br />11-bit monochrome, 320 x 240<br />
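As a rough check on the figures in this slide, the raw data rate of each stream can be estimated from resolution, bit depth and frame rate. The sketch below assumes the RGB stream carries three 8-bit channels per pixel (the slide only says "8-bit") and ignores any packing or compression on the wire. The function name is illustrative, not part of any SDK.

```python
def raw_rate_mbit_per_s(width, height, bits_per_pixel, fps):
    """Uncompressed stream rate in megabits per second (10**6 bits)."""
    return width * height * bits_per_pixel * fps / 1e6

# Assumption: "8-bit" RGB means 8 bits per channel, 3 channels per pixel.
rgb_rate = raw_rate_mbit_per_s(640, 480, 3 * 8, 30)
# Depth: 11 bits per pixel at 320 x 240, as stated on the slide.
depth_rate = raw_rate_mbit_per_s(320, 240, 11, 30)

print(f"RGB:   {rgb_rate:.1f} Mbit/s")
print(f"Depth: {depth_rate:.1f} Mbit/s")
```

Under these assumptions the color stream dominates the raw bandwidth by roughly a factor of nine.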
    32. 32. Xbox 360 Hardware<br /><ul><li>Triple Core PowerPC 970, 3.2GHz</li><li>Hyperthreaded, 2 threads/core</li><li>500 MHz ATI graphics card</li><li>DirectX 9.5</li><li>512 MB RAM</li><li>2005 performance envelope</li><li>Must handle real-time vision AND a modern game</li></ul>Source: http://www.pcper.com/article.php?aid=940&type=expert<br />
    41. 41. How does Kinect work? (I)<br />
    42. 42. 1- How does Kinect know what I am doing?<br />“Xbox?!”<br />“Let’s Play!”<br />
    43. 43. Extensible architecture<br />Sensor -> raw data -> Expert 1, Expert 2, Expert 3 -> skeleton estimates -> Arbiter (fuses the hypotheses) -> final estimate<br />Diagram labels: probabilistic, stateless, stateful<br />
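The expert/arbiter arrangement in this slide can be sketched as follows. This is only an illustration: the slide does not say how the arbiter fuses hypotheses, so the confidence-weighted average, the `smoothing` parameter and the class name are assumptions, as is treating the arbiter as the stateful component that remembers the previous frame.

```python
class Arbiter:
    """Hypothetical fuser that keeps the previous estimate between frames."""

    def __init__(self, smoothing=0.5):
        self.smoothing = smoothing  # assumed blend factor with the last frame
        self.previous = None

    def fuse(self, hypotheses):
        """hypotheses: list of ((x, y, z), confidence), one per expert."""
        total = sum(conf for _, conf in hypotheses)
        if total == 0:
            return self.previous  # no usable hypothesis this frame
        # Confidence-weighted average of the experts' positions.
        fused = tuple(
            sum(pos[i] * conf for pos, conf in hypotheses) / total
            for i in range(3)
        )
        if self.previous is not None:
            a = self.smoothing
            fused = tuple(a * p + (1 - a) * f
                          for p, f in zip(self.previous, fused))
        self.previous = fused
        return fused
```

Each expert would be a plain function from raw data to a `(position, confidence)` pair, which keeps experts easy to add or remove without touching the arbiter.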
    44. 44. Recognition step by step<br />Sensor<br />Depth map<br />Per-player separation based on the background<br />Body part classification<br />Joint identification<br />Skeleton creation<br />
    45. 45. Step 1: “Body” to “Joint Positions”<br />Analyzes the 3D positions of all identified body parts<br />Generates a collection of (position, confidence) per part<br />Generates multiple options for each body part<br />The work is done by the GPU<br />
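A minimal sketch of turning labeled 3D points into one (position, confidence) pair per body part. It is a simplification under stated assumptions: it produces a single proposal per part (the slide mentions several), uses a probability-weighted centroid rather than whatever the GPU implementation does, and the function name and input layout are hypothetical.

```python
from collections import defaultdict

def joint_proposals(pixels):
    """pixels: iterable of (part_name, (x, y, z), probability).

    Returns {part_name: ((x, y, z), confidence)} where the position is the
    probability-weighted centroid and the confidence is the summed weight.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])  # wx, wy, wz, w
    for part, (x, y, z), p in pixels:
        acc = sums[part]
        acc[0] += p * x
        acc[1] += p * y
        acc[2] += p * z
        acc[3] += p
    return {
        part: ((wx / w, wy / w, wz / w), w)
        for part, (wx, wy, wz, w) in sums.items()
        if w > 0  # skip parts with no supporting evidence
    }
```

For example, two "hand" points at x = 0 and x = 2 with equal probability yield a proposal halfway between them.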
    46. 46. Step 2: “Joint Positions” to “Skeleton”<br />Based on 3 “Skeleton” models<br />The process relies on:<br />Distance between connected points (relative to “body size”)<br />Closeness of the bones to the body parts<br />Also applies patterns for “smoothness”<br />
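The first criterion above, distance between connected points relative to body size, can be illustrated with a small scoring function. The bone table, its fractions and the function names are hypothetical values chosen for the example; the slide does not give the actual skeleton models, and the bone-proximity and smoothness terms are left out.

```python
import math

# Hypothetical model: bone -> expected length as a fraction of body height.
BONES = {("shoulder", "elbow"): 0.17, ("elbow", "hand"): 0.15}

def bone_cost(joints, body_height):
    """Sum of squared deviations between measured and expected bone lengths."""
    cost = 0.0
    for (a, b), fraction in BONES.items():
        measured = math.dist(joints[a], joints[b])
        cost += (measured - fraction * body_height) ** 2
    return cost

def best_skeleton(candidates, body_height):
    """Pick the candidate joint assignment with the lowest bone-length cost."""
    return min(candidates, key=lambda joints: bone_cost(joints, body_height))
```

A candidate whose hand proposal lies far from the elbow gets a high cost and loses to a candidate with plausible limb lengths.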
    47. 47. How is the “skeleton” defined?<br />
    48. 48. Examples<br />
    49. 49. KINECT FOR WINDOWS SDK<br />
    50. 50.
    51. 51. KINECT FOR WINDOWS SDK<br />Presented at MIX11<br />Will allow development on the .NET Framework in C#, Visual Basic .NET and C++<br />Expected release date: “spring 2011”<br />
    52. 52. Bruno Capuano<br />MVP – Visual Studio ALM<br />b.capuano@avanade.com<br />Avanade<br />www.elbruno.com<br />Kinect for Windows SDK<br />Spring 2011<br />Not yet :(<br />
    53. 53. How does Kinect work? (II)<br />APPENDIX<br />
    54. 54. Preprocessing<br /><ul><li>Identify the floor (ground plane)</li><li>Isolate the background (e.g. isolate a sofa)</li><li>Identify the players</li></ul>
    56. 56. 2 trackers<br />Head and hand tracking<br />Body tracking<br />not exposed through SDK<br />
    57. 57. The body tracking problem<br />Classifier<br />Input: depth map<br />Output: body parts<br />Runs on GPU @ 320x240<br />
    58. 58. Training Kinect<br />Starts from ground-truth data<br />Aligned with body parts<br />Kinect needs to be trained to work with<br />Poses<br />Position within the scene<br />Body sizes and shapes<br />
    59. 59. Training Kinect<br />Use synthetic data (3D avatar model)<br /><ul><li>Inject noise</li></ul>
    61. 61. Training Kinect<br />Motion Capture:<br /><ul><li>Unrealistic environments</li><li>Unrealistic clothing</li><li>Low throughput</li></ul>
    62. 62. Training Kinect<br />Manual Tagging:<br /><ul><li>Requires training many people</li><li>Potentially expensive</li><li>Tagging tool influences biases in data</li><li>Quality control is an issue</li><li>1000 hrs @ 20 contractors ~= 20 years</li></ul>
    66. 66. Training Kinect<br />Amazon Mechanical Turk:<br /><ul><li>Build web-based tool</li><li>Tagging tool is 2D only</li><li>Quality control can be done with redundant HITs</li><li>2000 frames/hr @ $0.04/HIT -> 6 yrs @ $80/hr</li></ul>
    69. 69. Classifying pixel by pixel<br />Compute $P(c_i \mid w_i)$<br />pixels $i = (x, y)$<br />body part $c_i$<br />image window $w_i$<br />Learn classifier $P(c_i \mid w_i)$ from training data<br />randomized decision forests<br />example image windows<br />window moves with classifier<br />
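To make the per-pixel classifier more concrete, here is a toy decision forest that estimates $P(c_i \mid w_i)$ for one pixel. The tree encoding (tuples for split nodes, dicts for leaves) is invented for this example, and averaging the leaf distributions across trees is the usual way forests are combined, assumed here rather than taken from the slide. The `features` list stands in for values computed from the image window around the pixel.

```python
def classify_tree(node, features):
    """Walk one tree down to a leaf.

    Split nodes are (feature_index, threshold, left, right);
    leaves are dicts mapping body part -> probability.
    """
    while not isinstance(node, dict):
        index, threshold, left, right = node
        node = left if features[index] < threshold else right
    return node

def classify_forest(forest, features):
    """Average the leaf distributions of all trees."""
    totals = {}
    for tree in forest:
        for part, p in classify_tree(tree, features).items():
            totals[part] = totals.get(part, 0.0) + p
    return {part: p / len(forest) for part, p in totals.items()}

# Two tiny hand-written trees, for illustration only.
forest = [
    (0, 0.5, {"hand": 1.0}, {"head": 1.0}),
    (1, 0.0, {"hand": 0.5, "head": 0.5}, {"head": 1.0}),
]
print(classify_forest(forest, [0.2, -1.0]))
```

In training, the split features and thresholds would be learned from the labeled images described in the preceding slides; here they are fixed by hand.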
    70. 70. Features<br />$f_\theta(I, x) = d_I\!\left(x + \frac{u}{d_I(x)}\right) - d_I\!\left(x + \frac{v}{d_I(x)}\right)$<br />$d_I(x)$ -- depth of pixel $x$ in image $I$<br />$\theta = (u, v)$ -- parameter describing offsets $u$ and $v$<br />
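The depth-comparison feature on this slide can be evaluated as in the sketch below. The depth map is a plain list of rows, and three choices are assumptions not stated on the slide: probes that fall outside the image read as a large constant, offsets are rounded to the nearest pixel, and every depth value is positive (so the division is safe).

```python
BACKGROUND_DEPTH = 1e6  # assumption: probes outside the image read as "very far"

def depth_at(depth, x, y):
    """Depth at pixel (x, y), or BACKGROUND_DEPTH when outside the image."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        return depth[y][x]
    return BACKGROUND_DEPTH

def depth_feature(depth, pixel, u, v):
    """f_theta(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x))."""
    x, y = pixel
    d = depth_at(depth, x, y)  # assumed > 0
    # Dividing the offsets by the depth at x shrinks them for distant pixels,
    # so the probes cover a similar real-world distance at any range.
    ux, uy = round(x + u[0] / d), round(y + u[1] / d)
    vx, vy = round(x + v[0] / d), round(y + v[1] / d)
    return depth_at(depth, ux, uy) - depth_at(depth, vx, vy)

depth_map = [
    [2.0, 2.0, 4.0],
    [2.0, 2.0, 4.0],
]
print(depth_feature(depth_map, (0, 0), u=(4, 0), v=(2, 0)))
```

Each feature is only two depth lookups and a subtraction, which fits the slide's point that the classifier runs per pixel on the GPU.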
