Kinect Sensor(s)
“The Kinect sensor is a horizontal bar connected to a small base with a motorized pivot and is designed to be positioned lengthwise above or below the video display. The device features an "RGB camera, depth sensor and multi-array microphone running proprietary software", which provide full-body 3D motion capture, facial recognition and voice recognition capabilities. The depth sensor consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures video data in 3D under any ambient light conditions. The sensing range of the depth sensor is adjustable.”
Source: http://en.wikipedia.org/wiki/Kinect
Sensors
• Infrared laser projector + infrared camera → depth magic chip (data stream)
• RGB camera → image chip (video stream)
• Multi-array mic
• Motorised tilt
Sensors, #1: RGB
๏ Image chip, 640 × 480 @ 30 frames/sec.
๏ The RGB camera can be used for recording video and to assist with tasks such as facial recognition.
๏ Horizontal field of view: 57º; vertical field of view: 43º; physical tilt range: 27º.
Sensors, #2: Depth
๏ Depth magic chip, 640 × 480, 12 bits (data array) @ 30 frames/sec; range: 1.2 m - 3.5 m.
๏ Projects a pattern of dots with a near-infrared laser over the scene. A detector establishes the parallax shift of the dot pattern for each pixel.
Depth. Uh?
๏ Depth is computed by comparing the distance between a (uniquely identifiable) dot in image 1 and the same dot in image 2: the larger the shift, the greater the depth.
๏ Image 1: a "magic" algorithm generates the pattern of dots that will be sent by the infrared projector; in a sense a virtual image.
๏ Image 2: what the infrared camera sees. Both images feed the depth magic chip.
Stereo image from: http://gigantico.squarespace.com/336554365346/2007/6/21/virtual-reality-part-1.html
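To make the matching step concrete, here is a toy 1-D sketch (an illustration only, not Kinect's actual algorithm; `find_shift` and its parameters are invented for this example). It locates a uniquely identifiable dot group from image 1 inside a row of image 2 and returns its horizontal shift, i.e. the parallax:

```python
def find_shift(reference, observed, pattern_pos, window=3, max_shift=10):
    """Toy 1-D block matching: find where the reference window reappears
    in the observed row and return its horizontal shift (the parallax)."""
    patch = reference[pattern_pos:pattern_pos + window]
    best_shift, best_err = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pos = pattern_pos + s
        if pos < 0 or pos + window > len(observed):
            continue  # window would fall outside the observed row
        err = sum((a - b) ** 2 for a, b in zip(patch, observed[pos:pos + window]))
        if err < best_err:
            best_shift, best_err = s, err
    return best_shift

# A dot group from the projected pattern, shifted right by 4 pixels
# in what the infrared camera observes:
ref = [0, 0, 9, 1, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0]
obs = [0, 0, 0, 0, 0, 0, 9, 1, 7, 0, 0, 0, 0, 0]
print(find_shift(ref, obs, pattern_pos=2))  # prints 4
```

The real sensor does this densely, per pixel, against the known virtual pattern, which is why each dot group must be uniquely identifiable.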
Sensors, #3: Infrared
๏ Image chip, 640 × 480 @ 30 frames/sec.
๏ Images of the random dot pattern projected by the near-infrared laser.
๏ Switch off other infrared sources (halogen, sunlight) to see them.
Sensors, #4: Voice
๏ 16 bits @ 16 kHz.
๏ Not covered in this talk.
BW + RGB images
๏ By default, depth information is provided as an RGB image: each colour represents a layer of depth (red = close, blue = far away).
๏ Good for Flash. Flash is good with pictures but doesn't excel at looping through numbers.
Demo
๏ Modified AS3 server:
• video2rgba: RGB to RGBA mapping (prefixing with 0xFF for alpha).
• depth2heatmap: depth to RGBA mapping.
• depth2grayscale: depth to grayscale mapping; the scale from 0...2048 is reduced to 0...255.
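The mapping logic can be sketched as follows (a hypothetical Python rendering, not the actual AS3 server code; the 32-bit packing mirrors the 0xFF alpha prefix mentioned above):

```python
def video2rgba(r, g, b):
    """Pack an RGB triple into a 32-bit pixel, prefixing 0xFF for alpha."""
    return (0xFF << 24) | (r << 16) | (g << 8) | b

def depth2grayscale(raw):
    """Reduce an 11-bit depth value (0...2047) to one grey byte (0...255)
    and replicate it across the three colour channels."""
    grey = raw >> 3  # 2048 depth levels / 8 = 256 grey levels
    return video2rgba(grey, grey, grey)

print(hex(video2rgba(0x12, 0x34, 0x56)))  # prints 0xff123456
print(hex(depth2grayscale(2047)))         # prints 0xffffffff
```

Doing this mapping server-side suits the Flash constraint noted above: the client receives ready-made pixels instead of looping over raw depth numbers.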
Resolution vs meters
๏ The depth data provided by the Kinect follow a roughly logarithmic curve: high resolution in the middle of the range, low resolution close to and far away from the camera.
๏ A linear estimate, in meters, can be computed with: 0.1236 * tan(rawDisparity / 2842.5 + 1.1863)
http://openkinect.org/w/index.php?title=Imaging_Information
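The formula can be wrapped as a small helper (a sketch; the function name is mine, the constants are those from the openkinect page linked above):

```python
import math

def raw_disparity_to_meters(raw):
    """Approximate metric depth for an 11-bit Kinect raw disparity value."""
    return 0.1236 * math.tan(raw / 2842.5 + 1.1863)

# Equal raw steps cover more and more metres as distance grows:
for raw in (600, 800, 1000):
    print(raw, round(raw_disparity_to_meters(raw), 2))
```

The estimate blows up as rawDisparity approaches roughly 1090, where the tangent's argument nears π/2; that divergence is the low far-field resolution mentioned above.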
Hand tracking
๏ TuioKinect tracks simple hand gestures using the Kinect controller and sends control data based on the TUIO protocol. This allows the rapid creation of gesture-enabled applications.
Using TUIO from AS3
https://github.com/silviopaganini/openKinect-as3-experiments
Documentation, parameters: http://nuigroup.com/?ACT=28&fid=33&aid=703_FKSuKMcUkTAJrwBGk2Qv
Note: getting 0 for z or Z in the current implementation.
Body-Pose
๏ What MS uses for its games is machine learning.
“The idea was that we would teach the computer with lots of different people of lots of different shapes and sizes in lots of different poses and the computer will learn how to distinguish one part of your body from another part,” he said. “Since the Kinect camera includes depth information, it can distinguish between big people a long way away and small people up close.”
http://blogs.wsj.com/tech-europe/2010/11/08/key-kinect-technology-devised-in-cambridge-lab/
For publicly available methods: http://openkinect.org/wiki/Research_Material