• Kinect (codenamed in development as Project Natal) is
a motion sensing input device by Microsoft for the Xbox
360 video game console and Windows PCs.
• Based around a webcam-style add-on peripheral for the
Xbox 360 console, it enables users to control and
interact with the Xbox 360 without the need to touch
a game controller, through a natural user interface using
gestures and spoken commands.
• Kinect was launched in North America on November 4, 2010, in
Europe on November 10, 2010, in Australia, New Zealand and
Singapore on November 18, 2010, and in Japan on November 20, 2010.
• The Kinect claimed the Guinness World Record of being the
"fastest selling consumer electronics device" after selling a
total of 8 million units in its first 60 days. 24 million units of
the Kinect sensor had been shipped as of January 2012.
• Microsoft released the Kinect software development kit for
Windows 7 on June 16, 2011. This SDK was meant to allow
developers to write Kinecting apps in C++/CLI, C#, or Visual
Basic .NET.
The Kinect uses structured light and machine learning.
Inferring body position is a two-stage process: first compute a
depth map (using structured light), then infer body position (using
machine learning).
The results are great!
The system uses many college-level math concepts, and
demonstrates the remarkable advances in computer vision in the
last 20 years.
Structured light general principle:
project a known pattern onto the
scene and infer depth from the
deformation of that pattern
The Kinect uses infrared
laser light, with a speckle
pattern.
The depth map is constructed by analyzing a speckle pattern of infrared laser light.
The Kinect uses an infrared projector and sensor; it does not use its RGB camera for depth
The technique of analyzing a known pattern is called structured light.
The Kinect combines structured light with two classic computer vision techniques: depth from
focus, and depth from stereo.
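The structured light idea above can be sketched in a few lines. This is only a toy, one-dimensional illustration and not the Kinect's actual matching algorithm: the names `best_shift`, `reference` and `observed` are made up for this example, and the random 0/1 row merely stands in for a speckle pattern. Because the projected pattern is known, the sensor only has to find how far it has moved.

```python
import random

def best_shift(reference, observed, max_shift):
    """Return the shift s in 0..max_shift that best aligns the known
    reference pattern with the observed row (sum of absolute differences)."""
    n = len(reference) - max_shift  # number of samples compared at each shift
    best, best_cost = 0, float("inf")
    for s in range(max_shift + 1):
        cost = sum(abs(reference[i] - observed[i + s]) for i in range(n))
        if cost < best_cost:
            best, best_cost = s, cost
    return best

random.seed(0)
# Toy stand-in for one row of the known speckle pattern (assumption).
reference = [random.randint(0, 1) for _ in range(64)]
true_shift = 5
# The observed row is the same pattern displaced by true_shift pixels.
observed = [0] * true_shift + reference

shift = best_shift(reference, observed, max_shift=10)
print("estimated shift in pixels:", shift)
```

A real sensor would do this in two dimensions, per small window, and with sub-pixel precision; the sketch only shows why a known pattern makes the shift easy to measure.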
DEPTH FROM FOCUS
Depth from focus uses the principle that stuff that is more blurry is further away.
The Kinect dramatically improves the accuracy of traditional depth from focus.
The Kinect uses a special (“astigmatic”) lens with different focal lengths in the x- and y-directions.
A projected circle then becomes an ellipse whose orientation depends on depth.
DEPTH FROM STEREO
Depth from stereo uses parallax.
If you look at the scene from another angle, stuff that is close gets shifted
to the side more than stuff that is far away.
The Kinect analyzes the shift of the speckle pattern by projecting from
one location and observing from another.
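The parallax principle above can be made concrete with the textbook pinhole stereo relation, depth = focal length × baseline / shift. The sketch below is a minimal illustration under stated assumptions: the function name `depth_from_disparity` is hypothetical, and the focal length and baseline values are illustrative placeholders, not published Kinect calibration data.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Textbook stereo triangulation: depth is inversely proportional
    to the observed shift (disparity) of a pattern feature."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Illustrative values only (assumptions, not Kinect specifications).
FOCAL_LENGTH_PX = 580.0   # focal length expressed in pixels
BASELINE_M = 0.075        # distance between projector and sensor, in meters

near = depth_from_disparity(40.0, FOCAL_LENGTH_PX, BASELINE_M)  # large shift
far = depth_from_disparity(10.0, FOCAL_LENGTH_PX, BASELINE_M)   # small shift
print(f"shift 40 px -> {near:.2f} m, shift 10 px -> {far:.2f} m")
```

The larger shift yields the smaller depth, which matches the slide: close stuff moves sideways more than far stuff.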
INFERRING BODY POSITION
Body parts are inferred using a randomized decision forest, learned
from over 1 million training examples
Stage 2 starts with 100,000 depth images with
known skeletons (from a motion capture system)
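To give a feel for how a decision forest labels a pixel, here is a toy sketch of the voting step only. Everything in it is an assumption for illustration: the trees are hand-written one-question stumps rather than trees learned from training data, the depth-difference feature is one plausible choice of per-pixel test, and the names `depth_feature`, `make_stump` and `classify_pixel` are hypothetical.

```python
from collections import Counter

BACKGROUND = 10.0  # depth (meters) assumed for probes that fall off the image

def depth_feature(depth, x, y, offset):
    """Difference between the depth at a nearby probe pixel and the depth
    at (x, y). The offset is divided by the pixel's depth so the probe
    covers a similar real-world distance for near and far bodies."""
    d = depth[y][x]
    nx = x + round(offset[0] / d)
    ny = y + round(offset[1] / d)
    if 0 <= ny < len(depth) and 0 <= nx < len(depth[0]):
        return depth[ny][nx] - d
    return BACKGROUND - d

def make_stump(offset, threshold, label_if_less, label_otherwise):
    """Build a one-question 'tree' (hand-written here, not learned)."""
    def tree(depth, x, y):
        if depth_feature(depth, x, y, offset) < threshold:
            return label_if_less
        return label_otherwise
    return tree

def classify_pixel(forest, depth, x, y):
    """Each tree votes for a body-part label; the majority wins."""
    votes = Counter(tree(depth, x, y) for tree in forest)
    return votes.most_common(1)[0][0]

forest = [
    make_stump((0, -2), 0.5, "torso", "head"),
    make_stump((-2, 0), 0.5, "torso", "hand"),
    make_stump((2, 0), 0.5, "torso", "hand"),
]

# Tiny depth image: a body 2 m away in front of a 10 m background.
depth = [
    [10.0, 10.0, 10.0, 10.0, 10.0],
    [10.0, 2.0, 2.0, 2.0, 10.0],
    [10.0, 2.0, 2.0, 2.0, 10.0],
    [10.0, 2.0, 2.0, 2.0, 10.0],
]
label = classify_pixel(forest, depth, 2, 2)
print("pixel (2, 2) classified as:", label)
```

In a learned forest, the offsets, thresholds and leaf labels would be chosen automatically from the training images with known skeletons instead of being written by hand.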
At this point, both the Kinect’s hardware — its camera
and IR-light projector — and its firmware (sometimes
called “middleware”) are operating.
The Kinect has an on-board processor that runs
algorithms on the sensor data to render the three-dimensional image.
The middleware also can recognize people:
distinguishing human body parts, joints and movements,
as well as distinguishing individual human faces from one
another. When you step in front of it, the camera “knows”
who you are.
THE BEST PART OF KINECT
MOVE OR THE
Kinect is something different.
It’s communal, continuous and general: a Natural User
Interface (or NUI) for multimedia, rather than a GUI for
But it takes a lot of tech to make an interface like that
come together seamlessly and “naturally.”