Space x Technology x Art
吳冠穎 MAO (Mao Wu)
Founder, Mutienliao Interactive Design Co., Ltd. (木天寮互動設計有限公司)
Ph.D. student, Graduate Institute of Architecture, National Chiao Tung University (交大建築研究所)
mao@mutienliao.com
Make Things See
Kinect + Processing Workshop
MAO WU
FabCafe Taipei
20141230
Meet Kinect
Meet the Kinect for Windows Sensor and SDK
The Kinect for Windows sensor and SDK provide the ears and eyes of your application. You'll want to keep their capabilities in mind as you design your application.
RGB camera
Infrared projector and sensor
Motorised tilt
Microphone array
“…intended for non-commercial use to enable
experimentation in the world of natural user
interface experiences”
What Kinect sees
Kinect for Windows is versatile, and can see
people holistically, not just smaller hand
gestures. Six people can be tracked, including
two whole skeletons. The sensor has an RGB
(red-green-blue) camera for color video, and
an infrared emitter and camera that measure
depth. The measurements for depth are
returned in millimeters.
The Kinect for Windows sensor enables a wide
variety of interactions, but any sensor has
“sweet spots” and limitations. With this in mind,
we defined its focus and limits as follows:
Physical limits – The actual capabilities of the
sensor and what it can see.
Sweet spots – Areas where people experience
optimal interactions, given that they’ll often
have a large range of movement and need to
be tracked with their arms or legs extended.
Source: Kinect for Windows Human Interface Guidelines v1.8
Near mode depth ranges
• Physical limits: 0.4m (1.3ft) to 3m (9.8ft)
• Sweet spot: 0.8m (2.6ft) to 2.5m (8.2ft)
Default mode depth ranges
• Physical limits: 0.8m (2.6ft) to 4m (13.1ft). Extended depth (beyond 4m) can also be retrieved, but skeleton and player tracking get noisier the farther away you get, and may therefore be unreliable.
• Sweet spot: 1.2m (4ft) to 3.5m (11.5ft)
Angle of vision (depth and RGB)
• Horizontal: 57.5 degrees
• Vertical: 43.5 degrees, with a -27 to +27 degree tilt range up and down
Note that Near mode is an actual setting for Kinect for Windows, and is different from the various interaction ranges detailed later in the Human Interface Guidelines.
Depth Sensor Comparison
Both Microsoft Kinect and ASUS Xtion
(Live) / PrimeSense Carmine sensors are
based on the same PrimeSense infra-red
technology. So all basic characteristics
critical for full-body motion capture are
generally the same. But there are certain
differences that you can take into
account:
http://wiki.ipisoft.com/Depth_Sensors_Comparison
Microsoft Kinect
Pros:
▪ High quality of device drivers
▪ Stable work with various hardware models
▪ Has a motor that can be controlled
▪ Better RGB image quality
Cons:
▪ Bigger size (12" x 3" x 2.5" against 7" x 2" x 1.5")
▪ Higher weight (3.0 lb against 0.5 lb)
▪ Requires an AC/DC power supply

ASUS Xtion / PrimeSense Carmine
Pros:
▪ More compact (7" x 2" x 1.5" against 12" x 3" x 2.5")
▪ Lighter weight (0.5 lb against 3.0 lb)
▪ Does not require a power supply other than USB
Cons:
▪ Less popular device
▪ Lower driver quality
▪ Does not work with some USB controllers (especially USB 3.0)
▪ No motor, allowing only manual positioning
▪ Lower RGB image quality in comparison with MS Kinect

Devices pictured: MS Kinect for Windows, ASUS Xtion Live, ASUS Xtion, PrimeSense Carmine 1.08
• An ASUS Xtion Live or PrimeSense Carmine is recommended because it also includes a color sensor. The color image is currently not used for tracking, but eventually will be, and it also helps when operating the system.
Simple-Openni
OpenNi library for Processing
https://code.google.com/p/simple-openni/wiki/Installation
Processing
• Open Processing (version 2.0 or later)
• Go to the menu:

Sketch -> Import Library… -> Add Library…
• Select and install SimpleOpenNI

◉ On Windows, you also need to install the Kinect SDK
• Download the Kinect SDK
• Run the Kinect SDK installer

If everything worked out, you should see the plugged-in camera in your Device Manager (under 'Kinect for Windows').
If you get an error when you start a Processing sketch with SimpleOpenNI, try installing the Runtime Libraries from Microsoft.
Depth Image Interaction
In this section, you will learn how to work with the pixels of the depth image and implement some simple interactive code.
Simple-Openni
OpenNi library for Processing Reference Index
Context
The context is the top-level object that encapsulates all the camera and image functionality. The context
is typically declared globally and instantiated within setup(). There is an optional flag argument for
forcing single or multi-threading, but in our experience we haven't found a difference between the two.
SimpleOpenNI context = new SimpleOpenNI(this)

SimpleOpenNI context = new SimpleOpenNI(this, SimpleOpenNI.RUN_MODE_SINGLE_THREADED)

SimpleOpenNI context = new SimpleOpenNI(this, SimpleOpenNI.RUN_MODE_MULTI_THREADED)
For each frame in Processing, the context needs to be updated with the most recent data from the
Kinect.

context.update()
The image drawn by the context defaults to showing the world from its own point of view, so when facing the Kinect and looking at the resulting image, your movements are not mirrored. It is easy to change the configuration so that the Kinect image acts as a mirror; the setMirror() method controls this. Below, the first line turns mirroring on and the second turns it off.

context.setMirror(true)

context.setMirror(false)
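
Putting these calls together, here is a minimal sketch of the pattern above (it assumes the depth stream is enabled, as covered on the next slides, so there is something to draw):

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);   // default run mode
  context.setMirror(true);            // behave like a mirror
  context.enableDepth();              // enable at least one stream so there is an image to draw
}

void draw() {
  context.update();                   // pull the newest frames from the sensor
  image(context.depthImage(), 0, 0);  // draw the current depth frame
}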
Simple-Openni
OpenNi library for Processing
Image
The different cameras provide different data and functionality.
RGB
The RGB camera is the simplest camera and does no more than a standard webcam. It should be noted that it cannot be used
when the IR (not depth) image is enabled. It first needs to be enabled within setup().

context.enableRGB()
To create a window the same size as the RGB camera image, use rgbHeight() and rgbWidth(). The context needs to be
instantiated and RGB enabled before these methods can be called.

size(context.rgbWidth(), context.rgbHeight())

To draw what the RGB camera sees, the current frame is drawn within an image() in draw().

image(context.rgbImage(), 0, 0)
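
As a minimal sketch of the above (note the ordering: the context is created and RGB enabled before rgbWidth()/rgbHeight() are used for the window size):

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  context = new SimpleOpenNI(this);
  context.enableRGB();                             // must come before rgbWidth()/rgbHeight()
  size(context.rgbWidth(), context.rgbHeight());   // window matches the RGB image size
}

void draw() {
  context.update();
  image(context.rgbImage(), 0, 0);                 // draw the current color frame
}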
Simple-Openni
OpenNi library for Processing
Depth
The depth image is calculated by the IR camera and the pseudorandom array of IR points projected onto the scene. It first
needs to be enabled within setup().

context.enableDepth()
To create a window the same size as the depth image, use depthWidth() and depthHeight(). The context needs to be instantiated and depth enabled before these methods can be called.

size(context.depthWidth(), context.depthHeight())

To draw a grayscale image of the depth values, the current frame is drawn within an image() in draw().

image(context.depthImage(), 0, 0)
The default colour of the drawn depth image is gray, but the colour can be changed. For instance, the below code shades the
image in blue instead of gray.

context.setDepthImageColor(100, 150, 200)

As in Processing, there are two colour modes for the depth image. The default is RGB, but it can be switched to HSB.

context.setDepthImageColorMode(0) // for RGB

context.setDepthImageColorMode(1) // for HSB

An array containing all of the distances in millimetres can be requested with depthMap()

int[] dmap = context.depthMap()

The size of the depth map can also be requested.

int dsize = context.depthMapSize()

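
For example, a small sketch can combine depthImage() and depthMap() to show the distance in millimetres under the mouse; the flat-index arithmetic (mouseX + mouseY * width) is explained in the pixel-array excerpt later in this deck:

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  context = new SimpleOpenNI(this);
  context.enableDepth();
  size(context.depthWidth(), context.depthHeight());
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  int[] dmap = context.depthMap();                  // one distance in mm per pixel
  int i = mouseX + mouseY * context.depthWidth();   // flat index of the pixel under the mouse
  if (i >= 0 && i < dmap.length) {
    fill(255, 0, 0);
    text(dmap[i] + " mm", mouseX + 10, mouseY);     // 0 means no depth reading for that pixel
  }
}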
Simple-Openni
OpenNi library for Processing
IR
The IR image is what the IR camera sees. It cannot be enabled while the RGB image is also enabled, or the RGB image will not appear. It first needs to be enabled within setup().

context.enableIR()

To create a window the same size as the IR camera image, use irWidth() and irHeight(). The context needs to be instantiated and IR enabled before these methods can be called.

size(context.irWidth(), context.irHeight())

To draw what the IR camera sees, the current frame is drawn within an image() in draw().

image(context.irImage(), 0, 0)

The timestamp returns the number of frames that have passed since the IR stream was enabled.

context.depthMapTimeStamp()
…that will contain all the values of the array, and the int that comes before it as a label telling us that everything that goes in this box must be an integer.
So, we have an array of integers. How can this box full of numbers store the same kind of information we’ve so far seen in the pixels of an image? The Kinect is, after all, a camera. The data that comes from it is two-dimensional, representing all the depth values in its rectangular field of view, whereas an array is one-dimensional: it can only store a single stack of numbers. How do you represent an image as a box full of numbers?
Here’s how. Start with the pixel in the top-leftmost corner of the image. Put
it in the box. Then, moving to the right along the top row of pixels, put each
pixel into the box on top of the previous ones. When you get to the end of
the row, jump back to the left side of the image, move down one row, and repeat
the procedure, continuing to stick the pixels from the second row on top of
the ever-growing stack you began in the first row. Continue this procedure for
each row of pixels in the image until you reach the very last pixel in the bottom
right. Now, instead of a rectangular image, you’ll have a single stack of pixels:
a one-dimensional array. All the pixels from each row will be stacked together,
and the last pixel from each row will be right in front of the first pixel from the
next row, as Figure 2-12 shows.
[Figure 2-12 diagram: two panels labeled "Pixels in the image" (a rectangular grid of pixels numbered 1–32, arranged row by row) and "Pixels in an array" (the same pixels laid out end to end in a single row).]
Figure 2-12. Pixels in a two-dimensional image get stored as a flat array. Understanding
how to split this array back into rows is key to processing images.
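
To make the caption concrete, here are the conversions between (x, y) coordinates and the flat array index as small helper functions (the names are illustrative, not part of SimpleOpenNI), using the 640-pixel-wide depth image from this workshop:

int w = 640;   // width of the depth image in pixels

// flat array index of the pixel at column x, row y
int indexFor(int x, int y) {
  return x + (y * w);    // row y starts at index y*w; walk x pixels into that row
}

// column of the pixel stored at flat index i
int xFor(int i) {
  return i % w;
}

// row of the pixel stored at flat index i
int yFor(int i) {
  return i / w;
}

void setup() {
  println(indexFor(5, 2));                   // pixel at column 5, row 2 -> index 1285
  println(xFor(1285) + ", " + yFor(1285));   // back to 5, 2
}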
Figure 2-14. Our red circle following my outstretched fist.
3 Next, let’s look at our two for loops. We know from our pseudocode that
we want to go through every row in the image, and within every row we
want to look at every point in that row. How did we translate that into
code?
What we’ve got here is two for loops, one inside the other. The outer
one increments a variable y from 0 up to 479. We know that the depth
image from the Kinect is 480 pixels tall. In other words, it consists of 480
rows of pixels. This outer loop will run once for each one of those rows,
setting y to the number of the current row (starting at 0).
4 This line kicks off a for loop that does almost the same thing, but with
a different variable, x, and a different constraint, 640. This inner loop will
run once per row. We want it to cover every pixel in the row. Since the
depth image from the Kinect is 640 pixels wide, we know that it’ll have to
run 640 times in order to do so.
The code inside of this inner loop, then, will run once per pixel in the depth image.
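
Reconstructed as a complete sketch (a sketch of the closest-point idea the excerpt walks through, not the book's exact listing), the nested loops scan every pixel of the depth map and draw the red circle on the closest one:

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);
  context.enableDepth();
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  int[] depthValues = context.depthMap();
  int closestValue = 8000;              // start farther than anything the sensor can report (mm)
  int closestX = 0;
  int closestY = 0;

  for (int y = 0; y < 480; y++) {       // once per row
    for (int x = 0; x < 640; x++) {     // once per pixel within the row
      int i = x + y * 640;              // flat index of this pixel
      int d = depthValues[i];
      if (d > 0 && d < closestValue) {  // 0 means no reading; skip it
        closestValue = d;
        closestX = x;
        closestY = y;
      }
    }
  }

  fill(255, 0, 0);
  noStroke();
  ellipse(closestX, closestY, 25, 25);  // the red circle following the closest point (e.g., an outstretched fist)
}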
Gesture & Skeleton
In this section, you will learn how to work with the default gesture detection and build something with skeleton joint positions.
GESTURE_WAVE
GESTURE_HAND_RAISE
GESTURE_CLICK
Design for variability of input
Users' previous experience and expectations affect how they interact with your application. Keep in mind that one person might not perform a gesture the same way as someone else.
Gesture interpretation
Simply “asking users to wave”
doesn’t guarantee the same
motion.
They might wave:
• From their wrist
• From their elbow
• With their whole arm
• With an open hand
moving from left to right
• By moving their fingers
up and down together
Basics
In this document we use the term gesture
broadly to mean any form of movement that
can be used as an input or interaction to
control or influence an application. Gestures
can take many forms, from simply using your
hand to target something on the screen, to
specific, learned patterns of movement, to
long stretches of continuous movement using
the whole body.
Gesture is an exciting input method to
explore, but it also presents some intriguing
challenges. Following are a few examples of
commonly used gesture types.
Hand Gesture
Simple-Openni
OpenNi library for Processing
Hand
To start capturing hand gestures, we need to enable hand tracking within setup().

context.enableHand()

Then choose which gestures we need.

context.startGesture(SimpleOpenNI.GESTURE_CLICK)
context.startGesture(SimpleOpenNI.GESTURE_WAVE)
context.startGesture(SimpleOpenNI.GESTURE_HAND_RAISE)

Note:

Any skeleton data from SimpleOpenNI needs to be converted from real-world coordinates to projective (screen) coordinates:

context.convertRealWorldToProjective(realworld_pos,converted_pos)
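
Put together, a gesture sketch might look like the following. The onCompletedGesture() callback signature is taken from the hand-tracking examples bundled with SimpleOpenNI 1.96; treat it as an assumption and check the examples shipped with your version of the library:

import SimpleOpenNI.*;

SimpleOpenNI context;
PVector screenPos = new PVector();   // last gesture position, converted to screen coordinates

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);
  context.enableDepth();
  context.enableHand();
  context.startGesture(SimpleOpenNI.GESTURE_WAVE);
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  fill(0, 255, 0);
  noStroke();
  ellipse(screenPos.x, screenPos.y, 20, 20);   // mark where the last wave was detected
}

// fired when one of the started gestures completes
// (signature as in the SimpleOpenNI 1.96 examples -- verify against your version)
void onCompletedGesture(SimpleOpenNI curContext, int gestureType, PVector pos) {
  println("gesture " + gestureType + " at " + pos);
  context.convertRealWorldToProjective(pos, screenPos);   // real-world mm -> projective (screen) pixels
}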
Skeleton Tracking
This tutorial will explain how to track human
skeletons using the Kinect. The OpenNI library
can identify the position of key joints on the
human body such as the hands, elbows,
knees, head and so on. These points form a representation we call the 'skeleton'.
Simple-Openni
OpenNi library for Processing
User
To start capturing user information, we need to enable depth and user tracking within setup().

context.enableUser()

Get the list of detected users:

context.getUsers();
Check whether a user's skeleton is being tracked:

context.isTrackingSkeleton(userid)
Get a user's center of mass:

context.getCoM(userid,center_pos)
Detecting New Users and Losing Users
// when a person ('user') enters the field of view
void onNewUser(int userId)
{
println("New User Detected - userId: " + userId);
 
// start pose detection
context.startPoseDetection("Psi", userId);
}
 
// when a person ('user') leaves the field of view
void onLostUser(int userId)
{
println("User Lost - userId: " + userId);
}
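
A small sketch combining these calls (the boolean return value of getCoM() follows the library's bundled user-tracking example; treat such details as assumptions to verify against your version):

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);
  context.enableDepth();
  context.enableUser();
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  int[] userList = context.getUsers();       // ids of everyone currently detected
  for (int i = 0; i < userList.length; i++) {
    PVector com = new PVector();
    PVector comScreen = new PVector();
    if (context.getCoM(userList[i], com)) {  // center of mass in real-world coordinates
      context.convertRealWorldToProjective(com, comScreen);
      fill(255, 0, 0);
      noStroke();
      ellipse(comScreen.x, comScreen.y, 15, 15);
      text("user " + userList[i], comScreen.x + 10, comScreen.y);
    }
  }
}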
Simple-Openni
OpenNi library for Processing
Drawing the Skeleton
Now we will use our drawSkeleton() function to draw lines between joints.

Each joint has an identifier (just a reference to a simple integer) and there are 15 joints in all. They are:

SimpleOpenNI.SKEL_HEAD

SimpleOpenNI.SKEL_NECK

SimpleOpenNI.SKEL_LEFT_SHOULDER

SimpleOpenNI.SKEL_LEFT_ELBOW

SimpleOpenNI.SKEL_LEFT_HAND

SimpleOpenNI.SKEL_RIGHT_SHOULDER

SimpleOpenNI.SKEL_RIGHT_ELBOW

SimpleOpenNI.SKEL_RIGHT_HAND

SimpleOpenNI.SKEL_TORSO

SimpleOpenNI.SKEL_LEFT_HIP

SimpleOpenNI.SKEL_LEFT_KNEE

SimpleOpenNI.SKEL_LEFT_FOOT

SimpleOpenNI.SKEL_RIGHT_HIP

SimpleOpenNI.SKEL_RIGHT_KNEE

SimpleOpenNI.SKEL_RIGHT_FOOT

Draw a limb (a line between two joints):

context.drawLimb(userId, SimpleOpenNI.SKEL_HEAD, SimpleOpenNI.SKEL_NECK);
Get a joint's position (in real-world coordinates):

context.getJointPositionSkeleton(userId,SimpleOpenNI.SKEL_LEFT_HAND, pos);
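
Putting the pieces together, a skeleton sketch might look like this. It assumes the SimpleOpenNI 1.96 auto-calibration flow, where onNewUser(SimpleOpenNI, int) starts skeleton tracking directly; older library versions use the onNewUser(int)/startPoseDetection flow shown two slides back. Only a few limbs are drawn; extend drawSkeleton() with the remaining joint pairs as needed.

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);
  context.enableDepth();
  context.enableUser();
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  int[] userList = context.getUsers();
  for (int i = 0; i < userList.length; i++) {
    if (context.isTrackingSkeleton(userList[i])) {
      drawSkeleton(userList[i]);
      markLeftHand(userList[i]);
    }
  }
}

// draw lines between a few joints of the tracked user
void drawSkeleton(int userId) {
  stroke(0, 255, 0);
  context.drawLimb(userId, SimpleOpenNI.SKEL_HEAD, SimpleOpenNI.SKEL_NECK);
  context.drawLimb(userId, SimpleOpenNI.SKEL_NECK, SimpleOpenNI.SKEL_LEFT_SHOULDER);
  context.drawLimb(userId, SimpleOpenNI.SKEL_LEFT_SHOULDER, SimpleOpenNI.SKEL_LEFT_ELBOW);
  context.drawLimb(userId, SimpleOpenNI.SKEL_LEFT_ELBOW, SimpleOpenNI.SKEL_LEFT_HAND);
}

// circle on the left hand, converted from real-world coordinates to screen pixels
void markLeftHand(int userId) {
  PVector pos = new PVector();
  PVector screenPos = new PVector();
  context.getJointPositionSkeleton(userId, SimpleOpenNI.SKEL_LEFT_HAND, pos);
  context.convertRealWorldToProjective(pos, screenPos);
  fill(255, 0, 0);
  noStroke();
  ellipse(screenPos.x, screenPos.y, 20, 20);
}

// start tracking as soon as a user appears (SimpleOpenNI 1.96-style callback -- an assumption;
// older versions use onNewUser(int userId) with startPoseDetection as shown earlier)
void onNewUser(SimpleOpenNI curContext, int userId) {
  println("New user: " + userId);
  curContext.startTrackingSkeleton(userId);
}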
