Space x Technology x Art
吳冠穎 MAO (Mao Wu)
Founder, Mutienliao Interactive Design Co., Ltd. (木天寮互動設計有限公司)
Ph.D. student, Graduate Institute of Architecture, National Chiao Tung University (交大建築研究所)
mao@mutienliao.com
Make Things See
Kinect + Processing Workshop
MAO WU
FabCafe Taipei
20141230
Meet Kinect
Meet the Kinect for Windows Sensor and SDK
The Kinect for Windows sensor and SDK provide the ears and eyes of your application. You'll want to keep their capabilities in mind as you design your application.
RGB camera
Infrared projector and sensor
Motorised tilt
Microphone array
“…intended for non-commercial use to enable
experimentation in the world of natural user
interface experiences”
What Kinect sees
Kinect for Windows is versatile, and can see
people holistically, not just smaller hand
gestures. Six people can be tracked, including
two whole skeletons. The sensor has an RGB
(red-green-blue) camera for color video, and
an infrared emitter and camera that measure
depth. The measurements for depth are
returned in millimeters.
The Kinect for Windows sensor enables a wide
variety of interactions, but any sensor has
“sweet spots” and limitations. With this in mind,
we defined its focus and limits as follows:
Physical limits – The actual capabilities of the
sensor and what it can see.
Sweet spots – Areas where people experience
optimal interactions, given that they’ll often
have a large range of movement and need to
be tracked with their arms or legs extended.
Source: Kinect for Windows Human Interface Guidelines v1.8
Near mode depth ranges
• Physical limits: 0.4m (1.3ft) to 3m (9.8ft)
• Sweet spot: 0.8m (2.6ft) to 2.5m (8.2ft)
Default mode depth ranges
• Physical limits: 0.8m (2.6ft) to 4m (13.1ft). Extended depth (beyond 4m) can also be retrieved, but skeleton and player tracking get noisier the farther away you get, and may therefore be unreliable.
• Sweet spot: 1.2m (4ft) to 3.5m (11.5ft)
Angle of vision (depth and RGB)
• Horizontal: 57.5 degrees
• Vertical: 43.5 degrees, with a -27 to +27 degree tilt range up and down
Note that Near mode is an actual setting for Kinect for Windows, and is different from the various interaction ranges detailed later in the Human Interface Guidelines.
Depth Sensor Comparison
Both Microsoft Kinect and ASUS Xtion
(Live) / PrimeSense Carmine sensors are
based on the same PrimeSense infra-red
technology. So all basic characteristics
critical for full-body motion capture are
generally the same. But there are certain
differences that you can take into
account:
http://wiki.ipisoft.com/Depth_Sensors_Comparison
Microsoft Kinect
Pros:
▪ High quality of device drivers
▪ Stable work with various hardware models
▪ Has a motor that can be controlled
▪ Better RGB image quality
Cons:
▪ Bigger size (12" x 3" x 2.5" against 7" x 2" x 1.5")
▪ Higher weight (3.0 lb against 0.5 lb)
▪ Requires an AC/DC power supply

ASUS Xtion / PrimeSense Carmine
Pros:
▪ More compact (7" x 2" x 1.5" against 12" x 3" x 2.5")
▪ Lighter weight (0.5 lb against 3.0 lb)
▪ Does not require a power supply other than USB
Cons:
▪ Less popular device
▪ Lower driver quality
▪ Does not work with some USB controllers (especially USB 3.0)
▪ No motor, allowing only manual positioning
▪ Lower RGB image quality in comparison with MS Kinect

Devices pictured: MS Kinect for Windows, ASUS Xtion Live, ASUS Xtion, PrimeSense Carmine 1.08
• An ASUS Xtion Live or PrimeSense Carmine is recommended because it also includes a color sensor. The color image is currently not used for tracking, but eventually will be, and it also helps when operating the system.
Simple-Openni
OpenNi library for Processing
https://code.google.com/p/simple-openni/wiki/Installation
Processing
• Open Processing (version 2.0 or later)
• Go to the menu:

Sketch -> Import Library… -> Add Library…
• Select and install SimpleOpenNI

◉ On Windows, you also need to install the Kinect SDK
• Download the Kinect SDK
• Run the Kinect SDK installer

If everything worked out, you should see the plugged-in camera in your Device Manager (under 'Kinect for Windows').
If you get an error when you start a Processing sketch with SimpleOpenNI, try installing the Runtime Libraries from Microsoft.
Depth Image Interaction
In this section, you will learn how to work with the pixels of the depth image and implement some simple interactive code.
Simple-Openni
OpenNi library for Processing Reference Index
Context
The context is the top-level object that encapsulates all the camera and image functionality. The context
is typically declared globally and instantiated within setup(). There is an optional flag argument for
forcing single or multi-threading, but in our experience we haven't found a difference between the two.
SimpleOpenNI context = new SimpleOpenNI(this)

SimpleOpenNI context = new SimpleOpenNI(this, SimpleOpenNI.RUN_MODE_SINGLE_THREADED)

SimpleOpenNI context = new SimpleOpenNI(this, SimpleOpenNI.RUN_MODE_MULTI_THREADED)
For each frame in Processing, the context needs to be updated with the most recent data from the
Kinect.

context.update()
The image drawn by the context defaults to showing the world from its own point of view, so when facing the Kinect and looking at the resulting image, your movements are not mirrored. It is easy to change the configuration so that the Kinect image acts as a mirror; the setMirror() method controls this. Below, the first line turns mirroring on and the second turns it off.

context.setMirror(true)

context.setMirror(false)
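
Putting these calls together, here is a minimal sketch of the pattern above (it assumes the depth stream is enabled, as covered on the next slides, so there is something to draw):

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);   // default run mode
  context.setMirror(true);            // behave like a mirror
  context.enableDepth();              // enable at least one stream so there is an image to draw
}

void draw() {
  context.update();                   // pull the newest frames from the sensor
  image(context.depthImage(), 0, 0);  // draw the current depth frame
}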
Simple-Openni
OpenNi library for Processing
Image
The different cameras provide different data and functionality.
RGB
The RGB camera is the simplest camera and does no more than a standard webcam. It should be noted that it cannot be used
when the IR (not depth) image is enabled. It first needs to be enabled within setup().

context.enableRGB()
To create a window the same size as the RGB camera image, use rgbHeight() and rgbWidth(). The context needs to be
instantiated and RGB enabled before these methods can be called.

size(context.rgbWidth(), context.rgbHeight())

To draw what the RGB camera sees, the current frame is drawn within an image() in draw().

image(context.rgbImage(), 0, 0)
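
As a minimal sketch of the above (note the ordering: the context is created and RGB enabled before rgbWidth()/rgbHeight() are used for the window size):

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  context = new SimpleOpenNI(this);
  context.enableRGB();                             // must come before rgbWidth()/rgbHeight()
  size(context.rgbWidth(), context.rgbHeight());   // window matches the RGB image size
}

void draw() {
  context.update();
  image(context.rgbImage(), 0, 0);                 // draw the current color frame
}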
Simple-Openni
OpenNi library for Processing
Depth
The depth image is calculated by the IR camera and the pseudorandom array of IR points projected onto the scene. It first
needs to be enabled within setup().

context.enableDepth()
To create a window the same size as the depth image, use depthWidth() and depthHeight(). The context needs to be instantiated and depth enabled before these methods can be called.

size(context.depthWidth(), context.depthHeight())

To draw a grayscale image of the depth values, the current frame is drawn within an image() in draw().

image(context.depthImage(), 0, 0)
The default colour of the drawn depth image is gray, but the colour can be changed. For instance, the below code shades the
image in blue instead of gray.

context.setDepthImageColor(100, 150, 200)

As in Processing, there are two colour modes for the depth image. The default is RGB, but it can be switched to HSB.

context.setDepthImageColorMode(0) // for RGB

context.setDepthImageColorMode(1) // for HSB

An array containing all of the distances in millimetres can be requested with depthMap()

int[] dmap = context.depthMap()

The size of the depth map can also be requested.

int dsize = context.depthMapSize()

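
For example, a small sketch can combine depthImage() and depthMap() to show the distance in millimetres under the mouse; the flat-index arithmetic (mouseX + mouseY * width) is explained in the pixel-array excerpt later in this deck:

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  context = new SimpleOpenNI(this);
  context.enableDepth();
  size(context.depthWidth(), context.depthHeight());
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  int[] dmap = context.depthMap();                  // one distance in mm per pixel
  int i = mouseX + mouseY * context.depthWidth();   // flat index of the pixel under the mouse
  if (i >= 0 && i < dmap.length) {
    fill(255, 0, 0);
    text(dmap[i] + " mm", mouseX + 10, mouseY);     // 0 means no depth reading for that pixel
  }
}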
Simple-Openni
OpenNi library for Processing
IR
The IR image is what the IR camera sees. It cannot be enabled while the RGB image is also enabled, or the RGB image will not appear. It first needs to be enabled within setup().

context.enableIR()

To create a window the same size as the IR camera image, use irWidth() and irHeight(). The context needs to be instantiated and IR enabled before these methods can be called.

size(context.irWidth(), context.irHeight())

To draw what the IR camera sees, the current frame is drawn within an image() in draw().

image(context.irImage(), 0, 0)

The timestamp returns the number of frames that have passed since the IR stream was enabled.

context.depthMapTimeStamp()
…that will contain all the values of the array, and the int that comes before it as a label telling us that everything that goes in this box must be an integer.
So, we have an array of integers. How can this box full of numbers store the same kind of information we’ve so far seen in the pixels of an image? The Kinect is, after all, a camera. The data that comes from it is two-dimensional, representing all the depth values in its rectangular field of view, whereas an array is one-dimensional: it can only store a single stack of numbers. How do you represent an image as a box full of numbers?
Here’s how. Start with the pixel in the top-leftmost corner of the image. Put
it in the box. Then, moving to the right along the top row of pixels, put each
pixel into the box on top of the previous ones. When you get to the end of
the row, jump back to the left side of the image, move down one row, and repeat
the procedure, continuing to stick the pixels from the second row on top of
the ever-growing stack you began in the first row. Continue this procedure for
each row of pixels in the image until you reach the very last pixel in the bottom
right. Now, instead of a rectangular image, you’ll have a single stack of pixels:
a one-dimensional array. All the pixels from each row will be stacked together,
and the last pixel from each row will be right in front of the first pixel from the
next row, as Figure 2-12 shows.
[Figure 2-12 diagram: two panels labeled "Pixels in the image" (a rectangular grid of pixels numbered 1–32, arranged row by row) and "Pixels in an array" (the same pixels laid out end to end in a single row).]
Figure 2-12. Pixels in a two-dimensional image get stored as a flat array. Understanding
how to split this array back into rows is key to processing images.
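
To make the caption concrete, here are the conversions between (x, y) coordinates and the flat array index as small helper functions (the names are illustrative, not part of SimpleOpenNI), using the 640-pixel-wide depth image from this workshop:

int w = 640;   // width of the depth image in pixels

// flat array index of the pixel at column x, row y
int indexFor(int x, int y) {
  return x + (y * w);    // row y starts at index y*w; walk x pixels into that row
}

// column of the pixel stored at flat index i
int xFor(int i) {
  return i % w;
}

// row of the pixel stored at flat index i
int yFor(int i) {
  return i / w;
}

void setup() {
  println(indexFor(5, 2));                   // pixel at column 5, row 2 -> index 1285
  println(xFor(1285) + ", " + yFor(1285));   // back to 5, 2
}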
Figure 2-14. Our red circle following my outstretched fist.
3 Next, let’s look at our two for loops. We know from our pseudocode that
we want to go through every row in the image, and within every row we
want to look at every point in that row. How did we translate that into
code?
What we’ve got here is two for loops, one inside the other. The outer
one increments a variable y from 0 up to 479. We know that the depth
image from the Kinect is 480 pixels tall. In other words, it consists of 480
rows of pixels. This outer loop will run once for each one of those rows,
setting y to the number of the current row (starting at 0).
4 This line kicks off a for loop that does almost the same thing, but with
a different variable, x, and a different constraint, 640. This inner loop will
run once per row. We want it to cover every pixel in the row. Since the
depth image from the Kinect is 640 pixels wide, we know that it’ll have to
run 640 times in order to do so.
The code inside of this inner loop, then, will run once per pixel in the depth image.
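
Reconstructed as a complete sketch (a sketch of the closest-point idea the excerpt walks through, not the book's exact listing), the nested loops scan every pixel of the depth map and draw the red circle on the closest one:

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);
  context.enableDepth();
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  int[] depthValues = context.depthMap();
  int closestValue = 8000;              // start farther than anything the sensor can report (mm)
  int closestX = 0;
  int closestY = 0;

  for (int y = 0; y < 480; y++) {       // once per row
    for (int x = 0; x < 640; x++) {     // once per pixel within the row
      int i = x + y * 640;              // flat index of this pixel
      int d = depthValues[i];
      if (d > 0 && d < closestValue) {  // 0 means no reading; skip it
        closestValue = d;
        closestX = x;
        closestY = y;
      }
    }
  }

  fill(255, 0, 0);
  noStroke();
  ellipse(closestX, closestY, 25, 25);  // the red circle following the closest point (e.g., an outstretched fist)
}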
Gesture & Skeleton
In this section, you will learn how to work with the default gesture detection and build something with skeleton joint positions.
GESTURE_WAVE
GESTURE_HAND_RAISE
GESTURE_CLICK
Design for variability of input
Users' previous experience and expectations affect how they interact with your application. Keep in mind that one person might not perform a gesture the same way as someone else.
Gesture interpretation
Simply “asking users to wave”
doesn’t guarantee the same
motion.
They might wave:
• From their wrist
• From their elbow
• With their whole arm
• With an open hand
moving from left to right
• By moving their fingers
up and down together
Basics
In this document we use the term gesture
broadly to mean any form of movement that
can be used as an input or interaction to
control or influence an application. Gestures
can take many forms, from simply using your
hand to target something on the screen, to
specific, learned patterns of movement, to
long stretches of continuous movement using
the whole body.
Gesture is an exciting input method to
explore, but it also presents some intriguing
challenges. Following are a few examples of
commonly used gesture types.
Hand Gesture
Simple-Openni
OpenNi library for Processing
Hand
To start capturing hand gestures, we need to enable hand tracking within setup().

context.enableHand()

Then choose which gestures we need.

context.startGesture(SimpleOpenNI.GESTURE_CLICK)
context.startGesture(SimpleOpenNI.GESTURE_WAVE)
context.startGesture(SimpleOpenNI.GESTURE_HAND_RAISE)

Note:

Any skeleton data from SimpleOpenNI needs to be converted from real-world coordinates to projective (screen) coordinates:

context.convertRealWorldToProjective(realworld_pos,converted_pos)
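
Put together, a gesture sketch might look like the following. The onCompletedGesture() callback signature is taken from the hand-tracking examples bundled with SimpleOpenNI 1.96; treat it as an assumption and check the examples shipped with your version of the library:

import SimpleOpenNI.*;

SimpleOpenNI context;
PVector screenPos = new PVector();   // last gesture position, converted to screen coordinates

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);
  context.enableDepth();
  context.enableHand();
  context.startGesture(SimpleOpenNI.GESTURE_WAVE);
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  fill(0, 255, 0);
  noStroke();
  ellipse(screenPos.x, screenPos.y, 20, 20);   // mark where the last wave was detected
}

// fired when one of the started gestures completes
// (signature as in the SimpleOpenNI 1.96 examples -- verify against your version)
void onCompletedGesture(SimpleOpenNI curContext, int gestureType, PVector pos) {
  println("gesture " + gestureType + " at " + pos);
  context.convertRealWorldToProjective(pos, screenPos);   // real-world mm -> projective (screen) pixels
}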
Skeleton Tracking
This tutorial will explain how to track human
skeletons using the Kinect. The OpenNI library
can identify the position of key joints on the
human body such as the hands, elbows,
knees, head and so on. These points form a representation we call the 'skeleton'.
Simple-Openni
OpenNi library for Processing
User
To start capturing user information, we need to enable depth and user tracking within setup().

context.enableUser()

Get the list of detected users:

context.getUsers();
Check whether a user's skeleton is being tracked:

context.isTrackingSkeleton(userid)
Get a user's center of mass:

context.getCoM(userid,center_pos)
Detecting New Users and Losing Users
// when a person ('user') enters the field of view
void onNewUser(int userId)
{
println("New User Detected - userId: " + userId);
 
// start pose detection
context.startPoseDetection("Psi", userId);
}
 
// when a person ('user') leaves the field of view
void onLostUser(int userId)
{
println("User Lost - userId: " + userId);
}
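
A small sketch combining these calls (the boolean return value of getCoM() follows the library's bundled user-tracking example; treat such details as assumptions to verify against your version):

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);
  context.enableDepth();
  context.enableUser();
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  int[] userList = context.getUsers();       // ids of everyone currently detected
  for (int i = 0; i < userList.length; i++) {
    PVector com = new PVector();
    PVector comScreen = new PVector();
    if (context.getCoM(userList[i], com)) {  // center of mass in real-world coordinates
      context.convertRealWorldToProjective(com, comScreen);
      fill(255, 0, 0);
      noStroke();
      ellipse(comScreen.x, comScreen.y, 15, 15);
      text("user " + userList[i], comScreen.x + 10, comScreen.y);
    }
  }
}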
Simple-Openni
OpenNi library for Processing
Drawing the Skeleton
Now we will use our drawSkeleton() function to draw lines between joints.

Each joint has an identifier (just a reference to a simple integer) and there are 15 joints in all. They are:

SimpleOpenNI.SKEL_HEAD

SimpleOpenNI.SKEL_NECK

SimpleOpenNI.SKEL_LEFT_SHOULDER

SimpleOpenNI.SKEL_LEFT_ELBOW

SimpleOpenNI.SKEL_LEFT_HAND

SimpleOpenNI.SKEL_RIGHT_SHOULDER

SimpleOpenNI.SKEL_RIGHT_ELBOW

SimpleOpenNI.SKEL_RIGHT_HAND

SimpleOpenNI.SKEL_TORSO

SimpleOpenNI.SKEL_LEFT_HIP

SimpleOpenNI.SKEL_LEFT_KNEE

SimpleOpenNI.SKEL_LEFT_FOOT

SimpleOpenNI.SKEL_RIGHT_HIP

SimpleOpenNI.SKEL_RIGHT_KNEE

SimpleOpenNI.SKEL_RIGHT_FOOT

Draw a limb (a line between two joints):

context.drawLimb(userId, SimpleOpenNI.SKEL_HEAD, SimpleOpenNI.SKEL_NECK);
Get a joint's position (in real-world coordinates):

context.getJointPositionSkeleton(userId,SimpleOpenNI.SKEL_LEFT_HAND, pos);
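
Putting the pieces together, a skeleton sketch might look like this. It assumes the SimpleOpenNI 1.96 auto-calibration flow, where onNewUser(SimpleOpenNI, int) starts skeleton tracking directly; older library versions use the onNewUser(int)/startPoseDetection flow shown two slides back. Only a few limbs are drawn; extend drawSkeleton() with the remaining joint pairs as needed.

import SimpleOpenNI.*;

SimpleOpenNI context;

void setup() {
  size(640, 480);
  context = new SimpleOpenNI(this);
  context.enableDepth();
  context.enableUser();
}

void draw() {
  context.update();
  image(context.depthImage(), 0, 0);

  int[] userList = context.getUsers();
  for (int i = 0; i < userList.length; i++) {
    if (context.isTrackingSkeleton(userList[i])) {
      drawSkeleton(userList[i]);
      markLeftHand(userList[i]);
    }
  }
}

// draw lines between a few joints of the tracked user
void drawSkeleton(int userId) {
  stroke(0, 255, 0);
  context.drawLimb(userId, SimpleOpenNI.SKEL_HEAD, SimpleOpenNI.SKEL_NECK);
  context.drawLimb(userId, SimpleOpenNI.SKEL_NECK, SimpleOpenNI.SKEL_LEFT_SHOULDER);
  context.drawLimb(userId, SimpleOpenNI.SKEL_LEFT_SHOULDER, SimpleOpenNI.SKEL_LEFT_ELBOW);
  context.drawLimb(userId, SimpleOpenNI.SKEL_LEFT_ELBOW, SimpleOpenNI.SKEL_LEFT_HAND);
}

// circle on the left hand, converted from real-world coordinates to screen pixels
void markLeftHand(int userId) {
  PVector pos = new PVector();
  PVector screenPos = new PVector();
  context.getJointPositionSkeleton(userId, SimpleOpenNI.SKEL_LEFT_HAND, pos);
  context.convertRealWorldToProjective(pos, screenPos);
  fill(255, 0, 0);
  noStroke();
  ellipse(screenPos.x, screenPos.y, 20, 20);
}

// start tracking as soon as a user appears (SimpleOpenNI 1.96-style callback -- an assumption;
// older versions use onNewUser(int userId) with startPoseDetection as shown earlier)
void onNewUser(SimpleOpenNI curContext, int userId) {
  println("New user: " + userId);
  curContext.startTrackingSkeleton(userId);
}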
