
# Introduction to Kinect - Update v 1.8

Introduction to Kinect:
- SDKs
- Cameras
- Skeleton
- Gesture Design
- Gesture Implementation

Comparison with:
- Leap Motion
- Kinect 2.0
- Intel Perceptual Computing

• The Kinect sensor captures depth and color images simultaneously at a frame rate of up to 30 fps. The integration of depth and color data results in a colored point cloud that contains about 300,000 points in every frame. By registering consecutive depth images one can not only obtain an increased point density, but also create a complete point cloud of an indoor environment, possibly in real time.
• The figure illustrates the relation between the distance of an object point k to the sensor, relative to a reference plane, and the measured disparity d. To express the 3D coordinates of object points we consider a depth coordinate system with its origin at the perspective center of the infrared camera. The Z axis is orthogonal to the image plane towards the object, the X axis is perpendicular to the Z axis in the direction of the baseline b between the infrared camera center and the laser projector, and the Y axis is orthogonal to X and Z, making a right-handed coordinate system. Assume that an object is on the reference plane at a distance Zo from the sensor, and that a speckle on the object is captured on the image plane of the infrared camera. If the object is shifted closer to (or further away from) the sensor, the location of the speckle on the image plane is displaced in the X direction. This is measured in image space as a disparity d corresponding to a point k in object space. From the similarity of triangles we have (1) D / b = (Zo − Zk) / Zo and (2) d / f = D / Zk, where Zk denotes the distance (depth) of point k in object space, b is the base length, f is the focal length of the infrared camera, D is the displacement of point k in object space, and d is the observed disparity in image space. Substituting D from Equation (2) into Equation (1) and expressing Zk in terms of the other variables yields (3) Zk = Zo / (1 + (Zo / (f · b)) · d). Equation (3) is the basic mathematical model for the derivation of depth from the observed disparity, provided that the constant parameters Zo, f, and b are determined by calibration. The Z coordinate of a point, together with f, defines the imaging scale for that point.
The planimetric object coordinates of each point can then be calculated from its image coordinates and the scale: Xk = −(Zk / f) · (xk − xo + δx) and Yk = −(Zk / f) · (yk − yo + δy), where xk and yk are the image coordinates of the point, xo and yo are the coordinates of the principal point, and δx and δy are corrections for lens distortion, for which several models with different coefficients exist; see for instance [28]. Note that here we assume that the image coordinate system is parallel to the baseline and thus to the depth coordinate system.
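The depth model above can be sketched numerically. This is an illustrative Python sketch (the deck's own code is C#): the calibration values used below are hypothetical placeholders, not measured parameters, and the sign convention of the disparity depends on the actual calibration.

```python
def depth_from_disparity(d, Z0, f, b):
    """Equation (3): Zk = Z0 / (1 + (Z0 / (f * b)) * d).

    d: observed disparity (same units as f), Z0: reference-plane
    distance, f: focal length, b: baseline. All values hypothetical.
    """
    return Z0 / (1.0 + (Z0 / (f * b)) * d)

def planimetric(Zk, f, xk, yk, x0=0.0, y0=0.0, dx=0.0, dy=0.0):
    """Planimetric coordinates from image coordinates and the scale Zk / f."""
    Xk = -(Zk / f) * (xk - x0 + dx)
    Yk = -(Zk / f) * (yk - y0 + dy)
    return Xk, Yk

# Zero disparity means the point lies on the reference plane, so Zk == Z0.
print(depth_from_disparity(0.0, Z0=2.0, f=0.0058, b=0.075))
```

Note that depth decreases as disparity grows, which matches Equation (3): the d term only appears in the denominator.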
• Kinect depth has a precision of 11 bits, i.e. 2^11 = 2048 values
• The Bgra32 pixel format is also valid to use when working with the other RGB resolutions
• If this is our modern Vitruvian model, the Skeleton is composed of twenty joint points that represent the principal articulations of the human body
• The seated tracking mode is designed to track people who are seated on a chair or couch, or whose lower body is not entirely visible to the sensor. The default tracking mode, in contrast, is optimized to recognize and track people who are standing and fully visible to the sensor.
• By default, the skeleton engine selects which available skeletons to actively track. The skeleton engine chooses the first two skeletons available for tracking, which is not always desirable, largely because the selection process is unpredictable. If you so choose, you have the option to select which skeletons to track using the AppChoosesSkeletons property and the ChooseSkeletons method. The AppChoosesSkeletons property is false by default, in which case the skeleton engine selects skeletons for tracking. To manually select which skeletons to track, set the AppChoosesSkeletons property to true and call the ChooseSkeletons method, passing in the TrackingIDs of the skeletons you want to track. The ChooseSkeletons method accepts one, two, or no TrackingIDs. The skeleton engine stops tracking all skeletons when the ChooseSkeletons method is passed no parameters. There are some nuances to selecting skeletons.
• Define a clear context for when a gesture is expected. Provide clear feedback to the player. Run the gesture filter only when the context warrants it. Cancel the gesture if the context changes.
• The first and easier way is to define the gesture algorithmically: describe the gesture as a list of conditions.
• The peak signal-to-noise ratio (often abbreviated PSNR) is a measure used to evaluate the quality of a compressed image with respect to the original. This image quality index is defined as the ratio between the maximum possible power of a signal and the power of the noise that can corrupt the fidelity of its compressed representation. Because many signals have a very wide dynamic range, the PSNR is usually expressed on a logarithmic decibel scale. The higher the PSNR, the greater the "similarity" to the original image, in the sense that it is perceptually closer to it for a human observer. It is most easily defined via the mean squared error.
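The PSNR definition via the mean squared error can be sketched in a few lines. This is an illustrative Python sketch (the deck's own code is C#), assuming 8-bit images flattened to equal-length sequences of pixel values with a maximum of 255:

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(original, compressed, max_value=255.0):
    """PSNR in decibels: 10 * log10(MAX^2 / MSE)."""
    m = mse(original, compressed)
    if m == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / m)
```

As described above, identical images yield an infinite PSNR, and larger distortions (larger MSE) drive the value down on the decibel scale.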
• k-nearest neighbour (k-NN) is an algorithm used in pattern recognition to classify objects based on the features of the objects nearest to the one under consideration, using a distance measure such as the Euclidean distance or, for example, the Manhattan distance.
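The k-NN idea above fits in a short sketch. This is an illustrative Python sketch (the deck's own code is C#), with both distance measures mentioned in the note; the training data layout (a list of feature-vector/label pairs) is an assumption for the example:

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_classify(train, query, k=3, distance=euclidean):
    """train: list of (feature_vector, label) pairs.
    Returns the majority label among the k nearest neighbours of query."""
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Swapping `distance=manhattan` changes only the neighbourhood metric, not the voting scheme, which is why the two distances are interchangeable in the description above.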
• Last slide, mandatory
### Introduction to Kinect - Update v 1.8

1. 1. introduction to kinect NUI, artificial intelligence applications and programming Matteo Valoriani mvaloriani AT gmail.com @MatteoValoriani
2. 2. WHO I AM… valoriani@elet.polimi.it @MatteoValoriani
3. 3. Follow me on Twitter or the Kitten gets it: @MatteoValoriani
4. 4. Lots of words… Ambient Intelligence Augmented reality Smart device Pervasive Computing Human-centered computing Internet of Things Ubiquitous computing Physical Computing
5. 5. … One concept
6. 6. Interface Evolution CLI GUI Command Line Interface Graphical User Interface NUI
7. 7. Natural User Interface MultiTouch Facial Recognition Spatial Recognition Computer Vision Single Touch Touch Augmented Reality Pen Input Voice Command Gesture Sensing Audio Recognition Geospatial Sensing Natural Speech Accelerometers Sensors Mind Control Biometrics Ambient Light Brain Waves Mood Recognition
8. 8. Kinect
9. 9. Kinect’s magic = “Any sufficiently advanced technology is indistinguishable from magic” (Arthur C. Clarke)
10. 10. Power Comes from the Sum The sum: This is where the magic is
11. 11. Application fields Video and examples available at: http://www.microsoft.com/en-us/kinectforwindows/discover/gallery.aspx
14. 14. introduction to kinect Hardware and sensors Matteo Valoriani mvaloriani AT gmail.com @MatteoValoriani
15. 15. Hardware: Depth resolution: 640x480 px; RGB resolution: 1600x1200 px; Frame rate: 60 FPS. Components: 3D depth sensor, multi-array mic, RGB camera, motorized tilt
16. 16. Kinect Sensors http://www.ifixit.com/Teardown/MicrosoftKinect-Teardown/4066/1
17. 17. Kinect Sensors Color Sensor IR Depth Sensor IR Emitter
18. 18. Field of View
19. 19. Depth Sensing
20. 20. What does it see?
21. 21. Depth Sensing IR Emitter IR Depth Sensor
22. 22. Mathematical Model: from similar triangles, (b − (x_l − x_r)) / (Z − f) = b / Z with disparity d = x_l − x_r, which gives Z = (b · f) / d
23. 23. Mathematical Model (2): diagram relating the reference plane distance, the disparity, and the image plane
24. 24. Precision: spatial x/y resolution: 3 mm @ 2 m distance; depth z resolution: 1 cm @ 2 m distance, 10 cm @ 4 m distance; operation range: 0.8 m ~ 4 m (default) | 0.5 m ~ 3 m (near mode)
25. 25. introduction to kinect Microsoft Kinect SDK 1.8 Matteo Valoriani mvaloriani AT gmail.com @MatteoValoriani
26. 26. Kinect SDKs Nov ‘10: Dec ‘10: Jun ’11: Feb ‘12:
27. 27. Microsoft SDK vs OpenNI Microsoft SDK
28. 28. Microsoft SDK vs OpenNI PrimeSense OpenNI/NITE
29. 29. GET STARTED
30. 30. demo Kinect Samples
31. 31. Potential and applications Depth sensor Skeletal tracking Background removal Object recognition Multi-user Easy Gesture Recognition Microphone array Sound source detection Speech recognition
32. 32. KINECT API BASICS
33. 33. The Kinect Stack App Joint Filtering Gesture Detection Character Retargeting Speech Commands Skeletal Tracking UI Control Identity Speech Recognition Drivers Depth Processing Color Processing Echo Cancellation Tilt Sensor Depth Sensor Color Sensor Microphones
34. 34. System Data Flow Skeletal Tracking Depth Processing Segmentation Human Finding Body Part Classification Not available Identity Facial Recognition Color/Skeleton Match Skeleton Model User Identified App Speech Pipeline Multichannel Echo Cancellation Sound Position Tracking Noise Suppression Speech Detection App App
35. 35. code Detecting a Kinect Sensor
36. 36. private KinectSensor _Kinect; public MainWindow() { InitializeComponent(); this.Loaded += (s, e) => { DiscoverKinectSensor(); }; } private void DiscoverKinectSensor() { KinectSensor.KinectSensors.StatusChanged += KinectSensors_StatusChanged; this.Kinect = KinectSensor.KinectSensors.FirstOrDefault(x => x.Status == KinectStatus.Connected); }
37. 37. private void KinectSensors_StatusChanged(object sender, StatusChangedEventArgs e) { switch(e.Status) { case KinectStatus.Connected: if(this.Kinect == null) { this.Kinect = e.Sensor; } break; case KinectStatus.Disconnected: if(this.Kinect == e.Sensor) { this.Kinect = null; this.Kinect = KinectSensor.KinectSensors .FirstOrDefault(x => x.Status == KinectStatus.Connected); if(this.Kinect == null) { //Notify the user that the sensor is disconnected } } break; //Handle all other statuses according to needs } }
38. 38. public KinectSensor Kinect { get { return this._Kinect; } set { if(this._Kinect != value) { if(this._Kinect != null) { //Uninitialize this._Kinect = null; } if(value != null && value.Status == KinectStatus.Connected) { this._Kinect = value; //Initialize } } } }
39. 39. KinectStatus VALUES KinectStatus What it means Undefined The status of the attached device cannot be determined. Connected The device is attached and is capable of producing data from its streams. DeviceNotGenuine The attached device is not an authentic Kinect sensor. Disconnected The USB connection with the device has been broken. Error Communication with the device produces errors. Initializing The device is attached to the computer, and is going through the process of connecting. InsufficientBandwidth Kinect cannot initialize, because the USB connector does not have the necessary bandwidth required to operate the device. NotPowered Kinect is not fully powered. The power provided by a USB connection is not sufficient to power the Kinect hardware. An additional power adapter is required. NotReady Kinect is attached, but is yet to enter the Connected state.
40. 40. code Move the camera
41. 41. Tilt private void setAngle(object sender, RoutedEventArgs e) { if (Kinect != null) { Kinect.ElevationAngle = (Int32)slider1.Value; } } <Slider Height="33" HorizontalAlignment="Left" Margin="0,278,0,0" Name="slider1" VerticalAlignment="Top" Width="308" SmallChange="1" IsSnapToTickEnabled="True" /> <Button Content="OK" Height="29" HorizontalAlignment="Left" Margin="396,278,0,0" Name="button1" VerticalAlignment="Top" Width="102" Click="setAngle" />
42. 42. introduction to kinect Camera Fundamentals Matteo Valoriani mvaloriani AT gmail.com @MatteoValoriani
43. 43. Cameras Events
44. 44. The ImageStream object model
45. 45. The ImageFrame object model
46. 46. ColorImageFormat Member name Description InfraredResolution640x480Fps30 16 bits, using the top 10 bits from a PixelFormats.Gray16 format (with the 6 least significant bits always set to 0), whose resolution is 640 x 480 and frame rate is 30 frames per second. Introduced in 1.6. RawBayerResolution1280x960Fps12 Bayer data (8 bits per pixel, layout in alternating pixels of red, green and blue) whose resolution is 1280 x 960 and frame rate is 12 frames per second. Introduced in 1.6. RawBayerResolution640x480Fps30 Bayer data (8 bits per pixel, layout in alternating pixels of red, green and blue) whose resolution is 640 x 480 and frame rate is 30 frames per second. Introduced in 1.6. RawYuvResolution640x480Fps15 Raw YUV data whose resolution is 640 x 480 and frame rate is 15 frames per second. RgbResolution1280x960Fps12 RGB data whose resolution is 1280 x 960 and frame rate is 12 frames per second. RgbResolution640x480Fps30 RGB data whose resolution is 640 x 480 and frame rate is 30 frames per second. YuvResolution640x480Fps15 YUV data whose resolution is 640 x 480 and frame rate is 15 frames per second. Undefined The format is not defined. colorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
47. 47. DepthImageFormat Member name Description Resolution320x240Fps30 The resolution is 320 x 240; the frame rate is 30 frames per second. Resolution640x480Fps30 The resolution is 640 x 480; the frame rate is 30 frames per second. Resolution80x60Fps30 The resolution is 80 x 60; the frame rate is 30 frames per second. Undefined The format is not defined. depthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
48. 48. BYTES PER PIXEL The stream Format determines the pixel format and therefore the meaning of the bytes. Stride
49. 49. Depth data: Distance (in mm from Kinect, e.g. 2,000 mm) and Player (1-6 players)
50. 50. Depth Range: Near Mode 0.4 m ~ 3 m; Default Mode 0.8 m ~ 4 m (values are reported up to 8 m)
51. 51. Depth data int depth = depthPoint >> DepthImageFrame.PlayerIndexBitmaskWidth; int player = depthPoint & DepthImageFrame.PlayerIndexBitmask;
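The bit layout behind the C# above can be mirrored in a standalone sketch. This is an illustrative Python sketch (the deck's own code is C#), assuming the SDK's documented layout: the player index occupies the 3 low bits (DepthImageFrame.PlayerIndexBitmaskWidth) and the depth in millimetres the remaining high bits.

```python
# Width of the player-index field, as in DepthImageFrame.PlayerIndexBitmaskWidth.
PLAYER_INDEX_BITMASK_WIDTH = 3
# Mask for the low 3 bits, as in DepthImageFrame.PlayerIndexBitmask (0b111).
PLAYER_INDEX_BITMASK = (1 << PLAYER_INDEX_BITMASK_WIDTH) - 1

def unpack_depth_pixel(raw):
    """Split a raw 16-bit depth value into (depth_mm, player_index)."""
    depth_mm = raw >> PLAYER_INDEX_BITMASK_WIDTH
    player = raw & PLAYER_INDEX_BITMASK
    return depth_mm, player

# Packing 2,000 mm with player index 1, then unpacking it again:
raw = (2000 << PLAYER_INDEX_BITMASK_WIDTH) | 1
print(unpack_depth_pixel(raw))
```

A player index of 0 means the pixel belongs to no tracked player; indices 1-6 match the segmentation map described above.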
52. 52. Depth and Segmentation map
53. 53. code Processing & Displaying a Color Data
54. 54. private WriteableBitmap _ColorImageBitmap; private Int32Rect _ColorImageBitmapRect; private int _ColorImageStride; private void InitializeKinect(KinectSensor sensor) { if (sensor != null){ ColorImageStream colorStream = sensor.ColorStream; colorStream.Enable(); this._ColorImageBitmap = new WriteableBitmap(colorStream.FrameWidth, colorStream.FrameHeight, 96, 96, PixelFormats.Bgr32, null); this._ColorImageBitmapRect = new Int32Rect(0, 0, colorStream.FrameWidth, colorStream.FrameHeight); this._ColorImageStride = colorStream.FrameWidth * colorStream.FrameBytesPerPixel; ColorImageElement.Source = this._ColorImageBitmap; sensor.ColorFrameReady += Kinect_ColorFrameReady; sensor.Start(); } }
55. 55. private void Kinect_ColorFrameReady(object sender, ColorImageFrameReadyEventArgs e) { using (ColorImageFrame frame = e.OpenColorImageFrame()) { if (frame != null) { byte[] pixelData = new byte[frame.PixelDataLength]; frame.CopyPixelDataTo(pixelData); this._ColorImageBitmap.WritePixels(this._ColorImageBitmapRect, pixelData, this._ColorImageStride, 0); } } }
56. 56. code Taking a Picture
57. 57. private void TakePictureButton_Click(object sender, RoutedEventArgs e) { string fileName = "snapshot.jpg"; if (File.Exists(fileName)) { File.Delete(fileName); } using (FileStream savedSnapshot = new FileStream(fileName, FileMode.CreateNew)) { BitmapSource image = (BitmapSource)VideoStreamElement.Source; JpegBitmapEncoder jpgEncoder = new JpegBitmapEncoder(); jpgEncoder.QualityLevel = 70; jpgEncoder.Frames.Add(BitmapFrame.Create(image)); jpgEncoder.Save(savedSnapshot); savedSnapshot.Flush(); savedSnapshot.Close(); savedSnapshot.Dispose(); } }
58. 58. code Processing & Displaying a DepthData
59. 59. Kinect.DepthStream.Enable(DepthImageFormat.Resolution320x240Fps30); Kinect.DepthFrameReady += Kinect_DepthFrameReady; void Kinect_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e) { using (DepthImageFrame frame = e.OpenDepthImageFrame()) { if (frame != null) { short[] pixelData = new short[frame.PixelDataLength]; frame.CopyPixelDataTo(pixelData); int stride = frame.Width * frame.BytesPerPixel; ImageDepth.Source = BitmapSource.Create(frame.Width, frame.Height, 96, 96, PixelFormats.Gray16, null, pixelData, stride); } } }
60. 60. introduction to kinect Skeletal Tracking Fundamentals Matteo Valoriani mvaloriani AT gmail.com @MatteoValoriani
61. 61. Skeletal Tracking History
62. 62. Skeleton Data
63. 63. Tracking Modes
64. 64. Tracking Modes Details
65. 65. Tracking in Near Mode private void EnableNearModeSkeletalTracking() { if (this.kinect != null && this.kinect.DepthStream != null && this.kinect.SkeletonStream != null) { this.kinect.DepthStream.Range = DepthRange.Near; // Depth in near range enabled this.kinect.SkeletonStream.EnableTrackingInNearRange = true; // enable returning skeletons while depth is in Near Range this.kinect.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated; // Use seated tracking } }
66. 66. The SkeletonStream object model AllFramesReady and SkeletonFrameReady Events return a SkeletonFrame which contain skeleton data
67. 67. The Skeleton object model Each skeleton has a unique identifier TrackingID Each joint has a Position, which is of type SkeletonPoint that reports the X, Y, and Z of the joint.
68. 68. SkeletonTrackingState SkeletonTrackingState What it means NotTracked Skeleton object does not represent a tracked user. The Position field of the Skeleton and every Joint in the joints collection is a zero point. PositionOnly The skeleton is detected, but is not actively being tracked. The Position field has a non-zero point, but the position of each Joint in the joints collection is a zero point. Tracked The skeleton is actively being tracked. The Position field and all Joint objects in the joints collection have non-zero points.
69. 69. JointTrackingState JointTrackingState What it means Inferred Occluded, clipped, or low-confidence joints. The skeleton engine cannot see the joint in the depth frame pixels, but has made a calculated determination of the position of the joint. NotTracked The position of the joint is indeterminable. The Position value is a zero point. Tracked The joint is detected and actively followed. Use TransformSmoothParameters to smooth joint data to reduce jitter
70. 70. code Skeleton V1
71. 71. private KinectSensor _KinectDevice; private readonly Brush[] _SkeletonBrushes = { Brushes.Black, Brushes.Crimson, Brushes.Indigo, Brushes.DodgerBlue, Brushes.Purple, Brushes.Pink }; private Skeleton[] _FrameSkeletons; #endregion Member Variables private void InitializeKinect() { this._KinectDevice.SkeletonStream.Enable(); this._FrameSkeletons = new Skeleton[this._KinectDevice.SkeletonStream.FrameSkeletonArrayLength]; this.KinectDevice.SkeletonFrameReady += KinectDevice_SkeletonFrameReady; this._KinectDevice.Start(); } private void UninitializeKinect() { this._KinectDevice.Stop(); this._KinectDevice.SkeletonFrameReady -= KinectDevice_SkeletonFrameReady; this._KinectDevice.SkeletonStream.Disable(); this._FrameSkeletons = null; }
72. 72. private void KinectDevice_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e) { using (SkeletonFrame frame = e.OpenSkeletonFrame()) { if (frame != null) { Skeleton skeleton; Brush userBrush; LayoutRoot.Children.Clear(); frame.CopySkeletonDataTo(this._FrameSkeletons); // copy skeleton data into a local variable (the array Length is 6) for (int i = 0; i < this._FrameSkeletons.Length; i++) { skeleton = this._FrameSkeletons[i]; if (skeleton.TrackingState != SkeletonTrackingState.NotTracked) { Point p = GetJointPoint(skeleton.Position); // scale position Ellipse ell = new Ellipse(); ell.Height = ell.Width = 30; userBrush = this._SkeletonBrushes[i % this._SkeletonBrushes.Length]; ell.Fill = userBrush; LayoutRoot.Children.Add(ell); Canvas.SetTop(ell, p.Y - ell.Height / 2); Canvas.SetLeft(ell, p.X - ell.Width / 2); } } } } }
73. 73. private Point GetJointPoint(SkeletonPoint skPoint) { // Mapping between different coordinate systems: change system 3D -> 2D DepthImagePoint point = this.KinectDevice.MapSkeletonPointToDepth(skPoint, this.KinectDevice.DepthStream.Format); // Scale point to actual dimensions of container point.X = point.X * (int)this.LayoutRoot.ActualWidth / this.KinectDevice.DepthStream.FrameWidth; point.Y = point.Y * (int)this.LayoutRoot.ActualHeight / this.KinectDevice.DepthStream.FrameHeight; return new Point(point.X, point.Y); }
74. 74. Smoothing TransformSmoothParameters What it means Correction A float ranging from 0 to 1.0. The lower the number, the more correction is applied. JitterRadius Sets the radius of correction. If a joint position “jitters” outside of the set radius, it is corrected to be at the radius. Float value measured in meters. MaxDeviationRadius Use this setting in conjunction with the JitterRadius setting to determine the outer bounds of the jitter radius. Any point that falls outside of this radius is not considered a jitter, but a valid new position. Float value measured in meters. Prediction Sets the number of frames predicted. Smoothing Determines the amount of smoothing applied while processing skeletal frames. It is a float type with a range of 0 to 1.0. The higher the value, the more smoothing applied. A zero value does not alter the skeleton data.
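How these parameters interact can be sketched on a single coordinate. The SDK's actual filter is reportedly a double exponential (Holt) smoother, so this Python sketch is only a conceptual illustration of Smoothing, JitterRadius, and MaxDeviationRadius, not the SDK algorithm; the default values below are hypothetical.

```python
def smooth_joint(prev, raw, smoothing=0.3, jitter_radius=0.05, max_deviation=0.5):
    """One smoothing step for a single joint coordinate (meters).

    - Deviations within jitter_radius pass through unchanged.
    - Deviations between jitter_radius and max_deviation are treated as
      jitter and clamped back to the jitter radius.
    - Deviations beyond max_deviation count as a valid new position.
    A higher 'smoothing' keeps more of the previous estimate.
    """
    delta = raw - prev
    if jitter_radius < abs(delta) <= max_deviation:
        raw = prev + jitter_radius * (1 if delta > 0 else -1)
    return smoothing * prev + (1.0 - smoothing) * raw
```

With `smoothing = 0`, the function returns the (possibly clamped) raw sample, matching the table's note that a zero value does not alter the data beyond jitter correction.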
75. 75. code Skeleton V2
76. 76. private void InitializeKinect() { var parameters = new TransformSmoothParameters { Smoothing = 0.3f, Correction = 0.0f, Prediction = 0.0f, JitterRadius = 1.0f, MaxDeviationRadius = 0.5f }; _KinectDevice.SkeletonStream.Enable(parameters); // enable the stream with smoothing parameters this._FrameSkeletons = new Skeleton[this._KinectDevice.SkeletonStream.FrameSkeletonArrayLength]; this.KinectDevice.SkeletonFrameReady += KinectDevice_SkeletonFrameReady; this._KinectDevice.Start(); }
77. 77. private void KinectDevice_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e) { using (SkeletonFrame frame = e.OpenSkeletonFrame()) { if (frame != null) { Skeleton skeleton; Brush userBrush; LayoutRoot.Children.Clear(); frame.CopySkeletonDataTo(this._FrameSkeletons); for (int i = 0; i < this._FrameSkeletons.Length; i++) { skeleton = this._FrameSkeletons[i]; if (skeleton.TrackingState != SkeletonTrackingState.NotTracked) { Point p = GetJointPoint(skeleton.Position); Ellipse ell = new Ellipse(); ell.Height = ell.Width = 30; userBrush = this._SkeletonBrushes[i % this._SkeletonBrushes.Length]; ell.Fill = userBrush; LayoutRoot.Children.Add(ell); Canvas.SetTop(ell, p.Y - ell.Height / 2); Canvas.SetLeft(ell, p.X - ell.Width / 2); if (skeleton.TrackingState == SkeletonTrackingState.Tracked) { DrawSkeleton(skeleton, userBrush); } } } } } }
78. 78. private void DrawSkeleton(Skeleton skeleton, Brush userBrush) { JointType[] joints; //Draws the skeleton’s head and torso joints = new[] { JointType.Head, JointType.ShoulderCenter, JointType.ShoulderLeft, JointType.Spine, JointType.ShoulderRight, JointType.ShoulderCenter, JointType.HipCenter, JointType.HipLeft, JointType.Spine, JointType.HipRight, JointType.HipCenter }; LayoutRoot.Children.Add(CreateFigure(skeleton, userBrush, joints)); //Draws the skeleton’s left leg joints = new[] { JointType.HipLeft, JointType.KneeLeft, JointType.AnkleLeft, JointType.FootLeft }; LayoutRoot.Children.Add(CreateFigure(skeleton, userBrush, joints)); //Draws the skeleton’s right leg joints = new[] { JointType.HipRight, JointType.KneeRight, JointType.AnkleRight, JointType.FootRight }; LayoutRoot.Children.Add(CreateFigure(skeleton, userBrush, joints)); //Draws the skeleton’s left arm joints = new[] { JointType.ShoulderLeft, JointType.ElbowLeft, JointType.WristLeft, JointType.HandLeft }; LayoutRoot.Children.Add(CreateFigure(skeleton, userBrush, joints)); //Draws the skeleton’s right arm joints = new[] { JointType.ShoulderRight, JointType.ElbowRight, JointType.WristRight, JointType.HandRight }; LayoutRoot.Children.Add(CreateFigure(skeleton, userBrush, joints)); }
79. 79. private Polyline CreateFigure(Skeleton skeleton, Brush brush, JointType[] joints) { Polyline figure = new Polyline(); figure.StrokeThickness = 4; figure.Stroke = brush; for (int i = 0; i < joints.Length; i++) { figure.Points.Add(GetJointPoint(skeleton.Joints[joints[i]].Position)); } return figure; }
80. 80. code Skeleton V3
81. 81. private void InitializeKinect() { var parameters = new TransformSmoothParameters{ Smoothing = 0.3f, Correction = 0.0f, Prediction = 0.0f, JitterRadius = 1.0f, MaxDeviationRadius = 0.5f }; _KinectDevice.SkeletonStream.Enable(parameters); this._FrameSkeletons = new Skeleton[this._KinectDevice.SkeletonStream.FrameSkeletonArrayLength]; this.KinectDevice.SkeletonFrameReady += KinectDevice_SkeletonFrameReady; // Video stream initialization this._KinectDevice.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30); this._KinectDevice.ColorFrameReady += new EventHandler<ColorImageFrameReadyEventArgs>(_KinectDevice_ColorFrameReady); this._ColorImageBitmap = new WriteableBitmap(_KinectDevice.ColorStream.FrameWidth, _KinectDevice.ColorStream.FrameHeight, 96, 96, PixelFormats.Bgr32, null); this._ColorImageBitmapRect = new Int32Rect(0, 0, _KinectDevice.ColorStream.FrameWidth, _KinectDevice.ColorStream.FrameHeight); this._ColorImageStride = _KinectDevice.ColorStream.FrameWidth * _KinectDevice.ColorStream.FrameBytesPerPixel; ColorImage.Source = this._ColorImageBitmap; this._KinectDevice.Start(); }
82. 82. private Point GetJointPoint(SkeletonPoint skPoint) { // Change system 3D -> 2D: mapping onto the color coordinate system ColorImagePoint point = this.KinectDevice.MapSkeletonPointToColor(skPoint, this.KinectDevice.ColorStream.Format); // Scale point to actual dimensions of container point.X = point.X * (int)this.LayoutRoot.ActualWidth / this.KinectDevice.ColorStream.FrameWidth; point.Y = point.Y * (int)this.LayoutRoot.ActualHeight / this.KinectDevice.ColorStream.FrameHeight; return new Point(point.X, point.Y); }
83. 83. Choosing Skeletons AppChoosesSkeletons ChooseSkeletons AppChoosesSkeletons What it means False(default) The skeleton engine chooses the first two skeletons available for tracking (selection process is unpredictable) True To manually select which skeletons to track call the ChooseSkeletons method passing in the TrackingIDs of the skeletons you want to track. The ChooseSkeletons method accepts one, two, or no TrackingIDs. The skeleton engine stops tracking all skeletons when the ChooseSkeletons method is passed no parameters.
84. 84. Choosing Skeletons(2)
85. 85. Gesture Interaction How to design a gesture? Matteo Valoriani mvaloriani AT gmail.com @MatteoValoriani
86. 86. Gesture
87. 87. Interaction metaphors Depends on the task Important aspect in design of UI Cursors (hands tracking): Target an object Avatars (body tracking): Interaction with virtual space
88. 88. The shadow/mirror effect Shadow Effect I see the back of my avatar Problems with Z movements Mirror Effect I see the front of my avatar Problem with mapping left/right movements
89. 89. User Interaction Game Challenging = fun UI Challenging = easy and effective
90. 90. Gesture semantically fits user task
91. 91. User action fits UI reaction 1 2 3 4 5
92. 92. User action fits UI reaction 5 61 72 83 94 10 5
93. 93. Gestures family-up 1 2 3 4 5
94. 94. Handed gestures 1 2 3 4 5
95. 95. Repeating Gesture?
96. 96. Repeating Gesture?
97. 97. Number of Hands 1 2 3 4 5
98. 98. Symmetrical two-handed gesture
99. 99. Gesture payoff 1 2 3 4 5
100. 100. Fatigue kills gesture: fatigue increases messiness  poor performance  frustration  bad UX
101. 101. Gorilla Arm problem: try to raise your arm for 10 minutes…
102. 102. Comfortable positions
103. 103. User Posture
104. 104. The challenges
105. 105. Gesture Recognition Artificial Intelligence for Kinect Matteo Valoriani mvaloriani AT gmail.com @MatteoValoriani
106. 106. Heuristics vs Machine Learning: chart of cost as a function of gesture complexity
107. 107. Define What Constitutes a Gesture
108. 108. Define Key Stages of a Gesture Definite gesture Continuous gesture Contact or release point Direction Initial velocity Frequency Amplitude
109. 109. Detection Filter Only When Necessary!
110. 110. Causes of Missing Information
111. 111. Gesture Definition threshold threshold threshold threshold
112. 112. Implementation Overview
113. 113. code Static Postures: HandOnHead
114. 114. class GestureRecognizer { // skeletonSerie maps each joint type (Key) to the time series of its observed joints (Value): AnkleLeft -> <Vt1, Vt2, Vt3, Vt4, …>, AnkleRight -> <Vt1, Vt2, Vt3, Vt4, …>, ElbowLeft -> <Vt1, Vt2, Vt3, Vt4, …>, … public Dictionary<JointType, List<Joint>> skeletonSerie = new Dictionary<JointType, List<Joint>>() { { JointType.AnkleLeft, new List<Joint>()}, { JointType.AnkleRight, new List<Joint>()}, { JointType.ElbowLeft, new List<Joint>()}, { JointType.ElbowRight, new List<Joint>()}, { JointType.FootLeft, new List<Joint>()}, { JointType.FootRight, new List<Joint>()}, { JointType.HandLeft, new List<Joint>()}, { JointType.HandRight, new List<Joint>()}, { JointType.Head, new List<Joint>()}, { JointType.HipCenter, new List<Joint>()}, { JointType.HipLeft, new List<Joint>()}, { JointType.HipRight, new List<Joint>()}, { JointType.KneeLeft, new List<Joint>()}, { JointType.KneeRight, new List<Joint>()}, { JointType.ShoulderCenter, new List<Joint>()}, { JointType.ShoulderLeft, new List<Joint>()}, { JointType.ShoulderRight, new List<Joint>()}, { JointType.Spine, new List<Joint>()}, { JointType.WristLeft, new List<Joint>()}, { JointType.WristRight, new List<Joint>()} }; protected List<DateTime> timeList; private static List<JointType> typesList = new List<JointType>() { JointType.AnkleLeft, JointType.AnkleRight, JointType.ElbowLeft, JointType.ElbowRight, JointType.FootLeft, JointType.FootRight, JointType.HandLeft, JointType.HandRight, JointType.Head, JointType.HipCenter, JointType.HipLeft, JointType.HipRight, JointType.KneeLeft, JointType.KneeRight, JointType.ShoulderCenter, JointType.ShoulderLeft, JointType.ShoulderRight, JointType.Spine, JointType.WristLeft, JointType.WristRight }; //... continue }
115. 115. const int bufferLength = 10; public void Recognize(JointCollection jointCollection, DateTime date) { timeList.Add(date); foreach (JointType type in typesList) { skeletonSerie[type].Add(jointCollection[type]); if (skeletonSerie[type].Count > bufferLength) { skeletonSerie[type].RemoveAt(0); } } startRecognition(); } List<Gesture> gesturesList = new List<Gesture>(); private void startRecognition() { gesturesList.Clear(); gesturesList.Add(HandOnHeadReconizerRT(JointType.HandLeft, JointType.ShoulderLeft)); // Do ... }
116. 116. Boolean isHOHRecognitionStarted; DateTime StartTimeHOH = DateTime.Now; private Gesture HandOnHeadReconizerRT(JointType hand, JointType shoulder) { // Correct position? if (skeletonSerie[hand].Last().Position.Y > skeletonSerie[shoulder].Last().Position.Y + 0.2f) { if (!isHOHRecognitionStarted) { isHOHRecognitionStarted = true; StartTimeHOH = timeList.Last(); } else { double totalMilliseconds = (timeList.Last() - StartTimeHOH).TotalMilliseconds; // Held long enough? if (totalMilliseconds >= HandOnHeadMinimalDuration) { isHOHRecognitionStarted = false; return Gesture.HandOnHead; } } } else { // Incorrect position if (isHOHRecognitionStarted) { isHOHRecognitionStarted = false; } } return Gesture.None; } // Alternative: count the number of occurrences
118. 118. code Swipe
119. 119. const float SwipeMinimalLength = 0.08f; const float SwipeMaximalHeight = 0.02f; const int SwipeMinimalDuration = 200; const int SwipeMaximalDuration = 1000; const int MinimalPeriodBetweenGestures = 0; private Gesture HorizzontalSwipeRecognizer(List<Joint> positionList) { int start = 0; for (int index = 0; index < positionList.Count - 1; index++) { // ∆x too small or ∆y too big -> shift start if ((Math.Abs(positionList[0].Position.Y - positionList[index].Position.Y) > SwipeMaximalHeight) || Math.Abs((positionList[index].Position.X - positionList[index + 1].Position.X)) < 0.01f) { start = index; } // ∆x > minimal length? if (Math.Abs(positionList[index].Position.X - positionList[start].Position.X) > SwipeMinimalLength) { double totalMilliseconds = (timeList[index] - timeList[start]).TotalMilliseconds; // ∆t in the accepted range? if (totalMilliseconds >= SwipeMinimalDuration && totalMilliseconds <= SwipeMaximalDuration) { if (DateTime.Now.Subtract(lastGestureDate).TotalMilliseconds > MinimalPeriodBetweenGestures) { lastGestureDate = DateTime.Now; if (positionList[index].Position.X - positionList[start].Position.X < 0) return Gesture.SwipeRightToLeft; else return Gesture.SwipeLeftToRight; } } } } return Gesture.None; }
120. 120. // Personalized EventArgs
public class GestureEventArgs : EventArgs {
    public string text;
    public GestureEventArgs(string text) {
        this.text = text;
    }
}

public delegate void SwipeHadler(object sender, GestureEventArgs e);
public event SwipeHadler Swipe;

private Gesture HorizzontalSwipeRecognizer(JointType jointType) {
    Gesture g = HorizzontalSwipeRecognizer(skeletonSerie[jointType]);
    switch (g) {
        case Gesture.None:
            break;
        case Gesture.SwipeLeftToRight:
            if (Swipe != null)
                Swipe(this, new GestureEventArgs("SwipeLeftToRight"));
            break;
        case Gesture.SwipeRightToLeft:
            if (Swipe != null)
                Swipe(this, new GestureEventArgs("SwipeRightToLeft"));
            break;
        default:
            break;
    }
    return g;
}
...
121. 121. demo Heuristic Based Gesture Detection: FAAST
122. 122. Pros & Cons
PROs:
- Easy to understand
- Easy to implement (for simple gestures)
- Easy to debug
CONs:
- Challenging to choose the best values for parameters
- Doesn't scale well for variants of the same gesture
- Gets challenging for complex gestures
- Challenging to compensate for latency
Recommendation: use for simple gestures (hand wave, head movement, …)
123. 123. [Diagram: boolean inputs HeadAboveBaseLine, LeftKneeAboveBaseLine, RightKneeAboveBaseLine feeding a combining node → Jump?]
124. 124. [Diagram: inputs P1, P2, …, Pn with weights w1, w2, …, wn feeding a threshold node computing Σᵢ₌₁ⁿ wᵢ·Pᵢ]
125. 125. HandAboveElbow (Hand.y > Elbow.y) and HandInFrontOfShoulder (Hand.z < Shoulder.z), each with weight 1, threshold 2:
(HandAboveElbow * 1) + (HandInFrontOfShoulder * 1) >= 2 — both conditions must hold (logical AND)
126. 126. The same node with threshold 1:
(HandAboveElbow * 1) + (HandInFrontOfShoulder * 1) >= 1 — either condition suffices (logical OR)
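The two slides above show how one weighted node doubles as AND or OR just by changing its threshold. A minimal, framework-independent sketch of such a node (illustrative Python, not the deck's implementation):

```python
# Illustrative sketch of the weighted threshold node from the slides:
# boolean inputs, per-input weights, and a firing threshold.
def threshold_node(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of boolean inputs reaches the threshold."""
    s = sum(w * (1 if x else 0) for x, w in zip(inputs, weights))
    return 1 if s >= threshold else 0

# With weights (1, 1): threshold 2 behaves like AND, threshold 1 like OR.
hand_above_elbow = True
hand_in_front_of_shoulder = False
and_node = threshold_node([hand_above_elbow, hand_in_front_of_shoulder], [1, 1], 2)  # 0
or_node = threshold_node([hand_above_elbow, hand_in_front_of_shoulder], [1, 1], 1)   # 1
```

A NOT node, as used later in the deck's Jump network, falls out of the same scheme with a negative weight (e.g. weight -1, threshold 0).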
127. 127. [Diagram: inputs P1, P2, …, Pn feeding a node computing the normalized weighted sum (Σᵢ₌₁ⁿ wᵢ·Pᵢ) / (Σᵢ₌₁ⁿ wᵢ)]
128. 128. [Diagram: weighted Jump network — inputs HeadAboveBaseLine, LeftKneeAboveBaseLine, RightKneeAboveBaseLine, LegsStraightPreviouslyBent with weights 0.3, 0.1, 0.1, 0.8 and threshold 0.5 → Jump?]
129. 129. [Diagram: the same weighted Jump network, next build step of the animation]
130. 130. [Diagram: composite Jump network — the weighted inputs (HeadAboveBaseLine 0.3, LeftKneeAboveBaseLine 0.1, RightKneeAboveBaseLine 0.1, LegsStraightPreviouslyBent 0.8, threshold 0.5) combined via AND (threshold 2), OR (threshold 1), and NOT (weight -1, threshold 0) nodes with HeadBelowBaseLine, LeftKneeBelowBaseLine, RightKneeBelowBaseLine, LeftAnkleBelowBaseLine, RightAnkleBelowBaseLine, BodyFaceUpwards (all weight 1) → Jump?]
131. 131. [Diagram: the composite Jump network extended with HeadFarAboveBaseLine and an additional OR node alongside the AND/OR/NOT combination → Jump?]
132. 132. PROs:
- Complex gestures can be detected
- Good CPU performance
- Scales well for variants of the same gesture
- Nodes can be reused in different gestures
CONs:
- Not easy to debug
- Challenging to compensate for latency
- Small changes in parameters can have dramatic effects on results
- Very time-consuming to choose parameters manually
Recommendation: use for composed gestures (jump, duck, punch, …); break complex gestures into collections of simple gestures
133. 133. Gesture Definition
134. 134. Exemplar Matching — MSE = (1/N) · Σᵢ Distanceᵢ², PSNR = 10 · log₁₀(MAX² / MSE)
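Interpreting Distanceᵢ as the per-joint distance between the live posture and a recorded exemplar (an assumption consistent with the slides), the two formulas can be sketched directly. Illustrative Python, not the deck's implementation:

```python
import math

def mse(distances):
    """Mean squared error over N per-joint distances to an exemplar."""
    return sum(d * d for d in distances) / len(distances)

def psnr(distances, max_value):
    """PSNR in dB: 10 * log10(MAX^2 / MSE).
    Higher PSNR means the live posture is closer to the exemplar."""
    return 10 * math.log10((max_value ** 2) / mse(distances))
```

The best-matching exemplar is then simply the one with the highest PSNR, which is what the bar chart two slides below visualizes.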
135. 135. Exemplar Matching Neighbour
136. 136. Exemplar Matching [Bar chart: PSNR (scale 0–25) for exemplars 1–8]
137. 137. demo DTW Based Gesture Detection: Swipe
138. 138. Pros & Cons
PROs:
- Very complex gestures can be detected
- DTW allows for different speeds
- Can scale for variants of the same gesture
- Easy to visualize exemplar matching
CONs:
- Requires lots of resources to be robust
- Multiple recordings of multiple people for one gesture, i.e. requires lots of CPU and memory
Recommendation: use for complex, context-sensitive dynamic gestures (dancing, fitness exercises, …)
139. 139. Comparison [Bar chart, scale 0–180: K-Nearest vs. DTW vs. Weighted Network]
140. 140. Performance
141. 141. Posture Abstraction
142. 142. Distance Model d1 d2 d3 d4 Distances vector: d1: 33 d2: 30 d3: 49 d4: 53 …
143. 143. Displacement Model v1 v2 v3 v4 Displacement vector: v1: 0, 33, 0 v2: 15, 25, 0 v3: 35, 27, 0 v4: 43, 32, 0 …
144. 144. Hierarchical Model h1 h2 h3 h4 Hierarchical vector: h1: 0, 33, 0 h2: 15, -7, 0 h3: 20, 9, 0 h4: 18, 9, 0 …
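The three posture abstractions above (distance, displacement, hierarchical) differ only in the reference frame used for each joint. A framework-independent sketch of the latter two, with illustrative names rather than Kinect SDK types:

```python
# Sketch of the displacement vs. hierarchical posture models:
# absolute joint positions -> root-relative and parent-relative vectors.
def displacement_model(joints, root):
    """Each joint expressed relative to a single root joint (v1..vn)."""
    rx, ry, rz = root
    return [(x - rx, y - ry, z - rz) for (x, y, z) in joints]

def hierarchical_model(chain):
    """Each joint expressed relative to its parent along a chain (h1..hn)."""
    return [tuple(c - p for c, p in zip(parent, parent)) and
            tuple(c - p for p, c in zip(parent, child))
            for parent, child in zip(chain, chain[1:])]
```

In the hierarchical model a change at one joint (say, the elbow) affects only that joint's vector, not every descendant's coordinates, which is what makes it attractive for gesture matching.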
145. 145. Normalization
146. 146. Relative Normalization N1
147. 147. Unit Normalization N1 N2 N4 N3
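Assuming "unit normalization" here means rescaling the posture so a reference segment has length 1 (making postures comparable across users of different sizes), a minimal illustrative sketch:

```python
import math

def unit_normalize(vectors, reference_index=0):
    """Scale all posture vectors so the reference vector has unit length.
    Illustrative sketch: a tall and a short user performing the same pose
    then produce (approximately) the same normalized vectors."""
    ref = vectors[reference_index]
    scale = math.sqrt(sum(c * c for c in ref))
    return [tuple(c / scale for c in v) for v in vectors]
```

For example, scaling by a torso segment of length 2 halves every coordinate, so a skeleton twice as large yields the same normalized posture.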
148. 148. var choices = new Choices();
choices.Add(new SemanticResultValue("forward", "FORWARD"));
choices.Add(new SemanticResultValue("forwards", "FORWARD"));
choices.Add(new SemanticResultValue("straight", "FORWARD"));
choices.Add(new SemanticResultValue("backward", "BACKWARD"));
choices.Add(new SemanticResultValue("backwards", "BACKWARD"));
choices.Add(new SemanticResultValue("back", "BACKWARD"));
choices.Add(new SemanticResultValue("turn left", "LEFT"));
choices.Add(new SemanticResultValue("turn right", "RIGHT"));

var gb = new GrammarBuilder();
gb.Append(choices);
var g = new Grammar(gb);

<grammar ...>
  <rule id="rootRule">
    <one-of>
      <item>
        <tag>FORWARD</tag>
        <one-of>
          <item>forward</item>
          <item>straight</item>
        </one-of>
      </item>
      <item>
        <tag>BACKWARD</tag>
        <one-of>
          <item>backward</item>
          <item>backwards</item>
          <item>back</item>
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>
149. 149. RecognizerInfo ri = GetKinectRecognizer();
if (null != ri) {
    recognitionSpans = new List<Span> { forwardSpan, backSpan, rightSpan, leftSpan };
    this.speechEngine = new SpeechRecognitionEngine(ri.Id);

    using (var memoryStream = new MemoryStream(
        Encoding.ASCII.GetBytes(Properties.Resources.SpeechGrammar))) {
        var g = new Grammar(memoryStream);
        speechEngine.LoadGrammar(g);
    }

    speechEngine.SpeechRecognized += SpeechRecognized;
    speechEngine.SpeechRecognitionRejected += SpeechRejected;
    speechEngine.SetInputToAudioStream(
        sensor.AudioSource.Start(),
        new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
    speechEngine.RecognizeAsync(RecognizeMode.Multiple);
}
150. 150. private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
    const double ConfidenceThreshold = 0.3;
    if (e.Result.Confidence >= ConfidenceThreshold) {
        switch (e.Result.Semantics.Value.ToString()) {
            case "FORWARD":
                // do something
                break;
            case "BACKWARD":
                // do something
                break;
            case "LEFT":
                // do something
                break;
            case "RIGHT":
                // do something
                break;
        }
    }
    . . .
}
151. 151. private void WindowClosing(object sender, CancelEventArgs e) { if (null != this.sensor) { this.sensor.AudioSource.Stop(); this.sensor.Stop(); this.sensor = null; } if (null != this.speechEngine) { this.speechEngine.SpeechRecognized -= SpeechRecognized; this.speechEngine.SpeechRecognitionRejected -= SpeechRejected; this.speechEngine.RecognizeAsyncStop(); } }
152. 152. kinect Application Showcase Matteo Valoriani mvaloriani AT gmail.com @MatteoValoriani
153. 153. What Next? Kinect 2, Leap Motion, Intel Perceptual Computing Matteo Valoriani mvaloriani AT gmail.com @MatteoValoriani
155. 155. Leap Motion for Developers
156. 156. Intel Perceptual Computing https://www.youtube.com/watch?v=WePIY7svVtg
157. 157. Xbox One - Kinect 2
158. 158. Xbox One - Kinect 2
159. 159. Xbox One - Kinect 2 http://youtu.be/Hi5kMNfgDS4
160. 160. Which to choose? ALL Best for: Controlled kiosk environments with a pointing-based UI. Generally best for general audience desktop apps which can be distributed in the Airspace store.
161. 161. Which to choose? ALL Best for: Desktop/laptop applications where the user will be seated in front of the PC. Close range applications where features, apart from hand tracking and recognition, are necessary without too much precision or accuracy.
162. 162. Which to choose? ALL Best for: Kiosks, installations, and digital signage projects where the user will be standing fairly far away from the display.
163. 163. … TIRED?
164. 164. Q&A http://www.communitydays.it/
165. 165. FOLLOW ME ON TWITTER OR THE KITTEN GETS IT: @MatteoValoriani
166. 166. So Long and Thanks for all the Fish
167. 167. Resources and tools http://channel9.msdn.com/Search?term=kinect&type=All http://kinecthacks.net/ http://www.modmykinect.com http://kinectforwindows.org/resources/ http://www.kinecteducation.com/blog/2011/11/13/9-excellent-programming-resources-for-kinect/ http://kinectdtw.codeplex.com/ http://kinectrecognizer.codeplex.com/ http://projects.ict.usc.edu/mxr/faast/ http://leenissen.dk/fann/wp/
168. 168. Credits & References http://campar.in.tum.de/twiki/pub/Chair/TeachingSs11Kinect/2011DSensors_LabCourse_Kinect.pdf