GESTURE RECOGNITION:
VIRTUALITY AND REALITY
OLEKSANDR BAIEV
PHD
SR. ENGINEER AT SAMSUNG R&D UKRAINE
AGENDA
• Domain and current devices
• Hardware details
• Data processing tricks
• Hand Localization
• Joint coordinate reconstruction
• Skeleton recovery
WHY WE SHOULD RECONSTRUCT GESTURES
CURRENT SOLUTIONS
Leap Motion
MS Kinect
Intel RealSense
Pebbles Interfaces
MS HoloLens
WHERE IS DATA SCIENCE
Sensors: stereo cameras, IR projector/camera, ToF camera, other sensors
→ Raw images, structured-light images
→ Depth image
→ Voodoo is here
→ Hand skeleton
HOW TO GET DEPTH
Stereo images
1. Several images
2. Calculate disparities
3. Get depth
IR projector/camera
1. Project structured light
2. Evaluate structure distortion
3. Get depth
ToF sensor
1. Measure the light's time of flight
2. Get depth
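In the idealized case, the stereo and ToF recipes above reduce to one-line formulas. The sketch below assumes a rectified pinhole stereo pair (focal length in pixels, baseline in meters) and a pulsed ToF sensor that reports round-trip time. The function and parameter names are illustrative and do not come from any device SDK. A calibrated projector/camera pair can be treated with the same triangulation relation, with the projector playing the role of the second view.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Z = f * B / d for a rectified stereo pair.

    Non-positive disparities (no match, or a point at infinity) map to inf.
    """
    d = np.asarray(disparity_px, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)


def depth_from_time_of_flight(round_trip_s):
    """The pulse travels to the surface and back, so distance = c * t / 2."""
    return SPEED_OF_LIGHT * np.asarray(round_trip_s, dtype=np.float64) / 2.0


if __name__ == "__main__":
    # Hypothetical numbers: f = 500 px, B = 10 cm, disparities of 50 px and 0 px.
    print(depth_from_disparity([50.0, 0.0], focal_px=500.0, baseline_m=0.1))
    # A 10 ns round trip corresponds to roughly 1.5 m.
    print(depth_from_time_of_flight(10e-9))
```

Either way the result is a depth image: every pixel stores a distance rather than a color.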
EACH PIXEL VALUE IS THE DISTANCE TO THE IMAGED POINT
1. Hand localization
2. Joint coordinate recovery
3. Hand skeleton reconstruction
Tompson et al. 2014. Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks
TASK 1. LOCALIZATION: LET’S FIND A HAND
Use a Random Forest for pixel-by-pixel background subtraction
\[ I\left(u + \frac{\Delta u}{I(u,v)},\; v + \frac{\Delta v}{I(u,v)}\right) - I(u,v) \ge d_t \]
Shotton et al. 2011. Real-Time Human Pose Recognition in Parts from Single Depth Images
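The per-pixel test used at each tree node compares the depth at an offset probe with the depth at the pixel itself. The sketch below evaluates that test for a single pixel. It assumes the depth image is a NumPy array indexed as depth[v, u] with strictly positive values, and it treats probes that leave the image as very distant background. The constant BACKGROUND_DEPTH and both function names are hypothetical choices for this illustration.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # assumed stand-in for probes that fall outside the image


def depth_offset_feature(depth, u, v, du, dv):
    """I(u + du / I(u, v), v + dv / I(u, v)) - I(u, v) for one pixel.

    Dividing the offset by the pixel's own depth shrinks the probe distance
    for far objects, which makes the feature roughly depth-invariant.
    """
    d = depth[v, u]
    pu = int(round(u + du / d))
    pv = int(round(v + dv / d))
    height, width = depth.shape
    inside = 0 <= pu < width and 0 <= pv < height
    probe = depth[pv, pu] if inside else BACKGROUND_DEPTH
    return probe - d


def weak_learner(depth, u, v, du, dv, d_t):
    """Split test at one tree node: True when the feature is >= d_t."""
    return depth_offset_feature(depth, u, v, du, dv) >= d_t


if __name__ == "__main__":
    # Toy scene: a flat background at 2.0 with one near pixel at 0.5.
    depth = np.full((5, 5), 2.0)
    depth[2, 2] = 0.5
    print(weak_learner(depth, u=2, v=2, du=1.0, dv=0.0, d_t=1.0))  # near pixel
    print(weak_learner(depth, u=0, v=0, du=2.0, dv=0.0, d_t=1.0))  # background
```

When the forest is trained, many (du, dv, d_t) triples are sampled for each node and the one that best separates hand pixels from background pixels is kept.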
TASK 2. REGRESSION: LET’S FIND EACH JOINT
Coordinates of each joint as the output of a CNN
TASK 2. REGRESSION: LET’S FIND EACH JOINT
Trick #1: multiscale convolutions
Trick #2: heat-maps as output
Tompson et al. 2014. Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks
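With heat-maps as output, the network predicts one low-resolution map per joint instead of regressing the coordinates directly, and the joint position is read from the peak of its map. The sketch below shows one plausible way to build a training target and decode a prediction. The Gaussian target, the sigma value, the map size, and both function names are assumptions made for this illustration, not details taken from the slides.

```python
import numpy as np


def gaussian_heatmap(height, width, u, v, sigma=1.5):
    """Training target: a 2D Gaussian centred on the joint at column u, row v."""
    cols = np.arange(width)[None, :]
    rows = np.arange(height)[:, None]
    return np.exp(-((cols - u) ** 2 + (rows - v) ** 2) / (2.0 * sigma ** 2))


def decode_heatmaps(heatmaps):
    """Return an (n_joints, 2) array of (u, v) peak positions, one per map."""
    n_joints, height, width = heatmaps.shape
    flat_idx = heatmaps.reshape(n_joints, -1).argmax(axis=1)
    v, u = np.unravel_index(flat_idx, (height, width))
    return np.stack([u, v], axis=1)


if __name__ == "__main__":
    # Two hypothetical joints on 18 x 18 maps.
    maps = np.stack([
        gaussian_heatmap(18, 18, u=4, v=7),
        gaussian_heatmap(18, 18, u=12, v=3),
    ])
    print(decode_heatmaps(maps))
```

Taking the peak independently per joint ignores the hand's anatomy, which is why a separate inference step is still needed to obtain a consistent skeleton.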
TASK 3. INFERENCE: FIND THE SKELETON
Choose one location per joint so that the sum of the
corresponding heat-map values is maximal, with the distances
between joints as a set of constraints
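For a single chain of joints (one finger, for example), this constrained maximization can be sketched with dynamic programming over a few candidate peaks per joint. This is only an illustration under stated assumptions, not the inference procedure of the talk or of Tompson et al.: the chain topology, the hard tolerance tol on bone length, and the function name best_chain are all hypothetical.

```python
import numpy as np


def best_chain(candidates, scores, bone_lengths, tol=1.0):
    """Pick one candidate per joint along a chain, maximizing the summed score.

    candidates[i]   : (k_i, 2) array of (u, v) positions for joint i
    scores[i]       : (k_i,) heat-map values at those positions
    bone_lengths[i] : expected distance between joint i and joint i + 1
    Returns (total_score, chosen_indices), or (-inf, None) when no assignment
    satisfies every distance constraint.
    """
    best = np.asarray(scores[0], dtype=np.float64)
    back = []
    for i in range(1, len(candidates)):
        prev = np.asarray(candidates[i - 1], dtype=np.float64)
        cur = np.asarray(candidates[i], dtype=np.float64)
        # Pairwise distances between previous and current candidates.
        dist = np.linalg.norm(prev[:, None, :] - cur[None, :, :], axis=2)
        feasible = np.abs(dist - bone_lengths[i - 1]) <= tol
        total = np.where(feasible, best[:, None], -np.inf)
        back.append(total.argmax(axis=0))
        best = total.max(axis=0) + np.asarray(scores[i], dtype=np.float64)
    if not np.isfinite(best.max()):
        return -np.inf, None
    # Walk the back-pointers from the last joint to the first.
    idx = [int(best.argmax())]
    for pointers in reversed(back):
        idx.append(int(pointers[idx[-1]]))
    return float(best.max()), idx[::-1]


if __name__ == "__main__":
    # Toy example: the highest-scoring peaks of joints 1 and 2 violate the
    # bone-length constraint, so lower-scoring but consistent peaks win.
    candidates = [[[0, 0]], [[3, 0], [10, 0]], [[6, 0], [3, 4]]]
    scores = [[1.0], [0.5, 0.9], [0.2, 0.8]]
    print(best_chain(candidates, scores, bone_lengths=[3.0, 3.0], tol=0.5))
```

A full hand is a tree rather than a chain, so a complete solution would run the same recursion along each finger from a shared root, or replace the hard constraint with a soft penalty.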
• Accurate prediction
• Real-time solution
• Works on standard equipment