A Hand Gesture Recognition System
Based on Local Linear Embedding
Presented by Chang Liu
2006. 3
Outline
 Introduction
 CSL and Pre-processing
 Locally Linear Embedding
 Experiments
 Conclusion
Introduction
 Interaction with computers is not a
comfortable experience
 Computers should communicate with
people through body language.
 Hand gesture recognition becomes
important
 Interactive human-machine interface and
virtual environment
Introduction
 Two common technologies for hand
gesture recognition
 glove-based method

Uses a special glove-based device to extract
hand posture

Cumbersome for the user
 vision-based method

3D hand/arm modeling

Appearance modeling
Introduction
 3D hand/arm modeling
 High computational complexity
 Relies on many approximation processes
 Appearance modeling
 Low computational complexity
 Real-time processing
Introduction
 Overview of the algorithm proposed in the
paper
 Vision-based method for real-time
CSL recognition
 Input: 2D video sequences
 Two major steps

Hand gesture region detection

Hand gesture recognition
CSL and Pre-processing
 Sign Language
 Relied on by the hearing-impaired community
 Two main elements:

Low-level, simple signed alphabet that mimics
the letters of the native spoken language

Higher-level signed language that uses actions to
mimic the meaning or description of the sign
CSL and Pre-processing
 CSL is the abbreviation for Chinese
Sign Language
 30 letters in CSL alphabet -> objects
of recognition
Pre-processing of
Hand Gesture Recognition
 Detection of Hand Gesture Regions
 Aim: pick out the valid frames and separate
the hand region from the rest of the
image.
 Low time consumption -> fast processing
rate -> real-time speed
Pre-processing of
Hand Gesture Recognition
 Detect skin region from the rest of the
image by using color.
 Each color has three components
 hue, saturation, and value
 Chroma (hue and saturation) is
separate from value
 Under different lighting conditions, chroma is
invariant.
Pre-processing of
Hand Gesture Recognition
 Color is represented in RGB space,
also in YUV and YIQ space.
 In YUV space
 saturation -> displacement: $|C| = \sqrt{|U|^2 + |V|^2}$
 hue -> amplitude: $\theta = \tan^{-1}(V/U)$
 In YIQ space
 The color saturation cue I is combined with
θ to reinforce the segmentation effect
Pre-processing of
Hand Gesture Recognition
 Skin colors lie between red and yellow
 Transform each color pixel P from
RGB to YUV and YIQ space
 Skin region is:
 105° <= θ <= 150°
 30 <= I <= 100
 Detected regions include hands and faces
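The thresholds above can be tried out with a minimal sketch like the following. The RGB-to-YUV and RGB-to-YIQ coefficients are the commonly quoted analog-TV values and are an assumption here (the slides do not list them), as are the function names and the use of `atan2` in place of a plain tan^-1(V/U).

```python
import math

def rgb_to_uv(r, g, b):
    """(U, V) chrominance of an RGB pixel with components in 0..255.
    Coefficients are the usual YUV ones (assumed, not given in the slides)."""
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return u, v

def rgb_to_i(r, g, b):
    """I component of YIQ (usual coefficients, assumed)."""
    return 0.596 * r - 0.274 * g - 0.322 * b

def chroma_magnitude(r, g, b):
    """|C| = sqrt(U^2 + V^2), the saturation cue in YUV space."""
    u, v = rgb_to_uv(r, g, b)
    return math.hypot(u, v)

def hue_angle_deg(r, g, b):
    """Hue angle theta in degrees.
    atan2 keeps the quadrant and avoids dividing by U == 0."""
    u, v = rgb_to_uv(r, g, b)
    return math.degrees(math.atan2(v, u))

def is_skin(r, g, b):
    """Apply the two thresholds from the slide: theta and I."""
    theta = hue_angle_deg(r, g, b)
    i = rgb_to_i(r, g, b)
    return 105.0 <= theta <= 150.0 and 30.0 <= i <= 100.0

def skin_mask(image):
    """Binary region mask for an image given as rows of (r, g, b) tuples."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image]

if __name__ == "__main__":
    # One skin-like pixel and one saturated blue pixel.
    print(skin_mask([[(200, 140, 110), (0, 0, 255)]]))
```

Because the test only looks at color, any skin-colored area passes it, which is why faces show up next to hands in the region mask.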
Pre-processing of
Hand Gesture Recognition
 On-line video stream containing
hand gestures can be considered
as a signal S(x, y, t)
 (x,y) denotes the image coordinate
 t denotes time
 Convert each image from RGB to HSI to
extract the intensity signal I(x,y,t)
Pre-processing of
Hand Gesture Recognition
 Based on the YUV and YIQ representation,
skin pixels are detected and form a
binary image sequence M'(x,y,t):
the region mask
 A second binary image sequence
M''(x,y,t), which reflects the motion
information, is produced from every
consecutive pair of intensity images:
the motion mask
Pre-processing of
Hand Gesture Recognition
 M(x,y,t) delineates the moving skin
region: it is the logical AND of
the corresponding region mask and
motion mask sequences
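A rough sketch of how the two masks might be combined. The slides do not say how the motion mask is computed, so thresholded frame differencing, the `threshold` value, and all function names below are assumptions made for illustration.

```python
def intensity(frame):
    """HSI intensity per pixel: the mean of R, G and B."""
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in frame]

def motion_mask(prev_intensity, curr_intensity, threshold=15.0):
    """Binary mask M''(x, y, t): 1 where the intensity changed by more than
    `threshold` between two consecutive frames (assumed differencing rule)."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev_intensity, curr_intensity)
    ]

def moving_skin_mask(region_mask, motion):
    """M(x, y, t): pixel-wise logical AND of region mask and motion mask."""
    return [
        [a & b for a, b in zip(rrow, mrow)]
        for rrow, mrow in zip(region_mask, motion)
    ]

if __name__ == "__main__":
    prev_frame = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
    curr_frame = [[(200, 140, 110), (0, 0, 0)], [(0, 0, 0), (0, 0, 255)]]
    region = [[1, 0], [1, 0]]  # hypothetical skin mask M'(x, y, t)
    motion = motion_mask(intensity(prev_frame), intensity(curr_frame))
    print(moving_skin_mask(region, motion))
```

Only pixels that are both skin-colored and changing survive the AND, so a static face or a moving non-skin object is dropped.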
Pre-processing of
Hand Gesture Recognition
 Normalization
 Transform the detection results into
36*36-pixel gray-scale images.
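One possible way to do this normalization is sketched below. Cropping to the bounding box of the mask and nearest-neighbour resampling are assumptions; the slides only state the 36*36 gray-scale target. All names are hypothetical.

```python
def bounding_box(mask):
    """(top, bottom, left, right) of the nonzero pixels, or None if empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not rows:
        return None
    return min(rows), max(rows), min(cols), max(cols)

def resize_nearest(gray, size=36):
    """Resample a gray-scale image to size x size by nearest neighbour."""
    h, w = len(gray), len(gray[0])
    return [
        [gray[y * h // size][x * w // size] for x in range(size)]
        for y in range(size)
    ]

def normalize(gray, mask, size=36):
    """Crop the detected region and rescale it to size x size pixels."""
    box = bounding_box(mask)
    if box is None:
        return None  # no hand region detected in this frame
    top, bottom, left, right = box
    crop = [row[left:right + 1] for row in gray[top:bottom + 1]]
    return resize_nearest(crop, size)

if __name__ == "__main__":
    gray = [[10 * y + x for x in range(4)] for y in range(4)]
    mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    out = normalize(gray, mask)
    # Flattening the result gives the 1296-dimensional vector used later.
    print(len(out), len(out[0]), sum(len(row) for row in out))
```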
Locally Linear Embedding
 Sparse data vs. high-dimensional
space
 30 different gestures, 120 samples/gesture
 36*36 pixels
 3600 training samples vs. d = 1296
 Difficult to describe the data distribution
 Reduce the dimensionality of hand gesture
images
Locally Linear Embedding
 Locally Linear Embedding maps the high-
dimensional data to a single global
coordinate system to preserve the
neighbouring relations.
 Given n input vectors {x1, x2, …, xn}, $x_i \in \mathbb{R}^d$
 LLE algorithm
 {y1, y2, …, yn}, $y_i \in \mathbb{R}^m$ (m << d)
Locally Linear Embedding
 Find the k nearest neighbours of each point xi
 Measure the reconstruction error of
approximating each point by its neighbours,
and compute the reconstruction weights
that minimize this error
 Compute the low-dimensional embedding by minimizing an
embedding cost function with the reconstruction
weights held fixed
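The first two steps can be sketched in plain Python as follows. This follows the standard LLE formulation (solve the local Gram system and normalize the weights to sum to one); the small regularization term `reg` and all function names are assumptions of this sketch. The third step, an eigenvector computation, is left out.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_neighbours(points, i, k):
    """Step 1: indices of the k points closest to points[i]."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: squared_distance(points[i], points[j]))
    return others[:k]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def reconstruction_weights(points, i, neighbours, reg=1e-3):
    """Step 2: weights w (summing to 1) that minimize
    |x_i - sum_j w_j x_j|^2 over the given neighbours."""
    diffs = [[p - q for p, q in zip(points[j], points[i])] for j in neighbours]
    k = len(neighbours)
    gram = [[sum(a * b for a, b in zip(diffs[r], diffs[c]))
             for c in range(k)] for r in range(k)]
    # Regularize so the system stays solvable when the Gram matrix is singular.
    trace = sum(gram[r][r] for r in range(k))
    eps = reg * trace if trace > 0 else reg
    for r in range(k):
        gram[r][r] += eps
    w = solve(gram, [1.0] * k)
    total = sum(w)
    return [v / total for v in w]

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (5.0, 5.0)]
    nbrs = nearest_neighbours(pts, 0, 2)
    print(nbrs, reconstruction_weights(pts, 0, nbrs))
```

In the example the origin sits midway between its two neighbours, so the weights come out equal and the point is reconstructed exactly.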
Experiments
 4125 images including all 30 hand
gestures
 60% for training, 40% for testing
 For each image:
 320*240 pixels, 24-bit color depth
 Taken by a camera at different distances
and orientations
 Sampled at 25 frames/s
Experiment Results
Data      # of Samples   Recognized Samples   Recognition Rate (%)
Training  2475           2309                 93.3
Testing   1650           1495                 90.6
Total     4125           3804                 92.2
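As a quick consistency check, the rates in the table can be recomputed from the sample counts. The counts are taken from the table; the helper name `rate` is just for illustration.

```python
# Sample counts from the results table: (number of samples, recognized samples).
results = {"Training": (2475, 2309), "Testing": (1650, 1495)}

def rate(total, recognized):
    """Recognition rate in percent, rounded to one decimal place."""
    return round(100.0 * recognized / total, 1)

total_samples = sum(total for total, _ in results.values())
total_recognized = sum(rec for _, rec in results.values())

if __name__ == "__main__":
    for name, (total, rec) in results.items():
        print(name, rate(total, rec))
    print("Total", total_samples, total_recognized,
          rate(total_samples, total_recognized))
```

The totals also match the 60%/40% split of the 4125 images: 2475 for training and 1650 for testing.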
Conclusion
 Robust against similar postures under
different lighting conditions and
backgrounds
 Fast detection process allows real-time
video applications with low-cost
hardware, such as a PC and a USB camera
Thank You!
Questions?
