Gesture-based Human-computer
Interaction
Alen Stojanov
a.stojanov@jacobs-university.de
Electrical Engineering and Computer Science
Jacobs University Bremen
28759 Bremen, Germany
May 14, 2009
© Jacobs University Bremen, 2009
Contents
Introduction
Prerequisites
Application design and dataflow
Skin Color Segmentation
Skin profile creator
Real-time skin segmentation
Feature Recognition
Gesture Tracking
Discussion and future work
Introduction
Figure 1: The gesture-based devices available today
In the last decade, new classes of devices for accessing information have emerged
along with increased connectivity. In parallel to the proliferation of these devices, new
interaction styles have been explored. One major obstacle is restrictive human-machine
interfaces such as gloves, magnetic trackers, and head-mounted displays; their
excessive wiring hampers interactivity.
This presentation introduces a method for building a robust computer vision module that
captures and recognizes human gestures using a regular video camera, and exploits the full
potential of these gestures in a user-friendly graphical environment.
Prerequisites
OpenGL (Open Graphics Library) is a standard specification defining a
cross-language, cross-platform API for writing applications that produce 2D and 3D
computer graphics. OpenGL routines simplify the development of graphics software.
Qt is a cross-platform application development framework, widely used for the
development of GUI programs and also used for developing non-GUI programs such
as console tools and servers.
Video4Linux or V4L is a video capture application programming interface for Linux. It
is closely integrated into the Linux kernel and provides support for many USB
(Universal Serial Bus) video input devices.
Design and dataflow
Figure 2: Design and dataflow in GMove — CamThread reads the /dev/video* driver
through V4L/V4L2 into the raw ImageBuffer; the EngineThread and GestureThread fill
the processed ImageBuffer and the GestureBuffer, which feed the raw, processed, and
OpenGL displays.
The OpenGL architecture is structured as a state-based pipeline. A similar approach
is used in the implementation of GMove: a rasterized image is vectorized in order to
extract the semantics of the image.
A heavily threaded approach is used to synchronize the input stream and avoid
delayed gestures. The data shared between the threads is stored in dedicated buffers;
command control is done via callback functions provided by Qt.
Design and dataflow (2)
The system consists of three threads:
CamThread – responsible for reading the video input stream directly from the driver
EngineThread – receives the raw image input from the camera, processes the
image, and extracts the position and state of the hands.
GestureThread – the hand gestures obtained from the EngineThread are matched
against the previous frame stored in the gesture buffer, and the appropriate
action is invoked for each transition to the new image frame.
Data is shared through the following buffers:
RawImage – stores the input stream obtained in the CamThread.
ProcessedImage – stores the processed output generated by the
EngineThread
GestureBuffer – stores the gestures provided by the GestureThread
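The handoff between these threads and buffers can be sketched as a producer–consumer pipeline. This is a minimal Python sketch; the original GMove is a Qt/C++ application, so the thread bodies and frame strings below are purely illustrative stand-ins:

```python
import queue
import threading

# RawImage buffer: CamThread -> EngineThread
raw_images = queue.Queue()
# ProcessedImage buffer: EngineThread -> GestureThread
processed = queue.Queue()

def cam_thread():
    """Stand-in for V4L frame capture: push raw frames, then a sentinel."""
    for i in range(3):
        raw_images.put(f"frame-{i}")
    raw_images.put(None)  # end-of-stream sentinel

def engine_thread():
    """Stand-in for per-frame processing (segmentation, feature extraction)."""
    while (frame := raw_images.get()) is not None:
        processed.put(frame.upper())
    processed.put(None)

threads = [threading.Thread(target=f) for f in (cam_thread, engine_thread)]
for t in threads:
    t.start()

results = []
while (item := processed.get()) is not None:
    results.append(item)
for t in threads:
    t.join()
print(results)  # -> ['FRAME-0', 'FRAME-1', 'FRAME-2']
```

The queues decouple the capture rate from the processing rate, which is the same reason GMove stores shared data in dedicated buffers rather than passing frames between threads directly.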
Skin color segmentation
The RGB (Red, Green, and Blue) color space alone is not reliable for identifying
skin-colored pixels, since it encodes not only color but also luminance.
HSV (Hue, Saturation, and Value) and HSI (Hue, Saturation, and Intensity) color
space models represent colors closer to the way humans perceive them; therefore
the influence of luminance on skin detection can be minimized.
Charge-Coupled Device (CCD) or Complementary Metal Oxide Semiconductor
(CMOS) sensors are used in most video input devices. There are major
differences in the operation of these sensors in daylight and in artificial
light, especially in the exposure time. The prolonged exposure time required under
artificial light is a source of image noise, and this noise must be taken into
account in the process of skin segmentation.
A “skin profile generator” is implemented as a skin color classifier to handle
different lighting conditions and different skin tones.
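The benefit of an HSV-style representation can be seen with a plain RGB→HSV conversion. A sketch using Python's standard colorsys module; the RGB triples below are arbitrary examples of a similar tone under bright and dim light:

```python
import colorsys

def to_hsv(r, g, b):
    """Convert an 8-bit RGB pixel to (hue in degrees, saturation %, value %)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s * 100.0, v * 100.0

# A similar tone under bright and dim light differs mainly in V (brightness),
# while the hue stays almost identical -- so matching on hue is more robust:
bright = to_hsv(224, 172, 105)
dim = to_hsv(112, 86, 52)
print(round(bright[0], 1), round(dim[0], 1))  # hues: ~33.8 vs ~34.0 degrees
print(round(bright[2], 1), round(dim[2], 1))  # values: ~87.8 vs ~43.9
```

This is exactly the luminance factor the slide refers to: in HSV/HSI it is isolated in one channel and can be down-weighted.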
Skin color segmentation (2)
Figure 3: 3D histograms of skin pixel frequencies in HSI color space — (a) daylight,
(b) artificial light.
The perception of the video input device is quite different under daylight and
artificial light.
Skin profile creator
Figure 4: Skin profile creator
When the user clicks on a part of the acquired image, a flood-fill algorithm
traverses the skin pixels, using a low threshold on the color difference between
neighboring pixels. The 20 most frequent pixel values are then written to the user profile.
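The profile-creation step can be sketched as follows. This is a minimal sketch on a toy single-channel image (the actual tool operates on color camera frames), and the `threshold` and `top_n` defaults are illustrative:

```python
from collections import Counter, deque

def build_skin_profile(image, seed, threshold=10, top_n=20):
    """Flood-fill from the clicked pixel, visiting neighbors whose value is
    within `threshold`, then keep the most frequent pixel values."""
    h, w = len(image), len(image[0])
    seen = {seed}
    frontier = deque([seed])
    counts = Counter()
    while frontier:
        y, x = frontier.popleft()
        counts[image[y][x]] += 1
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen \
               and abs(image[ny][nx] - image[y][x]) <= threshold:
                seen.add((ny, nx))
                frontier.append((ny, nx))
    return [value for value, _ in counts.most_common(top_n)]

# Toy grayscale "image": a skin patch (~200) surrounded by background (50).
img = [[50, 50, 50, 50],
       [50, 200, 205, 50],
       [50, 202, 200, 50],
       [50, 50, 50, 50]]
print(build_skin_profile(img, seed=(1, 1)))  # -> [200, 202, 205]
```

The low threshold keeps the fill inside the skin region, so the resulting histogram is dominated by genuine skin values.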
Real-time skin segmentation
Figure 5: Skin color segmentation
Real-time skin segmentation is a brute-force method, applied to every pixel of
the captured image frame.
Each pixel is compared to the 20 values stored in the user profile. If it falls
within the range defined by the threshold, the pixel is classified as skin;
otherwise it is ignored.
Noise is removed using a discrete Gaussian filter.
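The per-pixel test against the stored profile can be sketched like this (toy single-channel values; the real system compares color values against the 20 stored profile entries):

```python
def segment(image, profile, threshold=10):
    """Mark a pixel as skin (1) if it lies within `threshold` of any value
    in the user profile; otherwise ignore it (0)."""
    return [[1 if any(abs(px - p) <= threshold for p in profile) else 0
             for px in row]
            for row in image]

profile = [200, 202, 205]      # values produced by the skin profile creator
img = [[50, 198, 60],
       [210, 203, 55]]
print(segment(img, profile))   # -> [[0, 1, 0], [1, 1, 0]]
```

Because every pixel is tested independently, this step parallelizes trivially, which is what makes the brute-force approach feasible in real time.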
Real-time skin segmentation (2)
The Gaussian function has the form:

G(x, y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))

Gaussian smoothing can be performed using standard convolution methods. A
convolution mask is usually much smaller than the actual image. For example, a
convolution mask with σ = 1.4 and mask width k = 5 has the following form:

M(σ, k) = (1/159) ·
| 2  4  5  4  2 |
| 4  9 12  9  4 |
| 5 12 15 12  5 |
| 4  9 12  9  4 |
| 2  4  5  4  2 |

(The normalization 1/159 is the sum of the mask entries, so the filter preserves
overall brightness.)
The larger the width of the Gaussian mask, the lower the sensitivity to noise and the
higher the computational complexity.
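Applying the mask is a plain 2D convolution. A minimal sketch, leaving the two-pixel border unprocessed for simplicity:

```python
# The 5x5 Gaussian mask for sigma = 1.4, normalized by the sum of its entries.
MASK = [[2, 4, 5, 4, 2],
        [4, 9, 12, 9, 4],
        [5, 12, 15, 12, 5],
        [4, 9, 12, 9, 4],
        [2, 4, 5, 4, 2]]
NORM = sum(sum(row) for row in MASK)  # 159

def gaussian_smooth(image):
    """Convolve a grayscale image (list of lists) with the 5x5 mask.
    The two-pixel border is left unprocessed for simplicity."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            acc = sum(MASK[j][i] * image[y + j - 2][x + i - 2]
                      for j in range(5) for i in range(5))
            out[y][x] = acc // NORM
    return out

# A uniform image stays uniform: the mask weights sum to 1 after normalization.
flat = [[100] * 6 for _ in range(6)]
print(gaussian_smooth(flat)[2][2])  # -> 100
```

The nested 5×5 sum is the cost referred to above: each extra unit of mask width adds a full row and column of multiplications per pixel.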
Feature Recognition
Figure 6: Distinct regions and bounding box
Two or more hands may be captured in the video stream. Feature
recognition is applied to each hand separately.
A flood-fill algorithm is used to identify the distinct hand regions. The
bounding box and the weighted center of each region are determined.
If two bounding boxes intersect such that the center of one box lies inside
the other, the boxes are merged.
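The merge criterion in the last bullet can be sketched as follows (boxes are (x0, y0, x1, y1) tuples; the function names are illustrative):

```python
def center(box):
    """Weighted center approximated here by the box's geometric center."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def contains(box, pt):
    x0, y0, x1, y1 = box
    return x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1

def maybe_merge(a, b):
    """Merge two bounding boxes when the center of one lies inside the other."""
    if contains(a, center(b)) or contains(b, center(a)):
        return (min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]))
    return None

print(maybe_merge((0, 0, 10, 10), (4, 4, 12, 12)))    # overlapping -> (0, 0, 12, 12)
print(maybe_merge((0, 0, 10, 10), (30, 30, 40, 40)))  # disjoint    -> None
```

Requiring a center to fall inside the other box, rather than any overlap at all, keeps lightly touching regions (e.g. two separate hands) from being fused into one.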
Feature Recognition (2)
Figure 7: Hand state using concentric circles
The hands can have any orientation in the video input stream and the method to
determine the state of the hand should be robust towards all cases.
A geometrical model is generated by analyzing the skin-pixel intersections when
concentric circles are drawn in each of the bounding boxes.
The outermost intersection points define the fingertips of the hand.
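Sampling one such circle can be sketched on a toy binary mask (in the real system the circles are sampled from the segmented frame; each contiguous run of skin along the circle corresponds to a skin region, such as a finger, crossing it):

```python
import math

def circle_skin_runs(mask, cx, cy, radius, samples=360):
    """Sample a circle around (cx, cy) on a binary skin mask and count the
    contiguous runs of skin pixels it crosses."""
    h, w = len(mask), len(mask[0])
    hits = []
    for k in range(samples):
        a = 2 * math.pi * k / samples
        x = int(round(cx + radius * math.cos(a)))
        y = int(round(cy + radius * math.sin(a)))
        hits.append(mask[y][x] if 0 <= y < h and 0 <= x < w else 0)
    # Count 0 -> 1 transitions; hits[k - 1] wraps around at k = 0.
    return sum(1 for k in range(samples) if hits[k] and not hits[k - 1])

# Toy mask: a vertical skin stripe; a circle around its middle crosses it twice.
mask = [[1 if 8 <= x <= 12 else 0 for x in range(21)] for _ in range(21)]
print(circle_skin_runs(mask, cx=10, cy=10, radius=8))  # -> 2
```

Because the circles are rotation-invariant, the same sampling works regardless of how the hand is oriented in the frame, which is the robustness the slide asks for.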
Gesture Tracking
The recognized feature only denotes the state of the hand; the hand movement
determines the amount of action applied to the current state. In the current
implementation of the feature extraction, the system can differentiate two hand
states:
Fist (all fingers folded into the palm), which denotes the idle mode
Fingers of the palm stretched out, which denotes the active mode
For every newly obtained image frame, each hand bounding box is matched against the
previous frames. The matching is defined by the following criteria:
The distance between the weighted centers of the two bounding boxes
The difference in the number of skin pixels between the bounding boxes
The difference in the size of the bounding boxes
With this implementation, the system is able to issue commands to move the scene in
every direction, as well as to zoom in and zoom out.
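The three criteria can be combined into a single matching cost, with the lowest-cost candidate taken as the same hand in the new frame. A sketch with illustrative weights and field names:

```python
import math

def match_cost(prev, cur, w_dist=1.0, w_skin=0.5, w_size=0.5):
    """Cost combining the three criteria: center distance, skin-pixel-count
    difference, and bounding-box size difference. `prev` and `cur` are dicts
    with keys 'center', 'skin', and 'size' (all names illustrative)."""
    dist = math.dist(prev['center'], cur['center'])
    return (w_dist * dist
            + w_skin * abs(prev['skin'] - cur['skin'])
            + w_size * abs(prev['size'] - cur['size']))

prev_hand = {'center': (100, 100), 'skin': 500, 'size': 900}
candidates = [{'center': (104, 98), 'skin': 490, 'size': 880},   # small drift
              {'center': (300, 50), 'skin': 200, 'size': 400}]   # different region
best = min(candidates, key=lambda c: match_cost(prev_hand, c))
print(best['center'])  # -> (104, 98)
```

Weighting distance more heavily than the appearance terms reflects the assumption that a hand moves only a small amount between consecutive frames.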
Discussion and future work
Developing a system such as GMove is a challenging task, especially because of all
the technical groundwork required to get started and create a basic environment
in which the logic of the program can be applied.
GMove is able to understand basic hand gestures, track these gestures in
real time, and do so with linear computational complexity.
Although the application already offers a handful of nifty features, it is by no
means a full-fledged system ready for everyday human-computer
communication.
The ultimate challenge of this project would be the design of a truly intuitive
human-interaction system of the kind seen in futuristic science-fiction films.
One idea for further development is extending the system to two or more
cameras, creating a stereoscopic view so that a true 3D model of the hand
can be reconstructed from the input.
Acknowledgment
I would like to express my gratitude to my supervisor, Prof. Dr. Lars Linsen, who was
abundantly helpful and offered invaluable assistance, support, and guidance. This
research project would not have been possible without his support.
Deepest gratitude to Prof. Dr. Heinrich Stamerjohanns; his knowledge, advice, and
discussions invaluably shaped my course of work and study.
Any questions, please?
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Gesture-based Human-computer Interaction

  • 1. Gesture-based Human-computer Interaction Alen Stojanov a.stojanov@jacobs-university.de Electrical Engineering and Computer Science Jacobs University Bremen 28759 Bremen, Germany May 14, 2009 c⃝ Jacobs University Bremen, 2009 – p. 1/16
  • 2. Contents Introduction Prerequisites Application design and dataflow Skin Color Segmentation Skin profile creator Real-time skin segmentation Feature Recognition Gesture Tracking Discussion and future work c⃝ Jacobs University Bremen, 2009 – p. 2/16
  • 3. Introduction Figure 1: The gesture-based devices available today In the last decade, new classes of devices for accessing information have emerged along with increased connectivity. In parallel to the proliferation of these devices, new interaction styles have been explored. One major obstacle is restrictive human-machine interfaces such as gloves, magnetic trackers, and head-mounted displays; their excessive wiring hampers interactivity. This presentation introduces a method to develop a robust computer vision module that captures and recognizes human gestures using a regular video camera and exploits the full potential of these gestures in a user-friendly graphical environment. c⃝ Jacobs University Bremen, 2009 – p. 3/16
  • 4. Prerequisites (a) OpenGL (b) Qt (c) V4L OpenGL (Open Graphics Library) is a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics. OpenGL routines simplify the development of graphics software. Qt is a cross-platform application development framework, widely used for the development of GUI programs and also used for developing non-GUI programs such as console tools and servers. Video4Linux or V4L is a video capture application programming interface for Linux. It is closely integrated into the Linux kernel and provides support for many USB (Universal Serial Bus) video input devices. c⃝ Jacobs University Bremen, 2009 – p. 4/16
  • 5. Design and dataflow ImageBuffer: Processed Images GestureBuffer: Processed Gestures ImageBuffer: Raw Images Engine Thread Gesture Thread CamThread V4L V4L2 Raw Display Processed Display OpenGL Display /dev/video* - video driver Figure 2: Design and dataflow in GMove The OpenGL architecture is structured as a state-based pipeline. A similar approach is used in the implementation of GMove, such that a rasterized image is vectorized with the purpose of extracting the semantics of the image. A highly threaded approach has been used to synchronize the stages and avoid delayed gestures obtained from the input stream. The data shared between the threads is stored in special buffers; the command control is done via callback functions provided by Qt. c⃝ Jacobs University Bremen, 2009 – p. 5/16
  • 6. Design and dataflow (2) The system consists of three threads: CamThread – responsible for reading the video input stream directly from the driver. EngineThread – receives the raw image input from the camera, processes the image, and parses the position of the hands and their state. GestureThread – the hand gestures obtained from the EngineThread are used in a process where the transitions from the previous image, stored in the gesture buffer, are mapped to the new image frame, and an appropriate action is invoked accordingly. Data is shared via the following buffers: RawImage – stores the input stream obtained in the CamThread. ProcessedImage – stores the processed output generated by the EngineThread. GestureBuffer – stores the gestures provided by the GestureThread. c⃝ Jacobs University Bremen, 2009 – p. 6/16
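The three threads and buffers above can be sketched as a minimal producer-consumer pipeline. This is a hypothetical Python illustration, not the actual GMove code: capture and processing are stubbed out, and only the buffer hand-off between stages is shown.

```python
import threading
import queue

raw_images = queue.Queue(maxsize=8)        # RawImage buffer
processed_images = queue.Queue(maxsize=8)  # ProcessedImage buffer
gestures = queue.Queue()                   # GestureBuffer

def cam_thread(frames):
    """Stand-in for CamThread: push raw frames from the driver."""
    for frame in frames:
        raw_images.put(frame)
    raw_images.put(None)  # sentinel: end of stream

def engine_thread():
    """Stand-in for EngineThread: segment skin, parse hand state."""
    while True:
        frame = raw_images.get()
        if frame is None:
            processed_images.put(None)
            break
        processed_images.put(("hands", frame))  # dummy processing

def gesture_thread():
    """Stand-in for GestureThread: map hand transitions to actions."""
    while True:
        item = processed_images.get()
        if item is None:
            break
        gestures.put("move")  # dummy gesture decision

def run_pipeline(frames):
    threads = [
        threading.Thread(target=cam_thread, args=(frames,)),
        threading.Thread(target=engine_thread),
        threading.Thread(target=gesture_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [gestures.get() for _ in range(gestures.qsize())]
```

The bounded queues provide the synchronization mentioned on the slide: a slow consumer naturally throttles its producer instead of letting stale frames pile up.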
  • 7. Skin color segmentation The RGB (Red, Green and Blue) color space alone is not reliable for identifying skin-colored pixels, as it represents not only color but also luminance. HSV (Hue, Saturation, and Value) and HSI (Hue, Saturation and Intensity) color space models emulate colors the way humans perceive them; therefore the luminance factor in skin determination can be minimized. Charge-Coupled Device (CCD) or Complementary Metal Oxide Semiconductor (CMOS) sensors are used in most video input devices. There are major differences in the operation of these sensors in daylight versus artificial light, especially in the exposure time. The prolonged exposure time required under artificial light is a source of image noise, and this noise should be taken into consideration in the process of skin segmentation. A “skin profile generator” is implemented as a skin color classifier to handle different light modes and different skin tones. c⃝ Jacobs University Bremen, 2009 – p. 7/16
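The luminance argument can be made concrete with a short sketch. The slides ship no code, so Python and the sample RGB values below are purely illustrative: two pixels of the same surface, one at full brightness and one at half, differ strongly in RGB yet have nearly identical hue and saturation.

```python
import colorsys

def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

bright_skin = rgb_to_hsv(224, 172, 138)  # hypothetical well-lit skin pixel
dark_skin = rgb_to_hsv(112, 86, 69)      # same surface at half the light

# Hue and saturation stay close even though every RGB channel halved,
# so a hue/saturation threshold is far less lighting-sensitive.
hue_diff = abs(bright_skin[0] - dark_skin[0])
sat_diff = abs(bright_skin[1] - dark_skin[1])
```

A classifier thresholding on hue and saturation would treat both samples as the same color, while an RGB threshold would have to span a 112-level gap per channel.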
  • 8. Skin color segmentation (2) Figure 3: 3D histogram of skin pixel frequencies in HSI color space, (a) daylight, (b) artificial light. The perception of the video input device is quite different in daylight and artificial light. c⃝ Jacobs University Bremen, 2009 – p. 8/16
  • 9. Skin profile creator Figure 4: Skin profile creator When the user clicks on a certain part of the acquired image, a flood-fill algorithm traverses the skin pixels using a low threshold on the color difference between neighboring pixels. The 20 most frequent pixel colors are then written to the user profile. c⃝ Jacobs University Bremen, 2009 – p. 9/16
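The profile creation step can be sketched as follows. This is a hypothetical illustration on a tiny grayscale grid standing in for a color image; `build_profile`, its threshold, and the sample values are assumptions, not the original implementation.

```python
from collections import Counter, deque

def build_profile(image, start, threshold=10, top_n=20):
    """Flood-fill from the clicked pixel, collecting the most
    frequent colors among the connected, similar-colored region."""
    h, w = len(image), len(image[0])
    seen = {start}
    frontier = deque([start])
    counts = Counter()
    while frontier:
        y, x = frontier.popleft()
        counts[image[y][x]] += 1
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            # Expand only to unvisited neighbors within the threshold.
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen
                    and abs(image[ny][nx] - image[y][x]) < threshold):
                seen.add((ny, nx))
                frontier.append((ny, nx))
    return [color for color, _ in counts.most_common(top_n)]
```

Colors far from the clicked region (the 200-range values in the test image below) never enter the profile, which is what keeps background pixels out of the classifier.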
  • 10. Real-time skin segmentation Figure 5: Skin color segmentation The real-time skin segmentation is a brute-force method applied to all the pixels of the captured image frame. Each pixel is compared to the 20 values stored in the user profile; if it falls within the range defined by the threshold, it is classified as skin, otherwise it is ignored. Noise is removed using a discrete Gaussian filter. c⃝ Jacobs University Bremen, 2009 – p. 10/16
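The brute-force classification amounts to one comparison per pixel against each profile entry. A minimal sketch, using hypothetical scalar values in place of full HSI triples:

```python
def segment(frame, profile, threshold=5):
    """Return a binary mask: 1 where a pixel is within the threshold
    of any profile value (skin), 0 otherwise (ignored)."""
    return [
        [1 if any(abs(p - ref) <= threshold for ref in profile) else 0
         for p in row]
        for row in frame
    ]
```

With a fixed profile size of 20 entries, the cost is a constant number of comparisons per pixel, i.e. linear in the frame size, which matches the real-time requirement.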
  • 11. Real-time skin segmentation The Gaussian function has the form G(x, y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²)), and Gaussian smoothing can be performed using standard convolution methods. A convolution mask is usually much smaller than the actual image. For example, a 5×5 convolution mask with σ = 1.4 has the following form:

M = (1/159) ×
|  2  4  5  4  2 |
|  4  9 12  9  4 |
|  5 12 15 12  5 |
|  4  9 12  9  4 |
|  2  4  5  4  2 |

The larger the width of the Gaussian mask, the lower the sensitivity to noise and the higher the computational complexity. c⃝ Jacobs University Bremen, 2009 – p. 11/16
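The smoothing step above can be sketched directly with the 5×5 integer mask, whose entries sum to 159 (hence the 1/159 normalization). This is an illustrative Python sketch, not the original code; borders are simply left unfiltered for brevity.

```python
MASK = [
    [2, 4, 5, 4, 2],
    [4, 9, 12, 9, 4],
    [5, 12, 15, 12, 5],
    [4, 9, 12, 9, 4],
    [2, 4, 5, 4, 2],
]
NORM = sum(sum(row) for row in MASK)  # sum of all mask entries

def smooth(image):
    """Convolve a grayscale image with the Gaussian mask; pixels
    closer than 2 to the border are copied through unchanged."""
    h, w, k = len(image), len(image[0]), 2
    out = [row[:] for row in image]
    for y in range(k, h - k):
        for x in range(k, w - k):
            acc = sum(MASK[j][i] * image[y + j - k][x + i - k]
                      for j in range(5) for i in range(5))
            out[y][x] = acc // NORM
    return out
```

Because the mask is normalized, a uniform region passes through unchanged; only pixel-level noise is averaged away.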
  • 12. Feature Recognition Figure 6: Distinct regions and bounding box Two or more hands are allowed to be captured by the video stream, and feature recognition is applied to every hand separately. A flood-fill algorithm is used to identify the distinct hand regions, and the bounding box and weighted center of each region are determined. If two bounding boxes intersect such that the center of one box lies inside the other, the boxes are merged together. c⃝ Jacobs University Bremen, 2009 – p. 12/16
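The merge rule can be written down in a few lines. A hypothetical sketch with boxes as (x0, y0, x1, y1) tuples; the function names are illustrative, not from the original source.

```python
def center(box):
    """Geometric center of an axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def contains(box, point):
    x0, y0, x1, y1 = box
    px, py = point
    return x0 <= px <= x1 and y0 <= py <= y1

def should_merge(a, b):
    """Merge when the center of either box lies inside the other."""
    return contains(a, center(b)) or contains(b, center(a))

def merge(a, b):
    """Smallest box enclosing both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))
```

Requiring a center to fall inside the other box, rather than any overlap at all, avoids merging two hands that merely brush edges.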
  • 13. Feature Recognition (2) Figure 7: Hand state using concentric circles The hands can have any orientation in the video input stream, and the method to determine the state of the hand should be robust towards all cases. A geometrical model is generated by analyzing the intersections with skin pixels when concentric circles are drawn in each of the bounding boxes. The outermost intersection points define the fingertips of the hand. c⃝ Jacobs University Bremen, 2009 – p. 13/16
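One way to read the concentric-circle test is as counting skin crossings along each circle: every finger the circle passes through contributes one skin arc, regardless of hand orientation. The sketch below is a hypothetical reconstruction of that idea, not the original algorithm.

```python
import math

def crossings(mask, cx, cy, radius, samples=72):
    """Sample a circle of the given radius around (cx, cy) on a
    binary skin mask and count skin arcs (pairs of transitions)."""
    h, w = len(mask), len(mask[0])
    values = []
    for i in range(samples):
        a = 2 * math.pi * i / samples
        x = int(round(cx + radius * math.cos(a)))
        y = int(round(cy + radius * math.sin(a)))
        values.append(mask[y][x] if 0 <= y < h and 0 <= x < w else 0)
    # Each skin arc produces one 0->1 and one 1->0 transition.
    return sum(1 for i in range(samples)
               if values[i] != values[i - 1]) // 2
```

A circle through a single vertical skin strip (a crude stand-in for one finger crossing the circle twice) yields two arcs; an open hand would yield several, a fist close to one.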
  • 14. Gesture Tracking The recognized feature only denotes the state of the hand; the hand movement denotes the magnitude of the action applied in that state. In the current implementation of the feature extraction, the system is able to differentiate two hand states: Fist (the palm has all the fingers “inside”), which denotes the idle mode. Fingers stretched outward, which denotes the active mode. For every newly obtained image frame, each hand bounding box is matched against the previous frames. The matching is defined with the following criteria: Distance between the weighted centers of two bounding boxes. The difference in the amount of skin pixels between the bounding boxes. The difference in the size of the bounding boxes. With this implementation, the system is able to issue commands to move the scene in every direction, as well as zoom in and zoom out. c⃝ Jacobs University Bremen, 2009 – p. 14/16
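The three matching criteria can be combined into a single score, with the best-scoring previous box taken as the match. A hypothetical sketch; the weights and the (cx, cy, skin_pixels, area) box representation are illustrative assumptions.

```python
def match_score(a, b):
    """Lower is better. Each box: (cx, cy, skin_pixels, area)."""
    dist = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    skin_diff = abs(a[2] - b[2])   # skin pixel count difference
    size_diff = abs(a[3] - b[3])   # bounding-box size difference
    return dist + 0.1 * skin_diff + 0.1 * size_diff  # illustrative weights

def best_match(new_box, previous_boxes):
    """Pick the previous-frame box that best matches the new one."""
    return min(previous_boxes, key=lambda prev: match_score(new_box, prev))
```

In practice the weights would be tuned so that, for example, a hand that opens between frames (large skin-pixel change) still matches its own previous box rather than a distant one.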
  • 15. Discussion and future work Developing a system such as GMove is a challenging task, especially for all the technical groundwork required to get started and create a basic environment where the logic of the program is applied. GMove is able to understand basic hand gestures, is able to track these gestures in real time, and has linear computing complexity. Although the application already has a handful of nifty features to offer, it is by no means a full-fledged system ready for everyday human-computer communication. The ultimate challenge of this project would be the design of a very intuitive human interaction system, as seen in futuristic science-fiction movies. One idea for further development is extending the system to use two or more cameras, such that a stereoscopic view is created and a real 3D model of the hand is built from the input. c⃝ Jacobs University Bremen, 2009 – p. 15/16
  • 16. Acknowledgment I would like to express my gratitude to my supervisor, Prof. Dr. Lars Linsen, who was abundantly helpful and offered invaluable assistance, support and guidance. This research project would not have been possible without his support. Deepest gratitude to Prof. Dr. Heinrich Stamerjohanns; his knowledge, advice and discussions invaluably shaped my course of work and study. Any questions, please? c⃝ Jacobs University Bremen, 2009 – p. 16/16