The document describes a real-time interactive presentation system that uses hand gestures to control slides. The system employs a thermal camera to segment the presenter's body despite the varying lighting conditions created by the projector. It then localizes the presenter's head, torso, and arm in order to locate the hand. Hand trajectories are recognized as gestures that trigger interactions such as changing slides. A dual-camera calibration process maps the interaction regions between the thermal camera and the projected slides. The system achieves high gesture recognition accuracy for interactive presentations.
3D Human Hand Posture Reconstruction Using a Single 2D Image (Waqas Tariq)
Passive sensing of the 3D geometric posture of the human hand has been studied extensively over the past decade. However, these research efforts have been hampered by the computational complexity caused by inverse kinematics and 3D reconstruction. In this paper, we focus on 3D hand posture estimation from a single 2D image, aimed at robotic applications. We introduce a human hand model with 27 degrees of freedom (DOFs) and analyze some of its constraints to reduce the DOFs without any significant degradation of performance. A novel algorithm is proposed to estimate the 3D hand posture from eight 2D projected feature points. Experimental results on real images confirm that our algorithm gives good estimates of the 3D hand pose. Keywords: 3D hand posture estimation; model-based approach; gesture recognition; human-computer interface; machine vision.
TURKISH SIGN LANGUAGE RECOGNITION USING HIDDEN MARKOV MODEL (cscpconf)
In past years, much research has been carried out to provide more accurate and comfortable interaction between humans and machines. Developing a system that recognizes human gestures is an important step toward improving that interaction. Sign language is a means of communication for hearing-impaired people that enables them to communicate among themselves and with other people around them. Sign language consists of hand gestures and facial expressions. Over the past 20 years, research has been conducted to facilitate communication between hearing-impaired people and others.

Sign language recognition systems have been designed in various countries. This paper presents a sign language recognition system that uses a Kinect camera to obtain a skeletal model. Our aim was to recognize expressions that are widely used in Turkish Sign Language (TSL). For that purpose we randomly selected 15 words/expressions from Turkish Sign Language, each repeated 4 times by 3 different signers, for 180 recordings in total. Videos were recorded using a Microsoft Kinect camera and Nui Capture. Joint angles and joint positions were used as gesture features, and recognition rates close to 100% were achieved.
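As an illustration of the recognition step, an HMM-based classifier scores each observation sequence under one model per gesture class and picks the most likely class. The sketch below is a generic scaled forward-algorithm scorer in Python, not the authors' implementation; the toy single-state models and quantized feature symbols are invented for demonstration:

```python
import numpy as np

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM).

    obs   : list of observation symbol indices (e.g. quantized joint angles)
    start : (n_states,) initial state probabilities
    trans : (n_states, n_states) state transition matrix
    emit  : (n_states, n_symbols) emission probabilities
    """
    alpha = start * emit[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]   # propagate, then emit
        s = alpha.sum()                        # rescale to avoid underflow
        log_p += np.log(s)
        alpha = alpha / s
    return log_p

# Toy example: two single-state "gesture" models over 2 symbols.
start = np.array([1.0])
trans = np.array([[1.0]])
models = {"gesture_a": np.array([[0.8, 0.2]]),   # favours symbol 0
          "gesture_b": np.array([[0.2, 0.8]])}   # favours symbol 1
obs = [0, 0, 1, 0]
best = max(models, key=lambda m: forward_log_likelihood(obs, start, trans, models[m]))
# best == "gesture_a"
```

In a full system, each gesture class would get an HMM trained (e.g. with Baum-Welch) on quantized joint-angle/joint-position sequences; classification selects the class whose model yields the highest log-likelihood.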
An algorithm to quantify facial swelling by reconstructing a 3D model of the face from stereo images is presented. We analyzed the primary problems in computational stereo, which include correspondence and depth calculation. Work was carried out to determine suitable methods for depth estimation and to standardize volume estimation. Finally, we designed software for reconstructing 3D images from 2D stereo images, built on Matlab and Visual C++. Using techniques from multi-view geometry, a 3D model of the face was constructed and refined. An explicit analysis of stereo disparity calculation methods was performed, and filter-based elimination in disparity estimation was used to increase the reliability of the disparity map. Minimizing variability in position through more precise positioning techniques and resources will increase the accuracy of this technique and is a focus for future work.
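For a rectified stereo pair, the depth-calculation step reduces to triangulation: Z = f·B/d, with focal length f in pixels, baseline B, and disparity d. A minimal sketch (the camera parameters below are illustrative, not taken from the paper):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth of a point from a rectified stereo pair: Z = f * B / d.

    disparity_px : pixel disparity between left and right views
    focal_px     : focal length expressed in pixels
    baseline_m   : distance between the camera centers, in meters
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# e.g. f = 800 px, B = 6 cm, d = 16 px  ->  Z = 3.0 m
z = depth_from_disparity(16, 800, 0.06)
```

Depth resolution degrades quadratically with distance, which is why reliable disparity estimation and filtering matter for volume measurements like the one described above.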
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
A novel approach to face recognition based on thermal imaging (eSAT Journals)
Abstract: A novel approach to a face recognition system based on thermal images is presented. The thermal images are captured using a Thermal Mid-Wave Infrared (MWIR) camera system at various temperatures to take into account subtle variations that may occur over time. The vasculature information is extracted from the image and used for the matching process. Four images are used for the registration process, which is done with the Rigid Body Image Registration tool. Before registration, a Gabor filter is applied to the images; the filtered images are then fused and registered. Face segmentation is done by localized region-based active contours. An anisotropic diffusion filter is applied to the image, which is then fused with the segmented image. A signature is generated from the segmented image, and templates are generated and stored for matching. To perform authentication, a signature is generated and matched against the templates stored in the system. Keywords: thermal image, Mid-Wave Infrared (MWIR) camera, Rigid Body Image Registration tool, Perona-Malik anisotropic filter, face segmentation.
A Study on Sparse Representation and Optimal Algorithms in Intelligent Comput... (MangaiK4)
Abstract: Computer vision is a dynamic research field which involves analyzing, modifying, and achieving high-level understanding of images. Its goal is to determine what is happening in front of a camera and use that understanding to control a computer or a robot, or to provide users with new images that are more informative or esthetically pleasing than the original camera images. It uses many advanced techniques in image representation to obtain computational efficiency. Sparse signal representation techniques have significant impact in computer vision, where the goal is to obtain a compact, high-fidelity representation of the input signal and to extract meaningful information. Segmentation and optimal parallel processing algorithms are expected to further improve efficiency and speed up processing.
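As a concrete instance of sparse representation, orthogonal matching pursuit (OMP) greedily builds a compact approximation of a signal from a dictionary of atoms. The sketch below is a generic textbook OMP in NumPy, not tied to any particular system described here:

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y as a k-sparse
    combination of the columns (atoms) of dictionary D.

    D : (m, n) dictionary with unit-norm columns
    y : (m,) signal to represent
    k : number of atoms to select
    Returns an (n,) sparse coefficient vector.
    """
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit on the selected support
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x

D = np.eye(4)                        # trivial dictionary: one atom per axis
y = np.array([0.0, 3.0, 0.0, 1.0])   # a 2-sparse signal
x = omp(D, y, k=2)                   # recovers the two nonzero coefficients
```

Real dictionaries are overcomplete (learned or wavelet/DCT-based); the identity dictionary here only serves to make the recovery easy to check by eye.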
CHARACTERIZING HUMAN BEHAVIOURS USING STATISTICAL MOTION DESCRIPTOR (sipij)
Identifying human behaviours is a challenging research problem due to the complexity and variation of appearances and postures, and the variation of camera settings and view angles. In this paper, we address the problem of human behaviour identification by introducing a novel motion descriptor based on statistical features. The method first divides the video into N temporal segments. For each segment, we compute dense optical flow, which provides instantaneous velocity information for all pixels. We then compute a Histogram of Optical Flow (HOOF) weighted by the norm and quantized into 32 bins, and derive statistical features from the obtained HOOF, forming a descriptor vector of 192 dimensions. We then train a non-linear multi-class SVM that classifies different human behaviours with an accuracy of 72.1%. We evaluate our method on a publicly available human action data set. Experimental results show that our proposed method outperforms state-of-the-art methods.
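The norm-weighted, 32-bin orientation histogram described above can be sketched as follows. This is a minimal HOOF-style binning in NumPy; the original HOOF formulation also folds symmetric directions together, and the 192-dimension descriptor presumably stacks statistics over the temporal segments, neither of which is shown here:

```python
import numpy as np

def hoof(flow_x, flow_y, n_bins=32):
    """Histogram of optical flow orientations, weighted by flow magnitude.

    flow_x, flow_y : per-pixel flow components (same shape)
    Returns an L1-normalized histogram with n_bins bins over [-pi, pi].
    """
    angles = np.arctan2(flow_y, flow_x)             # orientation in [-pi, pi]
    mags = np.hypot(flow_x, flow_y)                 # weight = vector norm
    hist, _ = np.histogram(angles, bins=n_bins,
                           range=(-np.pi, np.pi), weights=mags)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Uniform rightward flow concentrates all mass in a single bin.
h = hoof(np.ones((4, 4)), np.zeros((4, 4)))
```

Weighting by magnitude means large, confident motions dominate the descriptor while near-zero flow noise contributes almost nothing.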
Crowd analysis, an essential factor in brand-strategy decision management, is not a field that individuals can handle manually; technology and software products are therefore needed. In this paper we describe our work on crowd analysis and examine the problem of human detection with fish-eye lens cameras. To estimate human density, a machine learning method, the Haar classification algorithm, is used to detect the human body under different conditions. First, motion analysis is used to search for meaningful data, and then the desired object is detected by the trained classifier. The resulting data is sent to the end user via socket programming, and a human density analysis is presented.
HUMAN BODY DETECTION AND SAFETY CARE SYSTEM FOR A FLYING ROBOT (cscpconf)
Image processing is one of the challenging issues in robotics as well as electrical engineering research. This study proposes a system for extracting and tracking objects from a quadcopter flying robot, with a focus on extracting the human body. In images taken from the real-time camera embedded on the bottom of the quadcopter, there is variance in the tracked or recorded human behaviour, such as the position and size of the person. The paper therefore investigates an image-processing method for tracking human bodies concurrently. For this process, an extraction method that defines features to distinguish a human body is proposed. The proposed method creates a virtual shape of bodies for recognizing the human body and generates an extractor according to its edge information. Experimentally, this method shows better performance in terms of precision as well as speed.
Obstacle detection for autonomous systems using stereoscopic images and bacte... (IJECEIAES)
This paper presents a low-cost strategy for real-time estimation of the position of obstacles in an unknown environment for autonomous robots. The strategy is intended for autonomous service robots, which navigate in unknown and dynamic indoor environments. In addition to human interaction, these environments are characterized by a design created for human beings, which is why our developments seek morphological and functional similarity to the human model. We use a pair of cameras on our robot to achieve stereoscopic vision of the environment, and we analyze this information to determine the distance to obstacles using an algorithm that mimics bacterial behavior. The algorithm was evaluated on our robotic platform, demonstrating high performance in locating obstacles and real-time operation.
AN ALGORITHM TO DETECT DRIVER'S DROWSINESS BASED ON NODDING BEHAVIOUR (ijscmcj)
Driver drowsiness is one of the major causes of serious accidents in road traffic, so special effort has been devoted to the search for better assistance technology. However, several existing approaches fail to work effectively because the head of a drowsy driver is usually in a slanted state. Moreover, the shaking of the vehicle or the driver's winking makes the problem even more complicated. A head-bent posture also signifies a drowsy state. Consequently, this paper proposes a novel approach that considers head nodding behaviour as an input to our detection model. After detecting a human face, significant facial features are extracted; they are then used to calculate predetermined optimal parameters; finally, drowsiness is evaluated against these thresholds. In our empirical experiments, the proposed algorithm successfully and accurately detects 96.56% of cases.
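The threshold-plus-persistence idea behind nodding detection can be sketched in a few lines. This is a minimal illustration only: the angle threshold, window length, and the use of head pitch as the single feature are invented for demonstration and are not the paper's actual parameters or facial features:

```python
def detect_drowsy(pitch_angles, angle_thresh=20.0, min_run=5):
    """Flag drowsiness when the head stays pitched down for a sustained run.

    pitch_angles : per-frame head pitch in degrees (positive = nodding down)
    angle_thresh : pitch beyond which the head counts as 'bent'
    min_run      : consecutive frames required, so brief winks or vehicle
                   shakes do not trigger a false alarm
    """
    run = 0
    for angle in pitch_angles:
        run = run + 1 if angle > angle_thresh else 0
        if run >= min_run:
            return True
    return False
```

Requiring a minimum run of consecutive frames is what distinguishes a genuine drowsy nod from the momentary head motion caused by vehicle vibration.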
Robust Human Tracking Method Based on Appearance and Geometrical Features in N... (csandit)
This paper proposes a robust tracking method that concatenates appearance and geometrical features to re-identify humans across non-overlapping views. A uniform-partitioning method is proposed to extract local HSV (Hue, Saturation, Value) color features from the upper and lower portions of the clothing. An adaptive principal-view selection algorithm is then presented to locate the principal view, which contains the maximum number of appearance feature dimensions captured from different visual angles. For each appearance feature dimension in the principal view, all of its inner frames are used to train a support vector machine (SVM). In the matching process, human candidates are first filtered with an integrated geometrical feature that combines a height estimate with a gait feature. The appearance features of the remaining human candidates are then tested by the SVMs to determine the object's presence in new cameras. Experimental results show the feasibility and effectiveness of this proposal, and demonstrate real-time appearance feature extraction and robustness to changes in illumination and visual angle.
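The uniformly-partitioned HSV clothing feature can be sketched as follows. This is a simplified illustration: the person bounding box is split into upper and lower halves and a normalized HSV histogram is computed per half; the bin counts are invented, and the paper's finer partitioning is not reproduced:

```python
import numpy as np

def clothing_color_feature(hsv_image, bins=(8, 4, 4)):
    """Concatenated HSV histograms for upper and lower clothing regions.

    hsv_image : (H, W, 3) array with H, S, V channels scaled to [0, 1)
    bins      : histogram bins per channel
    Returns a 1-D feature of length 2 * bins[0] * bins[1] * bins[2].
    """
    h = hsv_image.shape[0]
    parts = [hsv_image[: h // 2], hsv_image[h // 2 :]]  # upper, lower half
    feats = []
    for part in parts:
        hist, _ = np.histogramdd(
            part.reshape(-1, 3), bins=bins, range=[(0, 1)] * 3
        )
        total = hist.sum()
        feats.append((hist / total).ravel() if total else hist.ravel())
    return np.concatenate(feats)

f = clothing_color_feature(np.zeros((4, 4, 3)))  # toy all-black "person crop"
```

Working in HSV rather than RGB keeps the feature more stable under the illumination changes the paper reports robustness to, since lighting mostly moves the V channel.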
Pose estimation algorithm for mobile augmented reality based on inertial sen... (IJECEIAES)
Augmented reality (AR) applications have become increasingly ubiquitous, as they integrate virtual information such as images, 3D objects, and video into the real world, further enhancing the real environment. Many researchers have investigated the augmentation of 3D objects on digital screens. However, certain loopholes exist in existing systems when estimating an object's pose, making them inaccurate for mobile augmented reality (MAR) applications. Objects augmented by current systems exhibit considerable jitter due to frame illumination changes, which affects the accuracy of vision-based pose estimation. This paper proposes to estimate the pose of an object by blending vision-based techniques with a micro-electromechanical system (MEMS) sensor (a gyroscope) to minimize the jitter problem in MAR. The algorithm used for feature detection and description is Oriented FAST and Rotated BRIEF (ORB), and random sample consensus (RANSAC) is used to evaluate the homography for pose estimation. Furthermore, gyroscope sensor data is incorporated into the vision-based pose estimation. We evaluated the performance of augmenting a 3D object using both the purely vision-based technique and the sensor-augmented approach on video data. After extensive experiments, the proposed method proved superior to existing vision-based pose estimation algorithms.
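The RANSAC consensus step used for the homography can be illustrated with a minimal generic RANSAC. To keep the sketch short it fits a 2-D line instead of a full homography, but the principle is identical: sample a minimal set, fit a hypothesis, count inliers, and keep the hypothesis with the largest consensus set (all numbers below are illustrative):

```python
import random

def ransac_line(points, iters=200, tol=0.1, seed=0):
    """Fit y = a*x + b to points, tolerating outliers.

    points : list of (x, y) pairs
    Returns (a, b) of the hypothesis with the most inliers.
    """
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample: 2 points
        if x1 == x2:
            continue                                 # degenerate sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = sum(abs(y - (a * x + b)) < tol for x, y in points)
        if inliers > best_inliers:                   # keep largest consensus
            best, best_inliers = (a, b), inliers
    return best

pts = [(x, 2 * x + 1) for x in range(10)] + [(1, 100), (5, -50)]  # 2 outliers
a, b = ransac_line(pts)   # recovers y = 2x + 1 despite the outliers
```

For a homography the minimal sample is 4 point correspondences and the model fit is a DLT solve, but the sampling/voting loop is exactly this one.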
Interactive Full-Body Motion Capture Using Infrared Sensor Network (ijcga)
Traditional motion capture (mocap) has been well studied in visual science over the last decades. However, the field is mostly concerned with capturing precise animation to be used in specific applications after intensive post-processing, such as studying biomechanics or rigging models in movies. These data sets are normally captured in complex laboratory environments with sophisticated equipment, making motion capture a field that is largely exclusive to professional animators. In addition, obtrusive sensors must be attached to actors and calibrated within the capturing system, resulting in limited and unnatural motion. In recent years, the rise of computer vision and interactive entertainment has opened the gate for a different type of motion capture, one that focuses on optical markerless or mechanical sensorless capture. Furthermore, a wide array of low-cost devices that are easy to use for less mission-critical applications has been released. This paper describes a new technique that processes data from multiple infrared sensors to enhance the flexibility and accuracy of markerless mocap using commodity devices such as Kinect. The method analyzes each individual sensor's data, then decomposes and rebuilds it into a uniform skeleton across all sensors. We then assign criteria that define the confidence level of the signal captured from each sensor. Each sensor operates in its own process and communicates through MPI. Our method emphasizes minimal calculation overhead for better real-time performance while maintaining good scalability.
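Combining per-sensor skeletons by confidence can be sketched as a weighted mean per joint. This is a simplified illustration: the confidence criteria here are simply the weights passed in, not the paper's actual criteria, and a real system would also handle occluded joints and temporal smoothing:

```python
import numpy as np

def fuse_joints(joint_positions, confidences):
    """Fuse one joint's 3-D position as seen by several sensors.

    joint_positions : (n_sensors, 3) position estimates for the same joint,
                      already transformed into a common coordinate frame
    confidences     : (n_sensors,) non-negative confidence per sensor
    Returns the confidence-weighted mean position.
    """
    joint_positions = np.asarray(joint_positions, dtype=float)
    w = np.asarray(confidences, dtype=float)
    if w.sum() == 0:
        raise ValueError("at least one sensor must have confidence > 0")
    return (w[:, None] * joint_positions).sum(axis=0) / w.sum()

# Two sensors disagree; the more confident one pulls the estimate toward it.
fused = fuse_joints([[0, 0, 0], [2, 2, 2]], [1, 3])
```

Because each sensor runs in its own process, this fusion step is where the per-sensor MPI messages would be gathered before rendering the shared skeleton.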
Trajectory reconstruction for robot programming by demonstration (IJECEIAES)
The reproduction of hand movements by a robot remains difficult, and conventional learning methods do not allow these movements to be faithfully recreated, especially when the number of crossing points is very large. Programming by Demonstration offers a better way to solve this problem by tracking the user's movements with a motion capture system and creating a robot program to reproduce the performed tasks. This paper presents a trajectory-level Programming by Demonstration system for reproducing hand/tool movement with a manipulator robot; it was realized by tracking the user's movement with ARToolKit and reconstructing the trajectories using constrained cubic splines. The results obtained with the constrained cubic spline were compared with cubic spline interpolation. Finally, the obtained trajectories were simulated in a virtual environment on the Puma 600 robot.
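For context on the interpolation step, a standard natural cubic spline through a set of knots is built by solving a tridiagonal system for the second derivatives at the knots. This is the classic variant the paper compares against; the constrained cubic spline it uses differs in how the slopes at the knots are chosen, which is not shown here:

```python
import numpy as np

def natural_cubic_spline(xs, ys):
    """Build a natural cubic spline through knots (xs, ys).

    xs must be strictly increasing. Returns a callable s(x).
    Natural boundary conditions: second derivative is 0 at both ends.
    """
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    n = len(xs) - 1
    h = np.diff(xs)
    # Tridiagonal system for the second derivatives M[0..n].
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0                      # natural ends: M = 0
    for i in range(1, n):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i]
                        - (ys[i] - ys[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)

    def s(x):
        # locate the interval containing x, clamped to the knot range
        i = min(max(np.searchsorted(xs, x) - 1, 0), n - 1)
        dx0, dx1 = x - xs[i], xs[i + 1] - x
        return (M[i] * dx1**3 + M[i + 1] * dx0**3) / (6 * h[i]) \
            + (ys[i] / h[i] - M[i] * h[i] / 6) * dx1 \
            + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6) * dx0
    return s

s = natural_cubic_spline([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
```

The constrained variant trades the natural spline's global smoothness for locally chosen slopes, which avoids the overshoot between demonstration points that motivates the paper's comparison.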
A Study on Sparse Representation and Optimal Algorithms in Intelligent Comput...MangaiK4
Abstract - Computer vision is a dynamic research field that involves analyzing, modifying, and achieving high-level understanding of images. Its goal is to determine what is happening in front of a camera and to use that understanding to control a computer or a robot, or to provide users with new images that are more informative or aesthetically pleasing than the original camera images. It uses many advanced image-representation techniques to obtain computational efficiency. Sparse signal representation techniques have had a significant impact in computer vision, where the goal is to obtain a compact, high-fidelity representation of the input signal and to extract meaningful information. Segmentation and optimal parallel-processing algorithms are expected to further improve efficiency and processing speed.
CHARACTERIZING HUMAN BEHAVIOURS USING STATISTICAL MOTION DESCRIPTORsipij
Identifying human behaviours is a challenging research problem due to the complexity and variation of appearances and postures, camera settings, and view angles. In this paper, we address the problem of human behaviour identification by introducing a novel motion descriptor based on statistical features. The method first divides the video into N temporal segments. For each segment, we compute dense optical flow, which provides instantaneous velocity information for all pixels. We then compute a Histogram of Optical Flow (HOOF) weighted by the norm and quantized into 32 bins, and derive statistical features from the obtained HOOF, forming a 192-dimensional descriptor vector. Finally, we train a non-linear multi-class SVM that classifies different human behaviours with an accuracy of 72.1%. We evaluate our method on a publicly available human action data set; experimental results show that the proposed method outperforms state-of-the-art methods.
Nowadays crowd analysis, an essential factor in brand-strategy decision management, is not a field that individuals can manage alone, so a technology or software product is needed. In this paper we describe our work on crowd analysis and examine the problem of human detection with fish-eye lens cameras. To measure human density, a machine learning algorithm, Haar classification, is used to detect the human body under different conditions. First, motion analysis is used to search for meaningful data; then the desired object is detected by the trained classifier. The resulting data is sent to the end user via socket programming and a human density analysis is presented.
HUMAN BODY DETECTION AND SAFETY CARE SYSTEM FOR A FLYING ROBOTcscpconf
Image processing is one of the challenging issues in robotics as well as electrical engineering research. This study proposes a system for extracting and tracking objects with a quadcopter flying robot, focusing on how to extract the human body. In images taken from a real-time camera embedded at the bottom of the quadcopter, the tracked or recorded human varies in behaviour, such as position and size. In this regard, the paper investigates an image-processing method for tracking human bodies concurrently. For this process, an extraction method that defines features to distinguish a human body is proposed. The proposed method creates a virtual shape of the body for recognition and generates an extractor according to its edge information. Experimentally, this method shows better performance in terms of both precision and speed.
Obstacle detection for autonomous systems using stereoscopic images and bacte...IJECEIAES
This paper presents a low cost strategy for real-time estimation of the position of obstacles in an unknown environment for autonomous robots. The strategy was intended for use in autonomous service robots, which navigate in unknown and dynamic indoor environments. In addition to human interaction, these environments are characterized by a design created for the human being, which is why our developments seek morphological and functional similarity equivalent to the human model. We use a pair of cameras on our robot to achieve a stereoscopic vision of the environment, and we analyze this information to determine the distance to obstacles using an algorithm that mimics bacterial behavior. The algorithm was evaluated on our robotic platform demonstrating high performance in the location of obstacles and real-time operation.
AN ALGORITHM TO DETECT DRIVER’S DROWSINESS BASED ON NODDING BEHAVIOURijscmcj
Driver's drowsiness is one of the major causes of serious accidents in road traffic, so special effort has been paid to the search for better assistance technology. However, several existing approaches fail to work effectively because the head of a drowsy driver is usually in a slanted state; moreover, the shaking of the vehicle or the driver's blinking makes the problem even more complicated. Head-bending posture also signifies a drowsy state. Consequently, this paper proposes a novel approach that takes head-nodding behaviour as an input to our detection model. After detecting a human face, significant facial features are extracted; they are then used to calculate predetermined optimal parameters; finally, drowsiness is evaluated against these thresholds. In our empirical experiments, the proposed algorithm successfully and accurately detects 96.56% of cases.
Robust Human Tracking Method Based on Apperance and Geometrical Features in N...csandit
This paper proposes a robust tracking method that concatenates appearance and geometrical features to re-identify humans across non-overlapping views. A uniform-partitioning method is proposed to extract local HSV (Hue, Saturation, Value) color features in the upper and lower portions of the clothing. An adaptive principal-view selection algorithm then locates the principal view, which contains the maximum number of appearance-feature dimensions captured from different visual angles. For each appearance-feature dimension in the principal view, all of its inner frames are used to train a support vector machine (SVM). In the matching process, human candidates are first filtered with an integrated geometrical feature that combines a height estimate with a gait feature. The appearance features of the remaining candidates are then tested by the SVMs to determine whether the object is present in the new cameras. Experimental results show the feasibility and effectiveness of this proposal, demonstrating real-time appearance-feature extraction and robustness to changes in illumination and visual angle.
Pose estimation algorithm for mobile augmented reality based on inertial sen...IJECEIAES
Augmented reality (AR) applications have become increasingly ubiquitous as they integrate virtual information such as images, 3D objects, video, and more into the real world, further enhancing the real environment. Many researchers have investigated the augmentation of 3D objects on the digital screen. However, certain loopholes exist in existing systems when estimating the object's pose, making them inaccurate for mobile augmented reality (MAR) applications. Objects augmented by the current system exhibit considerable jitter due to frame illumination changes, affecting the accuracy of vision-based pose estimation. This paper proposes to estimate the pose of an object by blending vision-based techniques with a micro electrical mechanical system (MEMS) sensor (gyroscope) to minimize the jitter problem in MAR. The algorithm used for feature detection and description is oriented FAST rotated BRIEF (ORB), whereas random sample consensus (RANSAC) is used to evaluate the homography for pose estimation. Furthermore, gyroscope sensor data is incorporated into the vision-based pose estimation. We evaluated the performance of augmenting the 3D object on video data both with the vision-based technique alone and with the sensor data incorporated. Extensive experiments showed that the proposed method was superior to existing vision-based pose estimation algorithms.
A Hand Gesture Based Interactive Presentation System Utilizing Heterogeneous Cameras
TSINGHUA SCIENCE AND TECHNOLOGY
ISSN 1007-0214 15/18 pp329-336
Volume 17, Number 3, June 2012
A Hand Gesture Based Interactive Presentation System Utilizing
Heterogeneous Cameras
Bobo Zeng, Guijin Wang**, Xinggang Lin
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract: In this paper, a real-time system that utilizes hand gestures to interactively control the presentation is proposed. The system employs a thermal camera for robust human body segmentation to handle the complex background and varying illumination posed by the projector. A fast and robust hand localization algorithm is proposed, with which the head, torso, and arm are sequentially localized. Hand trajectories are segmented and recognized as gestures for interactions. A dual-step calibration algorithm is utilized to map the interaction regions between the thermal camera and the projected contents by integrating a Web camera. Experiments show that the system has a high recognition rate for hand gestures, and corresponding interactions can be performed correctly.
Key words: hand gesture detection; gesture recognition; thermal camera; interactive presentation
Introduction
Interactive presentation systems use advanced Human Computer Interaction (HCI) techniques to provide a more convenient and user-friendly interface for controlling presentation displays, such as page up/down controls in a slideshow. Compared with traditional mouse and keyboard control, the presentation experience is significantly improved with these techniques. Hand gesture has wide-ranging applications[1]. In this study, we apply it to an interactive presentation system to create an easy-to-understand interaction interface.
The key to this system is robust localization of the presenter's interaction hand. Traditionally, visible light cameras are used for this task. However, in the presentation scenario, complicated and varying illumination by the projector changes the appearance of both the presenter and the hand in the visible light camera, invalidating methods such as detecting hand skin color[2] or training hand detectors[3,4]. To address this shortcoming, we introduce a thermal camera whose imaging is invariant to the projector's illumination.
During the presentation, the interaction hand is small and no salient features exist, so direct localization is not feasible. Hu et al.[5] built a sophisticated human model, but it is burdened by a heavy computational cost. Iwasawa et al.[6] located the head and hand by analyzing the skeleton heuristically. However, only a limited number of postures can be estimated. Juang et al.[7] found significant body points by analyzing convex points on a silhouette contour, but the convex points are not stable when the silhouette's contour is missing. Head localization is important as the first step in many pose estimation methods. Zou et al.[8] extracted quadrant arcs to fit an elliptical head contour, which needs to integrate the arcs combinatorially, so it is still not fast enough.
Calibration of the camera and the projected contents is needed for corresponding interaction regions. Special patterns need to be projected to find the point correspondences in Refs. [9,10], which are inconvenient and not fully automatic. Wang et al.[11] found correspondence between shots and slides by utilizing the positions of recognized texts. However, this approach requires the texts to be presented and recognized robustly, which is not easy to do.
Received: 2011-12-30; revised: 2012-05-07
** To whom correspondence should be addressed. E-mail: wangguijin@tsinghua.edu.cn; Tel: 86-10-62781430
1 System Overview
The system setup is illustrated in Fig. 1. It consists of a projector with a projection screen, a thermal camera with 324×256 resolution, and a Web camera. The two cameras are placed facing the screen and close to each other for better correspondence. The presenter stands at the front of the projection screen so the cameras can see the presenter and the screen simultaneously. The presenter uses his/her hand for interaction.
The system flowchart is illustrated in Fig. 2. The thermal camera detects and tracks the presenter's hand (gesture detection), and the hand trajectory is segmented and recognized (gesture recognition). Once predefined gestures are recognized, the corresponding interactions are performed and projected to the screen with the projector-camera calibration. The calibration is computed with the thermal camera and the Web camera together to map the interaction regions.
Robust and fast gesture detection by locating the interaction hand largely determines the system's performance and serves as our major emphasis. Being aware of the difficulties in directly locating hands, we first segment the human body by background modeling with the thermal camera. Based on the segmented human silhouette, we develop a fast hand localization algorithm for our real-time application. The human body is roughly divided into three parts: the head, the torso, and the arm, and they are localized sequentially. The hand is finally localized by skeletonizing the arm.
With the hand trajectory, we segment and recognize three types of gestures (lining, circling, and pointing) and apply the corresponding interactions given the pre-computed calibration. Direct calibration from the thermal camera to the projected contents is impossible, as no projected contents can be viewed from the thermal camera. Therefore, we develop a dual-step automatic calibration method with the aid of a common Web camera. The first step is thermal camera and Web camera calibration based on line detection, and the second step is Web camera and projected contents (e.g., PPT slide) calibration based on Scale Invariant Feature Transform (SIFT) feature points[12].
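Each calibration step yields a planar homography, and the two are simply chained to map thermal-camera coordinates into slide coordinates. A minimal sketch of that composition, with illustrative matrices rather than values from the paper:

```python
import numpy as np

def compose_homographies(h_thermal_to_web, h_web_to_slide):
    """Dual-step calibration: chain the thermal->Web and Web->slide
    homographies into a single thermal->slide mapping."""
    return h_web_to_slide @ h_thermal_to_web

def apply_homography(h, pt):
    """Map a 2D point through homography h using homogeneous coordinates."""
    v = h @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

# Illustrative example: a translation followed by a uniform scale
h1 = np.array([[1., 0., 1.], [0., 1., 2.], [0., 0., 1.]])  # thermal -> Web
h2 = np.array([[2., 0., 0.], [0., 2., 0.], [0., 0., 1.]])  # Web -> slide
h = compose_homographies(h1, h2)
```

In practice each matrix would be estimated from point correspondences (line intersections for the first step, SIFT matches for the second); the composition itself is just a matrix product.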
2 Gesture Detection
We propose a part-based algorithm for localizing the hand point, including a contour-based head detection and shape matching based head tracking module, a histogram-based torso and arm localization module, and a skeleton-based hand localization module. The localization algorithm is based on the observations that the human body is upright and, in interactions, the hand is located away from the torso, either beside the torso (horizontally) or above the head (vertically). The algorithm flow chart is illustrated in Fig. 3.
2.1 Human body segmentation
Human body segmentation is carried out using the thermal camera, as its images are invariant to the projector's light, which is ideal for segmentation. Figure 4c shows a typical image captured by the thermal camera; no content on the projection screen can be observed. We use a Gaussian background model[13] to extract the foreground as the segmented human body. Since the background is simple and static in the indoor environment, a unimodal Gaussian distribution is employed.
Fig. 1 System hardware setup
Fig. 2 System flowchart
For post-processing, erosion and dilation operations are performed to remove the noise and connect the possible gaps caused by inaccurate segmentation. Then, we use a connected component analysis algorithm[14] to select the largest connected component as the human body and its contour. Figure 4 illustrates that the human body is well segmented in the thermal camera. The inferior results of the Web camera are also shown for comparison.
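The pipeline above (unimodal Gaussian background model, then largest-connected-component selection) can be sketched in a few lines of NumPy. The deviation threshold k and the omission of the erosion/dilation step are simplifications for illustration, not values from the paper:

```python
import numpy as np
from collections import deque

def segment_body(frame, bg_mean, bg_std, k=2.5):
    """Foreground mask from a unimodal Gaussian background model:
    a pixel is foreground when it deviates more than k standard
    deviations from the learned background mean (k is assumed)."""
    return np.abs(frame - bg_mean) > k * bg_std

def largest_component(mask):
    """Keep only the largest 4-connected foreground component,
    taken to be the presenter's body."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    best, best_size, cur = 0, 0, 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                cur += 1
                size = 0
                q = deque([(sy, sx)])
                labels[sy, sx] = cur
                while q:  # breadth-first flood fill of one component
                    y, x = q.popleft()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = cur
                            q.append((ny, nx))
                if size > best_size:
                    best, best_size = cur, size
    return labels == best
```

A production system would update the background mean/variance online and apply morphological opening and closing before the component analysis.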
2.2 Head localization
Head localization is performed for three reasons. First, directly localizing the hand is not feasible due to the lack of salient hand features. In contrast, the head has a salient elliptical shape, which is stable and invariant to head poses (e.g., frontal or profile view). Second, the localized head can guide the localization of the torso and arms by its position and size. Third, since the head and arm are quite different in size, the head's localization can prevent it from being falsely identified as the arm. We propose to detect the head by contour-based ellipse fitting and track it by robust shape matching.
2.2.1 Contour-based elliptical head detection
The head contour is segmented from the human body contour by using the gradient angle distribution of its elliptical shape. Then, an ellipse is detected by least squares fitting with the head contour's points. The head contour's gradient angle distribution is illustrated in Fig. 5. Searching from the start point (the highest point on the head contour), the gradient angles of the contour points vary from the first quadrant [0, π/2] to the fourth quadrant [−π/2, 0] in the counterclockwise direction, and from the second quadrant [π/2, π] to the third quadrant [−π, −π/2] in the clockwise direction. The gradient angle of each contour point is computed by
Fig. 3 Hand localization flowchart
Fig. 4 Comparison of human body segmentation results of the two cameras: (a) image captured by the Web camera; (b) the inferior segmentation result of the Web camera; (c) image captured by the FLIR thermal camera; and (d) the superior segmentation result of the thermal camera
Fig. 5 Illustration of gradient angle distribution of an elliptical head when searching from the start point in clockwise and counterclockwise directions. The arrow line shows the dx and dy involved in the gradient angle computation.
$$\theta = \begin{cases} \arctan(d_y/d_x), & d_x > 0; \\ \arctan(d_y/d_x) + \pi, & d_x < 0 \text{ and } d_y \ge 0; \\ \arctan(d_y/d_x) - \pi, & d_x < 0 \text{ and } d_y < 0 \end{cases} \quad (1)$$
where $d_x$ and $d_y$ are the x and y derivatives obtained with a [−1 0 1] mask.
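Eq. (1) is the standard four-quadrant arctangent, which NumPy exposes directly as `arctan2`; the [−1 0 1] derivative mask is a central difference. A short sketch:

```python
import numpy as np

def gradient_angle(dy, dx):
    """Four-quadrant gradient angle of Eq. (1): arctan(dy/dx),
    shifted by +/- pi when dx < 0 so that theta covers (-pi, pi]."""
    return np.arctan2(dy, dx)

def derivative(v):
    """Derivative of a sampled contour coordinate with the [-1 0 1]
    mask: output[i] = v[i+1] - v[i-1] at interior points."""
    return np.convolve(v, [1, 0, -1], mode='same')
```

`np.convolve` flips its kernel, so passing `[1, 0, -1]` yields correlation with the [−1 0 1] mask of the paper.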
The head detection process is illustrated in Fig. 6. First, the search start point is identified as the intersection of the human body contour and the principal axis of the human body. The principal axis is the vertical line
$$x_p = \arg\max_x h(x) \quad (2)$$
where h(x) is the horizontal histogram of human body pixels. From the start point, the gradient angles are computed along the contour in counterclockwise and clockwise directions. The image is smoothed by a 5×5 Gaussian kernel before computing the angles to remove noise. The head contour is segmented after traversing the 4 quadrants in order. The symmetry of contour segments in the clockwise and counterclockwise directions is employed to reduce false segmentations. If the segment on one side is exceptionally longer than that on the other side, it is truncated to maintain symmetry.
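The principal-axis computation of Eq. (2) is a one-liner over the body mask; a minimal NumPy sketch:

```python
import numpy as np

def principal_axis(body_mask):
    """Principal axis of Eq. (2): the column x_p with the most body
    pixels, found as the argmax of the horizontal histogram h(x)."""
    h = body_mask.sum(axis=0)  # h(x): body-pixel count per column
    return int(np.argmax(h))
```

The search start point is then the topmost body-contour pixel in column `x_p`.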
An ellipse is detected with the segmented head contour points by least squares fitting[15] using the ellipse equation. Then, the ellipse's long axis a, short axis b, and the orientation θ to the y-axis are obtained. The ellipse is verified as a valid head if the following conditions are satisfied:
(a) The fitting error is less than a specified threshold.
(b) According to the head's size, the ellipse's aspect ratio a/b should fall into a range of [1.0, 2.5] and the orientation should fall into a range of [−π/4, π/4].
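Conditions (a) and (b) amount to a small predicate; a sketch in Python, where the error threshold `max_error` is an assumed placeholder (the paper does not give its value):

```python
import numpy as np

def verify_head(a, b, theta, fit_error, max_error=2.0):
    """Validity check for a fitted head ellipse: bounded fitting error,
    aspect ratio a/b in [1.0, 2.5], and orientation to the y-axis
    within [-pi/4, pi/4]. `max_error` is an assumed threshold."""
    if fit_error >= max_error:
        return False
    if not (1.0 <= a / b <= 2.5):
        return False
    return -np.pi / 4 <= theta <= np.pi / 4
```

If this predicate fails, the system falls back to the shape-matching tracker described next.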
If the head is unverified, further head tracking is required. An unverified head is illustrated in Fig. 7, which is caused by a wrongly segmented head contour due to distraction by the arm.
2.2.2 Shape matching based head tracking
When head detection fails, a head template tracking method utilizing robust shape matching is utilized to recover the head. We assume that the head has the same size as the last detected head, but it is now located in a different position. The task is to find the best state hypothesis $\hat{s}_t = (x_t, y_t)$, which denotes the position of the head at time t. Its probability is measured by
$$\hat{s}_t = \arg\max_{s_t} p(s_t \mid o_t) \quad (3)$$
$$p(s_t \mid o_t) \propto p(o_t \mid s_t)\, p(s_t \mid \hat{s}_{t-1}) \quad (4)$$
where $p(o_t \mid s_t)$ is the observation probability and $p(s_t \mid \hat{s}_{t-1})$ is the state transition probability.
For the observation probability, we develop a shape
matching measure inspired by chamfer matching[16]
, a
popular fast shape matching technique. In our match-
ing method, the last detected head ellipse T is the
“feature template”, and the current body silhouette
contour I is the “feature image”. The original chamfer
matching approach does not consider shape incom-
pleteness. However, in our problem, the head contour
may be missing in the joints position with the neck or
in the possible intersection position with the arm when
it is lifted above the head. So, we introduce a robust
π
π/2
0
−π/2
π
Start point
Head
contour
Principle axis
Fig. 6 A detected head showing the segmented principal
axis of the human body and head contour with the gradient
angle curve. The arrow lines indicate the start point, the left
endpoint, the right endpoint of the head contour, and their
corresponding positions in the gradient angle curve.
Fig. 7 Head tracking illustration in which the head
contour is wrongly segmented due to distraction by the
arm; hence, the fitted head is unverified. The last de-
tected head is used as a template, and shape matching
is employed to track the head. The head template and
its distance transform image are also shown.
5. Bobo Zeng et al.:A Hand Gesture Based Interactive Presentation System Utilizing … 333
matching measure that detects matched points with a
distance greater than a threshold as outliers and rejects
them. We also apply the distance transform to the tem-
plate image (see Fig. 7 for the head template and dis-
tance image) rather than the feature image for conven-
ience. The proposed confidence measure is
$$P(o_t \mid s_t) = \frac{\omega}{N_T} \sum_{i \in I(s_t),\; d_T(i) < d_{\rm th}} \frac{d_{\rm th} - d_T(i)}{d_{\rm th}} + (1 - \omega)\frac{N_T}{P_T} \qquad (5)$$
$$d_{\rm th} = \beta(a + b) \qquad (6)$$
where the first term of Eq. (5) measures the matching confidence, with $d_T(i)$ denoting the chamfer distance between contour point i in the hypothesis image patch $I(s_t)$ and the closest edge point in T, $d_{\rm th}$ denoting the distance threshold, and $N_T$ denoting the number of inlying matches. The second term measures the matching completeness, with $P_T$ denoting the perimeter of the head ellipse in the template and $\omega$ denoting the weighting factor between the two terms. $d_{\rm th}$ is set in proportion to the head template's semi-axes a and b, and the coefficient $\beta$ is set to 0.1.
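A numerical sketch of Eqs. (5) and (6), assuming the chamfer distances have already been read off the template's distance-transform image; the function name and default $\omega$ are our own assumptions.

```python
import numpy as np

def match_confidence(d_T, a, b, perimeter_T, omega=0.5, beta=0.1):
    """Robust shape-matching confidence of Eq. (5).
    d_T: array of chamfer distances from each contour point of the
    hypothesis patch to the template T; a, b: template semi-axes;
    perimeter_T: perimeter P_T of the template head ellipse."""
    d_th = beta * (a + b)            # outlier threshold, Eq. (6)
    inliers = d_T[d_T < d_th]        # reject far-away matches as outliers
    n_T = inliers.size
    if n_T == 0:
        return 0.0
    confidence = np.mean((d_th - inliers) / d_th)   # matching confidence
    completeness = n_T / perimeter_T                # matching completeness
    return omega * confidence + (1.0 - omega) * completeness
```

The completeness term rewards hypotheses whose inlying matches cover a large fraction of the template perimeter, which is what makes the measure robust to partially missing contours.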
For the motion model, since the head is near the principal axis $x_{\rm p}$ horizontally and obeys motion continuity vertically, the state transition probability is defined as follows:
$$P(s_t \mid \hat{s}_{t-1}) = \mathcal{N}(x_t - x_{\rm p} \mid 0, \sigma_x^2)\, \mathcal{N}(y_t - \hat{y}_{t-1} \mid 0, \sigma_y^2) \qquad (7)$$
where $\mathcal{N}$ is the normal distribution; $\sigma_x^2$ and $\sigma_y^2$ represent the variances in the x and y directions, respectively.
The hypothesis space is generated in a neighborhood of $(x_{\rm p}, \hat{y}_{t-1})$, and the hypothesis with the highest
probability is obtained by exhaustive search. Figure 7
gives a head tracking example.
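The exhaustive search can be sketched as follows; the window sizes, sigma values, and the unnormalized Gaussian prior are illustrative assumptions, not values from the paper.

```python
import numpy as np

def track_head(score_fn, x_p, y_prev, search_w=20, search_h=20,
               sigma_x=5.0, sigma_y=5.0):
    """Exhaustive MAP search over the hypothesis neighborhood
    (Eqs. (3), (4), (7)). score_fn(x, y) returns the observation
    probability P(o_t | s_t) of the head being at (x, y)."""
    best, best_s = -np.inf, None
    for x in range(x_p - search_w, x_p + search_w + 1):
        for y in range(y_prev - search_h, y_prev + search_h + 1):
            # Gaussian transition prior about (x_p, y_prev), Eq. (7)
            prior = np.exp(-((x - x_p) ** 2) / (2 * sigma_x ** 2)
                           - ((y - y_prev) ** 2) / (2 * sigma_y ** 2))
            p = score_fn(x, y) * prior
            if p > best:
                best, best_s = p, (x, y)
    return best_s
```

In the full system, `score_fn` would evaluate the shape-matching confidence of Eq. (5) at each candidate position.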
2.3 Hand localization
Compared to the head, localization of the torso and the
arm region is relatively easy, and Fig. 8 illustrates the
results for two different poses. The torso is modeled as
a rectangle, whose height extends from the bottom of
the head to the bottom of the body bounding box. The
horizontal histogram of human body pixels below the
head is calculated and binarized with a threshold. Then, the longest segment of 1s gives the torso rectangle's width. Apart from the head and torso, only the arm remains, which is localized as the largest component by connected component analysis.
The identified arm region is skeletonized with the
thinning algorithm in Ref. [17], and two endpoints of
the skeleton are extracted. Based on the layout of the
head, torso, and arm, the endpoint far away from the
body is identified as the initial hand point. Since the
skeleton is easily affected by local noise, we further extract the initial hand point's neighborhood, with the hand size estimated from the head size. The centroid of
body pixels in the neighborhood is computed as the
final hand point.
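The torso-width step above can be sketched as follows. This is a toy version operating on a binary body mask below the head; the threshold choice is left to the caller.

```python
import numpy as np

def torso_columns(body_mask, thresh):
    """Binarize the horizontal histogram of body pixels below the head
    and return the column span [left, right) of the longest run of 1s,
    which gives the torso rectangle's width."""
    binary = body_mask.sum(axis=0) > thresh
    best_len, cur_len, best_span, start = 0, 0, (0, 0), 0
    for i, v in enumerate(binary):
        if v:
            if cur_len == 0:
                start = i
            cur_len += 1
            if cur_len > best_len:
                best_len, best_span = cur_len, (start, i + 1)
        else:
            cur_len = 0
    return best_span
```

The run-length scan keeps the widest contiguous block of above-threshold columns, which rejects thin arm columns that happen to cross the threshold.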
3 Gesture Recognition
The moving trajectory $\{\mathbf{p}_i = (x_i, y_i)\}_{i=1}^{N}$ of the hand consists of the localized hand point in each frame. A gesture trajectory is segmented with static start and end positions by keeping the hand stationary for several frames[18]. Features are extracted from the gesture trajectory for recognition. The center of gravity $\hat{\mathbf{p}} = (\hat{x}, \hat{y})$ of the segmented trajectory is calculated by
$$\hat{\mathbf{p}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{p}_i \qquad (8)$$
The distance feature $l_i$ is calculated by
$$L_i = \|\mathbf{p}_i - \hat{\mathbf{p}}\|_2 = \sqrt{(x_i - \hat{x})^2 + (y_i - \hat{y})^2} \qquad (9)$$
$$l_i = \frac{L_i}{L_{\max}}, \qquad L_{\max} = \max_{1 \le i \le N} L_i.$$
The orientation feature $\theta_i$ is calculated by
$$\theta_i = \tan^{-1}\frac{y_i - \hat{y}}{x_i - \hat{x}} \qquad (10)$$
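Eqs. (8)-(10) translate directly into code. We use `arctan2` for the orientation to keep the quadrant, a common substitution for the one-argument arctangent in Eq. (10).

```python
import numpy as np

def trajectory_features(points):
    """Compute the normalized distance and orientation features of a
    segmented hand trajectory. points: (N, 2) array of hand positions."""
    p = np.asarray(points, dtype=float)
    center = p.mean(axis=0)                 # center of gravity, Eq. (8)
    d = p - center
    L = np.hypot(d[:, 0], d[:, 1])          # distances to the center, Eq. (9)
    l = L / L.max()                         # normalize by L_max
    theta = np.arctan2(d[:, 1], d[:, 0])    # orientation features, Eq. (10)
    return l, theta
```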
In the recognition, we use a rule-based approach rather than a learning-based approach (e.g., a Hidden Markov Model (HMM)[19] or a Finite-State Machine (FSM)[20]). Rule-based approaches are simple, fast, and
Fig. 8 The localized head, torso, arm, arm skeleton,
and the hand point for two different poses. The hori-
zontal histogram without the head and the threshold
for locating the torso are also shown.
6. Tsinghua Science and Technology, June 2012, 17(3): 329-336
334
require no offline training. The rules are defined as
follows for the predefined three gestures:
Lining The distance features decrease and then
increase linearly, and the orientation features have two
opposite values.
Circling The distance features are constant, and
the orientation features have a 2π value range and in-
crease or decrease monotonically.
Pointing The distance features are near to zero.
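A toy version of these rules is sketched below; all thresholds are illustrative assumptions, not values from the paper, and raw distances are compared against a reference length such as the estimated hand size.

```python
import numpy as np

def classify_gesture(L, theta, scale):
    """Classify a trajectory from its raw distance features L and
    orientation features theta; scale is a reference length such as
    the estimated hand size."""
    L = np.asarray(L, dtype=float)
    theta = np.asarray(theta, dtype=float)
    if L.max() < 0.2 * scale:                   # distances near zero
        return "pointing"
    if np.std(L) < 0.15 * L.mean():             # roughly constant radius
        sweep = np.unwrap(theta)                # undo 2*pi wrap-arounds
        if abs(sweep[-1] - sweep[0]) > 1.5 * np.pi:  # covers most of 2*pi
            return "circling"
    k = int(np.argmin(L))                       # turning point of a line
    if 0 < k < len(L) - 1 and np.all(np.diff(L[:k + 1]) <= 0) \
            and np.all(np.diff(L[k:]) >= 0):    # decrease then increase
        return "lining"
    return "unknown"
```

Checking the cheapest rule (pointing) first mirrors the paper's ordering of the three gesture definitions.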
4 Gesture-Based Interaction
After a gesture is recognized, the interaction region
needs to be mapped from the camera to the original
contents. Camera projector calibration is employed to
accomplish the mapping.
4.1 Camera projector calibration
The planar correspondence of pixels in the projection
screen between the camera and the original projected
contents is determined by a 3×3 homography matrix H
as follows:
$$[s x_{\rm p} \;\; s y_{\rm p} \;\; s]^{\rm T} = \mathbf{H}\,[x_{\rm c} \;\; y_{\rm c} \;\; 1]^{\rm T} \qquad (11)$$
where $(x_{\rm c}, y_{\rm c})$ is the projection screen's point observed in the image, and $(x_{\rm p}, y_{\rm p})$ is the corresponding
point in the original contents. At least four such corre-
spondence pairs are required in order to solve H. Directly finding such pairs with the thermal camera is impossible because the thermal imaging lacks texture. We therefore develop a two-step calibration method aided by the Web camera: $\mathbf{H} = \mathbf{H}_{\rm 2cam} \times \mathbf{H}_{\rm 2cont}$,
where H2cam is the homography matrix from the ther-
mal camera to the Web camera, and H2cont is the ho-
mography matrix from the Web camera to the pro-
jected contents.
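Applying Eq. (11) in code is a one-liner in homogeneous coordinates; the function name is our own, and H stands for the composed 3×3 calibration matrix.

```python
import numpy as np

def map_point(H, x_c, y_c):
    """Map a point observed in the camera image to the original
    projected contents via the homography H of Eq. (11)."""
    sx, sy, s = H @ np.array([x_c, y_c, 1.0])
    return sx / s, sy / s          # divide out the scale factor s
```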
H2cam is calculated in the first step. The four vertices
of the projection screen in both cameras are detected
by finding lines using the Hough transform[21] and obtaining the quadrilateral. The vertices in both cameras
are used to calculate H2cam . The results are illustrated
in Fig. 9.
To get the homography matrix from the Web camera
to the projected contents, we employed SIFT keypoints[12] to detect candidate correspondence points.
SIFT points are scale-space extrema detected in a set
of difference-of-Gaussian images in the scale space.
Each keypoint is assigned a location, scale, orientation,
and a descriptor; it is thereby invariant to scale and
rotation and robust to illumination change and view-
point change. Given the detected SIFT points in two
images, the nearest neighbor matching algorithm from
Ref. [12] is used. After this matching step, the corre-
sponding point pairs are only putative. We use RANdom SAmple Consensus (RANSAC)[17] to impose the
homography constraints on the previous matched point
pairs and to find the correct correspondences, which
are finally used to calculate H2cont . The whole process
is illustrated in Fig. 10.
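The RANSAC step can be sketched with a plain DLT homography fit and a sampling loop. This is a self-contained toy version under our own parameter choices; a production system would use an optimized library implementation.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct Linear Transform: fit H from (N, 2) point arrays, N >= 4."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)        # null vector = least-squares solution
    return H / H[2, 2]

def apply_h(H, pts):
    """Apply homography H to (N, 2) points in homogeneous coordinates."""
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, n_iters=200, tol=2.0, seed=0):
    """Sample 4 pairs, fit H, keep the model with the most inliers,
    then refit on all inliers of the best model."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        err = np.linalg.norm(apply_h(H, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

The refit on all inliers at the end reduces the noise sensitivity of the minimal four-point sample.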
Finally, we obtained the final homography by $\mathbf{H} = \mathbf{H}_{\rm 2cam} \times \mathbf{H}_{\rm 2cont}$. Though the image quality of the Web
camera was not very good and no special pattern was
projected for calibration, the experiment demonstrated
the effectiveness of our method: it had a maximum
pixel error of less than 8, which was sufficiently pre-
cise for the interaction.
4.2 Interaction
Upon identifying the interaction region of the recog-
nized gesture with the calibration, the semantic inter-
action is carried out. In a PPT interactive presentation
scenario, lining upward or downward controls the
Fig. 9 Vertices detection with the Hough transform:
(a) contrast-adjusted thermal image for viewing the
screen better; (b) corner detection in thermal camera
images; and (c) corner detection in Web camera images
forward or backward slide transitions, respectively;
lining leftward or rightward horizontally makes anno-
tations on the slide; circling controls the zoom in/out
operations of the interaction regions in the slide; and
pointing controls the activation of buttons in the slide.
Once the gesture is detected and recognized, the cor-
responding actions can be performed in the PPT (such
as annotation) and projected to the screen. Figures 11a
and 11b illustrate a circling gesture and a lining gesture,
respectively, in the presentation.
5 Performance Evaluation
We quantitatively evaluate the proposed system in
terms of head and hand localization accuracy, gesture
recognition rate, and computation time. The system
runs on a Windows PC with an Intel Dual Core 3 GHz
CPU and runs at 20 frames per second on a single CPU core without any specific programming
optimization. We conduct the experiments in online
real-time scenarios with complex slide contents (e.g.,
colored figures and backgrounds).
The head and hand localization accuracy is key for
the system performance and is defined as
$$\text{Localization accuracy} = \frac{\text{Number of correctly localized heads (or hands)}}{\text{Number of total heads (or hands)}}.$$
For the head, correct localization means that the overlap ratio of the localized head ellipse $D_{\rm l}$ and the ground truth $D_{\rm gt}$ must exceed 50%:
$$\text{OverlapRatio} = \frac{\text{area}(D_{\rm l} \cap D_{\rm gt})}{\text{area}(D_{\rm l} \cup D_{\rm gt})} > 0.5.$$
For the hand point, correct localization means that the
point is localized in the actual hand region. The
evaluation results are shown in Table 1, and Fig. 12
shows certain hand point localization results.
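The overlap criterion is an intersection-over-union test; on rasterized region masks it reads:

```python
import numpy as np

def overlap_ratio(mask_l, mask_gt):
    """Intersection-over-union of the localized region D_l and the
    ground truth D_gt, given as boolean masks of the same shape."""
    mask_l = np.asarray(mask_l, dtype=bool)
    mask_gt = np.asarray(mask_gt, dtype=bool)
    union = np.logical_or(mask_l, mask_gt).sum()
    if union == 0:
        return 0.0
    return np.logical_and(mask_l, mask_gt).sum() / union
```

A detection counts as correct when this value exceeds 0.5.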
Table 2 shows the recognition rates of the three
types of gestures. These results show that our system
has achieved high recognition performance. The main
error in recognition is caused by wrong gesture seg-
mentation.
Table 1 Localization accuracy of the head and hand point
              Number of total    Number correctly localized    Localization accuracy (%)
Head          6895               6839                          99.2
Hand point    5092               5044                          99.1
Fig. 12 Hand point localization results
Table 2 Gesture recognition results
Gesture     Number of gestures    Number recognized    Recognition rate (%)
Lining      84                    81                   96.4
Circling    52                    50                   96.2
Pointing    43                    42                   97.7
Overall recognition rate: 96.7%
Fig. 10 Illustration of finding the homography from
the Web camera to the projected contents: (a) detected
SIFT points in the visible camera and the original con-
tents; (b) the putative matched point pairs; and (c) the
final matched pairs by imposing homography con-
straints with RANSAC
Fig. 11 (a) Circling gesture for zooming into the
graph in the presentation; and (b) Lining gesture for
annotating a PPT slide. The red annotation line is
drawn on the original PPT slide and projected to the
screen after the lining gesture is recognized.
6 Conclusions
In this study, we design a hand gesture based presenta-
tion system by integrating a thermal camera with a
Web camera; this system provides a natural and con-
venient interaction interface for presentations. Quanti-
tative experimental results show that the system per-
forms efficiently. Currently, our gesture interaction is
limited to one hand, and we plan to extend the work to
handle two hands to enrich the system’s interaction
capability. The Web camera is only used for calibration,
but additional information could be extracted for in-
teraction, such as the shape of the hand (e.g., palm or
fist). Also, other HCI applications such as gaming will
be explored in future studies.
Acknowledgments
The authors would like to thank Nokia China Research Center
for their support of this research.
References
[1] Erol A, Bebis G, Nicolescu M, et al. Vision-based hand
pose estimation: A review. Computer Vision and Image
Understanding, 2007, 108(1-2): 52-73.
[2] Wang F, Ngo C W, Pong T C. Simulating a smartboard by
real-time gesture detection in lecture videos. IEEE
Transactions on Multimedia, 2008, 10(5): 926-935.
[3] Francke H, Ruiz-Del-Solar J, Verschae R. Real-time hand
gesture detection and recognition using boosted classifiers
and active learning. In: Proceedings of the 2nd Pacific Rim
Conference on Advances in Image and Video Technology.
Santiago, Chile, 2007: 533-547.
[4] Fang Y K, Wang K Q, Cheng J, et al. A real-time hand
gesture recognition method. In: Proceedings of the 2007
IEEE International Conference on Multimedia and Expo,
Vols 1-5. Beijing, China, 2007: 995-998.
[5] Hu Z, Wang G, Lin X, et al. Recovery of upper body poses
in static images based on joints detection. Pattern
Recognition Letters, 2009, 30(5): 503-512.
[6] Iwasawa S, Ebihara K, Ohya J, et al. Real-time estimation
of human body posture from monocular thermal images. In:
Proceedings of the 1997 IEEE Computer Vision and
Pattern Recognition. San Juan, Puerto Rico, 1997: 15-20.
[7] Juang C F, Chang C M, Wu J R, et al. Computer
vision-based human body segmentation and posture
estimation. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2009, 39(1): 119-133.
[8] Zou W, Li Y, Yuan K, et al. Real-time elliptical head
contour detection under arbitrary pose and wide distance
range. Journal of Visual Communication and Image
Representation, 2009, 20(3): 217-228.
[9] Sukthankar R, Stockton R G, Mullin M D. Smarter
presentations: Exploiting homography in camera-projector
systems. In: Proceedings of the Eighth IEEE International
Conference on Computer Vision. Kauai, HI, USA, 2001:
247-253.
[10] Okatani T, Deguchi K. Autocalibration of a projector-
camera system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(12): 1845-1855.
[11] Wang F, Ngo C W, Pong T C. Lecture video enhancement
and editing by integrating posture, gesture, and text. IEEE
Transactions on Multimedia, 2007, 9(2): 397-409.
[12] Lowe D G. Distinctive image features from scale-invariant
keypoints. International Journal of Computer Vision, 2004,
60(2): 91-110.
[13] Stauffer C, Grimson W E L. Adaptive background mixture
models for real-time tracking. In: Proceedings of the IEEE
Computer Society Conference on Computer Vision and
Pattern Recognition. Ft. Collins, CO, USA, 1999: 246-252.
[14] Chang F, Chen C J, Lu C J. A linear-time
component-labeling algorithm using contour tracing
technique. Computer Vision and Image Understanding,
2004, 93(2): 206-220.
[15] Gander W, Golub G H, Strebel R. Least-squares fitting of
circles and ellipses. BIT, 1994, 34(4): 558-578.
[16] Barrow H G, Tenenbaum J M, Bolles R C, et al. Parametric
correspondence and chamfer matching: Two new
techniques for image matching. In: Proceedings of the 5th
International Joint Conference on Artificial Intelligence.
San Francisco, CA, USA, 1977: 659-663.
[17] Lam L, Lee S W, Suen C Y. Thinning methodologies — A
comprehensive survey. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 1992, 14(9): 869-885.
[18] Davis J, Shah M. Visual gesture recognition. IEE
Proceedings-Vision Image and Signal Processing, 1994,
141(2): 101-106.
[19] Lee H K, Kim J H. An HMM-based threshold model
approach for gesture recognition. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 1999, 21(10):
961-973.
[20] Bobick A F, Wilson A D. A state-based approach to the
representation and recognition of gesture. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
1997, 19(12): 1325-1337.
[21] Duda R O, Hart P E. Use of the Hough transformation to
detect lines and curves in pictures. Commun. ACM, 1972,
15(1): 11-15.