MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
iBrutus - Computer Vision Module
Rajagopal Vasudevan, Hareendra Manuru, Ashok Sasidharan
http://iss.osu.edu/iBrutus/
Introduction

Overview

The computer vision system is responsible for adding, removing, and tracking people in the scene. For every frame it updates each person's location and the segment of the sound source. This is necessary to enable the dialog system to interact with multiple parties: its primary purpose is to know where everyone is in the scene so that Brutus can look at them when talking or listening to them.

Giving Brutus Eyes

A 640 x 480 Microsoft Kinect color camera with 57º horizontal and 43º vertical FOVs and a tilt of 27º up or down. The depth map provided by the Kinect is used to limit Brutus' region of interest up to a certain distance.

Person Tracking

Figure 3. Person Tracking using Kinect

Person tracking was done using a variant of the Depth Forest algorithm used by the Microsoft Kinect. The algorithm handles partial occlusion and can track up to 6 people; it also provides skeletons for the two (active) people closest to the Kinect. The tracking range is 0.8 - 4 m.
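The depth-range limiting and closest-two selection described above can be sketched as follows. This is an illustrative sketch, not the project's actual code: the data layouts (depth in millimetres, a list of (id, depth) pairs) and the threshold constants are assumptions.

```python
import numpy as np

# Sketch of two details from the sections above, under assumed data layouts:
# (1) use the Kinect depth map (millimetres) to keep only pixels inside the
#     0.8 - 4 m tracking range, and (2) pick the two "active" people closest
#     to the sensor, for whom skeletons would be requested.

NEAR_MM, FAR_MM = 800, 4000  # 0.8 m and 4 m, per the stated tracking range

def roi_mask(depth_mm):
    """Boolean mask of pixels inside the trackable depth range."""
    return (depth_mm >= NEAR_MM) & (depth_mm <= FAR_MM)

def active_people(people):
    """people: list of (person_id, depth_mm) pairs.
    Return the ids of the two closest people inside the tracking range."""
    in_range = [(pid, d) for pid, d in people if NEAR_MM <= d <= FAR_MM]
    in_range.sort(key=lambda p: p[1])  # nearest first
    return [pid for pid, _ in in_range[:2]]

# Example: a tiny 2x3 "depth map" in mm; 0 means no depth reading.
depth = np.array([[0, 1200, 5000],
                  [900, 3500, 700]])
print(roi_mask(depth).astype(int))  # 1 only where depth is in range
print(active_people([(1, 2500), (2, 900), (3, 4500), (4, 1800)]))  # [2, 4]
```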
Figure 1. Microsoft Kinect

Person Interest Determination

Figure 4. Person Interest Determination

Face (facial feature) detection was performed using Viola-Jones classifiers. Person interest is made binary depending on the features detected.
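A minimal sketch of that binary decision, assuming the Viola-Jones stage reports which features it found; the feature-dictionary layout and the "frontal face plus both eyes" rule are illustrative assumptions, not the project's actual criterion.

```python
# Sketch: binarize "person interest" from the facial features the
# Viola-Jones detectors report. The dictionary keys and the decision
# rule below are assumptions for illustration.

def person_interest(features: dict) -> bool:
    """True if the detected facial features suggest the person is facing
    and engaging Brutus (e.g., frontal face with both eyes visible)."""
    return bool(features.get("face")) and features.get("eyes", 0) >= 2

print(person_interest({"face": True, "eyes": 2}))  # frontal face -> True
print(person_interest({"face": True, "eyes": 0}))  # profile view -> False
```

In practice the feature flags would come from cascade detectors such as OpenCV's `cv2.CascadeClassifier` run on each face region.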
Gaze vector estimation is also performed.

Figure 2. Kinect images

Objectives and Goals

PERSON TRACKING: identifies a person entering the scene and tracks his/her location through all the frames.

PERSON INTEREST DETERMINATION: detects the face and its features; determines the start/continuity of a conversation.

Sound Source Localization

Figure 5. Highlighted segments

The beam and source angles are provided by the SDK. The segment in which the person talks is highlighted, and the system can focus on a particular segment, which helps in speech recognition.
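How a beam or source angle could be mapped onto one of the highlighted segments can be sketched as follows; the segment count, the reuse of the camera's 57º horizontal field of view, and the angle convention (0º = camera axis) are assumptions, not values taken from the SDK.

```python
# Sketch (assumed geometry): map a sound-source angle from the microphone
# array onto one of N discrete segments across a 57-degree horizontal field
# of view. Segment count and angle convention are illustrative assumptions.

def source_segment(angle_deg: float, fov_deg: float = 57.0,
                   n_segments: int = 5) -> int:
    """Return the index (0 = leftmost) of the segment containing the
    sound-source angle, measured relative to the camera axis."""
    half = fov_deg / 2.0
    # Clamp to the field of view so out-of-range beams map to an edge segment.
    a = max(-half, min(half, angle_deg))
    idx = int((a + half) / fov_deg * n_segments)
    return min(idx, n_segments - 1)

print(source_segment(0.0))    # centre of the FOV -> segment 2
print(source_segment(-28.5))  # far left  -> segment 0
print(source_segment(28.5))   # far right -> segment 4
```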
A microphone array is used to find the source segment; this also helps in directing the avatar's gaze.

Future Scope

Kalman filtering to manage full occlusion.
Face recognition.
Gesture recognition.

Acknowledgements
This project is a collaboration between CETI and ISS in association
with iShoe and Ohio State Athletics.
A special thanks to Dr. Rajiv Ramnath and Dr. Lee Potter for their
continued support.