This document summarizes a research paper on Video Based Human Interaction (VBHI) using a 4D Touchpad. It discusses how VBHI uses computer vision to track hand gestures as input in a region of interest, rather than global user tracking. The 4D Touchpad allows for intuitive gesture inputs in 3D space plus time. It works by using stereo cameras and a projector to project an interface onto a table, then recognizes gestures like flipping or twisting based on their spatiotemporal signatures. The 4D Touchpad provides a natural gesture language for interaction without devices like a mouse.
2. OVERVIEW :
• What is VBHI?
• Introduction to the 4D Touchpad.
• Structure & working of 4D ……
3. VIDEO BASED HUMAN INTERACTION ?
• Using computer vision in human-computer
interaction systems has become a popular
approach to enhance current interfaces.
• The majority of techniques that use vision rely
on global user tracking and modeling.
• VBHI instead watches a region of interest (ROI) in
the video stream and waits for recognizable
user input; i.e., interaction is site-centric.
4. VIDEO BASED HUMAN INTERACTION ? Cont…
• The user can give input in the form of gestures
in the perceptual space, which is governed by
its interaction model.
• In the simplest case, tracked hand motion and
gesture recognition could replace the mouse in
traditional applications.
6. INTRODUCTION TO 4D
• We live in a world of three dimensions.
• 2D and 3D are familiar to everyone; they contain
only spatial dimensions (length, width, height).
• But the concept of 4D is new. The fourth
dimension can be:
• Fragrance
• Sense of touch
• Time
7. Introduction Cont . . .
• Here the concept of 4D combines space and time
within a single coordinate system, typically with three
spatial dimensions and one temporal dimension (time).
• In spacetime, a coordinate grid that spans the 3+1
dimensions locates events, i.e., it tells you where and
when something is.
• Example: An ant at the point (1,1,1) is an
example of 3D. The presence of the ant at (1,1,1) at
9 am, i.e., (1,1,1,9), is an illustration of 4D.
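The ant example can be written down as a small sketch. The `Event` type, the `same_place` helper, and the choice of hours as the time unit are assumptions made for illustration only.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """A point in 3+1 dimensions: where (x, y, z) and when (t)."""
    x: float
    y: float
    z: float
    t: float  # time in hours since midnight (unit is an assumption)


def same_place(a: Event, b: Event) -> bool:
    """True if two events share the same spatial coordinates."""
    return (a.x, a.y, a.z) == (b.x, b.y, b.z)


# The ant sits at (1, 1, 1) at 9 am and is still there at 10 am.
ant_at_9 = Event(1, 1, 1, 9)
ant_at_10 = Event(1, 1, 1, 10)
```

In 3D the two observations are indistinguishable (`same_place` is true), while in 4D they are two different events because the time coordinate differs.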
8. 4D Example: If the inner cube becomes the outer
one and vice versa instantaneously, the
coordinates can be represented only with the
help of a fourth dimension, time.
9. What is 4D Touchpad?
• A touchpad is a device for pointing on a computer
display screen, originally incorporated in
laptop computers.
• The 4D Touchpad (4DT) is a type of touchpad where the
intuitive gestures of users can be used to give
input.
• It is a platform for human-machine interfaces
that provides direct interaction with interface
components through intuitive actions and gestures.
10. Example: To flip a coin on the screen, the
user just has to imitate the action of flipping
in the region of interest (ROI) of the 4DT.
• The 4DT is based on the 3D-2D Projection-
based mode of the VICs framework.
• The visual interaction cues (VICs) paradigm
uses a shared perceptual space between the
user and the computer.
12. Structure of 4DT Cont...d
• The 4DT contains a pair of wide-baseline cameras
and a projector that are directed at a table.
• The region above the surface of the table between the
two cameras is the ROI of the 4DT.
• The image of the screen to be displayed to the user is
projected onto the table from the projector via
a mirror.
• The mirror is mounted so that the screen is shown clearly on
the table.
• The cameras are specially calibrated
so as to capture the correct sequence of actions by the
user.
13. WORKING OF 4D TOUCHPAD:
The 4DT is based on the 3D-2D projection-based
mode of the VICs framework. It involves the following steps:
a] 3D-2D Projection
b] Image Rectification:
   Homography
c] Analysis of Stereo Properties:
   Stereo analysis
   Gesture Recognition
   A Natural Gesture Language
14. 3D-2D Projection:
First, the 3D view of the camera is
projected onto the plane by representing the
user's point of view in this 3D space.
This view is drawn below as a pyramid. All
rays of light that pass through this "pyramid"
originate from objects that the user sees.
In the end, the whole 3D-to-2D projection
is about projecting the entire 3D world
onto the plane in front of the pyramid.
Example: Imagine we want to project a
cube onto a plane. If we draw the
situation, it would look somewhat like this.
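The cube example can be sketched with a simple perspective (pinhole) projection. This is a minimal illustration, not the 4DT implementation: it assumes the viewer sits at the origin looking along +z and that the projection plane is z = d. The function name `project_point` and the cube coordinates are made up for the example.

```python
def project_point(point, d=1.0):
    """Project a 3D point (x, y, z) onto the plane z = d.

    Assumes the viewer is at the origin looking along +z, so the ray
    from the origin through the point hits the plane at (d*x/z, d*y/z).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer (z > 0)")
    return (d * x / z, d * y / z)


# Hypothetical cube of side 2, centred at (0, 0, 4) in front of the viewer.
cube = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (3, 5)]

# The near face (z = 3) projects larger than the far face (z = 5).
projected = [project_point(p) for p in cube]
```

Dividing by the depth z is what makes the far face of the cube appear smaller than the near face on the plane.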
15. Image Rectification Using
Homography:
• A homography is an invertible transformation from the
real projective plane to the projective plane that maps
straight lines to straight lines, so it relates
corresponding points in two views of the same plane.
• A homography has two main properties:
1. For a stationary camera with a fixed centre of
projection, it does not depend on the scene
structure (i.e., the depth of the scene points).
2. It applies even if the camera pans and
zooms, i.e., if the focal length of the camera
changes while it rotates about its centre.
16. In the 4D Touchpad, the camera's projection and the
actual figure are related and analysed through a homography.
Projected and Rectified Image
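As a rough sketch of how a homography maps points, the function below applies a 3x3 matrix to a 2D point in homogeneous coordinates. The matrix `H` is an arbitrary made-up example, not a calibrated 4DT homography, and `apply_homography` is a hypothetical helper name.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (list of 3 rows).

    The point is lifted to homogeneous coordinates (x, y, 1), multiplied
    by H, and divided by the third component to return to 2D.
    """
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (xh / w, yh / w)


# Arbitrary example matrix (an assumption): scale, shift, and a
# perspective term in the last row.
H = [
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 3.0],
    [0.5, 0.0, 1.0],
]

rectified = apply_homography(H, (2.0, 1.0))
```

In a real setup the matrix would be estimated from at least four point correspondences between the camera image and the table plane.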
17. STEREO ANALYSIS :
• For both cameras, we can subtract the current frame
from a stored background frame, yielding a mask of
modified regions. This can be used for simple stereo
calculation.
• Then, we can take the difference between the two
masks to find all pixels not on the plane and
use it in more intensive computations like 3D gesture
recognition.
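The two steps above can be sketched on tiny grayscale images. This is an illustration under several assumptions: frames are lists of pixel rows, both views are already rectified to the table plane, the threshold of 30 is arbitrary, and the "difference" of the masks is taken as a per-pixel XOR. The function names are hypothetical.

```python
def change_mask(frame, background, threshold=30):
    """Mark pixels whose intensity differs from the stored background."""
    return [
        [abs(f - b) > threshold for f, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


def off_plane_mask(mask_left, mask_right):
    """Mark pixels that changed in only one of the two rectified views.

    Changes lying on the plane should coincide in both rectified views,
    so a disagreement hints at something above the plane.
    """
    return [
        [l != r for l, r in zip(lrow, rrow)]
        for lrow, rrow in zip(mask_left, mask_right)
    ]


background = [[10, 10, 10, 10]]
left_frame = [[10, 200, 200, 10]]   # object seen at columns 1-2
right_frame = [[10, 10, 200, 200]]  # same object seen at columns 2-3

left_mask = change_mask(left_frame, background)
right_mask = change_mask(right_frame, background)
off_plane = off_plane_mask(left_mask, right_mask)
```

Only the pixels where the two masks disagree are passed on to the more expensive 3D gesture recognition.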
18. Gesture Recognition:
• The action of a user gesturing over an interface
component presents a sequence of visual cues, the
gesture's spatiotemporal signature.
Consider a standard push button:
1. The user enters the local region: disturbance
2. The finger moves onto the button: large color blob
3. The finger pushes the button: fixed duration
4. The button is pressed and processing completes.
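The push-button sequence can be sketched as a small state machine over per-frame cues. The cue labels (`"disturbance"`, `"blob"`, `"none"`), the state names, and the hold length of three frames are assumptions chosen for illustration, not values from the paper.

```python
def button_pressed(cues, hold_frames=3):
    """Return True once the cue stream matches the push-button signature:
    a disturbance, then a color blob held for `hold_frames` frames in a row.
    """
    state = "idle"
    held = 0
    for cue in cues:
        if state == "idle":
            if cue == "disturbance":
                state = "entered"        # step 1: user enters the region
        elif state == "entered":
            if cue == "blob":
                state = "on_button"      # step 2: finger is on the button
                held = 1
            elif cue != "disturbance":
                state = "idle"           # cue sequence broken, start over
        elif state == "on_button":
            if cue == "blob":
                held += 1                # step 3: finger keeps pushing
            else:
                state, held = "idle", 0
        if state == "on_button" and held >= hold_frames:
            return True                  # step 4: button counts as pressed
    return False


press = button_pressed(["none", "disturbance", "blob", "blob", "blob"])
aborted = button_pressed(["disturbance", "blob", "none", "blob", "blob"])
```

Cheap cues are checked first, so the later and more expensive checks only run after the earlier ones have fired.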
19. A natural Gesture Language:
• A natural gesture language implements the visual
interaction model on the 4DT platform.
• The gesture language comprises a vocabulary of
individual gestures.
Gesture       Description
Press         Finger on centre
Press-left    Finger on left edge
Press-right   Finger on right edge
Stop          An open hand
Flips         Mimics flipping a coin over
Twist         Clockwise twisting
Drop          Index finger and thumb open
Pick          Index finger and thumb half closed
20. Disadvantages of 4D Touch Pad
• Since the cameras are fixed, we lose the ability
to extract the panel orientation.
• The setup is expensive, as two cameras are needed to
cover the wide baseline of the ROI.
21. Conclusion :
• In the local regions of the video images, gesture recognition is solved by
modeling the spatio-temporal pattern of visual cues that correspond to
gestures.
• In the VICs paradigm, we approach gesture recognition without globally
tracking and modeling the user.
• The paradigm is applicable in both conventional 2D interface settings
and unconventional 3D virtual/augmented reality settings.
• Devices like the mouse are replaced with a direct, natural language of
interaction.
• It acts as a coherent, linguistic model that integrates heterogeneous
forms of low-level gestures into a single framework.
• The VICs modeling technique consistently orders images based on
overlapping pixel content and is robust to viewpoint change and
occlusion.
22. References :
• Jason J. Corso, Guangqi Ye, Darius Burschka, and
Gregory D. Hager, "A Practical Paradigm and Platform
for Video-Based Human-Computer Interaction", 2008
• Jason J. Corso, "The 4D Touchpad: Unencumbered
HCI With VICs", 2005
• A.D. Jepson, "Computing a homography between 2D
scene and image", 2006
• F.J. Estrada, A.D. Jepson, D. Fleet, "Planar
Homographies", 2004
• Johnny Bistrom, Alessandro Cogliati, "Post-WIMP
User Interface Model for 3D Web Applications", 2005