This document discusses research on detecting deception in real-time audio and video streams. It outlines challenges in synchronizing, capturing, indexing and analyzing multiple streams. It proposes using MPEG-7 semantic annotations to generate knowledge bases for analysis. The research tests infrastructure for capturing, storing and retrieving segmented streams in SQL Server 2008. It also demonstrates prototype avatar animation controlled by Python scripts. Further studies are needed on the visual concept models and detection analysis engine.
Acknowledgments
We thank Jay F. Nunamaker, Jr., Elyse Golob, Riley McIsaac, and the
Deception Detection Research Team at CMI, University of Arizona, Tucson, AZ
MS SQL Server 2008 combines several features, including
FILESTREAM storage and full-text indexing and search, to support multimedia
data management. However, synchronizing, capturing, and indexing
multiple audio/video streams in real time is challenging.
Performing reliable deception detection analysis on live
streams will depend on the accuracy of the anchor models and the
detection analysis engine. Low-level features (color, texture, shape),
which can be extracted automatically, are not adequate; semantic
features of images convey a higher level of abstraction. Annotations
generated using MPEG-7 (Part 5) provide several benefits for
building a knowledge base for semantic feature analysis, including
interoperability with standardized frameworks such as RDF.
To support near real-time detection, audio/video streams
must be segmented, allowing each segment to be captured,
stored, and then retrieved for annotation.
Further studies are required to design the visual concept
models and the deception detection analysis engine.
Eranna K. Guruvadoo and Christopher Clarke (Bethune-Cookman University, Daytona Beach, Florida)
University of Arizona (BORDERS), Center for the Management of Information (CMI).
References
Angelides, M. C., Agius, H., "An MPEG-7 scheme for semantic
content modeling and filtering of digital video", Multimedia Systems
(2006) 11(4): pp. 320–339.
Derrick, D. C., Elkins, A. C., Burgoon, J. K., Nunamaker, J. F., Jr., and Zeng,
D., "Border Security Credibility Assessments via Heterogeneous
Sensor Fusion", IEEE Intelligent Systems, May/June 2010, pp. 41–49.
Goularte, R., Cattelan, R. G., Camacho-Guerrero, R. J., Inácio Jr., V. T.,
Pimentel, M. C., "Interactive Multimedia Annotations: Enriching and
Extending Content", Proceedings of the 2004 ACM Symposium on
Document Engineering, pp. 84–86.
Sivrikaya, F., Yener, B., "Time Synchronization in Sensor Networks: A
Survey", IEEE Network, June 2004, pp. 45–50.
Richardson, I. E., H.264 and MPEG-4 Video Compression: Video Coding
for Next Generation Multimedia, Wiley, 1st ed., August 2003.
Links:
Blender - http://www.blender.org/documentation/248PythonDoc/
Poser - http://poser.smithmicro.com/poser.html
For further information, please contact Eranna Guruvadoo at
guruvado@cookman.edu or Christopher Clarke at christclarke@live.com.
More information on related projects can be obtained at
http://www.borders.arizona.edu/files/reports/templates-annual.html
How it's done
Part 5 of MPEG-7 contains the Multimedia Description Schemes (MDS).
MPEG-7 is a standard for describing multimedia content:
• Descriptors (D): basic descriptive features of the media
• Description Schemes (DS): complex descriptive units made up of DSs and Ds
MPEG-7 essentially standardizes two things:
• The Description Definition Language (DDL) for scheme definition
• A comprehensive set of media description schemes, produced using the DDL, useful for many applications
New description schemes can be created by extending the predefined set or
created from scratch.
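As a hedged sketch of how Ds and DSs compose, the snippet below serializes a simplified, MPEG-7-style segment description with Python's standard library. The element names are illustrative stand-ins, not a validated instance of the normative MDS schema:

```python
import xml.etree.ElementTree as ET

# Sketch of an MPEG-7-style description: a Description Scheme (DS)
# grouping Descriptors (D). Element names are simplified for
# illustration and are not taken from the normative MDS schema.
def build_segment_description(segment_id, start, duration, labels):
    root = ET.Element("Mpeg7")
    ds = ET.SubElement(root, "DescriptionScheme", name="VideoSegmentDS")
    seg = ET.SubElement(ds, "VideoSegment", id=segment_id)
    time = ET.SubElement(seg, "MediaTime")
    ET.SubElement(time, "MediaTimePoint").text = start
    ET.SubElement(time, "MediaDuration").text = duration
    for label in labels:  # each label is a simple Descriptor
        ET.SubElement(seg, "Descriptor", name="SemanticLabel").text = label
    return ET.tostring(root, encoding="unicode")

xml_out = build_segment_description("seg-001", "T00:00:05", "PT10S",
                                    ["face", "gaze-aversion"])
print(xml_out)
```

A real description would be authored against the DDL-defined schemas so that tools from different vendors can interpret it interoperably.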
Automatic Labeling Process
Goal: create anchor models for visual concepts
Use surrogate video/audio frames to build the knowledge base
Use shot segmentation to detect shot boundaries
Use object segmentation: remove background; select objects of interest
Annotation: use labels from trained visual concept models and an ontology to
associate labels with objects
Build a classifier
Output: XML file of annotations; the XML output is used for indexing
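The labeling steps above can be sketched as a small pipeline. The scoring function, toy ontology, and threshold below are hypothetical stand-ins for the trained visual concept models, not the project's actual classifier:

```python
# Hedged sketch of the labeling pipeline: segmented objects are
# scored against visual concepts and mapped to ontology labels.
# score_object() stands in for a trained visual-concept classifier.

ONTOLOGY = {"face": "person.face", "hand": "person.hand"}  # toy ontology

def score_object(obj):
    # Stand-in: a real system would run a trained model on pixel features.
    return {"face": 0.9 if obj["shape"] == "oval" else 0.1}

def label_objects(objects, threshold=0.5):
    annotations = []
    for obj in objects:
        for concept, score in score_object(obj).items():
            if score >= threshold:
                annotations.append({"object": obj["id"],
                                    "label": ONTOLOGY[concept],
                                    "score": score})
    return annotations

anns = label_objects([{"id": "obj1", "shape": "oval"},
                      {"id": "obj2", "shape": "square"}])
print(anns)
```

The resulting annotation records would then be serialized to the MPEG-7 XML output used for indexing.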
Deception Detection Research
A Proof of Concept for a Data Infrastructure and 3D Avatar Control
Sensors detect five physiological/behavioral patterns:
• Near-infrared camera: gaze behavior (saccade and gaze duration); pupil dilation
• Video: kinesics
• Audio: vocalic stress; linguistic cues
• Identification: RFID; fingerprint reader; card reader
The Deception Detection Initiative consists of multi-phase projects to evaluate
technologies and interviewing techniques that might aid human screeners to
achieve rapid, reliable, and non-invasive credibility assessment in high traffic
scenarios. The objective is to identify and evaluate those non-contact methods that
hold promise for real-time tracking of large volumes of people and determining their
physiological and psychological states in a manner that minimizes disruptions for
innocent citizenry, yet shows high sensitivity to individuals who may pose national
security risks.
Experiments have been conducted using sensor technologies, an avatar kiosk,
and devices as shown below. The collection, management, and analysis of the
various data streams in real time represent significant challenges for ongoing
research. This project demonstrates a proof of concept for a data infrastructure
that will meet the critical needs of the ongoing research projects. A secondary
component of this project investigates the control of an avatar as an automated agent.
Deception Detection Model and Interview Equipment Layout
[Figure: interview equipment layout showing the 2nd-generation avatar kiosk, video cameras, a laser-Doppler vibrometer, a thermal camera, a blink camera, pupillometry cameras, and an eye-tracking system.]
Data Fusion Model
[Figure: sensor channels (linguistics, vocalics, kinesics, blood pressure, heart rate, respiration) and regions of interest (eyes, ears, forehead/face) yield cues such as blink rate/frequency, change in blink activity, iris identification, pupil dilation, and gaze behavior; object recognition and pattern classification fuse these cues into a truth-or-deception judgment.]
Issues
• No infrastructure to capture, synchronize, and index massive data streams
• Tape storage systems; off-line/post-capture annotation and processing (labor intensive)
• Minimal avatar animation
Goal
• Real-time data capture & indexing to support near real-time deception detection
• Real-time automated lip synchronization & facial animation with text-to-speech engines/raw voice data for the avatar
Tasks
• Develop a proof of concept for the data infrastructure
• Identify/investigate suitable rendering engines/programming languages for avatar animation
Real-time capture & synchronization of audio/video streams
Time code generator/receiver; IRIG-B card
VBrick video encoder appliances; recorders; network video encoders
Real-time capture and storage of multiple audio/video streams
Using Stream 5 software with frame grabbers
Retrieve captured streams; collect metadata (for indexing); thread data
to SQL Server 2008
Inject real-time textual data synchronized with live or stored video
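One way to picture the synchronization step is pairing frames from independently captured streams by their shared time code. This is a minimal sketch: the timestamps and the 20 ms tolerance are illustrative, while a real deployment would use the IRIG-B time codes mentioned above:

```python
from bisect import bisect_left

# Hedged sketch: align frames from two streams by nearest shared
# timestamp, as a common time-code source (e.g. IRIG-B) would allow.
def align(video_ts, audio_ts, tolerance=0.02):
    """Pair each video timestamp with the nearest audio timestamp
    within `tolerance` seconds; both lists must be sorted."""
    pairs = []
    for t in video_ts:
        i = bisect_left(audio_ts, t)
        candidates = audio_ts[max(0, i - 1):i + 1]
        if candidates:
            best = min(candidates, key=lambda a: abs(a - t))
            if abs(best - t) <= tolerance:
                pairs.append((t, best))
    return pairs

pairs = align([0.00, 0.033, 0.066], [0.001, 0.034, 0.100])
print(pairs)  # the third video frame has no audio sample within 20 ms
```

Aligned pairs, plus collected metadata, are what would be threaded to the database for indexing.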
Segmentation of Audio/Video Streams
Hierarchical structure:
Segment streams (question number used as primary key)
Next level: segmentation by frame number
Time intervals of events of interest
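The hierarchy above might be modeled as nested records; the field names here are illustrative, not the project's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

# Hedged sketch of the segmentation hierarchy: a question-level
# segment keyed by question number, subdivided by frame ranges,
# with time intervals marking events of interest.
@dataclass
class EventInterval:
    start_s: float
    end_s: float
    label: str

@dataclass
class FrameRange:
    first_frame: int
    last_frame: int
    events: List[EventInterval] = field(default_factory=list)

@dataclass
class QuestionSegment:
    question_number: int  # primary key at the top level
    frame_ranges: List[FrameRange] = field(default_factory=list)

seg = QuestionSegment(3, [FrameRange(0, 299,
        [EventInterval(2.5, 4.0, "gaze-aversion")])])
print(seg.question_number, seg.frame_ranges[0].events[0].label)
```

Keying the top level by question number lets each interview question's segment be captured, stored, and retrieved independently for annotation.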
MS SQL Server 2008
FILESTREAM feature for storage
Full-text engine & full-text index: linguistic search more accurate
than the "LIKE" operator; IFilter, IWordBreaker, and IStemmer
Windows Server 2008 Multimedia Class Scheduler
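Why token- and stem-based full-text search beats a substring LIKE match can be shown in plain Python. The crude suffix-stripping stemmer below is a toy stand-in for SQL Server's IWordBreaker/IStemmer components, not their actual behavior:

```python
import re

# Hedged illustration: substring matching (like SQL's LIKE '%run%')
# vs. token + crude stem matching (like a full-text index with a
# word breaker and stemmer). The stemmer here is a toy stand-in.
def like_match(text, pattern):
    return pattern in text.lower()

def stem(word):
    for suffix in ("ning", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def fulltext_match(text, term):
    tokens = re.findall(r"[a-z]+", text.lower())
    return stem(term) in {stem(t) for t in tokens}

doc = "The subject was running during the interview."
print(like_match(doc, "run"))       # True ("running" contains "run")
print(fulltext_match(doc, "run"))   # True ("running" stems to "run")
print(like_match("pruned hedge", "run"))      # True (false positive)
print(fulltext_match("pruned hedge", "run"))  # False
```

The full-text index finds the inflected form while rejecting the accidental substring hit, which is the accuracy gain over LIKE noted above.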
Media description: technical or semantic level
Technical: low-level aspects (automatic extraction)
Semantic: conveys a higher level of abstraction
Use technical features to build the domain ontology and semantic annotations
Semantic Content Modeling
Why? Low-level features are not adequate.
A knowledge base built from a collection of annotated images serves as a
surrogate for the actual data stream.
Objects and object properties form the basis of various multimedia-specific
ontologies.
Spatial relationships between objects
Semantic content aspects can have generic applicability, since virtually
all domains require some representation of events and objects,
including the relationships between them.
Data Infrastructure
(a) Literature reviews of current audio/video technologies for: stream
synchronization; capture, indexing, and storage; automated labeling to create anchor
models for use in deception detection engines.
Technologies surveyed:
Common multimedia DBs: Oracle Multimedia, IBM DB2 Image Extender,
IBM Informix DataBlades; special-purpose MMDBMSs, e.g., MARS, RETIN
Microsoft FILESTREAM indexing/storage
MPEG-7 (Part 5) for semantic content modeling
Use of encoders for compression and indexing
(b) Prototyping/scripting MS SQL Server 2008 for multimedia data
Avatar Animation
3D software: Blender (for modeling/animation/rendering)
Game engine: Unity (to create interactive content)
Programming/scripting in Python
Create a fully modeled character
Find the best ways to produce and control the model's actions
Develop animations to make the character "lifelike", including saying phrases
Use Python code to control actions
Used Poser as the second software for testing
Used a downloadable character
Imported the character into the 3D engine
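A minimal, engine-agnostic sketch of driving predefined actions from Python is below. The class, action names, and timeline model are illustrative stand-ins; Blender's actual 2.4x workflow uses NLA action strips, and a real controller would call the engine's API:

```python
# Hedged, engine-agnostic sketch of scripted avatar control: queue
# predefined actions (as NLA-style strips) on a timeline and query
# which action is active at a given frame. Not the actual Blender API.
class ActionTimeline:
    def __init__(self):
        self.strips = []  # (start_frame, end_frame, action_name)

    def queue(self, action, start, length):
        self.strips.append((start, start + length, action))

    def active_at(self, frame):
        for start, end, action in self.strips:
            if start <= frame < end:
                return action
        return "idle"

tl = ActionTimeline()
tl.queue("wave", start=0, length=24)
tl.queue("speak_greeting", start=24, length=48)
print(tl.active_at(10), tl.active_at(30), tl.active_at(100))
# wave speak_greeting idle
```

Sequencing named, predefined actions this way mirrors how the NLA strips constrain animation to the timeline, the limitation noted in the results.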
Results
Methods Used
[Diagram: audio/video stream → Encoder → Segmentation Manager → Annotation Manager (MPEG-7 XML file) → Deception/Detection Analysis Engine → database output]
Avatar Animation
Real-time Audio/Video Analysis
• Characters modeled in Blender
• Animation of characters
• Actions controlled with Python code
• Animation controlled through the timeline
• Poser tested similarly
• NLA action strips used to control predefined actions
• Poser's downloadable characters and animations used
Avatar Animation
Using Blender and Poser restricts animation to the timeline.
The 3D software can be used to create characters before importing
them into the game engine.
There is a limit on the polygon count for importing into the game engine.
Other game engines need to be tested against the real-time constraints the
avatar animation must meet.
Anchor Models
Conclusions