Video Analytics on Hadoop webinar
2. © Copyright 2013 Pivotal. All rights reserved.
What You Can Do With Hadoop
Webinar Series
Unstructured Data – Video Analytics
September 6, 2013
Dr. Chunsheng (Victor) Fang, Sr. Data Scientist
Annika Jimenez, Global Head of Data Science Services
Nikesh Shah, Sr. Product Marketing Manager
3.
What You Will Learn
Pivotal Data Science Lab Services
New Emerging Trends for Unstructured Data
Video Analytics on Hadoop
Analytics with SQL
4.
Pivotal Platform
• Cloud Storage
• Virtualization
• Data & Analytics Platform
• Cloud Application Platform
• Data-Driven Application Development
• Pivotal Data Science Labs
6.
Data Science Value Chain
Stages: Instrumentation, Logs, Capture, Store, Transform and Prepare, Access, Model Development, Deploy, Applications, Process Change
Roles: Product Engineer, Platform Engineer, DBA, Data Engineer/Programmer, Data Engineer, Data Scientist, Platform Engineer, Application Developer, PMO
7.
How We Help Our Customers
1. Data Science Strategy Definition
2. Point Proof-of-Value Model Development
3. Multiple Model Development + Apps
4. DSIC Transformation to “Predictive Enterprise”
5. Also:
– Algorithm development
– Pushing the envelope in problem-solving
Pivotal Data
Science Labs
8.
Pivotal Data Science Knowledge Development
9.
Pivotal Data Science Dream Team
• Derek Lin – Network Security, Fraud Detection, Speech and Language Processing (Principal Scientist at RSA, M.S. in Signal Processing, USC)
• Hulya Farinas – Optimization, Resource Allocation in Healthcare (Modeler at M-Factor, IBM, Ph.D. in Operations Research, University of Florida)
• Kaushik Das – Mathematical Modeling in Energy, Retail and Telco (Director of Analytics at M-Factor, M.S. in Mineral Engineering, UC Berkeley)
• Sarah Aerni – Genomics and Machine Learning (Ph.D. in Biomedical Informatics, Stanford)
• Mariann Micsinai – Next Generation Sequencing (Market Risk Management Associate at Lehman Brothers, Ph.D. in Computational Biology, NYU and Yale)
• Victor Fang – Imaging and Graph Analytics, Machine Learning (Sr. Scientist at Riverain Medical, SDE at Amazon.com, Ph.D. in Computer Sciences, University of Cincinnati)
• Emily Kawaler – Clinical Informatics and Machine Learning (M.S. in Computer Sciences, University of Wisconsin-Madison)
• Anirudh Kondaveeti – Trajectory Data Mining and Machine Learning (Ph.D. in Computing & Dec. Systems Eng, Arizona State University)
• Hong Ooi – Insurance and Finance Risk Modeling (Statistician at ANZ, Ph.D. in Statistics, Australian National University)
• Michael Brand – Text, Speech and Video Research for Retail, Finance and Gaming (Chief Scientist at Verint Systems, M.S. in Applied Mathematics, Weizmann Institute)
• Kee Siong Ng – Data Mining in Healthcare (Sr. Data Miner at Medicare Australia, Ph.D. in Computer Science, and Postdoctoral Fellow, Australian National University)
• Noelle Sio – Digital Media Analytics and Mathematical Modeling (Sr. Analyst at eHarmony, Fox Interactive Media (Myspace), M.S. in Applied Mathematics, Cal Poly Pomona)
• Jin Yu – Stochastic Optimization, Robust Statistics in Machine Learning, Computer Vision (Research Associate at U of Adelaide, Ph.D. in Machine Learning, Australian National University)
• Rashmi Raghu – Computational Methods and Analysis (Ph.D. in Mechanical Engineering, Stanford)
• Woo Jung – Bayesian Inference and Demand Analysis (Sr. Statistician at M-Factor, M.S. in Statistics, Stanford)
• Jarrod Vawdrey – Marketing Analytics & SAS (Analytics Consultant at Aspen Marketing, B.S. in Mathematics, Kennesaw State University)
• Niels Kasch – Text Analytics and NLP (Ph.D. in Computer Science, UMBC)
• Vivek Ramamurthy – Online Learning, Stochastic Modeling, Convex Optimization (Ph.D. in Operations Research, UC Berkeley)
• Srivatsan Ramanujam – NLP and Text Mining (Natural Language Scientist at Sony, Salesforce.com, M.S. in Computer Sciences, UT Austin)
• Alexander Kagoshima – Time Series, Statistics and Machine Learning (M.S. in Economics/Computer Science, TU Berlin)
10.
Data Science Labs: Packaged Services
LAB PRIMER (2-Week Strategy)
• Customized Analytics Roadmap
• 1-day Moderated Brainstorming Session
• Prioritized Opportunities
• Architectural Recommendations
LAB 600 (6-Week Lab)
• Prof. Services (Data Load)
• Data Science Model Building
• Project Management
• Ready-to-Deploy Model(s)
LAB 1200 (12-Week Lab)
• Prof. Services (Data Load)
• Data Science Model Building
• Project Management
• Ready-to-Deploy Model(s)
LAB 100 (2-Week Lab)
• On-site Pivotal Analytics Training
• Rapid Model/Insight Build on Customer Data (2 weeks)
11.
Approach: Data Science Lab 1200
[Timeline chart spanning weeks 1 to 12. Chart labels: Data Exploration, Features Building, Model Development, Code QA and Scoring, Model Optimization & Validation, Data Loaded, Insights Presentation, Training, Preliminary Model Review, Feature Review, Data Review, Documentation]
12.
Data Science Innovation Center (DSIC)
Program Management
• Oversight and communication plans
• Organizational alignment
• Risk mitigation
• Resource planning
• Prioritize deliverables
• Socialize progress of overall initiative
Data Architecture and Engineering
• Facilitate data loading processes from source systems to Pivotal Data Fabric
• Coordinate data needs with Data Scientists
• Best practice education for analytics performance
• Data migration to support new applications
Data Scientists
• Instill data collaboration culture
• Execute Data Science Lab engagements around revenue generation or cost saving efforts
• Hands on education with new data analysis techniques
• Introduce new analytics tools and methodologies
Training and Skills Development
• Identify candidates for deeper data science training
• Create training curriculum
• Recruiting Methodology
• Parallel computing techniques defined and demonstrated
• Build institutional knowledge for client data science team
Key Principles
• Building a predictive enterprise is, first and foremost, about building a human infrastructure.
• Analytics is an iterative knowledge discovery process and needs to be managed as such.
• Discovery starts from asking the right questions – that can be as important as finding answers to those questions.
13.
Large Scale Video Analytics Platform on Hadoop
Dr. Chunsheng (Victor) Fang, Sr. Data Scientist
14.
Pivotal Video Analytics Taskforce
Chunsheng (Victor) Fang, Ph.D.
– Sr. Data Scientist
Regunathan Radhakrishnan, Ph.D.
– Sr. Data Scientist
Derek Lin
– Principal Data Scientist
Sameer Tiwari
– Hadoop Architect
Kenneth Dowling & Michael Nemesh
– DCA Admin
15.
Industry Use Case
Surveillance Video Anomaly Detection
16.
Anomaly Detection in Surveillance Video
Detect anomalous objects in a restricted perimeter.
A typical large enterprise collects terabytes of video per day.
Hadoop MapReduce runs computer vision algorithms in parallel and captures violation events.
Post-incident monitoring is enabled by Hadoop / HAWQ.
17.
Unstructured Video Data Workflow
Unstructured data as input
ETL: Distributed Video Transcoder
Analytics: Distributed Video Analytics
Structured Insights in relational database for advanced analytics
Unstructured Data → ETL → Analytics → Structured Insights
18.
Real World Video Data
• Benchmark Surveillance Videos (i-LIDS) from the United Kingdom Home Office
– Library of HiDef CCTV video footage based around ‘scenarios’ central to the government’s requirements.
– The footage accurately represents real operating conditions and potential threats.
• Anomaly Detection: Sterile zone dataset
[Sample frames: Night, Day]
19.
Most Common Video Standards
MPEG & ITU: responsible for many video standards
MPEG-2 (1995): Widely adopted, DVDs, Digital TV broadcast, set-top boxes
20.
Intro to MPEG Standard
MPEG standard encodes video frames
– Redundancy in time: inter-frame encoding
– Redundancy in space: intra-frame encoding
Motion compensation
– I-frame: (Key frame) intra-frame encoding
– P-frame: (Predicted frame) Predicting regions of current frame from previous frame
– B-frame: (Bi-predictive frame) Predicting regions of current frame using both previous and next frame
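To make the three frame types concrete, here is a minimal Python sketch (not part of the webinar). For a display-order GOP pattern it lists which frames each frame is predicted from. The pattern string `"IBBPBBP"` and the function name `reference_frames` are illustrative assumptions; real encoders choose their own GOP layout. The sketch follows the MPEG-2 convention that only I- and P-frames act as reference (anchor) frames.

```python
def reference_frames(pattern):
    """Map each frame index to the indices of the frames it is predicted from.

    `pattern` is a display-order string made of 'I', 'P', and 'B'.
    Anchors are the I- and P-frames; B-frames are never used as references.
    """
    anchors = [i for i, kind in enumerate(pattern) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(pattern):
        previous = [a for a in anchors if a < i]
        following = [a for a in anchors if a > i]
        if kind == "I":
            refs[i] = []                              # intra-coded: decodable alone
        elif kind == "P":
            refs[i] = previous[-1:]                   # nearest earlier anchor
        else:                                         # 'B'
            refs[i] = previous[-1:] + following[:1]   # nearest anchor on each side
    return refs


print(reference_frames("IBBPBBP"))
```

Only the I-frame has an empty reference list, so a decoder that starts in the middle of a stream has to locate an I-frame before it can reconstruct anything.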
21.
Distributed Video Transcoder on Hadoop
Distributed MapReduce MPEG Transcoder
22.
Motivation of Distributed Video Transcoding
Can we decode the individual frames from an arbitrary block in the Hadoop Distributed File System (HDFS)?
HDFS splits every file into fixed-size blocks, typically 64MB or 128MB depending on configuration.
Each block can be processed in parallel by a customized MapReduce function.
Most video file standards are not Hadoop-friendly.
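The block arithmetic can be sketched in a few lines of Python. This is an illustration only: the 1 GB file size is a made-up figure, and `hdfs_block_count` is a hypothetical helper, not a Hadoop API.

```python
import math

MB = 1024 * 1024


def hdfs_block_count(file_size_bytes, block_size_bytes):
    """Number of fixed-size blocks needed to store a file of the given size."""
    return math.ceil(file_size_bytes / block_size_bytes)


one_gb = 1024 * MB  # hypothetical file size, for illustration only
print(hdfs_block_count(one_gb, 64 * MB))   # 16
print(hdfs_block_count(one_gb, 128 * MB))  # 8
```

Block boundaries fall at fixed byte offsets regardless of where video frames begin or end, which is why a mapper cannot assume its block starts at a decodable position.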
23.
Decoding MPEG-2 with MapReduce
Two key observations
– Video header information is available only at the start of the bitstream
– The Group of Pictures (GOP) header repeats throughout the bitstream
Steps to decode arbitrary blocks
– Step 1: Configure each mapper to extract the header information from each file;
▪ Totals ~20 videos at 5GB
– Step 2: Start searching for GOP header in each block in parallel;
– Step 3: Decode frames into a suitable image format (JPEG, BMP, etc);
– Step 4: Consolidate all time-stamped frames into Hadoop Sequence File.
▪ Reduces to sequence file at 500MB
Transcoding MPEG-2 video into Hadoop-friendly format
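Step 2 above can be sketched as a byte search. The following Python snippet is an illustration, not the presenters' transcoder: it scans one in-memory block for the MPEG-2 video `group_start_code` (`0x000001B8`). The function name `find_start_codes` and the synthetic block contents are assumptions made for the example.

```python
GOP_START_CODE = b"\x00\x00\x01\xb8"        # MPEG-2 video group_start_code
SEQUENCE_HEADER_CODE = b"\x00\x00\x01\xb3"  # MPEG-2 video sequence_header_code


def find_start_codes(block, code=GOP_START_CODE):
    """Return the byte offsets of every occurrence of `code` in `block`."""
    offsets = []
    position = block.find(code)
    while position != -1:
        offsets.append(position)
        position = block.find(code, position + 1)
    return offsets


# Synthetic bytes standing in for one HDFS block of an MPEG-2 video stream.
block = b"\xff" * 5 + GOP_START_CODE + b"\x11" * 7 + GOP_START_CODE + b"\x22" * 3
print(find_start_codes(block))  # [5, 16]
```

A real mapper would also need to handle a start code that straddles two blocks and to combine each GOP with the sequence header extracted in Step 1; this sketch does neither.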
24.
Distributed Video Analytics Platform on Hadoop
25.
Object Detection with Gaussian Mixture Model
• Video data is much noisier than we realize.
• You don’t notice the noise because your visual cortex denoises it for you.
• A computer needs a good statistical model (e.g. a Gaussian mixture model, GMM) to be robust to it.
Distribution of pixel intensities over time
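The slides do not say which GMM variant the platform uses. As a rough illustration of the idea, here is a simplified per-pixel online mixture in Python, loosely in the spirit of adaptive background mixture models. Every name and parameter (`PixelMixture`, `learning_rate`, `match_sigmas`, `background_weight`, the variance floor) is an assumption chosen for the sketch, and the background test is a plain weight threshold rather than a full ranking of components.

```python
class PixelMixture:
    """Online Gaussian mixture over one pixel's grayscale intensity."""

    def __init__(self, n_components=3, learning_rate=0.05,
                 initial_variance=225.0, min_variance=4.0, match_sigmas=2.5):
        self.k = n_components
        self.alpha = learning_rate
        self.initial_variance = initial_variance
        self.min_variance = min_variance
        self.match_sigmas = match_sigmas
        self.components = []  # each entry: [weight, mean, variance]

    def update(self, value, background_weight=0.25):
        """Fold `value` into the model; return True if it looks like foreground."""
        # Try the most established components first.
        self.components.sort(key=lambda c: c[0], reverse=True)
        matched = None
        for component in self.components:
            _, mean, variance = component
            if abs(value - mean) <= self.match_sigmas * variance ** 0.5:
                matched = component
                break
        # Background means: explained by a component that was already well supported.
        is_background = matched is not None and matched[0] >= background_weight

        # Raise the weight of the matched component, decay the others.
        for component in self.components:
            hit = 1.0 if component is matched else 0.0
            component[0] += self.alpha * (hit - component[0])
        if matched is not None:
            delta = value - matched[1]
            matched[1] += self.alpha * delta
            matched[2] = max(self.min_variance,
                             matched[2] + self.alpha * (delta * delta - matched[2]))
        else:
            new = [self.alpha, float(value), self.initial_variance]
            if len(self.components) < self.k:
                self.components.append(new)
            else:
                # Components are sorted by weight, so the last one is the weakest.
                self.components[-1] = new
        total = sum(c[0] for c in self.components)
        for component in self.components:
            component[0] /= total
        return not is_background


model = PixelMixture()
# Fifty frames of a stable background with slight flicker (made-up intensities).
history = [model.update(99 if i % 2 == 0 else 101) for i in range(50)]
print(model.update(200))  # a sudden bright object
print(model.update(100))  # the background again
```

Because the first frame has nothing to match against, it is reported as foreground; a real pipeline would typically discard detections during this warm-up period.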
26.
Typical Video Analytics Workflow
Video/image data are highly unstructured.
Hadoop has proven excellent at extracting structured insights from Big Data.
A typical workflow:
[Workflow diagram. Components: Foreground Extraction, Background Stat Model, Visual Key, Composite Key, Feature Extraction/Classification, Analytic Result. Emitted record: ((Key, Time), Loc)]
27.
Use Case 1: Anomaly Detection
Extracting structured info from Unstructured data
Computer vision algorithms fit into Mapper/Reducer framework
Intermediate (Key, Value)
– (RestrictedArea, IntrusionEvent(Time, ViolatorImage) )
[Diagram: HDFS → Map (×4) → Reduce (×3) → HDFS / GPDB. Sample event timestamp: 2012-09-01 07:00:00]
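The (RestrictedArea, IntrusionEvent(Time, ViolatorImage)) pairing can be illustrated with an in-memory Python simulation of map, shuffle, and reduce. This is a sketch, not Hadoop code: the frame records, the `foreground_pixels` field, and the threshold of 500 are hypothetical stand-ins for the output of a real detector.

```python
from collections import defaultdict, namedtuple

IntrusionEvent = namedtuple("IntrusionEvent", ["time", "violator_image"])

FOREGROUND_THRESHOLD = 500  # hypothetical cut-off, for illustration only


def map_frame(frame):
    """Mapper: emit (restricted_area, IntrusionEvent) when a frame shows an intruder."""
    if frame["foreground_pixels"] > FOREGROUND_THRESHOLD:
        yield frame["area"], IntrusionEvent(frame["time"], frame["image_id"])


def reduce_area(area, events):
    """Reducer: return the events for one restricted area in time order."""
    return area, sorted(events)


def run_job(frames):
    """Run map, shuffle (group by key), and reduce in memory."""
    grouped = defaultdict(list)
    for frame in frames:
        for key, value in map_frame(frame):
            grouped[key].append(value)
    return dict(reduce_area(area, events) for area, events in grouped.items())


frames = [
    {"area": "gate", "time": "2012-09-01 07:00:02", "image_id": "f3", "foreground_pixels": 900},
    {"area": "gate", "time": "2012-09-01 07:00:00", "image_id": "f1", "foreground_pixels": 1200},
    {"area": "gate", "time": "2012-09-01 07:00:01", "image_id": "f2", "foreground_pixels": 10},
    {"area": "fence", "time": "2012-09-01 07:00:00", "image_id": "f4", "foreground_pixels": 0},
]
print(run_job(frames))
```

Areas with no violations never emit a key, so the reduced output holds only the events worth reviewing.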
28.
Use Case 2: Trajectory Analysis
Tracking multiple objects in Big Data video archives
Building high level summarization, e.g. moving trajectory time series
[Figure panels: T1, T2, T3, T4, T5, T6]
29.
Use Case 2: Trajectory Analysis “Map”
[Map diagram. Components: Foreground Extraction, Background Stat Model, Visual Key, Composite Key, Feature Extraction/Classification. Output: Emit(K,V) with ((VisKey, time), loc)]
30.
Use Case 2: Trajectory Analysis “Reduce”
[Reduce diagram. Input: ((VisKey, time), loc) with 2nd sort on composite key. Steps: Aggregate, user-defined trajectory model. Output: (Object, Trajectory)]
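The effect of the secondary sort on the composite key can be shown with a short in-memory Python sketch. It is not Hadoop code: sorting on the whole `(vis_key, time)` tuple stands in for the framework's sort, the sample records are made up, and a plain list of locations replaces the user-defined trajectory model.

```python
from itertools import groupby


def build_trajectories(records):
    """Group ((vis_key, time), loc) records into one time-ordered path per object."""
    # Sorting on the full composite key mimics a secondary sort: records are
    # grouped by vis_key and arrive ordered by time within each group.
    ordered = sorted(records, key=lambda record: record[0])
    trajectories = {}
    for vis_key, group in groupby(ordered, key=lambda record: record[0][0]):
        trajectories[vis_key] = [loc for (_key, _time), loc in group]
    return trajectories


records = [
    (("red_car", 2), (14, 5)),
    (("blue_van", 1), (3, 40)),
    (("red_car", 1), (10, 5)),
    (("red_car", 3), (19, 6)),
    (("blue_van", 2), (4, 38)),
]
print(build_trajectories(records))
```

Because the time ordering is done by the sort, the reducer can stream through each object's locations without buffering and re-sorting them itself.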
31.
Video Analytics Platform Supports
Video ETL
– Support standard formats: MPG, AVI, MP4.
– Sequence file in HDFS
Image Processing Toolkit
– Support standard formats (e.g. JPEG, BMP, PNG)
– Color space conversion
– Edge/key point detection
– Morphological processing
– Filtering: convolutional, median, etc.
PHD MapReduce for scalable computer vision algorithms
HAWQ SQL for high level analytics
33.
Performance Quick Facts
Processing one 720x576 video frame takes 103 milliseconds (near real time, even in Java)
Detection algorithm scales linearly with the number of processors
• Impacts:
• Enhance public security
• Improve security officers’ productivity
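A quick back-of-the-envelope check of what 103 milliseconds per frame means, written as a Python sketch. The 25 frames-per-second target is an assumption (the slides do not state the source frame rate; 25 fps is the usual rate for 720x576 PAL video), and perfect linear scaling is taken at face value from the slide.

```python
import math

MS_PER_FRAME = 103  # per-frame processing time quoted on the slide


def frames_per_second(ms_per_frame, processors=1):
    """Throughput, assuming the linear scaling claimed on the slide."""
    return processors * 1000.0 / ms_per_frame


def processors_for(target_fps, ms_per_frame):
    """Smallest processor count whose combined throughput reaches target_fps."""
    return math.ceil(target_fps * ms_per_frame / 1000.0)


print(round(frames_per_second(MS_PER_FRAME), 2))  # 9.71
print(processors_for(25, MS_PER_FRAME))           # 3
```

Under these assumptions a single processor handles a little under 10 frames per second, and three processors would keep up with one 25 fps stream.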
34.
Querying the Analytics Results
• Average speed of the red car yesterday, using a window function
SELECT sqrt(power(avg(abs(x_diff)),2) + power(avg(abs(y_diff)),2))*FPS_MPS_FACTOR
FROM (
SELECT
X - lag(X,1) OVER (ORDER BY TIME) AS x_diff,
Y - lag(Y,1) OVER (ORDER BY TIME) AS y_diff
FROM SANMATEO
WHERE TARGET =
AND TIME > (CURRENT_TIMESTAMP - INTERVAL '1' DAY)
AND TIME < (CURRENT_TIMESTAMP)
) x_tmp;
• RESULT:
• 7.2 mph
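The arithmetic behind the query can be mirrored in a few lines of Python. This is a sketch on made-up positions, not the SANMATEO data: `average_speed` is a hypothetical helper, and `factor` plays the role of the query's conversion constant, whose value the slides do not give.

```python
import math


def average_speed(points, factor):
    """Mirror the query: average |dx| and |dy| between consecutive rows, then combine.

    `points` is a list of (time, x, y) tuples; `factor` converts the
    per-row displacement into the desired speed unit.
    """
    ordered = sorted(points)  # ORDER BY TIME
    x_diffs = [abs(b[1] - a[1]) for a, b in zip(ordered, ordered[1:])]
    y_diffs = [abs(b[2] - a[2]) for a, b in zip(ordered, ordered[1:])]
    if not x_diffs:
        return None  # fewer than two rows: avg() over no rows is NULL in SQL
    mean_dx = sum(x_diffs) / len(x_diffs)
    mean_dy = sum(y_diffs) / len(y_diffs)
    return math.sqrt(mean_dx ** 2 + mean_dy ** 2) * factor


track = [(1, 0, 0), (2, 3, 4), (3, 6, 8)]  # made-up positions
print(average_speed(track, factor=1.0))  # 5.0
```

Combining the per-axis averages this way equals the mean step length when the object keeps a steady heading; for a winding path it can come out lower than the average of the individual step lengths.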
35.
More Use Cases
Most computer vision algorithms are embarrassingly parallel
No data sharing between processes
– Feature extraction
– Object detection/classification
Video Categorization for user-generated content
– Find trending topics in YouTube videos by topic modeling
Object Detection
– Detect known categories of objects, e.g. face, bar code, vehicle.
Object Search
– Given a known object, use template matching to locate the object
Haar-like + AdaBoost Cascade Face Detector
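Template matching, mentioned under Object Search, can be sketched in plain Python as an exhaustive search for the window with the smallest sum of squared differences. The tiny grayscale grids are made up, and `match_template` is an illustrative helper rather than a function of the platform's toolkit.

```python
def match_template(image, template):
    """Return (row, col) of the window with the smallest sum of squared differences."""
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_score, best_position = None, None
    for row in range(image_h - tmpl_h + 1):
        for col in range(image_w - tmpl_w + 1):
            score = sum(
                (image[row + r][col + c] - template[r][c]) ** 2
                for r in range(tmpl_h)
                for c in range(tmpl_w)
            )
            if best_score is None or score < best_score:
                best_score, best_position = score, (row, col)
    return best_position


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]
print(match_template(image, template))  # (1, 2)
```

Each frame can be searched independently of every other frame, which is what makes this kind of task fit the embarrassingly parallel pattern described above.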
36.
Summary
Hadoop: a great tool for data scientists to crunch Unstructured Big Data!
Hadoop extracts Structured insights from Unstructured video with customized computer vision algorithms.
Scalable framework with ease of experimenting, developing, deploying!
Pivotal HD demonstrates large scale video analytics use cases:
– Anomaly detection
– Trajectory analysis
– More …
37.
Q&A
38.
More Information
Pivotal Blog Site August 12, 2013
Large Scale Video Analytics
Contact the Data Science Lab Services
info@gopivotal.com
39.
Thank You
Editor's Notes
Not demoing the HAWQ integration today.
In surveillance video, most of the time nothing interesting happens.
Manually fast-forwarding and rewinding to locate events is painful.
It gets even worse with terabytes of video data!