The document summarizes a thesis defense on video summarization in video sensor networks. It proposes a distributed on-line multi-view video summarization algorithm that reduces storage space, transmission data, and power consumption while increasing usability. Key components are an on-line single-view summarization stage based on Gaussian mixture models and an inter-view stage that performs view selection by exchanging features and scores between sensors. Experiments show the algorithm achieves higher precision and recall than the alternatives and cuts power consumption by over 70% compared to no summarization. The algorithm is implemented on a wireless video sensor network built from Raspberry Pi boards.
Defense_20140625
1. Video Summarization in Video Sensor Networks
Presenter: Shun-Hsing Ou (歐順興)
Advisor: Dr. Shao-Yi Chien (簡韶逸)
Media IC & System Lab
Graduate Institute of Electronics Engineering
National Taiwan University
2. Video Sensor Network (1/2)
• Widely applied in our daily life: traffic, security, environment monitoring
Media IC & System Lab Shun-Hsing Ou 2
3. Video Sensor Network (2/2)
• The EYEs of Machine-to-Machine (M2M)
or Internet-of-Things (IoT)
Plenty of video sensor companies in M2M or IoT applications were shown at Computex 2014.
Goal-line technology in the FIFA World Cup 2014
4. Problems
• Video data is usually very large
– Large storage space
– Large transmission data
• Watching video is usually time-consuming
5. Wireless Video Sensor Network (1/2)
• Streaming videos through wireless
communication
– Without wire = more flexible
• Wider coverage
• Better view angles
6. Wireless Video Sensor Network (2/2)
• Power is the key
– Powered by
• Batteries
• Energy harvesting devices
– Streaming video requires significant power
7. An efficient video management and filtering method is required
8. Redundancy of Video Data
• Video usually contains redundant data
– Repeated events
– Overlapping fields of view
9. Automatic Video Summarization
• Generating a short representation of the original video
• Providing an excellent solution for video management
10. Our Idea
• Applying multi-view video summarization
in video sensor networks
– Saving storage space
– Saving transmission data
– Saving power
– Increasing usability
[Block diagram: video sensor (sensor → summarization unit → encoder → transceiver) sends data to the server-side analyzer and receives info back]
11. Contributions
• Propose to apply video summarization algorithms
in (wireless) video sensor networks
– Saving 60% ~ 90% storage space & transmission data
– Saving 50% ~ 80% power
– Increasing usability
• Propose an efficient video summarization
algorithm
– Multi-view
– Distributed
– On-line
• Implement a real wireless video sensor network with the summarization system
12. Outline
• Background
• Proposed summarization algorithm
• Experiments
• Implementations
• Conclusion
14. Requirements (1/2)
• Multi-view
• On-line
• Distributed
• Low-complexity
15. Requirements (2/2)
• 28 summarization methods were surveyed
– Only 4 on-line approaches
– Only 7 multi-view approaches
– No multi-view AND on-line approach
– Existing on-line approaches require large memory and computing
power
– Existing multi-view approaches are centralized
[Pie chart: venues of the surveyed references. TMM: 4, CVPR: 5, ICIP: 2, ACM MM: 4, ICME: 4, CSVT: 1, ICCV: 2, other: 6]
• As a result, a new summarization algorithm is required
17. System Structure
• Two stages design
– Intra-view stage
– Inter-view stage
[Block diagram: sensors 1-3 each run on-line single-view summarization (intra-view stage) followed by content matching & view selection (inter-view stage); sensors exchange features with each other and send video to the server]
18. Intra-view Stage: Overview
• On-line single-view video summarization
– Clustering
• A common technique of video summarization
• Applied to reduce redundancy
– On-line clustering is applied in our system
[Pipeline: input frame → feature extraction → on-line clustering (GMM: cluster 1 ... cluster n) → frame selection → summarization]
19. Intra-view Stage: Feature Extraction
• Frame representative feature is required
• The MPEG-7 color layout descriptor is applied
– Simple
– Good representative ability
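The color layout descriptor of slide 19 is, in essence, an 8 x 8 DCT of a downscaled frame. A minimal grayscale sketch (assumed simplification: it omits the YCbCr conversion, zig-zag scan, and coefficient quantization of the real MPEG-7 descriptor):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Average-pool a w x h grayscale image into an 8 x 8 grid.
std::array<double, 64> downscale8x8(const std::vector<double>& img, int w, int h) {
    std::array<double, 64> out{};
    std::array<int, 64> cnt{};
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int cell = (y * 8 / h) * 8 + (x * 8 / w);
            out[cell] += img[y * w + x];
            ++cnt[cell];
        }
    for (int i = 0; i < 64; ++i) out[i] /= cnt[i];
    return out;
}

// 2-D DCT-II of the 8 x 8 grid; the low-frequency coefficients form the
// per-frame feature.
std::array<double, 64> dct8x8(const std::array<double, 64>& g) {
    const double pi = std::acos(-1.0);
    std::array<double, 64> F{};
    for (int u = 0; u < 8; ++u)
        for (int v = 0; v < 8; ++v) {
            double s = 0.0;
            for (int y = 0; y < 8; ++y)
                for (int x = 0; x < 8; ++x)
                    s += g[y * 8 + x] * std::cos((2 * x + 1) * u * pi / 16.0)
                                      * std::cos((2 * y + 1) * v * pi / 16.0);
            double cu = (u == 0) ? std::sqrt(0.125) : 0.5;
            double cv = (v == 0) ? std::sqrt(0.125) : 0.5;
            F[v * 8 + u] = cu * cv * s;
        }
    return F;
}
```

A flat image yields only a DC coefficient, so similar frames produce nearby feature vectors, which is what the clustering stage relies on.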
20. Intra-view Stage: Clustering (1/2)
• Gaussian Mixture Model
– Each cluster has three parameters
• Mean
• Covariance
• Weighting
– At time t, the probability of feature x_t can be represented as
p(x_t) = Σ_k ω_k N(x_t; μ_k, C_k)
(ω_k: weighting, μ_k: mean, C_k: covariance of component k)
21. Intra-view Stage: Clustering (2/2)
• Parameter estimation
– EM is usually applied in off-line applications
– On-line estimation
• Step 1: Matching
• Step 2: Updating, with a pre-defined learning rate and a match indicator (1 for the matched component, 0 otherwise)
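The matching/updating steps above follow the familiar on-line (Stauffer-Grimson-style) GMM update. A sketch under assumed notation (alpha as the pre-defined learning rate, a fixed match threshold in standard deviations, one rate shared by weights, means, and variances; the thesis' exact rules are not shown on the slide):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One mixture component with a diagonal covariance.
struct Component {
    std::vector<double> mean, var;
    double weight;
};

// One on-line update step: match the feature to a component, then update.
void updateGmm(std::vector<Component>& gmm, const std::vector<double>& x,
               double alpha, double matchSigmas = 2.5) {
    // Step 1: matching -- a component matches when the feature lies within
    // matchSigmas standard deviations in every dimension.
    int match = -1;
    for (std::size_t k = 0; k < gmm.size() && match < 0; ++k) {
        bool ok = true;
        for (std::size_t d = 0; d < x.size(); ++d) {
            double diff = x[d] - gmm[k].mean[d];
            if (diff * diff > matchSigmas * matchSigmas * gmm[k].var[d]) {
                ok = false;
                break;
            }
        }
        if (ok) match = static_cast<int>(k);
    }
    // No match: start a new component at the feature (initial variance is
    // an arbitrary prior).
    if (match < 0) {
        gmm.push_back({x, std::vector<double>(x.size(), 25.0), 0.0});
        match = static_cast<int>(gmm.size()) - 1;
    }
    // Step 2: updating -- M is 1 for the matched component, 0 otherwise.
    double wsum = 0.0;
    for (std::size_t k = 0; k < gmm.size(); ++k) {
        double M = (static_cast<int>(k) == match) ? 1.0 : 0.0;
        gmm[k].weight = (1.0 - alpha) * gmm[k].weight + alpha * M;
        if (M > 0.0)
            for (std::size_t d = 0; d < x.size(); ++d) {
                double diff = x[d] - gmm[k].mean[d];
                gmm[k].mean[d] += alpha * diff;
                gmm[k].var[d] = (1.0 - alpha) * gmm[k].var[d] + alpha * diff * diff;
            }
        wsum += gmm[k].weight;
    }
    for (auto& c : gmm) c.weight /= wsum;  // keep weights summing to 1
}
```

Unlike EM, each frame is touched exactly once, which is what makes the method on-line and buffer-free.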
22. Intra-view Stage: Frame Selection
• Using clustering parameters
– Low-weighting cluster: rare events
– High-variance cluster: high-activity events
• Algorithm:
– Step 1: Sort clusters in ascending order of the selection criterion
– Step 2: Keep frames until the pre-defined summarization rate is reached
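The selection step could be sketched as follows, assuming clusters are ranked by mixture weight and frames are kept from the lowest-weight (rare-event) clusters until the summarization rate r is reached; the slide's exact sort key and threshold are elided, so this is only an illustrative reading:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical cluster summary: its mixture weight and the indices of the
// frames currently assigned to it.
struct Cluster {
    double weight;
    std::vector<int> frames;
};

// Keep frames from low-weight clusters while the cumulative kept weight
// stays below the summarization rate r.
std::vector<int> selectFrames(std::vector<Cluster> clusters, double r) {
    std::sort(clusters.begin(), clusters.end(),
              [](const Cluster& a, const Cluster& b) { return a.weight < b.weight; });
    std::vector<int> kept;
    double cum = 0.0;
    for (const auto& c : clusters) {
        cum += c.weight;
        if (cum > r) break;  // summarization rate reached
        kept.insert(kept.end(), c.frames.begin(), c.frames.end());
    }
    return kept;
}
```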
23. Intra-view Stage: Another Point of View (1/2)
• The difficulty of on-line summarization
– Partial Information
[Illustration: an off-line process sees the whole video; an on-line process sees only frames up to the current time; an on-line process with memory limitation sees only a short window of them]
24. Intra-view Stage: Another Point of View (2/2)
• The Gaussian mixture model retains the information of previous frames
– A model of what is redundant and what is active
• No frame buffer is required
25. Inter-view Stage: Overview
• View selection
• Distributed view selection
– Exchange features & scores between sensors
26. Inter-view Stage: Overview
• Step 1: Extract an inter-view feature and a score for each frame
– The color layout descriptor is not suitable for this
• Step 2: Exchange features and scores with the other sensors
• Step 3: If there is a "matched" feature with a higher score, drop the current frame
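The drop rule of step 3 could look like this sketch, assuming an L1 histogram distance and a pre-defined matching threshold tau (both assumptions; the slide does not specify the matching metric):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-frame inter-view data exchanged between sensors.
struct FrameInfo {
    std::vector<double> feature;
    double score;
};

// L1 distance between two histograms (assumed matching metric).
double dist(const std::vector<double>& a, const std::vector<double>& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d += std::fabs(a[i] - b[i]);
    return d;
}

// Drop the local frame if any remote sensor reports a matched feature
// (distance below tau) with a higher score.
bool keepFrame(const FrameInfo& local, const std::vector<FrameInfo>& remote,
               double tau) {
    for (const auto& r : remote)
        if (dist(local.feature, r.feature) < tau && r.score > local.score)
            return false;
    return true;
}
```

Because every sensor applies the same deterministic rule to the same exchanged data, exactly one view keeps each matched event, with no central coordinator.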
27. Inter-view Stage: Feature Extraction
• Step 1: Compute the foreground mask
– Using the color layout feature & GMM
• Step 2: Extract the HSV histogram of the foreground pixels (H: 16, S: 2, V: 2 bins) as the inter-view feature
• Step 3: The mask size is used as the frame score
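A sketch of this inter-view feature, with assumed bin boundaries (uniform hue bins, 0.5 splits for saturation and value) and an assumed normalization of the histogram:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Pixel { double h, s, v; };  // h in [0, 360), s and v in [0, 1)

// 16x2x2 HSV histogram over the foreground pixels; the mask size (number
// of foreground pixels) serves as the frame score.
void interViewFeature(const std::vector<Pixel>& px, const std::vector<bool>& mask,
                      std::vector<double>& hist, int& score) {
    hist.assign(16 * 2 * 2, 0.0);
    score = 0;
    for (std::size_t i = 0; i < px.size(); ++i) {
        if (!mask[i]) continue;  // background pixel: ignored
        int hb = std::min(15, static_cast<int>(px[i].h / 360.0 * 16));
        int sb = px[i].s < 0.5 ? 0 : 1;
        int vb = px[i].v < 0.5 ? 0 : 1;
        ++hist[(hb * 2 + sb) * 2 + vb];
        ++score;
    }
    if (score > 0)
        for (auto& b : hist) b /= score;  // normalize for cross-view matching
}
```

The 64-bin histogram is tiny compared to a frame, which is why exchanging it between sensors is far cheaper than streaming video.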
30. Dataset (1/2)
• Three datasets are applied
– BL-7F: 19 videos, 320 x 240, 30 FPS
– Office [1]: 4 videos, 640 x 480, 30 FPS
– Lobby [1]: 3 videos, 640 x 480, 30 FPS
[1] Yanwei Fu, et al., "Multi-view Video Summarization," TMM 2010
31. Dataset (2/2)
• Ground truth
– People with no knowledge of our project were asked to mark the time periods of events in each video
– They were also asked to flag segments from different views that show the same event
33. Intra-view Stage: Evaluation
• Single-View Video Summarization
– Frame-level precision & recall are applied
• Precision: the ability of the algorithm to remove useless content
• Recall: the ability of the algorithm to keep important events
34. Intra-view Stage: Baseline
• Tree-based [1]
– D = 30
– D = 90
• Compressed domain [2]
[1] Víctor Valdés, et al., "Binary Tree Based On-line Video Summarization," TVS 2008
[2] J. Almeida, et al., "Online Video Summarization on Compressed Domain," JVCIR 2012
37. Inter-view Stage: Evaluation
• Multi-View Video Summarization
– Cross-view redundant frames are counted as false positives
38. Inter-view Stage: Baseline
• Baseline
– Concatenate the results of single-view
methods
• Tree-based
• Compressed domain
• The proposed GMM
– Graph-based [1]
• The results are provided by the authors
[1] Yanwei Fu, et al., "Multi-view Video Summarization," TMM 2010
41. Complexity (1/2)
• Tested on an EeePC
– CPU: Atom N570
– RAM: 2 GB
• Dataset: Office
– 640 x 480
• All methods are implemented in C++
42. Video Skimming: Complexity (2/2)

                   Tree-Based, D=30  Tree-Based, D=90  Compressed Domain  GMM
FPS (f/s)          21.8              18.8              9.3                34.7
Latency (s)        30                90                ~200               ~0
# Buffered Frames  900               2700              ~6000              1
Memory             > 414.7 MB        > 1244.1 MB       > 2764.8 MB        474.6 KB
44. Power Analysis
• We compare the power consumption
– With/Without summarization
• Platform: EeePC
– Battery power is measured
– DVC is applied as the encoder
[1] S.-Y. Chien, et al., "Power consumption analysis for distributed video sensors in machine-to-machine networks," JETCAS 2013
45. Without Summarization
• Total power
– Encoding power (Pc)
– Transmission power (Pt)
[Block diagram: wireless video sensor (sensor → encoder → transceiver) sends data to the server-side analyzer and receives info back]
46. With Summarization
• Total power
– Encoding power (Pc)
– Video transmission power (Pt)
– Feature transmission power (Pf)
– Summarization power (Ps)
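The two power models above can be compared with a toy calculation (assumptions: transmission power scales linearly with the kept fraction of frames, encoding power is unchanged, and all numbers are hypothetical):

```cpp
#include <cassert>

// Slide 45: without summarization, total power is encoding plus
// transmission for the whole stream.
double totalPowerNoSum(double Pc, double Pt) { return Pc + Pt; }

// Slide 46: with summarization, only the kept fraction of frames is
// transmitted, but feature transmission (Pf) and summarization (Ps)
// power are added.
double totalPowerWithSum(double Pc, double Pt, double keepRatio,
                         double Pf, double Ps) {
    return Pc + keepRatio * Pt + Pf + Ps;
}
```

Summarization pays off whenever the saved transmission power (1 - keepRatio) * Pt exceeds the added Pf + Ps, which is the trade-off the following measurements quantify.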
47. [Bar chart: power (mW, 0-120) of DVC, DVC + intra-view stage, and DVC + inter-view stage, broken into Pc (encoding), Pt (transmission), Ps (summarization), and Pf (feature transmission); BL-7F, processor-based; the full system saves 73.5% power]
49. Implementation
• We use Raspberry Pi to implement our
wireless video sensor network
50. Raspberry Pi
• Spec
– SoC: Broadcom BCM2835
– CPU: 700 MHz ARM11
– GPU: Broadcom VideoCore IV @ 250 MHz
– Memory: 512 MB
– Power: 5V x 700mA = 3.5W
• Related I/O
– 5V Micro USB power input
– Two USB I/O
– Camera Serial Interface (CSI)
52. Video Acquisition and Encoding (1/2)
• We need raw RGB from camera module
– Color space conversion is slow
• We need to encode video after
summarization
– Encoding is a high-complexity task
53. Video Acquisition and Encoding (2/2)
• Hardware Acceleration: Broadcom
VideoCore IV
– Hardware camera pipeline
– Hardware H.264 encoder/decoder
– OpenMAX API
54. Synchronization
• Network Time Protocol (NTP)
– Error may be large across network domains (> 100 ms)
– Error is small on a local network (< 1 ms)
• We run an NTP server on our own server
58. Conclusion
• In this thesis, we propose to apply summarization in video sensor networks
– Saving 60% ~ 90% storage space & transmission data
– Saving 50% ~ 80% power
• A distributed on-line multi-view summarization algorithm is proposed
– Low complexity, low memory requirement
– Generates results comparable to other methods
• A wireless video sensor network is implemented to validate the concept
60. Appendix: Proposed System II - Distributed On-line Multi-view Keyframe Extraction
61. Representation of Video Summarization (1/3)
• Video Skimming: A short video highlight
– More enjoyable to watch
– Better for further vision processing
• Keyframe Extraction: Representative
keyframes
– More compact representation
– Better for video browsing, surveillance, etc.
62. Representation of Video Summarization (2/3)
• Storyboard: Arranged keyframes
• Fast forwards: Smart video player
• Video Synopsis: Retargeting in time
domain
[1] Y. Pritch, et al., "Webcam Synopsis: Peeking Around the World," ICCV 2007
63. Representation of Video Summarization (3/3)
• “Video skimming” and “Keyframe
extraction” are better for video sensor
networks
– The results are more suitable for other vision
processing
– We focus on data filtering instead of summary
representation
64. Video-MMR [1] (1/2)
• Video maximum marginal relevance
• Iterative algorithm
– Select the one frame with the maximum Video-MMR score at a time
– The score balances representative ability against redundancy, computed over a candidate frame, the set of all frames, and the frames already in the summary
[1] Yingbo Li, et al., "Multi-video Summarization Based on Video-MMR," WAMIAS 2010
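Greedy Video-MMR selection might be sketched as follows; the similarity function and the trade-off weight lambda are assumptions, since the slide's formula is an image:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Similarity between two feature vectors (assumed: exp of negative
// Euclidean distance, so identical frames score 1).
double sim(const std::vector<double>& a, const std::vector<double>& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return std::exp(-std::sqrt(d));
}

// Greedy selection of k frames: each step picks the frame with the best
// trade-off between representing the remaining frames (average similarity)
// and redundancy with the summary (maximum similarity).
std::vector<int> videoMmr(const std::vector<std::vector<double>>& frames,
                          int k, double lambda) {
    std::vector<int> summary;
    std::vector<bool> used(frames.size(), false);
    while (static_cast<int>(summary.size()) < k) {
        int best = -1;
        double bestScore = -1e300;
        for (std::size_t f = 0; f < frames.size(); ++f) {
            if (used[f]) continue;
            double rep = 0.0;
            int n = 0;
            for (std::size_t g = 0; g < frames.size(); ++g)
                if (g != f && !used[g]) { rep += sim(frames[f], frames[g]); ++n; }
            if (n > 0) rep /= n;
            double red = 0.0;
            for (int s : summary) red = std::max(red, sim(frames[f], frames[s]));
            double score = lambda * rep - (1.0 - lambda) * red;
            if (score > bestScore) { bestScore = score; best = static_cast<int>(f); }
        }
        used[best] = true;
        summary.push_back(best);
    }
    return summary;
}
```

The redundancy term is what keeps the greedy loop from picking two near-identical frames, which is the point of the "marginal" in MMR.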
65. Video-MMR [1] (2/2)
• Centralized algorithm
• Off-line algorithm
[1] Yingbo Li, et al., "Multi-video Summarization Based on Video-MMR," WAMIAS 2010
66. Distributed On-line Video-MMR (1/2)
• Perform the operation once every fixed time period T
– The set of frames captured from t to t + T is used in place of the set of all frames
– This avoids buffering all frames
• With M cameras, the MMR score is extended to cover all views
68. Distributed On-line Video-MMR (3/3)
• The first term can be calculated at each sensor
• The second term can be calculated by sending the features of the summary frames from the server to the sensors
– Large data overhead
• We therefore send only a reduced representation of the frames
69. Data Overhead
• There is large data overhead if we want to send all features belonging to the summary to all sensors
• MsWAVE [1] is applied
– MsWAVE is a distributed kNN/kFN algorithm
– MsWAVE greatly reduces the amount of data exchanged
[1] J.-P. Wang, et al., "Communication-efficient distributed multiple reference pattern matching for M2M systems," ICDM 2013
70. MsWAVE
• Distributed kNN/kFN search algorithm between a group of sensors and a server
• A Haar transform is applied to generate a coarse-level feature
– Upper and lower bounds on the distance are estimated using the coarse feature
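The usefulness of the Haar coarse feature comes from a property of orthonormal transforms: the distance between coarse features never exceeds the full distance, so far-away candidates can be pruned before full features are exchanged. A sketch of just this lower bound (MsWAVE's actual protocol also derives upper bounds and refines level by level):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One level of the orthonormal Haar transform, keeping only the coarse
// (average) coefficients; the feature length is halved.
std::vector<double> haarCoarse(const std::vector<double>& x) {
    std::vector<double> c(x.size() / 2);
    for (std::size_t i = 0; i < c.size(); ++i)
        c[i] = (x[2 * i] + x[2 * i + 1]) / std::sqrt(2.0);
    return c;
}

// Euclidean distance between two equal-length vectors.
double l2(const std::vector<double>& a, const std::vector<double>& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(d);
}
```

If the coarse-level distance to a query already exceeds the current k-th nearest distance, the candidate can be discarded without ever transmitting its full feature.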
73. Keyframe Extraction: Baseline
• Single-view
– Uniform sampling (US)
– Random sampling (RS)
– Visual attention based [1] (VA)
• Multi-view
– MMR [2]
– K-means (KM)
[1] Y.-F. Ma, "A Generic Framework of User Attention Model and Its Application in Video Summarization," TMM 2005
[2] Yingbo Li, et al., "Multi-video Summarization Based on Video-MMR," WAMIAS 2010
74. Keyframe Extraction: Extra Data
• Since keyframes are much smaller than a video skim
– The extra data becomes relatively large
• We compare the extra data with the centralized method, in which the features of all frames are sent
75.
                    Single-view       Multi-view
                    RS    US    VA    KM    MMR   Ours
BL-7F (19 videos)
  Keyframes         77    77    82    77    77    77
  Recall (%)        22    30    74    74    67    74
  Redundant frames  1     3     64    38    36    32
  Data sent (%)     0     0     0     100   100   33
Office (4 videos)
  Keyframes         94    94    116   94    94    94
  Recall (%)        13    18    52    52    66    63
  Redundant frames  2     0     44    45    38    21
  Data sent (%)     0     0     0     100   100   26
Lobby (3 videos)
  Keyframes         70    70    117   70    70    70
  Recall (%)        66    63    72    72    70    76
  Redundant frames  8     11    69    29    28    14
  Data sent (%)     0     0     0     100   100   16
77. On-line Summarization (1/3)
• Tree-based method [1]
– Type: video skimming
– Method:
• On-line decision tree
– Cons:
• Long latency
• Large memory required
[1] Víctor Valdés, et al., "Binary Tree Based On-line Video Summarization," TVS 2008
78. On-line Summarization (2/3)
• Summarization in the compressed domain [1]
– Type: video skimming
– Method:
• On-line shot detection: calculate differences between frames
• Redundancy removal
– Cons:
• Long latency
• Large memory required
[1] J. Almeida, et al., "Online Video Summarization on Compressed Domain," JVCIR 2012
79. On-line Summarization (3/3)
• Visual attention model [1]
– Type: keyframe
– Method:
• Visual attention index
• Attention-curve peak detection
– Cons:
• Not able to remove redundant frames
[1] Y.-F. Ma, "A Generic Framework of User Attention Model and Its Application in Video Summarization," TMM 2005
80. Multi-view Summarization (1/2)
• Clustering [1]
– Type: video skimming
– Method:
• Shot detection
• Graph
• Clustering
– Cons:
• Centralized
• High complexity
[1] Yanwei Fu, et al., "Multi-view Video Summarization," TMM 2010
81. Multi-view Summarization (2/2)
• MMR [1]
– Type: keyframe extraction
– Method:
• Video maximum marginal relevance (balancing representative ability against redundancy)
– Cons:
• Centralized
• Large memory required
[1] Yingbo Li, et al., "Multi-video Summarization Based on Video-MMR," WAMIAS 2010
83. Video Skimming
• The result is like video skimming
– Parameter updating is smooth
84. Tree-based, D=30
85. Tree-based, D=90
86. Compressed Domain
87. The Proposed GMM Approach
88. Video Skimming: Packet Loss
• Dataset: BL-7F
• Each sensor has a uniform probability of failing to receive a feature
89. Platform
• Processor-based
– EeePC
– Battery power is measured
• ASIC-based [1]
– Transmission power is estimated
– H.264 power is estimated
– Summarization power is estimated
[1] S.-Y. Chien, et al., "Power consumption analysis for distributed video sensors in machine-to-machine networks," JETCAS 2013
90. BL-7F, ASIC-based
[Bar chart: power (mW, 0-25) of no motion, DVC, DVC + intra stage, and DVC + inter stage, broken into Pc (encoding), Pt (transmission), Ps (summarization), and Pf (feature transmission); 83.4% power saving]
93. Communication Issues
• Feature broadcasting
– Only need to broadcast to nearby sensors
• Communication latency
– An additional buffer is needed
• Synchronization
– Clocks of all sensors are synchronized
94. Wireless Video Sensor Network
• Connected by a single Wi-Fi AP
95. Communication Channel
• Three TCP channels connect each sensor to the server
– Video channel: streaming video
– Feature channel: exchanging features
– Control channel: control signals and time information