This work was presented at the IEEE Graph Computing Conference 2019 in Laguna Hills, California. The work focuses on a graph-based structured representation of video streams and the creation of complex event rules for pattern matching.
VEKG: Video Event Knowledge Graph to Represent Video Streams for Complex Event Pattern Matching
1. VEKG: Video Event Knowledge Graph to Represent Video Streams for Complex Event Pattern Matching
Piyush Yadav
Edward Curry
2. Executive Summary
Introduction
• Transitioning to the Internet of Multimedia Things (IoMT) era.
• Video streams are now pervasive.
• Complex Event Processing (CEP) systems detect event patterns over data streams.
Problem
• Video streams have an unstructured representation.
• Video event patterns are complex.
• CEP systems face key challenges in detecting patterns from video data due to its low-level features.
Solution
• Video Events: defining video events.
• VEKG: structured representation of video data.
• Event rules for video pattern matching.
• VEKG-TAG: aggregating VEKG for a given state.
Results
• Dataset and queries.
• Event extraction time.
• Graph construction and search time.
• Matching latency and accuracy.
3. Introduction
Internet of Things Era
Exponential growth in sensor devices: 50 billion devices by 2020, IPv6
Huge amounts of streaming data
Enabling applications like business intelligence, surveillance, and monitoring
4. Introduction
Complex Event Processing (CEP) Systems
Easily express event patterns of interest using event rules.
CEP rules are continuous, i.e. once registered they continuously monitor data streams.
Detect event patterns in (near) real-time by performing matching over data streams between data producers (sensors) and consumers (applications).
[Figure: A temperature sensor streams readings (55 52 52 49 47 41 41 40 35 32 31 °C) into a CEP engine. Query Q1: "Notify Fire Warning Alert if avg(temp) > 50 °C in last 5 mins". The engine computes windowed averages at t0 = 0, t1 = 5, t2 = 10, t3 = 15 (avg(temp) = 52 °C, 42.25 °C, 32.6 °C) and emits a notification when the rule matches.]
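A minimal sliding-window sketch of rule Q1 in Python, assuming events arrive as (timestamp, temperature) pairs; the names (`on_event`, `WINDOW_SECONDS`) are illustrative and not tied to any particular CEP engine:

```python
from collections import deque

WINDOW_SECONDS = 5 * 60   # 5-minute sliding window from the Q1 rule
THRESHOLD_C = 50.0        # alert threshold in degrees Celsius

window = deque()          # holds (timestamp, temperature) pairs

def on_event(ts: float, temp_c: float) -> None:
    """Ingest one sensor reading and fire the alert if the rule matches."""
    window.append((ts, temp_c))
    # Evict readings that have fallen out of the 5-minute window.
    while window and window[0][0] < ts - WINDOW_SECONDS:
        window.popleft()
    avg = sum(t for _, t in window) / len(window)
    if avg > THRESHOLD_C:
        print(f"Fire warning: avg(temp) = {avg:.2f} °C over last 5 min")
```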
5. Introduction
Change in Data Landscape
Internet of Multimedia Things (IoMT) [1]
400 hours of video uploaded every minute on YouTube (Pichai 2018)
500K CCTV cameras around London
Videos are pervasive
[1] S. A. Alvi et al., "Internet of multimedia things: Vision and challenges," Ad Hoc Networks, vol. 33, pp. 87–111, 2015.
6. Motivation
❑ Structured streams: a temperature stream (55 52 52 49 47 41 41 40 35 32 31 °C) has a structured representation, where each temp event is a set of attribute-value pairs such as (room, 'D2'), (temp, '70'), (type, 'temp sensor'). Query Q1 ("Notify Fire Warning Alert if avg(temp) > 50 °C in last 5 mins") is matched by the CEP engine over this structured data, producing notifications.
❑ Video streams: only a low-level video frame representation is available. Queries such as Q2 ("Notify High Traffic Volume Alert", over frames at t0-t3 showing no traffic, low-volume traffic, and high-volume traffic) and Q3 ("Notify Person Sitting on Chair", over frames showing a person moving towards a chair, sitting on the chair, and moving away from the chair) therefore require pattern matching over unstructured video rather than over structured data.
7. Challenges
C1: How to extract and represent low-level video content and video streams in a structured data model with high-level semantic concepts?
C2: How to identify relationships between semantic concepts of video content which occur over time and space?
C3: How to match spatiotemporal CEP query rules over the represented data model efficiently at runtime?
8. Overview
Background
❑ Discuss video events
Video Event Knowledge Graph (VEKG)
❑ Discuss semantic concepts in videos
❑ Discuss spatiotemporal relationships among identified semantic concepts
❑ Create a structured representation of video streams
Define Event Pattern Rules
❑ Discuss event rules for different video patterns
VEKG-Time Aggregated Graph
❑ Perform aggregation over VEKG for efficient state-based matching
Results
9. Background
Knowledge Graph
❑ Knowledge is represented as entities, attributes, and relationships:
Entity (E): represents something in the real world
Attributes (A): properties of entities
Relationships (R): how entities are related
Knowledge Graph Extraction Process
❑ Unstructured Data → Knowledge Extraction (Entity Extraction, Attribute Extraction) → Entity Linking → Graph Construction (link relationships between entities).
❑ Example: from the text "Barack Obama was the president of United States. He was born in Honolulu, a city located in United States." we extract entities such as Barack Obama (person), Honolulu (city), and United States (country), and relationships such as locatedIn(Honolulu, United States).
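A toy sketch of this extraction output as subject-predicate-object triples; the entity keys and predicate names are illustrative, not a standard vocabulary:

```python
# Entities with their attributes, keyed by an illustrative identifier.
entities = {
    "barack_obama":  {"type": "person",  "label": "Barack Obama"},
    "honolulu":      {"type": "city",    "label": "Honolulu"},
    "united_states": {"type": "country", "label": "United States"},
}

# Relationships as (subject, predicate, object) triples.
triples = [
    ("barack_obama", "presidentOf", "united_states"),
    ("barack_obama", "bornIn", "honolulu"),
    ("honolulu", "locatedIn", "united_states"),
]

# Query: where is the city Barack Obama was born in located?
born_in = next(o for s, p, o in triples
               if s == "barack_obama" and p == "bornIn")
country = next(o for s, p, o in triples
               if s == born_in and p == "locatedIn")
print(entities[country]["label"])  # -> United States
```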
11. Video Event
Simple Video Event
❑ In CEP, a simple event is the instantaneous and atomic (i.e. it either exists entirely or not at all) occurrence of interest at a specific time instance.
❑ Objects are the primary visual concepts which a user can perceive from a video sequence.
❑ We consider an object-identification notification a simple video event, e.g. notify 'car' if present in a video.
Complex Video Event
❑ In CEP, complex events are composed or derived events which are constructed from simple events.
❑ Simple video events are nested with different spatial, temporal, and logical operators to form a complex event, e.g. high traffic volume in a video (see the sketch below).
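A toy sketch of the simple/complex distinction; the operator names (`AND`, `COUNT_GT`) are illustrative stand-ins for the spatial, temporal, and logical operators mentioned above, not the paper's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimpleEvent:
    label: str        # e.g. "car": an object-detection notification
    timestamp: float

def AND(*labels):
    """All labels present somewhere within the same window."""
    return lambda events: all(
        any(e.label == lb for e in events) for lb in labels)

def COUNT_GT(label, n):
    """More than n instances of `label` in the window."""
    return lambda events: sum(e.label == label for e in events) > n

# Complex events built by nesting operators over simple ones:
high_traffic = COUNT_GT("car", 10)   # e.g. high traffic volume
sitting = AND("person", "chair")     # e.g. person and chair co-occur

window = [SimpleEvent("car", t) for t in range(12)]
print(high_traffic(window))  # True: more than 10 cars detected
```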
12. VEKG: Video Event Knowledge Graph
VEKG High-Level Extraction Process
❑ Videos are sequences of image frames.
❑ Pipeline: Unstructured Data → R1, R2: Object & Attribute Detection → R3: Object Relationship (Spatiotemporal Relationship) → Graph Construction.
Spatiotemporal Relationship
❑ Intraframe relationship (spatial relation): objects occupy spatial positions within an image frame (e.g. at t = 0, 1, 2, 3).
❑ Interframe relationship (temporal relation): objects interact temporally across frames.
14. VEKG Graph
[Figure: Frames T1, T2, T3, each containing Car 1, Car 2, and Car 3, map to VEKG1 (T1), VEKG2 (T2), VEKG3 (T3); object nodes are connected by spatial relations within a frame and temporal relations across frames.]
❑ For any image frame, the resulting Video Event Knowledge Graph is a labelled graph with six tuples, represented as VEKG = (V, E, A_V, R_E, λ_V, λ_E), where:
V = set of object nodes O_i
E = set of edges such that E ⊆ V × V
A_V = set of properties mapped to each object node such that O_i = (id, attributes, label, confidence, features)
R_E = set of spatiotemporal relation classes
λ_V, λ_E are class labelling functions: λ_V: V → O and λ_E: E → R_E
❑ VEKG is a complete graph in which each object is spatially related to every other object in a frame.
VEKG Graph Stream
❑ A Video Event Knowledge Graph stream is a sequence-ordered representation of VEKGs, VEKG_S = ⟨(VEKG_1, t_1), (VEKG_2, t_2), …, (VEKG_n, t_n)⟩, where each t_i is a timestamp and t_i < t_(i+1).
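A minimal Python sketch of this six-tuple, assuming a dataclass encoding; the field names follow the slide (id, attributes, label, confidence, features), but the rest is an illustrative encoding, not the authors' implementation:

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class ObjectNode:                        # an element of V, labelled by λ_V
    id: int
    label: str                           # detector class, e.g. "car"
    confidence: float
    attributes: dict = field(default_factory=dict)
    features: list = field(default_factory=list)   # e.g. CNN embedding

@dataclass
class VEKG:
    timestamp: float
    nodes: list                          # V
    edges: dict = field(default_factory=dict)  # λ_E: E -> R_E

    def connect_spatially(self, relate):
        """VEKG is complete: relate every pair of objects in the frame."""
        for a, b in combinations(self.nodes, 2):
            self.edges[(a.id, b.id)] = relate(a, b)

# A VEKG stream is then a time-ordered list [(g.timestamp, g), ...]
# with strictly increasing timestamps.
```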
15. Spatiotemporal Relationship Calculation
Spatial Relation
• Geometry-based spatial representation: polygon-based bounding box.
• Topology-based spatial relation: uses the Dimensionally Extended nine-Intersection Model (DE-9IM) with nine relations: {Disjoint, Touch, Contains, Intersect, Within, Covered by, Crosses, Overlap, Inside}.
• Direction-based spatial relation: Fixed Orientation Reference System (FORS), 8 directions.
Temporal Relation
• Allen's interval algebra.
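A sketch of these checks using the shapely library for the DE-9IM predicates, plus one hand-rolled Allen relation for the temporal side; the bounding-box coordinates are made up for illustration:

```python
from shapely.geometry import box

# Illustrative bounding boxes (x_min, y_min, x_max, y_max).
person = box(100, 50, 180, 300)
chair = box(140, 150, 260, 320)

# Raw DE-9IM intersection matrix between the two geometries.
print(person.relate(chair))

# Named topology relations from the slide's set, via standard predicates:
for name, pred in [("disjoint", person.disjoint),
                   ("touches", person.touches),
                   ("contains", person.contains),
                   ("intersects", person.intersects),
                   ("within", person.within),
                   ("crosses", person.crosses),
                   ("overlaps", person.overlaps)]:
    if pred(chair):
        print(name)  # here: intersects, overlaps

def before(a, b):
    """One Allen interval relation: interval a ends before b starts."""
    return a[1] < b[0]

print(before((0.0, 2.5), (3.0, 5.0)))  # True
```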
16. VEKG Extraction Architecture
Pipeline: Video Stream → Video Frame Decoder → DNN Model Cascade (Object Detector (TinyYOLO) producing CNN features and objects with bounding boxes; Attribute Classifier over regions of interest; Object Tracker) → Graph Constructor → VEKG Graphs → Pattern Matcher.
❑ Video Frame Decoder: receives the raw video frames and decodes them into low-level feature maps.
❑ DNN Model Cascade: a computer vision pipeline of different DNN models (object detectors, attribute classifiers).
❑ Graph Constructor: constructs a timestamped graph snapshot for each frame.
❑ Pattern Matcher: in the CEP engine, windows capture a number of image frames as VEKG graphs, and spatial and temporal operations are performed over them. A glue-code sketch of this pipeline follows below.
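A hypothetical glue-code sketch of the extraction pipeline above; the stage callables stand in for the actual models (TinyYOLO object detector, attribute classifier, object tracker) and are not the paper's API:

```python
def process_frame(raw_frame, detect, classify, track, build_graph):
    """Run one decoded frame through the DNN cascade into a VEKG."""
    objects = detect(raw_frame)            # bounding boxes, labels, scores
    for obj in objects:
        obj["attributes"] = classify(obj)  # e.g. color, on the object ROI
    tracked = track(objects)               # assign stable ids across frames
    return build_graph(tracked)            # timestamped VEKG snapshot
```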
18. Video Pattern Matching
[Figure: The extraction pipeline feeds VEKG (T1), VEKG (T2), VEKG (T3) into a 10 sec time window; event rules (Sitting, High Traffic Vol.) drive a window reasoner inside the pattern matcher.]
Window
❑ TIMEWINDOW: ⊞((VEKG_S), t) → S′
Reasoner
❑ As per the event rule, the edge weight represents the overlap-relation threshold between the objects person and chair.
❑ The reasoner performs matching by traversing over the VEKG nodes for the given time window (see the sketch below).
[Figure: Three VEKG snapshots at t = 0, 1, 2, each with nodes Table, Person, and Chair and a person-chair edge of weight 80.]
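A minimal sketch of the window reasoner's check, assuming each snapshot exposes an edge map from object-label pairs to overlap weights; the threshold 80 is the rule weight from the figure, while the data layout and names are illustrative:

```python
OVERLAP_THRESHOLD = 80   # edge weight from the sitting event rule

def match_sitting(window):
    """Return timestamps at which 'person sitting on chair' matches.

    window: list of (timestamp, edges), where edges maps a
    (label_a, label_b) pair to its overlap weight for that frame.
    """
    return [ts for ts, edges in window
            if edges.get(("person", "chair"), 0) >= OVERLAP_THRESHOLD]

# Three snapshots as in the figure (t = 0, 1, 2), person-chair weight 80:
frames = [(t, {("person", "chair"): 80, ("person", "table"): 10})
          for t in (0, 1, 2)]
print(match_sitting(frames))  # -> [0, 1, 2]: the pattern holds all window
```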
19. VEKG-Time Aggregated Graph (TAG)
❑ In videos, objects exist for a certain period and may persist across multiple frames.
❑ This leads to the creation of redundant VEKG object nodes and increases search time.
VEKG-TAG
❑ Temporal aggregation over the Video Event Knowledge Graph.
❑ VEKG-TAG is a labelled complete directed graph with seven tuples such that VEKG-TAG = (V, E, A_V, R_E, T, λ_V, λ_E).
❑ The additional temporal dimension (T) is added to edges in a single aggregated view.
❑ [n(n − 1) + n (self-loops)] edges.
[Figure: A VEKG stream over T1, T2, T3 with nodes Car1, Car2, Car3 collapses into a single VEKG-TAG in which, e.g., the Car1-Car2 distance relation carries a time series of values [12, 15, …] over T1, T2, ….]
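A sketch of the aggregation step, assuming per-frame edge snapshots are folded into one time-indexed edge map so repeated object nodes are not duplicated; this illustrates the idea, not the authors' implementation:

```python
from collections import defaultdict

def aggregate(vekg_stream):
    """vekg_stream: list of (t, {edge: value}) snapshots.

    Returns one edge map {edge: [(t, value), ...]}: the TAG view, in
    which each edge carries its relation values as a time series.
    """
    tag = defaultdict(list)
    for t, edges in vekg_stream:
        for edge, value in edges.items():
            tag[edge].append((t, value))
    return tag

# Distance-relation values as in the figure:
stream = [(1, {("car1", "car2"): 12}),
          (2, {("car1", "car2"): 15}),
          (3, {("car1", "car3"): 14})]
print(aggregate(stream)[("car1", "car2")])  # -> [(1, 12), (2, 15)]
```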
20. Experiments and Results
Dataset Specification
Video  Dataset  FPS   Query
P1     Pexels   30.8  Q1: {Car}
P2     Pexels   30.2  Q2: {Car ∧ color: black}
P3     YouTube  31    Q3: {High Traffic Volume (Car)}
P4     Le2i     30    Q4: Sitting (Person ∧ Chair)
System Specification
❑ 16-core Linux machine
❑ Nvidia Titan GPU, 12 GB RAM
VEKG Extraction Time
❑ Object + attribute + tracking takes the maximum time.
❑ VEKG extraction time increases with the number of objects in a frame (~56.7 ms).
❑ Extraction is the biggest bottleneck in system performance.
21. Experiments and Results
Graph construction time with change in window size
❑ Time to create the VEKG graph for a given time window, including the time for creating nodes and edge relations as per the query rules.
❑ VEKG and VEKG-TAG construction times are nearly the same: 2.2 and 2.5 sec for a 5 sec window.
❑ The extra time is required only for initializing VEKG-TAG nodes and edges.
Graph search time over multiple queries
❑ Graph search time is the time to search for the event pattern as per the query rule.
❑ For 100 queries: VEKG-TAG search requires 61.7 ms, VEKG search requires 148.6 ms.
❑ Search over VEKG-TAG is 2.3× faster than over VEKG (for 100 queries).
22. Experiments and Results
Event Query Accuracy
Query  Precision  Recall  F-Score
Q1_P1  0.90       0.72    0.80
Q1_P2  0.92       0.87    0.89
Q2_P2  0.86       0.73    0.79
Q3_P3  0.91       0.81    0.86
Q4_P4  0.80       0.71    0.75
❑ The F-score of Q1_P1 (0.80) is lower than that of Q1_P2 (0.89) because P1 has a larger number of objects, leading to occlusion, which reduces accuracy.
❑ The sitting query (Q4_P4) has the lowest F-score (0.75) because of more false positives.
Event Matching Latency
❑ Average processing time of each state for different query patterns.
❑ Q4 has the highest latency (1.2-3 ms) because sitting is a relation between two object nodes, so matching has to extract the edges.
23. Executive Summary
Introduction
• Transitioning to the Internet of Multimedia Things (IoMT) era.
• Video streams are now pervasive.
• Complex Event Processing (CEP) systems detect event patterns over data streams.
Problem
• Video streams have an unstructured representation.
• Video event patterns are complex.
• CEP systems face key challenges in detecting patterns from video data due to its low-level features.
Solution
• Video Events: defining video events.
• VEKG: structured representation of video data.
• Event rules for video pattern matching.
• VEKG-TAG: aggregating VEKG for a given state.
Results
• Dataset and queries.
• Event extraction time.
• Graph construction and search time.
• Matching latency and accuracy.