Novel large scale digital forensics service platform for internet videos
1. NOVEL LARGE SCALE DIGITAL
FORENSICS SERVICE PLATFORM
FOR INTERNET VIDEOS
Abd El-Fattah Hussein Mahran
2. AGENDA
Problem Definition
Problem solution
Content delivery network
Proposed Architecture overview
Challenges
Main paper contribution
Digital Video Forensics (DVF) Techniques
Digital Video Forensics Systems
Proposed System Architecture
System Deployment and Performance Evaluation
Conclusion
Future work
References
4. PROBLEM SOLUTION
Develop a large-scale digital video forensics
system for prosecuting and deterring digital crimes
on the Internet.
The paper proposes, designs, and implements
a novel large-scale digital forensics service
platform (DFSP) that can effectively detect
illegal content in Internet videos.
5. WHAT IS CDN
A content delivery network (CDN) is a large distributed system
of servers deployed in multiple data centers in the Internet.
The goal of a CDN is to serve content to end-users with high
availability and high performance. CDNs serve a large fraction of
the Internet content today, including web objects (text, graphics,
URLs and scripts), downloadable objects (media files, software,
documents), applications (e-commerce, portals), live
streaming media, on-demand streaming media, and social
networks.
Several protocol suites are designed to provide access to a wide
variety of content services distributed throughout a content
network. These include the Internet Content Adaptation Protocol (ICAP);
a more recently defined and robust solution is provided by the Open
Pluggable Edge Services (OPES) protocol.
6. PROPOSED ARCHITECTURE
OVERVIEW
A distributed architecture takes advantage of a Content
Delivery Network (CDN) to improve scalability, which makes
it possible to process an enormous number of Internet
videos in real time.
The proposed architecture includes a CDN-based
Resource-Aware Scheduling (CRAS) algorithm, which
schedules tasks efficiently in the DFSP according to
resource parameters such as delay and computation load.
The DFSP is deployed on the Internet. It integrates the
CDN-based distributed architecture and the CRAS
algorithm with a large-scale video detection algorithm, and
the deployed system is evaluated.
The evaluation results demonstrate the effectiveness of the
platform.
7. Today, the number of digital videos distributed over the Internet is increasing exponentially,
which creates a need to solve their security problems.
Studies have found that 23.8% of global Internet traffic can be associated with
the illegal distribution of copyrighted work, and that 91% of Internet pornographic
materials are videos that can be downloaded from different media sources.
This raises the need to develop a digital forensics system for large-scale
Internet videos.
Most existing forensics systems focus on content robustness and
identification accuracy, without considering how to detect the legality
of large-scale Internet videos in real time. Even the reports that
address efficiency and scalability for large-scale video content identification
approach these problems from a video retrieval algorithm
perspective.
This paper addresses a different dimension: the efficiency and
scalability issues from a system perspective.
8. CHALLENGES
The large amount of computation required to
analyze and detect a large volume of video data.
This makes it difficult to serve a large number of
users concurrently.
The large amount of communication required to
transmit a great volume of video data from
media sources to the system.
9. SCALABILITY PROBLEM
To solve the scalability problem, the paper proposes
building the DFSP upon a CDN and pushing the massive
forensics tasks to the most appropriate CDN
nodes, thereby effectively reducing the
computation cost and communication cost.
10. EFFICIENCY PROBLEM
To solve the efficiency problem, the paper proposes a CDN-based
Resource-Aware Scheduling (CRAS) algorithm.
Existing network-aware resource scheduling algorithms include
non-adaptive scheduling algorithms that use some heuristics to
select nodes, and adaptive scheduling algorithms that take the
current network or server conditions into account.
In the DFSP, with the CRAS algorithm, user requests can be directed to
appropriate nodes. Different forensics tasks are assigned to
different CDN nodes based not only on the network conditions
but also on the computation load, so that the massive data stream
can be scheduled in parallel among multiple nodes efficiently.
The paper also proposes integrating the CDN-based
distributed architecture and the CRAS algorithm with a Large-scale
Video Detection (LVD) algorithm.
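The slide does not give the CRAS cost function, so the following is only a minimal sketch of resource-aware scheduling under stated assumptions: each node reports a network delay and a computation load, the two are combined with illustrative weights, and each task goes to the currently cheapest node. The node names, weights, `max_delay_ms`, and `load_per_task` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    delay_ms: float  # measured delay between the media source and this node
    load: float      # current computation load, in [0, 1]


def cost(node, w_delay=0.5, w_load=0.5, max_delay_ms=200.0):
    """Weighted cost of a node; weights and delay cap are illustrative assumptions."""
    norm_delay = min(node.delay_ms / max_delay_ms, 1.0)
    return w_delay * norm_delay + w_load * node.load


def schedule(tasks, nodes, load_per_task=0.3):
    """Greedily assign each task to the cheapest node, then raise that node's load."""
    assignment = {}
    for task in tasks:
        best = min(nodes, key=cost)
        assignment[task] = best.name
        best.load = min(best.load + load_per_task, 1.0)
    return assignment


# Hypothetical nodes: one close but busy, one farther away but idle.
nodes = [Node("node-a", 20.0, 0.8), Node("node-b", 60.0, 0.1)]
plan = schedule(["video-1", "video-2", "video-3"], nodes)
print(plan)
```

Because the load is updated after every assignment, the idle node absorbs tasks only until its cost exceeds that of the busy but closer node, which is the adaptive behaviour the slide contrasts with non-adaptive heuristics.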
11. MAIN PAPER CONTRIBUTION
Proposing and designing a novel large-scale digital
video forensics service architecture and platform,
which can process an enormous number of Internet
videos in real time by employing a CDN.
Proposing a CDN-based Resource-Aware
Scheduling algorithm for dynamic load balancing
in the DFSP, which can significantly improve the
efficiency of the platform.
Implementing and evaluating a system deployed at
large scale, which integrates the CDN-based
distributed architecture and the CDN-based
resource-aware scheduling algorithm with a
large-scale video detection algorithm.
12. DIGITAL VIDEO FORENSICS (DVF)
TECHNIQUES
I) Watermarking is a traditional forensics technique for
detecting illegal copies and digital tampering. It has
several drawbacks:
The watermark must be embedded in legal videos prior to video
distribution.
The nature of the original data is changed.
The watermark could be attacked or destroyed during transmission.
13. DIGITAL VIDEO FORENSICS (DVF)
TECHNIQUES
II) The video fingerprint technique has drawn wide attention.
This technique can identify illegal content by extracting a
unique fingerprint from video data.
The fingerprint can be extracted based on some static
features such as color, texture and shape, or some motion
features of the video.
The major advantage of fingerprint technique is that the
fingerprint can be extracted after the media has been
distributed and will not change the original data; thus, it is a
very useful method for detecting Internet videos.
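As an illustration of a static colour feature, the sketch below builds a fingerprint from coarse RGB histograms averaged over frames. It is not the fingerprint used in the paper: the frame representation (a non-empty list of `(r, g, b)` tuples), the bin count, and the function names are assumptions made for the example.

```python
def frame_histogram(pixels, bins_per_channel=2):
    """Normalised colour histogram of one frame (assumes a non-empty pixel list)."""
    step = 256 // bins_per_channel
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Map the quantised (r, g, b) triple to a single bin index.
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1
    return [count / len(pixels) for count in hist]


def video_fingerprint(frames):
    """Average the per-frame histograms into one fixed-length vector."""
    hists = [frame_histogram(frame) for frame in frames]
    return [sum(column) / len(hists) for column in zip(*hists)]


# Two tiny synthetic frames: one all red, one all blue.
red_frame = [(255, 0, 0)] * 4
blue_frame = [(0, 0, 255)] * 4
fingerprint = video_fingerprint([red_frame, blue_frame])
print(fingerprint)
```

The fingerprint is computed from the decoded pixels alone, so nothing has to be embedded in the video before distribution, which is the advantage the slide describes.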
14. DIGITAL VIDEO FORENSICS
SYSTEMS
Douze presented a video copy detection system, which used
a precise representation method to decide whether or not a
query video segment was a copy of a video from the
indexed dataset.
Xu explored an effective system for analyzing the high-
level structures and extracting useful features from soccer
videos in order to identify the content of videos.
Gauch and Shivadas presented a commercial identification
system by extracting features from video sequences that
can characterize the temporal and chromatic variations
within each clip.
Shen outlined a system for detecting near-duplicate videos
based on the dominating content and content changing
trends of the videos.
In addition to the above systems, some other similar ones
also provided effective methods for video forensics and
achieved good results. However, all this work analyzes and
processes videos on a single server.
15. IMPROVING FORENSICS
EFFICIENCY
To improve forensic efficiency, some researchers began to study data
structures for effectively indexing video features. However, these approaches are
all based on a single server.
For example, Hoad and Zobel used local alignment to find sequences of similar
values in video and clip, which provided much faster searching.
Lejsek achieved good efficiency, but it depended greatly on a specific hardware
configuration that is outside the range of usual computer workstations; thus,
their work may not be practical for large-scale deployments.
Zhao improved the scalability of several well-known features, including color
signature and visual keywords, for web-based retrieval by using high-dimensional
indexing techniques.
Recently, Shang introduced a compact spatiotemporal feature to represent videos
and constructed an efficient data structure to address the efficiency and
scalability issues of real-time large-scale near-duplicate Web video retrieval.
Although these works have made good use of the server capacity, when the
number of videos continues to scale up, real-time performance may
still be hard to guarantee.
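To make the idea of local alignment over sequences of values concrete, here is a generic Smith-Waterman-style sketch. It is not Hoad and Zobel's exact method: the per-frame values, the tolerance `tol`, and the scoring constants are assumptions chosen for illustration.

```python
def local_alignment_score(query, target, tol=2, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between two numeric sequences.

    Two values count as a match when they differ by at most `tol`.
    """
    cols = len(target) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            step = match if abs(query[i - 1] - target[j - 1]) <= tol else mismatch
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical per-frame feature values for a clip and a longer video.
clip = [10, 20, 30]
video = [5, 50, 11, 19, 31, 90]
score_copy = local_alignment_score(clip, video)
score_other = local_alignment_score([100, 200, 300], video)
print(score_copy, score_other)
```

A high score indicates that the clip appears, up to small value changes, somewhere inside the longer video; an unrelated clip scores zero.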
16. PROPOSED SYSTEM ARCHITECTURE
It consists of three main components:
1. Content Access (CA)
2. Video Detection (VD)
3. Resource Management (RM)
Content Access (CA): CA is located on each CDN node and is used to obtain
video data from certain media sources. During this process, a web crawler, a
special technology for web search, is used to collect data. This technology can methodically
scan or “crawl” through Internet pages to create an index of video data, so that CA
can quickly provide the relevant data for detection and regularly ensure the data are up
to date.
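The indexing step of such a crawler could look roughly like the sketch below, which parses an already downloaded page and records where each video link was seen. The file extensions, the class and function names, and the example page are hypothetical; the slides do not describe how the paper's crawler is implemented.

```python
from html.parser import HTMLParser

# Assumed set of extensions that mark a link as video data.
VIDEO_EXTENSIONS = (".mp4", ".flv", ".avi")


class VideoLinkParser(HTMLParser):
    """Collect href/src attribute values that look like video files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and value.lower().endswith(VIDEO_EXTENSIONS):
                self.links.append(value)


def index_page(page_url, html, index):
    """Add every video link found in `html` to the index, keyed by link."""
    parser = VideoLinkParser()
    parser.feed(html)
    for link in parser.links:
        index.setdefault(link, set()).add(page_url)
    return index


page = '<a href="/v/clip.mp4">clip</a><video src="movie.flv"></video><a href="about.html">about</a>'
video_index = index_page("http://example.com/", page, {})
print(sorted(video_index))
```

Re-running `index_page` on a schedule for the same pages is one simple way to keep such an index up to date.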
17. PROPOSED SYSTEM ARCHITECTURE
Video Detection (VD): VD is a group of servers distributed near CDN nodes,
which are responsible for analyzing video content and judging its legality. It is
composed of the “Blacklist” Database, Content Analysis Servers, and Searching and
Matching Servers.
The “Blacklist” Database stores a copy of the fingerprints of improper videos.
Content Analysis Servers are used to analyze the video content and extract
its fingerprints.
Searching and Matching Servers take charge of comparing these
fingerprints with those pre-stored in the “Blacklist” Database, and judging
their legality.
The paper proposes a large-scale video detection algorithm for VD.
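The searching-and-matching step can be pictured as a nearest-neighbour lookup against the blacklist. The sketch below assumes fingerprints are fixed-length numeric vectors compared by Euclidean distance with a hypothetical threshold; the paper's own detection algorithm is different, so this only illustrates the data flow.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length fingerprint vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_blacklist(fingerprint, blacklist, threshold=0.1):
    """Return the id of the closest blacklist entry within `threshold`, else None."""
    best_id, best_dist = None, threshold
    for video_id, known in blacklist.items():
        d = distance(fingerprint, known)
        if d <= best_dist:
            best_id, best_dist = video_id, d
    return best_id


# Hypothetical blacklist with two stored fingerprints.
blacklist = {"bad-1": [0.5, 0.5, 0.0], "bad-2": [0.0, 0.0, 1.0]}
hit = match_blacklist([0.49, 0.51, 0.0], blacklist)
miss = match_blacklist([0.3, 0.3, 0.4], blacklist)
print(hit, miss)
```

A linear scan like this is only workable for small blacklists; at scale an index structure would be needed.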
18. PROPOSED SYSTEM ARCHITECTURE
Resource Management (RM): RM controls and monitors the whole
platform and is in charge of scheduling, coordinating, and managing all the
resources and tasks. It comprises a Network Monitoring module and a Load
Balancing module.
Since each node has a copy of the “blacklist”, the forensics tasks can be done at any
node. The Network Monitoring module is in charge of monitoring all the nodes and
media sources, so that the Load Balancing module can coordinate each node and
schedule different tasks depending on the overall situation.
To balance the load among multiple nodes, the paper proposes the CDN-based
Resource-Aware Scheduling algorithm.
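One way to picture the split between the two modules is sketched below: a monitor keeps the latest report from each node and treats nodes without a recent report as unavailable, and the balancer picks among the remaining ones. The heartbeat timeout, the report fields, and the least-loaded rule are assumptions for illustration, not the paper's design.

```python
class NetworkMonitor:
    """Keep the most recent status report from each node."""

    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s
        self.reports = {}  # node name -> (timestamp, delay_ms, load)

    def report(self, node, now, delay_ms, load):
        self.reports[node] = (now, delay_ms, load)

    def healthy_nodes(self, now):
        """Nodes whose last report is no older than the timeout."""
        return {
            node: (delay_ms, load)
            for node, (ts, delay_ms, load) in self.reports.items()
            if now - ts <= self.timeout_s
        }


def pick_node(monitor, now):
    """Load balancing rule (assumed): choose the least-loaded healthy node."""
    healthy = monitor.healthy_nodes(now)
    if not healthy:
        return None
    return min(healthy, key=lambda node: healthy[node][1])


monitor = NetworkMonitor()
monitor.report("node-a", now=0.0, delay_ms=20.0, load=0.2)
monitor.report("node-b", now=5.0, delay_ms=30.0, load=0.6)
first = pick_node(monitor, now=10.0)    # both nodes are fresh
monitor.report("node-b", now=50.0, delay_ms=30.0, load=0.6)
second = pick_node(monitor, now=60.0)   # node-a has gone silent
third = pick_node(monitor, now=100.0)   # no recent reports at all
print(first, second, third)
```

Because every node holds a copy of the blacklist, any healthy node is a valid target, which is what lets the balancer choose purely on resource grounds.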
19. PROPOSED SYSTEM ARCHITECTURE
LARGE-SCALE VIDEO DETECTION
To identify illegal content in the DFSP, the paper proposes a large-scale video
detection algorithm, which represents video by spatial-temporal video
words and computes the relevance score by a language modeling approach.
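The slide does not specify the language model, so the sketch below uses a standard query-likelihood score with Jelinek-Mercer smoothing as a stand-in. The video words are shown as plain strings, and the smoothing weight `lam`, the probability floor, and all names are assumptions.

```python
import math
from collections import Counter


def relevance_score(query_words, doc_words, collection_counts, collection_size, lam=0.8):
    """Log-likelihood of the query words under a smoothed model of one video."""
    doc_counts = Counter(doc_words)
    score = 0.0
    for word in query_words:
        p_doc = doc_counts[word] / len(doc_words)
        p_coll = collection_counts[word] / collection_size
        # Floor the probability so a completely unseen word does not give log(0).
        score += math.log(max(lam * p_doc + (1 - lam) * p_coll, 1e-12))
    return score


# Hypothetical blacklist videos, each described by a bag of video words.
blacklist = {
    "bad-1": ["w1", "w2", "w2", "w3"],
    "bad-2": ["w4", "w5", "w5", "w6"],
}
collection_counts = Counter(w for words in blacklist.values() for w in words)
collection_size = sum(collection_counts.values())

query = ["w2", "w3", "w2"]
scores = {
    vid: relevance_score(query, words, collection_counts, collection_size)
    for vid, words in blacklist.items()
}
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

Smoothing with collection statistics keeps the score finite when a query word is missing from a particular video, so partially matching videos can still be ranked.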
21. SYSTEM DEPLOYMENT AND
PERFORMANCE EVALUATION
System Deployment
DFSP is deployed on ChinaCache, the biggest CDN
provider in China. Three central “Blacklist”
databases are deployed to back each other up. Around
these central databases, 55 CDN nodes (about 500
servers) are located within each district in China.
24. SCALABILITY EVALUATION
DFSP has a relatively stable
detection time as the number of user
queries increases, which demonstrates
better scalability. The average
detection time is 35.2 s.
Despite the heavy
demands of video detection,
the DFSP architecture is able to
distribute the computation load
among several nodes, which
saves computation time
significantly.
25. CONCLUSION
Proposed, designed, and implemented a novel
large-scale digital forensics service platform for
Internet videos.
The proposed DFSP architecture is built upon CDN
infrastructure.
Proposed the CRAS algorithm to solve dynamic load
balancing problems at large scale, thereby
achieving high system efficiency.
Deployed DFSP on ChinaCache and performed
evaluation. The evaluation results demonstrate
the effectiveness of the DFSP.
26. FUTURE WORK
Reduce the storage and computation complexity
for processing a great amount of video data
27. REFERENCES
Novel Large Scale Digital Forensics Service
Platform for Internet Videos
http://en.wikipedia.org/wiki/HSL_and_HSV
http://en.wikipedia.org/wiki/Information_retrieval#Recall