Online Security Analytics on Large Scale Video Surveillance System by Yu Cao and Xiaoyan Guo


Spark Summit East Talk

Published in: Data & Analytics

  1. Online Security Analytics on Large Scale Video Surveillance System. Yu Cao, Xiaoyan Guo (EMC Corporation)
  2. Security Analytics on Video Surveillance: Retail Industry Customer Cases
     • Search across video systems in all store locations to identify the customer behind a fraudulent card transaction and that customer's other transactions
     • Correlate register transactions with surveillance video to identify fraudulent employee transactions where no customer is present
     • If multiple stores in a region are robbed, identify any faces that appeared in all of those stores in the weeks leading up to the events
  3. Challenges in the Big & Fast Data Era: Cloud Integration, M&O, Fast Data Ingestion, Multi-Latency Analytics, Scalable Data Storage
  4. EMC Video Analytics Data Lake (VADL)
  5. Where Spark Resides @ VADL
     [Architecture diagram. Labels: Offline Video Analytics & Model Training; Object Detection; Feature Extraction; Classification; Abnormal Detection; Face Recognition; Feature Indexing; Online Video Processing and Detection; Ad-Hoc Video Content Search; Video & Feature Storage; Analytics Model; Streaming; MLlib & GraphX; Core & SQL; Deep Learning on Deep Learning Framework]
  6. Enable Spark to Process Raw Video Data
     • Spark has no built-in video processing capability
     • Combine a Spark program (Scala, Java) with a video processing library (C++)
  7. PipedRDD: Invoking External Programs
     • PipedRDD[T]: T => Linux command(T) => String, where T is a line of text
     • Spark pipe
       – The pipe interface takes an external command and executes it in a forked process. The content of the RDD in Spark is the program's input stream, and the program's output forms a new RDD
     • Java API
       – JavaRDD<String> pipe(String command)
       – JavaRDD<String> pipe(java.util.List<String> command)
       – JavaRDD<String> pipe(java.util.List<String> command, java.util.Map<String,String> env)
       – Returns an RDD created by piping elements to a forked external process
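A minimal Scala sketch of the pipe interface described on this slide, assuming a local Spark installation and a Unix-like host where `tr` is available. The object name `PipeSketch` and the sample strings are illustrative and not from the talk; `tr` only stands in for a real video processing executable.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PipeSketch {
  // Writes every element of the RDD, one per line, to the stdin of an external
  // process forked per partition; each stdout line becomes an element of the result.
  def upperCaseViaPipe(sc: SparkContext, lines: Seq[String]): Array[String] =
    sc.parallelize(lines, numSlices = 2)
      .pipe("tr a-z A-Z") // stand-in (assumption) for a real video processing binary
      .collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("pipe-sketch")
    val sc = new SparkContext(conf)
    try upperCaseViaPipe(sc, Seq("frame-1", "frame-2", "frame-3")).foreach(println)
    finally sc.stop()
  }
}
```

Because the exchange is line-oriented text, a real video stage would more plausibly receive file paths or encoded frames than raw bytes; that is an inference from the interface, not something the slide states.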
  8. Video Processing Function Implementation
     • OpenCV
       – Popular open source computer vision library
     • Home-grown algorithms, e.g. CNN
     • Video processing functions
       – video file => video transcoding => list of frame images
       – frame image => background extraction => background image
       – frame image => object detection => list of objects
       – object => feature extraction => object features
       – ……
  9. Pipeline Video Processing Tasks
     • Steps
       – Implement all required video processing sub-components as external programs
       – Pipeline these processing units by using PipedRDD in Spark jobs
     • Pseudo-code (chaining & pipeline)
         sc.fromCameraStream("rtsp://")
           .pipe("video_transcoding")
           .pipe("object_detection")
           .pipe("feature_extraction")
           .writeToHBase()
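The chaining idea in the pseudo-code can be sketched with the real `RDD.pipe` call by folding a list of stage commands over an RDD. This is only an illustration: `PipelineSketch` and `runStages` are hypothetical names, and `tr` and `sed` are stand-ins for the transcoding, detection, and feature extraction programs, which the slides do not show. `fromCameraStream` and `writeToHBase` on the slide are pseudo-code and are not reproduced here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PipelineSketch {
  // Applies each stage command in order. Every stage reads the previous
  // stage's output lines on stdin and emits its own lines on stdout.
  def runStages(input: RDD[String], stages: Seq[String]): RDD[String] =
    stages.foldLeft(input)((rdd, command) => rdd.pipe(command))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("pipeline-sketch")
    val sc = new SparkContext(conf)
    try {
      // Stand-in stages (assumption): upper-case the line, then rewrite a token.
      val stages = Seq("tr a-z A-Z", "sed s/FRAME/OBJECT/")
      runStages(sc.parallelize(Seq("frame-1", "frame-2")), stages)
        .collect()
        .foreach(println)
    } finally sc.stop()
  }
}
```

Keeping the stages in a plain sequence makes it easy to add or reorder processing units without touching the job code.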
  10. Online Video Processing During Ingestion
      [Diagram: Video Ingestion System]
  11. Online Video Processing During Ingestion
      [Pipeline diagram. Labels: video streams; Object Detection; Feature Extraction; Classification/Recognition; Indexing/Storing; Deep Learning Platform; Model; Real-time Detection; Real-time Dashboard]
  12. Video Processing in Spark Streaming
      • Receive video stream
        – val snapList = stream.queueStream(rddQueue)
        – Read the video stream at a set time interval and put the data into msgQueue
        – rddQueue += sc.makeRDD(msgQueue)
        – Then process snapList
      • Spark job (Spark Streaming)
          rdd.pipe("video_transcoding")
             .pipe("object_detection")
             .pipe("feature_extraction")
             .writeToHBase()
      [Diagram labels: Feature & object store; Online Video Analytics App]
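A rough sketch of the queue-based ingestion pattern on this slide, using the legacy DStream API's `StreamingContext.queueStream`. It assumes a local Spark installation with the streaming module on the classpath. `QueueStreamSketch`, the one-second batch interval, and the `tr` stand-in command are assumptions for illustration; the HBase write is left out because the slides do not show it.

```scala
import scala.collection.mutable
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object QueueStreamSketch {
  // Feeds each batch of snapshot descriptors through a queue-backed DStream
  // and pipes every micro-batch through an external command.
  def run(sc: SparkContext, batches: Seq[Seq[String]]): Seq[String] = {
    val ssc = new StreamingContext(sc, Seconds(1))
    val rddQueue = new mutable.Queue[RDD[String]]()
    val results = mutable.ArrayBuffer.empty[String]

    // One queued RDD is consumed per batch interval (the default behaviour).
    ssc.queueStream(rddQueue).foreachRDD { rdd =>
      val processed = rdd.pipe("tr a-z A-Z").collect() // stand-in stage (assumption)
      results.synchronized { results ++= processed }
    }

    batches.foreach { batch =>
      rddQueue.synchronized { rddQueue += sc.makeRDD(batch) }
    }
    ssc.start()
    // Wait long enough for every queued batch to be scheduled, then stop gracefully.
    ssc.awaitTerminationOrTimeout((batches.size + 2) * 1000L)
    ssc.stop(stopSparkContext = false, stopGracefully = true)
    results.synchronized { results.toList }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("queue-stream-sketch")
    val sc = new SparkContext(conf)
    try run(sc, Seq(Seq("snap-1"), Seq("snap-2"))).foreach(println)
    finally sc.stop()
  }
}
```

In a live system a receiver thread would keep appending RDDs to the queue while the context runs; here the batches are enqueued up front to keep the sketch short.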
  13. Video Content Search
      • Content-based video object search
        – Search for similar objects given an object instance
        – E.g. search for a suspect in historical video records given the suspect's identification photo
      • Semantic-based video object search
        – Search for matching objects given a semantic description
        – E.g. given keywords: search for "Red Porsche", "a woman sitting and smoking", etc.
  14. Video Content Search Workflow
      • Video Pre-processing and Feature Extraction
      • Scalable Storage
      • Object-based Indexing
      • Similar Object Search Engine
      [Workflow diagram. Labels: camera streams; Object Detection; Feature Extraction; Index Building; HBase Ingestion; Index Ingestion; video pre-processing: object detection and feature extraction; Web Dashboard; Web Backend; SearchEngine; HDFS; Multi-Tier Video Storage; HBase; HBase Client; Feature Extraction; query image; similar object search; features; object information query; top-k objects; similar objects; Feature & Index; Object Info; Original Video Data]
  15. Video Object Similarity
      • Similarity of features == similarity of video objects
        – Color, texture, shape
        – SIFT
          • 160 features, each a vector of 128 dimensions
        – Deep learning features …
      [Figure labels: Local Binary Pattern (LBP); Deep Learning Features in Different Layers]
  16. Feature Dimensionality Reduction
      • PCA
        – MLlib version of PCA (when D is small)
      • Scalable PCA
        – Distributed PPCA implemented atop Spark (when D is large)
      • LSH (Locality-Sensitive Hashing)
        – LSH hashes input items so that similar items map to the same "buckets" with high probability
      [Figure labels: Resize; Grayscale; SIFT; PCA]
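To make the LSH bullet concrete, here is a small plain-Scala sketch of one common LSH family, random hyperplane hashing for cosine similarity. The slides do not say which LSH scheme was used, so this choice, the object name `LshSketch`, and the parameters `dim`, `numBits`, and `seed` are all assumptions.

```scala
import scala.util.Random

object LshSketch {
  // Draws `numBits` random hyperplanes, each given by a Gaussian normal
  // vector with `dim` components.
  def hyperplanes(dim: Int, numBits: Int, seed: Long): Array[Array[Double]] = {
    val rng = new Random(seed)
    Array.fill(numBits, dim)(rng.nextGaussian())
  }

  // One bit per hyperplane: '1' if the vector lies on the non-negative side.
  // Vectors separated by a small angle agree on most bits, so they tend to
  // land in the same bucket.
  def signature(v: Array[Double], planes: Array[Array[Double]]): String =
    planes.map { p =>
      val dot = p.zip(v).map { case (a, b) => a * b }.sum
      if (dot >= 0) '1' else '0'
    }.mkString
}
```

A usage example: with `val planes = LshSketch.hyperplanes(128, 16, 42L)`, each 128-dimensional feature maps to a 16-character bucket key via `LshSketch.signature(feature, planes)`, and only features that share a key need an exact similarity comparison.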
  17. Spark Top-K Query Pipeline
      [Pipeline diagram: workers hold the stored features f(i); map: each f(i) is compared with the query feature qf; order: each worker produces a sorted score array Array[s1, s2, …, sn]; top-k: the top-k most similar features are returned]
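The map, order, and top-k stages can be simulated without a cluster. The sketch below models each worker's partition as a plain Scala sequence; `TopKSketch`, `localTopK`, and the dot-product score are illustrative assumptions rather than the talk's code. Spark's own `RDD.top` follows the same shape: a bounded selection per partition, then a merge.

```scala
object TopKSketch {
  type Feature = (String, Array[Double]) // (object id, feature vector)
  type Scored = (Double, String)         // (similarity score, object id)

  // Illustrative similarity: plain dot product (higher means more similar).
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Worker side: score every stored feature against the query, keep the local top-k.
  def localTopK(partition: Seq[Feature], query: Array[Double], k: Int): Seq[Scored] =
    partition
      .map { case (id, f) => (dot(query, f), id) }
      .sortBy { case (score, _) => -score }
      .take(k)

  // Driver side: merge the per-partition candidates and keep the global top-k.
  def topK(partitions: Seq[Seq[Feature]], query: Array[Double], k: Int): Seq[Scored] =
    partitions
      .flatMap(p => localTopK(p, query, k))
      .sortBy { case (score, _) => -score }
      .take(k)
}
```

Keeping only k candidates per partition bounds the data sent back to the driver at k times the number of partitions, regardless of how many features are stored.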
  18. Scala Code Example
        import org.apache.spark.SparkContext

        def topRankScore(sc: SparkContext, top: Int, queryInput: String, trainPath: String,
                         useMethod: (Array[Double], Array[Double]) => Double) = {
          // Parse the query feature: space-separated doubles
          val query = queryInput.split(" ").map(_.toDouble)
          // The slide's `featureFile` is undefined; assumed here to be sc.textFile(trainPath).
          // Each line is expected to look like "<name> <v1> <v2> ..."
          sc.textFile(trainPath).filter(_.length > 0).map { line =>
            val parts = line.split(" ", 2)
            (useMethod(query, parts(1).split(" ").map(_.toDouble)), parts(0))
          }.takeOrdered(top) // returns the `top` smallest scores
            .map(i => (i._2, i._1))
        }

        topRankScore(sc, topNumber.toInt, imageFeaturesStr, names, cosScore).foreach(println)
      Parameters: (sc, top-k, queried feature, HDFS feature file, similarity computing method)
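The slide passes a `cosScore` function without showing its definition. One plausible shape is sketched below under that caveat; `CosineSketch` and both function names are hypothetical. Since `takeOrdered` returns the smallest values first, a distance form is included as well: with a raw similarity score, the call on the slide would return the least similar items.

```scala
object CosineSketch {
  // Cosine similarity in [-1, 1]; defined as 0.0 when either vector is all zeros.
  def cosineSimilarity(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "feature vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  // Distance form: smaller means more similar, which matches the ascending
  // order used by takeOrdered.
  def cosineDistance(a: Array[Double], b: Array[Double]): Double =
    1.0 - cosineSimilarity(a, b)
}
```

Either function has the `(Array[Double], Array[Double]) => Double` type that the `useMethod` parameter expects, so it could be passed in place of `cosScore`.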
  19. Deep Learning @ VADL
      • Feature extraction for detected video objects
        – faces
        – humans
      • Classification of video objects
        – With trained model
      • Suspect detection and recognition
      [Figure caption: Training neural networks with many layers]
  20. Deep Learning With Spark
      • DeepLearning4J (DL4J)
        – Open source
        – Variety of NNs & flexibility
        – Cross-platform & scale
        – Java implementation
          • Parallelization (YARN, Spark)
          • GPU support
        – Also supports multiple GPUs per host node
      • DeepDist
        – Open source
        – Deep belief networks
        – Asynchronous stochastic gradient descent for data stored on HDFS / Spark
        – Python implementation
  21. Thank you.