
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang and Yiheng Wang



AI plays a central role in today's Internet applications and emerging intelligent systems, which are driving the need for scalable, distributed big data analytics with deep learning capabilities. Organizations increasingly want to discover and explore data using advanced big data analytics and deep learning. In this talk, we will share how we work with our users to build deep learning powered big data analytics applications (e.g., object detection, image recognition, NLP) using BigDL, an open source distributed deep learning library for Apache Spark.

Published in: Data & Analytics


  1. Building Deep Learning Powered Big Data Analytics using BigDL. Yiheng Wang, Jennie Wang, BDT / SSG / Intel
  2. What is BigDL? BigDL is a distributed deep learning library for Apache Spark.
  3. Why BigDL? Big data boosts deep learning; production ML/DL systems are complex. Sources: Andrew Ng, Baidu; NIPS 2015 paper.
  4. Why BigDL? BigDL was open sourced on Dec 30, 2016. § Write deep learning applications as standard Spark programs § Run on top of existing Spark or Hadoop clusters (no change to the clusters) § Rich deep learning support § High performance powered by Intel MKL and multi-threaded programming § Efficient scale-out with all-reduce communication on Spark
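The slide names all-reduce communication but does not show how it works. As a rough illustration only, the sketch below simulates one common way to realize an all-reduce sum across workers in plain Python: a reduce-scatter step (each worker sums one slice of the gradient) followed by an all-gather step (every worker collects all summed slices). The function name `all_reduce_sum` and the in-memory list representation are assumptions made for this example; this is not BigDL's code and says nothing about how BigDL schedules the exchange on Spark.

```python
def all_reduce_sum(worker_grads):
    """Sum equal-length gradient vectors across workers.

    worker_grads: one list of numbers per simulated worker.
    Returns one copy of the summed vector per worker.
    """
    n = len(worker_grads)
    size = len(worker_grads[0])
    # Worker k owns the slice [bounds[k], bounds[k + 1]) of the vector.
    bounds = [k * size // n for k in range(n + 1)]

    # Reduce-scatter: each owner sums its own slice over all workers.
    reduced = []
    for k in range(n):
        lo, hi = bounds[k], bounds[k + 1]
        reduced.append([sum(g[i] for g in worker_grads) for i in range(lo, hi)])

    # All-gather: every worker receives all reduced slices, in order.
    full = [value for part in reduced for value in part]
    return [list(full) for _ in range(n)]


# Three simulated workers, each holding a local gradient of length 4.
result = all_reduce_sum([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
# Every worker ends up with [111, 222, 333, 444].
```

Splitting the vector into per-worker slices means no single node has to receive every worker's full gradient, which is the usual motivation for this pattern over a central aggregator.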
  5. Usage and examples of BigDL
  6. Fraud Transaction Detection. Fraud transaction detection is very important to finance companies. A good fraud detection solution can save a lot of money. ML solution challenges: § Data cleaning § Feature engineering § Unbalanced data § Hyperparameter tuning
  7. Fraud Transaction Detection. § Historical data is stored in Hive § Easy data preprocessing and cleaning with Spark SQL § Spark ML pipeline for complex feature engineering § Under-sampling + bagging to address the unbalanced data § Grid search for hyperparameter tuning. Powered by BigDL
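The under-sampling, bagging, and grid search steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the pipeline from the talk: the helper names `balanced_bags` and `grid_search`, the convention that label 1 marks the rare fraud class, the toy data, and the `toy_score` function (a stand-in for a real cross-validated model score) are all hypothetical.

```python
import itertools
import random


def balanced_bags(rows, label_of, n_bags, seed=0):
    """Build n_bags balanced training sets by under-sampling the majority class.

    Assumes label 1 is the minority class and the majority class has at
    least as many rows as the minority class.
    """
    rng = random.Random(seed)
    minority = [r for r in rows if label_of(r) == 1]
    majority = [r for r in rows if label_of(r) == 0]
    bags = []
    for _ in range(n_bags):
        # Each bag keeps all minority rows plus an equal-sized majority sample.
        bags.append(minority + rng.sample(majority, len(minority)))
    return bags


def grid_search(param_grid, score):
    """Score every parameter combination and return (best_params, best_score)."""
    names = sorted(param_grid)
    best = None
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if best is None or current > best[1]:
            best = (params, current)
    return best


# Toy data: 100 rows of (amount, label), 5 of them labelled as fraud.
rows = [(float(i), 1 if i >= 95 else 0) for i in range(100)]
bags = balanced_bags(rows, label_of=lambda r: r[1], n_bags=3)


def toy_score(params):
    # Hypothetical score surface that peaks at lr=0.1, depth=3.
    return -((params["lr"] - 0.1) ** 2) - (params["depth"] - 3) ** 2


grid = {"lr": [0.01, 0.1, 1.0], "depth": [2, 3, 4]}
best_params, best_score = grid_search(grid, toy_score)
```

In a bagging setup, one model would be trained per bag and their predictions combined (for example by majority vote), so that every majority-class row has a chance of being seen by at least one model.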
  8. Product Defect Detection and Classification. Data source: § Cameras installed on the manufacturing line. Task: § Detect defects in the photos § Classify the defects
  9. Product Defect Detection and Classification (KeyStone ML Pipeline)
  10. Object Detection on PASCAL(
  11. Faster-RCNN. § Faster-RCNN is a popular object detection framework § It shares features between the detection network and the region proposal network. Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015.
  12. Object Detection with Faster-RCNN. See the code at:
  13. Language Model with RNN. Stages: Text Preprocessing, RNN Model Training, Sentence Generating. Text preprocessing: § Sentence Tokenizer § Dictionary Building § Input Document Transformer. Sentence generating: generated sentences with regard to trigger words.
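The three text preprocessing steps named on this slide (sentence tokenizing, dictionary building, and transforming the input document) can be sketched with the Python standard library. This is an illustrative sketch only: the function names, the regular expressions, the `<unk>` placeholder at index 0, and the `vocab_size` parameter are assumptions for the example and are not taken from the BigDL RNN example code.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase a sentence and split it into word and punctuation tokens."""
    return re.findall(r"[a-z']+|[.,!?;]", sentence.lower())


def build_dictionary(tokenized, vocab_size):
    """Map the most frequent tokens to integer ids; id 0 means 'unknown'."""
    counts = Counter(tok for sent in tokenized for tok in sent)
    vocab = {"<unk>": 0}
    for word, _ in counts.most_common(vocab_size - 1):
        vocab[word] = len(vocab)
    return vocab


def to_indices(tokenized, vocab):
    """Replace each token with its id, falling back to 0 for unseen tokens."""
    return [[vocab.get(tok, 0) for tok in sent] for sent in tokenized]


text = "Long live the King. The King and Queen!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
vocab = build_dictionary(tokens, vocab_size=4)
encoded = to_indices(tokens, vocab)
```

The resulting lists of integer ids are the kind of input an RNN language model is typically trained on, usually after a further step such as one-hot encoding or an embedding lookup.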
  14. RNN Model. See the code at:
  15. Learn from Shakespeare Poems. Output of RNN: Long live the King . The King and Queen , and the Strange of the Veils of the rhapsodic . and grapple, and the entreatments of the pressure . Upon her head , and in the world ? `` Oh, the gods ! O Jove ! To whom the king : `` O friends ! Her hair, nor loose ! If , my lord , and the groundlings of the skies . jocund and Tasso in the Staggering of the Mankind . and
  16. Fine-tune Caffe/Torch Model on Spark. Diagram labels: Caffe Model, Torch Model, Load, BigDL Model, Fine-tune, BigDL Model; example image styles: Melancholy, Sunny, Macro. • Train on a different dataset based on a pre-trained model • Predict image style instead of type • Save training time and improve accuracy. Image source:
  17. Fine-tune Caffe/Torch Model on Spark. Accuracy increases by 10% and convergence time decreases.
  18. Integration with Spark Streaming. BigDL integrates with Spark Streaming for runtime training and prediction. Diagram labels: HDFS/S3, Kafka, Flume, Kinesis, Twitter; Spark Streaming RDDs; BigDL Model (Train, Predict); Evaluator; StreamWriter.
  19. Tight Integration with Spark SQL and DataFrames. Code fragment (start lost in extraction): $"image") .withColumn("image_type", ImgClassifier("image")) .filter($"image_type" === "dog") .show() Image classification on ImageNet(
  20. More BigDL Examples. BigDL provides examples to help developers experiment with BigDL and get started with popular models. Models (training and inference example code): § LeNet, Inception, VGG, ResNet, RNN, Auto-encoder. Examples: • Text Classification • Image Classification • Load Torch/Caffe model