AI plays a central role in today’s Internet applications and emerging intelligent systems, driving the need for scalable, distributed big data analytics with deep learning capabilities. There is increasing demand from organizations to discover and explore data using advanced big data analytics and deep learning. In this talk, we will share how we work with our users to build deep learning powered big data analytics applications (e.g., object detection, image recognition, NLP, etc.) using BigDL, an open source distributed deep learning library for Apache Spark.
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...Spark Summit
Moving at the speed of a startup often means rapid iterative development, which can lead to a patchwork of systems and processes. In the early days at Kik (one of the most popular chat apps among U.S. teens), the data team was able to move extremely quickly but often at the expense of scalable data engineering. In this session, Kik’s head of data will share the eight things they did to save time and money. The team took their data stack from a complex combination of systems and processes to a scalable, simple, and robust platform leveraging Apache Spark and Databricks to make data super easy for everyone in the company to use.
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...Spark Summit
BigDL is a distributed deep learning framework built for Big Data platforms using Apache Spark. It combines the benefits of “high performance computing” and “Big Data” architectures, providing native support for deep learning functionality in Spark, an orders-of-magnitude speedup over out-of-the-box open source DL frameworks (e.g., Caffe/Torch) in single-node performance (by leveraging Intel MKL), and scale-out of deep learning workloads based on the Spark architecture. We’ll also share how our users adopt BigDL for their deep learning applications (such as image recognition, object detection, NLP, etc.), which allows them to use their Big Data (e.g., Apache Hadoop and Spark) platform as the unified data analytics platform for data storage, data processing and mining, feature engineering, traditional (non-deep) machine learning, and deep learning workloads.
Accelerating Machine Learning and Deep Learning At Scale...With Apache Spark:...Spark Summit
Deep learning is a fast-growing subset of machine learning. There is an emerging trend to run deep learning in the same cluster as existing data processing pipelines that support feature engineering and traditional machine learning. Since Spark is the leading framework for distributed ML, we believe that adding deep learning to this hugely popular framework is important: it allows Spark developers to perform a range of data analysis tasks within a single framework, avoiding the complexity inherent in using multiple frameworks and libraries. As one of the early and top contributors to Apache Spark, Intel is thrilled to share with the community a major open source contribution to Spark: BigDL, a distributed deep learning framework organically built on the Big Data (Apache Spark) platform. It combines the benefits of “high performance computing” and “Big Data” architectures for rich deep learning support. With BigDL on Spark, customers can eliminate large volumes of unnecessary dataset transfer between separate systems, eliminate separate hardware clusters in favor of a single CPU cluster, and reduce system complexity and end-to-end learning latency. Ultimately, customers can achieve better scale, higher resource utilization, ease of use and development, and better TCO. Feature parity with Caffe and Torch, a significant performance boost when combined with Intel’s Math Kernel Library (MKL), scale-out, fault tolerance, elasticity, and dynamic resource sharing are some of BigDL’s prominent features.
The BigDL open source project will be launched at the 2017 Spark Summit East, and this keynote will spotlight this new contribution, highlight its benefits to the Spark developer community, and encourage wide contribution and collaboration. We will also showcase some real-world applications of BigDL from early customer adoption.
High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...Spark Summit
As advanced sensor technologies are becoming widely deployed in the energy industry, the availability of higher-frequency data results in both analytical benefits and computational costs. To an energy forecaster or data scientist, some of these benefits might include enhanced predictive performance from forecasting models as well as improved pattern recognition in energy consumption across building types, economic sectors, and geographies. To a utility or electricity service provider, these benefits might include significantly deeper insights into their diverse customer base. However, these advantages can come with a high computational price tag. With Spark 2.0, User-Defined Functions can be applied across grouped SparkDataFrames in the SparkR API to solve the multivariate optimization and model selection problems typically required for fitting site-level models. This recently added feature of Spark 2.0 on Databricks has allowed DNV GL to efficiently fit predictive models that relate weather, electricity, water, and gas consumption across virtually any number of buildings.
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...Databricks
As Apache Spark applications move to a containerized environment, there are many questions about how to best configure server systems in the container world. In this talk we will demonstrate a set of tools to better monitor performance and identify optimal configuration settings. We will demonstrate how Prometheus, a project that is now part of the Cloud Native Computing Foundation (CNCF: https://www.cncf.io/projects/), can be applied to monitor and archive system performance data in a containerized Spark environment.
In our examples, we will gather Spark metrics output through Prometheus and present the data with Grafana dashboards. We will use our examples to demonstrate how performance can be enhanced through different tuned configuration settings. Our demo will show how to configure settings across the cluster as well as within each node.
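As a rough illustration of the scraping side of such a setup, a minimal Prometheus configuration might look as follows (job name, hosts, and ports are assumptions; the actual endpoints depend on how metrics are exposed on each Spark node, e.g., via the Prometheus JMX exporter javaagent):

```yaml
scrape_configs:
  - job_name: "spark"
    scrape_interval: 15s
    static_configs:
      # hypothetical hosts/ports where each node exposes Spark JVM metrics
      - targets: ["spark-master:9091", "spark-worker-1:9091", "spark-worker-2:9091"]
```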
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...Spark Summit
Across all assets globally, Shell carries a huge stock of spare-part inventory, which ties up large quantities of working capital. Over the past two years an interdisciplinary project team has produced a tool, the Inventory Optimization Analytics solution (IOTA), based on advanced analytical methods, that helps assets optimise stock levels and purchase strategies. To calculate the recommended stocking inventory level for a material, the Data Science team have written a Markov Chain Monte Carlo (MCMC) bootstrapping statistical model in R. Cumulatively, the computational task is large but, fortunately, embarrassingly parallel, because the model can be applied independently to each material. The original solution, which utilised the R “parallel” package, was deployed on a single 48-core PC and took 48 hours to run. In this presentation, we describe how we moved the original solution to a distributed, cloud-based Apache Spark framework. Using the new R User-Defined Functions API in Apache Spark, and with only a minimal amount of code changes, the computational run time was reduced to 4 hours. A restructuring of the architecture to “pipeline” the problem resulted in a run time of less than 1 hour. This use case is important because it verifies the scalability and performance of SparkR.
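The per-material independence is what makes this workload easy to distribute. As a hedged illustration (in Python rather than the original R, with made-up demand data and a much simplified stocking rule), the structure looks like this; in the real solution, Spark’s R UDF API applies the per-material function across the cluster:

```python
# Illustrative sketch only -- not Shell's actual model. Each material's
# bootstrap depends on nothing but that material's own demand history,
# so the calls can run in parallel (here with a thread pool; on Spark,
# the same function shape maps onto per-group R UDFs).
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

def bootstrap_stock_level(material, demand_history, n_boot=500, q=0.95):
    """Bootstrap the mean demand and return a high quantile as a
    (much simplified) stand-in for a recommended stocking level."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    means = sorted(
        statistics.mean(rng.choices(demand_history, k=len(demand_history)))
        for _ in range(n_boot)
    )
    return means[int(q * (n_boot - 1))]

# Hypothetical demand histories for two materials
demand = {"pump-seal": [3, 5, 2, 8, 4], "gasket": [1, 0, 2, 1, 3]}
with ThreadPoolExecutor() as pool:
    levels = dict(zip(demand, pool.map(
        lambda kv: bootstrap_stock_level(*kv), demand.items())))
print(levels)
```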
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Databricks
In this session, you will learn how CERN easily applied end-to-end deep learning and analytics pipelines on Apache Spark at scale for High Energy Physics using BigDL and Analytics Zoo open source software running on Intel Xeon-based distributed clusters.
Technical details and development learnings will be shared using an example of topology classification to improve real-time event selection at the Large Hadron Collider experiments. The classifier has demonstrated very good performance figures for efficiency, while also reducing the false positive rate compared to the existing methods. It could be used as a filter to improve the online event selection infrastructure of the LHC experiments, where one could benefit from a more flexible and inclusive selection strategy while reducing the amount of downstream resources wasted in processing false positives.
This is part of CERN’s research on applying Deep Learning and analytics using open source and industry-standard technologies as an alternative to the existing customized rule-based methods. We show how we could quickly build and implement distributed deep learning solutions and data pipelines at scale on Apache Spark using Analytics Zoo and BigDL, open source frameworks that unify analytics and AI on Spark with easy-to-use APIs and development interfaces seamlessly integrated with Big Data platforms.
In the big data world, it's not always easy for Python users to move huge amounts of data around. Apache Arrow defines a common format for data interchange, while Arrow Flight, introduced in version 0.11.0, provides a means to move that data efficiently between systems. Arrow Flight is a framework for Arrow-based messaging built with gRPC. It enables data microservices where clients can produce and consume streams of Arrow data to share it over the wire. In this session, I'll give a brief overview of Arrow Flight from a Python perspective, and show that it's easy to build high-performance connections when systems can talk Arrow. I'll also cover some ongoing work in using Arrow Flight to connect PySpark with TensorFlow, two systems with great Python APIs but very different underlying internal data representations.
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Databricks
Debugging big data analytics in Data-Intensive Scalable Computing (DISC) systems is a time-consuming effort. Today’s DISC systems offer very little tooling for debugging and, as a result, programmers spend countless hours analyzing log files and performing trial and error debugging. To aid this effort, UCLA developed BigDebug, an interactive debugging tool and automated fault localization service to help Apache Spark developers in debugging big data analytics.
To emulate interactive step-wise debugging without reducing throughput, BigDebug provides simulated breakpoints that enable a user to inspect a program without actually pausing the entire distributed computation. It also supports on-demand watchpoints that enable a user to retrieve intermediate data using a guard predicate and transfer the selected data on demand. To understand the flow of individual records within a pipeline of RDD transformations, BigDebug provides a data provenance capability, which can help explain how errors propagate through data processing steps. To support efficient trial-and-error debugging, BigDebug enables users to change program logic in response to an error at runtime through a real-time code-fix feature, and to selectively replay the execution from that step. Finally, BigDebug offers an automated fault localization service that leverages all the above features together to isolate failure-inducing inputs, diagnose the root cause of an error, and resume the workflow for only the affected data and code.
The BigDebug system should contribute to improving Spark developer productivity and the correctness of their Big Data applications. This big data debugging effort is led by UCLA Professors Miryung Kim and Tyson Condie, and has produced several research papers in top Software Engineering and Database conferences. The current version of BigDebug is publicly available at https://sites.google.com/site/sparkbigdebug/.
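To make the guard-predicate watchpoint idea concrete, here is a toy Python sketch (this is not BigDebug’s actual API): records matching the guard are captured on the side for later inspection, while the pipeline itself never pauses.

```python
# Toy illustration of a guard-predicate "watchpoint" (not BigDebug's real API):
# records flow through the stage unchanged, but any record matching the guard
# is captured for on-demand inspection -- the computation never stops.
def with_watchpoint(records, guard, captured):
    for rec in records:
        if guard(rec):
            captured.append(rec)   # capture for later inspection
        yield rec                  # pass through untouched

captured = []
data = [1, 2, -5, 3, -1]
# Watch for negative values flowing into the next transformation
doubled = [x * 2 for x in with_watchpoint(data, lambda x: x < 0, captured)]
print(doubled)   # [2, 4, -10, 6, -2]
print(captured)  # [-5, -1]
```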
From R Script to Production Using rsparkling with Navdeep GillDatabricks
The rsparkling R package is an extension package for sparklyr (an R interface for Apache Spark) that creates an R front-end for the Sparkling Water Spark package from H2O. This provides an interface to H2O’s high performance, distributed machine learning algorithms on Spark, using R. The main purpose of this package is to provide a connector between sparklyr and H2O’s machine learning algorithms.
In this session, Gill will introduce the basic architectures of rsparkling, H2O Sparkling Water and sparklyr, and go over how these frameworks work together to build a cohesive machine learning framework. In addition, you’ll learn about various implementations for using rsparkling in production. The session will conclude with a live demo of rsparkling that will display an end-to-end use case of data ingestion, munging and machine learning.
Data Warehousing with Spark Streaming at ZalandoDatabricks
Zalando’s AI-driven products and distributed landscape of analytical data marts cannot wait for long-running, hard-to-recover, monolithic batch jobs that take all night to calculate already-outdated data. Modern data integration pipelines need to deliver fast, easy-to-consume, high-quality data sets. Based on Spark Streaming and Delta, the central data warehousing team was able to deliver widely used master data as S3 or Kafka streams and snapshots at the same time.
The talk will cover challenges in our fashion data platform and a detailed architectural deep dive about separating integration from enrichment, providing streams as well as snapshots, and feeding the data to distributed data marts. Finally, lessons learned and best practices around Delta’s MERGE command, the Scala API vs. Spark SQL, and schema evolution give more insights and guidance for similar use cases.
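As a rough illustration of the MERGE pattern the talk touches on, a Spark SQL upsert of a change stream into a Delta snapshot table might look like this (table and column names are hypothetical):

```sql
MERGE INTO master_data AS t
USING updates AS u
  ON t.article_id = u.article_id
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *
```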
AI on Spark for Malware Analysis and Anomalous Threat DetectionDatabricks
At Avast, we believe everyone has the right to be safe. We are dedicated to creating a world that provides safety and privacy for all, no matter where you are, who you are, or how you connect. With over 1.5 billion attacks stopped and 30 million new executable files monthly, big data pipelines are crucial for the security of our customers. At Avast we are leveraging Apache Spark machine learning libraries and TensorFlowOnSpark for a variety of tasks ranging from marketing and advertisement, through network security, to malware detection. This talk will cover our main cybersecurity use cases for Spark. After describing our cluster environment, we will first demonstrate anomaly detection on time series of threats. With thousands of types of attacks and malware, AI helps human analysts select and focus on the most urgent or dire threats. We will walk through our setup, from distributed training of deep neural networks with TensorFlow to deploying and monitoring a streaming anomaly detection application with the trained model. Next we will show how we use Spark for analysis and clustering of malicious files, and for large-scale experimentation to automatically process and handle changes in malware. In the end, we will compare with other tools we used for solving those problems.
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...Databricks
BigDL is a distributed deep learning framework for Apache Spark, open sourced by Intel. BigDL helps make deep learning more accessible to the Big Data community by allowing them to continue using familiar tools and infrastructure to build deep learning applications. With BigDL, users can write their deep learning applications as standard Spark programs, which can then run directly on top of existing Spark or Hadoop clusters.
In this session, we will introduce BigDL, show how our customers use BigDL to build end-to-end ML/DL applications, and cover the platforms on which BigDL is deployed. We will also provide an update on the latest improvements in BigDL v0.1 and talk about further developments and upcoming features of the BigDL v0.2 release (e.g., support for TensorFlow models, 3D convolutions, etc.).
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew RayDatabricks
Data scientists spend more time wrangling data than making models. Traditional tools like Pandas provide a very powerful data manipulation toolset. Transitioning to big data tools like PySpark allows one to work with much larger datasets, but can come at the cost of productivity.
In this session, learn about data wrangling in PySpark from the perspective of an experienced Pandas user. Topics will include best practices, common pitfalls, performance consideration and debugging.
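To give a flavor of the kind of mapping such a transition involves, here is a small, hedged sketch: the pandas half below runs as-is, while the commented PySpark equivalents assume an existing SparkSession holding the same data as a Spark DataFrame.

```python
# A few common pandas operations with (in comments) their approximate
# PySpark equivalents -- filtering and grouped aggregation.
import pandas as pd

df = pd.DataFrame({"dept": ["a", "a", "b"], "salary": [10, 20, 30]})

# Filter rows.          PySpark: df.filter(df.salary > 15)
high = df[df["salary"] > 15]

# Grouped aggregation.  PySpark: df.groupBy("dept").avg("salary")
means = df.groupby("dept", as_index=False)["salary"].mean()

print(high["salary"].tolist())   # [20, 30]
print(means["salary"].tolist())  # [15.0, 30.0]
```

One pitfall the session's framing hints at: pandas operations are eager and in-memory, while the PySpark equivalents are lazy and distributed, so performance intuitions do not carry over directly.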
We provide an update on developments at the intersection of R and the broader machine learning ecosystem. These collections of packages enable R users to leverage the latest technologies for big data analytics and deep learning in their existing workflows, and also facilitate collaboration within multidisciplinary data science teams. Topics covered include:
- MLflow: managing the ML lifecycle, with improved dependency management and more deployment targets
- TensorFlow: TF 2.0 update and probabilistic (deep) machine learning with TensorFlow Probability
- Spark: latest improvements and extensions, including text processing at scale with SparkNLP
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...Databricks
As semiconductor devices have advanced, manufacturing systems have improved the productivity and efficiency of wafer fabrication. Owing to such improvements, the number of wafers yielded from the fabrication process has been increasing rapidly. However, current software systems for semiconductor wafers are not designed to process large numbers of wafers. To resolve this issue, BISTel (a world-class provider of manufacturing intelligence solutions and services for manufacturers) has built several big data products, such as Trace Analyzer (TA) and Map Analyzer (MA), using Apache Spark. TA analyzes raw trace data from a manufacturing process: it captures details on all variable changes, big and small, and gives each trace's statistical summary (i.e., min, max, slope, average, etc.). Several of BISTel's customers, which are top-tier semiconductor companies, use TA to analyze the massive raw trace data from their manufacturing processes. In particular, TA is able to manage terabytes of data by applying Apache Spark's APIs. MA is an advanced pattern recognition tool that sorts wafer yield maps and automatically identifies common yield loss patterns. Some semiconductor companies also use MA to identify clustering patterns across more than 100,000 wafers, which can be considered big data in the semiconductor area. This talk will introduce these two products, both developed on Apache Spark, and present how to handle large-scale semiconductor data from a software engineering perspective.
Speakers: Seungchul Lee, Daeyoung Kim
Geospatial Analytics at Scale with Deep Learning and Apache SparkDatabricks
Deep Learning is now the standard in object detection, but it is not easy to analyze large numbers of images, especially in an interactive fashion. Traditionally, there has been a gap between Deep Learning frameworks, which excel at image processing, and more traditional ETL and data science tools, which are usually not designed to handle huge batches of complex data types such as images.
In this talk, we show how manipulating large corpora of images can be accomplished in a few lines of code because of recent developments in Apache Spark. Thanks to Spark’s unique ability to blend different libraries, we show how to start from satellite images and rapidly build complex queries on high-level information such as houses or buildings. This is possible thanks to Magellan, a geospatial package, and Deep Learning Pipelines, a library that streamlines the integration of Deep Learning frameworks in Spark. At the end of this session, you will walk away with the confidence that you can solve your own image detection problems at any scale thanks to the power of Spark.
Accelerating Data Science with Better Data Engineering on DatabricksDatabricks
Whether you’re processing IoT data from millions of sensors or building a recommendation engine to provide a more engaging customer experience, the ability to derive actionable insights from massive volumes of diverse data is critical to success. MediaMath, a leading adtech company, relies on Apache Spark to process billions of data points ranging from ads, user cookies, impressions, clicks, and more — translating to several terabytes of data per day. To support the needs of the data science teams, data engineering must build data pipelines for both ETL and feature engineering that are scalable, performant, and reliable.
Join this webinar to learn how MediaMath leverages Databricks to simplify mission-critical data engineering tasks that surface data directly to clients and drive actionable business outcomes. This webinar will cover:
- Transforming TBs of data with RDDs and PySpark responsibly
- Using the JDBC connector to write results to production databases seamlessly
- Comparisons with a similar approach using Hive
Blue Pill/Red Pill: The Matrix of Thousands of Data StreamsDatabricks
Designing a streaming application that has to process data from one or two streams is easy: any streaming framework that provides scalability, high throughput, and fault tolerance would work. But when the number of streams grows into the hundreds or thousands, managing them can be daunting. How would you share resources among thousands of streams running 24×7, manage their state, apply advanced streaming operations, and add or delete streams without restarting? This talk explains common scenarios and shows techniques that can handle thousands of streams using Spark Structured Streaming.
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Spark Summit
In Spark 2.0, we introduced Structured Streaming, which allows users to continually and incrementally update their view of the world as new data arrives, while still using the same familiar Spark SQL abstractions. I’ll talk about the progress we’ve made since then on robustness, latency, expressiveness, and observability, using examples of production end-to-end continuous applications.
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataGetInData
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
Presentation from the performance given by Mariusz during the Data Science Summit ML Edition.
Author: Mariusz Strzelecki
Linkedin: https://www.linkedin.com/in/mariusz-strzelecki/
___
Company:
Getindata is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including i.a. Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Databricks
The prevailing issue when working with Operating Room (OR) scheduling within a hospital setting is that it is difficult to schedule and predict available OR block times. This leads to empty and unused operating rooms leading to longer waiting times for patients for their procedures. In this three-part session, Ayad Shammout and Denny will show:
1) How we tried to solve this problem using traditional DW techniques
2) How we took advantage of the DW capabilities in Apache Spark AND easily transition to Spark MLlib so we could more easily predict available OR block times resulting in better OR utilization and shorter wait times for patients.
3) Some of the key learnings we had when migrating from DW to Spark.
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Databricks
GE Aviation has hundreds of data scientists and engineers developing algorithms. The majority of these people do not have the time to learn Apache Spark and continue to develop on local machines in Python or R. We also have lots of historical code that was not developed for Spark. However, the business wanted to deploy to a Spark environment for scalability, as quickly as possible. So how did we bridge the gap? A data scientist and software engineer will co-present to share how we approached the problem of building, unifying and scaling these algorithms.
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Spark Summit
As enterprises move to cloud-based analytics, the risk of cloud security breaches poses a serious threat. Encrypting data at rest and in transit is a major first step. However, data must still be decrypted in memory for processing, exposing it to an attacker who has compromised the operating system or hypervisor. Trusted hardware such as Intel SGX has recently become available in latest-generation processors. Such hardware enables arbitrary computation on encrypted data while shielding it from a malicious OS or hypervisor. However, it still suffers from a significant side channel: access pattern leakage.
We present Opaque, a package for Apache Spark SQL that enables very strong security for SQL queries: data encryption, computation verification, and access pattern leakage protection (a.k.a. obliviousness). Opaque achieves these guarantees by introducing new oblivious distributed relational operators that provide 2000x performance gain over state of the art oblivious systems, as well as novel query planning techniques for these operators implemented using Catalyst.
Tuning and Monitoring Deep Learning on Apache SparkDatabricks
Deep Learning on Apache Spark has the potential for huge impact in research and industry. This talk will describe best practices for building deep learning pipelines with Spark.
Rather than comparing deep learning systems or specific optimizations, this talk will focus on issues that are common to many deep learning frameworks when running on a Spark cluster: optimizing cluster setup and data ingest, tuning the cluster, and monitoring long-running jobs. We will demonstrate the techniques we cover using Google’s popular TensorFlow library.
More specifically, we will cover typical issues users encounter when integrating deep learning libraries with Spark clusters. Clusters can be configured to avoid task conflicts on GPUs and to allow using multiple GPUs per worker. Setting up pipelines for efficient data ingest improves job throughput. Interactive monitoring facilitates both the work of configuration and checking the stability of deep learning jobs.
Speaker: Tim Hunter
This talk was originally presented at Spark Summit East 2017.
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Databricks
Debugging big data analytics in Data-Intensive Scalable Computing (DISC) systems is a time-consuming effort. Today’s DISC systems offer very little tooling for debugging and, as a result, programmers spend countless hours analyzing log files and performing trial and error debugging. To aid this effort, UCLA developed BigDebug, an interactive debugging tool and automated fault localization service to help Apache Spark developers in debugging big data analytics.
To emulate interactive step-wise debugging without reducing throughput, BigDebug provides simulated breakpoints that enable a user to inspect a program without actually pausing the entire distributed computation. It also supports on-demand watchpoints that enable a user to retrieve intermediate data using a guard predicate and transfer the selected data on demand. To understand the flow of individual records within a pipeline of RDD transformations, BigDebug provides data provenance capability, which can help understand how errors propagate through data processing steps. To support efficient trial-and-error debugging, BigDebug enables users to change program logic in response to an error at runtime through a realtime code fix feature, and selectively replay the execution from that step. Finally, BigDebug proposes an automated fault localization service that leverages all the above features together to isolate failure-inducing inputs, diagnose the root cause of an error, and resume the workflow for only affected data and code.
The BigDebug system should contribute to improving Spark developer productivity and the correctness of big data applications. This big data debugging effort is led by UCLA Professors Miryung Kim and Tyson Condie, and has produced several research papers in top software engineering and database conferences. The current version of BigDebug is publicly available at https://sites.google.com/site/sparkbigdebug/.
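The guard-predicate watchpoint idea above can be sketched in plain Python (no Spark dependency; all names are illustrative, not BigDebug's actual API): a transformation is wrapped so that records matching a predicate are captured for inspection without pausing the pipeline.

```python
def watchpoint(records, transform, guard, captured):
    """Apply `transform` to each record; inputs satisfying `guard` are
    copied into `captured` for later inspection, without interrupting
    the computation (a rough analogue of an on-demand watchpoint)."""
    out = []
    for r in records:
        if guard(r):
            captured.append(r)
        out.append(transform(r))
    return out

# Toy pipeline: square the numbers, capture negative inputs for debugging.
captured = []
result = watchpoint([3, -1, 4, -5], lambda x: x * x, lambda x: x < 0, captured)
# result == [9, 1, 16, 25]; captured == [-1, -5]
```

In BigDebug the captured data is transferred on demand from the distributed workers; this sketch only shows the predicate-based selection idea.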
From R Script to Production Using rsparkling with Navdeep GillDatabricks
The rsparkling R package is an extension package for sparklyr (an R interface for Apache Spark) that creates an R front-end for the Sparkling Water Spark package from H2O. This provides an interface to H2O’s high performance, distributed machine learning algorithms on Spark, using R. The main purpose of this package is to provide a connector between sparklyr and H2O’s machine learning algorithms.
In this session, Gill will introduce the basic architectures of rsparkling, H2O Sparkling Water and sparklyr, and go over how these frameworks work together to build a cohesive machine learning framework. In addition, you’ll learn about various implementations for using rsparkling in production. The session will conclude with a live demo of rsparkling that will display an end-to-end use case of data ingestion, munging and machine learning.
Data Warehousing with Spark Streaming at ZalandoDatabricks
Zalando's AI-driven products and distributed landscape of analytical data marts cannot wait for long-running, hard-to-recover, monolithic batch jobs that take all night to calculate already-outdated data. Modern data integration pipelines need to deliver fast, easy-to-consume, high-quality data sets. Based on Spark Streaming and Delta, the central data warehousing team was able to deliver widely used master data as S3 or Kafka streams and snapshots at the same time.
The talk will cover challenges in our fashion data platform and a detailed architectural deep dive about separation of integration from enrichment, providing streams as well as snapshots and feeding the data to distributed data marts. Finally, lessons learned and best practices about Delta’s MERGE command, Scala API vs Spark SQL and schema evolution give more insights and guidance for similar use cases.
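The MERGE pattern referenced above is an upsert: matched rows are updated in place, unmatched rows inserted. A rough sketch of what such a statement looks like in Delta's SQL dialect (table and column names are hypothetical), expressed here as a Python string:

```python
# Illustrative Delta Lake MERGE statement for upserting a change stream
# into a master-data table. Table and column names are made up.
merge_sql = """
MERGE INTO master_data AS t
USING updates AS s
  ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.value = s.value, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (id, value, updated_at) VALUES (s.id, s.value, s.updated_at)
"""
```

In a real pipeline this would run via `spark.sql(merge_sql)` (or the Scala/Python DeltaTable API) against Delta tables.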
AI on Spark for Malware Analysis and Anomalous Threat DetectionDatabricks
At Avast, we believe everyone has the right to be safe. We are dedicated to creating a world that provides safety and privacy for all, no matter where you are, who you are, or how you connect. With over 1.5 billion attacks stopped and 30 million new executable files monthly, big data pipelines are crucial for the security of our customers. At Avast we are leveraging Apache Spark machine learning libraries and TensorFlowOnSpark for a variety of tasks ranging from marketing and advertisement, through network security, to malware detection. This talk will cover our main cybersecurity use cases for Spark. After describing our cluster environment, we will first demonstrate anomaly detection on time series of threats: with thousands of types of attacks and malware, AI helps human analysts select and focus on the most urgent or dire threats. We will walk through our setup, from distributed training of deep neural networks with TensorFlow to deploying and monitoring a streaming anomaly detection application with the trained model. Next we will show how we use Spark for analysis and clustering of malicious files, and for large-scale experimentation to automatically process and handle changes in malware. Finally, we will compare Spark with the other tools we have used to solve these problems.
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...Databricks
BigDL is a distributed deep learning framework for Apache Spark open sourced by Intel. BigDL helps make deep learning more accessible to the Big Data community, by allowing them to continue the use of familiar tools and infrastructure to build deep learning applications. With BigDL, users can write their deep learning applications as standard Spark programs, which can then directly run on top of existing Spark or Hadoop clusters.
In this session, we will introduce BigDL, describe how our customers use it to build end-to-end ML/DL applications and the platforms on which it is deployed, provide an update on the latest improvements in BigDL v0.1, and discuss further developments and upcoming features of the BigDL v0.2 release (e.g., support for TensorFlow models, 3D convolutions, etc.).
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew RayDatabricks
Data scientists spend more time wrangling data than making models. Traditional tools like Pandas provide a very powerful data manipulation toolset. Transitioning to big data tools like PySpark allows one to work with much larger datasets, but can come at the cost of productivity.
In this session, learn about data wrangling in PySpark from the perspective of an experienced Pandas user. Topics will include best practices, common pitfalls, performance considerations and debugging.
We provide an update on developments in the intersection of the R and the broader machine learning ecosystems. These collections of packages enable R users to leverage the latest technologies for big data analytics and deep learning in their existing workflows, and also facilitate collaboration within multidisciplinary data science teams. Topics covered include – MLflow: managing the ML lifecycle with improved dependency management and more deployment targets – TensorFlow: TF 2.0 update and probabilistic (deep) machine learning with TensorFlow Probability – Spark: latest improvements and extensions, including text processing at scale with SparkNLP
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...Databricks
As semiconductor devices have advanced, manufacturing systems have improved the productivity and efficiency of wafer fabrication. As a result, the number of wafers yielded by the fabrication process has been rapidly increasing. However, current software systems for semiconductor wafers are not designed to process large numbers of wafers. To address this, BISTel (a world-class provider of manufacturing intelligence solutions and services for manufacturers) has built several big data products, such as Trace Analyzer (TA) and Map Analyzer (MA), using Apache Spark. TA analyzes raw trace data from a manufacturing process: it captures details on all variable changes, big and small, and provides a statistical summary of the traces (min, max, slope, average, etc.). Several of BISTel's customers, among them top-tier semiconductor companies, use TA to analyze the massive raw trace data from their manufacturing processes; by applying Apache Spark's APIs, TA is able to manage terabytes of data. MA is an advanced pattern recognition tool that sorts wafer yield maps and automatically identifies common yield-loss patterns. Some semiconductor companies use MA to identify clustering patterns across more than 100,000 wafers, which qualifies as big data in the semiconductor field. This talk will introduce these two Apache Spark-based products and present software techniques for handling large-scale semiconductor data.
Speakers: Seungchul Lee, Daeyoung Kim
Geospatial Analytics at Scale with Deep Learning and Apache SparkDatabricks
Deep Learning is now the standard in object detection, but it is not easy to analyze large numbers of images, especially interactively. Traditionally, there has been a gap between Deep Learning frameworks, which excel at image processing, and more traditional ETL and data science tools, which are usually not designed to handle huge batches of complex data types such as images.
In this talk, we show how manipulating large corpora of images can be accomplished in a few lines of code thanks to recent developments in Apache Spark. Thanks to Spark's unique ability to blend different libraries, we show how to start from satellite images and rapidly build complex queries on high-level information such as houses or buildings. This is possible thanks to Magellan, a geospatial package, and Deep Learning Pipelines, a library that streamlines the integration of Deep Learning frameworks in Spark. At the end of this session, you will walk away with the confidence that you can solve your own image detection problems at any scale thanks to the power of Spark.
Accelerating Data Science with Better Data Engineering on DatabricksDatabricks
Whether you’re processing IoT data from millions of sensors or building a recommendation engine to provide a more engaging customer experience, the ability to derive actionable insights from massive volumes of diverse data is critical to success. MediaMath, a leading adtech company, relies on Apache Spark to process billions of data points spanning ads, user cookies, impressions, clicks, and more — translating to several terabytes of data per day. To support the needs of the data science teams, data engineering must build data pipelines for both ETL and feature engineering that are scalable, performant, and reliable.
Join this webinar to learn how MediaMath leverages Databricks to simplify mission-critical data engineering tasks that surface data directly to clients and drive actionable business outcomes. This webinar will cover:
- Transforming TBs of data with RDDs and PySpark responsibly
- Using the JDBC connector to write results to production databases seamlessly
- Comparisons with a similar approach using Hive
Blue Pill/Red Pill: The Matrix of Thousands of Data StreamsDatabricks
Designing a streaming application that has to process data from one or two streams is easy: any streaming framework that provides scalability, high throughput, and fault tolerance will work. But when the number of streams grows into the hundreds or thousands, managing them can be daunting. How would you share resources among thousands of streams, all running 24×7? How would you manage their state, apply advanced streaming operations, or add and delete streams without restarting? This talk explains common scenarios and shows techniques that can handle thousands of streams using Spark Structured Streaming.
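One common technique for the many-streams problem is multiplexing many logical streams onto a small, fixed pool of physical jobs, so streams can come and go without launching new queries. A toy sketch of the bucketing step in plain Python (no Spark; the function name and job count are illustrative):

```python
import zlib

def assign_streams(stream_ids, num_jobs):
    """Bucket many logical streams onto a fixed pool of physical jobs
    using a stable hash, so adding or removing a stream never requires
    more than `num_jobs` running queries (a toy analogue of serving
    thousands of streams from a handful of Structured Streaming jobs)."""
    jobs = {i: [] for i in range(num_jobs)}
    for sid in stream_ids:
        jobs[zlib.crc32(sid.encode()) % num_jobs].append(sid)
    return jobs

# 1000 logical streams shared across 8 physical jobs.
jobs = assign_streams(["stream-%d" % i for i in range(1000)], num_jobs=8)
```

A real implementation would also need per-stream state keyed by stream id inside each shared query; this sketch only shows the resource-sharing idea.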
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Spark Summit
In Spark 2.0, we introduced Structured Streaming, which allows users to continually and incrementally update their view of the world as new data arrives, while still using the same familiar Spark SQL abstractions. In this keynote, I talk about the progress we’ve made since then on robustness, latency, expressiveness and observability, using examples of production end-to-end continuous applications.
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataGetInData
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
Presentation from the talk given by Mariusz during the Data Science Summit ML Edition.
Author: Mariusz Strzelecki
LinkedIn: https://www.linkedin.com/in/mariusz-strzelecki/
___
Company:
GetInData is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together some of the best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience implementing Big Data projects for Polish as well as foreign companies, including, among others, Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone and iZettle, plus many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Databricks
The prevailing issue with Operating Room (OR) scheduling in a hospital setting is that it is difficult to schedule and predict available OR block times. This leads to empty, unused operating rooms and longer waiting times for patients awaiting their procedures. In this three-part session, Ayad Shammout and Denny will show:
1) How we tried to solve this problem using traditional DW techniques
2) How we took advantage of the DW capabilities in Apache Spark and easily transitioned to Spark MLlib, so we could better predict available OR block times, resulting in better OR utilization and shorter wait times for patients.
3) Some of the key learnings we had when migrating from DW to Spark.
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Databricks
GE Aviation has hundreds of data scientists and engineers developing algorithms. The majority of these people do not have the time to learn Apache Spark and continue to develop on local machines in Python or R. We also have lots of historical code that was not developed for Spark. However, the business wanted to deploy to a Spark environment for scalability, as quickly as possible. So how did we bridge the gap? A data scientist and software engineer will co-present to share how we approached the problem of building, unifying and scaling these algorithms.
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Spark Summit
As enterprises move to cloud-based analytics, the risk of cloud security breaches poses a serious threat. Encrypting data at rest and in transit is a major first step. However, data must still be decrypted in memory for processing, exposing it to an attacker who has compromised the operating system or hypervisor. Trusted hardware such as Intel SGX has recently become available in latest-generation processors. Such hardware enables arbitrary computation on encrypted data while shielding it from a malicious OS or hypervisor. However, it still suffers from a significant side channel: access pattern leakage.
We present Opaque, a package for Apache Spark SQL that enables very strong security for SQL queries: data encryption, computation verification, and access pattern leakage protection (a.k.a. obliviousness). Opaque achieves these guarantees by introducing new oblivious distributed relational operators that provide 2000x performance gain over state of the art oblivious systems, as well as novel query planning techniques for these operators implemented using Catalyst.
Tuning and Monitoring Deep Learning on Apache SparkDatabricks
Deep Learning on Apache Spark has the potential for huge impact in research and industry. This talk will describe best practices for building deep learning pipelines with Spark.
Rather than comparing deep learning systems or specific optimizations, this talk will focus on issues that are common to many deep learning frameworks when running on a Spark cluster: optimizing cluster setup and data ingest, tuning the cluster, and monitoring long-running jobs. We will demonstrate the techniques we cover using Google’s popular TensorFlow library.
More specifically, we will cover typical issues users encounter when integrating deep learning libraries with Spark clusters. Clusters can be configured to avoid task conflicts on GPUs and to allow using multiple GPUs per worker. Setting up pipelines for efficient data ingest improves job throughput. Interactive monitoring facilitates both the work of configuration and checking the stability of deep learning jobs.
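The GPU task-conflict point can be illustrated with a 2017-era configuration trick (values are examples, not recommendations): setting `spark.task.cpus` equal to the executor's cores forces one task per executor, so concurrent tasks never contend for the same GPU.

```python
# Illustrative Spark configuration to avoid GPU task conflicts: with
# spark.task.cpus equal to spark.executor.cores, each executor runs one
# task at a time, so that task can claim the executor's GPU exclusively.
# (Values are examples; newer Spark versions also offer explicit GPU
# resource scheduling, which supersedes this workaround.)
conf = {
    "spark.executor.cores": "8",
    "spark.task.cpus": "8",          # one task per executor
    "spark.executor.memory": "32g",  # headroom for the DL framework
}
```

These keys would typically be passed via `spark-submit --conf` or a `SparkConf` object when launching the job.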
Speaker: Tim Hunter
This talk was originally presented at Spark Summit East 2017.
Spark + Flashblade: Spark Summit East talk by Brian GoldSpark Summit
Modern infrastructure and applications generate extraordinary volumes of log and telemetry data. At Pure Storage, we know this first hand: we have over 5PB of log data from production customers running our all-flash storage systems, from our engineering testbeds, and from test stations at manufacturing partners. Every part of our company — from engineering to sales — now depends on the insights we gather from this data. Given the diversity of our end users, it’s no surprise that our analysis tools comprise a broad mix of reporting queries, stream-processing operations, ad-hoc analyses, and deeper machine-learning algorithms. In this session, we will cover lessons learned from scaling our data warehouse and how we are leveraging Apache Spark’s capabilities as a central hub to meet our analytics demands.
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungSpark Summit
R is a very popular platform for Data Science. Apache Spark is a highly scalable data platform. How could we have the best of both worlds? How could a Data Scientist leverage the rich 9000+ packages on CRAN, and integrate Spark into their existing Data Science toolset?
In this talk we will walk through many examples of how several new features in Apache Spark 2.x enable this. We will also look at exciting changes already in, and coming next in, the Apache Spark 2.x releases.
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark Summit
Apache Spark 2.1.0 boosted the performance of Apache Spark SQL thanks to Project Tungsten software improvements. A further 16x speedup has been achieved by using Oracle’s innovations for Apache Spark SQL, made possible by Oracle’s Software in Silicon accelerator offload technologies.
Apache Spark SQL in-memory performance is becoming more important due to many factors. Users are now performing more advanced SQL processing on multi-terabyte workloads. In addition, on-prem and cloud servers are getting larger physical memory, enabling these huge workloads to be stored in memory. In this talk we will look at using Spark SQL for feature creation and feature generation within pipelines for Spark ML.
This presentation will explore workloads at scale and with complex interactions. We also provide best practices and tuning suggestions to support these kinds of workloads in real applications in cloud deployments, and discuss ideas for the next generation of the Tungsten project.
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Spark Summit
In the race to invent multi-million dollar business opportunities with exclusive insights, data scientists and engineers are hampered by a multitude of challenges just to make one use case a reality – the need to ingest data from multiple sources, apply real-time analytics, build machine learning algorithms, and intermix different data processing models, all while navigating around their legacy data infrastructure that is just not up to the task. This need has created the demand for Virtual Analytics, where the complexities of disparate data and technology silos have been abstracted away, coupled with a powerful range of analytics and processing horsepower, all in one unified data platform. This talk describes how Databricks is powering this revolutionary new trend with Apache Spark.
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Spark Summit
Scaling out doesn’t have to mean giving up transactions and efficient joins! Relational databases can scale horizontally, and using them as a store for Spark Streaming or batch computations can help cover areas in which Spark is typically weaker. Examples will be drawn from our experience using Citus (https://github.com/citusdata/citus), an open-source extension to Postgres, but lessons learned should be applicable to many databases.
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...Spark Summit
Legacy enterprise data warehouse (EDW) architectures, geared toward the day-to-day workloads of operational querying, reporting, and analytics, are often ill-equipped to handle the volume of data, traffic, and varied data types associated with a modern, ad-hoc analytics platform. Faced with the challenges of increasing pipeline speed, aggregation, and visualization in a simplified, self-service fashion, organizations are increasingly turning to some combination of Spark, Hadoop, Kafka, and proven analytical databases like Vertica as key enabling technologies to optimize their EDW architecture. Join us to learn how successful organizations have developed real-time streaming solutions with these technologies for a range of use cases, including IoT predictive maintenance.
Bringing HPC Algorithms to Big Data Platforms: Spark Summit East talk by Niko...Spark Summit
The talk will present a MPI-based extension of the Spark platform developed in the context of light source facilities. The background and rationale of this extension are described in the attached paper “Bringing the HPC reconstruction algorithms to Big Data platforms”[1], which has been presented at New York Scientific Data Summit (NYSDS), August 14-17, 2016 (talk: https://www.bnl.gov/nysds16/files/pdf/talks/NYSDS16%20Malitsky.pdf) Specifically, the paper highlighted a gap between two modern driving forces of the scientific discovery process: HPC and Big Data technologies. As a result, it proposed to extend the Spark platform with inter-worker communication for supporting scientific-oriented parallel applications. The approach was illustrated in the context of the Spark-based deployment of the SHARP MPI/GPU ptychographic solver. Aside from its practical value, this application represents a reference use case that captures the major technical aspects of other reconstruction tasks. In the NYSDS’16 paper, the implemented approach followed the CaffeOnSpark RDMA peer-to-peer model and augmented it with the RDMA address exchange server. By the Spark Summit, we plan to further advance this direction with the Spark-MPI generic solution based on the Hydra process management framework for supporting two major MPI implementations, MPICH and MVAPICH.
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...Spark Summit
Spark Streaming makes it easy to build scalable, robust stream processing applications — but only once you’ve made your data accessible to the framework. If your data is already in one of Spark Streaming’s well-supported message queuing systems, this is easy. If not, an ad hoc solution to import data may work for a single application, but trying to scale that approach to complex data pipelines integrating dozens of data sources and sinks with multi-stage processing quickly breaks down. Spark Streaming solves the realtime data processing problem, but to build large scale data pipeline we need to combine it with another tool that addresses data integration challenges.
The Apache Kafka project recently introduced a new tool, Kafka Connect, to make data import/export to and from Kafka easier. This talk will first describe some data pipeline anti-patterns we have observed and motivate the need for a tool designed specifically to bridge the gap between other data systems and stream processing frameworks. We will introduce Kafka Connect, starting with basic usage, its data model, and how a variety of systems can map to this model. Next, we’ll explain how building a tool specifically designed around Kafka allows for stronger guarantees, better scalability, and simpler operationalization compared to other general purpose data copying tools. Finally, we’ll describe how combining Kafka Connect and Spark Streaming, and the resulting separation of concerns, allows you to manage the complexity of building, maintaining, and monitoring large scale data pipelines.
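Kafka Connect connectors are configured declaratively rather than coded. The shape of a source-connector config looks roughly like this (the keys are standard Connect settings; the connector name, database URL, and topic prefix are made up for the example):

```python
# Illustrative Kafka Connect source-connector configuration, as it would
# be submitted as JSON to the Connect REST API. Connection details and
# names below are hypothetical.
connector_config = {
    "name": "orders-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "tasks.max": "4",                      # parallelism handled by Connect
        "connection.url": "jdbc:postgresql://db:5432/shop",
        "mode": "incrementing",                # track new rows by a column
        "incrementing.column.name": "id",
        "topic.prefix": "shop-",               # one topic per table
    },
}
```

The resulting topics (e.g. `shop-orders`) can then be consumed directly by a Spark Streaming job, which is the separation of concerns the talk describes.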
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...Spark Summit
Fraudsters attempt to pay for goods, flights, hotels – you name it – using stolen credit cards. This hurts both the trust of card holders and the business of vendors around the world. We built a Real-Time Fraud Prevention Engine using Open Source (Big Data) Software: Spark, Spark ML, H2O, Hive, Esper. In my talk I will highlight both the business and the technical challenges that we’ve faced and dealt with.
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...Spark Summit
Building a machine learning model is an iterative process. A data scientist will build many tens to hundreds of models before arriving at one that meets some acceptance criteria. However, the current style of model building is ad-hoc and there is no practical way for a data scientist to manage models that are built over time. In addition, there are no means to run complex queries on models and related data.
In this talk, we present ModelDB, a novel end-to-end system for managing machine learning (ML) models. Using client libraries, ModelDB automatically tracks and versions ML models in their native environments (e.g. spark.ml, scikit-learn). A common set of abstractions enable ModelDB to capture models and pipelines built across different languages and environments. The structured representation of models and metadata then provides a platform for users to issue complex queries across various modeling artifacts. Our rich web frontend provides a way to query ModelDB at varying levels of granularity.
ModelDB has been open-sourced at https://github.com/mitdbg/modeldb.
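The core bookkeeping ModelDB automates can be sketched as a toy tracker in plain Python (names and API are illustrative; real ModelDB hooks into spark.ml and scikit-learn automatically): each fit is recorded with its parameters and metrics so models stay queryable later.

```python
class ToyModelRegistry:
    """Minimal sketch of ModelDB-style tracking: record every trained
    model's parameters and metrics, then query across them."""
    def __init__(self):
        self._runs = []

    def log(self, name, params, metrics):
        # Version numbers increase per model name, like iterative builds.
        version = sum(1 for r in self._runs if r["name"] == name) + 1
        self._runs.append({"name": name, "version": version,
                           "params": params, "metrics": metrics})

    def best(self, name, metric):
        # Example of a query across modeling artifacts.
        runs = [r for r in self._runs if r["name"] == name]
        return max(runs, key=lambda r: r["metrics"][metric])

reg = ToyModelRegistry()
reg.log("churn-lr", {"regParam": 0.1}, {"auc": 0.81})
reg.log("churn-lr", {"regParam": 0.01}, {"auc": 0.84})
best = reg.best("churn-lr", "auc")  # the version-2 run with auc 0.84
```

ModelDB adds to this idea native-environment capture, a common cross-language abstraction, and a web frontend for querying at varying granularity.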
How to Integrate Spark MLlib and Apache Solr to Build Real-Time Entity Type R...Spark Summit
Understanding the types of entities expressed in a search query (Company, Skill, Job Title, etc.) enables more intelligent information retrieval based upon those entities than a traditional keyword-based search. Because search queries are typically very short, leveraging a traditional bag-of-words model to identify entity types would be inappropriate due to the lack of contextual information. We implemented a novel entity type recognition system which combines clues from different sources of varying complexity in order to collect real-world knowledge about query entities. We employ distributional semantic representations of query entities through two models: 1) contextual vectors generated from encyclopedic corpora like Wikipedia, and 2) high-dimensional word embedding vectors generated from millions of job postings using Spark MLlib. To enable real-time recognition of entity types, we use Apache Solr to cache the embedding vectors generated by Spark MLlib. This approach enables us to recognize entity types for entities expressed in search queries in less than 60 milliseconds, which makes the system applicable to real-time entity type recognition.
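The serving-time step amounts to comparing a query entity's embedding against per-type reference vectors (the talk does not specify the similarity measure; cosine similarity is the typical choice for word embeddings, and all vectors below are made up):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def entity_type(query_vec, type_centroids):
    """Return the entity type whose reference embedding is most similar
    to the query entity's embedding. In the described system, embeddings
    come from Spark MLlib and are cached in Solr for low-latency lookup."""
    return max(type_centroids, key=lambda t: cosine(query_vec, type_centroids[t]))

# Toy 3-dimensional embeddings (real ones are high-dimensional).
centroids = {"Company": [0.9, 0.1, 0.0], "Skill": [0.1, 0.9, 0.2]}
label = entity_type([0.8, 0.2, 0.1], centroids)  # closest to "Company"
```

The sub-60 ms latency in the talk comes from caching the precomputed vectors in Solr so only this lightweight comparison runs per query.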
Problem Solving Recipes Learned from Supporting Spark: Spark Summit East talk...Spark Summit
Due to Spark, writing big data applications has never been easier…at least until they stop being easy! At Lightbend we’ve helped our customers out of a number of hidden Spark pitfalls. Some crop up often; the ever-persistent OutOfMemoryError, the confusing NoSuchMethodError, shuffle and partition management, etc. Others occur less frequently; an obscure configuration affecting SQL broadcasts, struggles with speculating, a failing stream recovery due to RDD joins, S3 file reading leading to hangs, etc. All are intriguing! In this session we will provide insights into their origins and show how you can avoid making the same mistakes. Whether you are a seasoned Spark developer or a novice, you should learn some new tips and tricks that could save you hours or even days of debugging.
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Spark Summit
Data scientists write SQL queries every day. Very often they know how to write correct queries but don’t know why their queries are slow. This is more obvious in Spark than in Redshift, as Spark requires additional tuning, such as caching, while Redshift does the heavy lifting behind the scenes.
In this talk I will cover a few lessons we learned from migrating one of our biggest tables (900M+ rows/day) from AWS Redshift to Spark.
Specifically:
– Why and how do we migrate?
– How do we tune the query for Spark to gain a 10x speedup over a direct translation from Redshift?
– How do we scale the team on Spark (with 80+ people in our data science team)?
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...Spark Summit
So you know you want to write a streaming app, but any non-trivial streaming app developer has to think about these questions:
How do I manage offsets?
How do I manage state?
How do I make my Spark Streaming job resilient to failures? Can I avoid some failures?
How do I gracefully shutdown my streaming job?
How do I monitor and manage (e.g., retry logic for) my streaming job?
How can I better manage the DAG in my streaming job?
When to use checkpointing and for what? When not to use checkpointing?
Do I need a WAL when using streaming data source? Why? When don’t I need one?
In this talk, we’ll share practices that no one talks about when you start writing your streaming app, but you’ll inevitably need to learn along the way.
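The offset-management question, for instance, often comes down to one pattern: commit offsets durably only after a batch is fully processed, so that a crash replays work (at-least-once) rather than losing it. A minimal sketch of that pattern, with an in-memory dict standing in for a real offset store (ZooKeeper, a database, a checkpoint directory):

```python
# In-memory stand-in for a durable offset store:
# topic-partition -> last committed offset.
offset_store = {}

def load_offset(tp):
    return offset_store.get(tp, 0)

def commit_offset(tp, offset):
    offset_store[tp] = offset

def process_batch(tp, records):
    # Resume from the last committed offset.
    start = load_offset(tp)
    batch = records[start:]
    results = [r.upper() for r in batch]  # placeholder for real work
    # Commit ONLY after processing succeeds; a crash before this
    # line means the batch is replayed, not lost.
    commit_offset(tp, len(records))
    return results

records = ["a", "b", "c"]
print(process_batch("events-0", records))  # -> ['A', 'B', 'C']
print(load_offset("events-0"))             # -> 3
```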
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkSpark Summit
Many data scientists are already making heavy usage of the Jupyter ecosystem for analyzing data using interactive notebooks.
Apache Toree (incubating) is a Jupyter kernel designed to act as a gateway to Spark by enabling users to access Spark from standard Jupyter notebooks. This allows users to easily integrate Spark into their existing Jupyter deployments and to move between languages and contexts without needing to switch to a different set of tools.
Apache Toree is designed expressly for interactive work. It supports interpreters in Scala, Python, and R.
In this talk, I will cover the design of Toree, how it interacts with the Jupyter ecosystem and various ways in which users can extend the functionality of Apache Toree via a powerful plugin system.
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Spark Summit
Today there are several compliance use cases — archiving, e-discovery, supervision + surveillance, to name a few — that appear naturally suited as Hadoop workloads but haven’t seen wide adoption. In this talk, we’ll discuss common limitations, how Apache Spark helps, and propose some new blueprints as to how to modernize this architecture and disrupt existing solutions. Additionally, we’ll discuss the rising role of Apache Spark in this ecosystem; leveraging machine learning and advanced analytics in a space that has traditionally been restricted to fairly rote reporting.
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Spark Summit
Apache Spark has become a popular and successful way for Python programmers to parallelize and scale up their data processing. In many use cases, though, a PySpark job can perform worse than an equivalent job written in Scala. It is also costly to push and pull data between the user’s Python environment and the Spark master. In this talk, we’ll examine some of the data serialization and other interoperability issues, especially with Python libraries like pandas and NumPy, that are impacting PySpark performance, and the work that is being done to address them. This relates closely to other work on binary columnar serialization and data-exchange tools in development, such as Apache Arrow and Feather files.
Distributed Deep Learning At Scale On Apache Spark With BigDLYulia Tell
Intel recently released BigDL, an open source distributed deep learning framework for Apache Spark (https://github.com/intel-analytics/BigDL). It brings native support for deep learning functionalities to Spark, delivers orders-of-magnitude speedups over out-of-the-box open source DL frameworks (e.g., Caffe/Torch/TensorFlow) with respect to single-node Xeon performance, and efficiently scales out deep learning workloads on the Spark architecture. It also allows data scientists to perform distributed deep learning analysis on big data using familiar tools, including Python and notebooks.
In this talk, we will give an introduction to BigDL and show how Big Data users and data scientists can leverage it for deep learning analysis (such as image recognition, object detection, and NLP) on large amounts of data in a distributed fashion, allowing them to use their Big Data (e.g., Apache Hadoop and Spark) cluster as the unified data analytics platform for data storage, data processing and mining, feature engineering, traditional (non-deep) machine learning, and deep learning workloads.
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
Linaro is building an OpenStack based Developer Cloud. Here we present what was required to bring OpenStack to 64-bit ARM, the pitfalls, successes and lessons learnt; what’s missing and what’s next.
Analyzing Big data in R and Scala using Apache Spark 17-7-19Ahmed Elsayed
We can use data mining to make predictions about future data, mined from historical data, especially Big Data, using machine learning algorithms based on two clusters. One is intrinsic to managing the Big Data file system and is called Hadoop; the other is essential for fast analysis of Big Data and is called Apache Spark. To achieve this purpose we will use R (via RStudio) or Scala (via Zeppelin).
ABSTRACT: The ongoing big data revolution has revolutionized the way in which technology is used to empower new business segments like social networking and transform old business segments like traditional retail. However, the DNA that is used to build data processing platforms is evolving quite rapidly. There is a plethora of competing tools, technologies, and “religions” for how to build state-of-the-art data analysis frameworks. In this talk, I will go over five ways to build scalable, high-performance, long-lasting data analysis frameworks the wrong way. Surprisingly, the industry is full of examples of organizations building frameworks in this “wrong” way. Since the “right” way to build a technology framework depends on the key business drivers, it is my hope that this talk will spur a discussion on what the “right” way is for Pinterest. The talk will focus on technologies including “data plumbing” (e.g. tools in the Hadoop ecosystem) and statistical modeling methods (e.g. R and Python). In this talk, I’ll try to connect with platform builders, data scientists, and business decision makers.
BIO: Jignesh Patel is a Professor in Computer Sciences at the University of Wisconsin-Madison, where he also earned his Ph.D. He has worked in the area of databases (now fashionably called “big data”) for over two decades. He has won several best paper awards, and industry research awards. He is the recipient of the Wisconsin COW teaching award, and the U. Michigan College of Engineering Education Excellence Award. He has a strong interest in seeing research ideas transition to actual products. His Ph.D. thesis work was acquired by NCR/Teradata in 1997, and he also co-founded Locomatix -- a startup that built a platform to power real-time data-driven mobile services. Locomatix became part of Twitter in 2013. He is an ACM Distinguished Scientist and an IEEE Senior Member. He also serves on the board of Lands’ End, and advises a number of startups.
Spark Summit EU 2015: Lessons from 300+ production usersDatabricks
At Databricks, we have a unique view into over a hundred different companies trying out Spark for development and production use-cases, from their support tickets and forum posts. Having seen so many different workflows and applications, some discernible patterns emerge when looking at common performance and scalability issues that our users run into. This talk will discuss some of these common issues from an engineering and operations perspective, describing solutions and clarifying misconceptions.
In the past, emerging technologies took years to mature. In the case of big data, while effective tools are still emerging, the analytics requirements are changing rapidly, forcing businesses to either keep up or be left behind.
Making NumPy-style and Pandas-style code faster and able to run in parallel: Continuum has been working on scaled versions of NumPy and Pandas for four years. This talk describes how Numba and Dask provide scaled Python today.
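The core idea behind Dask’s scaled NumPy/Pandas can be sketched with the standard library alone: split the data into chunks, compute partial results in parallel, then combine them. Dask additionally builds lazy task graphs, schedules across machines, and handles spilling; the sketch below is only the conceptual skeleton:

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(data, n_chunks):
    # Split a list into roughly equal contiguous chunks.
    size = (len(data) + n_chunks - 1) // n_chunks
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, n_chunks=4):
    # Map the reduction over chunks in parallel, then combine
    # the partial results (the "tree reduction" Dask generalizes).
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = pool.map(sum, chunked(data, n_chunks))
    return sum(partials)

data = list(range(1_000_000))
print(parallel_sum(data))  # -> 499999500000
```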
BigDL webinar - Deep Learning Library for SparkDESMOND YUEN
BigDL is a distributed deep learning library for Apache Spark* and a unified Big Data platform driving analytics and data science.
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...MLconf
Convolutional Neural Networks at scale in Spark MLlib:
Jeremy Nixon will focus on the engineering and applications of a new algorithm built on top of MLlib. The presentation will focus on the methods the algorithm uses to automatically generate features to capture nonlinear structure in data, as well as the process by which it’s trained. Major aspects of that include compositional transformations over the data, convolution, and distributed backpropagation via SGD with adaptive gradients and an adaptive learning rate. Applications will look into how to use convolutional neural networks to model data in computer vision, natural language and signal processing. Details around optimal preprocessing, the type of structure that can be learned, and managing its ability to generalize will inform developers looking to apply nonlinear modeling tools to problems that they face.
GOAI: GPU-Accelerated Data Science DataSciCon 2017Joshua Patterson
The GPU Open Analytics Initiative, GOAI, is accelerating data science like never before. CPUs are not improving at the same rate as networking and storage, and by leveraging GPUs, data scientists can analyze more data than ever with less hardware. Learn more about how GPUs are accelerating data science (not just deep learning), and how to get started.
This is a presentation about big data with Java. In these slides, you can find why big data is so important and some of the tools that are used for creating big data applications, such as Apache Hadoop, Apache Spark, and Apache Kafka.
Introduction to Spark: Or how I learned to love 'big data' after all.Peadar Coyle
Slides from a talk I will give in early 2016 at the Luxembourg Data Science Meetup. The aim is to give an introduction to Apache Spark from a machine learning expert’s point of view. Based on various other tutorials out there. This will be aimed at non-specialists.
Similar to Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang and Yiheng Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
In this session we will present a configurable FPGA-based Spark SQL acceleration architecture. It targets leveraging the FPGA’s highly parallel computing capability to accelerate Spark SQL queries and, given the FPGA’s higher power efficiency compared to the CPU, to lower power consumption at the same time. The architecture consists of SQL query decomposition algorithms and fine-grained FPGA-based Engine Units which perform basic computations: substring, arithmetic, and logic operations. Using the SQL query decomposition algorithm, we are able to decompose a complex SQL query into basic operations, each of which is fed into an Engine Unit according to its pattern. SQL Engine Units are highly configurable and can be chained together to perform complex Spark SQL queries, so that one SQL query is ultimately transformed into a hardware pipeline. We will present performance benchmark results comparing queries on the FPGA-based Spark SQL acceleration architecture (Xeon E5 plus FPGA) to Spark SQL queries on a Xeon E5 alone, showing 10x to 100x improvements, and we will demonstrate one SQL query workload from a real customer.
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
In this talk, we’ll present techniques for visualizing large scale machine learning systems in Spark. These are techniques that are employed by Netflix to understand and refine the machine learning models behind Netflix’s famous recommender systems that are used to personalize the Netflix experience for their 99 million members around the world. Essential to these techniques is Vegas, a new OSS Scala library that aims to be the “missing MatPlotLib” for Spark/Scala. We’ll talk about the design of Vegas and its usage in Scala notebooks to visualize Machine Learning Models.
This presentation introduces how we design and implement a real-time processing platform using the latest Spark Structured Streaming framework to intelligently transform production lines in the manufacturing industry. In a traditional production line there is a variety of isolated structured, semi-structured, and unstructured data, such as sensor data, machine screen output, log output, and database records. There are two main data scenarios: 1) picture and video data, low in frequency but large in size; and 2) continuous data at high frequency, where each record is small but the total volume is very large, such as vibration data used to detect equipment quality. These data have the characteristics of streaming data: real-time, volatile, bursty, unordered, and unbounded. Making effective real-time decisions to retrieve value from these data is critical to smart manufacturing. The latest Spark Structured Streaming framework greatly lowers the bar for building highly scalable and fault-tolerant streaming applications. Thanks to Spark, we were able to build a low-latency, high-throughput, and reliable operation system covering data acquisition, transmission, analysis, and storage. An actual user case proved that the system meets the needs of real-time decision-making. The system greatly enhances predictive fault repair and production-line material-tracking efficiency, and can reduce the labor force needed for the production lines by about half.
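A toy sketch of the kind of per-record decision such a pipeline makes on high-frequency sensor data: keep a sliding window of recent vibration readings and flag a reading that deviates far from the window mean. The window size and threshold are illustrative assumptions, and in the real system Spark Structured Streaming would distribute this windowed logic:

```python
from collections import deque

class VibrationMonitor:
    def __init__(self, window=5, threshold=3.0):
        # Fixed-size sliding window of recent readings.
        self.readings = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value):
        # Flag the reading if it deviates from the window mean
        # by more than the threshold; then slide the window.
        alert = False
        if len(self.readings) == self.readings.maxlen:
            mean = sum(self.readings) / len(self.readings)
            alert = abs(value - mean) > self.threshold
        self.readings.append(value)
        return alert

m = VibrationMonitor()
stream = [1.0, 1.1, 0.9, 1.0, 1.1, 9.5, 1.0]
print([m.update(v) for v in stream])
# -> [False, False, False, False, False, True, False]
```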
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
As common sense would suggest, weather has a definite impact on traffic. But how much? And under what circumstances? Can we improve traffic (congestion) prediction given weather data? Predictive traffic is envisioned to significantly impact how drivers plan their day by alerting users before they travel, finding the best times to travel, and, over time, learning from new IoT data such as road conditions, incidents, etc. This talk will cover the traffic prediction work conducted jointly by IBM and the traffic data provider. As part of this work, we conducted a case study over five large metropolitan areas in the US, 2.58 billion traffic records, and 262 million weather records, to quantify the boost in accuracy of traffic prediction using weather data. We will provide an overview of our lambda architecture, with Apache Spark being used to build prediction models with weather and traffic data, and Spark Streaming used to score the model and provide real-time traffic predictions. This talk will also cover a suite of extensions to Spark to analyze geospatial and temporal patterns in traffic and weather data, as well as the suite of machine learning algorithms that were used with the Spark framework. Initial results of this work were presented at the National Association of Broadcasters meeting in Las Vegas in April 2017, and there is work to scale the system to provide predictions in over 100 cities. The audience will learn about our experience scaling with Spark in offline and streaming mode, building statistical and deep-learning pipelines with Spark, and techniques for working with geospatial and time-series data.
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
Graph is on the rise and it’s time to start learning about scalable graph analytics! In this session we will go over two Spark-based graph analytics frameworks: Tinkerpop and GraphFrames. While both frameworks can express very similar traversals, they have different performance characteristics and APIs. In this deep-dive-by-example presentation, we will demonstrate some common traversals and explain how, at the Spark level, each traversal is actually computed under the hood! Learn both the fluent Gremlin API as well as the powerful GraphFrame Motif API as we show examples of both simultaneously. No need to be familiar with graphs or Spark for this presentation, as we’ll be explaining everything from the ground up!
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
Building accurate machine learning models has been an art of data scientists, i.e., algorithm selection, hyperparameter tuning, feature selection, and so on. Recently, efforts to break through these “black arts” have begun. In cooperation with our partner, NEC Laboratories America, we have developed a Spark-based automatic predictive modeling system. The system automatically searches for the best algorithm, parameters, and features without any manual work. In this talk, we will share how the automation system is designed to exploit the attractive advantages of Spark. The evaluation with real open data demonstrates that our system can explore hundreds of predictive models and discover the most accurate ones in minutes on an Ultra High Density Server, which employs 272 CPU cores, 2TB memory, and 17TB SSD in a 3U chassis. We will also share open challenges in learning such a massive number of models on Spark, particularly from reliability and stability standpoints. This talk will cover the presentation already shown at Spark Summit SF’17 (#SFds5), but from a more technical perspective.
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
In Sweden, from the Rise ICE Data Center at www.hops.site, we are providing researchers with both Spark-as-a-Service and, more recently, Tensorflow-as-a-Service as part of the Hops platform. In this talk, we examine the different ways in which Tensorflow can be included in Spark workflows, from batch to streaming to structured streaming applications. We will analyse the different frameworks for integrating Spark with Tensorflow, from Tensorframes to TensorflowOnSpark to Databricks’ Deep Learning Pipelines. We introduce the different programming models supported and highlight the importance of cluster support for managing different versions of Python libraries on behalf of users. We will also present cluster management support for sharing GPUs, including Mesos and YARN (in Hops Hadoop). Finally, we will perform a live demonstration of training and inference for a TensorflowOnSpark application written in Jupyter that can read data from either HDFS or Kafka, transform the data in Spark, and train a deep neural network on Tensorflow. We will show how to debug the application using both the Spark UI and Tensorboard, and how to examine logs and monitor training.
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
With the rapid growth of available datasets, it is imperative to have good tools for extracting insight from big data. The Spark ML library has excellent support for performing at-scale data processing and machine learning experiments, but more often than not, Data Scientists find themselves struggling with issues such as: low level data manipulation, lack of support for image processing, text analytics and deep learning, as well as the inability to use Spark alongside other popular machine learning libraries. To address these pain points, Microsoft recently released The Microsoft Machine Learning Library for Apache Spark (MMLSpark), an open-source machine learning library built on top of SparkML that seeks to simplify the data science process and integrate SparkML Pipelines with deep learning and computer vision libraries such as the Microsoft Cognitive Toolkit (CNTK) and OpenCV. With MMLSpark, Data Scientists can build models with 1/10th of the code through Pipeline objects that compose seamlessly with other parts of the SparkML ecosystem. In this session, we explore some of the main lessons learned from building MMLSpark. Join us if you would like to know how to extend Pipelines to ensure seamless integration with SparkML, how to auto-generate Python and R wrappers from Scala Transformers and Estimators, how to integrate and use previously non-distributed libraries in a distributed manner and how to efficiently deploy a Spark library across multiple platforms.
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
The Next Accelerator Logging Service (NXCALS) is a new Big Data project at CERN aiming to replace the existing Oracle-based service.
The main purpose of the system is to store and present Controls/Infrastructure related data gathered from thousands of devices in the whole accelerator complex.
The data is used to operate the machines, improve their performance and conduct studies for new beam types or future experiments.
During this talk, Jakub will speak about NXCALS requirements and the design choices that led to the selected architecture based on Hadoop and Spark. He will present the Ingestion API, the abstractions behind the Meta-data Service, and the Spark-based Extraction API, where simple changes to the schema handling greatly improved the overall usability of the system. The system itself is not CERN-specific and can be of interest to other companies or institutes confronted with similar Big Data problems.
Powering a Startup with Apache Spark with Kevin KimSpark Summit
Between (a mobile app for couples, downloaded 20M times globally) uses Spark for everything from daily batch jobs for extracting metrics to analysis and dashboards. Spark is widely used by engineers and data analysts at Between; thanks to the performance and extensibility of Spark, data operations have become extremely efficient. The entire team, including biz dev, global operations, and designers, enjoys the data results, so Spark is empowering the whole company toward data-driven operation and thinking. Kevin, co-founder and data team leader of Between, will present how things are going at Between. Listeners will learn how a small and agile team lives with data (how we build our organization, culture, and technical base) after this presentation.
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
In many cases, Big Data becomes just another buzzword because of the lack of tools that can support both the technological requirements for developing and deploying the projects and the fluency of communication between the different profiles of people involved in them.
In this talk, we will present Moriarty, a set of tools for fast prototyping of Big Data applications that can be deployed in an Apache Spark environment. These tools support the creation of Big Data workflows using existing functional blocks or by creating new functional blocks. The created workflow can then be deployed on a Spark infrastructure and used through a REST API.
For a better understanding of Moriarty, the prototyping process, and the way it hides the Spark environment from Big Data users and developers, we will present it together with a couple of examples based on an Industry 4.0 success case and a logistics success case.
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
Large-scale testing of new data products or enhancements to existing products in a research and development environment can be a technical challenge for data scientists. In some cases, tools available to data scientists lack production-level capacity, whereas other tools do not provide the algorithms needed to run the methodology. At Nielsen, the Databricks platform provided a solution to both of these challenges. This breakout session will cover a specific Nielsen business case where two methodology enhancements were developed and tested at large-scale using the Databricks platform. Development and large-scale testing of these enhancements would not have been possible using standard database tools.
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
Data lineage tracking is one of the significant problems that financial institutions face when using modern big data tools. This presentation describes Spline – a data lineage tracking and visualization tool for Apache Spark. Spline captures and stores lineage information from internal Spark execution plans and visualizes it in a user-friendly manner.
Goal Based Data Production with Sim SimeonovSpark Summit
Since the invention of SQL and relational databases, data production has been about specifying how data is transformed through queries. While Apache Spark can certainly be used as a general distributed query engine, the power and granularity of Spark’s APIs enables a revolutionary increase in data engineering productivity: goal-based data production. Goal-based data production concerns itself with specifying WHAT the desired result is, leaving the details of HOW the result is achieved to a smart data warehouse running on top of Spark. That not only substantially increases productivity, but also significantly expands the audience that can work directly with Spark: from developers and data scientists to technical business users. With specific data and architecture patterns spanning the range from ETL to machine learning data prep and with live demos, this session will demonstrate how Spark users can gain the benefits of goal-based data production.
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
Have you imagined a simple machine learning solution able to prevent revenue leakage and monitor your distributed application? To answer this question, we offer a practical and simple machine learning solution for creating an intelligent monitoring application based on simple data analysis using Apache Spark MLlib. Our application uses linear regression models to make predictions and check whether the platform is experiencing any operational problems that could result in revenue losses. The application monitors distributed systems and provides notifications stating the problem detected, so that users can act quickly to avoid serious problems that directly impact the company’s revenue, reducing the time to action. We will present an architecture for not only a monitoring system, but also an active actor in our outage recoveries. At the end of the presentation you will have access to our training program’s source code, which you will be able to adapt and implement at your company. This solution already helped prevent about US$3M in losses last year.
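The underlying idea can be sketched in a few lines: fit a linear model to a healthy metric trend (e.g. transactions per hour), then flag observations whose residual exceeds a tolerance as possible revenue leaks. The talk uses Apache Spark MLlib’s linear regression; the closed-form least-squares fit below just keeps the sketch self-contained, and the numbers are made up:

```python
def fit_line(xs, ys):
    # Ordinary least squares for a single feature: y = slope*x + intercept.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def is_anomalous(model, x, observed, tolerance):
    # Large residual against the predicted trend -> raise an alert.
    slope, intercept = model
    predicted = slope * x + intercept
    return abs(observed - predicted) > tolerance

hours = [0, 1, 2, 3, 4]
volume = [100, 110, 120, 130, 140]   # healthy trend: +10/hour
model = fit_line(hours, volume)

print(is_anomalous(model, 5, 150, tolerance=15))  # -> False (on trend)
print(is_anomalous(model, 5, 60, tolerance=15))   # -> True  (possible leak)
```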
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
Getting Ready to Use Redis with Apache Spark is a technical tutorial designed to address integrating Redis with an Apache Spark deployment to increase the performance of serving complex decision models. To set the context for the session, we start with a quick introduction to Redis and the capabilities it provides. We cover the basic data types provided by Redis and the module system. Using an ad-serving use case, we look at how Redis can improve the performance and reduce the cost of using complex ML models in production. Attendees will be guided through the key steps of setting up and integrating Redis with Spark, including how to train a model using Spark, then load and serve it using Redis, as well as how to work with the Spark-Redis module. The capabilities of the Redis Machine Learning Module (redis-ml) will be discussed, focusing primarily on decision trees and regression (linear and logistic), with code examples to demonstrate how to use these features. At the end of the session, developers should feel confident building a prototype/proof-of-concept application using Redis and Spark. Attendees will understand how Redis complements Spark and how to use Redis to serve complex ML models with high performance.
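The train-then-serve pattern at the heart of the session can be sketched as follows: a model’s parameters are trained offline (with Spark, in the talk), serialized, and published to a low-latency store for serving. A plain dict stands in for Redis here, and the model key and weights are illustrative assumptions; with a real Redis client the set/get calls would be analogous:

```python
import json

# In-memory stand-in for Redis: key -> serialized model parameters.
model_store = {}

def publish_model(store, key, weights, intercept):
    # Serialize the trained parameters and push them to the store.
    store[key] = json.dumps({"weights": weights, "intercept": intercept})

def serve_score(store, key, features):
    # Fetch the model at serving time and compute a linear score.
    model = json.loads(store[key])
    return sum(w * f for w, f in zip(model["weights"], features)) \
        + model["intercept"]

publish_model(model_store, "ad-ctr-model", [0.5, -0.2], 0.1)
print(serve_score(model_store, "ad-ctr-model", [2.0, 1.0]))  # -> 0.9
```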
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
Here we present a general supervised framework for record deduplication and author disambiguation via Spark. This work differentiates itself in three ways. (1) The use of Databricks and AWS makes this a scalable implementation; compute costs are considerably lower than with traditional legacy technology running big boxes 24/7. Scalability is crucial, as Elsevier's Scopus data, the biggest scientific abstract repository, covers roughly 250 million authorships from 70 million abstracts spanning a few hundred years. (2) We create a fingerprint for each piece of content with deep learning and/or word2vec algorithms to expedite pairwise similarity calculation. These encoders substantially reduce compute time while maintaining semantic similarity (unlike traditional TF-IDF or predefined taxonomies). We will briefly discuss how to optimize word2vec training with high parallelization. Moreover, we show how these encoders can be used to derive a standard representation for all our entities, namely documents, authors, users, journals, etc. This standard representation reduces the recommendation problem to a pairwise similarity search, and hence offers a basic recommender for cross-product applications where no dedicated recommender engine has been designed. (3) Traditional author-disambiguation or record-deduplication algorithms are batch processes with little to no training data. We, however, have roughly 25 million authorships that have been manually curated or corrected based on user feedback. Since it is crucial to maintain historical profiles, we have developed a machine learning implementation that handles data streams and processes them in mini-batches or one document at a time. We will discuss how to measure the accuracy of such a system, how to tune it, and how to turn the raw pairwise-similarity output into final clusters.
Lessons from this talk can help any company that wants to integrate its data or deduplicate its user, customer, or product databases.
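The core of the fingerprint approach is that once every record has a dense vector, candidate duplicates reduce to a pairwise similarity search. A minimal illustration (not Elsevier's code; the vectors and threshold are made up) using cosine similarity:

```python
# Sketch: score candidate duplicates by cosine similarity of their
# "fingerprint" vectors (e.g., word2vec-derived) and threshold the result.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def duplicate_pairs(records, threshold=0.95):
    ids = list(records)
    pairs = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if cosine(records[ids[i]], records[ids[j]]) >= threshold:
                pairs.append((ids[i], ids[j]))
    return pairs

fingerprints = {
    "rec1": [0.9, 0.1, 0.0],
    "rec2": [0.89, 0.11, 0.01],  # near-duplicate of rec1
    "rec3": [0.0, 0.2, 0.95],
}
print(duplicate_pairs(fingerprints))  # → [('rec1', 'rec2')]
```

At Scopus scale the all-pairs loop would of course be replaced by blocking and distributed similarity joins on Spark; the scoring function is the part the encoders make cheap.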
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains, ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This work presents new efficient and scalable matrix processing and optimization techniques based on Spark. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced, as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics. The result of a matrix operation often serves as input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix techniques inside Spark SQL, and we optimize the matrix execution plan based on Spark SQL Catalyst. We conduct case studies on a series of ML models and matrix computations with special features on different datasets: PageRank, GNMF, BFGS, sparse matrix chain multiplications, and a biological data analysis. The open-source library ScaLAPACK and the array-based database SciDB are used for performance evaluation. Our experiments are performed on six real-world datasets: social network graphs (soc-pokec, cit-Patents, LiveJournal), Twitter2010, Netflix recommendation data, and a 1000 Genomes Project sample. Experiments demonstrate that our proposed techniques achieve up to an order-of-magnitude performance improvement.
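One idea the abstract mentions, estimating the sparsity of intermediate results so the optimizer can choose dense or sparse operators, can be sketched with a standard density estimate. This is an illustrative formula under an independence assumption, not necessarily the exact estimator MatFast uses: for C = A (m×k) · B (k×n), an entry of C is nonzero unless all k partial products are zero.

```python
# Sketch: estimate the density of a matrix product from its inputs' densities,
# assuming non-zeros are independently and uniformly distributed.
def estimate_product_density(density_a, density_b, k):
    # P(C[i][j] != 0) = 1 - P(all k partial products are zero)
    return 1.0 - (1.0 - density_a * density_b) ** k

# Two 1%-dense matrices with a large inner dimension can yield a
# noticeably denser product, which matters for operator selection:
print(round(estimate_product_density(0.01, 0.01, 10000), 3))  # → 0.632
```

Estimates like this let a plan optimizer decide, before executing anything, whether an intermediate should be materialized in a sparse or dense format and how much communication it will cost.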
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms like PageRank commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation that is compact and cache-friendly.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which have the same in-links, helps reduce duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
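The first optimization above, skipping converged vertices, can be sketched in plain Python. This is a minimal illustration, not the STICD implementation; it assumes the graph has no dangling nodes, and it skips recomputing a vertex once it and all of its in-neighbours have converged (at that point its rank can no longer move).

```python
# Power-iteration PageRank that skips work for converged vertices.
def pagerank_skip(in_edges, out_degree, d=0.85, tol=1e-10, iters=200):
    n = len(in_edges)
    rank = {v: 1.0 / n for v in in_edges}
    converged = dict.fromkeys(in_edges, False)
    for _ in range(iters):
        new_rank = dict(rank)
        for v, nbrs in in_edges.items():
            if converged[v] and all(converged[u] for u in nbrs):
                continue  # nothing feeding v has moved: skip the work
            new_rank[v] = (1 - d) / n + d * sum(rank[u] / out_degree[u] for u in nbrs)
            converged[v] = abs(new_rank[v] - rank[v]) < tol
        rank = new_rank
        if all(converged.values()):
            break
    return rank

# Tiny example graph: a -> b, a -> c, b -> c, c -> a
in_edges = {"a": ["c"], "b": ["a"], "c": ["a", "b"]}
out_degree = {"a": 2, "b": 1, "c": 1}
ranks = pagerank_skip(in_edges, out_degree)
print({v: round(r, 3) for v, r in ranks.items()})
```

The same per-vertex convergence flag is what makes the component-by-component (topological order) scheduling above possible: a whole component can be marked converged and never touched again.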
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Why BigDL?
Big data boosts deep learning, but a production ML/DL system is complex.
(Andrew Ng, Baidu; NIPS 2015 paper)
Why BigDL?
BigDL was open sourced on Dec 30, 2016.
§ Write deep learning applications as standard Spark programs
§ Run on top of existing Spark or Hadoop clusters (no changes to the clusters)
§ Rich deep learning support
§ High performance powered by Intel MKL and multi-threaded programming
§ Efficient scale-out with all-reduce communication on Spark
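The all-reduce pattern in the last bullet can be shown conceptually. This is not BigDL's implementation (which runs on Spark's block manager): in all-reduce training, every worker contributes its local gradient and receives the same combined gradient back, so all model replicas stay identical after each step.

```python
# Conceptual all-reduce: element-wise sum across workers, result broadcast
# back so every worker holds the same synchronized gradient.
def all_reduce(vectors):
    summed = [sum(col) for col in zip(*vectors)]
    return [list(summed) for _ in vectors]  # every worker gets the same result

# Three workers, each with a local gradient over 4 parameters:
local_grads = [[0.1, 0.2, 0.0, 0.4],
               [0.3, 0.0, 0.1, 0.1],
               [0.0, 0.1, 0.2, 0.0]]
synced = all_reduce(local_grads)
print(synced[0])
```

Real implementations (ring or tree all-reduce) avoid gathering everything in one place, but the invariant is the same: after the call, every worker holds the identical summed gradient.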
Fraud Transaction Detection
Fraud transaction detection is very important to finance companies. A good fraud detection solution can save a lot of money.
ML solution challenges
§ Data cleaning
§ Feature engineering
§ Unbalanced data
§ Hyperparameter tuning
Fraud Transaction Detection
§ Historical data is stored in Hive
§ Easy data preprocessing/cleaning with Spark SQL
§ Spark ML pipeline for complex feature engineering
§ Undersampling + bagging to solve the class-imbalance problem
§ Grid search for hyperparameter tuning
Powered by BigDL
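The "undersampling + bagging" bullet can be sketched in plain Python. This is a hedged illustration of the sampling scheme only, not the talk's code: each model in the ensemble trains on all fraud cases plus an equal-sized random sample of legitimate ones, and predictions are later combined by voting.

```python
# Sketch: build balanced training sets for a bagged ensemble from a
# heavily imbalanced dataset (5 fraud vs. 95 legitimate records).
import random

def make_bagged_sets(fraud, legit, n_models, seed=0):
    rng = random.Random(seed)
    # Each model sees all fraud cases plus a fresh undersample of legit ones.
    return [fraud + rng.sample(legit, len(fraud)) for _ in range(n_models)]

fraud = [("f%d" % i, 1) for i in range(5)]
legit = [("l%d" % i, 0) for i in range(95)]
sets = make_bagged_sets(fraud, legit, n_models=3)
print([len(s) for s in sets])  # → [10, 10, 10]
```

Because each model sees a different slice of the majority class, the ensemble uses far more of the legitimate data than a single undersampled model would, while every individual training set stays balanced.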
Product Defect Detection and Classification
Data source
§ Cameras installed on the manufacturing pipeline
Task
§ Detect defects in the photos
§ Classify the defects
Faster R-CNN
§ Faster R-CNN is a popular object detection framework
§ It shares features between the detection network and the region proposal network
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015.
Object Detection with Faster R-CNN
See the code at: https://github.com/intel-analytics/BigDL/pull/387
Language Model with RNN
Pipeline: Text Preprocessing → RNN Model Training → Sentence Generating
§ Sentence Tokenizer
§ Dictionary Building
§ Input Document Transformer
Generates sentences with regard to trigger words.
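The preprocessing and generation steps of such a pipeline can be shown with a toy stand-in. This sketch uses simple bigram counts instead of an RNN (BigDL's actual example trains an RNN in Scala); the corpus and trigger word are made up.

```python
# Toy pipeline: tokenize -> build dictionary -> "train" (count bigrams)
# -> generate a sentence from a trigger word.
from collections import defaultdict

corpus = "the king and the queen . the king spoke .".split()

# Dictionary building: word -> index
vocab = {w: i for i, w in enumerate(dict.fromkeys(corpus))}
print(len(vocab))  # → 6

# Count bigrams as a stand-in "language model"
bigrams = defaultdict(lambda: defaultdict(int))
for a, b in zip(corpus, corpus[1:]):
    bigrams[a][b] += 1

def generate(trigger, length=4):
    out = [trigger]
    for _ in range(length):
        nxt = bigrams.get(out[-1])
        if not nxt:
            break
        out.append(max(nxt, key=nxt.get))  # greedy: most frequent next word
    return " ".join(out)

print(generate("the"))
```

An RNN replaces the bigram table with a learned hidden state, which is what lets it capture context longer than one word, but the surrounding tokenizer/dictionary/generator plumbing is the same.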
RNN Model
See the code at:
https://github.com/intel-analytics/BigDL/tree/master/dl/src/main/scala/com/intel/analytics/bigdl/models/rnn
Learn from Shakespeare Poems
Output of RNN:
Long live the King . The King and Queen , and the Strange of the Veils of the rhapsodic .
and grapple, and the entreatments of the pressure .
Upon her head , and in the world ? `` Oh, the gods ! O Jove ! To whom the king : `` O
friends !
Her hair, nor loose ! If , my lord , and the groundlings of the skies . jocund and Tasso in
the Staggering of the Mankind . and
Fine-tune Caffe/Torch Model on Spark
[Diagram: a pre-trained Caffe or Torch model is loaded as a BigDL model and fine-tuned to predict image styles such as Melancholy, Sunny, and Macro]
• Train on a different dataset based on a pre-trained model
• Predict image style instead of type
• Save training time and improve accuracy
Image source: https://www.flickr.com/photos/
Integration with Spark Streaming
BigDL integrates with Spark Streaming for runtime training and prediction.
[Diagram: streaming sources (HDFS/S3, Kafka, Flume, Kinesis, Twitter) feed Spark Streaming RDDs, which are consumed by a BigDL model for training and prediction, with an evaluator and a stream writer on the output side]
Tight Integration with Spark SQL and DataFrames
df.select($"image")
  .withColumn("image_type", ImgClassifier("image"))
  .filter($"image_type" === "dog")
  .show()
Image classification on ImageNet (http://www.image-net.org)
More BigDL Examples
BigDL provides examples to help developers play with BigDL and get started with popular models.
https://github.com/intel-analytics/BigDL/wiki/Examples
Models (train and inference example code):
§ LeNet, Inception, VGG, ResNet, RNN, Auto-encoder
Examples:
• Text Classification
• Image Classification
• Load Torch/Caffe model