SlideShare a Scribd company logo
BigDL: Bringing Ease of Use of Deep
Learning for Apache Spark
Jason Dai
Radhika Rangarajan
BigDL
Bringing Deep Learning To Big Data Platform
https://github.com/intel-analytics/BigDL
Spark Core
SQL SparkR Streaming
MLlib GraphX
ML Pipeline
DataFrame
BigDL
http://software.intel.com/bigdl
• Distributed deep learning framework for Apache Spark*
• Make deep learning more accessible to big data users
and data scientists
• Write deep learning applications as standard Spark programs
• Run on existing Spark/Hadoop clusters (no changes needed)
• Feature parity with popular deep learning frameworks
• E.g., Caffe, Torch, Tensorflow, etc.
• High performance
• Powered by Intel MKL and multi-threaded programming
• Efficient scale-out
• Leveraging Spark for distributed training & inference
Why BigDL?
Chasm b/w Deep Learning and Big Data
Communities
Average users (Big Data users, data scientists, analysts, etc.)Deep learning experts
The
Chasm
BigDL Answering The Needs
Make deep learning more accessible to big data and data science communities
• Continue the use of familiar SW tools and HW infrastructure to build deep learning
applications
• Analyze “big data” using deep learning on the same Hadoop/Spark cluster where the
data are stored
• Add deep learning functionalities to the Big Data (Spark) programs and/or workflow
• Leverage existing Hadoop/Spark clusters to run deep learning applications
• Dynamically share with other workloads (e.g., ETL, data warehouse, feature engineering, statistic
machine learning, graph analytics, etc.)
Overview of BigDL
Distributed Execution of BigDL Programs
Training
for (i <- 1 to N) {
batch = next_batch()
output = model.forward(batch.input)
loss = criterion.forward(output, batch.target)
error = criterion.backward(output, batch.target)
model.backward(input, error)
optimMethod.optimize(model.weight, model.gradient)
}
Iterative Data parallel
Synchronous SGD
Mini-batch
Inference
for (b <- 1 to D) {
input = next_data(i)
output = model.forward(input)
}
Embarrassingly (data)
parallel in nature
Run as standard Spark Programs
Spark
Program
DL App on Driver
Spark
Executor
(JVM)
Spark
Task
BigDL lib
Worker
Intel MKL
Standard
Spark jobs
Worker
Worker Worker
Worker
Spark
Executor
(JVM)
Spark
Task
BigDL lib
Worker
Intel MKL
BigDL
library
Spark
jobs
BigDL Program
Standard Spark jobs
• No changes to the Spark or Hadoop clusters needed
Iterative
• Each iteration of the training runs as a Spark job
Data parallel
• Each Spark task runs the same model on a subset of the data (batch)
Synchronous Mini-Batch SGD
…
Training Set
Block Manager (BM) / Parameter Server
Partition 1 Partition 2 Partition n
Worker
Gradient
1
2
Weight
3
4
5
Worker
Gradient
1
2
Weight
3
4
5
Worker
Gradient
1
2
Weight
3
4
5
…
… …
… … … …
Peer-2-Peer All-Reduce synchronization
implemented on top of Block Manager in Spark
BigDL APIs
Tensor
• Multi-dimensional array of numeric types (e.g., Float, Double, etc.)
• Generic support of numerical computing (using Intel MKL)
Sample
• Tuple of Tensors (Input, Target) representing a training / test sample
Module
• (100+) Layers of neural network (such as ReLU, Linear, SpatialConvolution, Sequential, etc.)
Criterion
• Given input and target, computing gradient per given loss function
Optimizer
• Local & distributed optimizer (synchronous mini-batch SGD)
• OptimMethod: SGD, Adam, AdaGrad, RMSprop, etc.
Integration with Spark SQL, DataFrames and
Structure Streaming
ImageNet dataset (http://www.image-net.org)
Kafka File
Data Frame
(Batch/Stream)
BigDL UDF
Filtered Data
Frame
(Batch/Stream)
df.select($’image’)
.withColumn( “image_type”,
ImgClassifyInBigDL(“image”))
.filter($’image_type’ == ‘dog’)
Seamless support of deep learning functionalities
in SQL queries and stream processing
Integration with Spark Streaming
and ML Pipelines
Spark
Streaming RDDs
EvaluatorBigDL
Model
StreamWriterHDFS/S3
Kafka
Flume
Kinesis
Twitter
Train
Predict
Spark Streaming
TransformerData
Frames
DLClassifier
TransformerTransformer
Train
Predict
DLEstimator
ML Pipelines
Latest BigDL Features
BigDL Releases
https://github.com/intel-analytics/BigDL/wiki/Downloads
• Open sourced in Dec 2016
• Latest release v0.1.0 (beginning of April’17)
• v0.1.1 targeting the coming week
• Next major release v0.2.0 soon
BigDL 0.1: Python Support & Notebook
Python API support
• Built on top of PySpark
• Python 2.7 support since BigDL 0.1.0
• Python 3.5 support since BigDL 0.1.1
Auto-packing Python dependency for YARN
• No need to pre-install any Python packages in
the cluster
Jupyter notebook integration
https://github.com/intel-analytics/BigDL/blob/branch-
0.1/pyspark/dl/example/tutorial/simple_text_classification/text_classfication.ipynb
BigDL 0.1: Integration With TensorBoard for
Visualization
https://github.com/intel-analytics/BigDL/blob/branch-
0.1/pyspark/dl/example/tutorial/simple_text_classification/text_classfication.ipynb
TensorBoard integration for visualizing BigDL
program behaviors (available since BigDL 0.1.0)
BigDL 0.1: Recurrent Neural Network Support
Simple RNN
LSTM (since BigDL 0.1.0)
GRU (since BigDL 0.1.0)
BigDL 0.2: Functional APIs
Functional API support in upcoming releases
(BigDL 0.2)
• Similar to that in Keras and PyTroch
• Each layer / module is callable
• Much easier to construct complex models
• E.g., multi-input multi-output models, directed acyclic
graphs, etc.
fc1 = Linear(4, 2)()
fc2 = Linear(4, 2)()
cadd = CAddTable()([fc1, fc2])
output1 = ReLU()(cadd)
output2 = Threshold(10.0)(cadd)
optimizer = Optimizer(
model = Model([fc1, fc2], [output1, output2]),
training_rdd=train_rdd,
criterion=ClassNLLCriterion(),
end_trigger=MaxEpoch(max_epoch),
batch_size=batch_size,
optim_method=Adagrad(learningrate=0.01,
learningrate_decay=0.0002))
train_model = optimizer.optimize()
BigDL 0.2: Models Interoperability Support
(e.g., between TensorFlow, Caffe, Torch, BigDL models)
Load existing TensorFlow (in addition to Caffe and Torch) models into BigDL
• Allow model deployment in distributed analytics pipelines using Spark
• Allow for transfer learning, model tuning, model sharing (b/w data scientists and data engineers), etc.
Generate TensorFlow, Caffe and Torch models
• Allow BigDL models to be loaded into existing DL frameworks
Run BigDL (model training & inference) as standalone program in local JVM
• Allow flexible deployment and serving of BigDL models in Java applications
BigDL 0.2: Advanced DL Functionalities
Bi-directional RNN Tree-LSTMRecurrent Dropout
“Hybrid Speech Recognition with Deep
Bidirectional LSTM”, Graves et al., ASRU
2013
“Improved Semantic Representations
From Tree-Structured Long Short-Term
Memory Networks”, Tai et al., ACL 2015
“A Theoretically Grounded Application of
Dropout in Recurrent Neural Networks”,
Gal et al., NIPS 2016
BigDL 0.2: Advanced DL Functionalities
Convolutional LSTM
“Convolutional LSTM Network: a machine learning approach
for precipitation nowcasting”, Shi et al., NIPS 2015
“V-Net: Fully Convolutional Neural Networks for Volumetric
Medical Image Segmentation”, Milletari et al., 3DV 2016
3D Convolution
BigDL Use Cases
Cloud & Big Data Platforms
Running BigDL, Deep Learning for
Apache Spark, on AWS* (Amazon*
Web Service)
https://aws.amazon.com/blogs/ai/running-bigdl-
deep-learning-for-apache-spark-on-aws/
Use BigDL on Microsoft* Azure*
HDInsight*
https://azure.microsoft.com/en-us/blog/use-bigdl-on-
hdinsight-spark-for-distributed-deep-learning/
Intel’s BigDL on Databricks*
https://databricks.com/blog/2017/02/09/int
els-bigdl-databricks.html
BigDL on Alibaba* Cloud
E-MapReduce*
https://yq.aliyun.com/articles/73347
BigDL on CDH* and Cloudera*
Data Science Workbench*
http://blog.cloudera.com/blog/2017/04/bigdl-
on-cdh-and-cloudera-data-science-workbench/
Recommendation
Wide & Deep Learning for Recommender Systems
Cheng et al, DLRS 2016
Neural Collaborative Filtering
He et al, WWW 2017
Churn Analysis
WTTE-RNN (Weibull Time-to-event Recurrent Neural Network)
“WTTE-RNN : Weibull Time To Event Recurrent Neural Network”, Egil Martinsson,
Master’s thesis in Engineering Mathematics & Computational Science, Chalmers
University of Technology and University of Gothenburg
Fraud Detection in UnionPay
Train one model
all features selected features
model
candidate model
sampled
partition
Training Data
…
normal
fraud
Train one model
Train one model
sampled
partition
sampled
partition
Post
Processing
Pre-
Processing
model
model
Spark Pipeline
Test Data
Predictions
Test
Spark DataFrameHive Table
Pre-processing
Feature
Engineering
Feature
Engineering
Feature
Selection
Model
Ensemble
SparkPipeline
Neural Network Model Using BigDL
Feature
Selection* Model
Training
Model
Evaluation
& Fine Tune
https://mp.weixin.qq.com/s?__biz=MzI3NDAwNDUwNg==&mid=2648307335&idx=1&sn=8eb9f63eaf2e40e24a90601b9cc03d1f
Sentiment Analysis for Natural Language
“Improved Semantic Representations From Tree-
Structured Long Short-Term Memory Networks”, Tai
et al., ACL 2015
“When Are Tree Structures Necessary for Deep
Learning of Representations?”, Li et al., EMNLP 2015
Speech Recognition
“Deep speech 2: end-to-end speech recognition
in English and mandarin”, Amodei et al., ICML'16
Image Recognition and Object Detection
Faster R-CNN: Towards Real-Time Object
Detection with Region Proposal Networks,
Ren et al., NIPS 2015
SSD: Single Shot MultiBox Detector,
Liu et al., ECCV 2016
Image Recognition and Object Detection
Pascal VOC data sets (http://host.robots.ox.ac.uk/pascal/VOC/)
Defect Detection in Manufacturing
Scrape Proposals
Extraction
Acid and Oil
Proposals Extraction
`
Size Rescale
Pixel Normalization
To Gray Image
Preprocessing
Parallel Proposal
Extraction
CNN Training and
Prediction
Evaluation
On Test
Data
Retuning the models:
Hidden Layers
Learning Rate
Proposal Algorithm
… …
Label on Image
Preprocessing
Training /
scoring
Proposal
Extraction
Big Data
management
3D Medical Imaging
https://www.ucsf.edu/news/2017/01/405536/ucsf-intel-join-forces-
develop-deep-learning-analytics-health-care
https://www.kaggle.com/c/intel-mobileodt-cervical-cancer-screening
Partner With Us
https://github.com/intel-analytics/BigDL http://software.intel.com/ai
Legal Disclaimers
• Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or
retailer.
• No computer system can be absolutely secure.
• Tests document performance of components on a particular test, in specific systems.
Differences in hardware, software, or configuration will affect actual performance. Consult
other sources of information to evaluate performance as you consider your purchase. For
more complete information about performance and benchmark results, visit
http://www.intel.com/performance.
Intel, the Intel logo, Xeon, Xeon phi, Lake Crest, etc. are trademarks of Intel Corporation in the
U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2017 Intel Corporation
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai and Radhika Rangarajan

More Related Content

What's hot

Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Databricks
 
Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache Spark
Databricks
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
Databricks
 
Databricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog FoodDatabricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog Food
Databricks
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving System
Databricks
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Databricks
 
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Databricks
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
Databricks
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Databricks
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Databricks
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL Releases
DataWorks Summit
 
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Databricks
 
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
Spark Summit
 
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Spark Summit
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Spark Summit
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
Keith Kraus
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Databricks
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Databricks
 
Video Games at Scale: Improving the gaming experience with Apache Spark
Video Games at Scale: Improving the gaming experience with Apache SparkVideo Games at Scale: Improving the gaming experience with Apache Spark
Video Games at Scale: Improving the gaming experience with Apache Spark
Spark Summit
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
Databricks
 

What's hot (20)

Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
 
Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache Spark
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
Databricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog FoodDatabricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog Food
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving System
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
 
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren wi...
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL Releases
 
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
 
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
 
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer wit...
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
Video Games at Scale: Improving the gaming experience with Apache Spark
Video Games at Scale: Improving the gaming experience with Apache SparkVideo Games at Scale: Improving the gaming experience with Apache Spark
Video Games at Scale: Improving the gaming experience with Apache Spark
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 

Similar to BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai and Radhika Rangarajan

Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
DESMOND YUEN
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
DESMOND YUEN
 
Distributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLDistributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDL
Yulia Tell
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Databricks
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
Databricks
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
Databricks
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
Ned Shawa
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Jason Dai
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Databricks
 
FIWARE Global Summit - Knowage Hands On: Visualizing Data Insights
FIWARE Global Summit - Knowage Hands On: Visualizing Data InsightsFIWARE Global Summit - Knowage Hands On: Visualizing Data Insights
FIWARE Global Summit - Knowage Hands On: Visualizing Data Insights
FIWARE
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
ModusOptimum
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
Alluxio, Inc.
 
Scaling spark on kubernetes at Lyft
Scaling spark on kubernetes at LyftScaling spark on kubernetes at Lyft
Scaling spark on kubernetes at Lyft
Li Gao
 
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Databricks
 
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Jen Aman
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Alluxio, Inc.
 
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
VMware Tanzu
 

Similar to BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai and Radhika Rangarajan (20)

Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
 
Distributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLDistributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDL
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
 
FIWARE Global Summit - Knowage Hands On: Visualizing Data Insights
FIWARE Global Summit - Knowage Hands On: Visualizing Data InsightsFIWARE Global Summit - Knowage Hands On: Visualizing Data Insights
FIWARE Global Summit - Knowage Hands On: Visualizing Data Insights
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
Scaling spark on kubernetes at Lyft
Scaling spark on kubernetes at LyftScaling spark on kubernetes at Lyft
Scaling spark on kubernetes at Lyft
 
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
 
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
AI on Greenplum Using
 Apache MADlib and MADlib Flow - Greenplum Summit 2019
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 

Recently uploaded (20)

原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 

BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai and Radhika Rangarajan

  • 1. BigDL: Bringing Ease of Use of Deep Learning for Apache Spark Jason Dai Radhika Rangarajan
  • 2. BigDL Bringing Deep Learning To Big Data Platform https://github.com/intel-analytics/BigDL Spark Core SQL SparkR Streaming MLlib GraphX ML Pipeline DataFrame BigDL http://software.intel.com/bigdl • Distributed deep learning framework for Apache Spark* • Make deep learning more accessible to big data users and data scientists • Write deep learning applications as standard Spark programs • Run on existing Spark/Hadoop clusters (no changes needed) • Feature parity with popular deep learning frameworks • E.g., Caffe, Torch, Tensorflow, etc. • High performance • Powered by Intel MKL and multi-threaded programming • Efficient scale-out • Leveraging Spark for distributed training & inference
  • 4. Chasm b/w Deep Learning and Big Data Communities Average users (Big Data users, data scientists, analysts, etc.)Deep learning experts The Chasm
  • 5. BigDL Answering The Needs Make deep learning more accessible to big data and data science communities • Continue the use of familiar SW tools and HW infrastructure to build deep learning applications • Analyze “big data” using deep learning on the same Hadoop/Spark cluster where the data are stored • Add deep learning functionalities to the Big Data (Spark) programs and/or workflow • Leverage existing Hadoop/Spark clusters to run deep learning applications • Dynamically share with other workloads (e.g., ETL, data warehouse, feature engineering, statistic machine learning, graph analytics, etc.)
  • 7. Distributed Execution of BigDL Programs Training for (i <- 1 to N) { batch = next_batch() output = model.forward(batch.input) loss = criterion.forward(output, batch.target) error = criterion.backward(output, batch.target) model.backward(input, error) optimMethod.optimize(model.weight, model.gradient) } Iterative Data parallel Synchronous SGD Mini-batch Inference for (b <- 1 to D) { input = next_data(i) output = model.forward(input) } Embarrassingly (data) parallel in nature
  • 8. Run as standard Spark Programs Spark Program DL App on Driver Spark Executor (JVM) Spark Task BigDL lib Worker Intel MKL Standard Spark jobs Worker Worker Worker Worker Spark Executor (JVM) Spark Task BigDL lib Worker Intel MKL BigDL library Spark jobs BigDL Program Standard Spark jobs • No changes to the Spark or Hadoop clusters needed Iterative • Each iteration of the training runs as a Spark job Data parallel • Each Spark task runs the same model on a subset of the data (batch)
  • 9. Synchronous Mini-Batch SGD … Training Set Block Manager (BM) / Parameter Server Partition 1 Partition 2 Partition n Worker Gradient 1 2 Weight 3 4 5 Worker Gradient 1 2 Weight 3 4 5 Worker Gradient 1 2 Weight 3 4 5 … … … … … … … Peer-2-Peer All-Reduce synchronization implemented on top of Block Manager in Spark
  • 10. BigDL APIs Tensor • Multi-dimensional array of numeric types (e.g., Float, Double, etc.) • Generic support of numerical computing (using Intel MKL) Sample • Tuple of Tensors (Input, Target) representing a training / test sample Module • (100+) Layers of neural network (such as ReLU, Linear, SpatialConvolution, Sequential, etc.) Criterion • Given input and target, computing gradient per given loss function Optimizer • Local & distributed optimizer (synchronous mini-batch SGD) • OptimMethod: SGD, Adam, AdaGrad, RMSprop, etc.
  • 11. Integration with Spark SQL, DataFrames and Structure Streaming ImageNet dataset (http://www.image-net.org) Kafka File Data Frame (Batch/Stream) BigDL UDF Filtered Data Frame (Batch/Stream) df.select($’image’) .withColumn( “image_type”, ImgClassifyInBigDL(“image”)) .filter($’image_type’ == ‘dog’) Seamless support of deep learning functionalities in SQL queries and stream processing
  • 12. Integration with Spark Streaming and ML Pipelines Spark Streaming RDDs EvaluatorBigDL Model StreamWriterHDFS/S3 Kafka Flume Kinesis Twitter Train Predict Spark Streaming TransformerData Frames DLClassifier TransformerTransformer Train Predict DLEstimator ML Pipelines
  • 14. BigDL Releases https://github.com/intel-analytics/BigDL/wiki/Downloads • Open sourced in Dec 2016 • Latest release v0.1.0 (beginning of April’17) • v0.1.1 targeting the coming week • Next major release v0.2.0 soon
  • 15. BigDL 0.1: Python Support & Notebook Python API support • Built on top of PySpark • Python 2.7 support since BigDL 0.1.0 • Python 3.5 support since BigDL 0.1.1 Auto-packing Python dependency for YARN • No need to pre-install any Python packages in the cluster Jupyter notebook integration https://github.com/intel-analytics/BigDL/blob/branch- 0.1/pyspark/dl/example/tutorial/simple_text_classification/text_classfication.ipynb
  • 16. BigDL 0.1: Integration With TensorBoard for Visualization https://github.com/intel-analytics/BigDL/blob/branch- 0.1/pyspark/dl/example/tutorial/simple_text_classification/text_classfication.ipynb TensorBoard integration for visualizing BigDL program behaviors (available since BigDL 0.1.0)
  • 17. BigDL 0.1: Recurrent Neural Network Support Simple RNN LSTM (since BigDL 0.1.0) GRU (since BigDL 0.1.0)
  • 18. BigDL 0.2: Functional APIs Functional API support in upcoming releases (BigDL 0.2) • Similar to that in Keras and PyTroch • Each layer / module is callable • Much easier to construct complex models • E.g., multi-input multi-output models, directed acyclic graphs, etc. fc1 = Linear(4, 2)() fc2 = Linear(4, 2)() cadd = CAddTable()([fc1, fc2]) output1 = ReLU()(cadd) output2 = Threshold(10.0)(cadd) optimizer = Optimizer( model = Model([fc1, fc2], [output1, output2]), training_rdd=train_rdd, criterion=ClassNLLCriterion(), end_trigger=MaxEpoch(max_epoch), batch_size=batch_size, optim_method=Adagrad(learningrate=0.01, learningrate_decay=0.0002)) train_model = optimizer.optimize()
  • 19. BigDL 0.2: Models Interoperability Support (e.g., between TensorFlow, Caffe, Torch, BigDL models) Load existing TensorFlow (in addition to Caffe and Torch) models into BigDL • Allow model deployment in distributed analytics pipelines using Spark • Allow for transfer learning, model tuning, model sharing (b/w data scientists and data engineers), etc. Generate TensorFlow, Caffe and Torch models • Allow BigDL models to be loaded into existing DL frameworks Run BigDL (model training & inference) as standalone program in local JVM • Allow flexible deployment and serving of BigDL models in Java applications
  • 20. BigDL 0.2: Advanced DL Functionalities Bi-directional RNN Tree-LSTMRecurrent Dropout “Hybrid Speech Recognition with Deep Bidirectional LSTM”, Graves et al., ASRU 2013 “Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks”, Tai et al., ACL 2015 “A Theoretically Grounded Application of Dropout in Recurrent Neural Networks”, Gal et al., NIPS 2016
  • 21. BigDL 0.2: Advanced DL Functionalities Convolutional LSTM “Convolutional LSTM Network: a machine learning approach for precipitation nowcasting”, Shi et al., NIPS 2015 “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation”, Milletari et al., 3DV 2016 3D Convolution
  • 23. Cloud & Big Data Platforms Running BigDL, Deep Learning for Apache Spark, on AWS* (Amazon* Web Service) https://aws.amazon.com/blogs/ai/running-bigdl- deep-learning-for-apache-spark-on-aws/ Use BigDL on Microsoft* Azure* HDInsight* https://azure.microsoft.com/en-us/blog/use-bigdl-on- hdinsight-spark-for-distributed-deep-learning/ Intel’s BigDL on Databricks* https://databricks.com/blog/2017/02/09/int els-bigdl-databricks.html BigDL on Alibaba* Cloud E-MapReduce* https://yq.aliyun.com/articles/73347 BigDL on CDH* and Cloudera* Data Science Workbench* http://blog.cloudera.com/blog/2017/04/bigdl- on-cdh-and-cloudera-data-science-workbench/
  • 24. Recommendation Wide & Deep Learning for Recommender Systems Cheng et al, DLRS 2016 Neural Collaborative Filtering He et al, WWW 2017
  • 25. Churn Analysis WTTE-RNN (Weibull Time-to-event Recurrent Neural Network) “WTTE-RNN : Weibull Time To Event Recurrent Neural Network”, Egil Martinsson, Master’s thesis in Engineering Mathematics & Computational Science, Chalmers University of Technology and University of Gothenburg
  • 26. Fraud Detection in UnionPay Train one model all features selected features model candidate model sampled partition Training Data … normal fraud Train one model Train one model sampled partition sampled partition Post Processing Pre- Processing model model Spark Pipeline Test Data Predictions Test Spark DataFrameHive Table Pre-processing Feature Engineering Feature Engineering Feature Selection Model Ensemble SparkPipeline Neural Network Model Using BigDL Feature Selection* Model Training Model Evaluation & Fine Tune https://mp.weixin.qq.com/s?__biz=MzI3NDAwNDUwNg==&mid=2648307335&idx=1&sn=8eb9f63eaf2e40e24a90601b9cc03d1f
  • 27. Sentiment Analysis for Natural Language “Improved Semantic Representations From Tree- Structured Long Short-Term Memory Networks”, Tai et al., ACL 2015 “When Are Tree Structures Necessary for Deep Learning of Representations?”, Li et al., EMNLP 2015
  • 28. Speech Recognition “Deep speech 2: end-to-end speech recognition in English and mandarin”, Amodei et al., ICML'16
  • 29. Image Recognition and Object Detection Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Ren et al., NIPS 2015 SSD: Single Shot MultiBox Detector, Liu et al., ECCV 2016
  • 30. Image Recognition and Object Detection Pascal VOC data sets (http://host.robots.ox.ac.uk/pascal/VOC/)
  • 31. Defect Detection in Manufacturing Scrape Proposals Extraction Acid and Oil Proposals Extraction ` Size Rescale Pixel Normalization To Gray Image Preprocessing Parallel Proposal Extraction CNN Training and Prediction Evaluation On Test Data Retuning the models: Hidden Layers Learning Rate Proposal Algorithm … … Label on Image Preprocessing Training / scoring Proposal Extraction Big Data management
  • 34.
  • 35. Legal Disclaimers • Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. • No computer system can be absolutely secure. • Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Intel, the Intel logo, Xeon, Xeon phi, Lake Crest, etc. are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2017 Intel Corporation