SlideShare a Scribd company logo
VeryLarge-ScaleDistributed
DeepLearningon BigDL
JasonDai,DingDing
Overview of BigDL
3
BigDL
BringingDeepLearningToBigDataPlatform
• Distributed deep learning framework for Apache
Spark*
• Make deep learning more accessible to big data users
and data scientists
• Write deep learning applications as standard Spark programs
• Run on existing Spark/Hadoop clusters (no changes needed)
• Feature parity with popular deep learning frameworks
• E.g., Caffe, Torch, Tensorflow, etc.
• High performance
• Powered by Intel MKL and multi-threaded programming
• Efficient scale-out
• Leveraging Spark for distributed training & inference
https://github.com/intel-analytics/BigDL
https://bigdl-project.github.io/
Spark Core
SQL SparkR Streaming
MLlib GraphX
ML Pipeline
DataFrame
4
Chasmb/wDeepLearningandBigDataCommunities
Average users (data engineers, data scientists, analysts, etc.)Deep learning experts
The
Chasm
5
BigDLAnsweringTheNeeds
Make deep learning more accessible to big data and data science communities
• Continue the use of familiar SW tools and HW infrastructure to build deep learning
applications
• Analyze “big data” using deep learning on the same Hadoop/Spark cluster where the data
are stored
• Add deep learning functionalities to the Big Data (Spark) programs and/or workflow
• Leverage existing Hadoop/Spark clusters to run deep learning applications
• Shared with other workloads (e.g., ETL, data warehouse, feature engineering, statistic machine
learning, graph analytics, etc.) in a dynamic and elastic fashion
6
FraudDetectionforUnionPay
Train one model
all features selected features
model
candidate model
sampled
partition
Training Data
…
normal
fraud
Train one model
Train one model
sampled
partition
sampled
partition
Post
Processing
Pre-
Processing
model
model
Spark Pipeline
Test Data
Predictions
Test
Spark DataFrameHive Table
Pre-processing
Feature
Engineering
Feature
Engineering
Feature
Selection
Model
Ensemble
SparkPipeline
Neural Network Model Using BigDL
Feature
Selection*
Model
Training
Model
Evaluation
& Fine Tune
https://mp.weixin.qq.com/s?__biz=MzI3NDAwNDUwNg==&mid=2648307335&idx=1&sn=8eb9f63eaf2e40e24a90601b9cc03d1f
DistributedTraining onBigDL
8
DistributedTrainingonBigDL
Training
for (i <- 1 to N) {
batch = next_batch()
output = model.forward(batch.input)
loss = criterion.forward(output, batch.target)
error = criterion.backward(output, batch.target)
model.backward(input, error)
optimMethod.optimize(model.weight, model.gradient)
}
Iterative Data parallel
Synchronous SGD
Mini-batch
9
RunasstandardSparkProgram
Spark
Program
DL App on Driver
Spark
Executor
(JVM)
Spark
Task
BigDL lib
Worker
Intel MKL
Standard
Spark jobs
Worker
Worker Worker
Worker
Spark
Executor
(JVM)
Spark
Task
BigDL lib
Worker
Intel MKL
BigDL
library
Spark
jobs
BigDL Program
Standard Spark jobs
• No changes to the Spark or Hadoop clusters needed
Iterative
• Each iteration of the training runs as a Spark job
Data parallel
• Each Spark task runs the same model on a subset of the data (batch)
10
TechniquesforLarge-ScaleDistributedTraining
• Optimizing parameter synchronization and aggregation
• Optimizing task scheduling
• Scaling batch size
11
ParameterSynchronizationinSparkMLlib
…
Partition 1
Partition 2
Partition n
Training Set
Sample
Sample
Sample
Worker
Worker
Worker
Driver
2
2
2
1
1
1
3
3
3
4
1
2
3
4
Broadcast weight
to each workerEach task computes
gradients
Each task sends gradients
for (tree) aggregation, and
then driver updates the
weight
12
ParameterSynchronizationinBigDL
…
Training Set
PS (Parameter Server) Architecture in BigDL
on top of Spark Block Manager
Partition 1 Partition 2 Partition n
Worker
Gradient
1
2
Weight
3
4
5
Worker
Gradient
1
2
Weight
3
4
5
Worker
Gradient
1
2
Weight
3
4
5
…
… …
… … … …
Peer-2-Peer All-Reduce synchronization
13
ParameterSynchronizationinBigDL
Parameter synchronization time as a fraction of
average compute time for Inception v1 training
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
14
TaskSchedulingOverheads
Spark overheads (task scheduling, task serde, task fetch) as
a fraction of average compute time for Inception v1 training
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
15
Drizzle
*Collaboration with Shivaram Venkataraman from UC Berkeley RICELab
• Low latency execution engine for Apache
Spark
• Fine-grained execution with coarse-grained
scheduling
• Group scheduling
• Schedule a group of iterations at once
• Fault tolerance, scheduling at group boundaries
• Coordinating shuffles: PRE-SCHEDULING
• Pre-schedule tasks on executors
• Trigger tasks once dependencies are met
16
ReducingSchedulingOverheadswithDrizzle
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
17
IncreasedMini-BatchSize
• Distributed synchronous mini-batch SGD
• Increased mini-batch size
total_batch_size = batch_size_per_worker * num_of_workers
• Can lead to loss in test accuracy
• State-of-art method for scaling mini-batch size*
• Linear scaling rule
• Warm-up strategy
• Layer-wise adaptive rate scaling
• Adding batch normalization
* “Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour”
* “Scaling SGD Batch Size to 32K for ImageNet Training”
18
Inceptionv1
Strategies
• Gradual warm-up
• Linear scaling
• Gradient clipping
• TODO: adding batch normalization
19
SSD(SingleShotMulti-boxDetector)
Strategies
• Warm-up w/ Adam
• Linear scaling
• Gradient clipping
Distributed Inference on BigDL
21
DistributedInferenceonBigDL
Inference
for (b <- 1 to D) {
input = next_data(i)
output = model.forward(input)
}
Embarrassingly (data) parallel in nature
Efficiently scale out using BigDL on
Apache Spark
22
ImageDetection&ExtractionPipeline
(usingSSD+DeepBitModels)
23
ChallengesofLarge-ScaleProcessinginGPUSolutions
• Reading images out takes a very long time
• Image pre-processing on HBase is very complex
• No existing software frameworks can be leveraged
• E.g., resource management, distributed data processing, fault tolerance, etc.
• Very challenging to scale out to massive amount of pictures
• Due to SW and HW infrastructure constraints
24
UpgradingtoBigDLSolutions
• Reuse existing Hadoop/Spark clusters for deep learning with no changes
• Efficiently scale out on Spark with superior performance
• Reading HBase data no longer a bottleneck
• Very easy to build the end-to-end pipeline in BigDL
• Image transformation and augmentation based on OpenCV on Spark
val preProcessor = BytesToMat() -> Resize(300, 300) -> …
val transformed = preProcessor(dataRdd)
• Directly Load pre-trained models (BigDL/Caffe/Torch/TensorFLow) into BigDL
val model = Module.loadCaffeModel(caffeDefPath, caffeModelPath)
25
ModelQuantizationforEfficientInferenceinBigDL
• Local quantization scheme converting floats to intergers
• Faster compute and smaller models
• Take advantage of SSE and AVX instructions on Xeon servers
• Supports pre-trained models (BigDL/Caffe/Torch/TensorFLow)
val model = Module.loadCaffeModel(caffeDefPath, caffeModelPath)
val quantizedModel = model.quantize()
• Quantized SSD model
• ~4x model size reduction
• >2x inference speedup
• ~0.001 mAP (mean average precision) loss
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
26
TryBigDLOut
Running BigDL, Deep Learning
for Apache Spark, on AWS*
(Amazon* Web Service)
https://aws.amazon.com/blogs/ai/running-
bigdl-deep-learning-for-apache-spark-on-aws/
Use BigDL on Microsoft*
Azure* HDInsight*
https://azure.microsoft.com/en-
us/blog/use-bigdl-on-hdinsight-spark-for-
distributed-deep-learning/
Intel’s BigDL on Databricks*
https://databricks.com/blog/2017/02/09/in
tels-bigdl-databricks.html
BigDL on Alibaba* Cloud
E-MapReduce*
https://yq.aliyun.com/articles/73347
BigDL on CDH* and Cloudera*
Data Science Workbench*
http://blog.cloudera.com/blog/2017/04/bigdl-
on-cdh-and-cloudera-data-science-workbench/
BigDL Distribution in Cray
Urika-XC Analytics Suite
http://www.cray.com/products/analytics/uri
ka-xc
27
PartnerWithUs
https://github.com/intel-analytics/BigDL http://software.intel.com/ai
Stay connected
28
@intelnervana
@intelAI
#intelAI
intelnervana.com
30
LegalNotices&disclaimers
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel
representative to obtain the latest forecast, schedule, specifications and roadmaps.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the
OEM or retailer. No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult
other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit
http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and
provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and
uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata
are available on request.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced
data are accurate.
Intel, the Intel logo, Pentium, Celeron, Atom, Core, Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2016 Intel Corporation.
Intel Confidential

More Related Content

What's hot

Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Databricks
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
Alluxio, Inc.
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Databricks
 
Transparent Hardware Acceleration for Deep Learning
Transparent Hardware Acceleration for Deep LearningTransparent Hardware Acceleration for Deep Learning
Transparent Hardware Acceleration for Deep Learning
Indrajit Poddar
 
CI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel KobranCI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel Kobran
Databricks
 
ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016
Keith Kraus
 
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Databricks
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI Today
DESMOND YUEN
 
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
VMware Tanzu
 
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
VMware Tanzu
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud Environments
Databricks
 
Harnessing Spark Catalyst for Custom Data Payloads
Harnessing Spark Catalyst for Custom Data PayloadsHarnessing Spark Catalyst for Custom Data Payloads
Harnessing Spark Catalyst for Custom Data Payloads
Simeon Fitch
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
iguazio
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Databricks
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
inside-BigData.com
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
inside-BigData.com
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Mathieu Dumoulin
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
Keith Kraus
 
Distributed Deep Learning on Spark
Distributed Deep Learning on SparkDistributed Deep Learning on Spark
Distributed Deep Learning on Spark
Mathieu Dumoulin
 
Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Gr...
Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Gr...Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Gr...
Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Gr...
VMware Tanzu
 

What's hot (20)

Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
 
Transparent Hardware Acceleration for Deep Learning
Transparent Hardware Acceleration for Deep LearningTransparent Hardware Acceleration for Deep Learning
Transparent Hardware Acceleration for Deep Learning
 
CI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel KobranCI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel Kobran
 
ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016ASGARD Splunk Conf 2016
ASGARD Splunk Conf 2016
 
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI Today
 
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
Pivotal Greenplum in Action on AWS, Azure, and GCP - Greenplum Summit 2018
 
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud Environments
 
Harnessing Spark Catalyst for Custom Data Payloads
Harnessing Spark Catalyst for Custom Data PayloadsHarnessing Spark Catalyst for Custom Data Payloads
Harnessing Spark Catalyst for Custom Data Payloads
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
Distributed Deep Learning on Spark
Distributed Deep Learning on SparkDistributed Deep Learning on Spark
Distributed Deep Learning on Spark
 
Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Gr...
Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Gr...Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Gr...
Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Gr...
 

Similar to Very large scale distributed deep learning on BigDL

Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Alluxio, Inc.
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
testSri1
 
How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
How Data Volume Affects Spark Based Data Analytics on a Scale-up ServerHow Data Volume Affects Spark Based Data Analytics on a Scale-up Server
How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
Ahsan Javed Awan
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Ali Hodroj
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
DESMOND YUEN
 
Graph Data Science at Scale
Graph Data Science at ScaleGraph Data Science at Scale
Graph Data Science at Scale
Neo4j
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Anant Corporation
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
Databricks
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Intel® Software
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
Nicolas Poggi
 
GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017
Joshua Patterson
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
Indrajit Poddar
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Ahsan Javed Awan
 
Performance advantages of Hadoop ETL offload with the Intel processor-powered...
Performance advantages of Hadoop ETL offload with the Intel processor-powered...Performance advantages of Hadoop ETL offload with the Intel processor-powered...
Performance advantages of Hadoop ETL offload with the Intel processor-powered...
Principled Technologies
 
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Data Con LA
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
Connected Data World
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Databricks
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
inoshg
 
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsTrends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systems
Igor José F. Freitas
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Alluxio, Inc.
 

Similar to Very large scale distributed deep learning on BigDL (20)

Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
NVIDIA Rapids presentation
NVIDIA Rapids presentationNVIDIA Rapids presentation
NVIDIA Rapids presentation
 
How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
How Data Volume Affects Spark Based Data Analytics on a Scale-up ServerHow Data Volume Affects Spark Based Data Analytics on a Scale-up Server
How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
 
Graph Data Science at Scale
Graph Data Science at ScaleGraph Data Science at Scale
Graph Data Science at Scale
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
 
GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017GOAI: GPU-Accelerated Data Science DataSciCon 2017
GOAI: GPU-Accelerated Data Science DataSciCon 2017
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
 
Performance advantages of Hadoop ETL offload with the Intel processor-powered...
Performance advantages of Hadoop ETL offload with the Intel processor-powered...Performance advantages of Hadoop ETL offload with the Intel processor-powered...
Performance advantages of Hadoop ETL offload with the Intel processor-powered...
 
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
 
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsTrends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systems
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 

More from DESMOND YUEN

2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdf2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdf
DESMOND YUEN
 
Small Is the New Big
Small Is the New BigSmall Is the New Big
Small Is the New Big
DESMOND YUEN
 
Intel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product BriefIntel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product Brief
DESMOND YUEN
 
Cryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable ProcessorsCryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable Processors
DESMOND YUEN
 
Intel 2021 Product Security Report
Intel 2021 Product Security ReportIntel 2021 Product Security Report
Intel 2021 Product Security Report
DESMOND YUEN
 
How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...
DESMOND YUEN
 
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, MoreNASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
DESMOND YUEN
 
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
DESMOND YUEN
 
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIES
PUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIESPUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIES
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIES
DESMOND YUEN
 
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPEBUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
DESMOND YUEN
 
An Introduction to Semiconductors and Intel
An Introduction to Semiconductors and IntelAn Introduction to Semiconductors and Intel
An Introduction to Semiconductors and Intel
DESMOND YUEN
 
Changing demographics and economic growth bloom
Changing demographics and economic growth bloomChanging demographics and economic growth bloom
Changing demographics and economic growth bloom
DESMOND YUEN
 
Intel’s Impacts on the US Economy
Intel’s Impacts on the US EconomyIntel’s Impacts on the US Economy
Intel’s Impacts on the US Economy
DESMOND YUEN
 
2021 private networks infographics
2021 private networks infographics2021 private networks infographics
2021 private networks infographics
DESMOND YUEN
 
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
DESMOND YUEN
 
Increasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery NetworksIncreasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery Networks
DESMOND YUEN
 
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
DESMOND YUEN
 
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm.""Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
DESMOND YUEN
 
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
DESMOND YUEN
 
Machine programming
Machine programmingMachine programming
Machine programming
DESMOND YUEN
 

More from DESMOND YUEN (20)

2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdf2022-AI-Index-Report_Master.pdf
2022-AI-Index-Report_Master.pdf
 
Small Is the New Big
Small Is the New BigSmall Is the New Big
Small Is the New Big
 
Intel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product BriefIntel® Blockscale™ ASIC Product Brief
Intel® Blockscale™ ASIC Product Brief
 
Cryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable ProcessorsCryptography Processing with 3rd Gen Intel Xeon Scalable Processors
Cryptography Processing with 3rd Gen Intel Xeon Scalable Processors
 
Intel 2021 Product Security Report
Intel 2021 Product Security ReportIntel 2021 Product Security Report
Intel 2021 Product Security Report
 
How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...How can regulation keep up as transformation races ahead? 2022 Global regulat...
How can regulation keep up as transformation races ahead? 2022 Global regulat...
 
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, MoreNASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
NASA Spinoffs Help Fight Coronavirus, Clean Pollution, Grow Food, More
 
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
A Survey on Security and Privacy Issues in Edge Computing-Assisted Internet o...
 
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIES
PUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIESPUTTING PEOPLE FIRST:  ITS IS SMART COMMUNITIES AND  CITIES
PUTTING PEOPLE FIRST: ITS IS SMART COMMUNITIES AND CITIES
 
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPEBUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
BUILDING AN OPEN RAN ECOSYSTEM FOR EUROPE
 
An Introduction to Semiconductors and Intel
An Introduction to Semiconductors and IntelAn Introduction to Semiconductors and Intel
An Introduction to Semiconductors and Intel
 
Changing demographics and economic growth bloom
Changing demographics and economic growth bloomChanging demographics and economic growth bloom
Changing demographics and economic growth bloom
 
Intel’s Impacts on the US Economy
Intel’s Impacts on the US EconomyIntel’s Impacts on the US Economy
Intel’s Impacts on the US Economy
 
2021 private networks infographics
2021 private networks infographics2021 private networks infographics
2021 private networks infographics
 
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
Transforming the Modern City with the Intel-based 5G Smart City Road Side Uni...
 
Increasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery NetworksIncreasing Throughput per Node for Content Delivery Networks
Increasing Throughput per Node for Content Delivery Networks
 
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
 
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm.""Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
"Life and Learning After One-Hundred Years: Trust Is The Coin Of The Realm."
 
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
Telefónica views on the design, architecture, and technology of 4G/5G Open RA...
 
Machine programming
Machine programmingMachine programming
Machine programming
 

Recently uploaded

一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 

Recently uploaded (20)

一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 

Very large scale distributed deep learning on BigDL

  • 3. 3 BigDL BringingDeepLearningToBigDataPlatform • Distributed deep learning framework for Apache Spark* • Make deep learning more accessible to big data users and data scientists • Write deep learning applications as standard Spark programs • Run on existing Spark/Hadoop clusters (no changes needed) • Feature parity with popular deep learning frameworks • E.g., Caffe, Torch, Tensorflow, etc. • High performance • Powered by Intel MKL and multi-threaded programming • Efficient scale-out • Leveraging Spark for distributed training & inference https://github.com/intel-analytics/BigDL https://bigdl-project.github.io/ Spark Core SQL SparkR Streaming MLlib GraphX ML Pipeline DataFrame
  • 4. 4 Chasmb/wDeepLearningandBigDataCommunities Average users (data engineers, data scientists, analysts, etc.)Deep learning experts The Chasm
  • 5. 5 BigDLAnsweringTheNeeds Make deep learning more accessible to big data and data science communities • Continue the use of familiar SW tools and HW infrastructure to build deep learning applications • Analyze “big data” using deep learning on the same Hadoop/Spark cluster where the data are stored • Add deep learning functionalities to the Big Data (Spark) programs and/or workflow • Leverage existing Hadoop/Spark clusters to run deep learning applications • Shared with other workloads (e.g., ETL, data warehouse, feature engineering, statistic machine learning, graph analytics, etc.) in a dynamic and elastic fashion
  • 6. 6 FraudDetectionforUnionPay Train one model all features selected features model candidate model sampled partition Training Data … normal fraud Train one model Train one model sampled partition sampled partition Post Processing Pre- Processing model model Spark Pipeline Test Data Predictions Test Spark DataFrameHive Table Pre-processing Feature Engineering Feature Engineering Feature Selection Model Ensemble SparkPipeline Neural Network Model Using BigDL Feature Selection* Model Training Model Evaluation & Fine Tune https://mp.weixin.qq.com/s?__biz=MzI3NDAwNDUwNg==&mid=2648307335&idx=1&sn=8eb9f63eaf2e40e24a90601b9cc03d1f
  • 8. 8 DistributedTrainingonBigDL Training for (i <- 1 to N) { batch = next_batch() output = model.forward(batch.input) loss = criterion.forward(output, batch.target) error = criterion.backward(output, batch.target) model.backward(input, error) optimMethod.optimize(model.weight, model.gradient) } Iterative Data parallel Synchronous SGD Mini-batch
  • 9. 9 RunasstandardSparkProgram Spark Program DL App on Driver Spark Executor (JVM) Spark Task BigDL lib Worker Intel MKL Standard Spark jobs Worker Worker Worker Worker Spark Executor (JVM) Spark Task BigDL lib Worker Intel MKL BigDL library Spark jobs BigDL Program Standard Spark jobs • No changes to the Spark or Hadoop clusters needed Iterative • Each iteration of the training runs as a Spark job Data parallel • Each Spark task runs the same model on a subset of the data (batch)
  • 10. 10 TechniquesforLarge-ScaleDistributedTraining • Optimizing parameter synchronization and aggregation • Optimizing task scheduling • Scaling batch size
  • 11. 11 ParameterSynchronizationinSparkMLlib … Partition 1 Partition 2 Partition n Training Set Sample Sample Sample Worker Worker Worker Driver 2 2 2 1 1 1 3 3 3 4 1 2 3 4 Broadcast weight to each workerEach task computes gradients Each task sends gradients for (tree) aggregation, and then driver updates the weight
  • 12. 12 ParameterSynchronizationinBigDL … Training Set PS (Parameter Server) Architecture in BigDL on top of Spark Block Manager Partition 1 Partition 2 Partition n Worker Gradient 1 2 Weight 3 4 5 Worker Gradient 1 2 Weight 3 4 5 Worker Gradient 1 2 Weight 3 4 5 … … … … … … … Peer-2-Peer All-Reduce synchronization
  • 13. 13 ParameterSynchronizationinBigDL Parameter synchronization time as a fraction of average compute time for Inception v1 training For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 14. 14 TaskSchedulingOverheads Spark overheads (task scheduling, task serde, task fetch) as a fraction of average compute time for Inception v1 training For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 15. 15 Drizzle *Collaboration with Shivaram Venkataraman from UC Berkeley RICELab • Low latency execution engine for Apache Spark • Fine-grained execution with coarse-grained scheduling • Group scheduling • Schedule a group of iterations at once • Fault tolerance, scheduling at group boundaries • Coordinating shuffles: PRE-SCHEDULING • Pre-schedule tasks on executors • Trigger tasks once dependencies are met
  • 16. 16 ReducingSchedulingOverheadswithDrizzle For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 17. 17 IncreasedMini-BatchSize • Distributed synchronous mini-batch SGD • Increased mini-batch size total_batch_size = batch_size_per_worker * num_of_workers • Can lead to loss in test accuracy • State-of-art method for scaling mini-batch size* • Linear scaling rule • Warm-up strategy • Layer-wise adaptive rate scaling • Adding batch normalization * “Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour” * “Scaling SGD Batch Size to 32K for ImageNet Training”
  • 18. 18 Inceptionv1 Strategies • Gradual warm-up • Linear scaling • Gradient clipping • TODO: adding batch normalization
  • 19. 19 SSD(SingleShotMulti-boxDetector) Strategies • Warm-up w/ Adam • Linear scaling • Gradient clipping
  • 21. 21 DistributedInferenceonBigDL Inference for (b <- 1 to D) { input = next_data(i) output = model.forward(input) } Embarrassingly (data) parallel in nature Efficiently scale out using BigDL on Apache Spark
  • 23. 23 ChallengesofLarge-ScaleProcessinginGPUSolutions • Reading images out takes a very long time • Image pre-processing on HBase is very complex • No existing software frameworks can be leveraged • E.g., resource management, distributed data processing, fault tolerance, etc. • Very challenging to scale out to massive amount of pictures • Due to SW and HW infrastructure constraints
  • 24. 24 UpgradingtoBigDLSolutions • Reuse existing Hadoop/Spark clusters for deep learning with no changes • Efficiently scale out on Spark with superior performance • Reading HBase data no longer a bottleneck • Very easy to build the end-to-end pipeline in BigDL • Image transformation and augmentation based on OpenCV on Spark val preProcessor = BytesToMat() -> Resize(300, 300) -> … val transformed = preProcessor(dataRdd) • Directly Load pre-trained models (BigDL/Caffe/Torch/TensorFLow) into BigDL val model = Module.loadCaffeModel(caffeDefPath, caffeModelPath)
  • 25. 25 ModelQuantizationforEfficientInferenceinBigDL • Local quantization scheme converting floats to intergers • Faster compute and smaller models • Take advantage of SSE and AVX instructions on Xeon servers • Supports pre-trained models (BigDL/Caffe/Torch/TensorFLow) val model = Module.loadCaffeModel(caffeDefPath, caffeModelPath) val quantizedModel = model.quantize() • Quantized SSD model • ~4x model size reduction • >2x inference speedup • ~0.001 mAP (mean average precision) loss For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
  • 26. 26 TryBigDLOut Running BigDL, Deep Learning for Apache Spark, on AWS* (Amazon* Web Service) https://aws.amazon.com/blogs/ai/running- bigdl-deep-learning-for-apache-spark-on-aws/ Use BigDL on Microsoft* Azure* HDInsight* https://azure.microsoft.com/en- us/blog/use-bigdl-on-hdinsight-spark-for- distributed-deep-learning/ Intel’s BigDL on Databricks* https://databricks.com/blog/2017/02/09/in tels-bigdl-databricks.html BigDL on Alibaba* Cloud E-MapReduce* https://yq.aliyun.com/articles/73347 BigDL on CDH* and Cloudera* Data Science Workbench* http://blog.cloudera.com/blog/2017/04/bigdl- on-cdh-and-cloudera-data-science-workbench/ BigDL Distribution in Cray Urika-XC Analytics Suite http://www.cray.com/products/analytics/uri ka-xc
  • 29.
  • 30. 30 LegalNotices&disclaimers This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Intel, the Intel logo, Pentium, Celeron, Atom, Core, Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2016 Intel Corporation. Intel Confidential