SlideShare a Scribd company logo
Deep Learning On Spark
Using BigDL on Qubole
Dash Desai
Technology Evangelist
@iamontheinet
Some Basic Concepts
Copyright 2017 © Qubole
What is Machine Learning?
Gives ‘computers the ability to learn without
being explicitly programmed’ - Wikipedia
Copyright 2017 © Qubole
What is Deep Learning?
Form of ML that uses a model of computing—
inspired by the structure of the brain
Copyright 2017 © Qubole
Deep Learning Applications
Computer Vision / Image Recognition / Object Detection
Speech Recognition / Natural Language Processing (NLP)
Recommendation Systems (Products, Matchmaking, etc.)
Prediction (Stock Market, Healthcare, etc.)
Anomaly Detection (Cybersecurity, etc.)
Copyright 2017 © Qubole
What is Apache Spark?
A fast and general-purpose engine
for large-scale, distributed data
processing
MLlib
Spark’s scalable machine learning library
High-quality algorithms; 100x faster than MapReduce
Usable in Java, Scala, Python, and R
Copyright 2017 © Qubole
Deep Learning: On Apache Spark
Copyright 2017 © Qubole
Deep Learning: Other Popular Non-Spark Options
TensorFlow* (Google)
• Natively distributed out-of-the-box
Keras
• Naturally runs on distributed frameworks/back-ends
• Theano, MXNet (CMU, MIT, NYU), TensorFlow, CNTK (Microsoft)
*Not to be confused with TensorFlow On Spark (TFOS) by Yahoo
BigDL
Copyright 2017 © Qubole
What is BigDL?
Distributed deep learning library
for Apache Spark
Open sourced by Intel (in Dec 2016)
Feature parity with DL frameworks such as Caffe, Torch
Integrates with Spark ML pipeline and Spark Streaming
Supports Model snapshots
Intel MKL (Math Kernel Library); multi-threading within
each Spark task
Copyright 2017 © Qubole
Cont…
Includes 100+ Layers (highest level building block in DL)
Includes 20+ Loss functions (help with model fitting)
Optimization methods include SGD, Adagard, LBFGS
Numeric computing via Tensor & high level neural networks
Scaling: synchronous mini-batch SGD and all-reduce
communication on Spark
What is BigDL?
Copyright 2017 © Qubole
BigDL vs TensorFrames
TensorFrames — can call TF from individual
partitions of a DataFrame or an RDD (in PySpark)…
However, since TF is not natively integrated
with Spark, it does not support distributed deep
learning such as for model training or fine
tuning.
Copyright 2017 © Qubole
BigDL vs TensorFlow on Spark (TFOS), Caffe
TensorFlow on Spark* (TFOS) and Caffe on Spark —
use Spark executors to launch TF or Caffe instances
on the cluster…
However, model training, predictions, etc. are
performed outside of Spark across multiple TF or
Caffe instances…
• Run as standalone jobs outside of the pipeline
• Very fine-grained/limited interaction with
analytics pipeline
*Not to be confused with natively distributed TensorFlow by Google
Copyright 2017 © Qubole
How Does BigDL Work
Copyright 2017 © Qubole
Distributed Deep Learning: Two Methods
Copyright 2017 © Qubole
Distributed Deep Learning: BigDL
<Insert Demo Here >> YAY!/>
Copyright 2017 © Qubole
Demo: Recognize Handwritten Digits
On
Use Model
Train On Dataset
… with everything running on …
…
…
… to recognize handwritten digits …
Data Science on Qubole
00Copyright 2017 © Qubole
Qubole
Qubole automates, controls and orchestrates all big data workloads including Data
Science so that you can optimize for performance, cost and scale.
Built for Anyone Who Uses
Data
Analysts
Data Scientists
Data Engineers
Data Admins
A Single Platform
for Any Use Case
ETL & Reporting
Ad Hoc Queries
Machine Learning
Streaming
Vertical Apps
Open Source Engines,
Optimized for the Cloud
Cloud-Native,
Cloud-Optimized,
Cloud-Agnostics
Copyright 2017 © Qubole
Data Science on Qubole
Copyright 2017 © Qubole
Data Science on Qubole
Copyright 2017 © Qubole
Cluster LifeCycle Management on Qubole
Note: Available on Apache Spark, Hadoop, and Presto as a service on Qubole
Auto-scaling Clusters
• Policy-driven
• One-time setup; Runtime modifications
• Work load aware upscaling and downscaling
• No wasted resources results in lowered TCO
Heterogeneous Clusters
• Mix-and-match instance types
• On-Demand and Spot instances (on AWS)
00Copyright 2017 © Qubole
Qubole: High-level View
User Access Qubole Tier Customer’s Azure Account
QUBOLE UI
VIA BROWSER
SDK
ODBC
EPHEMERAL WEB TIER
WEB SERVERS
Default Hive
Metastore
RDS–Qubole
User, Account
Configurations
(Encrypted
credentials)
Encrypted
Result Cache
(Optional)
Custom Hive
Metastore
(Optional) Other
RDS
Encrypted
HDFS
Slave
Encrypted
HDFS
Slave
Master
Ephemeral
Cluster,
Managed by
Qubole
Data Flow within
Customer’s CloudRESTAPI
(HTTPS)
Thank you!
Dash Desai
Technology Evangelist
@iamontheinet
Getting Started
Install BigDL on Qubole + Demo App: http://bit.ly/deep_learning_bigdl_qubole
BigDL: https://github.com/intel-analytics/BigDL

More Related Content

What's hot

Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Akash Tandon
 
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit
 
Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019
Goutam Tadi
 
Distributed ML with Dask and Kubernetes
Distributed ML with Dask and KubernetesDistributed ML with Dask and Kubernetes
Distributed ML with Dask and Kubernetes
Ray Hilton
 
Introduction to df
Introduction to dfIntroduction to df
Introduction to df
Mohit Jaggi
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache Spark
Databricks
 
Distributed deep learning
Distributed deep learningDistributed deep learning
Distributed deep learning
Mehdi Shibahara
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
MLconf
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine Learning
DataWorks Summit/Hadoop Summit
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Jen Aman
 
Building DSLs with Scala
Building DSLs with ScalaBuilding DSLs with Scala
Building DSLs with Scala
Mohit Jaggi
 
Accelerating Data Science with Better Data Engineering on Databricks
Accelerating Data Science with Better Data Engineering on DatabricksAccelerating Data Science with Better Data Engineering on Databricks
Accelerating Data Science with Better Data Engineering on Databricks
Databricks
 
Machine Learning with Scala
Machine Learning with ScalaMachine Learning with Scala
Machine Learning with Scala
Susan Eraly
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!
Databricks
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
MLconf
 
Keras: Deep Learning Library for Python
Keras: Deep Learning Library for PythonKeras: Deep Learning Library for Python
Keras: Deep Learning Library for Python
Rafi Khan
 
Distributed deep learning optimizations for Finance
Distributed deep learning optimizations for FinanceDistributed deep learning optimizations for Finance
Distributed deep learning optimizations for Finance
geetachauhan
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
MLconf
 
AI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using KubeflowAI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using Kubeflow
Steve Guhr
 
Spark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef Habdank
Spark Summit
 

What's hot (20)

Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
 
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed Awan
 
Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019Greenplum for Kubernetes PGConf india 2019
Greenplum for Kubernetes PGConf india 2019
 
Distributed ML with Dask and Kubernetes
Distributed ML with Dask and KubernetesDistributed ML with Dask and Kubernetes
Distributed ML with Dask and Kubernetes
 
Introduction to df
Introduction to dfIntroduction to df
Introduction to df
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache Spark
 
Distributed deep learning
Distributed deep learningDistributed deep learning
Distributed deep learning
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine Learning
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Building DSLs with Scala
Building DSLs with ScalaBuilding DSLs with Scala
Building DSLs with Scala
 
Accelerating Data Science with Better Data Engineering on Databricks
Accelerating Data Science with Better Data Engineering on DatabricksAccelerating Data Science with Better Data Engineering on Databricks
Accelerating Data Science with Better Data Engineering on Databricks
 
Machine Learning with Scala
Machine Learning with ScalaMachine Learning with Scala
Machine Learning with Scala
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
Keras: Deep Learning Library for Python
Keras: Deep Learning Library for PythonKeras: Deep Learning Library for Python
Keras: Deep Learning Library for Python
 
Distributed deep learning optimizations for Finance
Distributed deep learning optimizations for FinanceDistributed deep learning optimizations for Finance
Distributed deep learning optimizations for Finance
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
AI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using KubeflowAI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using Kubeflow
 
Spark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef Habdank
 

Similar to Deep Learning on Apache Spark

Image Recognition on AWS with Apache Spark and BigDL
Image Recognition on AWS with Apache Spark and BigDLImage Recognition on AWS with Apache Spark and BigDL
Image Recognition on AWS with Apache Spark and BigDL
Amazon Web Services
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
DataWorks Summit
 
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghDeep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Data Con LA
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
Amazon Web Services
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
DataWorks Summit/Hadoop Summit
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Indrajit Poddar
 
Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017
Clarisse Hedglin
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
Databricks
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

Cloudera, Inc.
 
Octo and the DevSecOps Evolution at Oracle by Ian Van Hoven
Octo and the DevSecOps Evolution at Oracle by Ian Van HovenOcto and the DevSecOps Evolution at Oracle by Ian Van Hoven
Octo and the DevSecOps Evolution at Oracle by Ian Van Hoven
InfluxData
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
Rodney Joyce
 
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
Databricks
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
Dr. Mirko Kämpf
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
Dr. Mirko Kämpf
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
Frank Schroeter
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
Shivaji Dutta
 
Google cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGoogle cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptx
GDSCNiT
 
Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018 Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018
Codemotion
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
Seldon
 
Spark 101
Spark 101Spark 101

Similar to Deep Learning on Apache Spark (20)

Image Recognition on AWS with Apache Spark and BigDL
Image Recognition on AWS with Apache Spark and BigDLImage Recognition on AWS with Apache Spark and BigDL
Image Recognition on AWS with Apache Spark and BigDL
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghDeep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
 
Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Octo and the DevSecOps Evolution at Oracle by Ian Van Hoven
Octo and the DevSecOps Evolution at Oracle by Ian Van HovenOcto and the DevSecOps Evolution at Oracle by Ian Van Hoven
Octo and the DevSecOps Evolution at Oracle by Ian Van Hoven
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
 
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
 
Google cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGoogle cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptx
 
Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018 Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
Spark 101
Spark 101Spark 101
Spark 101
 

Recently uploaded

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 

Recently uploaded (20)

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 

Deep Learning on Apache Spark

  • 1. Deep Learning On Spark Using BigDL on Qubole Dash Desai Technology Evangelist @iamontheinet
  • 3. Copyright 2017 © Qubole What is Machine Learning? Gives ‘computers the ability to learn without being explicitly programmed’ - Wikipedia
  • 4. Copyright 2017 © Qubole What is Deep Learning? Form of ML that uses a model of computing— inspired by the structure of the brain
  • 5. Copyright 2017 © Qubole Deep Learning Applications Computer Vision / Image Recognition / Object Detection Speech Recognition / Natural Language Processing (NLP) Recommendation Systems (Products, Matchmaking, etc.) Prediction (Stock Market, Healthcare, etc.) Anomaly Detection (Cybersecurity, etc.)
  • 6. Copyright 2017 © Qubole What is Apache Spark? A fast and general-purpose engine for large-scale, distributed data processing MLlib Spark’s scalable machine learning library High-quality algorithms; 100x faster than MapReduce Usable in Java, Scala, Python, and R
  • 7. Copyright 2017 © Qubole Deep Learning: On Apache Spark
  • 8. Copyright 2017 © Qubole Deep Learning: Other Popular Non-Spark Options TensorFlow* (Google) • Natively distributed out-of-the-box Keras • Naturally runs on distributed frameworks/back-ends • Theano, MXNet (CMU, MIT, NYU), TensorFlow, CNTK (Microsoft) *Not to be confused with TensorFlow On Spark (TFOS) by Yahoo
  • 10. Copyright 2017 © Qubole What is BigDL? Distributed deep learning library for Apache Spark Open sourced by Intel (in Dec 2016) Feature parity with DL frameworks such as Caffe, Torch Integrates with Spark ML pipeline and Spark Streaming Supports Model snapshots Intel MKL (Math Kernel Library); multi-threading within each Spark task
  • 11. Copyright 2017 © Qubole Cont… Includes 100+ Layers (highest level building block in DL) Includes 20+ Loss functions (help with model fitting) Optimization methods include SGD, Adagard, LBFGS Numeric computing via Tensor & high level neural networks Scaling: synchronous mini-batch SGD and all-reduce communication on Spark What is BigDL?
  • 12. Copyright 2017 © Qubole BigDL vs TensorFrames TensorFrames — can call TF from individual partitions of a DataFrame or an RDD (in PySpark)… However, since TF is not natively integrated with Spark, it does not support distributed deep learning such as for model training or fine tuning.
  • 13. Copyright 2017 © Qubole BigDL vs TensorFlow on Spark (TFOS), Caffe TensorFlow on Spark* (TFOS) and Caffe on Spark — use Spark executors to launch TF or Caffe instances on the cluster… However, model training, predictions, etc. are performed outside of Spark across multiple TF or Caffe instances… • Run as standalone jobs outside of the pipeline • Very fine-grained/limited interaction with analytics pipeline *Not to be confused with natively distributed TensorFlow by Google
  • 14. Copyright 2017 © Qubole How Does BigDL Work
  • 15. Copyright 2017 © Qubole Distributed Deep Learning: Two Methods
  • 16. Copyright 2017 © Qubole Distributed Deep Learning: BigDL
  • 17. <Insert Demo Here >> YAY!/>
  • 18. Copyright 2017 © Qubole Demo: Recognize Handwritten Digits On Use Model Train On Dataset … with everything running on … … … … to recognize handwritten digits …
  • 19. Data Science on Qubole
  • 20. 00Copyright 2017 © Qubole Qubole Qubole automates, controls and orchestrates all big data workloads including Data Science so that you can optimize for performance, cost and scale. Built for Anyone Who Uses Data Analysts Data Scientists Data Engineers Data Admins A Single Platform for Any Use Case ETL & Reporting Ad Hoc Queries Machine Learning Streaming Vertical Apps Open Source Engines, Optimized for the Cloud Cloud-Native, Cloud-Optimized, Cloud-Agnostics
  • 21. Copyright 2017 © Qubole Data Science on Qubole
  • 22. Copyright 2017 © Qubole Data Science on Qubole
  • 23. Copyright 2017 © Qubole Cluster LifeCycle Management on Qubole Note: Available on Apache Spark, Hadoop, and Presto as a service on Qubole Auto-scaling Clusters • Policy-driven • One-time setup; Runtime modifications • Work load aware upscaling and downscaling • No wasted resources results in lowered TCO Heterogeneous Clusters • Mix-and-match instance types • On-Demand and Spot instances (on AWS)
  • 24. 00Copyright 2017 © Qubole Qubole: High-level View User Access Qubole Tier Customer’s Azure Account QUBOLE UI VIA BROWSER SDK ODBC EPHEMERAL WEB TIER WEB SERVERS Default Hive Metastore RDS–Qubole User, Account Configurations (Encrypted credentials) Encrypted Result Cache (Optional) Custom Hive Metastore (Optional) Other RDS Encrypted HDFS Slave Encrypted HDFS Slave Master Ephemeral Cluster, Managed by Qubole Data Flow within Customer’s CloudRESTAPI (HTTPS)
  • 25. Thank you! Dash Desai Technology Evangelist @iamontheinet Getting Started Install BigDL on Qubole + Demo App: http://bit.ly/deep_learning_bigdl_qubole BigDL: https://github.com/intel-analytics/BigDL