Submit Search
Upload
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
•
Download as PPTX, PDF
•
13 likes
•
6,365 views
Spark Summit
Follow
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
Read less
Read more
Data & Analytics
Report
Share
Report
Share
1 of 14
Download now
Recommended
Neural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep Learning
Asim Jalis
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Databricks
A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)
A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)
Alexander Ulanov
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Databricks
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
Alpine Data
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
Jen Aman
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
MLconf
Distributed Deep Learning on Spark
Distributed Deep Learning on Spark
Mathieu Dumoulin
Recommended
Neural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep Learning
Asim Jalis
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Databricks
A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)
A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)
Alexander Ulanov
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Databricks
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
Alpine Data
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
Jen Aman
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
MLconf
Distributed Deep Learning on Spark
Distributed Deep Learning on Spark
Mathieu Dumoulin
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Jen Aman
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Databricks
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Spark Summit
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
Yves Raimond
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark Summit
Distributed deep learning
Distributed deep learning
Mehdi Shibahara
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Spark Summit
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Spark Summit
Anomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
Cloudera, Inc.
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
Vijay Srinivas Agneeswaran, Ph.D
High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...
High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...
Spark Summit
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Databricks
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Databricks
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark
Turi, Inc.
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Databricks
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Databricks
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Spark Summit
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Spark Summit
Introduction to apache horn (incubating)
Introduction to apache horn (incubating)
Edward Yoon
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Databricks
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Spark Summit
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
Databricks
More Related Content
What's hot
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Jen Aman
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Databricks
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Spark Summit
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
Yves Raimond
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark Summit
Distributed deep learning
Distributed deep learning
Mehdi Shibahara
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Spark Summit
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Spark Summit
Anomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
Cloudera, Inc.
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
Vijay Srinivas Agneeswaran, Ph.D
High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...
High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...
Spark Summit
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Databricks
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Databricks
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark
Turi, Inc.
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Databricks
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Databricks
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Spark Summit
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Spark Summit
Introduction to apache horn (incubating)
Introduction to apache horn (incubating)
Edward Yoon
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Databricks
What's hot
(20)
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Distributed deep learning
Distributed deep learning
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Anomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...
High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Introduction to apache horn (incubating)
Introduction to apache horn (incubating)
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Viewers also liked
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Spark Summit
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
Databricks
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
Spark Summit
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark Cluster
Jen Aman
ETL со Spark
ETL со Spark
Vasil Remeniuk
FareBor Presentation
FareBor Presentation
Andrey Vilchinsky
Sven Kreiss, Lead Data Scientist, Wildcard at MLconf ATL - 9/18/15
Sven Kreiss, Lead Data Scientist, Wildcard at MLconf ATL - 9/18/15
MLconf
Opening slides to Warsaw Scala FortyFives on Testing tools
Opening slides to Warsaw Scala FortyFives on Testing tools
Jacek Laskowski
apsis - Automatic Hyperparameter Optimization Framework for Machine Learning
apsis - Automatic Hyperparameter Optimization Framework for Machine Learning
andi1400
A Prototype Storage Subsystem based on Phase Change Memory
A Prototype Storage Subsystem based on Phase Change Memory
IBM Research
Image Object Detection Pipeline
Image Object Detection Pipeline
Abhinav Dadhich
Выступление Александра Крота из "Вымпелком" на Hadoop Meetup в рамках RIT++
Выступление Александра Крота из "Вымпелком" на Hadoop Meetup в рамках RIT++
Антон Шестаков
Александр Сербул —1С-Битрикс — ICBDA 2015
Александр Сербул —1С-Битрикс — ICBDA 2015
rusbase
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Spark Summit
Genome Analysis Pipelines with Spark and ADAM
Genome Analysis Pipelines with Spark and ADAM
Allen Day, PhD
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
Jose Quesada (hiring)
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Spark Summit
Neural Networks and Deep Learning
Neural Networks and Deep Learning
Asim Jalis
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Data Con LA
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
batchinsights
Viewers also liked
(20)
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark Cluster
ETL со Spark
ETL со Spark
FareBor Presentation
FareBor Presentation
Sven Kreiss, Lead Data Scientist, Wildcard at MLconf ATL - 9/18/15
Sven Kreiss, Lead Data Scientist, Wildcard at MLconf ATL - 9/18/15
Opening slides to Warsaw Scala FortyFives on Testing tools
Opening slides to Warsaw Scala FortyFives on Testing tools
apsis - Automatic Hyperparameter Optimization Framework for Machine Learning
apsis - Automatic Hyperparameter Optimization Framework for Machine Learning
A Prototype Storage Subsystem based on Phase Change Memory
A Prototype Storage Subsystem based on Phase Change Memory
Image Object Detection Pipeline
Image Object Detection Pipeline
Выступление Александра Крота из "Вымпелком" на Hadoop Meetup в рамках RIT++
Выступление Александра Крота из "Вымпелком" на Hadoop Meetup в рамках RIT++
Александр Сербул —1С-Битрикс — ICBDA 2015
Александр Сербул —1С-Битрикс — ICBDA 2015
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Genome Analysis Pipelines with Spark and ADAM
Genome Analysis Pipelines with Spark and ADAM
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Reactive Streams, linking Reactive Application to Spark Streaming by Luc Bour...
Neural Networks and Deep Learning
Neural Networks and Deep Learning
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Similar to A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNet
Amazon Web Services
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Databricks
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)
Julien SIMON
Implementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on Spark
Dalei Li
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
Spark Summit
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache Spark
Martin Zapletal
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Jen Aman
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Databricks
Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
Databricks
Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications
Intel Nervana
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNet
Amazon Web Services
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
Ganesan Narayanasamy
eam2
eam2
butest
StackNet Meta-Modelling framework
StackNet Meta-Modelling framework
Sri Ambati
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Databricks
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit
Auto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdf
Auto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdf
Kundjanasith Thonglek
Computer Vision for Beginners
Computer Vision for Beginners
Sanghamitra Deb
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
MLconf
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
felixcss
Similar to A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
(20)
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNet
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)
Implementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on Spark
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache Spark
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNet
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
eam2
eam2
StackNet Meta-Modelling framework
StackNet Meta-Modelling framework
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Auto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdf
Auto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdf
Computer Vision for Beginners
Computer Vision for Beginners
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
More from Spark Summit
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Spark Summit
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spark Summit
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spark Summit
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Spark Summit
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Spark Summit
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
More from Spark Summit
(20)
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Recently uploaded
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
Florian Roscheck
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Jack DiGiovanna
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
Rafezzaman
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
Emmanuel Dauda
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
limedy534
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
fhwihughh
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
soniya singh
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
voginip
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
📊 Markus Baersch
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
natarajan8993
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
F sss
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
Human37
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Universitat Politècnica de Catalunya
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
MYRABACSAFRA2
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
Colleen Farrelly
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Social Samosa
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
thyngster
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
17djon017
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
Sonatrach
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
Jeremy Anderson
Recently uploaded
(20)
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
1.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. A Scalable Implementation of Deep Learning on Spark Alexander Ulanov 1 Joint work with Xiangrui Meng2, Bert Greevenbosch3 With the help from Guoqiang Li4, Andrey Simanovsky1 1Hewlett-Packard Labs 2Databricks 3Huawei & Jules Energy 4Spark community
2.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.2 Outline • Artificial neural network basics • Implementation of Multilayer Perceptron (MLP) in Spark • Optimization & parallelization • Experiments • Future work
3.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.3 Artificial neural network Basics • Statistical model that approximates a function of multiple inputs • Consists of interconnected “neurons” which exchange messages – “Neuron” produces an output by applying a transformation function on its inputs • Network with more than 3 layers of neurons is called “deep”, instance of deep learning Layer types & learning • A layer type is defined by a transformation function – Affine: 𝑦𝑗 = 𝒘𝒊𝒋 ∙ 𝑥𝑖 + 𝑏𝑗, Sigmoid: 𝑦𝑖 = 1 + 𝑒−𝑥 𝑖 −1 , Convolution, Softmax, etc. • Multilayer perceptron (MLP) – a network with several pairs of Affine & Sigmoid layers • Model parameters – weights that “neurons” use for transformations • Parameters are iteratively estimated with the backpropagation algorithm Multilayer perceptron • Speech recognition (phoneme classification), computer vision 𝑥 𝑦 input output hidden layer
4.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.4 Example of MLP in Spark Handwritten digits recognition • Dataset MNIST [LeCun et al. 1998] • 28x28 greyscale images of handwritten digits 0-9 • MLP with 784 inputs, 10 outputs and two hidden layers of 300 and 100 neurons val digits: DataFrame = sqlContext.read.format("libsvm").load("/data/mnist") val mlp = new MultilayerPerceptronClassifier() .setLayers(Array(784, 300, 100, 10)) .setBlockSize(128) val model = mlp.fit(digits) 784 inputs 300 neurons 100 neurons 10 neurons 1st hidden layer 2nd hidden layer Output layer digits = sqlContext.read.format("libsvm").load("/data/mnist") mlp = MultilayerPerceptronClassifier(layers=[784, 300, 100, 10], blockSize=128) model = mlp.fit(digits) Scala Python
5.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.5 Pipeline with PCA+MLP in Spark val digits: DataFrame = sqlContext.read.format(“libsvm”).load(“/data/mnist”) val pca = new PCA() .setInputCol(“features”) .setK(20) .setOutPutCol(“features20”) val mlp = new MultilayerPerceptronClassifier() .setFeaturesCol(“features20”) .setLayers(Array(20, 50, 10)) .setBlockSize(128) val pipeline = new Pipeline() .setStages(Array(pca, mlp)) val model = pipeline.fit(digits) digits = sqlContext.read.format("libsvm").load("/data/mnist8m") pca = PCA(inputCol="features", k=20, outputCol="features20") mlp = MultilayerPerceptronClassifier(featuresCol="features20", layers=[20, 50, 10], blockSize=128) pipeline = Pipeline(stages=[pca, mlp]) model = pipeline.fit(digits) Scala Python
6.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.6 MLP implementation in Spark Requirements • Conform to Spark APIs • Extensible interface (deep learning API) • Efficient and scalable (single node & cluster) Why conform to Spark APIs? • Spark can call any Java, Python or Scala library, not necessary designed for Spark – Results with expensive data movement from Spark RDD to the library – Prohibits from using for Spark ML Pipelines Extensible interface • Our implementation processes each layer as a black box with backpropagation in general form – Allows further introduction of new layers and features • CNN, Autoencoder, RBM are currently under dev. by community
7.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.7 Efficiency Batch processing • Layer’s affine transformations can be represented in vector form: 𝒚 = 𝑊 𝑇 𝒙 + 𝒃 – 𝒚 – output from the layer, vector of size 𝑛 – 𝑊 – the matrix of layer weights 𝑚 × 𝑛 , 𝒃 – bias, vector of size 𝑚 – 𝒙 – input to the layer, vector of size 𝑚 • Vector-matrix multiplications are not as efficient as matrix-matrix – Stack 𝑠 input vectors (into batch) to perform matrices multiplication: 𝒀 = 𝑊 𝑇 𝑿 + 𝑩 – 𝑿 is 𝑚 × 𝑠 , 𝒀 is 𝑛 × 𝑠 , – 𝑩 is 𝑛 × 𝑠 , each column contains a copy of 𝒃 • We implemented batch processing in matrix form – Enabled the use of optimized native BLAS libraries – Memory is reused to limit GC overhead = * + = * +
8.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.8 1.00E-04 1.00E-03 1.00E-02 1.00E-01 1.00E+00 1.00E+01 1.00E+02 1.00E+03 1.00E+04 (1x1)*(1x1) (10x10)*(10x1) (10x10)*(10x10) (100x100)*(100x1) (100x100)*(100x10) (100x100)*(100x100) (1000x1000)*(1000x100) (1000x1000)*(1000x1000) (10000x10000)*(10000x1000) (10000x10000)*(10000x10000) dgemm performance netlib-NVBLAS netlib-MKL netlib OpenBLAS netlib-f2jblas Single node BLAS BLAS in Spark • BLAS – Basic Linear Algebra Subprograms • Hardware optimized native in C & Fortran – CPU: MKL, OpenBLAS etc. – GPU: NVBLAS (F-BLAS interface to CUDA) • Use in Spark through Netlib-java Experiments • Huge benefit from native BLAS vs pure Java f2jblas • GPU is faster (2x) only for large matrices – When compute is larger than copy to/from GPU • More details: – https://github.com/avulanov/scala-blas – “linalg: Matrix Computations in Apache Spark” Reza et al., 2015 CPU: 2x Xeon X5650 @ 2.67GHz, 32GB RAM GPU: Tesla M2050 3GB, 575MHz, 448 CUDA cores seconds Matrices size
9.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.9 Scalability Parallelization • Each iteration 𝑘, each node 𝑖 – 1. Gets parameters 𝑤 𝑘 from master – 2. Computes a gradient 𝛻𝑖 𝑘 𝐹(𝑑𝑎𝑡𝑎𝑖) – 3. Sends a gradient to master – 4. Master computes 𝑤 𝑘+1 based on gradients • Gradient type – Batch – process all data on each iteration – Stochastic – random point – Mini-batch – random batch • How many workers to use? – Less workers – less compute – More workers – more communication 𝑤 𝑘 𝑤 𝑘+1 ≔ 𝑌 𝛻𝑖 𝑘 𝐹 Master Executor 1 Executor N Partition 1 Partition 2 Partition P Executor 1 Executor N V V v 𝛻1 𝑘 𝐹(𝑑𝑎𝑡𝑎1) 𝛻 𝑁 𝑘 𝐹(𝑑𝑎𝑡𝑎 𝑁) 𝛻1 𝑘 𝐹 Master Executor 1 Executor N Master V V v 1. 2. 3. 4. GoTo #1
10.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.10 Communication and computation trade-off Parallelization of batch gradient • There are 𝑑 data points, 𝑓 features and 𝑘 classes – Assume, we want to train logistic regression, it has 𝑓𝑘 parameters • Communication: 𝑛 workers get/receive 𝑓𝑘 64 bit parameters through the network with bandwidth 𝑏 and software overhead 𝑐. Use all-reduce: – 𝑡 𝑐𝑚 = 2 64𝑓𝑘 𝑏 + 𝑐 log2 𝑛 • Computation: each worker has 𝑝 FLOPS and processes 𝑑 𝑛 of data, that needs 𝑓𝑘 operations – 𝑡 𝑐𝑝~ 𝑑 𝑛 𝑓𝑘 𝑝 • What is the optimal number of workers? – min 𝑛 𝑡 𝑐𝑚 + 𝑡 𝑐𝑝 ⇒ 𝑛 = 𝑚𝑎𝑥 𝑑𝑓𝑘 ln 2 𝑝 128𝑓𝑘 𝑏+2𝑐 , 1 – 𝑚𝑎𝑥 𝑑∙𝑤∙ln 2 𝑝 128𝑤 𝑏+2𝑐 , 1 , if 𝑤 is the number of model parameters and floating point operations
11.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.11 Analysis of the trade-off Optimal number of workers for batch gradient • Parallelism in a cluster – 𝑛 = 𝑚𝑎𝑥 𝑑∙𝑤∙ln 2 𝑝 128𝑤 𝑏+2𝑐 , 1 • Analysis – More FLOPS 𝑝 means lower degree of batch gradient parallelism in a cluster – More operations, i.e. more features and classes 𝑤 = 𝑓𝑘 (or a deep network) means higher degree – Small 𝑐 overhead for get/receive a message means higher degree • Example: MNIST8M handwritten digit recognition dataset – 8.1M documents, 784 features, 10 classes, logistic regression – 32GFlops double precision CPU, 1Gbit network, overhead ~ 0.1s – 𝑛 = 𝑚𝑎𝑥 8.1𝑀∙784∙10∙0.69 32𝐺 128∙784∙10 1𝐺+2∙0.1 , 1 = 6
12.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.12 0 20 40 60 80 100 0 1 2 3 4 5 6 Spark MLP vs Caffe MLP MLP (total) MLP (compute) Caffe CPU Caffe GPU Scalability testing Setup • MNIST character recognition 60K samples • 6-layer MLP (784,2500,2000,1500,1000,500,10) • 12M parameters • CPU: Xeon X5650 @ 2.67GHz • GPU: Tesla M2050 3GB, 575MHz • Caffe (Deep Learning from Berkeley): 1 node • Spark: 1 master + 5 workers Results per iteration • Single node (both tools double precision) – 1.6 slower than Caffe CPU (Scala vs C++) • Scalability – 5 nodes give 4.7x speedup, beats Caffe, close to GPU Seconds Workers Communication cost 𝑛 = 𝑚𝑎𝑥 60𝐾 ∙ 12𝑀 ∙ 0.69 64𝐺 128 ∙ 12𝑀 950𝑀 + 2 ∙ 0.1 , 1 = 𝟒
13.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.13 Conclusions & future work Conclusions • Scalable multilayer perceptron is available in Spark 1.5.0 • Extensible internal API for Artificial Neural Networks – Further contributions are welcome! • Native BLAS (and GPU) speeds up Spark • Heuristics for parallelization of batch gradient Work in progress [SPARK-5575] • Autoencoder(s) • Restricted Boltzmann Machines • Drop-out • Convolutional neural networks Future work • SGD & parameter server
14.
© Copyright 2013
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Thank you
Download now