Scaling Up
Deep Learning
on Clusters
Max Xie
Qualia Li
Aleks Kamko
Who are we?
Aleks did his undergrad in EECS at
Berkeley. He’s back for some Data
Science experience.
Max has a B.S. in Math and would like
to explore the world of data with it.
Qualia majored in Software
Engineering at ZJU. He is interested
in how big data works.
Agenda
2. What is “Machine Learning”!?
a. What is “Deep Learning”?
3. History + Status of Project
4. Scaling Up - What and How
5. Stakeholders
6. Roadmap
Project
Overview
Scaling Up Deep Learning on Clusters
Extend the BIDMach machine learning
framework
Implement + benchmark novel deep
learning algorithms
Extend framework to take advantage of
clusters. More machines → faster
Work with OpenChai to bring machine learning
to enterprise + consumers
Adapt BIDMach to OpenChai’s bleeding
Machine Learning 101
- Computer algorithms automatically learning
from data and information.
History:
1950 - Alan Turing creates the “Turing Test”
1952 - First ML program, for checkers.
1990s - Shifts from knowledge-driven to data-driven
1997 - IBM’s Deep Blue beats world champion
at chess.
So... Deep Learning?
First, Neural Networks:
Inspired by the structure and
functional aspects of biological
neural networks, modeling complex
relationships
1957 - The first neural network for
computers.
Deep Learning!!
= Deeper Neural Networks!
Focusing on different levels of
abstraction in representing objects.
Applications: speech recognition,
computer vision, robotics, planning, etc.
BIDMach currently has little
Deep Learning support.
History of Project:
Overview
Target
Make the BID Data Suite the fastest Big Data
tool on the Internet
Components
Storage, CPU and GPU
BIDMat -- Matrix Algebra, Data Manipulation
BIDMach -- Machine Learning
Scaling Up
Butterfly Mixing
Sparse AllReduce
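As a rough illustration of the butterfly communication pattern behind the two names above, the sketch below simulates a butterfly (recursive-doubling) allreduce in plain Python. It is a toy under stated assumptions, not BIDMach code: the function name `butterfly_allreduce` is hypothetical, the "nodes" are list slots rather than machines, and it shows only a dense sum, not the interleaved gradient steps of Butterfly Mixing or the sparse variant.

```python
def butterfly_allreduce(values):
    """Simulate a butterfly allreduce (sum) over simulated nodes.

    values[i] is the number held by node i. The node count must be a
    power of two so every node has a partner in every round.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("node count must be a power of two")

    current = list(values)
    stride = 1
    rounds = 0
    while stride < n:
        # Node i exchanges with partner (i XOR stride); both keep the sum.
        current = [current[i] + current[i ^ stride] for i in range(n)]
        stride *= 2
        rounds += 1
    return current, rounds


if __name__ == "__main__":
    totals, rounds = butterfly_allreduce([1, 2, 3, 4])
    print(totals, rounds)  # every node ends with the sum 10 after 2 rounds
```

With N nodes, every node holds the global sum after log2(N) exchange rounds, which is why this pattern is attractive when many machines must agree on a shared model update.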
History of Project:
BIDMach and the BID Data Suite
BIDMach is a machine learning framework that is part of the larger BID Data
Suite. Motives for the suite include:
1. Exploratory data analysis → quickly sifting through data, making
hypotheses about structure, rapidly testing
2. Rapid deployment and live tuning of models in commercial setting → very
high performance in both prototype and production setting
3. Make single-machine algorithms fast first, improve using clusters later
300%
BIDMach offers up to a 300% gain in performance
compared with the state of the art.
(John Canny et al., 2015)
Netflix Movie Recommendation
(A Matrix Factorization Problem)
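To make "a matrix factorization problem" concrete, here is a minimal sketch of rating prediction by factorizing a user-by-item matrix with stochastic gradient descent. Everything in it is an assumption for illustration: the names `factorize` and `rmse`, the hyperparameters, and the six toy ratings are made up, and this is plain Python rather than BIDMach's API or the method behind the benchmark cited above.

```python
import math
import random

# Toy (user, item, rating) triples; illustrative only, not Netflix data.
RATINGS = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]


def rmse(U, V, ratings):
    """Root-mean-square error of the predictions U[u] . V[i] on the ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(a * b for a, b in zip(U[u], V[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(ratings))


def factorize(ratings, n_users, n_items, rank=2, epochs=500,
              lr=0.02, reg=0.01, seed=0):
    """Learn user factors U and item factors V so that U[u] . V[i] ~ rating."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                # Gradient step on squared error with L2 regularization.
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V


if __name__ == "__main__":
    U, V = factorize(RATINGS, n_users=3, n_items=3)
    print("training RMSE:", rmse(U, V, RATINGS))
```

Unobserved ratings are then predicted as the dot product of the corresponding user and item factors.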
Scaling Up
“Make single-machine algorithms fast first, improve using clusters later”
Using only a single machine, BIDMach already beats most
cluster-based frameworks.
Our capstone project aims to extend BIDMach’s existing
algorithms to utilize the compute power of a cluster, making
BIDMach even faster!
“Scaling Up” to an
Enterprise Solution
Our main industry partner,
OpenChai, is working with us
to bring BIDMach and Machine
Learning to enterprise
● Retail
● Banking
● Genetics
● Mobile
● Home
“Scaling Up” to a
Consumer Solution
We’re also trying to bring
machine learning to your home
and local businesses!
Using mobile-phone GPUs and
CPUs → affordable, powerful,
and power-efficient
“Scaling Up” Goal 1:
Bring BIDMach to clusters.
“The Cloud”
“Scaling Up” Goal 2:
Bring clusters to your home!
“The Cloud”
Stakeholders
Co-developers
NVIDIA
ARM
OpenChai
Potential users
Google
Facebook
Twitter
Roadmap
Timeline: Oct., Dec., Feb., May
Technical
1. Get familiar with the various systems
2. Implement Random Forests & K-Means
3. Improve BIDMach distributed
communication system
4. Work on model-parallel algorithms:
- Logistic regression
- Recurrent Neural Networks
- Convolutional Neural Networks
- Translational Model
Business
- Meet with OpenChai
- Get BIDMach working on OpenChai
hardware
- Run benchmarks
- Market “OpenChai + BIDMach”
Roles
We will generally work together.
Aleks: Manage the cloud machines and how
they communicate.
Qualia: Learn Scala and write code.
Max: Understand how variables are interrelated in
model-parallel algorithms.
Revisiting
Project
Overview
Scaling Up Deep Learning on Clusters
Extend the BIDMach machine learning
framework
Implement + benchmark novel deep
learning algorithms
Extend framework to take advantage of
clusters. More machines → faster
Work with OpenChai to bring machine learning
to enterprise + consumers
Adapt BIDMach to OpenChai’s bleeding
“A breakthrough in machine learning would be
worth ten Microsofts.”
- Bill Gates
Questions?
