SlideShare a Scribd company logo
1 of 30
Download to read offline
CLOUD-NATIVE MACHINE LEARNING
Emerging Trends and the Road Ahead
Tristan Zajonc, CTO for Machine Learning @ Cloudera
2 © Cloudera, Inc. All rights reserved.
WHO AM I?
Tristan Zajonc
Cloudera
@tristanzajonc
CTO for Machine Learning at Cloudera
Previously lead ML engineering team at Cloudera
Former CEO & cofounder of Sense (acquired by Cloudera in 2016)
Originally Bayesian statistician and economist
3 © Cloudera, Inc. All rights reserved.
DISCLAIMER
The information in this document is proprietary to Cloudera. No part of this document may be reproduced or
disclosed without the express prior written permission of Cloudera.
This document is not binding upon Cloudera in any way (including with respect to Cloudera’s course of
business, product strategy, future releases, or development commitments), and is not a part of or subject to
any license agreement or other agreement you may have with Cloudera. Cloudera makes no commitments
about any future developments or functionality in any Cloudera product. The development, release, and timing
of release of any software features or functionality described in this document remains at the discretion of
Cloudera and you should not rely on any statements about development plans or anticipated functionality in
this document when making any purchasing decisions.
Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the
accuracy or completeness of the information, text, graphics, links or other items contained within this
material. This document is provided without a warranty of any kind, either express or implied, including but
not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement.
Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or
consequential damages that may result from the use of these materials.”
4 © Cloudera, Inc. All rights reserved. 4
enable the end-to-end machine learning lifecycle for teams
building thousands of applications across petabytes of data
with an enterprise-ready platform optimized for the cloud
Rapid provisioning and
experimentation
DevOps mindset
From development to
production
Team
collaboration
and shared
context
Any
framework,
your data, your
models
Secure by default
CPUs, GPUS, TPUs,
and beyond
Massive, diverse
data
Automation Runs anywhere
Confidential-Restricted – For Discussion Purposes
Only
5 © Cloudera, Inc. All rights reserved.
ACCELERATING THREE STAGES OF MACHINE LEARNING
Manage pipelines + models
Deploy models
Automate pipelines
Monitor performance
DEPLOYDEVELOP
Make teams more productive
Explore data
Develop reports, pipelines, models
Collaborate with peers
TRAIN
Scale resources efficiently
Train models
Tune parameters
Track performance
End-to-end machine learning infrastructure for teams at scale
MANAGE
Run anywhere with a common architecture
Manage access and resources
Scale cost with usage
6 © Cloudera, Inc. All rights reserved.
TODAY: CLOUDERA DATA SCIENCE WORKBENCH
Accelerate Machine Learning from Research to Production on CDH/HDP
For data scientists
• Experiment faster
Use R, Python, or Scala with
on-demand compute and
secure CDH data access
• Work together
Share reproducible research
with your whole team
• Deploy with confidence
Get to production repeatably
and without recoding
For IT professionals
• Bring data science to the data
Give your data science team
more freedom while reducing
the risk and cost of silos
• Secure by default
Leverage common security and
governance across workloads
• Run anywhere
On-premises or in the cloud
7 © Cloudera, Inc. All rights reserved.
+
8 © Cloudera, Inc. All rights reserved.
CLOUDERA DATA SCIENCE WORKBENCH FOR HDP/CDP
Enterprise data science and ML powered by Kubernetes
• Built with Docker and Kubernetes
• Isolated, reproducible user environments
• Supports both big and small data
• Local Python, R, Scala runtimes
• Connect to any data source
• Schedule & share GPU resources
• Run Spark, Impala, and other CDH services
• Secure and governed by default
• Easy, audited access to Kerberized clusters
• Leverages SDX platform services
CDH/HDP CDH/HDP
Cloudera Manager / Ambari
gateway node(s) CDH/HDP nodes
Hive, HDFS, ...
CDSW CDSW
Master
...
Engine
EngineEngine
EngineEngine
K8S K8S
9 © Cloudera, Inc. All rights reserved.
SOME LESSONS LEARNED
10 © Cloudera, Inc. All rights reserved.
LESSON 1: ENTERPRISE ML REQUIRES BIGGER PICTURE
Streaming
Ingest
Batch Ingest
Machine
Learning Tools
BI Tools and
SQL Editors
Data Products
DATA, METADATA, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT
MACHINE
LEARNING
DATA
ENGINEERING
DATA
WAREHOUSE
OPERATIONAL
DATABASE
11 © Cloudera, Inc. All rights reserved.
UBER
Michelangelo: Uber’s Machine Learning Platform
Source: https://eng.uber.com/michelangelo/
DATA CENTER
PUBLIC CLOUD
12 © Cloudera, Inc. All rights reserved.
NETFLIX
Netflix recommendation platform
Source: https://medium.com/netflix-techblog/netflix-at-spark-ai-summit-2018-5304749ed7fa
PUBLIC CLOUD
13 © Cloudera, Inc. All rights reserved.
LESSON 2: EMERGING CONSENSUS ON MODERN ML STACK
Big Data
Platform
ML/AI
Frameworks
Container
Infrastructure
14 © Cloudera, Inc. All rights reserved.
Data scientists want to use every library under the sun, platform should support that
Workbench
(interactive)
ANALYZ
E
Experiments
(batch, ad-hoc)
TRAIN
Reports
(shared output)
SHARE
Jobs
(batch, scheduled)
REPEAT
Models
(RPC APIs)
SERVE
LOOSELY COUPLED, ML/AI FRAMEWORK AGNOSTIC
LESSON 3: FOCUS ON WORKFLOWS NOT FRAMEWORKS
15 © Cloudera, Inc. All rights reserved.
Cloudera ML is built around one Operator and four “CRDs”:
• Session (Interactive)
• Experiment (Batch)
• Job (Scheduled)
• Model (Online API)
No long-lived operators for particular libraries: TFJob, SparkJob, etc
Kubernetes is not exposed to data scientist. Goal is serverless experience.
WHAT DOES THIS LOOK LIKE IN K8S?
Let’s talk operators and CRDs….
Confidential-Restricted – For Discussion Purposes
Only
16 © Cloudera, Inc. All rights reserved.
LESSON 4: NO SINGLE BEST INFRASTRUCTURE FOR CUSTOMERS
Non-Cloud Private Cloud Public Cloud PaaS
I want to
maximize
• Cost-efficiency • Control, elasticity,
and convenience
• Control, elasticity,
and convenience
• Agility
I want to
minimize
• Dependence on
unproven technology
• Resource contention
between departments
• Dependence on data
center floor space
• Dependence on IT
and therefore need
as simple as possible
I want to
standardize
• On whatever
provides the best
ROI
• On a single
environment for the
entire data center
• On a single cloud
provider for all
infrastructure needs
• On whatever is
easiest to use
I want to
store my data
• On premises
because cheaper
and/or more secure
• On premises due to
company /
government mandate
• In the cloud because
easier
• In the cloud because
easier
17 © Cloudera, Inc. All rights reserved.
ROAD AHEAD
18 © Cloudera, Inc. All rights reserved.
TODAY: CURRENT CDSW ARCHITECTURE
Containerized gateway to CDH/HDP cluster for data scientists
• Benefits
• Easy to attach for self-service data science
• Seamless connectivity to existing clusters
• Isolated, reproducible user environments
• Tradeoffs
• Requires sizing, managing two clusters
• Requires dependency coordination
• Not optimized for distributed training
• Complex, inelastic setup in public cloud
CDH/HDP CDH/HDP
Cloudera Manager / Ambari
gateway node(s) CDH/HDP nodes
Hive, HDFS, ...
CDSW CDSW
...
Master
...
Engine
EngineEngine
EngineEngine
K8S K8S
19 © Cloudera, Inc. All rights reserved.
def risk_simulation():
import numpy
import tensorflow
...
results = sc.parallelize(xrange(0, 1000)) 
.map(risk_simulation).collect()
20 © Cloudera, Inc. All rights reserved.
cldr mlx run --gpus 3 --cmd “python train.py”
21 © Cloudera, Inc. All rights reserved.
CLOUDERA ML-X FOR CDP
Cloud-native enterprise machine learning platform
DATA SCIENCE DATA ENGINEERING MODEL OPERATIONS
CLOUDERA ML RUNTIME
Python/R, Spark, TensorFlow, CPU/GPU-Optimized
Interactive Development Batch Pipelines Predictive APIs
Full capability of CDSW
Rapid cloud provisioning
and elastic autoscaling
Unified data engineering and
ML with seamless
dependency management
Multi-cloud portability
powered by Kubernetes
Connects to HDFS or
cloud object storage
and shared metadata
Accelerated deep learning
with distributed GPU training
As-as-service or in private
cloud
CONTAINER CLOUD
EKS, AKS, GKE, OpenShift
DATA CATALOG
GOVERNANCESECURITY LIFECYCLEWORKLOAD XM
STORAGE Amazon
S3
Microsoft
ADLS
HDFS KUDU
22 © Cloudera, Inc. All rights reserved.
Requires and extends CDH/HDP, pushing
distributed compute to the cluster
CDH / HDP CDH / HDP
Cloudera Manager / Ambari
gateway node(s) CDH/HDP nodes
Hive, HDFS, ...
CDSW CDSW
...
Master
...
Engine
EngineEngine
EngineEngine
Cloud optimized, elastic distributed compute;
can optionally use CDH / HDP
Kubernetes
(EKS, GKE, AKS,
OpenShift)
CDH / HDP CDH / HDP
Hive, HDFS,
HMS, Sentry,
Navigator
MLX MLX
...
Master
...
Engine
EngineEngine
EngineEngine
Object
Store
Altus Director / CloudBreak
OPTIONAL
DATA SCIENCE WORKBENCH vs. CLOUDERA ML-X
Cloud ML
APIs
23 © Cloudera, Inc. All rights reserved.
• Simple dependency management, particularly for ML workloads
• python: pip install, R: install.packages, etc
• Unified workflow for distributed data engineering and machine learning
• Elastic compute (CPU / GPU) in the cloud
• Portable form-factor optimized for cloud and private cloud
• Long-term potential to unify resource management across workloads
WHY SPARK ON KUBERNETES?
24 © Cloudera, Inc. All rights reserved.
SPARK ON KUBERNETES
• Spark has native integration with Kubernetes since 2.3+
• Runs Spark fully containerized on K8s
• Allows Spark to add / remove executors with user specified Spark configuration
• Able to leverages Kubernetes features to facilitate launching Spark
(e.g: Node Affinities, RBAC, Namespaces, etc).
• But still many limits…
25 © Cloudera, Inc. All rights reserved.
Kubernetes
CMLCML
Engine
Spark
Driver
(Client
Mode)
Spark
Executor
Fluentd Fluentd
Project
Files
Project
Files
Spark
Executor
Project
Files
Hive, HDFS,
HMS, Sentry,
Navigator- Create user service account and namespace
for security and isolation
- Configure Spark driver / executor
configuration to connect to k8s API server.
- Also populate kerberos related information
for accessing HDFS and other CDH services.
- Populate dependency management related
configurations using pod template: volume,
image, user information, etc.
- Fluentd configured to pick up logging and
metric information to external storage
MAKING SPARK EASY IN ML-X
26 © Cloudera, Inc. All rights reserved.
Development: NFS/EFS shared filesystem gives laptop like developer experience
Production: Built in source-to-image (S2I) pipeline gives PaaS experience
• Provides declarative pathway from version control to experiment or model
• Provides easy deployment from other environments, e.g. laptops, webhooks
RUN
• Stage 3: Run image
SNAPSHOT
• Stage 1: Git snapshot
HOW DEPENDENCY MANAGEMENT WORKS
Simplifying dependency management from development to production
BUILD
• Stage 2: Docker build
27 © Cloudera, Inc. All rights reserved.
DEMO
28 © Cloudera, Inc. All rights reserved.
• Spark
• Dynamic allocation and shuffle service
• GPU support
• Scale testing and performance improvements
• K8s
• Advanced scheduling (FairScheduler, Hierarchical Queues, etc)
• Fast/lazy image distribution at scale
• Other Interesting Frameworks
• Ray
• Distributed Tensorflow & Pytorch (Horovod)
WHAT’S NEXT
29 © Cloudera, Inc. All rights reserved.
Relevant Sigs
sig-big-data, sig-ml, sig-scheduling
THANK YOU

More Related Content

What's hot

Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseCloudera, Inc.
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...DataWorks Summit
 
The Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricThe Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricDataWorks Summit
 
Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?Hortonworks
 
Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3Eric Rice
 
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerDataWorks Summit
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningDataWorks Summit
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureDataWorks Summit
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakSean Roberts
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data avanttic Consultoría Tecnológica
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionDataWorks Summit
 
Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3DataWorks Summit
 
Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...DataWorks Summit
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksHortonworks
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingDataWorks Summit
 
Audi‘s Hadoop Journey into the Hybrid Cloud
Audi‘s Hadoop Journey into the Hybrid CloudAudi‘s Hadoop Journey into the Hybrid Cloud
Audi‘s Hadoop Journey into the Hybrid CloudDataWorks Summit
 
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...Cloudera, Inc.
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataDataWorks Summit/Hadoop Summit
 

What's hot (20)

Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the Enterprise
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
 
The Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricThe Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data Centric
 
Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?
 
Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3
 
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine Learning
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and Future
 
Apache Deep Learning 201
Apache Deep Learning 201Apache Deep Learning 201
Apache Deep Learning 201
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
 
Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3
 
Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral Processing
 
Audi‘s Hadoop Journey into the Hybrid Cloud
Audi‘s Hadoop Journey into the Hybrid CloudAudi‘s Hadoop Journey into the Hybrid Cloud
Audi‘s Hadoop Journey into the Hybrid Cloud
 
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of Data
 

Similar to Cloud-Native Machine Learning: Emerging Trends and the Road Ahead

Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019   Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019 Timothy Spann
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Enterprise machine learning on k8s lessons learned and the road ahead
Enterprise machine learning on k8s   lessons learned and the road aheadEnterprise machine learning on k8s   lessons learned and the road ahead
Enterprise machine learning on k8s lessons learned and the road aheadTimothy Chen
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp
 
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...Timothy Spann
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019Timothy Spann
 
A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudCloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera, Inc.
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudGoDataDriven
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformCloudera, Inc.
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Cloudera, Inc.
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera, Inc.
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSCloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science WorkbenchNOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science WorkbenchNOVA DATASCIENCE
 

Similar to Cloud-Native Machine Learning: Emerging Trends and the Road Ahead (20)

Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019   Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Enterprise machine learning on k8s lessons learned and the road ahead
Enterprise machine learning on k8s   lessons learned and the road aheadEnterprise machine learning on k8s   lessons learned and the road ahead
Enterprise machine learning on k8s lessons learned and the road ahead
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019
 
A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloud
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for Analytics
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the Cloud
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera cluster
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science WorkbenchNOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Cloud-Native Machine Learning: Emerging Trends and the Road Ahead

  • 1. CLOUD-NATIVE MACHINE LEARNING Emerging Trends and the Road Ahead Tristan Zajonc, CTO for Machine Learning @ Cloudera
  • 2. 2 © Cloudera, Inc. All rights reserved. WHO AM I? Tristan Zajonc Cloudera @tristanzajonc CTO for Machine Learning at Cloudera Previously lead ML engineering team at Cloudera Former CEO & cofounder of Sense (acquired by Cloudera in 2016) Originally Bayesian statistician and economist
  • 3. 3 © Cloudera, Inc. All rights reserved. DISCLAIMER The information in this document is proprietary to Cloudera. No part of this document may be reproduced or disclosed without the express prior written permission of Cloudera. This document is not binding upon Cloudera in any way (including with respect to Cloudera’s course of business, product strategy, future releases, or development commitments), and is not a part of or subject to any license agreement or other agreement you may have with Cloudera. Cloudera makes no commitments about any future developments or functionality in any Cloudera product. The development, release, and timing of release of any software features or functionality described in this document remains at the discretion of Cloudera and you should not rely on any statements about development plans or anticipated functionality in this document when making any purchasing decisions. Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement. Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or consequential damages that may result from the use of these materials.”
  • 4. 4 © Cloudera, Inc. All rights reserved. 4 enable the end-to-end machine learning lifecycle for teams building thousands of applications across petabytes of data with an enterprise-ready platform optimized for the cloud Rapid provisioning and experimentation DevOps mindset From development to production Team collaboration and shared context Any framework, your data, your models Secure by default CPUs, GPUS, TPUs, and beyond Massive, diverse data Automation Runs anywhere
  • 5. Confidential-Restricted – For Discussion Purposes Only 5 © Cloudera, Inc. All rights reserved. ACCELERATING THREE STAGES OF MACHINE LEARNING Manage pipelines + models Deploy models Automate pipelines Monitor performance DEPLOYDEVELOP Make teams more productive Explore data Develop reports, pipelines, models Collaborate with peers TRAIN Scale resources efficiently Train models Tune parameters Track performance End-to-end machine learning infrastructure for teams at scale MANAGE Run anywhere with a common architecture Manage access and resources Scale cost with usage
  • 6. 6 © Cloudera, Inc. All rights reserved. TODAY: CLOUDERA DATA SCIENCE WORKBENCH Accelerate Machine Learning from Research to Production on CDH/HDP For data scientists • Experiment faster Use R, Python, or Scala with on-demand compute and secure CDH data access • Work together Share reproducible research with your whole team • Deploy with confidence Get to production repeatably and without recoding For IT professionals • Bring data science to the data Give your data science team more freedom while reducing the risk and cost of silos • Secure by default Leverage common security and governance across workloads • Run anywhere On-premises or in the cloud
  • 7. 7 © Cloudera, Inc. All rights reserved. +
  • 8. 8 © Cloudera, Inc. All rights reserved. CLOUDERA DATA SCIENCE WORKBENCH FOR HDP/CDP Enterprise data science and ML powered by Kubernetes • Built with Docker and Kubernetes • Isolated, reproducible user environments • Supports both big and small data • Local Python, R, Scala runtimes • Connect to any data source • Schedule & share GPU resources • Run Spark, Impala, and other CDH services • Secure and governed by default • Easy, audited access to Kerberized clusters • Leverages SDX platform services CDH/HDP CDH/HDP Cloudera Manager / Ambari gateway node(s) CDH/HDP nodes Hive, HDFS, ... CDSW CDSW Master ... Engine EngineEngine EngineEngine K8S K8S
  • 9. 9 © Cloudera, Inc. All rights reserved. SOME LESSONS LEARNED
  • 10. 10 © Cloudera, Inc. All rights reserved. LESSON 1: ENTERPRISE ML REQUIRES BIGGER PICTURE Streaming Ingest Batch Ingest Machine Learning Tools BI Tools and SQL Editors Data Products DATA, METADATA, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT MACHINE LEARNING DATA ENGINEERING DATA WAREHOUSE OPERATIONAL DATABASE
  • 11. 11 © Cloudera, Inc. All rights reserved. UBER Michelangelo: Uber’s Machine Learning Platform Source: https://eng.uber.com/michelangelo/ DATA CENTER PUBLIC CLOUD
  • 12. 12 © Cloudera, Inc. All rights reserved. NETFLIX Netflix recommendation platform Source: https://medium.com/netflix-techblog/netflix-at-spark-ai-summit-2018-5304749ed7fa PUBLIC CLOUD
  • 13. 13 © Cloudera, Inc. All rights reserved. LESSON 2: EMERGING CONSENSUS ON MODERN ML STACK Big Data Platform ML/AI Frameworks Container Infrastructure
  • 14. 14 © Cloudera, Inc. All rights reserved. Data scientists want to use every library under the sun, platform should support that Workbench (interactive) ANALYZ E Experiments (batch, ad-hoc) TRAIN Reports (shared output) SHARE Jobs (batch, scheduled) REPEAT Models (RPC APIs) SERVE LOOSELY COUPLED, ML/AI FRAMEWORK AGNOSTIC LESSON 3: FOCUS ON WORKFLOWS NOT FRAMEWORKS
  • 15. 15 © Cloudera, Inc. All rights reserved. Cloudera ML is built around one Operator and four “CRDs”: • Session (Interactive) • Experiment (Batch) • Job (Scheduled) • Model (Online API) No long-lived operators for particular libraries: TFJob, SparkJob, etc Kubernetes is not exposed to data scientist. Goal is serverless experience. WHAT DOES THIS LOOK LIKE IN K8S? Let’s talk operators and CRDs….
  • 16. Confidential-Restricted – For Discussion Purposes Only 16 © Cloudera, Inc. All rights reserved. LESSON 4: NO SINGLE BEST INFRASTRUCTURE FOR CUSTOMERS Non-Cloud Private Cloud Public Cloud PaaS I want to maximize • Cost-efficiency • Control, elasticity, and convenience • Control, elasticity, and convenience • Agility I want to minimize • Dependence on unproven technology • Resource contention between departments • Dependence on data center floor space • Dependence on IT and therefore need as simple as possible I want to standardize • On whatever provides the best ROI • On a single environment for the entire data center • On a single cloud provider for all infrastructure needs • On whatever is easiest to use I want to store my data • On premises because cheaper and/or more secure • On premises due to company / government mandate • In the cloud because easier • In the cloud because easier
  • 17. 17 © Cloudera, Inc. All rights reserved. ROAD AHEAD
  • 18. 18 © Cloudera, Inc. All rights reserved. TODAY: CURRENT CDSW ARCHITECTURE Containerized gateway to CDH/HDP cluster for data scientists • Benefits • Easy to attach for self-service data science • Seamless connectivity to existing clusters • Isolated, reproducible user environments • Tradeoffs • Requires sizing, managing two clusters • Requires dependency coordination • Not optimized for distributed training • Complex, inelastic setup in public cloud CDH/HDP CDH/HDP Cloudera Manager / Ambari gateway node(s) CDH/HDP nodes Hive, HDFS, ... CDSW CDSW ... Master ... Engine EngineEngine EngineEngine K8S K8S
  • 19. 19 © Cloudera, Inc. All rights reserved. def risk_simulation(): import numpy import tensorflow ... results = sc.parallelize(xrange(0, 1000)) .map(risk_simulation).collect()
  • 20. 20 © Cloudera, Inc. All rights reserved. cldr mlx run --gpus 3 --cmd “python train.py”
  • 21. 21 © Cloudera, Inc. All rights reserved. CLOUDERA ML-X FOR CDP Cloud-native enterprise machine learning platform DATA SCIENCE DATA ENGINEERING MODEL OPERATIONS CLOUDERA ML RUNTIME Python/R, Spark, TensorFlow, CPU/GPU-Optimized Interactive Development Batch Pipelines Predictive APIs Full capability of CDSW Rapid cloud provisioning and elastic autoscaling Unified data engineering and ML with seamless dependency management Multi-cloud portability powered by Kubernetes Connects to HDFS or cloud object storage and shared metadata Accelerated deep learning with distributed GPU training As-as-service or in private cloud CONTAINER CLOUD EKS, AKS, GKE, OpenShift DATA CATALOG GOVERNANCESECURITY LIFECYCLEWORKLOAD XM STORAGE Amazon S3 Microsoft ADLS HDFS KUDU
  • 22. 22 © Cloudera, Inc. All rights reserved. Requires and extends CDH/HDP, pushing distributed compute to the cluster CDH / HDP CDH / HDP Cloudera Manager / Ambari gateway node(s) CDH/HDP nodes Hive, HDFS, ... CDSW CDSW ... Master ... Engine EngineEngine EngineEngine Cloud optimized, elastic distributed compute; can optionally use CDH / HDP Kubernetes (EKS, GKE, AKS, OpenShift) CDH / HDP CDH / HDP Hive, HDFS, HMS, Sentry, Navigator MLX MLX ... Master ... Engine EngineEngine EngineEngine Object Store Altus Director / CloudBreak OPTIONAL DATA SCIENCE WORKBENCH vs. CLOUDERA ML-X Cloud ML APIs
  • 23. 23 © Cloudera, Inc. All rights reserved. • Simple dependency management, particularly for ML workloads • python: pip install, R: install.packages, etc • Unified workflow for distributed data engineering and machine learning • Elastic compute (CPU / GPU) in the cloud • Portable form-factor optimized for cloud and private cloud • Long-term potential to unify resource management across workloads WHY SPARK ON KUBERNETES?
  • 24. 24 © Cloudera, Inc. All rights reserved. SPARK ON KUBERNETES • Spark has native integration with Kubernetes since 2.3+ • Runs Spark fully containerized on K8s • Allows Spark to add / remove executors with user specified Spark configuration • Able to leverages Kubernetes features to facilitate launching Spark (e.g: Node Affinities, RBAC, Namespaces, etc). • But still many limits…
  • 25. 25 © Cloudera, Inc. All rights reserved. Kubernetes CMLCML Engine Spark Driver (Client Mode) Spark Executor Fluentd Fluentd Project Files Project Files Spark Executor Project Files Hive, HDFS, HMS, Sentry, Navigator- Create user service account and namespace for security and isolation - Configure Spark driver / executor configuration to connect to k8s API server. - Also populate kerberos related information for accessing HDFS and other CDH services. - Populate dependency management related configurations using pod template: volume, image, user information, etc. - Fluentd configured to pick up logging and metric information to external storage MAKING SPARK EASY IN ML-X
  • 26. 26 © Cloudera, Inc. All rights reserved. Development: NFS/EFS shared filesystem gives laptop like developer experience Production: Built in source-to-image (S2I) pipeline gives PaaS experience • Provides declarative pathway from version control to experiment or model • Provides easy deployment from other environments, e.g. laptops, webhooks RUN • Stage 3: Run image SNAPSHOT • Stage 1: Git snapshot HOW DEPENDENCY MANAGEMENT WORKS Simplifying dependency management from development to production BUILD • Stage 2: Docker build
  • 27. 27 © Cloudera, Inc. All rights reserved. DEMO
  • 28. 28 © Cloudera, Inc. All rights reserved. • Spark • Dynamic allocation and shuffle service • GPU support • Scale testing and performance improvements • K8s • Advanced scheduling (FairScheduler, Hierarchical Queues, etc) • Fast/lazy image distribution at scale • Other Interesting Frameworks • Ray • Distributed Tensorflow & Pytorch (Horovod) WHAT’S NEXT
  • 29. 29 © Cloudera, Inc. All rights reserved. Relevant Sigs sig-big-data, sig-ml, sig-scheduling