MACHINE LEARNING IN THE ENTERPRISE
SAUMITRA BURAGOHAIN | VICE PRESIDENT, PRODUCT MANAGEMENT
VIDYA RAMAN | DIRECTOR, PRODUCT MANAGEMENT
MICHAEL GREGORY | MACHINE LEARNING FIELD ENGINEERING LEAD
2 © Cloudera, Inc. All rights reserved.
The Industry’s First Enterprise Data Cloud
From the Edge to AI
• Data Engineering
• Data Warehouse
• IoT, Ingest & Streaming
• AI, ML & Data Science
OUR APPROACH TO MACHINE LEARNING
Modern enterprise platform, tools, and expert guidance to help you unlock business value with ML/AI
• Agile platform to build, train, and deploy many scalable ML applications
• Enterprise data science tools to accelerate team productivity
• Expert guidance, services & training to fast-track value & scale
PLATFORM
HDP MARKET DRIVERS
• BIGGER: Infinitely scalable (billions of files, exabytes); low TCO (less storage overhead)
• TRUSTED: Data swamp -> data lake
• SMARTER: Deep learning frameworks (TensorFlow, Caffe); GPU pooling/isolation
• FASTER: Faster time to deployment (containerized microservices)
• HYBRID: S3, ADLS/WASB, GCS with truly incremental replication
DATA PLATFORM: HDP
Enterprise Grade Data Management: Hybrid, Secure and Scalable
● On-Premises and Multi-Cloud
● Enterprise Security & Data Lineage
● Choice of Deep Learning Support (Co-locate TensorFlow/GPU/Data)
● Dockerized for Packaging/Isolation
THE CHALLENGE
Balance these needs
DATA SCIENCE
• Access to granular data
• Flexibility
• Preferred open source tools
• Elastic provisioning (compute, storage)
• Reproducible research
• Path to production
DATA MANAGEMENT
• Security
• Governance
• Standards
• Low maintenance
• Low cost
• Self-service access
CLOUDERA DATA SCIENCE WORKBENCH
A PLATFORM FOR MACHINE LEARNING
• Open platform
• Complete lifecycle
• Team collaboration
• Enterprise ready
• Runs anywhere
Platform layers (diagram):
• Apps: solutions | use cases
• Tools: self-service
• Algorithms: open source ecosystem
• Compute: local | Spark | Hive
• Deployment: research | production
• Shared context: catalog | security | governance
• Storage: ADLS, S3, HDFS, Kudu (cloud, on-premises)
THE TYPICAL SOLUTION
“If I can’t use my favorite tools, I’ll…”
• Copy data to my laptop
• Copy data to a data science appliance
• Copy data to a cloud service
Why this is a problem:
• Complicates security
• Breaks data governance
• Adds latency to process
• Makes collaboration more difficult
• Complicates model management and deployment
• Creates infrastructure silos
CLOUDERA DATA SCIENCE WORKBENCH
Accelerate Machine Learning from Research to Production
For data scientists
• Experiment faster: use R, Python, or Scala with on-demand compute and secure CDH data access
• Work together: share reproducible research with your whole team
• Deploy with confidence: get to production repeatably and without recoding
For IT professionals
• Bring data science to the data: give your data science team more freedom while reducing the risk and cost of silos
• Secure by default: leverage common security and governance across workloads
• Run anywhere: on-premises or in the cloud
A MODERN DATA SCIENCE ARCHITECTURE
Containerized environments with scalable, on-demand compute
• Built with Docker and Kubernetes
• Isolated, reproducible user environments
• Supports both big and small data
• Local Python, R, Scala runtimes
• Schedule & share GPU resources
• Run Spark, Impala, and other CDH services
• Secure and governed by default
• Easy, audited access to Kerberized clusters
• Leverages Ranger for security and Atlas for governance
Diagram: CDSW master and engine nodes run on HDP gateway node(s), alongside Ambari-managed HDP nodes (Hive, HDFS, ...).
ACCELERATED DEEP LEARNING WITH GPUS
Multi-tenant GPU support on-premises or cloud
• Extend CDSW to deep learning
• Schedule & share GPU resources
• Train on GPUs, deploy on CPUs
• Works on-premises or cloud
Diagram: CDSW (GPU, CPU) for single-node training; HDP (CPU) for distributed training and scoring.
“Our data scientists want GPUs, but
we need multi-tenancy. If they go to
the cloud on their own, it’s
expensive and we lose governance.”
GPU available on HDP 3.x
CLOUDERA DATA SCIENCE WORKBENCH
Accelerate and simplify machine learning from research to production
ANALYZE DATA
• Explore data securely and share insights with the team
TRAIN MODELS
• Run, track, and compare reproducible experiments
DEPLOY APIs
• Deploy and monitor models as APIs to serve predictions
MANAGE SHARED RESOURCES
• Provide a secure, collaborative, self-service platform for your data science teams
INTRODUCING EXPERIMENTS
Versioned model training runs for evaluation and reproducibility
Data scientists can now...
• Create a snapshot of model code, dependencies, and configuration necessary to train the model
• Build and execute the training run in an isolated container
• Track specified model metrics, performance, and model artifacts
• Inspect, compare, or deploy prior models
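The idea of a versioned, comparable training run can be sketched in plain Python. This is an illustration only, not the CDSW Experiments API: `run_experiment`, `toy_train`, and the record fields are hypothetical names chosen for this sketch, and the "training" is a stand-in function.

```python
import hashlib
import json
import time


def run_experiment(config, train_fn):
    """Run one training job and return a record describing it."""
    # Hash the configuration so each distinct set of settings gets its own ID.
    config_id = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode("utf-8")
    ).hexdigest()[:12]
    started = time.time()
    metrics = train_fn(config)
    return {
        "config_id": config_id,
        "config": config,
        "metrics": metrics,
        "seconds": time.time() - started,
    }


def toy_train(config):
    # Placeholder "training": the error shrinks as iterations grow.
    return {"error": 1.0 / config["iterations"]}


# Run two configurations, then compare the tracked metric to pick the best run.
runs = [run_experiment({"iterations": n}, toy_train) for n in (10, 100)]
best = min(runs, key=lambda r: r["metrics"]["error"])
print(best["config"], best["metrics"])
```

Keeping the configuration, its hash, and the metrics in one record is what makes runs comparable later; a real setup would also snapshot code and dependencies, as the slide describes.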
INTRODUCING MODELS
Machine learning models as one-click microservices (REST APIs)
1. Choose file, e.g. score.py
2. Choose function, e.g. forecast
    import pickle

    # Load the trained model once, when the model container starts.
    with open('model.pk', 'rb') as f:
        model = pickle.load(f)

    def forecast(data):
        return model.predict(data)
3. Choose resources
4. Deploy!
Running model containers also have
access to CDH for data lookups.
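A minimal, self-contained sketch of the score.py pattern follows. It rests on some assumptions: the pickled artifact here is a plain dictionary of coefficients standing in for a real fitted model, the request is assumed to arrive as a JSON-decoded dict with a hypothetical field `x`, and the REST round trip is simulated locally with `json` rather than sent to a deployed endpoint.

```python
import json
import os
import pickle
import tempfile

# Stand-in for a trained model: plain coefficients, so the pickle is
# self-contained. (Hypothetical artifact, not a real fitted model.)
model_path = os.path.join(tempfile.mkdtemp(), "model.pk")
with open(model_path, "wb") as f:
    pickle.dump({"slope": 2.0, "intercept": 1.0}, f)

# --- what a score.py-style file might contain ---
with open(model_path, "rb") as f:
    model = pickle.load(f)  # loaded once at start-up, not per request


def forecast(data):
    """Score one request; `data` is assumed to be a JSON-decoded dict."""
    x = float(data["x"])
    return {"prediction": model["slope"] * x + model["intercept"]}


# Simulate a REST round trip: JSON in, JSON out.
request_body = json.dumps({"x": 3.0})
response_body = json.dumps(forecast(json.loads(request_body)))
print(response_body)  # {"prediction": 7.0}
```

Loading the model at module level rather than inside `forecast` means the unpickling cost is paid once per container instead of once per prediction.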
MODEL MANAGEMENT
View, test, monitor, and update models by team or project
DEMO
CLOUDERA DATA SCIENCE WORKBENCH 1.5 on HDP 3.1.0 & 2.6.5
Accelerate and simplify machine learning from research to production in HDP install base
CDSW TRAINING
• Self-paced video training
• Learn end-to-end ML workflows in CDSW
• Code examples
• Python and R tracks
• Customers can register at university.cloudera.com
THANK YOU
DISCLAIMER
The information in this document is proprietary to Cloudera. No part of this document may be reproduced,
copied or transmitted in any form for any purpose without the express prior written permission of Cloudera.
This document is a preliminary version and not subject to your license agreement or any other agreement
with Cloudera. This document contains only intended strategies, developments and functionalities of
Cloudera products and is not intended to be binding upon Cloudera to any particular course of business,
product strategy and/or development. Please note that this document is subject to change and may be
changed by Cloudera at any time without notice.
Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant
the accuracy or completeness of the information, text, graphics, links or other items contained within this
material. This document is provided without a warranty of any kind, either express or implied, including but
not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement.
Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect
or consequential damages that may result from the use of these materials. The limitation shall not apply in
cases of gross negligence.
MACHINE LEARNING AT CLOUDERA
Our philosophy
We empower our customers to run their business on data with an open platform:
● Your data
● Open algorithms
● Running anywhere
We accelerate enterprise data science.

Introducing Cloudera Data Science Workbench for HDP 2.12.19
