SlideShare a Scribd company logo
1© Cloudera, Inc. All rights reserved.
Cloudera Data Science Workbench
Arun Pamulapati, Senior Systems Engineer
Cloudera Certified Developer and Administrator
Rich Stevens, Customer Account Manager
2© Cloudera, Inc. All rights reserved.
● The information in this document is proprietary to Cloudera. No part of this document may be
reproduced, copied or transmitted in any form for any purpose without the express prior written
permission of Cloudera.
● This document is a preliminary version and not subject to your license agreement or any other
agreement with Cloudera. This document contains only intended strategies, developments and
functionalities of Cloudera products and is not intended to be binding upon Cloudera to any particular
course of business, product strategy and/or development. Please note that this document is subject to
change and may be changed by Cloudera at any time without notice.
● Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not
warrant the accuracy or completeness of the information, text, graphics, links or other items contained
within this material. This document is provided without a warranty of any kind, either express or
implied, including but not limited to the implied warranties of merchantability, fitness for a particular
purpose or non-infringement.
● Cloudera shall have no liability for damages of any kind including without limitation direct, special,
indirect or consequential damages that may result from the use of these materials. The limitation shall
not apply in cases of gross negligence.
Disclaimer
3© Cloudera, Inc. All rights reserved.
● Introduction
● The state of the industry
● What it takes to succeed
● CDSW Architecture
● CDSW Demo
● CDH Stack
● Q&A
Agenda
4© Cloudera, Inc. All rights reserved.
Getting	to	know	you	…
How many of you have used any Hadoop projects before?
Used Hive or Impala / SQL on Hadoop?
Deployed one or more Hadoop applications into production?
How many of you would describe yourselves as a Data Scientist?
Deployed one or more Hadoop applications into sandbox environment?
What languages do you use?
• Python
• R/sparklyr
• Scala
• SparkMLlib
• H2O
• TensorFlow
5© Cloudera, Inc. All rights reserved.
Cost of compute
Data volume
Time
1950
s
1960
s
1970
s
1980
s
1990
s
2000
s
2010
s
Why machine learning is a hot topic today.
6© Cloudera, Inc. All rights reserved.
We are in the age of machine learning
Data has never been
more plentiful
Open source data science and
machine learning libraries are
rapidly evolving
Flexible commodity storage
and compute make scalable
production machine learning
affordable
Data Analytics Deployment
7© Cloudera, Inc. All rights reserved.
Driving	Board-Level	Value
Drive	Customer	
Insights/Know	your	
Customers
Lower	Business	
Risk/	Secure	your	
Business
• Market	Segmentation
• Customer	360
• Next	Best	Offer	
• Churn	Analysis
• Click-Stream	
• Risk	Modeling	&	Analysis
• Network	Threat	
Detection
• Credit	Risk	Assessment
• SPAM	Detection
8© Cloudera, Inc. All rights reserved.
But there are practical challenges
Most data science done
at small scale, individually,
and is difficult to replicate
Very few models
reach production
Teams have different,
conflicting requests for
languages & libraries
Data needs to move
across multiple
different systems
Data Analytics Deployment
9© Cloudera, Inc. All rights reserved.
Help more data scientists
use the power of Cloudera
Use a powerful, familiar
environment with direct access
to Cloudera data and compute
Data Scientist
Data Engineer
Make it easy and secure to
add new users, use cases
Offer secure self-service
analytics and a faster path to
production on common,
affordable infrastructure
Enterprise Architect
Hadoop Admin
Our goal: Open data science at enterprise scale
10© Cloudera, Inc. All rights reserved.
Shared: Data, Operations, Governance, Security, Metadata
Data Engineering Data Science Deployment
Data Wrangling
Visualization
and Analysis
Model Training
& Testing Batch Scoring
Online Scoring
Serving
Data GovernanceCuration
Processing
Acquisition
Reports, Dashboards
Dev: Collaboration, Version Control Ops: Deployment, Scheduling, Orchestration
Support the complete data science workflow
From data to exploration to action
11© Cloudera, Inc. All rights reserved.
Data	science	at	enterprise	scale	requires	a	full	stack
• Support	 unlimited	data
• Provide	sufficient	tools	for	Analysts
• Provide	sufficient	tools	for
Data	Scientists	+	Data	Engineers	
• Enable	real-time	use	cases
• Provide	data	governance
• Provide	full-stack	security
• Deploy	in	the	cloud
• Integrate	with	partner	tools
• Be	easy	for	IT	to	deploy/maintain
• Accelerate	your	data	and	machine	
learning	strategy	with	expert	research	
and	advising
✓Hadoop
✓Impala,	Hive,	Hue
✓Spark,	Data	Science	Workbench
✓Kafka,	Spark	Streaming
✓Navigator	+	Partners
✓Kerberos,	 Sentry,	Record	Service,	KMS/KTS	
✓Cloudera	Director/	Altus	(PaaS)
✓Rich	Ecosystem
✓Cloudera	Manager	+	Director
✓Cloudera	Fast	Forward	Labs
12© Cloudera, Inc. All rights reserved.
Accelerates data science from
development to production with:
●Secure self-service data access
●On-demand compute
●Support for Python, R, and Scala
●Project dependency isolation for
multiple library versions
●Workflow automation, version
control, collaboration and sharing
Cloudera Data Science Workbench
Self-service data science for the enterprise
13© Cloudera, Inc. All rights reserved.
A modern data science architecture
CDH CDH
Cloudera Manager
gateway nodes CDH nodes
●Built on Docker and Kubernetes
●Runs on dedicated gateway nodes
●User sessions run in isolated
“engine” containers which:
○Host Kerberos-authenticated
Python/R/Scala runtimes
○Interact with Spark via YARN
client mode (Driver runs in
container, workers on CDH)
Hive, HDFS, ...
CDSW CDSW
...
Master
...
Engine
EngineEngine
EngineEngine
14© Cloudera, Inc. All rights reserved.
Demo
15© Cloudera, Inc. All rights reserved.
CDH	platform
OPERATIONS
Cloudera	Manager
Cloudera	Director
DATA	
MANAGEMENT
Cloudera	Navigator
Encrypt	and	KeyTrustee
Optimizer
STRUCTURED
Sqoop
UNSTRUCTURED
Kafka,	Flume
PROCESS,	ANALYZE,	SERVE
UNIFIED	SERVICES
RESOURCE	 MANAGEMENT
YARN
SECURITY
Sentry,	RecordService
STORE
INTEGRATE
BATCH
Spark,	Hive,	Pig	
MapReduce
STREAM
Spark
SQL
Impala
SEARCH
Solr
OTHER
Kite
NoSQL
HBase
OTHER
Object	Store
FILESYSTEM
HDFS
RELATIONAL
Kudu
16© Cloudera, Inc. All rights reserved.
Cloudera Fast Forward Labs
Scientists and engineers Led by Hilary Mason (Fortune 40 under 40)
Our team blends research fluency with engineering skill and experience
building products at top technologycompanies.
17© Cloudera, Inc. All rights reserved.
Thank you
Arun Pamulapati
apamulapati@cloudera.com
678.524.3778
https://github.com/pamulapati/cdswdemo

More Related Content

What's hot

Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the Enterprise
Cloudera, Inc.
 
Solr consistency and recovery internals
Solr consistency and recovery internalsSolr consistency and recovery internals
Solr consistency and recovery internals
Cloudera, Inc.
 
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera, Inc.
 
The Big Picture: Learned Behaviors in Churn
The Big Picture: Learned Behaviors in ChurnThe Big Picture: Learned Behaviors in Churn
The Big Picture: Learned Behaviors in Churn
Cloudera, Inc.
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Cloudera, Inc.
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
Cloudera, Inc.
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
Cloudera, Inc.
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
Cloudera, Inc.
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
Cloudera, Inc.
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for Analytics
Cloudera, Inc.
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Cloudera, Inc.
 
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondStanding Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
Cloudera, Inc.
 
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Cloudera, Inc.
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
Cloudera, Inc.
 
Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team Building
Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
Cloudera, Inc.
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
Cloudera, Inc.
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
Cloudera, Inc.
 
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
Cloudera, Inc.
 

What's hot (20)

Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the Enterprise
 
Solr consistency and recovery internals
Solr consistency and recovery internalsSolr consistency and recovery internals
Solr consistency and recovery internals
 
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 
The Big Picture: Learned Behaviors in Churn
The Big Picture: Learned Behaviors in ChurnThe Big Picture: Learned Behaviors in Churn
The Big Picture: Learned Behaviors in Churn
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for Analytics
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondStanding Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
 
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team Building
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
 
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
 

Similar to NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench

Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

Cloudera, Inc.
 
Data Science in the Enterprise
Data Science in the EnterpriseData Science in the Enterprise
Data Science in the Enterprise
The Hive
 
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...
Timothy Spann
 
Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18
Cloudera, Inc.
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
DataWorks Summit
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
Jason Hubbard
 
The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019
Timothy Spann
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Stefan Lipp
 
Cloudera 5.3 Update
Cloudera 5.3 UpdateCloudera 5.3 Update
Cloudera 5.3 Update
Cloudera, Inc.
 
Data Science in Enterprise
Data Science in EnterpriseData Science in Enterprise
Data Science in Enterprise
Josh Yeh
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-cloudera
inevitablecloud
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
TheInevitableCloud
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Stefan Lipp
 
Edge to ai analytics from edge to cloud with efficient movement of machine data
Edge to ai  analytics from edge to cloud with efficient movement of machine dataEdge to ai  analytics from edge to cloud with efficient movement of machine data
Edge to ai analytics from edge to cloud with efficient movement of machine data
Timothy Spann
 
High-Performance Analytics in the Cloud with Apache Impala
High-Performance Analytics in the Cloud with Apache ImpalaHigh-Performance Analytics in the Cloud with Apache Impala
High-Performance Analytics in the Cloud with Apache Impala
Cloudera, Inc.
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
DataWorks Summit
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
Cloudera, Inc.
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
Cloudera, Inc.
 
What it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateWhat it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready state
ClouderaUserGroups
 

Similar to NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench (20)

Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
Data Science in the Enterprise
Data Science in the EnterpriseData Science in the Enterprise
Data Science in the Enterprise
 
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...
 
Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
 
Cloudera 5.3 Update
Cloudera 5.3 UpdateCloudera 5.3 Update
Cloudera 5.3 Update
 
Data Science in Enterprise
Data Science in EnterpriseData Science in Enterprise
Data Science in Enterprise
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-cloudera
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Edge to ai analytics from edge to cloud with efficient movement of machine data
Edge to ai  analytics from edge to cloud with efficient movement of machine dataEdge to ai  analytics from edge to cloud with efficient movement of machine data
Edge to ai analytics from edge to cloud with efficient movement of machine data
 
High-Performance Analytics in the Cloud with Apache Impala
High-Performance Analytics in the Cloud with Apache ImpalaHigh-Performance Analytics in the Cloud with Apache Impala
High-Performance Analytics in the Cloud with Apache Impala
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
 
What it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateWhat it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready state
 

More from NOVA DATASCIENCE

Nova Data Science Meetup 9-20-2017 Introduction
Nova Data Science Meetup 9-20-2017 IntroductionNova Data Science Meetup 9-20-2017 Introduction
Nova Data Science Meetup 9-20-2017 Introduction
NOVA DATASCIENCE
 
Nova Data Science Meetup 9-20-2017 Presentation How AI Powers the Comcast X1 ...
Nova Data Science Meetup 9-20-2017 Presentation How AI Powers the Comcast X1 ...Nova Data Science Meetup 9-20-2017 Presentation How AI Powers the Comcast X1 ...
Nova Data Science Meetup 9-20-2017 Presentation How AI Powers the Comcast X1 ...
NOVA DATASCIENCE
 
NOVA Data Science Meetup 8-10-2017 Presentation - State of Data Science Educa...
NOVA Data Science Meetup 8-10-2017 Presentation - State of Data Science Educa...NOVA Data Science Meetup 8-10-2017 Presentation - State of Data Science Educa...
NOVA Data Science Meetup 8-10-2017 Presentation - State of Data Science Educa...
NOVA DATASCIENCE
 
NOVA Data Science Meetup 5/10/2017 - Presentation Building a gigaword corpus
NOVA Data Science Meetup 5/10/2017 - Presentation Building a gigaword corpusNOVA Data Science Meetup 5/10/2017 - Presentation Building a gigaword corpus
NOVA Data Science Meetup 5/10/2017 - Presentation Building a gigaword corpus
NOVA DATASCIENCE
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA DATASCIENCE
 
NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA DATASCIENCE
 

More from NOVA DATASCIENCE (6)

Nova Data Science Meetup 9-20-2017 Introduction
Nova Data Science Meetup 9-20-2017 IntroductionNova Data Science Meetup 9-20-2017 Introduction
Nova Data Science Meetup 9-20-2017 Introduction
 
Nova Data Science Meetup 9-20-2017 Presentation How AI Powers the Comcast X1 ...
Nova Data Science Meetup 9-20-2017 Presentation How AI Powers the Comcast X1 ...Nova Data Science Meetup 9-20-2017 Presentation How AI Powers the Comcast X1 ...
Nova Data Science Meetup 9-20-2017 Presentation How AI Powers the Comcast X1 ...
 
NOVA Data Science Meetup 8-10-2017 Presentation - State of Data Science Educa...
NOVA Data Science Meetup 8-10-2017 Presentation - State of Data Science Educa...NOVA Data Science Meetup 8-10-2017 Presentation - State of Data Science Educa...
NOVA Data Science Meetup 8-10-2017 Presentation - State of Data Science Educa...
 
NOVA Data Science Meetup 5/10/2017 - Presentation Building a gigaword corpus
NOVA Data Science Meetup 5/10/2017 - Presentation Building a gigaword corpusNOVA Data Science Meetup 5/10/2017 - Presentation Building a gigaword corpus
NOVA Data Science Meetup 5/10/2017 - Presentation Building a gigaword corpus
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2
 
NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1
 

Recently uploaded

ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 

Recently uploaded (20)

ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 

NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench

  • 1. 1© Cloudera, Inc. All rights reserved. Cloudera Data Science Workbench Arun Pamulapati, Senior Systems Engineer Cloudera Certified Developer and Administrator Rich Stevens, Customer Account Manager
  • 2. 2© Cloudera, Inc. All rights reserved. ● The information in this document is proprietary to Cloudera. No part of this document may be reproduced, copied or transmitted in any form for any purpose without the express prior written permission of Cloudera. ● This document is a preliminary version and not subject to your license agreement or any other agreement with Cloudera. This document contains only intended strategies, developments and functionalities of Cloudera products and is not intended to be binding upon Cloudera to any particular course of business, product strategy and/or development. Please note that this document is subject to change and may be changed by Cloudera at any time without notice. ● Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement. ● Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or consequential damages that may result from the use of these materials. The limitation shall not apply in cases of gross negligence. Disclaimer
  • 3. 3© Cloudera, Inc. All rights reserved. ● Introduction ● The state of the industry ● What it takes to succeed ● CDSW Architecture ● CDSW Demo ● CDH Stack ● Q&A Agenda
  • 4. 4© Cloudera, Inc. All rights reserved. Getting to know you … How many of you have used any Hadoop projects before? Used Hive or Impala / SQL on Hadoop? Deployed one or more Hadoop applications into production? How many of you would describe yourselves as a Data Scientist? Deployed one or more Hadoop applications into sandbox environment? What languages do you use? • Python • R/sparklyr • Scala • SparkMLlib • H2O • TensorFlow
  • 5. 5© Cloudera, Inc. All rights reserved. Cost of compute Data volume Time 1950 s 1960 s 1970 s 1980 s 1990 s 2000 s 2010 s Why machine learning is a hot topic today.
  • 6. 6© Cloudera, Inc. All rights reserved. We are in the age of machine learning Data has never been more plentiful Open source data science and machine learning libraries are rapidly evolving Flexible commodity storage and compute make scalable production machine learning affordable Data Analytics Deployment
  • 7. 7© Cloudera, Inc. All rights reserved. Driving Board-Level Value Drive Customer Insights/Know your Customers Lower Business Risk/ Secure your Business • Market Segmentation • Customer 360 • Next Best Offer • Churn Analysis • Click-Stream • Risk Modeling & Analysis • Network Threat Detection • Credit Risk Assessment • SPAM Detection
  • 8. 8© Cloudera, Inc. All rights reserved. But there are practical challenges Most data science done at small scale, individually, and is difficult to replicate Very few models reach production Teams have different, conflicting requests for languages & libraries Data needs to move across multiple different systems Data Analytics Deployment
  • 9. 9© Cloudera, Inc. All rights reserved. Help more data scientists use the power of Cloudera Use a powerful, familiar environment with direct access to Cloudera data and compute Data Scientist Data Engineer Make it easy and secure to add new users, use cases Offer secure self-service analytics and a faster path to production on common, affordable infrastructure Enterprise Architect Hadoop Admin Our goal: Open data science at enterprise scale
  • 10. 10© Cloudera, Inc. All rights reserved. Shared: Data, Operations, Governance, Security, Metadata Data Engineering Data Science Deployment Data Wrangling Visualization and Analysis Model Training & Testing Batch Scoring Online Scoring Serving Data GovernanceCuration Processing Acquisition Reports, Dashboards Dev: Collaboration, Version Control Ops: Deployment, Scheduling, Orchestration Support the complete data science workflow From data to exploration to action
  • 11. 11© Cloudera, Inc. All rights reserved. Data science at enterprise scale requires a full stack • Support unlimited data • Provide sufficient tools for Analysts • Provide sufficient tools for Data Scientists + Data Engineers • Enable real-time use cases • Provide data governance • Provide full-stack security • Deploy in the cloud • Integrate with partner tools • Be easy for IT to deploy/maintain • Accelerate your data and machine learning strategy with expert research and advising ✓Hadoop ✓Impala, Hive, Hue ✓Spark, Data Science Workbench ✓Kafka, Spark Streaming ✓Navigator + Partners ✓Kerberos, Sentry, Record Service, KMS/KTS ✓Cloudera Director/ Altus (PaaS) ✓Rich Ecosystem ✓Cloudera Manager + Director ✓Cloudera Fast Forward Labs
  • 12. 12© Cloudera, Inc. All rights reserved. Accelerates data science from development to production with: ●Secure self-service data access ●On-demand compute ●Support for Python, R, and Scala ●Project dependency isolation for multiple library versions ●Workflow automation, version control, collaboration and sharing Cloudera Data Science Workbench Self-service data science for the enterprise
  • 13. 13© Cloudera, Inc. All rights reserved. A modern data science architecture CDH CDH Cloudera Manager gateway nodes CDH nodes ●Built on Docker and Kubernetes ●Runs on dedicated gateway nodes ●User sessions run in isolated “engine” containers which: ○Host Kerberos-authenticated Python/R/Scala runtimes ○Interact with Spark via YARN client mode (Driver runs in container, workers on CDH) Hive, HDFS, ... CDSW CDSW ... Master ... Engine EngineEngine EngineEngine
  • 14. 14© Cloudera, Inc. All rights reserved. Demo
  • 15. 15© Cloudera, Inc. All rights reserved. CDH platform OPERATIONS Cloudera Manager Cloudera Director DATA MANAGEMENT Cloudera Navigator Encrypt and KeyTrustee Optimizer STRUCTURED Sqoop UNSTRUCTURED Kafka, Flume PROCESS, ANALYZE, SERVE UNIFIED SERVICES RESOURCE MANAGEMENT YARN SECURITY Sentry, RecordService STORE INTEGRATE BATCH Spark, Hive, Pig MapReduce STREAM Spark SQL Impala SEARCH Solr OTHER Kite NoSQL HBase OTHER Object Store FILESYSTEM HDFS RELATIONAL Kudu
  • 16. 16© Cloudera, Inc. All rights reserved. Cloudera Fast Forward Labs Scientists and engineers Led by Hilary Mason (Fortune 40 under 40) Our team blends research fluency with engineering skill and experience building products at top technologycompanies.
  • 17. 17© Cloudera, Inc. All rights reserved. Thank you Arun Pamulapati apamulapati@cloudera.com 678.524.3778 https://github.com/pamulapati/cdswdemo