SlideShare a Scribd company logo
1© Cloudera, Inc. All rights reserved.
Cloudera & The Cloud
2© Cloudera, Inc. All rights reserved.
Explosion of data and devices (IoT)
30B
connected
devices
440x
more
data
Transformation of IT infrastructure
open
source
cloud
machine
learning
$200B
total
market1
1 IDC Worldwide Big Data and Business Analytics Market Through 2020
The data-driven enterprise
3© Cloudera, Inc. All rights reserved.
The Big Shift
In 2017
58% on-premises
11% private cloud
25% public cloud
Source: 451 Research, Voice of the
Enterprise: Workloads and Key Projects,
Cloud Transformation, 2017.
By 2019
38% on-premises
15% private cloud
41% public cloud
4© Cloudera, Inc. All rights reserved.
My organization
is moving to the cloud,
why should we
consider Cloudera?
5© Cloudera, Inc. All rights reserved.
Current State
6© Cloudera, Inc. All rights reserved.
Instant, self-service
access to data and
resources
Application performance
Job-oriented tools
Choice
Secure, controlled
provisioning
Predictable costs
Systems-oriented tools
Standards and portability
KNOWLEDGE WORKERS INFRASTRUCTURE TEAM
Stakeholders
Advance strategic
initiatives
Link analytics to business
Reduce admin burden
Integrated solutions
DATA TEAM
7© Cloudera, Inc. All rights reserved.
–+
• Speed of deployment
• Tenant isolation
• Self-service
• Workload elasticity
• Shared storage
• Pay-as-you-go
• Bring your own tools
• Bring your own data
• Disjointed services
• Too many data copies
• Multiple security frameworks
• Difficult to troubleshoot workloads
• No shared metadata
• Unable to track data lineage
• Few on-premises integration services
• Proprietary services
• Cloud lock-in
CLOUD
BENEFITS
CLOUD
PROBLEMS
8© Cloudera, Inc. All rights reserved.
Deployment options
Bare Metal Private Cloud Cloud IaaS Cloud PaaS
Applications Applications Applications Applications
Clusters Clusters Clusters Clusters
Operating System Operating System Operating System Operating System
Network Network Network Network
Storage Storage Storage Storage
Servers Servers Servers Servers
Customer managed Vendor managed
On-premises
9© Cloudera, Inc. All rights reserved.
The Answer
10© Cloudera, Inc. All rights reserved.
● The modern platform for machine
learning and analytics
● with multiple deployment options
● and one shared data experience
11© Cloudera, Inc. All rights reserved.
The modern platform for ML and Analytics...
DATA
ENGINEERING
OPERATIONAL
DATABASE
ANALYTIC
DATABASE
DATA
SCIENCE
DATA PROCESSING
• Cost efficient
• Reliable
• Scalable
• Based on Spark,
MapReduce,
Hive & Pig
• Supported by
Workload
Analytics
FAST BI & SQL
• Flexibility
• Elastic scale
• Go beyond SQL
• Based on
Impala & Hive
• SQL dev enviro
• Supported by
Workload
Analytics
MACHINE LEARNING
• Fast dev to
production
• Secure self-
serve
• Based on
Python, R, and
Spark
• ML dev
environment
(CDSW)
ONLINE & REAL-TIME
• High throughput,
low latency
• Strongly
consistent
• Based on
Hbase, Kudu
& Spark
streaming
12© Cloudera, Inc. All rights reserved.
...With multiple deployment options...
OPERATIONAL
DATABASE
DATA
SCIENCE
ANALYTIC
DATABASE
DATA
ENGINEERING
DATA
ENGINEERING
ANALYTIC
DATABASE
PRIVATE CLOUDBARE METAL INFRASTRUCTURE SERVICES
(in beta)
SDX in EDH clusters SDX Reference Architecture
Altus SDX (in beta)
Via Cloudera Manager and Director
via Altus PaaS
13© Cloudera, Inc. All rights reserved.
• Shared catalog
• Unified security
• Consistent governance
• Easy workload management
• Flexible ingest and replication
...And one Shared Data Experience
14© Cloudera, Inc. All rights reserved. 14
The modern platform for machine learning and analytics optimized for the cloud
DATA CATALOG
SECURITY GOVERNANCE
WORKLOAD
MANAGEMENT
INGEST &
REPLICATION
EXTENSIBLE
SERVICES
CORE
SERVICES DATA
ENGINEERING
OPERATIONAL
DATABASE
ANALYTIC
DATABASE
DATA
SCIENCE
AWS S3 AWS EBS HDFS KUDU
STORAGE
SERVICES
Cloudera Enterprise
PRIVATE CLOUDBARE METAL INFRASTRUCTURE
DEPLOYMENT
OPTIONS SERVICES
15© Cloudera, Inc. All rights reserved.
"Better to bet on cloud providers
for infrastructure, Cloudera for data,
analytics and security fabric, and
leave the rest to the ecosystem"
16© Cloudera, Inc. All rights reserved.
$
• 2-4x storage savings, faster ramp
• Lower risk of data breach
• Analysts more productive on jobs
• Safe self-service, no shadow IT
• Less effort for audits/compliance
• IT more strategic, less admin time
• Deployment choices
• Portability and price arbitrage
• No proprietary lock-in
• Eliminate data copies
• Single security framework
• Easy to troubleshoot workloads
• Universally shared metadata
• Easy to track data lineage
• Unified services
• Same solution as on-premises
• Same solution in multi-cloud
• Open source standards
+
CLOUDERA
ADVANTAGES
BUSINESS
VALUE
17© Cloudera, Inc. All rights reserved.
Cloud Customer Successes
● Manufacturing Customer
● Analytic DB, Data Science &
Engineering on AWS
“We can deliver all of the data with less compute and much
less complexity.”
- Joe Smith, ACME Manufacturing
● Advisory and tech for finserv
● Cloudera EDH on AWS
“Cloudera gives us the best of both worlds—the ability to
capture and process thousands of critical events at scale and
the option to deploy on prem, in the cloud, or in a hybrid
architecture. And running Cloudera on AWS enables us to
leverage a world-class, reliable cloud infrastructure that
supports our changing business needs and the needs of our
customers.”
- Kaushik Deka, CTO and Director of Engineering
18© Cloudera, Inc. All rights reserved.
1. Learn Cloudera product offerings
2. Deploy using AWS Quickstart template
3. Leverage Professional Services
4. Optimize based on changing business need
Call to Action
19© Cloudera, Inc. All rights reserved.
Best practices for running your
Cloudera Enterprise cluster on AWS
20© Cloudera, Inc. All rights reserved.
Quick AWS Component Review
EC2 - servers
S3 - object storage
EBS - block storage
RDS - we still need RDBMS
VPC - everything needs network
21© Cloudera, Inc. All rights reserved.
AWS Service Limits
Default limits on most AWS services (most can be increased)
• EC2 - ten (10) m4.4xlarge instances per region
• EBS - 20 TB of Throughput Optimized HDD (st1) per region
• S3 - 100 buckets per account
• VPC - 5 VPC per region
Defaults might limit your cluster, so plan ahead!
22© Cloudera, Inc. All rights reserved.
Deployment
Topology
22© Cloudera, Inc. All rights reserved.
23© Cloudera, Inc. All rights reserved.
Deployment Topology
Two types of deployments from network perspective:
• Public Subnet - direct access to Internet & AWS services
• Private Subnet - instances must go through a VPC endpoints to
reach AWS services & NAT instances for Internet
If you require high-bandwidth access to data sources on Internet or
AWS services in another region, deploy to a public subnet.
Public doesn’t mean open to world. Limit access using Security
Groups.
24© Cloudera, Inc. All rights reserved.
Deployment Topology (Public Subnet)
25© Cloudera, Inc. All rights reserved.
Deployment Topology (Private Subnet)
26© Cloudera, Inc. All rights reserved.
A Word About Edge Nodes
Edge nodes:
• Have direct access to cluster
• Users user these nodes via client applications
• Web applications, BI tools, or just Hadoop command-line client
Avoid direct user access to the cluster!
27© Cloudera, Inc. All rights reserved.
Deployment Topology with Edge Nodes
28© Cloudera, Inc. All rights reserved.
Deployment Topology with Edge Nodes
29© Cloudera, Inc. All rights reserved.
Roles and
Instance
Types
29© Cloudera, Inc. All rights reserved.
30© Cloudera, Inc. All rights reserved.
So you want a cluster, eh?
31© Cloudera, Inc. All rights reserved.
Deployment Model
Two types of deployments:
• Persistent - long-lived, always on, no spin-up time
• Temporary - short-lived, can be started/stopped to save money
Impacts job setup time, instance types, storage method, and cost.
32© Cloudera, Inc. All rights reserved.
Instance Types
Ephemeral
• have locally attached storage, HDD or SSD
• on instance termination, data is irrecoverable
EBS-only
• no local storage, must mount EBS volumes
• on instance termination, data is safe in EBS
33© Cloudera, Inc. All rights reserved.
Networking
Connectivity
Security
33© Cloudera, Inc. All rights reserved.
34© Cloudera, Inc. All rights reserved.
Storage Configuration
Three types of storage:
• Instance Storage
• Elastic Block Storage (EBS)
• Object Storage (S3)
35© Cloudera, Inc. All rights reserved.
Storage Configuration - Instance Storage
• Attached to EC2 instances, like physical disks on physical server
• Lifetime of storage == Lifetime of EC2 instance
• Each EC2 instance has different amounts of storage
c1.xlarge has 4 x 420 GB
d2.8xlarge has 24 x 2 TB
• For HDFS data directories, we like HDD instance storage
• Backup planning -- multi-instance shutdowns, multi-VM AWS events
36© Cloudera, Inc. All rights reserved.
Storage Configuration - EBS
• Persistent block level storage volumes
• Can be encrypted at rest w/ negligible impact to latency/throughput
• OS: General Purpose (gp2)
• DFS: Throughput Optimized HDD (st1), Cold HDD (sc1)
• Baseline & burst performance increase with size of provisioned
volume
e.g. 500 GB st1 baseline throughput 20 MB/s; 1000 GB → 40 MB/s
37© Cloudera, Inc. All rights reserved.
Storage Configuration - EBS Recommendations
Instance selection
• Use EBS-optimized instances OR instances with 10Gb+ network
• Minimum dedicated EBS bandwidth of 1000 Mb/s (125 MB/s)
Volume selection
• Baseline performance, 40 MB/s or better (1000 GB st1, 3200 GB
sc1)
• Do not exceed instance’s dedicated EBS bandwidth!
38© Cloudera, Inc. All rights reserved.
Storage Configuration - S3
• Great for cold backup: durable, available, inexpensive
• For hot backup, use a second HDFS cluster
• Hive and Spark can also use S3 directly
• Standard data operations can read from & write to S3 buckets
39© Cloudera, Inc. All rights reserved.
Storage Configuration
Three types of storage:
• Instance Storage
• Elastic Block Storage (EBS)
• Object Storage (S3)
• Root Device
40© Cloudera, Inc. All rights reserved.
Storage Configuration - Root Device
• For operating system and logs
• Use EBS gp2 volumes as root devices
• At least 500 GB for OS, CDH software, and logs
• Do not use instance storage for the root device!
41© Cloudera, Inc. All rights reserved. 41© Cloudera, Inc. All rights reserved.
Capacity
Planning
42© Cloudera, Inc. All rights reserved.
Capacity Planning
• AWS makes expansion easy; advance planning makes things easier
• Consider workloads: how much storage vs compute? balanced?
• Consider data replication (3x), growth, retention
• Low storage density, r3.8xlarge or c4.8xlarge provide less storage
but higher compute and memory
• High storage density, d2.8xlarge offers 48 TB per instance with a
good amount of compute and memory
43© Cloudera, Inc. All rights reserved.
Cloudera Enterprise Hardware Requirements Guide
tiny.cloudera.com/hw-reqs
44© Cloudera, Inc. All rights reserved. 44© Cloudera, Inc. All rights reserved.
Provisioning
Instances
45© Cloudera, Inc. All rights reserved.
Provisioning Instances
• Cloudera Director automates most things
• Manual via EC2 command-line API tool or AWS management
console
• Don’t forget your databases (either RDS or self-managed)
• Cloudera Altus
46© Cloudera, Inc. All rights reserved.
Provisioning Instances
No matter which route you take...
• root device: 500 GB+ gp2 EBS volume
• master metadata: ephemeral or recommended gp2 EBS volumes
• DFS data: ephemeral or recommended st1/sc1 EBS volumes
• use tags to indicate the role instances/volumes will play
47© Cloudera, Inc. All rights reserved.
Thank You

More Related Content

What's hot

Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
Cloudera, Inc.
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera, Inc.
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine Learning
Cloudera, Inc.
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the Enterprise
Cloudera, Inc.
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
Cloudera, Inc.
 
How komatsu is driving operational efficiencies using io t and machine learni...
How komatsu is driving operational efficiencies using io t and machine learni...How komatsu is driving operational efficiencies using io t and machine learni...
How komatsu is driving operational efficiencies using io t and machine learni...
Cloudera, Inc.
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
Cloudera, Inc.
 
Big data journey to the cloud maz chaudhri 5.30.18
Big data journey to the cloud   maz chaudhri 5.30.18Big data journey to the cloud   maz chaudhri 5.30.18
Big data journey to the cloud maz chaudhri 5.30.18
Cloudera, Inc.
 
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
Cloudera, Inc.
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Cloudera, Inc.
 
Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?
Cloudera, Inc.
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18
Cloudera, Inc.
 
The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...
The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...
The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...
Cloudera, Inc.
 
Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team Building
Cloudera, Inc.
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Cloudera, Inc.
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
Cloudera, Inc.
 
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera, Inc.
 
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Cloudera, Inc.
 

What's hot (20)

Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine Learning
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Data Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the EnterpriseData Science and Machine Learning for the Enterprise
Data Science and Machine Learning for the Enterprise
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
 
How komatsu is driving operational efficiencies using io t and machine learni...
How komatsu is driving operational efficiencies using io t and machine learni...How komatsu is driving operational efficiencies using io t and machine learni...
How komatsu is driving operational efficiencies using io t and machine learni...
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Big data journey to the cloud maz chaudhri 5.30.18
Big data journey to the cloud   maz chaudhri 5.30.18Big data journey to the cloud   maz chaudhri 5.30.18
Big data journey to the cloud maz chaudhri 5.30.18
 
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18
 
The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...
The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...
The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...
 
Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team Building
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
 
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
 

Similar to Big data journey to the cloud 5.30.18 asher bartch

A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloud
Cloudera, Inc.
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
Cloudera, Inc.
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Cloudera, Inc.
 
High-Performance Analytics in the Cloud with Apache Impala
High-Performance Analytics in the Cloud with Apache ImpalaHigh-Performance Analytics in the Cloud with Apache Impala
High-Performance Analytics in the Cloud with Apache Impala
Cloudera, Inc.
 
Cloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemachtCloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera, Inc.
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
CalvinSim10
 
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Cloudera, Inc.
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloudera, Inc.
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

Cloudera, Inc.
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Stefan Lipp
 
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Matt Stubbs
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
Adam Doyle
 
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
EDB
 
101_Customer_Move and Modernize Siebel_07012021.pptx
101_Customer_Move and Modernize Siebel_07012021.pptx101_Customer_Move and Modernize Siebel_07012021.pptx
101_Customer_Move and Modernize Siebel_07012021.pptx
BhagavathyPadmanabha1
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Cloudera, Inc.
 
Introducing Azure Arc
Introducing Azure ArcIntroducing Azure Arc
Introducing Azure Arc
Mohamed Wali
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
Cloudera, Inc.
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the Cloud
GoDataDriven
 
Leveraging Operational Data in the Cloud
Leveraging Operational Data in the CloudLeveraging Operational Data in the Cloud
Leveraging Operational Data in the Cloud
Inductive Automation
 
Hybrid is the New Normal
Hybrid is the New NormalHybrid is the New Normal
Hybrid is the New Normal
DataWorks Summit
 

Similar to Big data journey to the cloud 5.30.18 asher bartch (20)

A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloud
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
High-Performance Analytics in the Cloud with Apache Impala
High-Performance Analytics in the Cloud with Apache ImpalaHigh-Performance Analytics in the Cloud with Apache Impala
High-Performance Analytics in the Cloud with Apache Impala
 
Cloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemachtCloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemacht
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
 
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
 
101_Customer_Move and Modernize Siebel_07012021.pptx
101_Customer_Move and Modernize Siebel_07012021.pptx101_Customer_Move and Modernize Siebel_07012021.pptx
101_Customer_Move and Modernize Siebel_07012021.pptx
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 
Introducing Azure Arc
Introducing Azure ArcIntroducing Azure Arc
Introducing Azure Arc
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the Cloud
 
Leveraging Operational Data in the Cloud
Leveraging Operational Data in the CloudLeveraging Operational Data in the Cloud
Leveraging Operational Data in the Cloud
 
Hybrid is the New Normal
Hybrid is the New NormalHybrid is the New Normal
Hybrid is the New Normal
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.
 
Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18
Cloudera, Inc.
 
Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18
Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 
Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18
 
Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18
 

Recently uploaded

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 

Recently uploaded (20)

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 

Big data journey to the cloud 5.30.18 asher bartch

  • 1. 1© Cloudera, Inc. All rights reserved. Cloudera & The Cloud
  • 2. 2© Cloudera, Inc. All rights reserved. Explosion of data and devices (IoT) 30B connected devices 440x more data Transformation of IT infrastructure open source cloud machine learning $200B total market1 1 IDC Worldwide Big Data and Business Analytics Market Through 2020 The data-driven enterprise
  • 3. 3© Cloudera, Inc. All rights reserved. The Big Shift In 2017 58% on-premises 11% private cloud 25% public cloud Source: 451 Research, Voice of the Enterprise: Workloads and Key Projects, Cloud Transformation, 2017. By 2019 38% on-premises 15% private cloud 41% public cloud
  • 4. 4© Cloudera, Inc. All rights reserved. My organization is moving to the cloud, why should we consider Cloudera?
  • 5. 5© Cloudera, Inc. All rights reserved. Current State
  • 6. 6© Cloudera, Inc. All rights reserved. Instant, self-service access to data and resources Application performance Job-oriented tools Choice Secure, controlled provisioning Predictable costs Systems-oriented tools Standards and portability KNOWLEDGE WORKERS INFRASTRUCTURE TEAM Stakeholders Advance strategic initiatives Link analytics to business Reduce admin burden Integrated solutions DATA TEAM
  • 7. 7© Cloudera, Inc. All rights reserved. –+ • Speed of deployment • Tenant isolation • Self-service • Workload elasticity • Shared storage • Pay-as-you-go • Bring your own tools • Bring your own data • Disjointed services • Too many data copies • Multiple security frameworks • Difficult to troubleshoot workloads • No shared metadata • Unable to track data lineage • Few on-premises integration services • Proprietary services • Cloud lock-in CLOUD BENEFITS CLOUD PROBLEMS
  • 8. 8© Cloudera, Inc. All rights reserved. Deployment options Bare Metal Private Cloud Cloud IaaS Cloud PaaS Applications Applications Applications Applications Clusters Clusters Clusters Clusters Operating System Operating System Operating System Operating System Network Network Network Network Storage Storage Storage Storage Servers Servers Servers Servers Customer managed Vendor managed On-premises
  • 9. 9© Cloudera, Inc. All rights reserved. The Answer
  • 10. 10© Cloudera, Inc. All rights reserved. ● The modern platform for machine learning and analytics ● with multiple deployment options ● and one shared data experience
  • 11. 11© Cloudera, Inc. All rights reserved. The modern platform for ML and Analytics... DATA ENGINEERING OPERATIONAL DATABASE ANALYTIC DATABASE DATA SCIENCE DATA PROCESSING • Cost efficient • Reliable • Scalable • Based on Spark, MapReduce, Hive & Pig • Supported by Workload Analytics FAST BI & SQL • Flexibility • Elastic scale • Go beyond SQL • Based on Impala & Hive • SQL dev enviro • Supported by Workload Analytics MACHINE LEARNING • Fast dev to production • Secure self- serve • Based on Python, R, and Spark • ML dev environment (CDSW) ONLINE & REAL-TIME • High throughput, low latency • Strongly consistent • Based on Hbase, Kudu & Spark streaming
  • 12. 12© Cloudera, Inc. All rights reserved. ...With multiple deployment options... OPERATIONAL DATABASE DATA SCIENCE ANALYTIC DATABASE DATA ENGINEERING DATA ENGINEERING ANALYTIC DATABASE PRIVATE CLOUDBARE METAL INFRASTRUCTURE SERVICES (in beta) SDX in EDH clusters SDX Reference Architecture Altus SDX (in beta) Via Cloudera Manager and Director via Altus PaaS
  • 13. 13© Cloudera, Inc. All rights reserved. • Shared catalog • Unified security • Consistent governance • Easy workload management • Flexible ingest and replication ...And one Shared Data Experience
  • 14. 14© Cloudera, Inc. All rights reserved. 14 The modern platform for machine learning and analytics optimized for the cloud DATA CATALOG SECURITY GOVERNANCE WORKLOAD MANAGEMENT INGEST & REPLICATION EXTENSIBLE SERVICES CORE SERVICES DATA ENGINEERING OPERATIONAL DATABASE ANALYTIC DATABASE DATA SCIENCE AWS S3 AWS EBS HDFS KUDU STORAGE SERVICES Cloudera Enterprise PRIVATE CLOUDBARE METAL INFRASTRUCTURE DEPLOYMENT OPTIONS SERVICES
  • 15. 15© Cloudera, Inc. All rights reserved. "Better to bet on cloud providers for infrastructure, Cloudera for data, analytics and security fabric, and leave the rest to the ecosystem"
  • 16. 16© Cloudera, Inc. All rights reserved. $ • 2-4x storage savings, faster ramp • Lower risk of data breach • Analysts more productive on jobs • Safe self-service, no shadow IT • Less effort for audits/compliance • IT more strategic, less admin time • Deployment choices • Portability and price arbitrage • No proprietary lock-in • Eliminate data copies • Single security framework • Easy to troubleshoot workloads • Universally shared metadata • Easy to track data lineage • Unified services • Same solution as on-premises • Same solution in multi-cloud • Open source standards + CLOUDERA ADVANTAGES BUSINESS VALUE
  • 17. 17© Cloudera, Inc. All rights reserved. Cloud Customer Successes ● Manufacturing Customer ● Analytic DB, Data Science & Engineering on AWS “We can deliver all of the data with less compute and much less complexity.” - Joe Smith, ACME Manufacturing ● Advisory and tech for finserv ● Cloudera EDH on AWS “Cloudera gives us the best of both worlds—the ability to capture and process thousands of critical events at scale and the option to deploy on prem, in the cloud, or in a hybrid architecture. And running Cloudera on AWS enables us to leverage a world-class, reliable cloud infrastructure that supports our changing business needs and the needs of our customers.” - Kaushik Deka, CTO and Director of Engineering
  • 18. 18© Cloudera, Inc. All rights reserved. 1. Learn Cloudera product offerings 2. Deploy using AWS Quickstart template 3. Leverage Professional Services 4. Optimize based on changing business need Call to Action
  • 19. 19© Cloudera, Inc. All rights reserved. Best practices for running your Cloudera Enterprise cluster on AWS
  • 20. 20© Cloudera, Inc. All rights reserved. Quick AWS Component Review EC2 - servers S3 - object storage EBS - block storage RDS - we still need RDBMS VPC - everything needs network
  • 21. 21© Cloudera, Inc. All rights reserved. AWS Service Limits Default limits on most AWS services (most can be increased) • EC2 - ten (10) m4.4xlarge instances per region • EBS - 20 TB of Throughput Optimized HDD (st1) per region • S3 - 100 buckets per account • VPC - 5 VPC per region Defaults might limit your cluster, so plan ahead!
  • 22. 22© Cloudera, Inc. All rights reserved. Deployment Topology 22© Cloudera, Inc. All rights reserved.
  • 23. 23© Cloudera, Inc. All rights reserved. Deployment Topology Two types of deployments from network perspective: • Public Subnet - direct access to Internet & AWS services • Private Subnet - instances must go through a VPC endpoints to reach AWS services & NAT instances for Internet If you require high-bandwidth access to data sources on Internet or AWS services in another region, deploy to a public subnet. Public doesn’t mean open to world. Limit access using Security Groups.
  • 24. 24© Cloudera, Inc. All rights reserved. Deployment Topology (Public Subnet)
  • 25. 25© Cloudera, Inc. All rights reserved. Deployment Topology (Private Subnet)
  • 26. 26© Cloudera, Inc. All rights reserved. A Word About Edge Nodes Edge nodes: • Have direct access to cluster • Users user these nodes via client applications • Web applications, BI tools, or just Hadoop command-line client Avoid direct user access to the cluster!
  • 27. 27© Cloudera, Inc. All rights reserved. Deployment Topology with Edge Nodes
  • 28. 28© Cloudera, Inc. All rights reserved. Deployment Topology with Edge Nodes
  • 29. 29© Cloudera, Inc. All rights reserved. Roles and Instance Types 29© Cloudera, Inc. All rights reserved.
  • 30. 30© Cloudera, Inc. All rights reserved. So you want a cluster, eh?
  • 31. 31© Cloudera, Inc. All rights reserved. Deployment Model Two types of deployments: • Persistent - long-lived, always on, no spin-up time • Temporary - short-lived, can be started/stopped to save money Impacts job setup time, instance types, storage method, and cost.
  • 32. 32© Cloudera, Inc. All rights reserved. Instance Types Ephemeral • have locally attached storage, HDD or SSD • on instance termination, data is irrecoverable EBS-only • no local storage, must mount EBS volumes • on instance termination, data is safe in EBS
  • 33. 33© Cloudera, Inc. All rights reserved. Networking Connectivity Security 33© Cloudera, Inc. All rights reserved.
  • 34. 34© Cloudera, Inc. All rights reserved. Storage Configuration Three types of storage: • Instance Storage • Elastic Block Storage (EBS) • Object Storage (S3)
  • 35. 35© Cloudera, Inc. All rights reserved. Storage Configuration - Instance Storage • Attached to EC2 instances, like physical disks on physical server • Lifetime of storage == Lifetime of EC2 instance • Each EC2 instance has different amounts of storage c1.xlarge has 4 x 420 GB d2.8xlarge has 24 x 2 TB • For HDFS data directories, we like HDD instance storage • Backup planning -- multi-instance shutdowns, multi-VM AWS events
  • 36. 36© Cloudera, Inc. All rights reserved. Storage Configuration - EBS • Persistent block level storage volumes • Can be encrypted at rest w/ negligible impact to latency/throughput • OS: General Purpose (gp2) • DFS: Throughput Optimized HDD (st1), Cold HDD (sc1) • Baseline & burst performance increase with size of provisioned volume e.g. 500 GB st1 baseline throughput 20 MB/s; 1000 GB → 40 MB/s
  • 37. 37© Cloudera, Inc. All rights reserved. Storage Configuration - EBS Recommendations Instance selection • Use EBS-optimized instances OR instances with 10Gb+ network • Minimum dedicated EBS bandwidth of 1000 Mb/s (125 MB/s) Volume selection • Baseline performance, 40 MB/s or better (1000 GB st1, 3200 GB sc1) • Do not exceed instance’s dedicated EBS bandwidth!
  • 38. 38© Cloudera, Inc. All rights reserved. Storage Configuration - S3 • Great for cold backup: durable, available, inexpensive • For hot backup, use a second HDFS cluster • Hive and Spark can also use S3 directly • Standard data operations can read from & write to S3 buckets
  • 39. 39© Cloudera, Inc. All rights reserved. Storage Configuration Three types of storage: • Instance Storage • Elastic Block Storage (EBS) • Object Storage (S3) • Root Device
  • 40. 40© Cloudera, Inc. All rights reserved. Storage Configuration - Root Device • For operating system and logs • Use EBS gp2 volumes as root devices • At least 500 GB for OS, CDH software, and logs • Do not use instance storage for the root device!
  • 41. 41© Cloudera, Inc. All rights reserved. 41© Cloudera, Inc. All rights reserved. Capacity Planning
  • 42. 42© Cloudera, Inc. All rights reserved. Capacity Planning • AWS makes expansion easy; advance planning makes things easier • Consider workloads: how much storage vs compute? balanced? • Consider data replication (3x), growth, retention • Low storage density, r3.8xlarge or c4.8xlarge provide less storage but higher compute and memory • High storage density, d2.8xlarge offers 48 TB per instance with a good amount of compute and memory
  • 43. 43© Cloudera, Inc. All rights reserved. Cloudera Enterprise Hardware Requirements Guide tiny.cloudera.com/hw-reqs
  • 44. 44© Cloudera, Inc. All rights reserved. 44© Cloudera, Inc. All rights reserved. Provisioning Instances
  • 45. 45© Cloudera, Inc. All rights reserved. Provisioning Instances • Cloudera Director automates most things • Manual via EC2 command-line API tool or AWS management console • Don’t forget your databases (either RDS or self-managed) • Cloudera Altus
  • 46. 46© Cloudera, Inc. All rights reserved. Provisioning Instances No matter which route you take... • root device: 500 GB+ gp2 EBS volume • master metadata: ephemeral or recommended gp2 EBS volumes • DFS data: ephemeral or recommended st1/sc1 EBS volumes • use tags to indicate the role instances/volumes will play
  • 47. 47© Cloudera, Inc. All rights reserved. Thank You