© Cloudera, Inc. All rights reserved.
One Hadoop, Multiple Clouds
Andrei Savu | Tech Lead, Cloudera Director
About me
Tech Lead on Cloudera Director
Previously founder of axemblr.com
Contributed to Apache Whirr (PMC) & jclouds.
Twitter: https://twitter.com/andreisavu
LinkedIn: https://www.linkedin.com/in/sandrei
Cloudera Director
cloudera.com/director
Deploy and manage enterprise-grade Hadoop in the cloud
AWS & Google Cloud
Extensible via plugins
Journey to the Cloud
Do you use a public or private cloud?
How do you run and manage Hadoop?
What is this talk about?
State of the World
Architectural Patterns
Imagine the Future
Gartner's 2015 Hype Cycle for Emerging Technologies (source)
Advanced Analytics
Hybrid Cloud
Internet of Things
Hybrid Clouds
Cloud Exchange
Application Portability
Private-Public
Public-Public
Cloud Wars
AWS
Microsoft Azure
Google Cloud
VMware
OpenStack
etc.
Data has Mass and Gravity
Hadoop Environments
On-Premise versus Cloud
              On-Premise                    Cloud
Storage       Direct Attached               Direct Attached or Object Store
Data          Not shared across clusters    Shared across multiple clusters
Sizing        Fixed-size                    Dynamic based on load
Usage Model   All users share cluster       Clusters created as needed for apps/users
[Diagram] On-Premise stack: Process, Discover, Model, Serve; Resource Management (YARN); HDFS; Industry Standard Servers (CPU, Memory, & Direct Attached Storage)
[Diagram] Cloud stack: Process, Discover, Model, Serve; Resource Management (YARN); HDFS; Industry Standard Servers (CPU & Memory); Object Storage
Cloud providers shipping distributions of Hadoop
Integration
Unlock Query Engines
Migration workloads
Is that a sustainable advantage? Or just a temporary stopgap?
Maturity level
On-prem vs. Cloud
Monitoring
Dev / Test / Prod
Availability
Durability
Common Architectural Patterns in the Cloud
Object Storage (WASB | SWIFT | BLOB): Source Data, Seed Data, Backup/DR
ETL/MODELING (Spark, MapReduce)
• Short-running clusters
• Elastic workload
• No local storage necessary
BI/ANALYTICS (Impala, Solr)
• Long-running clusters
• Sized to demand
• Some local storage
APP DELIVERY (HBase, Kudu)
• Fixed clusters
• Periodic sync
• Default to local storage
Cluster lifecycle management
Create / Terminate
Discovery
Metadata
Monitoring
Work Queue
Workflows
Dispatch
Tracking
Decoupled
Fault Tolerant
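A minimal sketch of the work-queue idea on this slide, assuming an in-memory queue as a stand-in for a durable one. The names `Job`, `WorkQueue`, `submit`, and `run` are hypothetical and not part of Cloudera Director. It shows dispatch to a decoupled handler, per-job tracking, and bounded retries as a simple form of fault tolerance.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    attempts: int = 0
    status: str = "PENDING"  # PENDING -> RUNNING -> DONE or FAILED


class WorkQueue:
    """In-memory stand-in for a durable queue (hypothetical, for illustration)."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.pending = deque()
        self.jobs = {}  # tracking: job name -> Job

    def submit(self, name):
        """Record a job for tracking and enqueue it; the submitter does not run it."""
        job = Job(name)
        self.jobs[name] = job
        self.pending.append(job)
        return job

    def run(self, handler):
        """Dispatch pending jobs to handler, re-queueing failures up to max_attempts."""
        while self.pending:
            job = self.pending.popleft()
            job.status = "RUNNING"
            job.attempts += 1
            try:
                handler(job.name)
                job.status = "DONE"
            except Exception:
                if job.attempts < self.max_attempts:
                    job.status = "PENDING"
                    self.pending.append(job)
                else:
                    job.status = "FAILED"
```

A caller would `submit` workflow names, pass any callable to `run`, and read `jobs[name].status` afterwards to see what completed.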
Multi-user
Secure
Isolated
Friendly
Elastic
Grow or shrink
Business hours
Number of users
Storage vs. Compute
Cost efficient
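One way to picture the grow-or-shrink decision is the sketch below. It assumes a cluster sized from the number of active users during business hours and scaled to a floor otherwise. The function name `target_workers` and every threshold in it are illustrative assumptions, not values from the talk or from any product.

```python
from datetime import datetime


def target_workers(now, active_users, *, min_workers=3, max_workers=20,
                   users_per_worker=5, business_hours=(9, 18)):
    """Return a desired worker count; all defaults are illustrative assumptions."""
    start, end = business_hours
    # Monday-Friday, within [start, end) local hours
    in_hours = now.weekday() < 5 and start <= now.hour < end
    if not in_hours:
        return min_workers  # shrink to the floor outside business hours
    needed = -(-active_users // users_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))


# Example: a weekday morning with 42 active users
print(target_workers(datetime(2015, 9, 30, 11, 0), 42))
```

Keeping the floor and ceiling explicit is what ties elasticity to cost: the ceiling caps spend, and the floor keeps enough workers for the cluster to stay usable.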
Advanced Monitoring
Latency
Resource utilization
Consistent performance
High availability and failure domains
Data durability
Repair within SLA
Host-to-instance
Backup and disaster recovery
Object store centric
Active-Standby
Imagine the Future
Portable Experience
Self-service
Self-healing
Granular Security
Advanced Governance
Complete Management
What’s your vision?
Thank you!
asavu@cloudera.com
Resources
Cloudera Director: http://www.cloudera.com/director
Interested in API level integration and scripting?
https://github.com/cloudera/director-sdk
https://github.com/cloudera/director-scripts
Interested in integration with another cloud platform?
https://github.com/cloudera/director-spi
https://github.com/cloudera/director-google-plugin
What’s new in Cloudera Director 1.5?
http://blog.cloudera.com/blog/2015/08/whats-new-in-cloudera-director-1-5/
Get Started
AWS Reference Guide
GCP Reference Guide
Try It Out
AWS Quickstart
Resources
Cloudera Director
Screenshots
