This is the slide deck I presented at the NYC Big Data Meetup just before Strata + Hadoop World 2015. It covers what is different about running Hadoop in the cloud, the main use cases, and some lessons learned from working with customers.
One Hadoop, Multiple Clouds - NYC Big Data Meetup
1. © Cloudera, Inc. All rights reserved.
One Hadoop, Multiple Clouds
Andrei Savu | Tech Lead, Cloudera Director
2.
About me
Tech Lead on Cloudera Director
Previously founder of axemblr.com
Contributed to Apache Whirr (PMC) & jclouds.
Twitter: https://twitter.com/andreisavu
LinkedIn: https://www.linkedin.com/in/sandrei
3.
Cloudera Director
cloudera.com/director
Deploy and manage enterprise-grade Hadoop in the cloud
AWS & Google Cloud
Extensible via plugins
5.
Do you use a public or private cloud?
How do you run and manage Hadoop?
6.
What is this talk about?
State of the World
Architectural Patterns
Imagine the Future
7.
Gartner's 2015 Hype Cycle for Emerging Technologies
Advanced Analytics
Hybrid Cloud
Internet of Things
8.
Hybrid Clouds
Cloud Exchange
Application Portability
Private-Public
Public-Public
9.
Cloud Wars
AWS
Microsoft Azure
Google Cloud
VMware
OpenStack
etc.
11.
Hadoop Environments
On-Premise versus Cloud
               On-Premise                    Cloud
Storage        Direct Attached               Direct Attached or Object Store
Data           Not shared across clusters    Shared across multiple clusters
Sizing         Fixed-size                    Dynamic based on load
Usage Model    All users share cluster       Clusters created as needed for apps/users
On-Premise:
Resource Management (YARN)
HDFS
Process | Discover | Model | Serve
Industry Standard Servers (CPU, Memory, & Direct Attached Storage)
Cloud:
Resource Management (YARN)
HDFS
Process | Discover | Model | Serve
Industry Standard Servers (CPU & Memory)
Object Storage
12.
Cloud providers shipping distributions of Hadoop
Integration
Unlock Query Engines
Migration workloads
Is that a sustainable advantage? Or just a temporary stopgap?
13.
Maturity level
On-prem vs. Cloud
Monitoring
Dev / Test / Prod
Availability
Durability
14.
Common Architectural Patterns in the Cloud
Object Storage (WASB | SWIFT | BLOB)
Source Data | Seed Data | Backup/DR
ETL/MODELING (Spark, MapReduce)
• Short-running clusters
• Elastic workload
• No local storage necessary
BI/ANALYTICS (Impala, Solr)
• Long-running clusters
• Sized to demand
• Some local storage
APP DELIVERY (HBase, Kudu)
• Fixed clusters
• Periodic sync
• Default to local storage
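The three patterns on this slide can be written down as data, which makes the trade-offs easier to compare. The following is only a sketch: the `ClusterProfile` type, its field names, and the `profile_for` helper are hypothetical, and the pairing of each object-store role (source data, seed data, backup/DR) with a workload is my reading of the diagram's column order rather than something the slide states outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterProfile:
    """Hypothetical record describing one pattern from the slide."""
    workload: str
    engines: tuple
    lifetime: str
    sizing: str
    local_storage: str
    object_store_role: str  # assumed pairing, read off the diagram columns


PROFILES = (
    ClusterProfile("ETL/Modeling", ("Spark", "MapReduce"),
                   "short-running", "elastic", "none required", "source data"),
    ClusterProfile("BI/Analytics", ("Impala", "Solr"),
                   "long-running", "sized to demand", "some", "seed data"),
    ClusterProfile("App Delivery", ("HBase", "Kudu"),
                   "fixed", "fixed, periodic sync", "default", "backup/DR"),
)


def profile_for(engine: str) -> ClusterProfile:
    """Return the pattern that lists the given engine (case-insensitive)."""
    wanted = engine.lower()
    for profile in PROFILES:
        if wanted in (name.lower() for name in profile.engines):
            return profile
    raise KeyError(engine)
```

For example, `profile_for("impala").lifetime` gives `"long-running"`, while `profile_for("Spark").local_storage` gives `"none required"`.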
15.
Cluster lifecycle management
Create / Terminate
Discovery
Metadata
Monitoring
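To make the four responsibilities above concrete, here is a minimal in-memory sketch of a cluster registry. It is not how Cloudera Director does it; `ClusterRegistry`, its method names, the state strings, and the heartbeat-based staleness check are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ClusterRecord:
    name: str
    state: str = "CREATING"
    metadata: dict = field(default_factory=dict)
    last_heartbeat: float = 0.0


class ClusterRegistry:
    """Hypothetical registry covering create/terminate, discovery,
    metadata, and a very basic liveness check."""

    def __init__(self):
        self._clusters = {}

    def create(self, name, **metadata):
        if name in self._clusters:
            raise ValueError(f"cluster {name!r} already exists")
        record = ClusterRecord(name=name, metadata=dict(metadata))
        self._clusters[name] = record
        return record

    def heartbeat(self, name, now=None):
        # A heartbeat marks the cluster as usable and refreshes its timestamp.
        record = self._clusters[name]
        record.state = "READY"
        record.last_heartbeat = time.time() if now is None else now

    def discover(self, **filters):
        # Discovery: names of READY clusters whose metadata matches all filters.
        return sorted(
            r.name for r in self._clusters.values()
            if r.state == "READY"
            and all(r.metadata.get(k) == v for k, v in filters.items())
        )

    def stale(self, max_age, now=None):
        # Monitoring: READY clusters that have not reported within max_age seconds.
        now = time.time() if now is None else now
        return sorted(
            r.name for r in self._clusters.values()
            if r.state == "READY" and now - r.last_heartbeat > max_age
        )

    def terminate(self, name):
        record = self._clusters.pop(name)
        record.state = "TERMINATED"
        return record
```

Keeping metadata (owner, workload, and so on) next to the lifecycle state is what lets short-lived clusters be found and cleaned up without anyone remembering they exist.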
16.
Work Queue
Workflows
Dispatch
Tracking
Decoupled
Fault Tolerant
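A work queue that is decoupled from the clusters running the jobs, tracks job status, and retries failures can be sketched in a few lines. This is a single-process illustration under my own assumptions: the `WorkQueue` class, the status strings, and the `max_attempts` retry policy are hypothetical, and a real deployment would use a durable queue rather than an in-memory deque.

```python
from collections import deque


class WorkQueue:
    """Hypothetical queue: submitters and workers only share job ids and payloads."""

    def __init__(self, max_attempts=3):
        self._pending = deque()
        self.status = {}     # tracking: job_id -> QUEUED / RUNNING / DONE / FAILED
        self.attempts = {}   # tracking: job_id -> number of dispatch attempts
        self.max_attempts = max_attempts

    def submit(self, job_id, payload):
        self._pending.append((job_id, payload))
        self.status[job_id] = "QUEUED"
        self.attempts[job_id] = 0

    def dispatch(self, worker):
        """Hand each queued job to `worker`; requeue on failure until
        max_attempts is reached, then mark the job FAILED."""
        while self._pending:
            job_id, payload = self._pending.popleft()
            self.attempts[job_id] += 1
            self.status[job_id] = "RUNNING"
            try:
                worker(payload)
            except Exception:
                if self.attempts[job_id] < self.max_attempts:
                    self.status[job_id] = "QUEUED"
                    self._pending.append((job_id, payload))
                else:
                    self.status[job_id] = "FAILED"
            else:
                self.status[job_id] = "DONE"
```

Because the worker is just a callable, the same queue can feed a short-running cluster that is created for the batch and terminated once `status` shows no job left in `QUEUED` or `RUNNING`.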
17.
Common Architectural Patterns in the Cloud (same diagram as slide 14)
18.
Multi-user
Secure
Isolated
Friendly
19.
Elastic
Grow or shrink
Business hours
Number of users
Storage vs. Compute
Cost efficient
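As a rough illustration of growing or shrinking a cluster by business hours and number of users, the function below computes a target worker count. Every number in it is a made-up default: `users_per_worker`, the `min_workers` and `max_workers` bounds, and the 08:00 to 18:00 window are assumptions for the sketch, not recommendations from the talk.

```python
import math


def target_workers(active_users, hour, *, users_per_worker=5,
                   min_workers=2, max_workers=20, business_hours=(8, 18)):
    """Return how many worker nodes to run (hypothetical policy).

    Outside business hours the cluster shrinks to its floor; during
    business hours it scales with the number of active users, clamped
    to [min_workers, max_workers].
    """
    start, end = business_hours
    if not (start <= hour < end):
        return min_workers
    needed = math.ceil(active_users / users_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With these defaults, 42 active users at 10:00 yields 9 workers, and the same load at 22:00 yields the floor of 2. Keeping data in object storage is what makes shrinking compute this aggressively safe, since storage and compute are sized independently.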
20.
Common Architectural Patterns in the Cloud (same diagram as slide 14)
21.
Advanced Monitoring
Latency
Resource utilization
Consistent performance
22.
High availability and failure domains
Data durability
Repair within SLA
Host-to-instance
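One way to reason about failure domains is to spread cluster hosts across them and then ask what survives the loss of any single domain. The sketch below does a simple round-robin placement; the `spread` and `survivors` helpers and the zone names are hypothetical, and real placement would also have to account for roles (masters versus workers) and data replication.

```python
from collections import defaultdict
from itertools import cycle


def spread(hosts, domains):
    """Assign hosts to failure domains round-robin (illustrative policy)."""
    placement = defaultdict(list)
    for host, domain in zip(hosts, cycle(domains)):
        placement[domain].append(host)
    return dict(placement)


def survivors(placement, failed_domain):
    """Hosts still available if one whole failure domain is lost."""
    return [
        host
        for domain, hosts in placement.items()
        if domain != failed_domain
        for host in hosts
    ]
```

For instance, placing three master hosts across three zones this way means that losing any one zone leaves two masters running, which is the kind of check a repair-within-SLA policy can build on.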
23.
Backup and disaster recovery
Object store centric
Active-Standby
24.
Imagine the Future
Portable Experience
Self-service
Self-healing
Granular Security
Advanced Governance
Complete Management
What's your vision?
27.
Resources
Cloudera Director: http://www.cloudera.com/director
Interested in API level integration and scripting?
https://github.com/cloudera/director-sdk
https://github.com/cloudera/director-scripts
Interested in integration with another cloud platform?
https://github.com/cloudera/director-spi
https://github.com/cloudera/director-google-plugin
28.
What's new in Cloudera Director 1.5?
http://blog.cloudera.com/blog/2015/08/whats-new-in-cloudera-director-1-5/
Get Started
AWS Reference Guide
GCP Reference Guide
Try It Out
AWS Quickstart
Resources