This deck covers key considerations and advice for enterprises looking to run production-scale Cloudera on AWS. We touch on everything from security and governance to selecting the right instance type for your Hadoop workload (Spark, Impala, Search, etc.).
2. What We’re Going to Cover
• Hadoop in the cloud
• Architectural and access patterns
• Deployment and management
• Security and governance
• How to get started
8. Why AWS for Hadoop?
• Immediate Availability: Deploy the infrastructure you need almost instantly without long provisioning cycles.
• Broad & Deep Capabilities: Find everything you need to collect, store, process, analyze and visualize Big Data.
• Scalable: Scale from a few gigabytes to several petabytes, and from a few machines to thousands of nodes, with just a few clicks.
9. Global Footprint
Over 1 million active customers across 190 countries
1,700 government agencies
4,500 educational institutions
11 regions
30 availability zones
53 edge locations
Every day, AWS adds enough new server capacity to support Amazon.com when it was a $7 billion global enterprise.
10. Administration & Security
• Administration & Security: Access Control; Identity Management; Key Management & Storage; Monitoring & Logs; Resource & Usage Auditing
Platform Services
• Analytics: Data Pipelines; Data Warehouse; Hadoop; Real-time Streaming Data
• App Services: App Streaming; Email; Queuing & Notifications; Search; Transcoding; Workflow
• Developer Tools & Operations: Application Lifecycle Management; Containers; Deployment; DevOps; Event-driven Computing; Resource Templates
• Mobile Services: Identity; Mobile Analytics; Push Notifications; Sync
Core Services
• CDN
• Compute (VMs, Auto-scaling & Load Balancing)
• Databases (Relational, NoSQL, Caching)
• Networking (VPC, DX, DNS)
• Storage (Object, Block and Archival)
Infrastructure
• Availability Zones
• Points of Presence
• Regions
Enterprise Applications
• Business Email
• Sharing & Collaboration
• Virtual Desktop
Technical & Business Support
• Account Management
• Partner Ecosystem
• Professional Services
• Security & Pricing Reports
• Solutions Architects
• Support
• Training & Certification
22. Familiar Security Model
• Validated and driven by customers’ security experts
• Benefits all customers
• Layers: People & Process, System, Network, Physical
Security is Job Zero
23. AWS and You Share Responsibility for Security
• You: Customer applications & content, plus Network Security, Server Security, Data Security and Access Control. You get to define your controls IN the Cloud.
• AWS: AWS Foundation Services (Compute, Storage, Database, Networking) and AWS Global Infrastructure (Regions, Availability Zones, Edge Locations). AWS takes care of the security OF the Cloud.
26. Full visibility of your AWS environment
• CloudTrail records AWS API calls and saves the logs in your S3 buckets, no matter how those API calls were made
• Shows who did what, when, and from where (IP address)
• CloudTrail supports many AWS services, and the list is growing; it includes EC2, EBS, VPC, RDS, IAM and Redshift
• Easily aggregate all log information
Monitoring: Get consistent visibility of logs
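To make the "who did what, when, and from where" idea concrete, here is a minimal Python sketch that summarizes a CloudTrail-style log file. It assumes the log has already been downloaded from S3 and decompressed, and that it follows the CloudTrail layout of a JSON object with a top-level "Records" array. The sample events, user names, and IP addresses are hypothetical, and the summarize helper is an illustration rather than part of any AWS tool.

```python
import json

# Hypothetical sample data in the CloudTrail log-file shape:
# one JSON object with a top-level "Records" array.
SAMPLE_LOG = """
{"Records": [
  {"eventTime": "2015-06-01T12:00:00Z",
   "eventSource": "ec2.amazonaws.com",
   "eventName": "RunInstances",
   "sourceIPAddress": "203.0.113.10",
   "userIdentity": {"type": "IAMUser", "userName": "alice"}},
  {"eventTime": "2015-06-01T12:05:00Z",
   "eventSource": "iam.amazonaws.com",
   "eventName": "CreateUser",
   "sourceIPAddress": "198.51.100.7",
   "userIdentity": {"type": "Root",
                    "arn": "arn:aws:iam::123456789012:root"}}
]}
"""


def summarize(log_text):
    """Return one who/what/when/where dict per record in the log."""
    records = json.loads(log_text).get("Records", [])
    rows = []
    for rec in records:
        identity = rec.get("userIdentity", {})
        # Prefer a friendly user name, then the ARN, then the identity type.
        who = (identity.get("userName")
               or identity.get("arn")
               or identity.get("type", "unknown"))
        rows.append({
            "who": who,
            "what": f'{rec.get("eventSource")}:{rec.get("eventName")}',
            "when": rec.get("eventTime"),
            "where": rec.get("sourceIPAddress"),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(SAMPLE_LOG):
        print(row["when"], row["who"], row["what"], "from", row["where"])
```

The same function can be run over many log files and the rows concatenated, which is one simple way to aggregate log information across accounts or regions.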
35. Cloudera on AWS Checklist
1. Use the right AWS instance types for the right Hadoop workload
2. Take a DevOps approach to managing the Hadoop lifecycle on AWS
3. Amazon S3 provides good, low-cost cloud storage for Hadoop jobs
4. Security is everyone’s responsibility. Enforce multi-layer security that covers cloud, cluster, and data access, authentication, and encryption
5. Metadata is key to data management and governance. Without it, your users can’t answer the questions that matter to your business