Amazon EC2 Spot instances provide acceleration, scale, and deep cost savings to run time-critical, hyper-scale workloads for rapid data analysis. In this session, AOL and Metamarkets will present lessons learned and best practices from scaling their big data workloads using popular platforms like Presto, Spark, and Druid.
AOL will present how they process, store, and analyze big data securely and cost-effectively using Presto. AOL achieved 70% savings by separating compute and storage, dynamically resizing clusters based on data volume and code complexity, and using AWS Lambda to orchestrate processing pipelines. Metamarkets, an industry leader in interactive analytics, will present how they leverage Amazon EBS to persist 185 TiB of compressed state for running Druid historical nodes on EC2 Spot instances. They will also cover how they run Spark batch jobs that process 1-4 PiB of data across 200 billion to 1 trillion events per day, saving more than 60% in costs.
2. Amazon EC2 Spot instances
• Regular EC2 instances offered on the Spot market when capacity is spare
• Prices on average 70-80% lower than On-Demand
• Best suited for workloads that can scale with compute
• Accelerate jobs 5-10 times, e.g. run faster CI/CD pipelines (case study: Yelp)
• Reduce costs by 5-10 times, scale stateless web applications (case studies: Mapbox, Ad-tech)
• Generate better business insights from your event stream
3. In this session
• Use Case: context and history
• AOL: Separation of Compute and Storage using Amazon EMR and EC2 Spot instances
• Architecture
• Cost Optimization
• Orchestration
• Monitoring
• Best Practices
• Metamarkets: Spark and Druid on EC2 Spot instances
• Architecture Overview: Real-time, Batch Jobs, Lambda
• Spark on Spot instances
• Druid on Spot instances
• Monitoring
4. Business Intelligence Data Set
• Event Data
• Timestamp
• Dimensions/Attributes
• Measures
• Total data set is huge: billions of events per day
5. Relational Databases
Traditional Data Warehouse Star Schema
• FACT table contains primary information and measures to aggregate
• DIM tables contain additional attributes about entities
• Queries involve joins between the central FACT and DIM tables
Performance degrades as data scales.
6. Key/Value Stores
Fast writes, fast lookups
• Pre-compute every possible query
• As more columns are added, the query space grows exponentially
• Primary key is a hash of timestamp and dimensions
• Value is the measure to aggregate
• Shuffling data from storage to a computational buffer is slow
• Difficult to create intelligent indexes
(Diagram: precomputation vs. range scans)
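To make the pre-computation concrete, here is a minimal, hypothetical sketch (not from the talk) of how such a primary key could be built: hash the event hour plus a canonical ordering of the dimensions, and store the measure to aggregate as the value.

import hashlib

def rollup_key(hour: str, dimensions: dict) -> str:
    """Hash the event hour plus sorted dimension name/value pairs
    into a fixed-size primary key (illustrative layout)."""
    parts = [hour] + [f"{k}={v}" for k, v in sorted(dimensions.items())]
    return hashlib.sha1("|".join(parts).encode()).hexdigest()

# One pre-computed row per (hour, dimension combination); the value is
# the measure to aggregate, e.g. a running impression count.
key = rollup_key("2016-09-01T12", {"browser": "Chrome", "country": "US"})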
7. General Compute Engines
SQL on Hadoop
• Scale with compute power
• Generate up to 5-10x faster business insights with cheaper compute
• Or just reduce costs by 80-90%
11. Architecture
(Architecture diagram)
• AWS Lambda: orchestration
• Amazon S3: data lake
• Amazon DynamoDB: data validation
• Amazon RDS: Hive metastore
• AWS IAM for access; Elastic IPs on the cluster endpoints
• Data processing: Amazon EMR Hive, with an EMR Hive client
• Data analytics: Amazon EMR Presto, with an EMR Presto client
12. Key features and advantages
• Separation of compute and storage
• Scale compute and storage independently
• Separate data processing and analytics
• Hive for processing, Presto for analytics
• No data migration
• S3 Data lake
• Single source of truth
• Columnar format for performance and compression (see the sketch after this list)
• VPC design
• Identified by Name Tags
• AOL CIDR, VPN
• A few lines of code change vs. big data migration efforts
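As one illustration of the columnar-format point, here is a minimal sketch, assuming PySpark on EMR; the bucket names and the partition column are placeholders, not AOL's actual layout.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("to-columnar").getOrCreate()

# Read raw JSON events and rewrite them in a columnar format (Parquet)
# so Hive and Presto scan only the columns a query actually needs.
events = spark.read.json("s3://example-raw-bucket/events/2016/09/01/")
(events.write
       .partitionBy("country")   # S3 partitions enable partition pruning
       .parquet("s3://example-datalake-bucket/events/dt=2016-09-01/"))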
14. Amazon EC2 Spot Instances
• Keep in mind
• Availability
• Spot pricing varies by instance type and Availability Zone (see the sketch after this list)
• Provisioning times differ
• AOL requirement
• Major restatement: 15-20K EC2 instances
• Data for 15+ countries
• Frequency: hourly, daily, weekly, month-to-date, monthly, 28 days
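The Spot-price-sorted AZ list used in the deployment logic later can be derived from the EC2 API. A minimal boto3 sketch; the instance type and region are illustrative:

import boto3
from datetime import datetime, timedelta

ec2 = boto3.client("ec2", region_name="us-east-1")

# Pull the last hour of Spot prices for one instance type across all
# AZs in the region, keep the most recent price per AZ, sort cheapest-first.
resp = ec2.describe_spot_price_history(
    InstanceTypes=["m3.xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.utcnow() - timedelta(hours=1),
)
latest = {}
for p in sorted(resp["SpotPriceHistory"],
                key=lambda p: p["Timestamp"], reverse=True):
    latest.setdefault(p["AvailabilityZone"], float(p["SpotPrice"]))

sorted_azs = sorted(latest, key=latest.get)   # cheapest AZ first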
15. EMR Deployment Setup
• Set up VPCs in all regions
• Ensure Spot limits
• Set a hard EC2 limit per AZ
• Use multiple instance types
• Define an instance type-to-cores mapping based on
• Data volume
• Code complexity
• You pay the actual Spot price, not your bid price!
16. Deployment Logic Diagram
(Flowchart)
1. Data volume + code complexity → pick instance type; number of cores = A
2. Build the Spot-price-sorted AZ list
3. Take the next AZ in the list; open/active instances there = B
4. If A + B < AZ limit: kick off EMR. If not: return to step 3 while AZs remain
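A minimal Python sketch of the loop above, reusing the sorted_azs list from the earlier snippet. The sizing rule, AZ limit, instance type, and EMR parameters are placeholders, not AOL's actual values:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
emr = boto3.client("emr", region_name="us-east-1")

AZ_LIMIT = 500   # hard per-AZ EC2 cap (placeholder)

def open_instances(az: str) -> int:
    """Open/active instances in the AZ (B in the diagram)."""
    resp = ec2.describe_instances(Filters=[
        {"Name": "availability-zone", "Values": [az]},
        {"Name": "instance-state-name", "Values": ["pending", "running"]},
    ])
    return sum(len(r["Instances"]) for r in resp["Reservations"])

def kick_off_emr(sorted_azs, cores_needed):
    """Try AZs cheapest-first; launch in the first one with headroom."""
    for az in sorted_azs:
        if cores_needed + open_instances(az) >= AZ_LIMIT:
            continue   # over the AZ limit: try the next AZ in the list
        return emr.run_job_flow(
            Name="transient-spot-cluster",
            ReleaseLabel="emr-5.0.0",
            JobFlowRole="EMR_EC2_DefaultRole",
            ServiceRole="EMR_DefaultRole",
            Instances={
                "Placement": {"AvailabilityZone": az},
                "KeepJobFlowAliveWhenNoSteps": False,
                "InstanceGroups": [
                    {"InstanceRole": "MASTER", "InstanceType": "m3.xlarge",
                     "InstanceCount": 1, "Market": "SPOT",
                     "BidPrice": "0.266"},   # bid roughly the On-Demand price
                    {"InstanceRole": "CORE", "InstanceType": "m3.xlarge",
                     "InstanceCount": max(1, cores_needed // 4),  # 4 vCPUs each
                     "Market": "SPOT", "BidPrice": "0.266"},
                ],
            },
        )
    raise RuntimeError("no AZ has headroom; retry later")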
17. Average Cost Saving Graphs
(Graph: On-Demand vs. static-AZ Spot, ~80% savings)
**m3.xlarge, Sept 2016 cost
18. Average Cost Saving Graphs
(Graph: static AZ vs. cheapest AZ, a further 10-15% savings)
**m3.xlarge, Sept 2016 cost
19. Why does the cheaper AZ matter?
• Data transfer cost
• Worst case: the cheaper AZ is not in the local region
• More data => more nodes + more hours

Size (GB) | Cores | Hours | Local AZ Cost | Cheaper AZ Cost | Transfer Cost | Total Cost | Cost Savings
10 | 25 | 1 | 429 | 356 | 73 | 429 | 0%
50 | 100 | 2 | 3,431 | 2,847 | 365 | 3,212 | 6%
100 | 300 | 3 | 20,586 | 17,082 | 730 | 17,812 | 13%
200 | 500 | 5 | 51,465 | 42,705 | 1,460 | 44,165 | 14%
300 | 700 | 7 | 109,792 | 91,104 | 2,190 | 93,294 | 15%

**m3.xlarge, Sept 2016 cost
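The table's arithmetic, reproduced as a quick Python check (costs in the table's own units): total cost = cheaper-AZ compute cost + cross-AZ transfer cost, and savings = 1 - total / local-AZ cost.

rows = [  # (size_gb, local_az_cost, cheaper_az_cost, transfer_cost)
    (10, 429, 356, 73),
    (50, 3431, 2847, 365),
    (100, 20586, 17082, 730),
    (200, 51465, 42705, 1460),
    (300, 109792, 91104, 2190),
]
for size, local, cheaper, transfer in rows:
    total = cheaper + transfer
    print(f"{size} GB: total {total}, savings {1 - total / local:.0%}")
# Small jobs (10 GB) save nothing: the transfer cost eats the Spot discount.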
25. AOL DW Process Pipeline
(Pipeline diagram: Amazon S3 → AWS Lambda (Python Boto) → Amazon EMR, chained stage to stage)
26. Benefits & Suggestions
• Improved SLAs due to the event-based model
• Serverless: zero administration
• Millisecond response times
• Pricing: the first 1 million requests/month are free
• Generic utilities for extensibility
• Built-in auto scaling
• CloudWatch logging
• Replaced ~2000 Autosys jobs
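A minimal sketch of one such event-driven step, assuming an S3 PUT notification wired to Lambda; the cluster id, script location, and step arguments are placeholders, not AOL's pipeline code:

import boto3

emr = boto3.client("emr")

def handler(event, context):
    """Hypothetical S3-triggered step: when a new dataset lands in S3,
    submit a Hive step (or launch a transient cluster, as sketched in
    the deployment-logic example earlier)."""
    record = event["Records"][0]["s3"]
    src = f"s3://{record['bucket']['name']}/{record['object']['key']}"

    emr.add_job_flow_steps(
        JobFlowId="j-EXAMPLE",          # placeholder cluster id
        Steps=[{
            "Name": f"process {src}",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hive-script", "--run-hive-script",
                         "--args", "-f", "s3://example-bucket/etl.hql",
                         "-d", f"INPUT={src}"],
            },
        }],
    )
    return {"submitted": src}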
30. Good to have
• S3 lifecycle policies based on tags
• Terminate EMR clusters stuck in STARTING
• Python 3 support in Lambda
• Lambda code test/deployment tooling
• Kappa
• A global EMR dashboard
• Redshift external tables
31. Recap
• Transient Spot architecture
• S3 as the data lake
• Cost optimization
• Dynamic choice of Spot AZ and number of cores
• Serverless process pipeline
• AWS Lambda for event-driven design
• Automated EMR monitoring
• Reduces manual intervention across 1000s of clusters
33. Related Sessions
• AWS re:Invent 2015 | (BDT208) A Technical Introduction to Amazon Elastic MapReduce
• https://www.youtube.com/watch?v=WnFYoiRqEHw
• AWS re:Invent 2015 | (BDT210) Building Scalable Big Data Solutions: Intel & AOL
• https://www.youtube.com/watch?v=2yZginBYcEo
46. Why Spark?
The Good:
+ No HDFS
+ Good-enough partial failure recovery
+ Native Mesos, YARN, and standalone support
The Bad:
+ Rough to configure for multi-tenancy
47. Spark
+ Between 1 and 4 PiB / day (memory bytes spilled)
+ Between 200B and 1T events / day
+ Peak days can be up to 5x baseline
Think Big.
49. Tradeoff
+ More complex job failure handling
+ “Did my job die because of Me, Spark, the Data, or the Market?”
+ More random delays
+ More man-hours to manage, or automation to build
51. Druid on Spot
Some of our Historical nodes run on Spot
185 TiB (compressed) state on EBS on Spot
⅕ of a petabyte can vanish… and come back in 15 minutes
52. Druid Historical Data
HOT: 1 hr < EVENT_TIME < X months
COLD: X months < EVENT_TIME < Y months
ICY: Y months < EVENT_TIME < Z years
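The tiering above maps onto Druid's coordinator load rules. A hypothetical sketch follows; the periods, tier names, and replicant counts are illustrative, not Metamarkets' actual values. Rules are evaluated top-down per segment.

# Coordinator load rules implementing HOT / COLD / ICY tiers.
load_rules = [
    {"type": "loadByPeriod", "period": "P3M",      # HOT: trailing months
     "tieredReplicants": {"hot": 2}},
    {"type": "loadByPeriod", "period": "P12M",     # COLD: older, mid-age data
     "tieredReplicants": {"cold": 2}},
    {"type": "loadForever",                        # ICY: everything older
     "tieredReplicants": {"icy": 1}},
]
# These would be POSTed to the coordinator,
# e.g. /druid/coordinator/v1/rules/<dataSource>.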
55. Using EBS With Druid on Spot
+ Define a “pool” tag on EBS volumes
+ If the EBS “pool” is “empty” (no unmounted volumes), create a new volume (with proper tags) and mount it
+ Otherwise, claim a drive from the pool
+ Sanity-check the volume; discard it if unrecoverable
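A minimal boto3 sketch of that claim-or-create logic; the tag key, pool name, and volume size are placeholders:

import boto3

ec2 = boto3.client("ec2")

def claim_or_create_volume(az: str, pool: str = "druid-historical",
                           size_gb: int = 1000) -> str:
    """Claim an unattached EBS volume from the tag-defined pool,
    or create a fresh one."""
    resp = ec2.describe_volumes(Filters=[
        {"Name": "tag:pool", "Values": [pool]},
        {"Name": "status", "Values": ["available"]},   # unmounted volumes
        {"Name": "availability-zone", "Values": [az]},
    ])
    if resp["Volumes"]:
        return resp["Volumes"][0]["VolumeId"]          # claim from the pool
    vol = ec2.create_volume(
        AvailabilityZone=az, Size=size_gb, VolumeType="gp2",
        TagSpecifications=[{"ResourceType": "volume",
                            "Tags": [{"Key": "pool", "Value": pool}]}],
    )
    return vol["VolumeId"]

# The caller then attaches and mounts the volume and sanity-checks the
# filesystem, discarding the volume if it is unrecoverable.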
56. Using EBS With Druid on Spot
+ Monitor Spot termination notices[1] to stop gracefully
+ If a stop is detected, prepare to die gracefully:
+ Stop applications (hook)
+ Unmount the volume cleanly
+ Do not actually terminate the instance; wait for death
[1] https://aws.amazon.com/blogs/aws/new-ec2-spot-instance-termination-notices/
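A minimal sketch of polling the instance metadata endpoint described in [1]: it returns the reclaim time once the instance is marked for termination (about two minutes in advance) and 404 until then.

import time
import urllib.request
from urllib.error import URLError

URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def wait_for_termination_notice(poll_seconds: int = 5) -> str:
    while True:
        try:
            with urllib.request.urlopen(URL, timeout=2) as resp:
                return resp.read().decode()   # marked: the time of death
        except URLError:
            time.sleep(poll_seconds)          # not marked yet; keep polling

# On notice: stop applications, unmount the EBS volume cleanly, then
# wait to be reclaimed instead of self-terminating.
deadline = wait_for_termination_notice()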
57. Terrifying to Boring
(Originally ran without EBS reattachment)
[ops] Search Alert: More than 0 results found for "DRUID - Spot Market Fluctuations"
Now mundane.
58. Druid Tips
+ The Coordinator (the thing that moves state around) does better with NO tier than with a half-tier
+ Flapping nodes can cause backpressure; it is better to kill an entire tier than to let it repeatedly flap up and down
+ Nodes usually have a burn-in time (a few minutes) before they reach steady-state fast queries
59. Druid + Spot + EBS
Accomplished by EBS re-attachment
Metamarkets is proud to Open Source this tool
Be Open.
65. Spot Caveats
+ Switching from Spot to On-Demand does NOT always work
+ Pricing strategy tuned to the value of lost work
+ Scaling in a Spot market must be done SLOWLY (tens of nodes at a time)
+ us-east-1 is crowded
66. Lessons Learned… “If I could do it all over again”
+ Multi-homed (at least by AZ) from the very start
+ us-west
+ More ZooKeeper quorums
+ Build on a cluster resource framework
68. Metamarkets and Spot
+ Metamarkets has great internal tooling for Spot market insight
+ Druid uses EBS reattachment
+ Spark works well with proper configuration