© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
November 30, 2016
Disrupting Big Data with
Cost-effective Compute
Charles Allen, Metamarkets
Durga Nemani, Gaurav Agrawal, AOL
Anu Sharma, Amazon EC2
CMP302
Amazon EC2 Spot instances
• Regular EC2 instances opened to the Spot market when spare
• Prices on average 70-80% lower than On-Demand
• Best suited for workloads that can scale with compute
• Accelerate jobs 5-10 times e.g. run faster CI/CD pipelines
(case study: Yelp)
• Reduce costs by 5-10 times, scale stateless web applications
(case study: Mapbox, Ad-tech)
• Generate better business insights from your event stream
In this session
• Use Case: context and history
• AOL: Separation of Compute and Storage using Amazon EMR and
EC2 Spot instances
• Architecture
• Cost Optimization
• Orchestration
• Monitoring
• Best Practices
• Metamarkets: Spark and Druid on EC2 Spot instances
• Architecture Overview: Real-time, Batch Jobs, Lambda
• Spark on Spot instances
• Druid on Spot instances
• Monitoring
Business Intelligence Data Set
• Event Data
• Timestamp
• Dimensions/Attributes
• Measures
• Total data set is huge, billions of events per day
Relational Databases
Traditional Data Warehouse Star Schema
• FACT table contains primary information and measures to aggregate
• DIM tables contain additional attributes about entities
• Queries involve joins between central FACT and DIM tables
Performance degrades as data scales.
Key/Value Stores
Fast writes, fast lookups
Precomputation
• Pre-compute every possible query
• As more columns are added, query space grows exponentially
Range Scans
• Primary key is a hash of timestamp and dimensions
• Value is measure to aggregate
• Shuffle data from storage to computational buffer - slow
• Difficult to create intelligent indexes
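To make the exponential growth concrete: with n dimensions there are 2^n distinct sets of dimensions a query could group by, before filter values are even considered. A minimal sketch; the dimension names are hypothetical and not from the talk.

```python
from itertools import combinations

def groupby_combinations(dimensions):
    """Return every subset of dimensions that a query could group by."""
    subsets = []
    for size in range(len(dimensions) + 1):
        subsets.extend(combinations(dimensions, size))
    return subsets

# Hypothetical dimension names, for illustration only.
dims = ["country", "device", "publisher", "campaign"]

print(len(groupby_combinations(dims)))           # 4 dimensions -> 16 sets
print(len(groupby_combinations(dims + ["os"])))  # 5 dimensions -> 32 sets
```

Each added column doubles the number of group-by sets, which is why precomputing every possible query stops being practical as the schema widens.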
General Compute Engines
SQL on Hadoop
• Scale with compute power
• Generate up to 5-10x faster
business insights with cheaper
compute
• Or just reduce costs by 80-90%
Pioneers to Settlers
Algorithmic Efficiency to Mundane Efficiency
Separation of Compute and Storage
Durga Nemani, System Architect, AOL
Gaurav Agrawal, Software Engineer, AOL
Big Data Processing with Amazon EMR and EC2 Spot instances
Architecture
(Diagram components)
• AWS Lambda: Orchestration
• AWS IAM
• Amazon S3: Data Lake
• Amazon DynamoDB: Data Validation
• Amazon RDS: Hive Metastore
• Data Processing: Amazon EMR (Hive), Elastic IP, Hive client
• Data Analytics: Amazon EMR (Presto), Elastic IP, Presto client
Key features and advantages
• Separation of compute and storage
• Scale compute and storage independently
• Separate data processing and analytics
• Hive for processing, Presto for analytics
• No data migration
• S3 Data lake
• Single source of truth
• Columnar format for performance and compression
• VPC design
• Identified by Name Tags
• AOL CIDR, VPN
• Few lines of code change vs big data migration efforts
Cost Optimization
Amazon EC2 Spot Instances
• Keep in mind
• Availability
• Spot pricing varies by
• Instance Types
• Availability Zone
• Different provisioning time
• AOL Requirement
• Major restatement - 15-20K EC2 Instances
• Data for 15+ countries
• Frequency: hourly, daily, weekly, month-to-date, monthly, 28 days
EMR Deployment Setup
• Set up VPC in all regions
• Ensure Spot Limits
• Set up a hard EC2 limit per AZ
• Multiple instance types
• Define Instance Type-Core Mapping
• Data Volume
• Code Complexity
• Pay actual price not bid price!
Deployment Logic Diagram
• Data volume + code complexity → pick instance type and number of cores (A)
• Build the list of AZs sorted by Spot price
• Take the next AZ in the list and look up its open/active instances (B)
• If A + B < AZ limit, kick off EMR in that AZ; if not, move to the next AZ in the list
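The deployment logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AOL's code: the function name, the input dictionaries, the prices, and the behaviour when no AZ qualifies (returning None) are all hypothetical.

```python
def pick_az(spot_prices, active_instances, az_limit, cores_needed):
    """Return the cheapest AZ that still has room for the new cluster.

    spot_prices:      {az: current Spot price for the chosen instance type}
    active_instances: {az: open/active instance count, "B" in the diagram}
    az_limit:         hard per-AZ limit
    cores_needed:     "A" in the diagram (from data volume and code complexity)
    """
    # Walk the AZs from cheapest to most expensive.
    for az in sorted(spot_prices, key=spot_prices.get):
        in_use = active_instances.get(az, 0)
        if cores_needed + in_use < az_limit:
            return az  # A + B < AZ limit: kick off EMR here
    return None  # assumption: the caller decides what to do when no AZ fits

# Hypothetical prices and usage counts.
prices = {"us-east-1a": 0.09, "us-east-1b": 0.05, "us-west-2a": 0.06}
active = {"us-east-1b": 950, "us-west-2a": 200}
print(pick_az(prices, active, az_limit=1000, cores_needed=100))
```

In this example the cheapest AZ is skipped because it is too close to its limit, so the second-cheapest one is chosen.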
Average Cost Saving Graphs
(Chart: On Demand vs. Static AZ, ~80% savings)
**m3.xlarge Sept’2016 Cost
Average Cost Saving Graphs
(Chart: Static AZ vs. Cheap AZ, 10-15% savings)
**m3.xlarge Sept’2016 Cost
Size (GB)  Cores  Hours  Local AZ Cost  Cheaper AZ Cost  Transfer Cost  Total Cost  Cost Savings
50         100    2      3,431          2,847            365            3,212       6%
100        300    3      20,586         17,082           730            17,812      13%
200        500    5      51,465         42,705           1,460          44,165      14%
300        700    7      109,792        91,104           2,190          93,294      15%
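The arithmetic behind the table can be checked with a short sketch: total cost is the cheaper-AZ cost plus the transfer cost, and savings are measured against the local-AZ cost. The function name is hypothetical; the figures are the ones in the table.

```python
def cheaper_az_savings(local_cost, cheaper_cost, transfer_cost):
    """Return (total cost in the cheaper AZ, savings vs. local AZ in whole percent)."""
    total = cheaper_cost + transfer_cost
    savings_pct = round(100 * (local_cost - total) / local_cost)
    return total, savings_pct

# (local AZ cost, cheaper AZ cost, transfer cost) for the 50, 100, 200, 300 GB rows.
rows = [
    (3431, 2847, 365),
    (20586, 17082, 730),
    (51465, 42705, 1460),
    (109792, 91104, 2190),
]
for row in rows:
    print(cheaper_az_savings(*row))
```

The transfer cost grows linearly with data size while the compute saving grows faster, so larger jobs benefit more from moving to the cheaper AZ.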
Why does a cheaper AZ matter?
• Data transfer cost
• Worst Case scenario – Cheaper AZ not in local region
• More Data => More Nodes + More Hours
Size (GB)  Cores  Hours  Local AZ Cost  Cheaper AZ Cost  Transfer Cost  Total Cost  Cost Savings
10         25     1      429            356              73             429         0%
**m3.xlarge Sept’2016 Cost
EMR Region Distribution (AOL DW Sept-Oct 2016)
• us-east-1: 20%
• ap-northeast-1: 1%
• sa-east-1: 9%
• ap-southeast-1: 3%
• ap-southeast-2: 9%
• us-west-2: 26%
• us-west-1: 4%
• eu-west-1: 28%
80% of the time, the cheaper AZ is not in the local region.
Average Cost Saving Graphs
(Chart: daily series, 9/1/16 through 9/30/16, for Static AZ, Cheap AZ, and AZ+Data+Code; 15-22% savings)
**m3.xlarge Sept’2016 Cost
Orchestration: AWS Lambda
Process Pipeline Overview
• Multiple stages between Raw Data & Final Summary
• Ensure dependencies
• Integration with Data services
• Extensible, Scalable & Reliable
• Recovery Options
• Notifications
• Directed Acyclic Graph
Sample DW Workflow
(Diagram: directed acyclic graph of jobs a, b, c, d, e, g, h, i, j; label: Operations)
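A pipeline modelled as a directed acyclic graph can be ordered so that every job runs only after its dependencies. A minimal sketch using Python's standard-library graphlib (Python 3.9+); the edges below are hypothetical, since the arrows of the slide's diagram are not recoverable from the text.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each job maps to the set of jobs it must wait for.
workflow = {
    "b": {"a"},
    "c": {"a"},
    "d": {"b", "c"},
    "e": {"d"},
}

def run_order(dag):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dag).static_order())

print(run_order(workflow))
```

Jobs with no mutual dependency (here b and c) may appear in either order, which is also where a scheduler could run them in parallel.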
AOL DW Process Pipeline
(Diagram components: Amazon S3 (x2), Amazon EMR (x2), AWS Lambda with Python Boto (x2))
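One way such an event-driven stage could look is a Lambda handler that reacts to a new S3 object and launches a transient EMR cluster with Spot core nodes. This is a hedged sketch, not AOL's implementation: the slides say only "Python Boto", so the boto3-style `run_job_flow` call, the bucket and script paths, the release label, the bid price, the on-demand master node, and the default AZ are all assumptions. The client is injectable so the logic can be exercised without AWS.

```python
def build_cluster_request(bucket, key, az, core_count=2):
    """Build RunJobFlow keyword arguments for one newly arrived S3 object."""
    return {
        "Name": "dw-" + key.replace("/", "-"),
        "ReleaseLabel": "emr-5.2.0",               # assumed release
        "Applications": [{"Name": "Hive"}],
        "JobFlowRole": "EMR_EC2_DefaultRole",      # default EMR role names
        "ServiceRole": "EMR_DefaultRole",
        "Instances": {
            "Placement": {"AvailabilityZone": az},
            "KeepJobFlowAliveWhenNoSteps": False,  # transient cluster
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m3.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "Market": "SPOT", "BidPrice": "0.10",
                 "InstanceType": "m3.xlarge", "InstanceCount": core_count},
            ],
        },
        "Steps": [{
            "Name": "process-" + key,
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Hypothetical script location and variable name.
                "Args": ["hive-script", "--run-hive-script", "--args",
                         "-f", "s3://example-bucket/scripts/stage.q",
                         "-d", "INPUT=s3://{}/{}".format(bucket, key)],
            },
        }],
    }

def handler(event, context, emr_client=None, az="us-east-1a"):
    """Lambda entry point: start one EMR cluster per S3 'object created' event."""
    if emr_client is None:
        import boto3  # imported lazily so the sketch runs without AWS access
        emr_client = boto3.client("emr")
    s3_info = event["Records"][0]["s3"]
    request = build_cluster_request(
        s3_info["bucket"]["name"], s3_info["object"]["key"], az)
    return emr_client.run_job_flow(**request)["JobFlowId"]
```

In a fuller version, the `az` argument would come from the Spot-price-based AZ selection described in the Cost Optimization section rather than a fixed default.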
Benefits & Suggestions
• Improved SLA due to Event based model
• Serverless – Zero Administration
• Millisecond response time
• Pricing – 1 million requests/month Free
• Generic utilities for Extensibility
• Built in Auto Scaling
• CloudWatch Logging
• Replaced ~2000 Autosys jobs
EMR Monitoring
EMR Monitoring - Prunella
• Tons of clusters/day
• EMR Failure causes
• Network Connectivity
• Bootstrap Actions
• Zero OPS Hours
• SLA improvement
• No datacenter dependency
• Notifications – Email/Slack
Good to have
• S3 Lifecycle based on Tags
• Terminate Long STARTING EMR Cluster
• Python 3 Lambda Support
• Lambda Code Test/Deployment
• Kappa
• Global EMR Dashboard
• Redshift External Tables
Recap
• Transient Spot Architecture
• S3 as Data Lake
• Cost Optimization
• Dynamic choice of Spot AZ and Number of Cores
• Serverless Process Pipeline
• AWS Lambda for event driven design
• Automated EMR Monitoring
• Reduce Manual intervention for 1000s of clusters
Photo Credits
• Gabor Kiss - http://bit.ly/2epkQJY
• AustinPixels- http://bit.ly/2eAenqr
• Mike - http://bit.ly/2eqGx82
Related Sessions
• AWS re:Invent 2015 | (BDT208) A Technical Introduction
to Amazon Elastic MapReduce
• https://www.youtube.com/watch?v=WnFYoiRqEHw
• AWS re:Invent 2015 | (BDT210) Building Scalable Big
Data Solutions: Intel & AOL
• https://www.youtube.com/watch?v=2yZginBYcEo
Spark and Druid on EC2 Spot Instances
Charles Allen, Metamarkets
About Me
Director of Platform
Druid PMC
@drcrallen
charles.allen@metamarkets.com
Special thanks to Jisoo Kim
Programmatic data is 100x larger than Wall
Street
Metamarkets
+ Industry leader in interactive analytics for
programmatic marketing
+ > 100B events / day
+ Typical peak approx 2,000,000 events / sec
+ Massaged, joined, HA replicated → 3M/s
Move fast. Think big. Be open. Have fun.
Metamarkets
+ Event ingestion lag down to few ms
+ Dynamic queries
+ Query latency less than 1 second
+ Specially tailored for real-time bidding
Current Spot Usage
+ Spark
+ Druid
+ Jenkins
Brief Architecture Overview - Real-time
(Diagram components: Kafka; Druid Real-time Indexing; Kafka / Samza)
Very Fast
● Pretty accurate
● On-time data
Brief Architecture Overview - Batch
(Diagram components: Kafka; S3; Spark; Druid Historical)
A Few Hours Later
● Highly accurate
● Deduplicated
● Late data
Brief Architecture Overview (Lambda)
(Diagram components: Kafka; Real Time; Batch, ΔFew Hrs; Historical; User)
Key Technologies Used
+ Kafka
+ Samza
+ Spark
+ Druid
Spark on Spot
Why Spark?
The Good:
+ No HDFS
+ Good enough partial failure recovery
+ Native Mesos, Yarn, and Stand-alone
The Bad:
+ Rough to configure multi-tenant
Spark
+ Between 1 and 4 PiB / day
(mem bytes spilled)
+ Between 200B and 1T events / day
+ Peak days can be up to 5x baseline
Think Big.
Cost Savings (Spark)
+ >60% savings vs. on-demand
+ Approx equal to 3-year term
Tradeoff
+ More complex job failure handling
+ “Did my job die because of Me, Spark, the
Data, or the Market?”
+ More random delays
+ More man-hours to manage, or
automation to build
Druid on Spot
+ Some of our Historical nodes run on Spot
+ 185 TB (compressed) state on EBS on Spot
+ ⅕ of a petabyte can vanish… and come back in 15 minutes
Druid Historical Data
+ HOT: 1 hr < EVENT_TIME < X Months
+ COLD: X Months < EVENT_TIME < Y Months
+ ICY: Y Months < EVENT_TIME < Z Years
Historical Tier QPS (Logscale)
(Chart annotation: Spot can go here)
Using EBS With Druid on Spot
+ Define a “pool” tag for EBS volumes
+ If EBS “pool” is “empty” (no unmounted volumes)
Create a new volume (with proper tags) and mount it
+ Otherwise, claim drive from pool
+ Sanity check on volume, discard if unrecoverable
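The claim-or-create logic above can be sketched without any AWS calls. This is an illustration, not the Metamarkets tool: the pool is a plain list of dictionaries, `create_volume` and `is_recoverable` are hypothetical callables supplied by the caller, and trying the next free volume after discarding a bad one is an assumption.

```python
def claim_or_create(pool, create_volume, is_recoverable):
    """Claim an unmounted volume from the pool, or create one if none is usable.

    pool:           list of dicts like {"id": ..., "mounted": bool}
    create_volume:  callable returning a new volume dict (tagged for the pool)
    is_recoverable: callable implementing the sanity check on a volume
    """
    free = [vol for vol in pool if not vol["mounted"]]
    for vol in free:
        if is_recoverable(vol):
            vol["mounted"] = True      # claim the drive from the pool
            return vol
        pool.remove(vol)               # sanity check failed: discard it
    # Pool is "empty" (no usable unmounted volumes): create and mount a new one.
    new_vol = create_volume()
    new_vol["mounted"] = True
    pool.append(new_vol)
    return new_vol

# Hypothetical pool: one volume in use, one corrupt, one healthy and free.
pool = [
    {"id": "vol-1", "mounted": True, "ok": True},
    {"id": "vol-2", "mounted": False, "ok": False},
    {"id": "vol-3", "mounted": False, "ok": True},
]
claimed = claim_or_create(
    pool,
    create_volume=lambda: {"id": "vol-new", "mounted": False, "ok": True},
    is_recoverable=lambda vol: vol["ok"],
)
print(claimed["id"])
```

Reusing a healthy volume is what lets a replacement node come back with its state already on disk instead of reloading it from scratch.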
Using EBS With Druid on Spot
+ Monitor spot notifications[1] to stop gracefully
+ If stop is detected, prepare to die gracefully
+ Stop applications (hook)
+ Unmount volume cleanly
+ Do not actually terminate instance; wait for death
[1] https://aws.amazon.com/blogs/aws/new-ec2-spot-instance-termination-notices/
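Watching for the termination notice described in [1] amounts to polling the instance metadata endpoint until it returns a timestamp. A minimal sketch under stated assumptions: the function names and the `on_stop` hook are hypothetical, the five-second interval follows the blog post's suggestion, and instances that require a metadata session token (IMDSv2) would need an extra step not shown here.

```python
import time
import urllib.request

TERMINATION_URL = (
    "http://169.254.169.254/latest/meta-data/spot/termination-time"
)

def fetch_termination_time(url=TERMINATION_URL, timeout=2):
    """Return the termination timestamp, or None if no notice has been issued."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:
        # Covers the 404 returned before any notice, plus timeouts and
        # connection errors (urllib's URLError is a subclass of OSError).
        return None

def wait_for_termination(on_stop, fetch=fetch_termination_time,
                         poll_seconds=5, sleep=time.sleep):
    """Poll until a termination notice appears, then run the shutdown hook."""
    while True:
        when = fetch()
        if when:
            # Hook: stop applications and unmount the volume cleanly.
            on_stop(when)
            return when
        sleep(poll_seconds)
```

The hook should only stop applications and unmount the volume; per the slide, the instance itself is left to be reclaimed rather than terminated by the script.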
Terrifying to Boring
(Originally ran without EBS reattachment)
[ops] Search Alert: More than 0 results found
for "DRUID - Spot Market Fluctuations"
Now mundane.
Druid Tips
+ Coordinator (thing that moves state around) does
better with NO tier than with a half-tier
+ Flapping nodes can cause backpressure, better to
kill entire tier than repeatedly flap up and down.
+ Nodes usually have a burn-in time before they reach
steady-state fast queries (few minutes)
Druid + Spot + EBS
Accomplished by EBS re-attachment
Metamarkets is proud to Open Source this tool
Be Open.
Monitoring
Spot Price on the AWS Management Console
If only there was some tool that allowed
powerful, drill-down analytics on real-time
markets…
x1.32xl price stability across zones
Final Thoughts
Spot Caveats
+ Switching from Spot to On-Demand does NOT
always work
+ Pricing strategy tuned to value of lost work
+ Scaling in a Spot market must be done SLOWLY
(tens of nodes at a time)
+ us-east-1 is crowded
Lessons Learned… “If I could do it all over
again”
+ Multi-homed (at least by AZ) from the very start
+ us-west
+ More ZK quorums
+ Build on cluster resource framework
We Are Hiring!
Have Fun!
Metamarkets and Spot
+ Metamarkets has great internal tooling for Spot
market insight
+ Druid uses EBS reattachment
+ Spark works well with proper configuration
Thank you!
Remember to complete
your evaluations!

More Related Content

What's hot

AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...Amazon Web Services
 
SEC303 Automating Security in Cloud Workloads with DevSecOps
SEC303 Automating Security in Cloud Workloads with DevSecOpsSEC303 Automating Security in Cloud Workloads with DevSecOps
SEC303 Automating Security in Cloud Workloads with DevSecOpsAmazon Web Services
 
AWS re:Invent 2016: Extending Hadoop and Spark to the AWS Cloud (GPST304)
AWS re:Invent 2016: Extending Hadoop and Spark to the AWS Cloud (GPST304)AWS re:Invent 2016: Extending Hadoop and Spark to the AWS Cloud (GPST304)
AWS re:Invent 2016: Extending Hadoop and Spark to the AWS Cloud (GPST304)Amazon Web Services
 
Ponencia Principal - AWS Summit - Madrid
Ponencia Principal - AWS Summit - MadridPonencia Principal - AWS Summit - Madrid
Ponencia Principal - AWS Summit - MadridAmazon Web Services
 
Real-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaReal-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaAmazon Web Services
 
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)Amazon Web Services
 
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...Amazon Web Services
 
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)Amazon Web Services
 
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)Amazon Web Services
 
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...Amazon Web Services
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleAmazon Web Services
 
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...Amazon Web Services
 
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)Amazon Web Services
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveAmazon Web Services
 
Deep Dive on Amazon Relational Database Service
Deep Dive on Amazon Relational Database ServiceDeep Dive on Amazon Relational Database Service
Deep Dive on Amazon Relational Database ServiceAmazon Web Services
 
SRV301 Getting the most Bang for your buck with #EC2 #Winning
SRV301 Getting the most Bang for your buck with #EC2 #WinningSRV301 Getting the most Bang for your buck with #EC2 #Winning
SRV301 Getting the most Bang for your buck with #EC2 #WinningAmazon Web Services
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 

What's hot (20)

AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
 
SEC303 Automating Security in Cloud Workloads with DevSecOps
SEC303 Automating Security in Cloud Workloads with DevSecOpsSEC303 Automating Security in Cloud Workloads with DevSecOps
SEC303 Automating Security in Cloud Workloads with DevSecOps
 
AWS re:Invent 2016: Extending Hadoop and Spark to the AWS Cloud (GPST304)
AWS re:Invent 2016: Extending Hadoop and Spark to the AWS Cloud (GPST304)AWS re:Invent 2016: Extending Hadoop and Spark to the AWS Cloud (GPST304)
AWS re:Invent 2016: Extending Hadoop and Spark to the AWS Cloud (GPST304)
 
Ponencia Principal - AWS Summit - Madrid
Ponencia Principal - AWS Summit - MadridPonencia Principal - AWS Summit - Madrid
Ponencia Principal - AWS Summit - Madrid
 
Real-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaReal-time Data Processing using AWS Lambda
Real-time Data Processing using AWS Lambda
 
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
 
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
 
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)
AWS re:Invent 2016: Large-scale AWS Migrations (ENT204)
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
 
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
 
Operating your Production API
Operating your Production APIOperating your Production API
Operating your Production API
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
 
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...
AWS re:Invent 2016: Running Lean Architectures: How to Optimize for Cost Effi...
 
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and Archive
 
Deep Dive on Amazon Relational Database Service
Deep Dive on Amazon Relational Database ServiceDeep Dive on Amazon Relational Database Service
Deep Dive on Amazon Relational Database Service
 
SRV301 Getting the most Bang for your buck with #EC2 #Winning
SRV301 Getting the most Bang for your buck with #EC2 #WinningSRV301 Getting the most Bang for your buck with #EC2 #Winning
SRV301 Getting the most Bang for your buck with #EC2 #Winning
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
Introduction to AWS X-Ray
Introduction to AWS X-RayIntroduction to AWS X-Ray
Introduction to AWS X-Ray
 

Viewers also liked

2017 DB Trends for Powering Real-Time Systems of Engagement
2017 DB Trends for Powering Real-Time Systems of Engagement2017 DB Trends for Powering Real-Time Systems of Engagement
2017 DB Trends for Powering Real-Time Systems of EngagementAerospike, Inc.
 
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...Amazon Web Services
 
Ops, DevOps, NoOps and AWS Lambda
Ops, DevOps, NoOps and AWS LambdaOps, DevOps, NoOps and AWS Lambda
Ops, DevOps, NoOps and AWS LambdaMatthew Boeckman
 
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)Amazon Web Services
 
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...Amazon Web Services
 
AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...
AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...
AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...Amazon Web Services
 
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...Amazon Web Services
 
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)Amazon Web Services
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)Amazon Web Services
 
AWS re:Invent 2016: Know Before You Go
AWS re:Invent 2016: Know Before You GoAWS re:Invent 2016: Know Before You Go
AWS re:Invent 2016: Know Before You GoAmazon Web Services
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)Amazon Web Services
 
CPN101 Revving up Your Applications - Compute - AWS re: Invent 2012
CPN101 Revving up Your Applications - Compute - AWS re: Invent 2012CPN101 Revving up Your Applications - Compute - AWS re: Invent 2012
CPN101 Revving up Your Applications - Compute - AWS re: Invent 2012Amazon Web Services
 
AWS re:Invent 2016: Case Study: How Videology and Zendesk Modernized Their Bi...
AWS re:Invent 2016: Case Study: How Videology and Zendesk Modernized Their Bi...AWS re:Invent 2016: Case Study: How Videology and Zendesk Modernized Their Bi...
AWS re:Invent 2016: Case Study: How Videology and Zendesk Modernized Their Bi...Amazon Web Services
 
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...Amazon Web Services
 
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...Amazon Web Services
 
Hybrid Memory Cube: Developing Scalable and Resilient Memory Systems
Hybrid Memory Cube: Developing Scalable and Resilient Memory SystemsHybrid Memory Cube: Developing Scalable and Resilient Memory Systems
Hybrid Memory Cube: Developing Scalable and Resilient Memory SystemsMicronTechnology
 
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)Amazon Web Services
 
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)Amazon Web Services
 

Viewers also liked (20)

2017 DB Trends for Powering Real-Time Systems of Engagement
2017 DB Trends for Powering Real-Time Systems of Engagement2017 DB Trends for Powering Real-Time Systems of Engagement
2017 DB Trends for Powering Real-Time Systems of Engagement
 
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
AWS re:Invent 2016: Workshop: Stretching Scalability: Doing more with Amazon ...
 
My First Big Data Application
My First Big Data ApplicationMy First Big Data Application
My First Big Data Application
 
Ops, DevOps, NoOps and AWS Lambda
Ops, DevOps, NoOps and AWS LambdaOps, DevOps, NoOps and AWS Lambda
Ops, DevOps, NoOps and AWS Lambda
 
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)
AWS re:Invent 2016: How Citus Enables Scalable PostgreSQL on AWS (DAT207)
 
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
 
AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...
AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...
AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records i...
 
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
 
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
 
AWS re:Invent 2016: Know Before You Go
AWS re:Invent 2016: Know Before You GoAWS re:Invent 2016: Know Before You Go
AWS re:Invent 2016: Know Before You Go
 
AWS business essentials
AWS business essentials AWS business essentials
AWS business essentials
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
 
CPN101 Revving up Your Applications - Compute - AWS re: Invent 2012
CPN101 Revving up Your Applications - Compute - AWS re: Invent 2012CPN101 Revving up Your Applications - Compute - AWS re: Invent 2012
CPN101 Revving up Your Applications - Compute - AWS re: Invent 2012
 
AWS re:Invent 2016: Case Study: How Videology and Zendesk Modernized Their Bi...
AWS re:Invent 2016: Case Study: How Videology and Zendesk Modernized Their Bi...AWS re:Invent 2016: Case Study: How Videology and Zendesk Modernized Their Bi...
AWS re:Invent 2016: Case Study: How Videology and Zendesk Modernized Their Bi...
 
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
 
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
 
Hybrid Memory Cube: Developing Scalable and Resilient Memory Systems
Hybrid Memory Cube: Developing Scalable and Resilient Memory SystemsHybrid Memory Cube: Developing Scalable and Resilient Memory Systems
Hybrid Memory Cube: Developing Scalable and Resilient Memory Systems
 
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
 
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
 




AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. November 30, 2016 Disrupting Big Data with Cost-effective Compute Charles Allen, Metamarkets Durga Nemani, Gaurav Agrawal, AOL Anu Sharma, Amazon EC2 CMP302
  • 2. Amazon EC2 Spot instances • Regular EC2 instances opened to the Spot market when spare • Prices on average 70-80% lower than On-Demand • Best suited for workloads that can scale with compute • Accelerate jobs 5-10 times e.g. run faster CI/CD pipelines (case study: Yelp) • Reduce costs by 5-10 times, scale stateless web applications (case study: Mapbox, Ad-tech) • Generate better business insights from your event stream
  • 3. In this session • Use Case: context and history • AOL: Separation of Compute and Storage using Amazon EMR and EC2 Spot instances • Architecture • Cost Optimization • Orchestration • Monitoring • Best Practices • Metamarkets: Spark and Druid on EC2 Spot instances • Architecture Overview: Real-time, Batch Jobs, Lambda • Spark on Spot instances • Druid on Spot instances • Monitoring
  • 4. Business Intelligence Data Set • Event Data • Timestamp • Dimensions/Attributes • Measures • Total data set is huge, billions of events per day
  • 5. Relational Databases Traditional Data Warehouse Star Schema • FACT table contains primary information and measures to aggregate • DIM tables contain additional attributes about entities • Queries involve joins between central FACT and DIM tables Performance degrades as data scales.
  • 6. Key/Value Stores Fast writes, fast lookups • Pre-compute every possible query • As more columns are added, query space grows exponentially • Primary key is a hash of timestamp and dimensions • Value is measure to aggregate • Shuffle data from storage to computational buffer - slow • Difficult to create intelligent indexes Precomputation Range Scans
  • 7. General Compute Engines SQL on Hadoop • Scale with compute power • Generate up to 5-10x faster business insights with cheaper compute • Or just reduce costs by 80-90%
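The two figures on this slide can be read as the same arithmetic seen from two sides: at an 80-90% discount, a fixed budget buys 5-10 times the compute. A minimal sketch, assuming the workload scales linearly with compute (the function name is illustrative, not from the talk):

```python
def compute_multiplier(discount_percent: int) -> float:
    """How many times more compute a fixed budget buys at a given discount."""
    if not 0 <= discount_percent < 100:
        raise ValueError("discount_percent must be in [0, 100)")
    return 100 / (100 - discount_percent)


for discount in (80, 90):
    factor = compute_multiplier(discount)
    print(f"{discount}% cheaper -> {factor:.0f}x compute for the same spend")
```

Whether that extra compute turns into faster results depends on how well the job parallelizes; the sketch only covers the pricing side.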
  • 8. Pioneers to Settlers Algorithmic Efficiency to Mundane Efficiency
  • 9. Separation of Compute and Storage: Big Data Processing with Amazon EMR and EC2 Spot instances. Durga Nemani, System Architect, AOL; Gaurav Agrawal, Software Engineer, AOL
  • 11. Architecture • AWS Lambda: orchestration • Amazon S3: data lake • Amazon DynamoDB: data validation • Amazon RDS: Hive metastore • AWS IAM • Data processing: Amazon EMR Hive (Elastic IP, Hive client) • Data analytics: Amazon EMR Presto (Elastic IP, Presto client)
  • 12. Key features and advantages • Separation of compute and storage • Scale compute and storage independently • Separate data processing and analytics • Hive for processing, Presto for analytics • No data migration • S3 Data lake • Single source of truth • Columnar format for performance and compression • VPC design • Identified by Name Tags • AOL CIDR, VPN • Few lines of code change vs big data migration efforts
  • 14. Amazon EC2 Spot Instances • Keep in mind • Availability • Spot pricing varies by • Instance type • Availability Zone • Different provisioning times • AOL requirement • Major restatement: 15-20K EC2 instances • Data for 15+ countries • Frequency: hourly, daily, weekly, month-to-date, monthly, 28 days
  • 15. EMR Deployment Setup • Set up a VPC in all regions • Ensure Spot limits • Set up a hard EC2 limit per AZ • Multiple instance types • Define instance type-to-core mapping • Data volume • Code complexity • You pay the actual Spot price, not the bid price!
  • 16. Deployment Logic Diagram • Data Volume + Code Complexity → Pick Instance Type • Build the AZ list sorted by Spot price • For the current AZ: Number of Cores = A, Open/Active Instances = B • A + B < AZ Limit? Yes → Kick off EMR • No → Next AZ in List? Yes → repeat the check for that AZ; No → end
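The selection loop in the diagram can be sketched as follows. This is a minimal illustration, not AOL's code: the function name, the AZ names, the prices, and the limit are all made up, and it assumes A (cores needed) and B (capacity already open in the AZ) are counted in the same unit.

```python
from typing import Optional


def pick_az(spot_prices: dict[str, float],
            active_by_az: dict[str, int],
            cores_needed: int,
            az_limit: int) -> Optional[str]:
    """Return the cheapest AZ with room for the cluster, or None if none fits."""
    # Walk the AZ list sorted by current Spot price, cheapest first.
    for az in sorted(spot_prices, key=spot_prices.get):
        already_open = active_by_az.get(az, 0)      # B in the diagram
        if cores_needed + already_open < az_limit:  # A + B < AZ limit
            return az                               # kick off EMR here
    return None                                     # no AZ left in the list


# Hypothetical inputs for illustration only.
prices = {"us-east-1a": 0.05, "us-east-1b": 0.03, "us-east-1c": 0.04}
active = {"us-east-1b": 900}
chosen = pick_az(prices, active, cores_needed=200, az_limit=1000)
print(chosen)
```

Here the cheapest AZ is skipped because it is already close to its limit, so the next cheapest one is chosen. Picking the instance type from data volume and code complexity happens before this step and is not shown.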
  • 17. Average Cost Saving Graphs • On-Demand vs. static AZ: ~80% savings • m3.xlarge, Sept 2016 cost
  • 18. Average Cost Saving Graphs • Static AZ vs. cheap AZ: 10-15% savings • m3.xlarge, Sept 2016 cost
  • 19. Why cheaper AZ matters? • Data transfer cost • Worst-case scenario: cheaper AZ not in local region • More data => more nodes + more hours • m3.xlarge, Sept 2016 cost

      Size (GB)  Cores  Hours  Local AZ Cost  Cheaper AZ Cost  Transfer Cost  Total Cost  Cost Savings
             10     25      1            429              356             73         429            0%
             50    100      2          3,431            2,847            365       3,212            6%
            100    300      3         20,586           17,082            730      17,812           13%
            200    500      5         51,465           42,705          1,460      44,165           14%
            300    700      7        109,792           91,104          2,190      93,294           15%
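The cost figures on this slide are consistent with a simple rule: total cost is the cheaper-AZ cost plus the transfer cost, and savings are measured against the local-AZ cost. The formula is inferred from the numbers rather than stated by the speakers; the sketch below recomputes each row under that assumption.

```python
# (size_gb, local_az_cost, cheaper_az_cost, transfer_cost), copied from the slide.
rows = [
    (10, 429, 356, 73),
    (50, 3431, 2847, 365),
    (100, 20586, 17082, 730),
    (200, 51465, 42705, 1460),
    (300, 109792, 91104, 2190),
]


def total_cost(cheaper_az_cost: int, transfer_cost: int) -> int:
    """Cost of running in the cheaper AZ, including moving the data there."""
    return cheaper_az_cost + transfer_cost


def savings_percent(local_az_cost: int, cheaper_az_cost: int, transfer_cost: int) -> int:
    """Savings relative to the local AZ, rounded to a whole percent."""
    total = total_cost(cheaper_az_cost, transfer_cost)
    return round(100 * (local_az_cost - total) / local_az_cost)


for size_gb, local, cheaper, transfer in rows:
    print(size_gb, total_cost(cheaper, transfer),
          f"{savings_percent(local, cheaper, transfer)}%")
```

The 10 GB row is the break-even case: the transfer cost cancels the price difference exactly, which is why small jobs gain nothing from moving to a cheaper AZ.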
  • 21. Average Cost Saving Graphs • Daily, 9/1/16 to 9/30/16 • Static AZ vs. Cheap AZ vs. AZ+Data+Code: 15-22% savings • m3.xlarge, Sept 2016 cost
  • 23. Process Pipeline Overview • Multiple stages between raw data and final summary • Ensure dependencies • Integration with data services • Extensible, scalable, and reliable • Recovery options • Notifications • Directed acyclic graph
  • 24. Sample DW Workflow • Diagram of operations a, b, c, d, e, g, h, i, j
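Treating the pipeline as a directed acyclic graph means a stage runs only after everything it depends on has finished. A minimal sketch using Python's standard `graphlib` module (Python 3.9+); the edges below are hypothetical, since the slide's diagram does not survive in this transcript.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each key runs after every operation in its set.
dependencies = {
    "b": {"a"},
    "c": {"a"},
    "d": {"b", "c"},
    "e": {"d"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Operations with no dependency between them (here `b` and `c`) may come out in either order, which is also what allows them to run in parallel.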
  • 25. AOL DW Process Pipeline • Diagram: Amazon S3, AWS Lambda (Python Boto), Amazon EMR; each component appears twice
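An event-driven step in a pipeline like this might look like the sketch below: a Lambda handler reads the S3 event notification and starts one processing run per newly arrived object. This is an illustration under assumptions, not AOL's code. The `start_cluster` callback is hypothetical; in a real deployment it would wrap something like the boto3 EMR client's `run_job_flow` call, whose parameters are omitted here.

```python
import urllib.parse


def s3_objects(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        yield s3["bucket"]["name"], key


def handler(event, context, start_cluster=None):
    """Lambda entry point: trigger one processing run per new object."""
    started = []
    for bucket, key in s3_objects(event):
        location = f"s3://{bucket}/{key}"
        if start_cluster is not None:
            start_cluster(location)  # hypothetical hook that launches the EMR job
        started.append(location)
    return started
```

Keeping the EMR call behind a callback lets the event parsing be tested locally without AWS credentials.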
  • 26. Benefits & Suggestions • Improved SLA due to Event based model • Serverless – Zero Administration • Millisecond response time • Pricing – 1 million requests/month Free • Generic utilities for Extensibility • Built in Auto Scaling • CloudWatch Logging • Replaced ~2000 Autosys jobs
  • 28. EMR Monitoring - Prunella • Tons of clusters/day • EMR Failure causes • Network Connectivity • Bootstrap Actions • Zero OPS Hours • SLA improvement • No datacenter dependency • Notifications – Email/Slack
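One check an automated monitor might run is to flag clusters that have been starting for too long, which can be a symptom of failures such as the network or bootstrap problems listed above. The sketch below is an assumption about how such a check could look, not a description of Prunella. The input dicts mirror the shape of entries returned by the boto3 EMR `list_clusters` call, and the 30-minute threshold is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def stuck_clusters(clusters, now, max_starting=timedelta(minutes=30)):
    """Return IDs of clusters that have been in STARTING longer than max_starting."""
    stuck = []
    for cluster in clusters:
        status = cluster["Status"]
        age = now - status["Timeline"]["CreationDateTime"]
        if status["State"] == "STARTING" and age > max_starting:
            stuck.append(cluster["Id"])
    return stuck


# Hypothetical cluster summaries for illustration only.
now = datetime(2016, 9, 1, 12, 0, tzinfo=timezone.utc)
clusters = [
    {"Id": "j-OLD", "Status": {"State": "STARTING",
        "Timeline": {"CreationDateTime": now - timedelta(hours=2)}}},
    {"Id": "j-NEW", "Status": {"State": "STARTING",
        "Timeline": {"CreationDateTime": now - timedelta(minutes=5)}}},
    {"Id": "j-RUN", "Status": {"State": "RUNNING",
        "Timeline": {"CreationDateTime": now - timedelta(hours=3)}}},
]
print(stuck_clusters(clusters, now))
```

The returned IDs could then feed an email or Slack notification, or a termination step.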
  • 30. Good to have • S3 Lifecycle based on Tags • Terminate Long STARTING EMR Cluster • Python 3 Lambda Support • Lambda Code Test/Deployment • Kappa • Global EMR Dashboard • Redshift External Tables
  • 31. Recap • Transient Spot Architecture • S3 as Data Lake • Cost Optimization • Dynamic choice of Spot AZ and Number of Cores • Serverless Process Pipeline • AWS Lambda for event-driven design • Automated EMR Monitoring • Reduce manual intervention for 1000s of clusters
  • 32. Photo Credits • Gabor Kiss - http://bit.ly/2epkQJY • AustinPixels- http://bit.ly/2eAenqr • Mike - http://bit.ly/2eqGx82
  • 33. Related Sessions • AWS re:Invent 2015 | (BDT208) A Technical Introduction to Amazon Elastic MapReduce • https://www.youtube.com/watch?v=WnFYoiRqEHw • AWS re:Invent 2015 | (BDT210) Building Scalable Big Data Solutions: Intel & AOL • https://www.youtube.com/watch?v=2yZginBYcEo
  • 34. Spark and Druid on EC2 Spot Instances Charles Allen, Metamarkets
  • 35. About Me Director of Platform Druid PMC @drcrallen charles.allen@metamarkets.com Special thanks to Jisoo Kim
  • 36. Programmatic data is 100x larger than Wall Street
  • 37. Metamarkets + Industry leader in interactive analytics for programmatic marketing + > 100B events / day + Typical peak approx 2,000,000 events / sec + Massaged, joined, HA replicated → 3M/s Move fast. Think big. Be open. Have fun.
  • 38. Metamarkets + Event ingestion lag down to a few ms + Dynamic queries + Query latency less than 1 second + Specially tailored for real-time bidding
  • 40. Current Spot Usage + Spark + Druid + Jenkins
  • 41. Brief Architecture Overview - Real-time Kafka Druid Real-time Indexing Kafka / Samza Very Fast ● Pretty accurate ● On-time data
  • 42. Brief Architecture Overview - Batch Kafka Druid Historical S3 Spark A Few Hours Later ● Highly accurate ● Deduplicated ● Late data
  • 43. Brief Architecture Overview (Lambda) Real Time Batch Kafka ΔFew Hrs Historical User
  • 44. Key Technologies Used + Kafka + Samza + Spark + Druid
  • 46. Why Spark? The Good: + No HDFS + Good enough partial failure recovery + Native Mesos, YARN, and Stand-alone The Bad: + Rough to configure for multi-tenancy
  • 47. Spark + Between 1 and 4 PiB / day (mem bytes spilled) + Between 200B and 1T events / day + Peak days can be up to 5x baseline Think Big.
  • 48. Cost Savings SPARK Savings vs. on-demand Approx equal to 3-year term >60%
  • 49. Tradeoff + More complex job failure handling + “Did my job die because of Me, Spark, the Data, or the Market?” + More random delays + More man-hours to manage, or automation to build
  • 51. Druid on Spot Some of our Historical nodes run on Spot 185 TB (compressed) state on EBS on Spot ⅕ of a petabyte can vanish… and come back in 15 minutes
  • 52. Druid Historical Data HOT: 1 hr < EVENT_TIME < X Months • COLD: X Months < EVENT_TIME < Y Months • ICY: Y Months < EVENT_TIME < Z Years
  • 53. Historical Tier QPS (Logscale)
  • 54. Historical Tier QPS (Logscale) Spot can go here
  • 55. Using EBS With Druid on Spot + Define a “pool” tag for EBS volumes + If EBS “pool” is “empty” (no unmounted volumes), create a new volume (with proper tags) and mount it + Otherwise, claim drive from pool + Sanity check on volume, discard if unrecoverable
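The claim-or-create logic on this slide can be sketched with an in-memory model. This is not the Metamarkets tool and makes no AWS calls: `Volume`, `VolumePool`, the `is_healthy` check, and the generated volume ids are hypothetical stand-ins. It also assumes that when a pooled volume fails the sanity check, the node tries the next free volume before creating a new one, which the slide does not spell out.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Iterator, List, Optional


@dataclass
class Volume:
    volume_id: str
    pool: str                          # value of the "pool" tag
    attached_to: Optional[str] = None  # instance id, or None when free


@dataclass
class VolumePool:
    name: str
    volumes: List[Volume] = field(default_factory=list)
    _ids: Iterator[int] = field(default_factory=lambda: count(1))

    def claim(self, instance_id: str,
              is_healthy: Callable[[Volume], bool] = lambda v: True) -> Volume:
        """Claim a free volume from the pool, or create one if none is usable."""
        free = [v for v in self.volumes if v.attached_to is None]
        for vol in free:
            if is_healthy(vol):
                vol.attached_to = instance_id
                return vol
            # Sanity check failed: discard the unrecoverable volume.
            self.volumes.remove(vol)
        # Pool is "empty" (no usable free volumes): create a tagged volume.
        vol = Volume(f"vol-{next(self._ids)}", self.name, instance_id)
        self.volumes.append(vol)
        return vol

    def release(self, vol: Volume) -> None:
        """Return a cleanly unmounted volume to the pool for the next node."""
        vol.attached_to = None
```

A replacement node that claims a released volume starts with the previous node's state already on disk, which is what lets a vanished tier come back quickly.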
  • 56. Using EBS With Druid on Spot + Monitor spot notifications[1] to stop gracefully + If stop is detected, prepare to die gracefully + Stop applications (hook) + Unmount volume cleanly + Do not actually terminate instance; wait for death [1] https://aws.amazon.com/blogs/aws/new-ec2-spot-instance-termination-notices/
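As a rough sketch of the sequence above: the blog post referenced on the slide describes an instance metadata path that returns HTTP 404 until the instance is marked for termination and a timestamp afterwards. The code below polls that path once and, if a notice is present, runs two hooks in order. `stop_apps` and `unmount` are hypothetical callbacks supplied by the caller, and newer metadata configurations may require a session token that this sketch does not handle.

```python
import urllib.error
import urllib.request

# Metadata path described in the EC2 Spot termination notice announcement.
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def fetch_termination_time(url=TERMINATION_URL, timeout=2.0):
    """Return the termination timestamp string, or None if no notice is posted."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no termination scheduled yet
            return None
        raise
    return body or None


def shutdown_if_notified(stop_apps, unmount, fetch=fetch_termination_time):
    """Run the graceful-stop hooks once a termination notice is seen.

    Returns True if a notice was found. The instance itself is left running;
    per the slide, the node waits for the Spot market to reclaim it.
    """
    if fetch() is None:
        return False
    stop_apps()  # stop applications first (hook)
    unmount()    # then unmount the volume cleanly so it can be re-attached
    return True
```

In practice this would be called in a loop every few seconds; the polling interval is left to the caller.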
  • 57. Terrifying to Boring (Originally ran without EBS reattachment) [ops] Search Alert: More than 0 results found for "DRUID - Spot Market Fluctuations" Now mundane.
  • 58. Druid Tips + Coordinator (thing that moves state around) does better with NO tier than with a half-tier + Flapping nodes can cause backpressure, better to kill entire tier than repeatedly flap up and down. + Nodes usually have a burn-in time before they reach steady-state fast queries (few minutes)
  • 59. Druid + Spot + EBS Accomplished by EBS re-attachment Metamarkets is proud to Open Source this tool Be Open.
  • 61. Spot Price on the AWS Management Console If only there was some tool that allowed powerful, drill-down analytics on real-time markets…
  • 63. x1.32xl price stability across zones
  • 65. Spot Caveats + Switching from Spot to On-Demand does NOT always work + Pricing strategy tuned to value of lost work + Scaling in a Spot market must be done SLOWLY (tens of nodes at a time) + us-east-1 is crowded
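The "scale slowly" caveat can be illustrated by growing capacity in small increments instead of one large request. This is only a sketch of the idea: `scaling_steps` is a hypothetical helper, and the default step of 10 reflects the slide's "tens of nodes at a time" rather than a measured value.

```python
def scaling_steps(current, target, step=10):
    """Return the intermediate capacities when growing by at most `step` nodes."""
    if step <= 0:
        raise ValueError("step must be positive")
    steps = []
    while current < target:
        current = min(current + step, target)
        steps.append(current)
    return steps


if __name__ == "__main__":
    # Each intermediate capacity would be requested only after the previous
    # one has been fulfilled and the price has been observed.
    print(scaling_steps(100, 135))
```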
  • 66. Lessons Learned… “If I could do it all over again” + Multi-homed (at least by AZ) from the very start + us-west + More ZK quorums + Build on cluster resource framework
  • 68. Metamarkets and Spot + Metamarkets has great internal tooling for Spot market insight + Druid uses EBS reattachment + Spark works well with proper configuration