Field Notes from Expeditions in the Cloud-(Matt Wood, Amazon Web Services)
Presentation at Spark Summit 2015

Published in: Data & Analytics

Field Notes from Expeditions in the Cloud-(Matt Wood, Amazon Web Services)

  1. FIELD NOTES FROM EXPEDITIONS IN THE CLOUD. DR. MATT WOOD, GM, PRODUCT STRATEGY
  2. GOOD MORNING. MATTHEW@AMAZON.COM, @MZA
  3. THANK YOU
  4. TECHNOLOGY ADOPTION
  5. “WHAT IS CLOUD COMPUTING?”
  6. “WHAT IS BIG DATA?”
  7. VOLUME, VELOCITY, VARIABILITY
  8. CONSTRAINTS
  9. BOXED IN
  10. CONSTRAINTS
  11. PRODUCTIVITY: Quick provisioning; Resource fit; Designed for iteration
  12. Canonical data stores; Streaming data; Task clusters
  13. SPARK WITH AMAZON EMR: Production workloads on AWS. Security event streaming; Machine learning & ad targeting; Web analytics; Revenue forecasting; Ad targeting & recommendations; Personalization; Predictive marketing; App search
  14. S3; MLlib on Spark; Clicks, videos, interactives
  15. S3; MLlib on Spark; Clicks, videos, interactives; S3; Customer segments
  16. S3; MLlib on Spark; Clicks, videos, interactives; S3; Customer segments; Forecasting; Ad sales; Map user to ads
  17. S3; MLlib on Spark; Clicks, videos, interactives; S3; Customer segments; Forecasting; Ad sales; Map user to ads; Recommendation; Spark cluster; Clicks, videos, interactives
  18. S3; MLlib on Spark; Clicks, videos, interactives; S3; Customer segments; Forecasting; Ad sales; Map user to ads; Recommendation; Spark cluster; Clicks, videos, interactives; Kafka; Every click
  19. Node.js app; Elastic Beanstalk
  20. Node.js app; Kinesis; Click stream; Elastic Beanstalk
  21. Node.js app; Kinesis; Click stream; Elastic Beanstalk; Spark Streaming; EMR cluster; 5 minute windows; 100MB; S3; JSON and CSV
  22. Node.js app; Kinesis; Click stream; Elastic Beanstalk; Spark Streaming; EMR cluster; 5 minute windows; 100MB; S3; JSON and CSV; EMR; ElasticSearch; Redshift
  23. Node.js app; Kinesis; Click stream; Elastic Beanstalk; Spark Streaming; EMR cluster; 5 minute windows; 100MB; S3; JSON and CSV; EMR; ElasticSearch; Redshift
  24. S3; Ad impressions & clicks; 24/7 Spark cluster; Impression RDD
  25. S3; Ad impressions & clicks; 24/7 Spark cluster; Impression RDD; Interactive dashboard; Revenue forecast
  26. S3; Ad impressions & clicks; 24/7 Spark cluster; Impression RDD; Interactive dashboard; Revenue forecast; Click stream logs; Batch Spark clusters; Redshift
  27. S3; Ad impressions & clicks; 24/7 Spark cluster; Impression RDD; Interactive dashboard; Revenue forecast; Click stream logs; Batch Spark clusters; Redshift; Data exploration and testing
  28. PRODUCTIVITY: Quick provisioning; Designed for iteration; Resource fit
  29. SPARK ON EMR (New): Provision and scale managed Spark clusters, as first class citizens
  30. RAPID PROVISIONING OF ELASTIC CLUSTERS: Provision new clusters in minutes; High memory, high CPU, high IO instances; Access cluster instances directly; Clusters run within a VPC; Add or remove capacity on running clusters
  31. DIRECT ACCESS TO DATA ON S3: Access objects directly on Amazon S3; Server-side and client-side encryption with customer controlled keys; Multiple clusters can access canonical data in S3; No need to copy or manage the data on the cluster; Mix S3 and HDFS on a cluster
  32. INTEGRATION WITH THE SPOT MARKET: Bid on underutilized capacity on EC2; “Name your price” clusters; Very low cost at high scale; Lowest cost for time insensitive workloads; Also on-demand and reserve capacity pricing
  33. SPARK ON EMR (New): No additional cost. Available today. aws.amazon.com/emr/spark
  34. THANK YOU. MATTHEW@AMAZON.COM, @MZA
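
Slides 21 to 23 show a click stream being cut into 5 minute windows and written to S3 as JSON and CSV. The sketch below illustrates only that windowing idea in plain Python; it does not use the Spark Streaming or Kinesis APIs. The event fields (`ts`, `user`, `url`), the sample clicks, and all function names are assumptions made for illustration and do not come from the deck.

```python
import csv
import io
import json
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # the "5 minute windows" named on the slides


def window_start(ts):
    """Round an epoch timestamp (in seconds) down to the start of its window."""
    return ts - (ts % WINDOW_SECONDS)


def group_into_windows(events):
    """Group click events into non-overlapping 5 minute windows."""
    windows = defaultdict(list)
    for event in events:
        windows[window_start(event["ts"])].append(event)
    return dict(windows)


def to_json_lines(batch):
    """Serialize one window as newline-delimited JSON."""
    return "\n".join(json.dumps(event, sort_keys=True) for event in batch)


def to_csv(batch, fields=("ts", "user", "url")):
    """Serialize one window as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(batch)
    return buf.getvalue()


# Hypothetical click events; a real stream would arrive continuously.
clicks = [
    {"ts": 0, "user": "u1", "url": "/home"},
    {"ts": 299, "user": "u2", "url": "/video"},
    {"ts": 300, "user": "u1", "url": "/article"},
]
batches = group_into_windows(clicks)
```

Each key in `batches` is a window start time and each value is the list of clicks that fell in that window, so one JSON file and one CSV file per key would roughly match the per-window output the slides describe.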
