BDT201 AWS Data Pipeline - AWS re:Invent 2012

In this session, we'll review the features and architecture of the new AWS Data Pipeline service and explain how you can use it to better manage your data-driven workloads. We'll then go over a few examples of setting up and provisioning a pipeline in the system.

  1. Amazon S3, Amazon RDS, Amazon DynamoDB, Amazon Redshift, HDFS (Amazon EMR), On Premise
  2. Amazon DynamoDB, Amazon S3
  3-7. Amazon S3, Amazon RDS, Amazon DynamoDB, Amazon Redshift, HDFS (Amazon EMR), On Premise
  8. Input Datanode → Activity → [Output Datanode]
  9. Input Datanode with precondition check → Activity with failure & delay notifications → Output Datanode
  10. Data, Data, Data Stores, Data Stores, Compute Resources
  11. Start, Interval, [End]
  12. Noon Today, 1 hour
  13. 12-1pm X, 1-2pm, 2-3pm, …
  14. 12-1pm X, 1-2pm, 2-3pm X, 1 day, …
  15. Monthly, Daily, Hourly, Quarterly, Yearly, Weekly
  16. S3 logs (hourly), Geolocation data → Per-geography usage computation (hourly) → Redshift results
  17. S3 logs (hourly) [Precondition: files exist], Geolocation data [Precondition: ./geo_available] → Per-geography usage computation (hourly) → Redshift results
  18. Dynamo event data, RDS demographics → Hive-based analysis (hourly) → Redshift results
  19. Hourly click updates → Hourly event analysis → Daily reporting SQL
  20-21. Amazon S3 logs, Amazon DynamoDB event data, Amazon RDS demographics, Custom Precondition, Amazon EMR Hive script, usage-by-geo job, Amazon Redshift DW table, Amazon Redshift DW table, Amazon EC2 report generation
  22. We Manage: EC2 Instances, EMR Clusters. You Manage: EC2 Instances, EMR Clusters, On Premise Resources
  23. {
        "objects": [
          { "name": "My Copy", "type": "Copy Action",
            "input": {"ref": "My RDS Data"}, "output": {"ref": "My S3 Data"},
            "runsOn": {"ref": "My Instance"}, "schedule": {"ref": "My Schedule"} },
          { "name": "My Instance", "type": "EC2Instance", "instanceType": "m1.small",
            "schedule": {"ref": "My Schedule"} },
          …
        ]
      }
  24.                 On AWS         On Premise
      High Frequency  $1.00/month    $2.50/month
      Low Frequency   $0.60/month    $1.50/month
  25. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
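Slides 8 and 9 describe the basic unit of a pipeline: an input data node, an activity, and an optional output data node, with a precondition check on the input and failure and delay notifications on the activity. The Python sketch below models that flow in memory. The class names, the `max_attempts` retry budget, and the `notify` callback are hypothetical illustrations, not the AWS Data Pipeline API.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical in-memory model; these names are illustrative only and are
# not part of the AWS Data Pipeline API.
@dataclass
class DataNode:
    name: str
    # Precondition check, e.g. "input files exist"; defaults to always ready.
    precondition: Callable[[], bool] = lambda: True


@dataclass
class Activity:
    name: str
    run: Callable[[], None]
    max_attempts: int = 3  # assumed retry budget


def run_step(inp: DataNode, activity: Activity, out: DataNode,
             notify: Callable[[str], None]) -> bool:
    """Check the input precondition, run the activity with retries, and
    report a delay or a final failure through `notify`."""
    if not inp.precondition():
        notify(f"DELAY: {inp.name} not ready; {activity.name} is waiting")
        return False
    for attempt in range(1, activity.max_attempts + 1):
        try:
            activity.run()
            return True  # the output data node `out` now holds the result
        except Exception as exc:
            if attempt == activity.max_attempts:
                notify(f"FAILURE: {activity.name} -> {out.name}: {exc}")
    return False
```

A delayed step returns `False` without spending any retries, so a scheduler could simply re-check the precondition later.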
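Slides 11 through 15 describe a schedule as a start time, an interval, and an optional end. For example, starting at noon today and repeating every hour yields the 12-1pm, 1-2pm, and 2-3pm slots. The sketch below enumerates those slots, assuming a fixed-length period. `schedule_intervals` and its `limit` cap are hypothetical. A fixed `timedelta` covers hourly, daily, and weekly schedules. Monthly, quarterly, and yearly periods would need calendar-aware arithmetic, which is not shown.

```python
from datetime import datetime, timedelta
from typing import Iterator, Optional, Tuple


def schedule_intervals(start: datetime, period: timedelta,
                       end: Optional[datetime] = None,
                       limit: int = 24) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (interval_start, interval_end) pairs from `start`, one per
    `period`, stopping before an interval would pass `end` or after
    `limit` intervals, whichever comes first. Hypothetical helper."""
    current = start
    for _ in range(limit):
        if end is not None and current + period > end:
            break
        yield current, current + period
        current += period


if __name__ == "__main__":
    noon = datetime(2012, 11, 28, 12)  # stands in for "noon today"
    for s, e in schedule_intervals(noon, timedelta(hours=1),
                                   end=noon + timedelta(hours=3)):
        print(f"{s:%H:%M}-{e:%H:%M}")  # 12:00-13:00, 13:00-14:00, 14:00-15:00
```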
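Slide 17 attaches a "files exist" precondition to the hourly S3 logs, so each hourly run waits for its own hour's input. The minimal sketch below derives a per-hour prefix from the interval start and checks it against a listing. It assumes a hypothetical `logs/YYYY-MM-DD-HH/` key layout and uses a plain list of keys in place of a real S3 listing.

```python
from datetime import datetime
from typing import Iterable


def hourly_log_prefix(bucket: str, interval_start: datetime) -> str:
    # Hypothetical key layout: s3://<bucket>/logs/YYYY-MM-DD-HH/
    return f"s3://{bucket}/logs/{interval_start:%Y-%m-%d-%H}/"


def files_exist(prefix: str, keys: Iterable[str]) -> bool:
    """'Files exist' precondition: true if any listed key is under `prefix`.
    `keys` stands in for a real S3 listing, which is not shown here."""
    return any(key.startswith(prefix) for key in keys)
```

The slide's second precondition, `./geo_available`, reads like a marker file. If it is one, a local check such as `os.path.exists("./geo_available")` would play the same role for the geolocation input.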
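Slide 19 chains steps that run at different frequencies. Hourly click updates feed an hourly event analysis, which in turn feeds a daily reporting SQL step. The sketch below expresses that dependency, assuming the daily step waits for all 24 hourly runs of its day. `daily_report_ready` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Set


def daily_report_ready(day: datetime, completed_hours: Set[datetime]) -> bool:
    """True when the hourly runs starting at 00:00 through 23:00 on `day`
    have all completed. Hypothetical readiness rule."""
    midnight = day.replace(hour=0, minute=0, second=0, microsecond=0)
    needed = {midnight + timedelta(hours=h) for h in range(24)}
    return needed <= completed_hours  # subset test: every hour is done
```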
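Slides 20 and 21 combine the earlier examples into one pipeline spanning Amazon S3, DynamoDB, RDS, EMR, Redshift, and EC2. The extracted slide text does not preserve the arrows, so the dependency edges below are an assumption. The sketch uses Python's standard `graphlib.TopologicalSorter` to derive an order in which each step runs only after its inputs are ready.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Assumed dependencies, reconstructed from the slide 20 labels: each key maps
# a step to the set of steps that must finish before it runs.
PIPELINE = {
    "usage-by-geo Hive job (Amazon EMR)": {
        "S3 logs", "DynamoDB event data", "RDS demographics", "custom precondition",
    },
    "Redshift DW table": {"usage-by-geo Hive job (Amazon EMR)"},
    "report generation (Amazon EC2)": {"Redshift DW table"},
}


def run_order(graph: dict) -> list:
    """Return one valid execution order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(run_order(PIPELINE))
```

The four sources have no dependencies among themselves, so any order among them is valid. Only the relative order along each edge is fixed.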
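In the slide 23 definition, objects point at each other by name through `{"ref": ...}` fields. The sketch below loads the two objects shown on the slide and lists every reference whose target is not defined. It is an illustrative check, not the service's own validator. Because the slide elides the remaining objects ("…"), the data nodes and the schedule show up as missing.

```python
import json

# The two objects shown on slide 23, with field values kept as on the slide.
definition = json.loads("""
{
  "objects": [
    {"name": "My Copy", "type": "Copy Action",
     "input": {"ref": "My RDS Data"}, "output": {"ref": "My S3 Data"},
     "runsOn": {"ref": "My Instance"}, "schedule": {"ref": "My Schedule"}},
    {"name": "My Instance", "type": "EC2Instance",
     "instanceType": "m1.small", "schedule": {"ref": "My Schedule"}}
  ]
}
""")


def dangling_refs(defn: dict) -> list:
    """Return (object, field, target) for every {"ref": ...} whose target
    is not the name of an object in the definition."""
    names = {obj["name"] for obj in defn["objects"]}
    missing = []
    for obj in defn["objects"]:
        for field_name, value in obj.items():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in names:
                missing.append((obj["name"], field_name, value["ref"]))
    return missing
```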
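Slide 24 lists monthly prices by schedule frequency and by where the work runs. The small calculator below assumes that each scheduled activity or precondition is billed at one of these four rates per month, and that "high frequency" means scheduled more than once a day. Both are assumptions, since the slide text only gives the prices. `Decimal` keeps the cent arithmetic exact.

```python
from decimal import Decimal
from typing import Iterable, Tuple

# Monthly USD rates from slide 24, keyed by (frequency, location).
# Assumptions: one rate applies per scheduled activity or precondition, and
# "high" means scheduled more than once a day.
RATES = {
    ("high", "aws"): Decimal("1.00"),
    ("high", "on_premise"): Decimal("2.50"),
    ("low", "aws"): Decimal("0.60"),
    ("low", "on_premise"): Decimal("1.50"),
}


def monthly_cost(items: Iterable[Tuple[str, str]]) -> Decimal:
    """Sum the monthly rate of each (frequency, location) item."""
    return sum((RATES[item] for item in items), Decimal("0"))


if __name__ == "__main__":
    # Two hourly activities on AWS plus one daily precondition on premise.
    print(monthly_cost([("high", "aws"), ("high", "aws"), ("low", "on_premise")]))  # 3.50
```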