Cloud World Forum: Large Scale Data Analysis on AWS

In this talk from the Cloud World Forum Big Data event in June this year, I discuss the benefits of using the AWS Cloud for large scale computation and data processing workloads.

Published in: Technology

  1. LARGE SCALE DATA ANALYSIS WITH AWS
     Ian Massingham – Technical Evangelist (ianmas@amazon.com, @IanMmmm)
  2. THE MORE DATA YOU COLLECT, THE MORE VALUE YOU CAN DERIVE FROM IT
  3. THE COST OF DATA GENERATION IS FALLING
  4. We are constantly producing more data
  5. From all types of industries
  6. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
  7. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE. Lower cost, higher throughput.
  8. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE. Lower cost, higher throughput. Highly constrained.
  9. + ELASTIC AND HIGHLY SCALABLE + NO UPFRONT CAPITAL EXPENSE + ONLY PAY FOR WHAT YOU USE + AVAILABLE ON-DEMAND = REMOVE CONSTRAINTS
  10. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
  11. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE: AWS Import / Export, AWS Direct Connect
  12. Inbound data transfer is free; multipart upload to S3; physical media; AWS Direct Connect
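Slide 12 mentions multipart upload to S3 as one way to move large objects in. The sketch below only illustrates the planning step, that is, how an object might be cut into parts before uploading; it makes no network calls. The function name `plan_parts` and the 8 MiB default are assumptions for illustration, while the 5 MiB minimum part size and the 10,000-part ceiling are the published S3 limits.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts in one upload


def plan_parts(object_size, part_size=8 * MIB):
    """Return (part_number, offset, length) tuples covering the object.

    Hypothetical helper: it only plans byte ranges, it does not upload.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum")

    # Grow the part size when the requested one would need too many parts.
    smallest_allowed = -(-object_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, smallest_allowed)

    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


if __name__ == "__main__":
    for part in plan_parts(20 * MIB):
        print(part)
```

Each planned range could then be uploaded independently and in parallel, which is what makes the approach attractive for large inbound transfers.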
  13. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE: Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2
  14. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE: Amazon EC2, Amazon Elastic MapReduce
  15. AMAZON EC2: ELASTIC COMPUTE CLOUD
  16. 3 HOURS FOR $4828.85/hr
  17. Instead of $20+ MILLIONS in infrastructure
  18. SIMULATING 205,000 COMPOUNDS
  19. 18 HOURS FOR $1833.33/hr
  20. Instead of $68+ MILLIONS in infrastructure
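The figures on slides 16 to 20 can be turned into total run costs with simple arithmetic (hours multiplied by the hourly rate). The sketch below does that with `Decimal` to avoid floating-point rounding; the labels `run_a` and `run_b` are hypothetical names for the two runs, and the infrastructure figures are treated as lower bounds because the slides state them as "$20+" and "$68+" million.

```python
from decimal import Decimal

# (hours, hourly rate in USD, stated infrastructure alternative in USD)
RUNS = {
    "run_a": (3, Decimal("4828.85"), Decimal("20000000")),   # slides 16-17
    "run_b": (18, Decimal("1833.33"), Decimal("68000000")),  # slides 18-20
}


def total_cost(hours, hourly_rate):
    """Total on-demand cost of a run: hours multiplied by the hourly rate."""
    return hours * hourly_rate


if __name__ == "__main__":
    for name, (hours, rate, infrastructure) in RUNS.items():
        cost = total_cost(hours, rate)
        ratio = infrastructure / cost
        print(f"{name}: ${cost} on demand, "
              f"at least {ratio:.0f}x less than the stated infrastructure cost")
```

The comparison is rough by design: it sets a one-off run against a capital purchase, which is the contrast the slides draw.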
  21. GPU INSTANCES
      G2: 1x NVIDIA Kepler GK104, 8 vCPU (Intel Xeon E5-2670), $0.65/h
      CG1: 2x NVIDIA Fermi M2050, 16 vCPU (Intel Xeon X5570), $2.10/h
  22. ON A SINGLE INSTANCE. COMPUTE TIME: 4h. COST: 4h x $2.10 = $8.40
  23. ON MULTIPLE INSTANCES. COMPUTE TIME: 1h. COST: 1h x 4 x $2.10 = $8.40
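Slides 22 and 23 make the point that spreading a job over more instances shortens the wall-clock time without changing the bill. A minimal sketch of that arithmetic follows, assuming the workload parallelises perfectly (the slides imply this but do not state it); `run_cost` is a hypothetical helper name.

```python
from decimal import Decimal


def run_cost(instance_hours, instances, hourly_rate):
    """Return (wall_clock_hours, total_cost) for a perfectly parallel job.

    instance_hours is the work measured on one instance; it is assumed to
    divide evenly across the requested number of instances.
    """
    wall_clock = Decimal(instance_hours) / Decimal(instances)
    cost = wall_clock * instances * hourly_rate
    return wall_clock, cost


if __name__ == "__main__":
    rate = Decimal("2.10")  # hourly price used on slides 22 and 23
    for count in (1, 4):
        hours, cost = run_cost(4, count, rate)
        print(f"{count} instance(s): {hours} h wall clock, ${cost}")
```

Real jobs rarely scale perfectly, so the equal-cost result is best read as the ideal case the slides describe.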
  24. AMAZON ELASTIC MAPREDUCE: HADOOP AS A SERVICE
  25. • SPLITS DATA INTO PIECES
      • LETS PROCESSING OCCUR
      • GATHERS THE RESULTS
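The three steps on slide 25 (split, process, gather) can be sketched in a few lines of plain Python. This is not Hadoop or the Elastic MapReduce API; it is a single-process word count chosen only to show the shape of the pattern, and the function names are assumptions for illustration.

```python
from collections import Counter
from functools import reduce


def split(records, pieces):
    """Split the input into roughly equal pieces (round-robin)."""
    return [records[i::pieces] for i in range(pieces)]


def process(piece):
    """Process one piece independently: count the words it contains."""
    return Counter(word for line in piece for word in line.lower().split())


def gather(partials):
    """Gather the partial results into a single combined count."""
    return reduce(lambda left, right: left + right, partials, Counter())


def word_count(records, pieces=2):
    return gather(process(piece) for piece in split(records, pieces))


if __name__ == "__main__":
    lines = ["generate store", "store analyze", "analyze analyze share"]
    print(word_count(lines))
```

Because each piece is processed without reference to the others, the middle step is the one a cluster can run on many machines at once.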
  26. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE: Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2
  27. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
  28. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE: BATCH PROCESSING
  29. GENERATE ➔ ➔ SHARE: STREAM PROCESSING
  30. AMAZON KINESIS: REAL-TIME DATA STREAM PROCESSING
  31. Real-time response to content in semi-structured data streams
      Relatively simple computations on data (aggregates, filters, sliding window, etc.)
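Slide 31 names filters, aggregates and sliding windows as typical stream computations. The sketch below combines the three in plain Python over an in-memory sequence of events. It does not use the Amazon Kinesis API; the event shape (a timestamp in seconds paired with a dictionary) and the function name are assumptions made for illustration.

```python
from collections import deque


def sliding_window_counts(events, window_seconds, predicate):
    """Yield (timestamp, matches within the trailing window) per event.

    events: iterable of (timestamp_seconds, payload), ordered by time.
    predicate: filter deciding which payloads are counted.
    """
    window = deque()  # timestamps of matching events still in the window
    for timestamp, payload in events:
        if predicate(payload):
            window.append(timestamp)
        # Evict matches that have aged out of the trailing window.
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        yield timestamp, len(window)


if __name__ == "__main__":
    stream = [
        (0, {"status": 500}),
        (10, {"status": 200}),
        (20, {"status": 500}),
        (70, {"status": 500}),
    ]
    def is_error(payload):
        return payload["status"] >= 500

    for timestamp, errors in sliding_window_counts(stream, 60, is_error):
        print(timestamp, errors)
```

The state kept per stream is only the window itself, which is why computations of this kind suit record-by-record processing.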
  32. Hourly server logs: how your systems went wrong an hour ago
      Weekly / monthly bill: what you spent this past billing cycle
      Daily customer report from your website: tells you what deal or ad to try next time
      Daily fraud reports: tells you if there was fraud yesterday
      Daily business reports: tells me how customers used AWS services yesterday
      Real-time metrics: what just went wrong now
      Real-time spending alerts/caps: guaranteeing you can’t overspend
      Real-time analysis: what to offer the current customer now
      Real-time detection: blocks fraudulent use now
      Fast ETL into Amazon Redshift: how are customers using services now
  33. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
  34. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
      Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2
      Amazon EC2, Amazon Elastic MapReduce
      Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2
      AWS Import / Export, AWS Direct Connect
  35. GENERATE ➔ ➔ SHARE: STREAM PROCESSING
  36. GENERATE ➔ ➔ SHARE: STREAM PROCESSING
      Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2
      Amazon Kinesis, Stream Processing on Amazon EC2
  37. WANT TO KNOW MORE? aws.amazon.com/solutions/case-studies/big-data/
  38. THANK YOU
      Ian Massingham – Technical Evangelist (ianmas@amazon.com, @IanMmmm)
  39. Further References
      Atomic Fiction Case-Study Video: https://www.youtube.com/watch?v=ljHo1_5sWxo
      Slideshare with full details on the Schrodinger Materials Science case-study: http://www.slideshare.net/insideHPC/cycle-computing-recordbreaking-peta-scale-hpc-run
      Real-time Streaming and Querying with Amazon Kinesis and Amazon EMR Video: https://www.youtube.com/watch?v=NIa33ZwFa8E
