Large Scale Data Analysis with AWS

This presentation from the AWS Lab at Cloud Expo Europe 2014 explores large-scale data analysis on AWS. The cost of data generation is falling. Storing, analyzing and sharing data with the tools that AWS offers is a low-cost, easy-to-use way to create value from your data assets.

Published in: Technology

Large Scale Data Analysis with AWS

  1. 1. LARGE SCALE DATA ANALYSIS WITH AWS Carlos Conde – Sr. Mgr. Solutions Architecture carlosco@amazon.com @caarlco
  2. 2. THE MORE DATA YOU COLLECT THE MORE VALUE YOU CAN DERIVE FROM IT
  3. 3. THE COST OF DATA GENERATION IS FALLING
  4. 4. DATA VOLUME: Generated data vs. data available for analysis. Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  5. 5. GENERATE  STORE  ANALYZE  SHARE
  6. 6. Lower cost, higher throughput GENERATE  STORE  ANALYZE  SHARE
  7. 7. Lower cost, higher throughput GENERATE  STORE  ANALYZE  SHARE Highly constrained
  8. 8. + ELASTIC AND HIGHLY SCALABLE + NO UPFRONT CAPITAL EXPENSE + ONLY PAY FOR WHAT YOU USE + AVAILABLE ON-DEMAND = REMOVE CONSTRAINTS
  9. 9. GENERATE  STORE  ANALYZE  SHARE
  10. 10. AWS Import/Export AWS Direct Connect GENERATE  STORE  ANALYZE  SHARE
  11. 11. Inbound data transfer is free; Multipart upload to S3; Physical media; AWS Direct Connect
  12. 12. Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2 GENERATE  STORE  ANALYZE  SHARE
  13. 13. AMAZON S3 SIMPLE STORAGE SERVICE
  14. 14. CASE STUDY: SPOTIFY ADDS 20,000 TRACKS/DAY TO ITS CATALOGUE
  15. 15. AMAZON DYNAMODB HIGH-PERFORMANCE, FULLY MANAGED NoSQL DATABASE SERVICE
  16. 16. DURABLE & AVAILABLE CONSISTENT, DISK-ONLY WRITES (SSD)
  17. 17. LOW LATENCY AVERAGE READS < 5MS, WRITES < 10MS
  18. 18. NO ADMINISTRATION
  19. 19. CASE STUDY: SHAZAM SUPPORTED 500,000 WRITES/SEC DURING SUPER BOWL
  20. 20. AMAZON REDSHIFT FULLY MANAGED, PETABYTE-SCALE DATA WAREHOUSE ON AWS
  21. 21. DESIGN OBJECTIVES: A petabyte-scale data warehouse service that was… A Lot Faster AMAZON REDSHIFT A Lot Cheaper A Whole Lot Simpler
  22. 22. 30 MINUTES DOWN TO 12 SECONDS
  23. 23. AMAZON REDSHIFT LETS YOU START SMALL AND GROW BIG Extra Large Node (HS1.XL): Single Node (2 TB), Cluster 2-32 Nodes (4 TB – 64 TB). Eight Extra Large Node (HS1.8XL): Cluster 2-100 Nodes (32 TB – 1.6 PB)
  24. 24. CREATE A DATA WAREHOUSE IN MINUTES
  25. 25. JDBC/ODBC
  26. 26. ON-DEMAND PRICING
  27. 27. PRICE PER TB / YEAR
  28. 28. GENERATE  STORE  ANALYZE  SHARE Amazon EC2 Amazon Elastic MapReduce
  29. 29. AMAZON EC2 ELASTIC COMPUTE CLOUD
  30. 30. 3 HOURS FOR $4828.85/hr
  31. 31. Instead of $20+ MILLION in infrastructure
  32. 32. GPU INSTANCES G2: 1x NVIDIA Kepler GK104, 8 vCPU (Intel Xeon E5-2670), $0.65/h. CG1: 2x NVIDIA Fermi M2050, 16 vCPU (Intel Xeon X5570), $2.10/h
  33. 33. ON A SINGLE INSTANCE COMPUTE TIME: 4h COST: 4h x $2.1 = $8.4
  34. 34. ON MULTIPLE INSTANCES COMPUTE TIME: 1h COST: 1h x 4 x $2.1 = $8.4
  35. 35. AMAZON ELASTIC MAPREDUCE HADOOP AS A SERVICE
  36. 36. • SPLITS DATA INTO PIECES • LETS PROCESSING OCCUR • GATHERS THE RESULTS
  37. 37. CASE STUDY: "WITH AMAZON EMR WE CAN ANALYZE 100% OF THE DATA, NOT JUST A SAMPLE" - Sanjeevan Bala, Head of Data Planning & Analytics, Channel 4
  38. 38. Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2 GENERATE  STORE  ANALYZE  SHARE
  39. 39. PUBLIC DATA SETS http://aws.amazon.com/publicdatasets
  40. 40. GENERATE  STORE  ANALYZE  SHARE
  41. 41. GENERATE  STORE  ANALYZE  SHARE BATCH PROCESSING
  42. 42. STREAM GENERATE  PROCESSING  SHARE
  43. 43. AMAZON KINESIS REAL-TIME DATA STREAM PROCESSING
  44. 44. Real-time response to content in semi-structured data streams. Relatively simple computations on data (aggregates, filters, sliding window, etc.)
  45. 45. Hourly server logs: how your systems went wrong an hour ago vs. Real-time metrics: what just went wrong now
      Weekly / Monthly Bill: What you spent this past billing cycle vs. Real-time spending alerts/caps: guaranteeing you can’t overspend
      Daily customer report from your website: tells you what deal or ad to try next time vs. Real-time analysis: what to offer the current customer now
      Daily fraud reports: tells you if there was fraud yesterday vs. Real-time detection: blocks fraudulent use now
      Daily business reports: tells me how customers used AWS services yesterday vs. Fast ETL into Amazon Redshift: how are customers using services now
  46. 46. GENERATE  STORE  ANALYZE  SHARE
  47. 47. AWS Import/Export AWS Direct Connect Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2 Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2 GENERATE  STORE  ANALYZE  SHARE Amazon EC2 Amazon Elastic MapReduce
  48. 48. STREAM GENERATE  PROCESSING  SHARE
  49. 49. Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2 STREAM GENERATE  PROCESSING  SHARE Amazon Kinesis Stream Processing on Amazon EC2
  50. 50. FROM DATA TO ACTIONABLE INFORMATION
  51. 51. THANK YOU Carlos Conde – Sr. Mgr. Solutions Architecture carlosco@amazon.com @caarlco
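
Slides 33 and 34 make the point that, with hourly on-demand pricing, running a job on four instances for one hour costs the same as running it on one instance for four hours. The sketch below reproduces that arithmetic using the $2.10/h CG1 price from slide 32. The function name `run_cost`, the assumption that the job parallelizes perfectly, and the whole-hour billing rule are illustrative assumptions, not something the slides state.

```python
from decimal import Decimal
import math

# CG1 on-demand price per instance-hour, as quoted on slide 32.
CG1_HOURLY_RATE = Decimal("2.10")


def run_cost(total_work_hours, instances, hourly_rate=CG1_HOURLY_RATE):
    """Return (wall_clock_hours, cost) for a job spread over `instances`.

    Assumes the work divides evenly across instances (perfect parallelism)
    and that each instance is billed in whole hours.
    """
    wall_clock = Decimal(total_work_hours) / instances
    billed_hours = math.ceil(wall_clock)  # round partial hours up
    return wall_clock, hourly_rate * instances * billed_hours


for n in (1, 4):
    hours, cost = run_cost(4, n)
    print(f"{n} instance(s): {hours}h wall clock, ${cost}")
```

Under these assumptions the cost stays at $8.40 while the wall-clock time drops from 4h to 1h; a job that does not divide evenly would pay for the rounded-up partial hours.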
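
Slide 36 summarizes the MapReduce pattern behind Amazon Elastic MapReduce in three steps: split the data into pieces, let processing occur on each piece, and gather the results. The following is a minimal single-process sketch of that pattern as a word count. It does not use Hadoop or any EMR API; the function names `split`, `process` and `gather` and the sample input are hypothetical and chosen only to mirror the slide's wording.

```python
from collections import Counter
from functools import reduce


def split(records, pieces):
    """Split the input into `pieces` roughly equal, independent chunks."""
    return [records[i::pieces] for i in range(pieces)]


def process(piece):
    """Count words in one chunk; each chunk can be handled independently."""
    counts = Counter()
    for line in piece:
        counts.update(line.lower().split())
    return counts


def gather(partials):
    """Merge the per-chunk counts into one overall result."""
    return reduce(lambda a, b: a + b, partials, Counter())


lines = ["generate store analyze share", "store analyze", "analyze"]
totals = gather(process(piece) for piece in split(lines, 2))
print(totals.most_common())
```

Because each call to `process` touches only its own chunk, the chunks could be handed to separate workers; the totals do not depend on how many pieces the input is split into.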
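
Slide 44 describes stream processing as relatively simple computations on data, such as aggregates, filters and sliding windows. The sketch below combines a filter with a sliding-window count over an in-memory list of events. It does not call Amazon Kinesis; the event format `(timestamp, payload)`, the function name `sliding_window_counts` and the sample data are assumptions made for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import deque


def sliding_window_counts(events, window_seconds, predicate=lambda payload: True):
    """Yield (timestamp, count) for each matching event.

    `count` is the number of matching events seen in the trailing
    `window_seconds`, including the current one. Events must be ordered
    by timestamp.
    """
    window = deque()
    for timestamp, payload in events:
        if not predicate(payload):
            continue  # filter step: ignore events we do not care about
        window.append(timestamp)
        # Evict timestamps that have fallen out of the trailing window.
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        yield timestamp, len(window)


events = [(0, "ok"), (1, "error"), (2, "error"), (5, "ok"), (12, "error")]
error_counts = list(
    sliding_window_counts(events, window_seconds=10, predicate=lambda p: p == "error")
)
print(error_counts)
```

A consumer could compare each count against a threshold to raise the kind of real-time alert listed on slide 45, instead of waiting for an hourly log report.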
