Big Data Analytics with Amazon Web Services

7,925 views

Published on

Slides from the online seminar, August 1st 2012.

Published in: Technology

Big Data Analytics with Amazon Web Services

  1. 1. Big Data Analyticswith Amazon Web Services Dr. Matt WoodAn Online Seminar for Partners. Wednesday 1st August.
  2. 2. Hello, and thank you.
  3. 3. Big Data Analytics An introduction
  4. 4. Big Data Analytics An introduction The story of analytics on AWS
  5. 5. Big Data Analytics An introduction The story of analytics on AWS Integrating partners
  6. 6. Big Data Analytics An introduction The story of analytics on AWS Integrating partners Partner success stories
  7. 7. 1INTRODUCING BIG DATA
  8. 8. Data for competitive advantage.
  9. 9. Using data Customer segmentation, financial modeling, system analysis, line-of-sight, business intelligence.
  10. 10. Generation Collection & storageAnalytics & computationCollaboration & sharing
  11. 11. Cost of data generation is falling.
  12. 12. lower cost,increased throughput Generation Collection & storage Analytics & computation Collaboration & sharing
  13. 13. Generation HIGHLY CONSTRAINED Collection & storageAnalytics & computationCollaboration & sharing
  14. 14. Very high barrier to turning data into information.
  15. 15. Move from adata generation challenge to analytics challenge.
  16. 16. Enter the Cloud.
  17. 17. Remove the constraints.
  18. 18. Enable data-driven innovation.
  19. 19. Move to a distributed data approach.
  20. 20. Maturation of two things.
  21. 21. Software for distributed storage and analysisMaturation of two things.
  22. 22. Software for distributed storage and analysisMaturation of two things. Infrastructure for distributed storage and analysis
  23. 23. Software Frameworks for data-intensive workloads. Distributed by design.
  24. 24. Infrastructure Platform for data-intensive workloads. Distributed by design.
  25. 25. Support thedata timeline.
  26. 26. Generation HIGHLY CONSTRAINED Collection & storageAnalytics & computationCollaboration & sharing
  27. 27. Generation Collection & storageAnalytics & computationCollaboration & sharing
  28. 28. Lower thebarrier to entry.
  29. 29. Accelerate time to market and increase agility.
  30. 30. Enable new business opportunities.
  31. 31. Washington Post Pinterest NASA
  32. 32. “AWS enables Pfizer to exploredifficult or deep scientificquestions in a timely, scalablemanner and helps us make betterdecisions more quickly”Michael Miller, Pfizer
  33. 33. 2THE STORY OF ANALYTICS
  34. 34. EC2Utility computing. 6 years young.
  35. 35. Scale out systems Embarrassingly parallel problems. Queue based distribution. Small, medium and high scale.
  36. 36. Cost optimization. EC2Utility computing. 6 years young.
  37. 37. Achieving economies of scale100% Time
  38. 38. Achieving economies of scale100% Reserved capacity Time
  39. 39. Achieving economies of scale100% On-demand Reserved capacity Time
  40. 40. Achieving economies of scale UNUSED CAPACITY100% On-demand Reserved capacity Time
  41. 41. Spot Instances Bid on unused EC2 capacity. Very large discount. Perfect for batch runs. Balance cost and scale.
  42. 42. $650 per hour
  43. 43. Map/reduce Pattern for distributed computing. Software frameworks such as Hadoop. Write two functions. Scale up.
  44. 44. Map/reduce Pattern for distributed computing. Software frameworks such as Hadoop. Write two functions. Scale up. Complex cluster configuration and management.
  45. 45. Amazon Elastic MapReduce Managed Hadoop clusters. Easy to provision and monitor. Write two functions. Scale up. Optimized for S3 access.
  46. 46. S3Input data UNDER THE i i HOOD
  47. 47. S3 Input dataCode Elastic MapReduce UNDER THE i i HOOD
  48. 48. S3 Input dataCode Elastic Name MapReduce node UNDER THE i i HOOD
  49. 49. S3 Input dataCode Elastic Name MapReduce node Elastic cluster UNDER THE i i HOOD
  50. 50. S3 Input dataCode Elastic Name MapReduce node HDFS Elastic cluster UNDER THE i i HOOD
  51. 51. S3 Input dataCode Elastic Name MapReduce node Queries HDFS + BI Via JDBC, Pig, Hive Elastic cluster UNDER THE i i HOOD
  52. 52. S3 Input dataCode Elastic Name Output MapReduce node S3 + SimpleDB Queries HDFS + BI Via JDBC, Pig, Hive Elastic cluster UNDER THE i i HOOD
  53. 53. S3Input data Output S3 + SimpleDB UNDER THE i i HOOD
  54. 54. Performance
  55. 55. Performance Compute performance
  56. 56. UNDER THE i iCluster Compute HOOD Intel Xeon E5-2670 10 gig E non-blocking network 60.5 Gb Placement groupings
  57. 57. UNDER THE i iCluster Compute HOOD Intel Xeon E5-2670 10 gig E non-blocking network 60.5 Gb Placement groupings + GPU enabled instances
  58. 58. Performance Compute performance
  59. 59. IO performancePerformance Compute performance
  60. 60. NoSQLUnstructured data storage.
  61. 61. DynamoDB Predictable, consistent performance Unlimited storage Single digit millisecond latencies No schema for unstructured data Backed on solid state drives
  62. 62. ...and SSDs for all. New Hi1 storage instances.
  63. 63. UNDER THE i ihi1.4xlarge HOOD 2 x 1Tb SSDs 10 GigE network HVM: 90k IOPS read, 9k to 75k write PV: 120k IOPS read, 10k to 85k write
  64. 64. “The hi1.4xlarge configuration isabout half the system cost for thesame throughput.”Netflixhttp://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
  65. 65. EBSElastic Block Store
  66. 66. Provisioned IOPS Provision required IO performance
  67. 67. Provisioned IOPS Provision required IO performance + EBS-optimized instances with dedicated throughput
  68. 68. Generation Collection & storageAnalytics & computationCollaboration & sharing
  69. 69. Performance + ease of use
  70. 70. 3PARTNER INTEGRATION
  71. 71. Extend platform with partners
  72. 72. Innovate on behalf of customers
  73. 73. Remove undifferentiated heavy lifting
  74. 74. MapR distribution for EMR Rolled the Amazon Hadoop optimizations into MapR Choice for EMR customers Easy deployment for MapR customers
  75. 75. MapR distribution for EMR Hadoop distribution Integrated into EMR NFS and ODBC drivers High availability and cluster mirroring
  76. 76. Informatica on EMR Enterprise data toolchain “Swiss army knife” for data formats Data integration Available to all on EMR
  77. 77. AWS MarketplaceKarmasphere, Marketshare, Acunu Cassandra, Metamarkets, Aspera and more. aws.amazon.com/marketplace
  78. 78. 4PARTNER SUCCESS STORIES
  79. 79. Razorfish
  80. 80. 3.5 billion records71MM unique cookies 1.7MM targeted ads per day
  81. 81. 3.5 billion records 71MM unique cookies 1.7MM targeted ads per day500% improvement in return on ad spend.
  82. 82. Cycle Computing + Schrodinger
  83. 83. 30k cores, $4200 an hour (compared to $10+ million)
  84. 84. Marketshare+ TicketmasterOptimize live event pricing
  85. 85. Reduced developer infrastructuremanagement time by 3 hours a day
  86. 86. Thank you!
  87. 87. Q&Amatthew@amazon.com @mza on Twitter

×