Big Data Analytics with AWS and AWS Marketplace Webinar

https://aws.amazon.com/marketplace/ref=tsm_slideshare_bigdata

  1. Big Data Analytics with Amazon Web Services. Dr. Matt Wood. An Online Seminar. Tuesday 16th October.
  2. Hello, and thank you.
  3. Big Data Analytics: An introduction
  4. Big Data Analytics: An introduction; The story of analytics on AWS
  5. Big Data Analytics: An introduction; The story of analytics on AWS; AWS Marketplace
  6. Big Data Analytics: An introduction; The story of analytics on AWS; AWS Marketplace; Success story: Brightcove
  7. 1 INTRODUCING BIG DATA
  8. Data for competitive advantage.
  9. Using data: customer segmentation, financial modeling, system analysis, line-of-sight, business intelligence.
  10. Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  11. Cost of data generation is falling.
  12. Lower cost, increased throughput: Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  13. Generation → HIGHLY CONSTRAINED: Collection & storage → Analytics & computation → Collaboration & sharing
  14. Very high barrier to turning data into information.
  15. Move from a data generation challenge to an analytics challenge.
  16. Enter the Cloud.
  17. Remove the constraints.
  18. Enable data-driven innovation.
  19. Move to a distributed data approach.
  20. Maturation of two things.
  21. Maturation of two things. Software for distributed storage and analysis.
  22. Maturation of two things. Software for distributed storage and analysis. Infrastructure for distributed storage and analysis.
  23. Software: frameworks for data-intensive workloads. Distributed by design.
  24. Infrastructure: platform for data-intensive workloads. Distributed by design.
  25. Support the data timeline.
  26. Generation → HIGHLY CONSTRAINED: Collection & storage → Analytics & computation → Collaboration & sharing
  27. Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  28. Lower the barrier to entry.
  29. Accelerate time to market and increase agility.
  30. Enable new business opportunities.
  31. Washington Post, Pinterest, NASA
  32. “AWS enables Pfizer to explore difficult or deep scientific questions in a timely, scalable manner and helps us make better decisions more quickly.” Michael Miller, Pfizer
  33. 2 THE STORY OF ANALYTICS
  34. EC2: Utility computing. 6 years young.
  35. Scale-out systems: embarrassingly parallel problems. Queue-based distribution. Small, medium and high scale.
  36. Cost optimization. EC2: Utility computing. 6 years young.
  37. Achieving economies of scale [chart: capacity, up to 100%, over time]
  38. Achieving economies of scale [chart: reserved capacity, up to 100%, over time]
  39. Achieving economies of scale [chart: on-demand and reserved capacity, up to 100%, over time]
  40. Achieving economies of scale [chart: unused capacity, on-demand and reserved capacity, up to 100%, over time]
  41. Spot Instances: bid on unused EC2 capacity. Very large discount. Perfect for batch runs. Balance cost and scale.
  42. <$1000 per hour
  43. Map/reduce: pattern for distributed computing. Software frameworks such as Hadoop. Write two functions. Scale up.
  44. Map/reduce: pattern for distributed computing. Software frameworks such as Hadoop. Write two functions. Scale up. Complex cluster configuration and management.
  45. Amazon Elastic MapReduce: managed Hadoop clusters. Easy to provision and monitor. Write two functions. Scale up. Optimized for S3 access.
  46. [Diagram] S3: input data
  47. [Diagram] S3: input data, code → Elastic MapReduce
  48. [Diagram] S3: input data, code → Elastic MapReduce → name node
  49. [Diagram] S3: input data, code → Elastic MapReduce → name node → elastic cluster
  50. [Diagram] S3: input data, code → Elastic MapReduce → name node → elastic cluster (HDFS)
  51. [Diagram] S3: input data, code → Elastic MapReduce → name node → elastic cluster (HDFS); queries + BI via JDBC, Pig, Hive
  52. [Diagram] S3: input data, code → Elastic MapReduce → name node → elastic cluster (HDFS); queries + BI via JDBC, Pig, Hive; output: S3 + SimpleDB
  53. [Diagram] S3: input data; output: S3 + SimpleDB
  54. Performance
  55. Performance: compute performance
  56. Cluster Compute: Intel Xeon E5-2670, 10 GigE non-blocking network, 60.5 GB memory, placement groups
  57. Cluster Compute: Intel Xeon E5-2670, 10 GigE non-blocking network, 60.5 GB memory, placement groups, + GPU-enabled instances
  58. Performance: compute performance
  59. Performance: compute performance, IO performance
  60. NoSQL: unstructured data storage.
  61. DynamoDB: predictable, consistent performance. Unlimited storage. Single-digit millisecond latencies. No schema for unstructured data. Backed by solid state drives.
  62. ...and SSDs for all. New Hi1 storage instances.
  63. hi1.4xlarge: 2 x 1 TB SSDs, 10 GigE network. HVM: 90k IOPS read, 9k to 75k write. PV: 120k IOPS read, 10k to 85k write.
  64. “The hi1.4xlarge configuration is about half the system cost for the same throughput.” Netflix, http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
  65. Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  66. Performance + ease of use
  67. 3 AWS MARKETPLACE
  68. Extend platform with partners
  69. Innovate on behalf of customers
  70. Remove undifferentiated heavy lifting
  71. AWS Marketplace: aws.amazon.com/marketplace
  72. Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  73. Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  74. Collection & storage: Acunu Reflex (Apache Cassandra NoSQL database); MongoDB (with and without EBS RAID storage); Couchbase (Community and Enterprise editions); ScaleArc (MySQL load balancing)
  75. Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  76. Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  77. Analytics & computation: KarmaSphere (analytics for Amazon Elastic MapReduce); MapR (M5 Hadoop distribution); Metamarkets (event-based data processing)
  78. Analytics & computation: StackIQ Rocks+ (HPC clusters with MPI, Grid Engine); Univa Grid Engine (one-click cluster deployment); Quantivo (data association analytics)
  79. Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  80. Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  81. Collaboration & sharing: Aspera Faspex (20 Mbps data transfer)
  82. 4 SUCCESS STORY
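
Slides 43 to 45 sum up map/reduce as "write two functions, scale up." A minimal single-process sketch in Python shows what those two functions might look like for a word count. The names `map_words`, `reduce_counts` and `run_map_reduce` are hypothetical, and the in-memory grouping step only stands in for the shuffle that a framework such as Hadoop or Elastic MapReduce performs across a cluster.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map function: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce function: sum every count emitted for one key."""
    return word, sum(counts)


def run_map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical local driver: map, group by key (shuffle), then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Shuffle stand-in: collect all mapper output values under their key.
    for key, value in chain.from_iterable(map_words(line) for line in lines):
        groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_map_reduce(["the cat sat", "the dog sat"]))
```

Only `map_words` and `reduce_counts` carry the problem-specific logic; on a real cluster the driver and shuffle are the framework's job, which is the point of the "two functions" slide.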
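
Slide 35 mentions queue-based distribution for embarrassingly parallel problems. A rough local sketch of that pattern follows, assuming Python threads stand in for EC2 instances and the standard-library `queue.Queue` stands in for a hosted queue service. `process_all` and its `work` callback are hypothetical names, not an AWS API, and because of the GIL this shows the distribution pattern rather than real CPU parallelism.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple


def process_all(tasks: List[int], work: Callable[[int], int],
                num_workers: int = 4) -> List[Optional[int]]:
    """Hand independent tasks to a pool of workers through a shared queue."""
    task_queue: "queue.Queue[Optional[Tuple[int, int]]]" = queue.Queue()
    # Each slot is written by exactly one worker, so results keep task order.
    results: List[Optional[int]] = [None] * len(tasks)

    def worker() -> None:
        while True:
            item = task_queue.get()
            if item is None:  # sentinel: no more work for this worker
                return
            index, payload = item
            results[index] = work(payload)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in workers:
        thread.start()
    for index, payload in enumerate(tasks):
        task_queue.put((index, payload))
    for _ in workers:
        task_queue.put(None)  # one sentinel per worker so every thread exits
    for thread in workers:
        thread.join()
    return results


if __name__ == "__main__":
    print(process_all([1, 2, 3, 4, 5], lambda x: x * x, num_workers=2))
```

Because no task depends on another, adding workers only changes throughput, not the result; that independence is what makes such problems a fit for scaling out on EC2 or Spot capacity.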
