Big Data Analytics with Amazon Web Services

Slides from the online seminar, August 1st 2012.

Published in: Technology

Transcript

  • 1. Big Data Analytics with Amazon Web Services. Dr. Matt Wood. An Online Seminar for Partners. Wednesday 1st August.
  • 2. Hello, and thank you.
  • 3. Big Data Analytics An introduction
  • 4. Big Data Analytics An introduction The story of analytics on AWS
  • 5. Big Data Analytics An introduction The story of analytics on AWS Integrating partners
  • 6. Big Data Analytics An introduction The story of analytics on AWS Integrating partners Partner success stories
  • 7. 1 INTRODUCING BIG DATA
  • 8. Data for competitive advantage.
  • 9. Using data: customer segmentation, financial modeling, system analysis, line-of-sight, business intelligence.
  • 10. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  • 11. Cost of data generation is falling.
  • 12. Lower cost, increased throughput. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  • 13. Generation, HIGHLY CONSTRAINED: Collection & storage, Analytics & computation, Collaboration & sharing
  • 14. Very high barrier to turning data into information.
  • 15. Move from a data generation challenge to an analytics challenge.
  • 16. Enter the Cloud.
  • 17. Remove the constraints.
  • 18. Enable data-driven innovation.
  • 19. Move to a distributed data approach.
  • 20. Maturation of two things.
  • 21. Maturation of two things. Software for distributed storage and analysis.
  • 22. Maturation of two things. Software for distributed storage and analysis. Infrastructure for distributed storage and analysis.
  • 23. Software: frameworks for data-intensive workloads. Distributed by design.
  • 24. Infrastructure: platform for data-intensive workloads. Distributed by design.
  • 25. Support the data timeline.
  • 26. Generation, HIGHLY CONSTRAINED: Collection & storage, Analytics & computation, Collaboration & sharing
  • 27. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  • 28. Lower the barrier to entry.
  • 29. Accelerate time to market and increase agility.
  • 30. Enable new business opportunities.
  • 31. Washington Post Pinterest NASA
  • 32. “AWS enables Pfizer to explore difficult or deep scientific questions in a timely, scalable manner and helps us make better decisions more quickly.” Michael Miller, Pfizer
  • 33. 2 THE STORY OF ANALYTICS
  • 34. EC2. Utility computing. 6 years young.
  • 35. Scale out systems. Embarrassingly parallel problems. Queue based distribution. Small, medium and high scale.
  • 36. Cost optimization. EC2. Utility computing. 6 years young.
  • 37. Achieving economies of scale. Chart axes: 100%, Time.
  • 38. Achieving economies of scale. Chart axes: 100%, Time. Layers: Reserved capacity.
  • 39. Achieving economies of scale. Chart axes: 100%, Time. Layers: Reserved capacity, On-demand.
  • 40. Achieving economies of scale. Chart axes: 100%, Time. Layers: Reserved capacity, On-demand. Callout: UNUSED CAPACITY.
  • 41. Spot Instances. Bid on unused EC2 capacity. Very large discount. Perfect for batch runs. Balance cost and scale.
  • 42. $650 per hour
  • 43. Map/reduce. Pattern for distributed computing. Software frameworks such as Hadoop. Write two functions. Scale up.
  • 44. Map/reduce. Pattern for distributed computing. Software frameworks such as Hadoop. Write two functions. Scale up. Complex cluster configuration and management.
  • 45. Amazon Elastic MapReduce. Managed Hadoop clusters. Easy to provision and monitor. Write two functions. Scale up. Optimized for S3 access.
  • 46. Under the hood: S3 (input data).
  • 47. Under the hood: S3 (input data), code, Elastic MapReduce.
  • 48. Under the hood: S3 (input data), code, Elastic MapReduce, name node.
  • 49. Under the hood: S3 (input data), code, Elastic MapReduce, name node, elastic cluster.
  • 50. Under the hood: S3 (input data), code, Elastic MapReduce, name node, elastic cluster, HDFS.
  • 51. Under the hood: S3 (input data), code, Elastic MapReduce, name node, elastic cluster, HDFS, queries + BI (via JDBC, Pig, Hive).
  • 52. Under the hood: S3 (input data), code, Elastic MapReduce, name node, elastic cluster, HDFS, queries + BI (via JDBC, Pig, Hive), output (S3 + SimpleDB).
  • 53. Under the hood: S3 (input data), output (S3 + SimpleDB).
  • 54. Performance
  • 55. Performance Compute performance
  • 56. Under the hood: Cluster Compute. Intel Xeon E5-2670. 10 GigE non-blocking network. 60.5 GB memory. Placement groupings.
  • 57. Under the hood: Cluster Compute. Intel Xeon E5-2670. 10 GigE non-blocking network. 60.5 GB memory. Placement groupings. + GPU enabled instances.
  • 58. Performance Compute performance
  • 59. Performance. Compute performance. IO performance.
  • 60. NoSQL. Unstructured data storage.
  • 61. DynamoDB. Predictable, consistent performance. Unlimited storage. Single digit millisecond latencies. No schema for unstructured data. Backed by solid state drives.
  • 62. ...and SSDs for all. New Hi1 storage instances.
  • 63. Under the hood: hi1.4xlarge. 2 x 1 TB SSDs. 10 GigE network. HVM: 90k IOPS read, 9k to 75k write. PV: 120k IOPS read, 10k to 85k write.
  • 64. “The hi1.4xlarge configuration is about half the system cost for the same throughput.” Netflix. http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
  • 65. EBS. Elastic Block Store.
  • 66. Provisioned IOPS Provision required IO performance
  • 67. Provisioned IOPS Provision required IO performance + EBS-optimized instances with dedicated throughput
  • 68. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  • 69. Performance + ease of use
  • 70. 3 PARTNER INTEGRATION
  • 71. Extend platform with partners
  • 72. Innovate on behalf of customers
  • 73. Remove undifferentiated heavy lifting
  • 74. MapR distribution for EMR. Rolled the Amazon Hadoop optimizations into MapR. Choice for EMR customers. Easy deployment for MapR customers.
  • 75. MapR distribution for EMR. Hadoop distribution. Integrated into EMR. NFS and ODBC drivers. High availability and cluster mirroring.
  • 76. Informatica on EMR. Enterprise data toolchain. “Swiss army knife” for data formats. Data integration. Available to all on EMR.
  • 77. AWS Marketplace. Karmasphere, Marketshare, Acunu Cassandra, Metamarkets, Aspera and more. aws.amazon.com/marketplace
  • 78. 4 PARTNER SUCCESS STORIES
  • 79. Razorfish
  • 80. 3.5 billion records. 71MM unique cookies. 1.7MM targeted ads per day.
  • 81. 3.5 billion records. 71MM unique cookies. 1.7MM targeted ads per day. 500% improvement in return on ad spend.
  • 82. Cycle Computing + Schrodinger
  • 83. 30k cores, $4200 an hour (compared to $10+ million)
  • 84. Marketshare + Ticketmaster. Optimize live event pricing.
  • 85. Reduced developer infrastructure management time by 3 hours a day.
  • 86. Thank you!
  • 87. Q&A. matthew@amazon.com, @mza on Twitter.
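
Slides 43 to 45 describe map/reduce as writing two functions and letting a framework scale them out. The following is a minimal sketch of that idea in Python, assuming a word-count job that is not taken from the talk. The names `map_fn`, `reduce_fn`, and `run_local` are hypothetical, and `run_local` is only a single-process stand-in for the grouping step that a framework such as Hadoop would perform across a cluster. It does not use the Hadoop or Elastic MapReduce APIs.

```python
from collections import defaultdict


def map_fn(line):
    """Map step (hypothetical example): emit a (word, 1) pair per word."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce step (hypothetical example): sum all counts for one word."""
    return word, sum(counts)


def run_local(lines):
    """Single-process stand-in for the framework: map, group by key, reduce.

    A real cluster would spread the map calls over many nodes and shuffle
    the intermediate pairs so that each key reaches exactly one reducer.
    """
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data on AWS", "big clusters for big data"]
    print(run_local(sample))
```

Only `map_fn` and `reduce_fn` carry the job's logic. Everything in `run_local` is the part the slides describe handing over to a managed cluster.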
