Big Data Analytics with Amazon Web Services. Dr. Matt Wood. An Online Seminar. Tuesday 16th October.
Transcript of "Big Data Analytics with Amazon Web Services"

  1. Big Data Analytics with Amazon Web Services. Dr. Matt Wood. An Online Seminar. Tuesday 16th October.
  2. Hello, and thank you.
  3. Big Data Analytics: An introduction.
  4. Big Data Analytics: An introduction. The story of analytics on AWS.
  5. Big Data Analytics: An introduction. The story of analytics on AWS. AWS Marketplace.
  6. Big Data Analytics: An introduction. The story of analytics on AWS. AWS Marketplace. Success story: Brightcove.
  7. Part 1: INTRODUCING BIG DATA
  8. Data for competitive advantage.
  9. Using data: Customer segmentation, financial modeling, system analysis, line-of-sight, business intelligence.
  10. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  11. Cost of data generation is falling.
  12. Lower cost, increased throughput: Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  13. Generation. HIGHLY CONSTRAINED: Collection & storage, Analytics & computation, Collaboration & sharing
  14. Very high barrier to turning data into information.
  15. Move from a data generation challenge to an analytics challenge.
  16. Enter the Cloud.
  17. Remove the constraints.
  18. Enable data-driven innovation.
  19. Move to a distributed data approach.
  20. Maturation of two things.
  21. Maturation of two things: Software for distributed storage and analysis.
  22. Maturation of two things: Software for distributed storage and analysis. Infrastructure for distributed storage and analysis.
  23. Software: Frameworks for data-intensive workloads. Distributed by design.
  24. Infrastructure: Platform for data-intensive workloads. Distributed by design.
  25. Support the data timeline.
  26. Generation. HIGHLY CONSTRAINED: Collection & storage, Analytics & computation, Collaboration & sharing
  27. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  28. Lower the barrier to entry.
  29. Accelerate time to market and increase agility.
  30. Enable new business opportunities.
  31. Washington Post, Pinterest, NASA
  32. “AWS enables Pfizer to explore difficult or deep scientific questions in a timely, scalable manner and helps us make better decisions more quickly.” Michael Miller, Pfizer
  33. Part 2: THE STORY OF ANALYTICS
  34. EC2: Utility computing. 6 years young.
  35. Scale out systems: Embarrassingly parallel problems. Queue based distribution. Small, medium and high scale.
  36. Cost optimization. EC2: Utility computing. 6 years young.
  37. Achieving economies of scale (chart: 100% against Time)
  38. Achieving economies of scale (chart: Reserved capacity; 100% against Time)
  39. Achieving economies of scale (chart: On-demand, Reserved capacity; 100% against Time)
  40. Achieving economies of scale (chart: UNUSED CAPACITY, On-demand, Reserved capacity; 100% against Time)
  41. Spot Instances: Bid on unused EC2 capacity. Very large discount. Perfect for batch runs. Balance cost and scale.
  42. <$1000 per hour
  43. Map/reduce: Pattern for distributed computing. Software frameworks such as Hadoop. Write two functions. Scale up.
  44. Map/reduce: Pattern for distributed computing. Software frameworks such as Hadoop. Write two functions. Scale up. Complex cluster configuration and management.
  45. Amazon Elastic MapReduce: Managed Hadoop clusters. Easy to provision and monitor. Write two functions. Scale up. Optimized for S3 access.
  46. Diagram: S3 (Input data)
  47. Diagram: S3 (Input data, Code); Elastic MapReduce
  48. Diagram: S3 (Input data, Code); Elastic MapReduce (Name node)
  49. Diagram: S3 (Input data, Code); Elastic MapReduce (Name node, Elastic cluster)
  50. Diagram: S3 (Input data, Code); Elastic MapReduce (Name node, Elastic cluster, HDFS)
  51. Diagram: S3 (Input data, Code); Elastic MapReduce (Name node, Elastic cluster, HDFS); Queries + BI via JDBC, Pig, Hive
  52. Diagram: S3 (Input data, Code); Elastic MapReduce (Name node, Elastic cluster, HDFS); Queries + BI via JDBC, Pig, Hive; Output: S3 + SimpleDB
  53. Diagram: S3 (Input data); Output: S3 + SimpleDB
  54. Performance
  55. Performance: Compute performance.
  56. Cluster Compute: Intel Xeon E5-2670. 10 GigE non-blocking network. 60.5 GB. Placement groupings.
  57. Cluster Compute: Intel Xeon E5-2670. 10 GigE non-blocking network. 60.5 GB. Placement groupings. + GPU enabled instances.
  58. Performance: Compute performance.
  59. Performance: Compute performance. IO performance.
  60. NoSQL: Unstructured data storage.
  61. DynamoDB: Predictable, consistent performance. Unlimited storage. Single digit millisecond latencies. No schema for unstructured data. Backed on solid state drives.
  62. ...and SSDs for all. New Hi1 storage instances.
  63. hi1.4xlarge: 2 x 1 TB SSDs. 10 GigE network. HVM: 90k IOPS read, 9k to 75k write. PV: 120k IOPS read, 10k to 85k write.
  64. “The hi1.4xlarge configuration is about half the system cost for the same throughput.” Netflix, http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
  65. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  66. Performance + ease of use
  67. Part 3: AWS MARKETPLACE
  68. Extend platform with partners
  69. Innovate on behalf of customers
  70. Remove undifferentiated heavy lifting
  71. AWS Marketplace: aws.amazon.com/marketplace
  72. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  73. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  74. Collection & storage: Acunu Reflex (Apache Cassandra NoSQL database). MongoDB (with and without EBS RAID storage). Couchbase (Community and Enterprise editions). ScaleArc (MySQL load balancing).
  75. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  76. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  77. Analytics & computation: KarmaSphere Analytics (for Amazon Elastic MapReduce). MapR M5 (Hadoop distribution). Metamarkets (event based data processing).
  78. Analytics & computation: StackIQ Rocks+ (HPC clusters with MPI, Grid Engine). Univa Grid Engine (one click cluster deployment). Quantivo (data association analytics).
  79. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  80. Generation, Collection & storage, Analytics & computation, Collaboration & sharing
  81. Collaboration & sharing: Aspera Faspex (20 Mbps data transfer).
  82. Part 4: SUCCESS STORY
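
Slides 43 to 45 describe map/reduce as a pattern where you "write two functions" and let the framework scale them out. The slides give no code, so the following is only a minimal single-process sketch in Python of what those two functions might look like. The word-count job and the names `map_words`, `reduce_counts` and `run_job` are illustrative assumptions, not part of the talk or of the Hadoop or Elastic MapReduce APIs. A real framework would run the map and reduce steps across many nodes and handle the grouping itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map function (hypothetical example): emit (word, 1) for each word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce function (hypothetical example): sum the counts for one word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Stand-in for the framework: map every record, group by key, reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big analytics"]))
```

Only `map_words` and `reduce_counts` carry job-specific logic. `run_job` stands in for the grouping and distribution work that, per slide 45, a managed Hadoop cluster takes over.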