AWS Summit 2013 | Auckland - Big Data Analytics

  • 787 views
Uploaded on

 

More in: Technology , Business
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
787
On Slideshare
0
From Embeds
0
Number of Embeds
2

Actions

Shares
Downloads
0
Comments
0
Likes
2

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Glenn GoreBig Data AnalyticsSr. Manager, Solutions Architects, AWS
  • 2. Overview• The Big Data Challenge• Big Data tools and what can we do with them ?• Packetloop – Big Data Security Analytics• Intel technology on big data.
  • 3. An engineer’s definitionWhen your data sets become so large that you have to startinnovating how to collect, store, organize, analyze andshare it
  • 4. GenerationCollection & storageAnalytics & computationCollaboration & sharing
  • 5. GenerationCollection & storageAnalytics & computationCollaboration & sharingLower cost,higher throughput
  • 6. GenerationCollection & storageAnalytics & computationCollaboration & sharingLower cost,higher throughputHighlyconstrained
  • 7. Generated dataAvailable for analysisData volumeGartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  • 8. Amazon Web Services helps removeconstraints
  • 9. Remove constraints = More experimentationMore experimentation = More innovationMore Innovation = Competitive edge
  • 10. Elastic MapReduce and RedshiftBig Data tools
  • 11. EMR is Hadoop in the Cloud
  • 12. What is Amazon Redshift ?Amazon Redshift is a fast and powerful, fully managed,petabyte-scale data warehouse service in the AWScloudEasy to provision and scaleNo upfront costs, pay as you goHigh performance at a low priceOpen and flexible with support for popular BI tools
  • 13. Elastic MapReduce and RedshiftBig Data tools
  • 14. How does EMR work ?EMREMR ClusterS3Put the datainto S3Choose: Hadoop distribution, # ofnodes, types of nodes, customconfigs, Hive/Pig/etc.Get the output fromS3Launch the cluster using theEMR console, CLI, SDK, orAPIsYou can also storeeverything in HDFS
  • 15. What can you run on EMR…S3EMREMR Cluster
  • 16. EMREMR ClusterResize NodesS3You can easily add andremove nodes
  • 17. Resize Nodes with Spot InstancesCost without Spot10 node cluster running for 14 hoursCost = 1.2 * 10 * 14 = $168
  • 18. Resize Nodes with Spot InstancesCost without Spot Add 10 nodes on spot10 node cluster running for 14 hoursCost = 1.2 * 10 * 14 = $16820 node cluster running for 7 hoursCost = 1.2 * 10 * 7 = $84= 0.6 * 10 * 7 = $42
  • 19. Resize Nodes with Spot InstancesCost without Spot Add 10 nodes on spot10 node cluster running for 14 hoursCost = 1.2 * 10 * 14 = $16820 node cluster running for 7 hoursCost = 1.2 * 10 * 7 = $84= 0.6 * 10 * 7 = $42= Total $12625% reduction in price50% reduction in time
  • 20. Ad-Hoc Clusters – What are they ?EMR ClusterS3When processing is complete, youcan terminate the cluster (and stoppaying)1
  • 21. Ad-Hoc Clusters – When to useEMR ClusterS3Not using HDFSNot using the cluster 24/7Transient jobs1
  • 22. EMREMR Cluster“Alive” Clusters – What are they ?S3If you run your jobs 24 x 7 , youcan also run a persistent clusterand use RI models to save costs2
  • 23. EMREMR Cluster“Alive” Clusters – When ?S3Frequently running jobsDependencies on map-reduce-mapoutputs2
  • 24. S3 instead of HDFSS3EMREMR Cluster• S3 provides 99.99999999999% ofdurability• Elastic• Version control against failure• Run multiple clusters with a singlesource of truth• Quick recovery from failure• Continuously resize clusters3
  • 25. S3 and HDFSS3EMREMR ClusterLoad data from S3 using S3DistCPBenefits of HDFSMaster copy of the data in S3Get all the benefits of S3HDFSS3distCP4
  • 26. Elastic MapReduce and RedshiftBig Data tools
  • 27. Reporting Data-warehouseRDBMSRedshiftOLTPERPReportingand BI1
  • 28. Live Archive for (Structured) Big DataDynamoDBRedshiftOLTPWeb Apps Reportingand BI2
  • 29. Cloud ETL for Big DataRedshiftReportingand BIElastic MapReduceS33
  • 30. Streaming Hive Pig DynamoDB RedshiftUnstructured Data ✓ ✓Structured Data ✓ ✓ ✓ ✓Language Support Any* HQL Pig Latin Client SQLSQL ✓SQL-Like ✓Volume Unlimited Unlimited Unlimited RelativelyLow1.6 PBLatency Medium Medium Medium Ultra Low Low
  • 31. Collection & storageAnalytics & computationCollaboration & sharingRemoveConstraintsGeneration