AWS Summit 2013 | Auckland - Big Data Analytics

Transcript

  • 1. Big Data Analytics. Glenn Gore, Sr. Manager, Solutions Architects, AWS
  • 2. Overview: The Big Data Challenge; Big Data tools and what we can do with them; Packetloop – Big Data Security Analytics; Intel technology on big data.
  • 3. An engineer’s definition: When your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share them
  • 4. Generation; Collection & storage; Analytics & computation; Collaboration & sharing
  • 5. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Lower cost, higher throughput
  • 6. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Lower cost, higher throughput. Highly constrained
  • 7. Data volume: Generated data; Available for analysis. Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  • 8. Amazon Web Services helps remove constraints
  • 9. Remove constraints = More experimentation. More experimentation = More innovation. More innovation = Competitive edge.
  • 10. Big Data tools: Elastic MapReduce and Redshift
  • 11. EMR is Hadoop in the Cloud
  • 12. What is Amazon Redshift? Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud. Easy to provision and scale. No upfront costs, pay as you go. High performance at a low price. Open and flexible with support for popular BI tools.
  • 13. Big Data tools: Elastic MapReduce and Redshift
  • 14. How does EMR work? Put the data into S3. Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc. Launch the cluster using the EMR console, CLI, SDK, or APIs. Get the output from S3. You can also store everything in HDFS. (Diagram: EMR, EMR Cluster, S3)
  • 15. What can you run on EMR… (Diagram: S3, EMR, EMR Cluster)
  • 16. Resize Nodes: You can easily add and remove nodes. (Diagram: EMR, EMR Cluster, S3)
  • 17. Resize Nodes with Spot Instances. Cost without Spot: 10 node cluster running for 14 hours. Cost = 1.2 * 10 * 14 = $168
  • 18. Resize Nodes with Spot Instances. Cost without Spot: 10 node cluster running for 14 hours. Cost = 1.2 * 10 * 14 = $168. Add 10 nodes on spot: 20 node cluster running for 7 hours. Cost = 1.2 * 10 * 7 = $84 for the original nodes, plus 0.6 * 10 * 7 = $42 for the spot nodes
  • 19. Resize Nodes with Spot Instances. Cost without Spot: 10 node cluster running for 14 hours. Cost = 1.2 * 10 * 14 = $168. Add 10 nodes on spot: 20 node cluster running for 7 hours. Cost = 1.2 * 10 * 7 = $84 for the original nodes, plus 0.6 * 10 * 7 = $42 for the spot nodes. Total = $126. 25% reduction in price, 50% reduction in time
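The arithmetic on slides 17 to 19 can be checked with a short sketch. The rates (1.2 and 0.6), node counts, and hours are read from the slides; treating the rates as dollars per node-hour and the function name `cluster_cost` are assumptions made for illustration.

```python
def cluster_cost(hourly_rate, nodes, hours):
    """Cost of running `nodes` instances for `hours` at a flat hourly rate."""
    return hourly_rate * nodes * hours


# Figures taken from the slides; the unit (dollars per node-hour) is assumed.
ON_DEMAND_RATE = 1.2
SPOT_RATE = 0.6

# Slide 17: 10 nodes for 14 hours, no spot capacity.
baseline = cluster_cost(ON_DEMAND_RATE, 10, 14)

# Slides 18-19: 10 original nodes plus 10 spot nodes, finishing in 7 hours.
with_spot = cluster_cost(ON_DEMAND_RATE, 10, 7) + cluster_cost(SPOT_RATE, 10, 7)

price_reduction = 1 - with_spot / baseline
time_reduction = 1 - 7 / 14

print(f"baseline: ${baseline:.0f}, with spot: ${with_spot:.0f}")
print(f"price reduction: {price_reduction:.0%}, time reduction: {time_reduction:.0%}")
```

The sketch assumes, as the slides do, that doubling the node count halves the running time; real jobs may scale less cleanly.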
  • 20. Ad-Hoc Clusters – What are they? When processing is complete, you can terminate the cluster (and stop paying). (Diagram: EMR Cluster, S3; numbered 1)
  • 21. Ad-Hoc Clusters – When to use: Not using HDFS. Not using the cluster 24/7. Transient jobs. (Diagram: EMR Cluster, S3; numbered 1)
  • 22. “Alive” Clusters – What are they? If you run your jobs 24 x 7, you can also run a persistent cluster and use Reserved Instance (RI) pricing to save costs. (Diagram: EMR, EMR Cluster, S3; numbered 2)
  • 23. “Alive” Clusters – When? Frequently running jobs. Dependencies on map-reduce-map outputs. (Diagram: EMR, EMR Cluster, S3; numbered 2)
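As a rough sketch of how the ad-hoc and "alive" patterns differ at launch time, the request below is shaped for the `run_job_flow` call of the boto3 EMR client. The bucket name, release label, instance types, and the helper name `build_cluster_request` are placeholders chosen for illustration, not values from the talk. Only the dictionary is built here; nothing is sent to AWS.

```python
def build_cluster_request(name, core_nodes, keep_alive):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    keep_alive=False gives an ad-hoc cluster that terminates once its steps
    finish; keep_alive=True gives a persistent ("alive") cluster.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",            # placeholder release
        "LogUri": "s3://example-bucket/logs/",   # hypothetical bucket
        "Applications": [{"Name": "Hive"}, {"Name": "Pig"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


adhoc = build_cluster_request("nightly-report", core_nodes=10, keep_alive=False)
alive = build_cluster_request("shared-analytics", core_nodes=10, keep_alive=True)

# With credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("emr").run_job_flow(**adhoc)
```

Keeping the two patterns behind a single flag makes it easy to move a workload from transient to persistent once it starts running around the clock.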
  • 24. S3 instead of HDFS: S3 provides 99.999999999% durability. Elastic. Version control against failure. Run multiple clusters with a single source of truth. Quick recovery from failure. Continuously resize clusters. (Diagram: S3, EMR, EMR Cluster; numbered 3)
  • 25. S3 and HDFS: Load data from S3 using S3DistCp. Benefits of HDFS. Master copy of the data in S3. Get all the benefits of S3. (Diagram: S3, EMR, EMR Cluster, HDFS, S3DistCp; numbered 4)
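Loading data from S3 into HDFS with S3DistCp, as on slide 25, is typically expressed as a cluster step. The sketch below builds such a step in the shape accepted by the boto3 EMR client's `add_job_flow_steps` call; the paths and the helper name `s3distcp_step` are hypothetical, and the step is only constructed, not submitted.

```python
def s3distcp_step(src, dest):
    """Describe an EMR step that copies data with the s3-dist-cp command."""
    return {
        "Name": "Copy input from S3 to HDFS",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            # command-runner.jar runs a command on the cluster's master node.
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", "--src", src, "--dest", dest],
        },
    }


# Hypothetical locations: S3 holds the master copy, HDFS a working copy.
step = s3distcp_step("s3://example-bucket/input/", "hdfs:///data/input/")

# With a running cluster, the step could be submitted like this:
#   import boto3
#   boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXX", Steps=[step])
```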
  • 26. Big Data tools: Elastic MapReduce and Redshift
  • 27. Reporting Data-warehouse (Diagram: RDBMS, OLTP, ERP, Redshift, Reporting and BI; numbered 1)
  • 28. Live Archive for (Structured) Big Data (Diagram: DynamoDB, OLTP, Web Apps, Redshift, Reporting and BI; numbered 2)
  • 29. Cloud ETL for Big Data (Diagram: S3, Elastic MapReduce, Redshift, Reporting and BI; numbered 3)
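  • 30. Columns: Streaming, Hive, Pig, DynamoDB, Redshift
    Unstructured Data: ✓ ✓
    Structured Data: ✓ ✓ ✓ ✓
    Language Support: Any* (Streaming), HQL (Hive), Pig Latin (Pig), Client (DynamoDB), SQL (Redshift)
    SQL: ✓
    SQL-Like: ✓
    Volume: Unlimited (Streaming), Unlimited (Hive), Unlimited (Pig), Relatively Low (DynamoDB), 1.6 PB (Redshift)
    Latency: Medium (Streaming), Medium (Hive), Medium (Pig), Ultra Low (DynamoDB), Low (Redshift)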
  • 31. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Remove Constraints