AWS Summit 2013 | Auckland - Big Data Analytics
Presentation Transcript

    • Big Data Analytics
      Glenn Gore, Sr. Manager, Solutions Architects, AWS
    • Overview
      - The Big Data Challenge
      - Big Data tools and what can we do with them?
      - Packetloop – Big Data Security Analytics
      - Intel technology on big data
    • An engineer’s definition: when your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share them
    • The data pipeline: Generation → Collection & storage → Analytics & computation → Collaboration & sharing
    • Generation and collection & storage are now lower cost and higher throughput, while analytics & computation and collaboration & sharing remain highly constrained
    • [Chart: data volume over time, comparing generated data with data available for analysis. Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares]
    • Amazon Web Services helps remove constraints
    • Remove constraints = more experimentation; more experimentation = more innovation; more innovation = competitive edge
    • Big Data tools: Elastic MapReduce and Redshift
    • EMR is Hadoop in the Cloud
    • What is Amazon Redshift? Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud:
      - Easy to provision and scale
      - No upfront costs, pay as you go
      - High performance at a low price
      - Open and flexible, with support for popular BI tools
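    As an illustrative sketch only (not from the deck), provisioning a Redshift cluster is a single API call. The example below uses Python with boto3; the cluster identifier, node type, database name and credentials are placeholder assumptions.

        # Hypothetical sketch: provision a small Redshift cluster with boto3.
        # All identifiers, node types and credentials below are placeholders.
        import boto3

        redshift = boto3.client("redshift", region_name="ap-southeast-2")

        redshift.create_cluster(
            ClusterIdentifier="summit-demo-dw",      # placeholder name
            ClusterType="multi-node",
            NodeType="dc2.large",                    # choose a node type for your workload
            NumberOfNodes=2,
            MasterUsername="admin",
            MasterUserPassword="REPLACE_ME",         # never hard-code real credentials
            DBName="analytics",
        )

        # Block until the cluster is available, then read its endpoint for SQL clients.
        waiter = redshift.get_waiter("cluster_available")
        waiter.wait(ClusterIdentifier="summit-demo-dw")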
    • Big Data tools: Elastic MapReduce and Redshift
    • How does EMR work? Put the data into S3. Choose the Hadoop distribution, number of nodes, types of nodes, custom configs, Hive/Pig/etc. Launch the cluster using the EMR console, CLI, SDK, or APIs. Get the output from S3. You can also store everything in HDFS.
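    To make that workflow concrete, here is a minimal, hypothetical sketch of launching a cluster from the SDK (Python/boto3). The bucket names, instance types, roles, release label and step arguments are placeholder assumptions, not values from the deck.

        # Hypothetical sketch: launch an EMR cluster that reads input from S3,
        # runs one Hive step, writes output back to S3, then terminates itself.
        import boto3

        emr = boto3.client("emr", region_name="ap-southeast-2")

        response = emr.run_job_flow(
            Name="summit-demo-cluster",                   # placeholder name
            LogUri="s3://my-demo-bucket/emr-logs/",       # placeholder bucket
            ReleaseLabel="emr-6.15.0",                    # pick any current release
            Applications=[{"Name": "Hive"}, {"Name": "Pig"}],
            Instances={
                "MasterInstanceType": "m5.xlarge",
                "SlaveInstanceType": "m5.xlarge",
                "InstanceCount": 10,                      # 1 master + 9 core nodes
                "KeepJobFlowAliveWhenNoSteps": False,     # terminate when steps finish
            },
            Steps=[{
                "Name": "hive-query",                     # placeholder step
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["hive-script", "--run-hive-script",
                             "--args", "-f", "s3://my-demo-bucket/scripts/query.q"],
                },
            }],
            JobFlowRole="EMR_EC2_DefaultRole",
            ServiceRole="EMR_DefaultRole",
        )
        print("Cluster id:", response["JobFlowId"])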
    • What can you run on EMR… [diagram: S3 data flowing into an EMR cluster]
    • Resize nodes: you can easily add and remove nodes in a running EMR cluster
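    Resizing is also a single API call. The sketch below (boto3, with placeholder cluster and instance-group IDs) grows the core group of a running cluster; it is illustrative only.

        # Hypothetical sketch: resize a running EMR cluster by changing the
        # instance count of one of its instance groups. IDs are placeholders.
        import boto3

        emr = boto3.client("emr", region_name="ap-southeast-2")

        # Find the CORE instance group of the cluster.
        groups = emr.list_instance_groups(ClusterId="j-XXXXXXXXXXXXX")["InstanceGroups"]
        core = next(g for g in groups if g["InstanceGroupType"] == "CORE")

        # Grow (or shrink) that group; EMR adds or removes nodes in place.
        emr.modify_instance_groups(
            ClusterId="j-XXXXXXXXXXXXX",
            InstanceGroups=[{"InstanceGroupId": core["Id"], "InstanceCount": 15}],
        )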
    • Resize nodes with Spot Instances
      - Cost without Spot: a 10-node cluster running for 14 hours costs 1.2 * 10 * 14 = $168
      - Add 10 nodes on Spot: a 20-node cluster running for 7 hours costs 1.2 * 10 * 7 = $84 on demand plus 0.6 * 10 * 7 = $42 on Spot, for a total of $126
      - Result: 25% reduction in price and 50% reduction in time
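    One way to realize this pattern is to add a separate task instance group bid on the Spot market. The sketch below is a hypothetical boto3 call; the cluster ID, instance type and bid price are placeholders, not figures from the slide.

        # Hypothetical sketch: add 10 Spot task nodes to an existing EMR cluster.
        import boto3

        emr = boto3.client("emr", region_name="ap-southeast-2")

        emr.add_instance_groups(
            JobFlowId="j-XXXXXXXXXXXXX",             # placeholder cluster id
            InstanceGroups=[{
                "Name": "spot-task-nodes",
                "InstanceRole": "TASK",              # task nodes hold no HDFS blocks
                "InstanceType": "m5.xlarge",
                "InstanceCount": 10,
                "Market": "SPOT",
                "BidPrice": "0.60",                  # max hourly price, in USD
            }],
        )

    Because task nodes store no HDFS data, losing them to a Spot interruption only slows the job rather than putting data at risk.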
    • Ad-Hoc Clusters – What are they? When processing is complete, you can terminate the cluster (and stop paying)
    • Ad-Hoc Clusters – When to use: not using HDFS; not using the cluster 24/7; transient jobs
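    For ad-hoc use, shutting the cluster down is one call; a hypothetical sketch with a placeholder cluster ID:

        # Hypothetical sketch: terminate an ad-hoc EMR cluster once its work is done,
        # so billing stops. Results written to S3 persist after the cluster is gone.
        import boto3

        emr = boto3.client("emr", region_name="ap-southeast-2")
        emr.terminate_job_flows(JobFlowIds=["j-XXXXXXXXXXXXX"])

    Alternatively, setting KeepJobFlowAliveWhenNoSteps to False at launch (as in the earlier sketch) terminates the cluster automatically after its last step.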
    • “Alive” Clusters – What are they? If you run your jobs 24x7, you can also run a persistent cluster and use Reserved Instance models to save costs
    • “Alive” Clusters – When to use: frequently running jobs; dependencies on map-reduce-map outputs
    • S3 instead of HDFS:
      - S3 is designed for 99.999999999% durability
      - Elastic
      - Version control against failure
      - Run multiple clusters with a single source of truth
      - Quick recovery from failure
      - Continuously resize clusters
    • S3 and HDFS together: load data from S3 into HDFS using S3DistCp. You get the benefits of HDFS for processing while the master copy of the data stays in S3, so you keep all the benefits of S3.
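    S3DistCp itself runs as a step on the cluster. The following hypothetical boto3 sketch submits an s3-dist-cp step that copies a prefix from S3 into HDFS; the cluster ID and paths are placeholders.

        # Hypothetical sketch: copy input data from S3 into the cluster's HDFS
        # with S3DistCp, then process it locally while S3 stays the master copy.
        import boto3

        emr = boto3.client("emr", region_name="ap-southeast-2")

        emr.add_job_flow_steps(
            JobFlowId="j-XXXXXXXXXXXXX",                  # placeholder cluster id
            Steps=[{
                "Name": "s3distcp-load",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["s3-dist-cp",
                             "--src", "s3://my-demo-bucket/raw/",
                             "--dest", "hdfs:///data/raw/"],
                },
            }],
        )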
    • Big Data tools: Elastic MapReduce and Redshift
    • Pattern 1 – Reporting data warehouse: OLTP (RDBMS) and ERP systems feed Redshift, which serves reporting and BI
    • Pattern 2 – Live archive for (structured) big data: web apps write OLTP data to DynamoDB, which is archived into Redshift for reporting and BI
    • Pattern 3 – Cloud ETL for big data: raw data in S3 is transformed with Elastic MapReduce and loaded into Redshift for reporting and BI
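    In the ETL pattern, the load into Redshift is typically a COPY from S3. The sketch below is hypothetical: it assumes the psycopg2 driver, a reachable cluster endpoint, an existing table named page_views, Parquet output from EMR, and an IAM role with read access to the bucket; none of these come from the deck.

        # Hypothetical sketch: bulk-load EMR output from S3 into Redshift with COPY.
        import psycopg2

        conn = psycopg2.connect(
            host="summit-demo-dw.xxxxxxxx.ap-southeast-2.redshift.amazonaws.com",
            port=5439, dbname="analytics", user="admin", password="REPLACE_ME",
        )
        with conn, conn.cursor() as cur:
            cur.execute("""
                COPY page_views
                FROM 's3://my-demo-bucket/emr-output/'
                IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
                FORMAT AS PARQUET;
            """)
        conn.close()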
    • Tool comparison:

                            Streaming   Hive        Pig         DynamoDB        Redshift
      Unstructured data     ✓                       ✓
      Structured data                   ✓           ✓           ✓               ✓
      Language support      Any*        HQL         Pig Latin   Client          SQL
      SQL                                                                       ✓
      SQL-like                          ✓
      Volume                Unlimited   Unlimited   Unlimited   Relatively low  1.6 PB
      Latency               Medium      Medium      Medium      Ultra low       Low
    • Remove constraints across the whole pipeline: generation, collection & storage, analytics & computation, collaboration & sharing