AWS Summit Berlin 2013 - Big Data Analytics
 

Learn more about the tools, techniques and technologies for working productively with data at any scale. This session will introduce the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.

© All Rights Reserved

    AWS Summit Berlin 2013 - Big Data Analytics: Presentation Transcript

    • Big Data Analytics. Constantin Gonzalez, Solutions Architect, Amazon Web Services. Berlin
    • Overview: 1. Introducing Big Data; 2. From data to actionable information; 3. Analytics and Cloud Computing
    • 1. Introducing Big Data
    • Generation; Collection & storage; Analytics & computation; Collaboration & sharing
    • The cost of data generation is falling
    • Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Lower cost, higher throughput.
    • Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Lower cost, higher throughput. Highly constrained.
    • Chart: Data volume. Series: Generated data; Available for analysis. Sources: Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
    • Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints
    • Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Lower cost, higher throughput. Highly constrained.
    • Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Accelerated.
    • Big Data: Technologies and techniques for working productively with data, at any scale.
    • 2. From data to actionable information
    • “Who buys video games?”
    • Per day: 3.5 billion records; 13 TB of click stream logs; 71 million unique cookies
    • Results: 500% return on ad spend; from 2 months procurement time to a few minutes
    • “Who is using our service?”
    • Identified early mobile usage. Invested heavily in mobile development. Finding signal in the noise of logs.
    • In January 2013: 9,432,061 unique mobile devices used the Yelp mobile app. 4 million+ calls. 5 million+ directions.
    • Speaking of mobile devices and social networks…
    • Tweets about the Flu. Source: You Are What You Tweet: Analyzing Twitter for Public Health. M. J. Paul and M. Dredze, 2011
    • 3. Analytics and Cloud Computing
    • Generation; Collection & storage; Analytics & computation; Collaboration & sharing
    • Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Services: S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase
    • Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Services: EC2 & Elastic MapReduce
    • Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Services: EC2, S3, RDS, CloudFormation, Elastic MapReduce, DynamoDB, Redshift
    • Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Services: EC2 & Elastic MapReduce. S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase. AWS Data Pipeline. EC2, S3, RDS, CloudFormation, Elastic MapReduce, DynamoDB, Redshift
    • Elastic MapReduce
    • How does it work? (Diagram: EMR, EMR cluster, S3.) 1. Put the data into S3 (or HDFS). 2. Launch your cluster. Choose: Hadoop distribution; how many nodes; node type (hi-CPU, hi-memory, etc.); Hadoop apps (Hive, Pig, HBase). 3. Get the results.
    • How does it work? (Diagram: EMR, EMR cluster, S3.) You can easily resize the cluster.
    • How does it work? (Diagram: EMR, EMR cluster, S3.) Use Spot nodes to save time and money.
    • How does it work? (Diagram: EMR, EMR cluster, S3.) Launch parallel clusters against the same data source (tune for the workload).
    • How does it work? (Diagram: EMR cluster, S3.) When the work is complete, you can terminate the cluster (and stop paying).
    • How does it work? (Diagram: EMR cluster.) You can store everything in HDFS (local disk). High Storage nodes = 48 TB/node.
    • How does it work? (Diagram: EMR cluster.) Launch in a Virtual Private Cloud for extra security.
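
The launch choices on these slides (number of nodes, node type, Hadoop apps, optional Spot nodes, optional VPC, terminate when done) can be sketched as a request description. This is a minimal sketch, not something shown in the talk: the helper `build_emr_request` is hypothetical, the dictionary is shaped like the request that boto3's EMR `run_job_flow` call accepts, and the bucket name, instance type, release label and role names are placeholder assumptions.

```python
def build_emr_request(name, log_uri, core_nodes, node_type, apps,
                      spot_task_nodes=0, subnet_id=None):
    """Build a dict shaped like a boto3 EMR run_job_flow request (hypothetical helper)."""
    groups = [
        {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": node_type, "InstanceCount": 1},
        {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": node_type, "InstanceCount": core_nodes},
    ]
    if spot_task_nodes > 0:
        # Optional extra capacity on Spot nodes ("save time and money").
        groups.append({"Name": "task-spot", "InstanceRole": "TASK", "Market": "SPOT",
                       "InstanceType": node_type, "InstanceCount": spot_task_nodes})
    instances = {
        "InstanceGroups": groups,
        # False: the cluster terminates once its steps finish (and you stop paying).
        "KeepJobFlowAliveWhenNoSteps": False,
    }
    if subnet_id is not None:
        # Launch inside a VPC subnet for extra isolation.
        instances["Ec2SubnetId"] = subnet_id
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",          # assumed release label
        "Applications": [{"Name": app} for app in apps],
        "Instances": instances,
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default role names
        "ServiceRole": "EMR_DefaultRole",
    }


# Placeholder values for illustration only.
request = build_emr_request(
    name="clickstream-analysis",
    log_uri="s3://example-bucket/logs/",
    core_nodes=4,
    node_type="m5.xlarge",
    apps=["Hive", "Pig"],
    spot_task_nodes=2,
)
```

With AWS credentials configured, a dictionary like this could be passed as `boto3.client("emr").run_job_flow(**request)`. Keeping the builder free of network calls makes the launch choices easy to inspect before any cluster is started.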
    • Thousands of Customers, 5+ Million Clusters
    • Give it a try: aws.amazon.com/elasticmapreduce. Cost to run a 100-node EMR cluster: EUR 5.75/hour ($7.50/h)
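
The quoted price lends itself to a quick back-of-the-envelope estimate. The sketch below takes the slide's figure (EUR 5.75 per hour for 100 nodes) and assumes, purely for illustration, that cost scales linearly with node count and run time. The slide does not state the instance type or pricing model behind the figure, so the function names and the linear model are assumptions.

```python
SLIDE_NODES = 100            # cluster size quoted on the slide
SLIDE_EUR_PER_HOUR = 5.75    # quoted hourly cost for that cluster


def eur_per_node_hour():
    """Implied price of one node for one hour, assuming linear scaling."""
    return SLIDE_EUR_PER_HOUR / SLIDE_NODES


def job_cost_eur(nodes, hours):
    """Estimated cost of running `nodes` nodes for `hours` hours."""
    return round(nodes * hours * eur_per_node_hour(), 2)


full_cluster_one_hour = job_cost_eur(100, 1)   # the slide's own example
small_cluster_three_hours = job_cost_eur(20, 3)
```

Under these assumptions the implied rate is EUR 0.0575 per node-hour, so a 20-node cluster running for three hours would come to roughly EUR 3.45.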
    • Photos: renee_mcgurk, https://www.flickr.com/photos/51018933@N08/5355664961/in/photostream/; Calgary Reviews, https://www.flickr.com/photos/calgaryreviews/6328302248/in/photostream/
    • AWS Data Pipeline: Data-intensive orchestration and automation. Reliable and scheduled. Easy to use, drag and drop. Execution and retry logic. Map data dependencies. Create and manage temporary compute resources.
    • Anatomy of a pipeline
    • Additional checks and notifications
    • Arbitrarily complex pipelines
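
Two of the ideas named on the AWS Data Pipeline slide, mapping data dependencies and execution with retry logic, can be illustrated with a toy scheduler. This is not the AWS Data Pipeline API. It is a minimal sketch using Python's standard `graphlib` module (Python 3.9+), and `run_pipeline`, the task names and the simulated failure are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, depends_on, max_attempts=3):
    """Run callables in dependency order, retrying a failing task up to max_attempts times."""
    # static_order() yields every task after all of its prerequisites.
    order = list(TopologicalSorter(depends_on).static_order())
    attempts = {}
    for name in order:
        for attempt in range(1, max_attempts + 1):
            attempts[name] = attempt
            try:
                tasks[name]()
                break  # task succeeded, move on to the next one
            except Exception:
                if attempt == max_attempts:
                    raise  # give up after the final attempt
    return order, attempts


calls = {"transform": 0}


def extract():
    pass


def transform():
    # Simulate one transient failure before succeeding.
    calls["transform"] += 1
    if calls["transform"] < 2:
        raise RuntimeError("transient failure")


def load():
    pass


order, attempts = run_pipeline(
    {"extract": extract, "transform": transform, "load": load},
    # Each key depends on the tasks in its set.
    {"transform": {"extract"}, "load": {"transform"}},
)
```

Here `transform` fails once and is retried, while `load` only runs after `transform` has succeeded. A managed service adds scheduling, notifications and temporary compute resources on top of this basic ordering-and-retry loop.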
    • Thanks. glez@amazon.de, @zalez. Learn More: aws.amazon.com/big-data
    • Alan Priestley, Strategic Marketing Director, Intel Corporation
    • Analysis of Data Can Transform Society: Create new business models and improve organizational processes. Enhance scientific understanding, drive innovation, and accelerate medical cures. Increase public safety and improve energy efficiency with smart grids.
    • Democratizing Analytics gets Value out of Big Data: Unlock Value in Silicon; Support Open Platforms; Deliver Software Value
    • Intel at the Intersection of Big Data. HPC: Enabling exascale computing on massive data sets. Cloud: Helping enterprises build open interoperable clouds. Open Source: Contributing code and fostering ecosystem.
    • Intel at the Heart of the Cloud: Server, Storage, Network
    • Scale-Out Platform Optimizations for Big Data. Cost-effective performance: Intel® Advanced Vector Extension Technology; Intel® Turbo Boost Technology 2.0; Intel® Advanced Encryption Standard New Instructions Technology
    • Intel® Advanced Vector Extensions Technology: Newest in a long line of processor instruction innovations. Increases floating point operations per clock up to 2X performance [1]. [1] Performance comparison using Linpack benchmark. See backup for configuration details. For more legal information on performance forecasts go to http://www.intel.com/performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
    • Intel® Turbo Boost Technology 2.0: More Performance. Higher turbo speeds maximize performance for single and multi-threaded applications.
    • Intel® Advanced Encryption Standard New Instructions: Processor assistance for performing AES encryption; 7 new instructions. Makes enabled encryption software faster and stronger.
    • Power of the Platform built by Intel: Richer user experiences. (Chart: TeraSort for 1 TB sort. Time labels: 4 HRS, 10 MIN. Configurations: Previous Intel® Xeon® Processor; Intel® Xeon® Processor E5 2600; Solid-State Drive; 10G Ethernet; Intel® Apache Hadoop. Reduction labels: 50%, 80%, 50%, 40%.)
    • Virtuous Cycle of Data-Driven Experience: Cloud, Intelligent Systems, Clients
    • Get 600 Hours of free supercomputing time! www.powerof60.com
    • Thank you!