AWS Summit Berlin 2013 - Big Data Analytics

Learn more about the tools, techniques and technologies for working productively with data at any scale. This session will introduce the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.

Published in: Technology

Transcript of "AWS Summit Berlin 2013 - Big Data Analytics"

  1. Big Data Analytics. Constantin Gonzalez, Solutions Architect, Amazon Web Services. Berlin
  2. Overview: (1) Introducing Big Data, (2) From data to actionable information, (3) Analytics and Cloud Computing
  3. Part 1: Introducing Big Data
  4. Generation; Collection & storage; Analytics & computation; Collaboration & sharing
  5. The cost of data generation is falling
  6. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Lower cost, higher throughput
  7. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Lower cost, higher throughput. Highly constrained
  8. Chart of data volume: generated data vs. data available for analysis. Sources: Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  9. Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints
  10. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Lower cost, higher throughput. Highly constrained
  11. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. Accelerated
  12. Big Data: Technologies and techniques for working productively with data, at any scale.
  13. Part 2: From data to actionable information
  14. “Who buys video games?”
  15. Per day: 3.5 billion records; 13 TB of click stream logs; 71 million unique cookies
  16. Results: 500% return on ad spend; from 2 months procurement time to a few minutes
  17. “Who is using our service?”
  18. Finding signal in the noise of logs: Identified early mobile usage. Invested heavily in mobile development.
  19. In January 2013: 9,432,061 unique mobile devices used the Yelp mobile app. 4 million+ calls. 5 million+ directions.
  20. Speaking of mobile devices and social networks…
  21. Tweets about the Flu. Source: You Are What You Tweet: Analyzing Twitter for Public Health. M. J. Paul and M. Dredze, 2011
  22. Part 3: Analytics and Cloud Computing
  23. Generation; Collection & storage; Analytics & computation; Collaboration & sharing
  24. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase
  25. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. EC2 & Elastic MapReduce
  26. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. EC2, S3, RDS, CloudFormation, Elastic MapReduce, DynamoDB, Redshift
  27. Generation; Collection & storage; Analytics & computation; Collaboration & sharing. EC2 & Elastic MapReduce. S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase. AWS Data Pipeline. EC2, S3, RDS, CloudFormation, Elastic MapReduce, DynamoDB, Redshift
  28. Elastic MapReduce
  29. How does it work? (Diagram: EMR, EMR cluster, S3.) 1. Put the data into S3 (or HDFS). 2. Launch your cluster. Choose: Hadoop distribution; how many nodes; node type (hi-CPU, hi-memory, etc.); Hadoop apps (Hive, Pig, HBase). 3. Get the results.
  30. How does it work? (Diagram: EMR, EMR cluster, S3.) You can easily resize the cluster.
  31. How does it work? (Diagram: EMR, EMR cluster, S3.) Use Spot nodes to save time and money.
  32. How does it work? (Diagram: EMR, EMR cluster, S3.) Launch parallel clusters against the same data source (tune for the workload).
  33. How does it work? (Diagram: EMR cluster, S3.) When the work is complete, you can terminate the cluster (and stop paying).
  34. How does it work? (Diagram: EMR cluster.) You can store everything in HDFS (local disk). High Storage nodes = 48 TB/node.
  35. How does it work? (Diagram: EMR cluster.) Launch in a Virtual Private Cloud for extra security.
  36. Thousands of Customers, 5+ Million Clusters
  37. Give it a try: aws.amazon.com/elasticmapreduce. Cost to run a 100-node EMR cluster: EUR 5.75/hour ($7.50/h)
  38. Photos: renee_mcgurk https://www.flickr.com/photos/51018933@N08/5355664961/in/photostream/ and Calgary Reviews https://www.flickr.com/photos/calgaryreviews/6328302248/in/photostream/
  39. AWS Data Pipeline: Data-intensive orchestration and automation. Reliable and scheduled. Easy to use, drag and drop. Execution and retry logic. Map data dependencies. Create and manage temporary compute resources.
  40. Anatomy of a pipeline
  41. Additional checks and notifications
  42. Arbitrarily complex pipelines
  43. Thanks. glez@amazon.de, @zalez. Learn More: aws.amazon.com/big-data
  44. Alan Priestley, Strategic Marketing Director, Intel Corporation
  45. Analysis of Data Can Transform Society: Create new business models and improve organizational processes. Enhance scientific understanding, drive innovation, and accelerate medical cures. Increase public safety and improve energy efficiency with smart grids.
  46. Democratizing Analytics gets Value out of Big Data: Unlock Value in Silicon; Support Open Platforms; Deliver Software Value
  47. Intel at the Intersection of Big Data. HPC: Enabling exascale computing on massive data sets. Cloud: Helping enterprises build open interoperable clouds. Open Source: Contributing code and fostering ecosystem.
  48. Intel at the Heart of the Cloud: Server, Storage, Network
  49. Scale-Out Platform Optimizations for Big Data. Cost-effective performance: Intel® Advanced Vector Extension Technology; Intel® Turbo Boost Technology 2.0; Intel® Advanced Encryption Standard New Instructions Technology
  50. Intel® Advanced Vector Extensions Technology. Newest in a long line of processor instruction innovations. Increases floating point operations per clock, up to 2X performance [1]. [1] Performance comparison using Linpack benchmark. See backup for configuration details. For more legal information on performance forecasts go to http://www.intel.com/performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
  51. Intel® Turbo Boost Technology 2.0: More performance. Higher turbo speeds maximize performance for single and multi-threaded applications.
  52. Intel® Advanced Encryption Standard New Instructions: Processor assistance for performing AES encryption (7 new instructions). Makes enabled encryption software faster and stronger.
  53. Power of the Platform built by Intel. Richer user experiences. TeraSort for 1 TB sort: 4 hrs; 10 min. Reductions shown: 50%, 80%, 50%, 40%. Components shown: Intel® Xeon® Processor E5-2600; Solid-State Drive; 10G Ethernet; Intel® Apache Hadoop; previous Intel® Xeon® Processor.
  54. Virtuous Cycle of Data-Driven Experience: Cloud, Intelligent Systems, Clients
  55. Get 600 Hours of free supercomputing time! www.powerof60.com
  56. Thank you!
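
The figures on slides 15 and 37 invite some back-of-the-envelope arithmetic. The sketch below is only that: it assumes decimal terabytes, an even spread of records across the day, and a cluster price that scales linearly with node count and hours. None of these assumptions is stated on the slides, and the function name `job_cost_eur` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Slide 15: daily click stream volumes.
records_per_day = 3.5e9
log_bytes_per_day = 13e12  # assumption: 1 TB = 10**12 bytes

# Average rates, assuming traffic is spread evenly over the day.
records_per_second = records_per_day / SECONDS_PER_DAY
avg_bytes_per_record = log_bytes_per_day / records_per_day

# Slide 37: quoted price for a 100-node EMR cluster.
quoted_nodes = 100
eur_per_cluster_hour = 5.75
usd_per_cluster_hour = 7.50

eur_per_node_hour = eur_per_cluster_hour / quoted_nodes
usd_per_node_hour = usd_per_cluster_hour / quoted_nodes
implied_usd_per_eur = usd_per_cluster_hour / eur_per_cluster_hour


def job_cost_eur(node_count: int, hours: float) -> float:
    """Estimate a job's cost, assuming linear per-node-hour pricing."""
    return node_count * hours * eur_per_node_hour


if __name__ == "__main__":
    print(f"records/second:   {records_per_second:,.0f}")
    print(f"bytes/record:     {avg_bytes_per_record:,.0f}")
    print(f"EUR/node-hour:    {eur_per_node_hour:.4f}")
    print(f"USD/node-hour:    {usd_per_node_hour:.4f}")
    print(f"implied USD/EUR:  {implied_usd_per_eur:.4f}")
    print(f"20 nodes, 3 h:    EUR {job_cost_eur(20, 3):.2f}")
```

Under these assumptions the per-node price works out to a few euro cents per hour, which is the point slide 33 makes about terminating a cluster once the work is complete.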
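
The choices listed on slides 29 to 35 (Hadoop distribution, node count, node type, Hadoop apps, Spot nodes, Virtual Private Cloud, terminating when done) can be pictured as one cluster request. The following is a minimal sketch, not the deck's own method: it only builds a dictionary shaped like the arguments of the `run_job_flow` call in the boto3 SDK, which the slides do not mention. The helper name `build_cluster_request`, the release label, the instance type, and the subnet ID are all placeholder assumptions, and nothing is sent to AWS.

```python
from typing import Optional


def build_cluster_request(
    name: str,
    node_count: int,
    node_type: str,
    apps: list,
    use_spot: bool = False,
    subnet_id: Optional[str] = None,
    release_label: str = "emr-5.36.0",  # placeholder "Hadoop distribution"
) -> dict:
    """Build (but do not send) a request dict shaped like run_job_flow's arguments."""
    if node_count < 2:
        raise ValueError("this sketch expects one master plus at least one core node")

    instances = {
        "InstanceGroups": [
            {
                "Name": "master",
                "InstanceRole": "MASTER",
                "Market": "ON_DEMAND",
                "InstanceType": node_type,
                "InstanceCount": 1,
            },
            {
                "Name": "core",
                "InstanceRole": "CORE",
                # Slide 31: Spot nodes for the worker group.
                "Market": "SPOT" if use_spot else "ON_DEMAND",
                "InstanceType": node_type,
                "InstanceCount": node_count - 1,
            },
        ],
        # Slide 33: let the cluster shut down once its steps are finished.
        "KeepJobFlowAliveWhenNoSteps": False,
    }
    if subnet_id is not None:
        # Slide 35: launching into a VPC subnet.
        instances["Ec2SubnetId"] = subnet_id

    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in apps],  # slide 29: Hive, Pig, HBase
        "Instances": instances,
    }


if __name__ == "__main__":
    request = build_cluster_request(
        "clickstream-demo", 100, "m5.xlarge", ["Hive", "Pig"],
        use_spot=True, subnet_id="subnet-0123456789abcdef0",  # hypothetical ID
    )
    print(request["Instances"]["InstanceGroups"][1])
    # Sending it would look like boto3.client("emr").run_job_flow(**request),
    # which additionally needs AWS credentials and, in practice, IAM roles.
```

Keeping the request as plain data makes it easy to build several variants against the same S3 data, in the spirit of the parallel clusters on slide 32.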
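
Slide 39 names two ideas behind AWS Data Pipeline: mapping data dependencies, and execution with retry logic. The sketch below illustrates both ideas in plain Python and is not the AWS Data Pipeline API. The function `run_pipeline`, its parameters, and the task names are hypothetical, and it relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks: dict, depends_on: dict, max_attempts: int = 3) -> dict:
    """Run each task after its dependencies, retrying failures.

    tasks:      maps a task name to a zero-argument callable.
    depends_on: maps a task name to the names it must wait for.
    Returns the number of attempts each task needed.
    """
    # Every task appears in the graph, even if it has no dependencies.
    graph = {name: depends_on.get(name, ()) for name in tasks}
    attempts = {}

    # static_order() yields each task only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                attempts[name] = attempt
                break
            except Exception as error:
                if attempt == max_attempts:
                    raise RuntimeError(f"task {name!r} failed {attempt} times") from error
    return attempts


if __name__ == "__main__":
    log = []
    failures_left = {"load": 1}  # make "load" fail once to exercise the retry

    def load():
        if failures_left["load"] > 0:
            failures_left["load"] -= 1
            raise IOError("transient failure")
        log.append("load")

    result = run_pipeline(
        {"extract": lambda: log.append("extract"), "load": load,
         "report": lambda: log.append("report")},
        {"load": ["extract"], "report": ["load"]},
    )
    print(log, result)
```

A real scheduler would add timing, notifications, and resource provisioning (the other bullets on slide 39); this sketch covers only ordering and retries.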