AWS Big Data Analytics IP Expo 2013

Many companies recognize the use of data analytics as an opportunity to better understand their customers and gain a lead on their competition. The ability to get better insight from vast amounts of unstructured data, coming from a multitude of sources, can give businesses the advantage in an industry where even the smallest improvement can mean a big difference.

Amazon Web Services offers a range of big data, analytics and storage solutions that companies such as NASDAQ, Bankinter and S&P Capital use to build highly secure and agile platforms. Join this session to learn how AWS lets customers start small and grow as their business requires, giving them the agility to deliver cutting-edge solutions to their customers without any upfront capital expenditure.

Published in: Technology, Business

  1. 1. Big Data Analytics David de Santiago Business Development Manager, Analytics EMEA
  2. 2. Overview 1. Introducing Big Data 2. From data to actionable information 3. Analytics and Cloud Computing
  3. 3. 1 Introducing Big Data
  4. 4. Generation Collection & storage Analytics & computation Collaboration & sharing
  5. 5. The cost of data generation is falling
  6. 6. Lower cost, higher throughput Generation Collection & storage Analytics & computation Collaboration & sharing
  7. 7. Lower cost, higher throughput Generation Collection & storage Analytics & computation Collaboration & sharing Highly constrained
  8. 8. Data volume: generated data vs. data available for analysis. Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  9. 9. Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints
  10. 10. Lower cost, higher throughput Generation Collection & storage Analytics & computation Collaboration & sharing Highly constrained
  11. 11. Generation Collection & storage Accelerated Analytics & computation Collaboration & sharing
  12. 12. Big Data Technologies and techniques for working productively with data, at any scale.
  13. 13. 2 From data to actionable information
  14. 14. Per day: 3.5 billion records, 13 TB of click stream logs, 71 million unique cookies
  15. 15. Targeted ad: user recently bought a home theatre system and is now looking at sports games
  16. 16. Results: 500% return on ad spend 17,000% reduction in procurement time “We couldn’t have done it”
  17. 17. Finding signal in the noise of logs Identified early mobile usage Invested heavily in mobile development
  18. 18. In January 2013, 9,432,061 unique mobile devices used the Yelp mobile app. Other features powered by EMR: People Who Viewed This Also Viewed; review highlights; autocomplete as you type in search; search spelling suggestions; top searches; ads
  19. 19. Open web index. 3.4 billion records. Available to all.
  20. 20. Tweeting about flu. Source: M. J. Paul and M. Dredze, “You Are What You Tweet: Analyzing Twitter for Public Health”, 2011
  21. 21. Full parse for impact of social networks 300 lines of Ruby code. 14 hours. $100.
  22. 22. 3 Analytics and Cloud Computing
  23. 23. Generation Collection & storage Analytics & computation Collaboration & sharing
  24. 24. Generation Collection & storage Analytics & computation Collaboration & sharing S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase
  25. 25. Generation Collection & storage Analytics & computation Collaboration & sharing EC2 & Elastic MapReduce
  26. 26. Generation Collection & storage Analytics & computation Collaboration & sharing EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift
  27. 27. Amazon Redshift: fully managed data warehouse. Scales to 1.6 PB. Faster, simpler, cheaper
  28. 28. Amazon Redshift pricing per TB (effective hourly price / effective annual price): On-Demand $0.425 / $3,723; 1 Year Reservation $0.250 / $2,190; 3 Year Reservation $0.114 / $999
  29. 29. “TOWARDS THE END OF LAST YEAR OUR DATA VOLUMES LITERALLY BROKE THE EXISTING DATABASE. WE WERE NO LONGER ABLE TO SCALE THE DATABASE OR DO ANYTHING USEFUL, LIKE RUNNING QUERIES.” “Two months to migrate to Amazon Redshift.” Greg Johnson, Head of Analytics, Nokia
  30. 30. Elastic MapReduce: How does it work? 1. Put the data into S3 (or HDFS). 2. Launch your cluster, choosing the Hadoop distribution, the number of nodes, the node type (hi-CPU, hi-memory, etc.) and the Hadoop apps (Hive, Pig, HBase). 3. Get the results.
  31. 31. Elastic MapReduce: How does it work? You can easily resize the cluster.
  32. 32. Elastic MapReduce: How does it work? Use Spot nodes to save time and money.
  33. 33. Elastic MapReduce: How does it work? Launch parallel clusters against the same data source (tune for the workload).
  34. 34. Elastic MapReduce: How does it work? When the work is complete, you can terminate the cluster (and stop paying).
  35. 35. Thousands of Customers, 5+ Million Clusters
  36. 36. Give it a try. Cost to run a 100-node EMR cluster: £4.90 / hour
  37. 37. Generation Collection & storage AWS Data Pipeline Analytics & computation Collaboration & sharing S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase EC2 & Elastic MapReduce EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift
  38. 38. AWS Data Pipeline: data-intensive orchestration and automation; reliable and scheduled; easy to use, drag and drop; execution and retry logic; map data dependencies; create and manage temporary compute resources
  39. 39. Anatomy of a pipeline
  40. 40. Arbitrarily complex pipelines
  41. 41. Thanks. daviddes@amazon.co.uk To learn more: aws.amazon.com/elasticmapreduce, aws.amazon.com/datapipeline, aws.amazon.com/big-data, aws.amazon.com/redshift, aws.amazon.com/rds
  42. 42. Thank you!
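
The Amazon Redshift prices on slide 28 can be checked with a little arithmetic. The sketch below assumes the annual figure is simply the effective hourly price multiplied by the 8,760 hours in a non-leap year, which reproduces the slide's numbers after rounding. The function names are illustrative only, and the prices are the 2013 figures from the slide, not current AWS pricing.

```python
# Effective hourly prices per TB, copied from slide 28 (2013 figures).
HOURLY_PRICE_PER_TB = {
    "On-Demand": 0.425,
    "1 Year Reservation": 0.250,
    "3 Year Reservation": 0.114,
}

# Assumption: the slide annualises over a 365-day year.
HOURS_PER_YEAR = 24 * 365


def annual_price_per_tb(hourly_price: float) -> float:
    """Annual cost of keeping one TB available for a full year."""
    return hourly_price * HOURS_PER_YEAR


def saving_vs_on_demand(hourly_price: float) -> float:
    """Fractional saving of a plan relative to the On-Demand hourly price."""
    return 1 - hourly_price / HOURLY_PRICE_PER_TB["On-Demand"]


if __name__ == "__main__":
    for plan, hourly in HOURLY_PRICE_PER_TB.items():
        annual = annual_price_per_tb(hourly)
        saving = saving_vs_on_demand(hourly)
        print(f"{plan}: ${annual:,.0f} per TB per year ({saving:.0%} below On-Demand)")
```

Under these assumptions the three-year reservation works out at roughly a quarter of the On-Demand price per TB.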

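
Slide 38 lists "map data dependencies" and "execution and retry logic" among the AWS Data Pipeline features. The sketch below illustrates those two ideas in plain Python and does not use the AWS Data Pipeline API: tasks run in dependency order, and a failing task is retried a fixed number of times. The task names, the `run_pipeline` helper and the `max_attempts` parameter are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, depends_on, max_attempts=3):
    """Run callables in dependency order, retrying each one on failure.

    tasks:      mapping of task name -> zero-argument callable
    depends_on: mapping of task name -> set of task names that must finish first
    Returns the execution order and the number of attempts each task needed.
    """
    # Make sure every task appears in the graph, even if it has no dependencies.
    graph = {name: depends_on.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())

    attempts = {}
    for name in order:
        for attempt in range(1, max_attempts + 1):
            attempts[name] = attempt
            try:
                tasks[name]()
                break  # task succeeded, move on to the next one
            except Exception:
                if attempt == max_attempts:
                    raise  # give up after the last allowed attempt
    return order, attempts


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_emr_job():
        # Hypothetical step that fails once before succeeding.
        calls["count"] += 1
        if calls["count"] == 1:
            raise RuntimeError("transient failure")

    order, attempts = run_pipeline(
        tasks={
            "copy_logs_to_s3": lambda: None,
            "run_emr_job": flaky_emr_job,
            "load_into_redshift": lambda: None,
        },
        depends_on={
            "run_emr_job": {"copy_logs_to_s3"},
            "load_into_redshift": {"run_emr_job"},
        },
    )
    print(order)
    print(attempts)
```

A managed service adds scheduling and the creation of temporary compute resources on top of this; the sketch covers only ordering and retries.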