Big Data in the Cloud
AWS Summit 2014 Melbourne - Breakout 3

Most organisations face ever-growing volumes of data that need to be stored and processed but, most importantly, analysed to bring value to the business. Big Data appears to offer solutions to these challenges, but the landscape is littered with acronyms and obscure names such as MPP, NoSQL, Hadoop, Hive and HBase. Attend this session to find out:

- What is the value proposition for each of these technologies?
- How do they fit with more traditional Big Data solutions such as data warehouses?
- How can AWS help organisations get maximum value from their data?

Presenter: Russell Nash, Solutions Architect, APAC, Amazon Web Services

Published in: Technology

Transcript

  1. Big Data in the Cloud Russell Nash Solutions Architect, Amazon Web Services, APAC © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Big picture slide
  3. Hadoop MPP NoSQL STREAMING
  4. Structure High Low Large Size Small Traditional Database Hadoop NoSQL MPP DW
  5. Hadoop MPP NoSQL Structure Latency Interfaces
  6. Background 2004 – Map Reduce 2006 – Hadoop
  7. Input File Hadoop cluster Functions 1. Very Flexible 2. Very Scalable 3. Often Transient Output
  8. Input file map reduce Output file
  9. Input file map reduce Output file Input file map reduce Output file Input file map reduce Output file
  10. Big Data Verticals and Use Cases. Media/Advertising: Targeted Advertising, Image and Video Processing. Oil & Gas: Seismic Analysis. Retail: Recommendations, Transactions Analysis. Life Sciences: Genome Analysis. Financial Services: Monte Carlo Simulations, Risk Analysis. Security: Anti-virus, Fraud Detection, Image Recognition. Social Network/Gaming: User Demographics, Usage Analysis, In-game Metrics.
  11. Deployment Options On-premise Cloud Managed on Cloud
  12. Elastic MapReduce Manageability Scalability Cost
  13. 400 GB of logs per day ~12 Terabytes per month
  14. 1) Load log file data for six months of user search history into Amazon S3. Columns: Search ID, Search Text, Final Selection. Rows: 12423451, westen, Westin; 14235235, wisten, Westin; 54332232, westenn, Westin.
  15. Amazon S3 Amazon EMR Log Files 2) Spin up a 200 node cluster Hadoop Cluster
  16. 3) 200 nodes simultaneously analyze this data looking for common misspellings … this takes a few hours Hadoop Cluster Amazon S3 Amazon EMR
  17. Amazon S3 Amazon EMR 4) New common misspellings and suggestions loaded back into S3 Hadoop Cluster Log Files
  18. Amazon S3 Amazon EMR 5) When the job is done, the cluster is shut down. Log Files
  19. The Hadoop Ecosystem
  20. Trends SQL on Hadoop Spark
  21. Hadoop, MPP, NoSQL compared by Structure, Latency, Interfaces. Hadoop: Structure: Any; Latency: Mins-Hours; Interfaces: Programming, SQL-Like Tools.
  22. Background SQL Databases for analytical workloads Performance Scalability Ease of Use Cost
  23. Leader Node Compute Node Compute Node Compute Node BI Tools 1. SQL 2. High Performance 3. Broad Toolset
  24. Deployment Options On-premise Cloud Managed on Cloud
  25. Amazon Redshift Manageability Scalability Cost
  26. Performance Evaluation on 2B Rows Aggregate by month Traditional SQL Database 02:08:35 00:35:46 00:00:12
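  27. Hadoop, MPP, NoSQL compared by Structure, Latency, Interfaces. Hadoop: Structure: Any; Latency: Mins-Hours; Interfaces: Programming, SQL-Like Tools. MPP: Structure: Full; Latency: Seconds-Minutes; Interfaces: SQL, BI Tools.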
  28. Background Databases for webscale transactions Performance Flexibility
  29. Relational Table (columns ID, Age, State): 123, 20, CA; 345, 25, WA; 678, 40, FL. Non-Relational Table (columns ID, Attributes): 123: Age:20, State:CA; 345: Age:25, Country: Australia, Gender: F, Smoker: No; 678: Age:40.
  30. Deployment Options On-premise Cloud Managed on Cloud
  31. DynamoDB Manageability Scalability Cost
  32. digital advertising real-time bidding
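  33. Hadoop, MPP, NoSQL compared by Structure, Latency, Interfaces. Structure: Any (Hadoop), Full (MPP), Semi (NoSQL). Latency: Mins-Hours (Hadoop), Seconds-Minutes (MPP), Sub-second (NoSQL). Interfaces: Programming SQL-Like Tools SQL Programming Tools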
  34. Streaming Analytics
  35. Data Sources App.4 [Machine Learning] AWS Endpoint App.1 [Aggregate & De-Duplicate] Data Sources Data Sources Data Sources App.2 [Metric Extraction] S3 DynamoDB Redshift App.3 [Sliding Window Analysis] Data Sources Availability Zone Availability Zone Shard 1 Shard 2 Shard N Availability Zone Amazon Kinesis EMR
  36. • Sensor networks analytics • Ad network analytics • Log centralization • Click stream analysis • Hardware and software appliance metrics • …more…
  37. Amazon Mobile Analytics. Fast: get your data within an hour. Automatic MAU, DAU, session and retention reports. Design and track custom app events. Data is not mined or sold by Amazon.
  38. Expand your skills with AWS Certification Exams Validate your proven technical expertise with the AWS platform aws.amazon.com/certification On-Demand Resources Videos & Labs Get hands-on practice working with AWS technologies in a live environment aws.amazon.com/training/self-paced-labs Instructor-Led Courses Training Classes Expand your technical expertise to design, deploy, and operate scalable, efficient applications on AWS aws.amazon.com/training
  39. Big Data Tutorials aws.amazon.com/big-data Redshift Free Trial aws.amazon.com/redshift/free-trial
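
Slides 8 and 14 to 18 describe a map/reduce job that scans search logs for common misspellings. A minimal Python sketch of that idea, run locally on the three sample rows from slide 14, might look like the following. The function names (`map_record`, `reduce_counts`, `run_job`), the tuple record layout, and the in-memory grouping step are assumptions made for illustration and do not come from the talk; the job described in the slides runs on a 200 node Amazon EMR cluster over data in Amazon S3, which this sketch does not attempt to reproduce.

```python
from collections import Counter, defaultdict

# Sample rows from slide 14: (search ID, search text, final selection).
SAMPLE_LOG = [
    ("12423451", "westen", "Westin"),
    ("14235235", "wisten", "Westin"),
    ("54332232", "westenn", "Westin"),
]


def map_record(record):
    """Map step: emit (final selection, typed text) when the two differ."""
    _search_id, typed, selected = record
    typed = typed.strip().lower()
    if typed != selected.strip().lower():
        yield selected, typed


def reduce_counts(selected, typed_values):
    """Reduce step: count how often each misspelling led to a selection."""
    return selected, Counter(typed_values)


def run_job(records):
    """Hypothetical local driver: group mapper output by key, then reduce.

    On a real cluster the framework performs this grouping (the shuffle)
    across nodes; here it is a plain dictionary in memory.
    """
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for selected, counts in run_job(SAMPLE_LOG).items():
        print(selected, counts.most_common())
```

Because each record is mapped independently and each key is reduced independently, the same two functions could in principle be spread across many nodes, which is the property the 200 node example relies on.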