Big Data in the Cloud
AWS Summit 2014 Melbourne - Breakout 3

Most organisations face ever-growing volumes of data that need to be stored and processed and, most importantly, analysed to bring value to the business. Big Data appears to offer solutions to these challenges, but the landscape is littered with acronyms and obscure names such as MPP, NoSQL, Hadoop, Hive and HBase. Attend this session to find out:

- What the value proposition is for each of these technologies
- How they fit with more traditional Big Data solutions such as data warehouses
- How AWS can help organisations get maximum value from their data

Presenter: Russell Nash, Solutions Architect, APAC, Amazon Web Services

Big Data in the Cloud: Presentation Transcript

  • Big Data in the Cloud Russell Nash Solutions Architect, Amazon Web Services, APAC © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • Big picture slide
  • Hadoop MPP NoSQL STREAMING
  • Structure High Low Large Size Small Traditional Database Hadoop NoSQL MPP DW
  • Hadoop MPP NoSQL Structure Latency Interfaces
  • Background 2004 – Map Reduce 2006 – Hadoop
  • Input File Hadoop cluster Functions 1. Very Flexible 2. Very Scalable 3. Often Transient Output
  • Input file map reduce Output file
  • Input file map reduce Output file Input file map reduce Output file Input file map reduce Output file
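The map and reduce flow on the slides above can be sketched in a few lines of plain Python. This is a minimal illustration, not Hadoop code: the word-count task, the sample input files, and the function names `map_phase` and `reduce_phase` are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    """Reduce: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Three hypothetical input files, each held as a list of lines.
input_files = [
    ["big data in the cloud"],
    ["data in hadoop"],
    ["hadoop in the cloud"],
]

# Each file is mapped independently; on a cluster these maps would run
# in parallel on separate nodes.
mapped = [[pair for line in f for pair in map_phase(line)] for f in input_files]

# The reduce step merges all mapper output by key.
word_counts = reduce_phase(chain.from_iterable(mapped))
```

Because each map call only sees its own input, adding more input files means adding more independent map tasks, which is what lets this style of job scale out.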
  • Big Data Verticals and Use cases Media/ Advertising Targeted Advertising Image and Video Processing Oil & Gas Seismic Analysis Retail Recommendations Transactions Analysis Life Sciences Genome Analysis Financial Services Monte Carlo Simulations Risk Analysis Security Anti-virus Fraud Detection Image Recognition Social Network/ Gaming User Demographics Usage analysis In-game metrics
  • Deployment Options On-premise Cloud Managed on Cloud
  • Elastic MapReduce Manageability Scalability Cost
  • 400 GB of logs per day ~12 Terabytes per month
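The monthly figure on this slide follows from simple arithmetic. The sketch below assumes a 30-day month and decimal units (1 TB = 1000 GB); neither assumption is stated on the slide.

```python
GB_PER_DAY = 400        # from the slide
DAYS_PER_MONTH = 30     # assumption: a 30-day month
GB_PER_TB = 1000        # assumption: decimal terabytes

# 400 GB/day * 30 days = 12,000 GB, or about 12 TB per month.
tb_per_month = GB_PER_DAY * DAYS_PER_MONTH / GB_PER_TB
```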
  • 1) Load log file data for six months of user search history into Amazon S3
      Search ID | Search Text | Final Selection
      12423451 | westen | Westin
      14235235 | wisten | Westin
      54332232 | westenn | Westin
  • Amazon S3 Amazon EMR Log Files 2) Spin up a 200 node cluster Hadoop Cluster
  • 3) 200 nodes simultaneously analyze this data looking for common misspellings … this takes a few hours Hadoop Cluster Amazon S3 Amazon EMR
  • Amazon S3 Amazon EMR 4) New common misspellings and suggestions loaded back into S3 Hadoop Cluster Log Files
  • Amazon S3 Amazon EMR 5) When the job is done, the cluster is shut down. Log Files
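The analysis in steps 1 to 5 can be approximated locally to show what each node is computing. The sketch below is an assumption about the logic, not the job that was actually run: it treats a search as a misspelling when the typed text differs from the final selection, and the names `map_record`, `reduce_counts`, and `common_misspellings` are hypothetical. The three sample rows come from the slide.

```python
from collections import Counter

# Sample rows from the slide: (search ID, search text, final selection).
search_log = [
    (12423451, "westen", "Westin"),
    (14235235, "wisten", "Westin"),
    (54332232, "westenn", "Westin"),
]

def map_record(record):
    """Map: emit ((typed text, selection), 1) when the two differ."""
    _search_id, typed, selected = record
    if typed.lower() != selected.lower():
        yield (typed.lower(), selected), 1

def reduce_counts(pairs):
    """Reduce: count how often each (typed text, selection) pair occurs."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

def common_misspellings(log, min_count=1):
    """Return {typed text: suggested correction}, keeping the most
    frequent correction for each typed text."""
    pairs = (pair for record in log for pair in map_record(record))
    counts = reduce_counts(pairs)
    suggestions = {}
    for (typed, selected), n in counts.most_common():
        if n >= min_count and typed not in suggestions:
            suggestions[typed] = selected
    return suggestions

suggestions = common_misspellings(search_log)
```

On a real cluster the log would be split across the nodes, each node would run the map step on its share, and the suggestions written back to S3 would be the output of the reduce step.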
  • The Hadoop Ecosystem
  • Trends SQL on Hadoop Spark
  • Comparison table (Hadoop column filled in)
                 | Hadoop | MPP | NoSQL
      Structure  | Any | |
      Latency    | Mins-Hours | |
      Interfaces | Programming SQL-Like Tools | |
  • Background SQL Databases for analytical workloads Performance Scalability Ease of Use Cost
  • Leader Node Compute Node Compute Node Compute Node BI Tools 1. SQL 2. High Performance 3. Broad Toolset
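The leader node and compute node layout can be illustrated with a scatter-gather aggregation in plain Python. This is a conceptual sketch only: the sample rows, the round-robin distribution, and the function names are assumptions, and a real MPP database plans and executes queries very differently under the hood. The query being imitated is roughly `SELECT month, SUM(amount) ... GROUP BY month`.

```python
from collections import Counter

# Hypothetical fact rows: (month, amount).
rows = [
    ("2014-01", 10), ("2014-02", 5), ("2014-01", 7),
    ("2014-03", 3), ("2014-02", 8), ("2014-01", 1),
]

NUM_COMPUTE_NODES = 3

def distribute(rows, num_nodes):
    """Spread rows across compute nodes round-robin (one simple choice)."""
    slices = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        slices[i % num_nodes].append(row)
    return slices

def compute_node_aggregate(node_rows):
    """Each compute node sums amounts per month over its own rows only."""
    partial = Counter()
    for month, amount in node_rows:
        partial[month] += amount
    return partial

def leader_merge(partials):
    """The leader node combines the partial sums into the final answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)

slices = distribute(rows, NUM_COMPUTE_NODES)
partials = [compute_node_aggregate(s) for s in slices]
monthly_totals = leader_merge(partials)
```

The client only ever talks SQL to the leader; the parallel work on the compute nodes is what provides the performance.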
  • Deployment Options On-premise Cloud Managed on Cloud
  • Amazon Redshift Manageability Scalability Cost
  • Performance Evaluation on 2B Rows Aggregate by month Traditional SQL Database 02:08:35 00:35:46 00:00:12
  • Comparison table (Hadoop and MPP columns filled in)
                 | Hadoop | MPP | NoSQL
      Structure  | Any | Full |
      Latency    | Mins-Hours | Seconds-Minutes |
      Interfaces | Programming SQL-Like Tools | SQL BI Tools |
  • Background Databases for webscale transactions Performance Flexibility
  • Relational Table
      ID | Age | State
      123 | 20 | CA
      345 | 25 | WA
      678 | 40 | FL
    Non-Relational Table
      ID | Attributes
      123 | Age:20, State:CA
      345 | Age:25, Country: Australia, Gender: F, Smoker: No
      678 | Age:40
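The difference between the two tables can be expressed with ordinary Python data structures. This is only an illustration of the data model using the values from the slide; the helper `get_attribute` is hypothetical and no database client is involved.

```python
# Relational table: every row has the same fixed set of columns.
relational_columns = ("ID", "Age", "State")
relational_rows = [
    (123, 20, "CA"),
    (345, 25, "WA"),
    (678, 40, "FL"),
]

# Non-relational table: items are looked up by ID, and each item
# carries its own set of attributes.
non_relational_items = {
    123: {"Age": 20, "State": "CA"},
    345: {"Age": 25, "Country": "Australia", "Gender": "F", "Smoker": "No"},
    678: {"Age": 40},
}

def get_attribute(items, item_id, name, default=None):
    """Read one attribute; a missing attribute is not an error."""
    return items.get(item_id, {}).get(name, default)

# Adding an attribute to a single item needs no schema change.
non_relational_items[678]["State"] = "FL"
```

In the relational form, recording a country for one customer would mean adding a column to every row; in the non-relational form only the affected item changes.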
  • Deployment Options On-premise Cloud Managed on Cloud
  • DynamoDB Manageability Scalability Cost
  • digital advertising real-time bidding
  • Hadoop MPP NoSQL Structure Latency Interfaces Any Full Semi Mins-Hours Seconds-Minutes Sub-second Programming SQL-Like Tools SQL Programming Tools
  • Streaming Analytics
  • Data Sources App.4 [Machine Learning] AWS Endpoint App.1 [Aggregate & De-Duplicate] Data Sources Data Sources Data Sources App.2 [Metric Extraction] S3 DynamoDB Redshift App.3 [Sliding Window Analysis] Data Sources Availability Zone Availability Zone Shard 1 Shard 2 Shard N Availability Zone Amazon Kinesis EMR
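How records from many data sources end up on Shard 1 to Shard N can be sketched as follows. Kinesis maps each record's partition key to a 128-bit value with MD5 and assigns the record to the shard whose hash key range contains that value. The sketch assumes the key space is split into equal contiguous ranges, which is a simplification; the function names and sample records are hypothetical.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 yields a 128-bit value

def shard_for_key(partition_key, num_shards):
    """Return the 1-based shard number for a partition key, assuming the
    hash key space is divided into equal contiguous ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * num_shards // HASH_KEY_SPACE + 1

def route(records, num_shards):
    """Group (partition key, payload) records by their target shard."""
    shards = {n: [] for n in range(1, num_shards + 1)}
    for partition_key, payload in records:
        shards[shard_for_key(partition_key, num_shards)].append(payload)
    return shards

# Hypothetical sensor readings keyed by device name.
records = [("sensor-1", "t=20"), ("sensor-2", "t=22"), ("sensor-1", "t=21")]
shards = route(records, 3)
```

Records with the same partition key always land on the same shard, so a consuming application reading that shard sees them in order.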
  • • Sensor networks analytics • Ad network analytics • Log centralization • Click stream analysis • Hardware and software appliance metrics • …more…
  • Amazon Mobile Analytics Fast: get your data within an hour Automatic MAU, DAU, session and retention reports Design and track custom app events Data is not mined or sold by Amazon
  • Expand your skills with AWS. Certification Exams: Validate your proven technical expertise with the AWS platform (aws.amazon.com/certification). On-Demand Resources, Videos & Labs: Get hands-on practice working with AWS technologies in a live environment (aws.amazon.com/training/self-paced-labs). Instructor-Led Courses, Training Classes: Expand your technical expertise to design, deploy, and operate scalable, efficient applications on AWS (aws.amazon.com/training).
  • Big Data Tutorials aws.amazon.com/big-data Redshift Free Trial aws.amazon.com/redshift/free-trial