AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and Mobilewalla
 


Learn more about the tools, techniques and technologies for working productively with data at any scale. This session will introduce the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.

    Presentation Transcript

    • Big Data Analytics. Abhishek Sinha, Business Development Manager, AWS. July 18, 2013. @abysinha, sinhaar@amazon.com
    • Overview • The Big Data Challenge • Turning data into actionable information • Building a big data platform • Mobilewalla – Big data system in AWS for mobile app audience measurement • Intel technology on big data
    • Generation, Collection & storage, Analytics & computation, Collaboration & sharing. Annotations: lower cost, higher throughput; highly constrained
    • Generated data vs. data available for analysis. Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
    • Big Gap in turning data into actionable information
    • Amazon Web Services helps remove constraints
    • 1 instance x 100 hours = 100 instances x 1 hour
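The equivalence on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the work is perfectly parallelizable and billed per instance-hour; the hourly rate is a hypothetical placeholder, not an AWS price.

```python
def instance_hours(instances: int, hours: float) -> float:
    """Total billable instance-hours for a fleet running for a given time."""
    return instances * hours


def on_demand_cost(instances: int, hours: float, hourly_rate: float) -> float:
    """Cost of the fleet at a flat per-instance hourly rate."""
    return instance_hours(instances, hours) * hourly_rate


# Hypothetical rate, for illustration only.
HYPOTHETICAL_RATE = 0.10

serial = on_demand_cost(instances=1, hours=100, hourly_rate=HYPOTHETICAL_RATE)
parallel = on_demand_cost(instances=100, hours=1, hourly_rate=HYPOTHETICAL_RATE)

# Same instance-hours, same cost, but the parallel run finishes 100x sooner.
print(serial == parallel)
```

Under these assumptions the only thing that changes between the two runs is the wall-clock time to an answer.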
    • Big Data Verticals and Use Cases: Media/Advertising (targeted advertising, image and video processing); Oil & Gas (seismic analysis); Retail (recommendation, transactions analysis); Life Sciences (genome analysis); Financial Services (Monte Carlo simulations, risk analysis); Security (anti-virus, fraud detection, image recognition); Social Network/Gaming (user demographics, usage analysis, in-game metrics)
    • From data to actionable information
    • “Who is using our service?”
    • Finding signal in the noise of logs: identified early mobile usage; invested heavily in mobile development
    • In January 2013: 9,432,061 unique mobile devices used the Yelp mobile app; 4 million+ calls; 5 million+ directions
    • “What kind of movies do people like?”
    • More than 25 million streaming members; 50 billion events per day; 30 million plays every day; 2 billion hours of video in 3 months; 4 million ratings per day; 3 million searches; device location, time, day, week, etc.; social data
    • Query complements the R3 solution by providing granular search-and-retrieval functionality for structured and unstructured data stored in FinQloud
    • Building a Big-Data Architecture
    • Generation Collection & storage Analytics & computation Collaboration & sharing
    • 1. Getting your data into AWS (corporate data center to Amazon S3): Console Upload, FTP, AWS Import/Export, S3 API, Direct Connect, Storage Gateway, 3rd-party commercial apps, Tsunami UDP
    • 2. Write directly to a data source: your application (Amazon EC2) writes to Amazon S3, DynamoDB, or any other data store
    • 3. Queue, pre-process, and then write to data source: Amazon Simple Queue Service (SQS), then Amazon S3, DynamoDB, or any other data store
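The queue, pre-process, and write pattern can be sketched in-process. This is only an illustration of the flow: `queue.Queue` stands in for an SQS queue and a plain dict stands in for S3 or DynamoDB, and the `device_id,action` log format, the `preprocess` and `drain` helpers, and the key naming are all hypothetical.

```python
import json
import queue


def preprocess(raw: str) -> dict:
    """Parse one raw log line of the form 'device_id,action' into a record."""
    device_id, action = raw.strip().split(",")
    return {"device_id": device_id, "action": action.lower()}


def drain(q: queue.Queue, store: dict) -> int:
    """Pull every queued message, pre-process it, and write it to the store."""
    written = 0
    while True:
        try:
            raw = q.get_nowait()
        except queue.Empty:
            break
        record = preprocess(raw)
        key = f"events/{written:06d}.json"  # hypothetical object key layout
        store[key] = json.dumps(record)
        written += 1
    return written


buffer = queue.Queue()  # stand-in for an SQS queue
data_store = {}         # stand-in for S3, DynamoDB, or another data store

# Producers enqueue raw events; a worker drains and persists them later.
for line in ["dev-1,CALL\n", "dev-2,Directions\n"]:
    buffer.put(line)

count = drain(buffer, data_store)
print(count, sorted(data_store))
```

The queue decouples producers from the writer, so bursts of incoming events are absorbed while the pre-processing step runs at its own pace.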
    • Agency customer: video analytics on AWS. Components: Elastic Load Balancer, edge servers on EC2, workers on EC2, logs, reports, HDFS cluster, Amazon Simple Queue Service (SQS), Amazon Simple Storage Service (S3), Amazon Elastic MapReduce
    • 4. Aggregate and write to data source: Flume running on EC2, then Amazon S3, HDFS, or any other data store
    • S3 as a “single source of truth”. Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
    • Choose depending upon design: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools
    • Generation Collection & storage Analytics & computation Collaboration & sharing
    • Hadoop-based analysis: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR
    • What is Amazon Elastic MapReduce (EMR)? EMR is Hadoop in the cloud
    • How does EMR work? Put the data into S3. Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc. Launch the cluster using the EMR console, CLI, SDK, or APIs. Get the output from S3. You can also store everything in HDFS
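Since EMR runs Hadoop, the jobs it executes follow the map, shuffle, and reduce model. The sketch below runs that model in a single Python process to show the shape of such a job; it is not the EMR or Hadoop API, and the `device_id,action` input format and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line: str):
    """Emit (action, 1) for one log line of the form 'device_id,action'."""
    _device_id, action = line.split(",")
    yield action, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce over the input lines and return totals."""
    pairs = [kv for line in lines for kv in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


logs = ["dev-1,call", "dev-2,directions", "dev-3,call"]
result = run_job(logs)
print(result)
```

On a real cluster the map and reduce functions run on many nodes in parallel, with input read from and output written to S3 or HDFS as the slide describes.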
    • S3 What can you run on EMR… EMR Cluster
    • Resize nodes: you can easily add and remove nodes in an EMR cluster
    • On and Off, Fast Growth, Predictable Peaks, Variable Peaks: WASTE
    • On and Off, Fast Growth, Predictable Peaks, Variable Peaks
    • Your choice of tools on Hadoop/EMR: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR
    • SQL-based processing: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR (pre-processing framework), Amazon Redshift (petabyte-scale columnar data warehouse)
    • What is Amazon Redshift? Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud • Easy to provision and scale • No upfront costs, pay as you go • High performance at a low price • Open and flexible with support for popular BI tools
    • Amazon Redshift is priced to let you analyze all your data. Simple pricing: number of nodes x cost per hour; no charge for leader node; no upfront costs; pay as you go. HS1.XL single node (price per hour / effective hourly price per TB / effective annual price per TB): On-Demand $0.850 / $0.425 / $3,723; 1 Year Reservation $0.500 / $0.250 / $2,190; 3 Year Reservation $0.228 / $0.114 / $999
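The effective per-TB figures on this slide follow from the node price. A minimal sketch, assuming 2 TB of storage per HS1.XL node (implied by $0.850 per node-hour giving $0.425 per TB-hour) and 8,760 hours in a year; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760
TB_PER_NODE = 2            # assumption implied by the slide's per-TB figures


def effective_prices(node_price_per_hour: float):
    """Return (hourly price per TB, annual price per TB) for one node price."""
    per_tb_hour = node_price_per_hour / TB_PER_NODE
    per_tb_year = per_tb_hour * HOURS_PER_YEAR
    return per_tb_hour, per_tb_year


def cluster_cost_per_hour(nodes: int, node_price_per_hour: float) -> float:
    """Simple pricing from the slide: number of nodes x cost per hour."""
    return nodes * node_price_per_hour


# Per-node hourly prices as listed on the slide.
plans = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}

table = {name: effective_prices(price) for name, price in plans.items()}
for name, (per_tb_hour, per_tb_year) in table.items():
    print(f"{name}: ${per_tb_hour:.3f} per TB-hour, ${per_tb_year:,.0f} per TB-year")
```

Rounded to whole dollars, the annual figures reproduce the $3,723, $2,190, and $999 shown on the slide.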
    • Your choice of BI tools on the cloud: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR (pre-processing framework), Amazon Redshift
    • Generation Collection & storage Analytics & computation Collaboration & sharing
    • Collaboration and sharing insights: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift
    • Sharing results and visualizations: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, web app server, visualization tools
    • Sharing results and visualizations at scale: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, web app server, visualization tools
    • Sharing results and visualizations: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, business intelligence tools
    • Geospatial visualizations: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, business intelligence tools, GIS tools on Hadoop, GIS tools, visualization tools
    • Rinse and repeat every day or hour
    • Rinse and repeat: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, visualization tools, business intelligence tools, GIS tools on Hadoop, GIS tools, AWS Data Pipeline
    • The complete architecture: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, visualization tools, business intelligence tools, GIS tools on Hadoop, GIS tools, AWS Data Pipeline
    • Mobilewalla – App Audience Measurement With Amazon EC2 Infrastructure. Kaushik Dutta, CTO. 18 July, 2013
    • Mobilewalla • Seattle-based big data venture that has accumulated the largest volumetric database of app market data in the industry. • Applying data science techniques on this data, Mobilewalla generates actionable intelligence of importance to ad agencies, ad tech companies, and app publishers • Measuring audience in mobile apps
    • Traditional audience measurement: panels and popularity persistence • Fundamental to panel-driven measurement: the idea of popularity persistence • Large pool of options, “small” set of popular choices (99 – 1 rule) • Objects popular today are popular 30-60-90 days from today • Panel can be assumed to eventually gravitate towards the persistent popular set
    • Mobilewalla Use Case – App Publishers • How is my app doing? – Rank by Category and Country, Reviews, Ratings, Feature mentions, Sentiment Analysis, Social Media, Audience Profile, Negative Review Analysis, Upgrades • Competitive Tracking – All of the above for competitors presented as overlays • Audience Analysis – Demographics, Psychographics • Alerts – Notifications upon specific events: review spikes, Twitter spikes
    • Mobilewalla Use Case – Mobile Ad Tech • New Publisher Acquisition – Top N apps & Publishers for a Category / Geography – Top publishers by audience • Optimal Traffic Allocation – Related apps by content – Related apps by Audience profile – Behavioral profiles of network apps • Real-Time, Programmatic Delivery – API driven access – Sub 100ms response times
    • Mobilewalla Approach: social media / web, web crawler, cloud storage (Amazon S3, Amazon EBS, Amazon RDS)
    • Mobilewalla Approach – Map-Reduce based analytics: Map Reduce analytics over cloud storage (30+ terabytes) on Amazon S3, Amazon EBS, Amazon RDS
    • Mobilewalla – Amazon EC2 Infrastructure: Web Crawler • 700+ micro to small instances • Elastic MapReduce – flexibility of allocating a large number of instances for a distributed program running for a short time • Spot Instances – reduce the cost
    • Mobilewalla – Amazon EC2 Infrastructure: Cloud Storage • 50+ medium to large instances • Cassandra DB nodes – EBS backed • Distributed in two availability zones in two different geographical regions • Flexibility to add nodes as and when required – allows you to grow with the business • Region-based fail-over • Tiered storage systems – Local storage – Elastic Block Store – S3 storage • Considering Amazon Redshift
    • Mobilewalla – Amazon EC2 Infrastructure: Map Reduce Framework • Complex analytics jobs on Hadoop systems in EC2 nodes • Elastic MapReduce for jobs requiring a large number of nodes on S3 storage systems
    • Mobilewalla – Amazon EC2 Infrastructure Analytics Delivery • Multiple application servers with load balancers • High read throughput from data nodes • Load balancers (ELB) and fail-over
    • Amazon Web Services for Mobilewalla - Advantages • On-Demand and reserved nodes – Flexibility to add, modify, delete nodes as your business changes • Tiered storage systems to store and manage terabytes of data – Flexibility to change the data parameters (reliability, read throughput, write throughput) by varying the storage systems of your choice • Elastic MapReduce – Large-scale map-reduce cluster without getting into the details of managing individual nodes and the map-reduce framework • Amazon EC2 allowed us to size our infrastructure as per our need and data growth.
    • Amazon Web Services for Mobilewalla - Suggestions • Take time up front to explore Amazon's various data storage and management offerings before developing a solution • Changing the solution architecture for terabytes of data at a later time is a challenge
    • Thank You
    • Big Data Analytics. Eddie Toh, Regional Platform Marketing Manager, Pricing & Product Marketing Group, Intel APAC. July 18, 2013
    • Analysis of Data Can Transform Society: Create new business models and improve organizational processes. Enhance scientific understanding, drive innovation, and accelerate medical cures. Increase public safety and improve energy efficiency with smart grids.
    • Democratizing analytics gets value out of big data: unlock value in silicon; support open platforms; deliver software value
    • Intel at the Intersection of Big Data: HPC (enabling exascale computing on massive data sets); Cloud (helping enterprises build open interoperable clouds); Open Source (contributing code and fostering ecosystem)
    • Intel at the Heart of the Cloud: server, storage, network
    • Scale-Out Platform Optimizations for Big Data: cost-effective performance • Intel® Advanced Vector Extensions Technology • Intel® Turbo Boost Technology 2.0 • Intel® Advanced Encryption Standard New Instructions Technology
    • Intel® Advanced Vector Extensions Technology • Newest in a long line of processor instruction innovations • Increases floating point operations per clock up to 2X performance (1). (1) Performance comparison using Linpack benchmark. See backup for configuration details. For more legal information on performance forecasts go to http://www.intel.com/performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
    • Intel® Turbo Boost Technology 2.0: more performance. Higher turbo speeds maximize performance for single and multi-threaded applications
    • Intel® Advanced Encryption Standard New Instructions • Processor assistance for performing AES encryption - 7 new instructions • Makes enabled encryption software faster and stronger
    • Power of the Platform built by Intel: TeraSort for 1TB sort, from 4 hrs on the previous Intel® Xeon® processor to ~7 min. Reductions shown: 50%, 80%, 50%, 40%. Platform components: Intel® Xeon® Processor E5 2600, solid-state drive, 10G Ethernet, Intel® Distribution for Apache Hadoop. Richer user experiences
    • Virtuous Cycle of Data-Driven Experience: cloud, intelligent systems, clients
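The TeraSort figures on this slide are consistent with one another if the four reductions compound one after another. That reading is an assumption, since the slide's chart is not recoverable from the transcript; the check below simply applies each listed reduction to the running time in turn.

```python
def apply_reductions(start_minutes: float, reductions) -> float:
    """Apply each fractional reduction to the remaining time, in sequence."""
    remaining = start_minutes
    for reduction in reductions:
        remaining *= 1 - reduction
    return remaining


baseline_minutes = 4 * 60               # "4HRS" on the slide
reductions = [0.50, 0.80, 0.50, 0.40]   # the four reductions as listed

final_minutes = apply_reductions(baseline_minutes, reductions)
print(f"{final_minutes:.1f} minutes")
```

Under that assumption the result lands close to the "~7MIN" shown on the slide.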
    • Thank You
    • Technical Track
    • Break Technical Track