• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Big Data Analytics with Amazon Web Services
 

Big Data Analytics with Amazon Web Services

on

  • 499 views

 

Statistics

Views

Total Views
499
Views on SlideShare
499
Embed Views
0

Actions

Likes
0
Downloads
15
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Big Data Analytics with Amazon Web Services Big Data Analytics with Amazon Web Services Presentation Transcript

    • Big Data Analyticswith Amazon Web Services Dr. Matt Wood An Online Seminar. Tuesday 16th October.
    • Hello, and thank you.
    • Big Data Analytics An introduction
    • Big Data Analytics An introduction The story of analytics on AWS
    • Big Data Analytics An introduction The story of analytics on AWS AWS Marketplace
    • Big Data Analytics An introduction The story of analytics on AWS AWS Marketplace Success story: Brightcove
    • 1INTRODUCING BIG DATA
    • Data for competitive advantage.
    • Using data Customer segmentation, financial modeling, system analysis, line-of-sight, business intelligence.
    • Generation Collection & storageAnalytics & computationCollaboration & sharing
    • Cost of data generation is falling.
    • lower cost,increased throughput Generation Collection & storage Analytics & computation Collaboration & sharing
    • Generation HIGHLY CONSTRAINED Collection & storageAnalytics & computationCollaboration & sharing
    • Very high barrier to turning data into information.
    • Move from adata generation challenge to analytics challenge.
    • Enter the Cloud.
    • Remove the constraints.
    • Enable data-driven innovation.
    • Move to a distributed data approach.
    • Maturation of two things.
    • Software for distributed storage and analysisMaturation of two things.
    • Software for distributed storage and analysisMaturation of two things. Infrastructure for distributed storage and analysis
    • Software Frameworks for data-intensive workloads. Distributed by design.
    • Infrastructure Platform for data-intensive workloads. Distributed by design.
    • Support thedata timeline.
    • Generation HIGHLY CONSTRAINED Collection & storageAnalytics & computationCollaboration & sharing
    • Generation Collection & storageAnalytics & computationCollaboration & sharing
    • Lower thebarrier to entry.
    • Accelerate time to market and increase agility.
    • Enable new business opportunities.
    • Washington Post Pinterest NASA
    • “AWS enables Pfizer to exploredifficult or deep scientificquestions in a timely, scalablemanner and helps us make betterdecisions more quickly”Michael Miller, Pfizer
    • 2THE STORY OF ANALYTICS
    • EC2Utility computing. 6 years young.
    • Scale out systems Embarrassingly parallel problems. Queue based distribution. Small, medium and high scale.
    • Cost optimization. EC2Utility computing. 6 years young.
    • Achieving economies of scale100% Time
    • Achieving economies of scale100% Reserved capacity Time
    • Achieving economies of scale100% On-demand Reserved capacity Time
    • Achieving economies of scale UNUSED CAPACITY100% On-demand Reserved capacity Time
    • Spot Instances Bid on unused EC2 capacity. Very large discount. Perfect for batch runs. Balance cost and scale.
    • <$1000 per hour
    • Map/reduce Pattern for distributed computing. Software frameworks such as Hadoop. Write two functions. Scale up.
    • Map/reduce Pattern for distributed computing. Software frameworks such as Hadoop. Write two functions. Scale up. Complex cluster configuration and management.
    • Amazon Elastic MapReduce Managed Hadoop clusters. Easy to provision and monitor. Write two functions. Scale up. Optimized for S3 access.
    • S3Input data
    • S3 Input dataCode Elastic MapReduce
    • S3 Input dataCode Elastic Name MapReduce node
    • S3 Input dataCode Elastic Name MapReduce node Elastic cluster
    • S3 Input dataCode Elastic Name MapReduce node HDFS Elastic cluster
    • S3 Input dataCode Elastic Name MapReduce node Queries HDFS + BI Via JDBC, Pig, Hive Elastic cluster
    • S3 Input dataCode Elastic Name Output MapReduce node S3 + SimpleDB Queries HDFS + BI Via JDBC, Pig, Hive Elastic cluster
    • S3Input data Output S3 + SimpleDB
    • Performance
    • Performance Compute performance
    • Cluster Compute Intel Xeon E5-2670 10 gig E non-blocking network 60.5 Gb Placement groupings
    • Cluster Compute Intel Xeon E5-2670 10 gig E non-blocking network 60.5 Gb Placement groupings + GPU enabled instances
    • Performance Compute performance
    • IO performancePerformance Compute performance
    • NoSQLUnstructured data storage.
    • DynamoDB Predictable, consistent performance Unlimited storage Single digit millisecond latencies No schema for unstructured data Backed on solid state drives
    • ...and SSDs for all. New Hi1 storage instances.
    • hi1.4xlarge 2 x 1Tb SSDs 10 GigE network HVM: 90k IOPS read, 9k to 75k write PV: 120k IOPS read, 10k to 85k write
    • “The hi1.4xlarge configuration isabout half the system cost for thesame throughput.”Netflixhttp://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
    • Generation Collection & storageAnalytics & computationCollaboration & sharing
    • Performance + ease of use
    • 3AWS MARKETPLACE
    • Extend platform with partners
    • Innovate on behalf of customers
    • Remove undifferentiated heavy lifting
    • AWS Marketplaceaws.amazon.com/marketplace
    • Generation Collection & storageAnalytics & computationCollaboration & sharing
    • Generation Collection & storageAnalytics & computationCollaboration & sharing
    • Collection & storage Acunu Reflex Apache Cassandra NoSQL database MongoDB With and without EBS RAID storage Couchbase Community and Enterprise editions ScaleArc MySQL load balancing
    • Generation Collection & storageAnalytics & computationCollaboration & sharing
    • Generation Collection & storageAnalytics & computationCollaboration & sharing
    • Analytics & computation KarmaSphere Analytics for Amazon Elastic MapReduce MapR M5 Hadoop Distribution Metamarkets Event based data processing
    • Analytics & computation StackIQ Rocks+ HPC clusters with MPI, Grid Engine Univa Grid Engine One click cluster deployment Quantivo Data association analytics
    • Generation Collection & storageAnalytics & computationCollaboration & sharing
    • Generation Collection & storageAnalytics & computationCollaboration & sharing
    • Collaboration & sharingAspera Faspex 20 Mbps data transfer
    • 4SUCCESS STORY