Data-driven Innovation - Wood
Presentation Transcript

  • Data-driven innovation. Dr. Matt Wood, matthew@amazon.com, @mza
  • Hello
  • Data
  • DNA
  • Chromosome 11 : ACTN3 : rs1815739
  • Chromosome X : rs6625163
  • Chromosome 19 : FUT2 : rs601338
  • Chromosome 2 : rs10427255
  • Chromosome 10 : rs7903146 : TYPE II
  • Chromosome 15 : rs2472297 : +0.25
  • I know this, because...
  • ATCGGTCCAGG
  • [Diagram: paired DNA bases (A-T, C-G) and their transcription to RNA, where U replaces T]
  • [Diagram: DNA transcription to RNA, then translation to amino acids (Ser, Glu, Val)]
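The transcription and translation steps on these slides can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code: the input sequence is hypothetical (it is not read off the slides), and the codon table holds only the three codons needed to produce the Ser, Glu and Val labels shown in the diagram, where the full genetic code has 64 entries.

```python
# Partial codon table: only the three codons this example uses.
CODON_TABLE = {
    "UCG": "Ser",
    "GAG": "Glu",
    "GUG": "Val",
}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand: same bases, with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time and look up each codon."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE.get(codon, "?") for codon in codons]


dna = "TCGGAGGTG"  # hypothetical sequence chosen for illustration
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # UCGGAGGUG
print(protein)  # ['Ser', 'Glu', 'Val']
```

Codons missing from the partial table come back as "?", which makes it obvious when the sketch is used beyond its three-codon scope.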
  • Chromosome 11 : ACTN3 : rs1815739
  • Chromosome X : rs6625163
  • Chromosome 19 : FUT2 : rs601338
  • Chromosome 2 : rs10427255
  • Chromosome 10 : rs7903146 : TYPE II
  • Chromosome 15 : rs2472297 : +0.25
  • I know all that, because...
  • Human Genome Project
  • 40 species: ensembl.org
  • Compare
  • Change
  • Less
  • Compare
  • Transformative
  • Data generation costs are falling everywhere
  • Customer segmentation, financial modeling, system analysis, line of sight, business intelligence.
  • Opportunity
  • Transformation
  • Innovation
  • Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  • Generation (lower cost, increased throughput) → Collection & storage → Analytics & computation → Collaboration & sharing
  • Generation (lower cost, increased throughput) → highly constrained: Collection & storage → Analytics & computation → Collaboration & sharing
  • Barrier
  • Data generation X challenge
  • Analytics challenge
  • Accessibility challenge
  • Enter the AWS Cloud
  • Utility
  • Remove constraints
  • Data-driven innovation
  • Distributed
  • 2
  • 2: Software for distributed storage & analysis
  • 2: Software for distributed storage & analysis. Infrastructure for distributed storage & analysis
  • Software: Frameworks for data-intensive workloads. Distributed by design.
  • Infrastructure: Platform for data-intensive workloads. Distributed by design.
  • Support the data timeline
  • Generation → highly constrained: Collection & storage → Analytics & computation → Collaboration & sharing
  • Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  • Lower the barrier to entry
  • Agility
  • Responsive
  • Generation → Collection & storage → Analytics & computation → Collaboration & sharing
  • Generation → DynamoDB → Analytics & computation → Collaboration & sharing
  • Generation → DynamoDB → EC2, Elastic MapReduce → Collaboration & sharing
  • Generation → DynamoDB → EC2, Elastic MapReduce → S3, Public Datasets
  • Tools and techniques for working productively with data
  • Scale
  • Secure
  • 2: Software for distributed storage & analysis. Infrastructure for distributed storage & analysis
  • Amazon EC2
  • Scale-out systems. Embarrassingly parallel. Queue-based distribution. Small, medium and high scale.
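Queue-based distribution of an embarrassingly parallel job can be sketched as follows. This is an illustrative stand-in only: in-process threads and Python's `queue.Queue` play the role of what would be a fleet of EC2 instances pulling from a shared queue, and the per-task function (GC fraction of a sequence) is a hypothetical workload, not something from the talk.

```python
import queue
import threading


def gc_fraction(sequence):
    """Fraction of G and C bases in one sequence; each task is independent."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def worker(tasks, results):
    """Pull tasks from the shared queue until none are left."""
    while True:
        try:
            index, sequence = tasks.get_nowait()
        except queue.Empty:
            return  # queue drained, this worker is done
        results[index] = gc_fraction(sequence)


sequences = ["ATCGGTCCAGG", "GGCC", "ATAT", "ACGT"]

# Load every task onto the queue before any worker starts.
tasks = queue.Queue()
for item in enumerate(sequences):
    tasks.put(item)

# Each worker writes to its own slot, so no lock is needed for results.
results = [None] * len(sequences)
workers = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(3)]
for thread in workers:
    thread.start()
for thread in workers:
    thread.join()

print(results)
```

Because no task depends on another, adding workers changes only how quickly the queue drains, which is what lets the same pattern run at small, medium or high scale.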
  • High performance
  • High performance: Compute performance
  • Cluster Compute: Intel Xeon E5-2670. 10 gigabit, non-blocking network. 60.5 GB. Placement groupings.
  • Cluster Compute: Intel Xeon E5-2670. 10 gigabit, non-blocking network. 60.5 GB. Placement groupings. + GPU
  • 240 TFLOPS
  • High performance: Compute performance, IO performance
  • Unstructured
  • Variable
  • Amazon DynamoDB: Predictable, consistent performance. Unlimited storage. Single-digit millisecond latencies. No schema. Zero admin.
  • ...and SSDs for all
  • hi1.4xlarge: 2 x 1 TB SSD storage. 10 gigabit networking. HVM: 90k IOPS read, 9k to 75k write. PV: 120k IOPS read, 10k to 85k write.
  • “The hi1.4xlarge configuration is about half the system cost for the same throughput.” Netflix, http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
  • Provisioned IOPS: Provision required IO performance. EBS-optimized instances.
  • Cost optimization
  • Reserved capacity
  • On-demand. Reserved capacity
  • Spot instances
  • $0.2530 vs $2.40
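The size of the gap between those two prices can be worked out directly. The sketch below assumes that both figures are USD per instance-hour and that the comparison is Spot versus on-demand for the same instance type; the slide states neither. The 100-instance, 10-hour workload is hypothetical.

```python
# Prices as shown on the slide; units (USD per instance-hour) are an assumption.
spot_price = 0.2530
on_demand_price = 2.40

ratio = spot_price / on_demand_price   # Spot cost as a fraction of on-demand
savings = 1 - ratio                    # fraction saved by using Spot

print(f"Spot is {ratio:.1%} of the on-demand price, a saving of {savings:.1%}")

# Hypothetical workload: 100 instances running for 10 hours each.
instance_hours = 100 * 10
spot_total = spot_price * instance_hours
on_demand_total = on_demand_price * instance_hours
print(f"Spot total:      ${spot_total:,.2f}")
print(f"On-demand total: ${on_demand_total:,.2f}")
```

Under these assumptions the Spot price is roughly a tenth of the on-demand price, so the saving scales linearly with the number of instance-hours consumed.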
  • 2: Software for distributed storage & analysis. Infrastructure for distributed storage & analysis
  • map/reduce
  • Map. Reduce.
  • Write functions. Scale up.
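The "write functions" idea can be sketched as a map function and a reduce function plus a tiny single-process driver. This is only an illustration of the programming model: the driver stands in for what a framework such as Hadoop does across a cluster, and the workload (counting 3-base windows in short reads) and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_fn(read):
    """Map: emit a (key, 1) pair for every 3-base window in one read."""
    return [(read[i:i + 3], 1) for i in range(len(read) - 2)]


def reduce_fn(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process driver: map every record, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map(map_fn, records)):
        groups[key].append(value)  # the "shuffle" step: group values by key
    return dict(reduce_fn(key, values) for key, values in groups.items())


reads = ["ATCGGTCCAGG", "GGTCC"]
counts = run_mapreduce(reads, map_fn, reduce_fn)
print(counts)
```

Only `map_fn` and `reduce_fn` are specific to the problem. Since each read is mapped independently and each key is reduced independently, a framework can spread both phases over many machines without changing those two functions.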
  • Hadoop
  • Undifferentiated heavy lifting
  • Amazon Elastic MapReduce: Managed Hadoop clusters. Easy to provision and monitor. Write two functions. Scale up. Choice of Hadoop flavors.
  • Amazon Elastic MapReduce: Integrates with S3. Analytics for DynamoDB. Perfect for Spot pricing.
  • S3: Input data
  • S3: Input data, Code → Elastic MapReduce
  • S3: Input data, Code → Elastic MapReduce → Name node
  • S3: Input data, Code → Elastic MapReduce → Name node → Elastic cluster
  • S3: Input data, Code → Elastic MapReduce → Name node → Elastic cluster (HDFS)
  • S3: Input data, Code → Elastic MapReduce → Name node → Elastic cluster (HDFS). Queries + BI via JDBC, Pig, Hive
  • S3: Input data, Code → Elastic MapReduce → Name node → Elastic cluster (HDFS). Queries + BI via JDBC, Pig, Hive. Output: S3 + SimpleDB
  • S3: Input data → Output: S3 + SimpleDB
  • CDC: Centers for Disease Control and Prevention
  • “BioSense 2.0 protects the health of the American people by providing timely insight into the health of communities, regions, and the nation by offering a variety of features to improve data collection, standardization, storage, analysis, and collaboration”
  • Health data → Collection & storage → Analytics & computation → Collaboration & sharing
  • Health data → highly constrained: Collection & storage → Analytics & computation → Collaboration & sharing
  • HIPAA, HITECH, FISMA Moderate
  • GovCloud
  • Beyond a definition of Big Data
  • Chromosome 11 : ACTN3 : rs1815739
  • Chromosome X : rs6625163
  • Chromosome 19 : FUT2 : rs601338
  • Chromosome 2 : rs10427255
  • Chromosome 10 : rs7903146 : TYPE II
  • Chromosome 15 : rs2472297 : +0.25
  • Thank you. matthew@amazon.com, aws.amazon.com, @mza