Data-driven Innovation - Wood

Published in: Technology

  1. 1. Data-driven innovation. Dr. Matt Wood, matthew@amazon.com, @mza
  2. 2. Hello
  3. 3. Hello
  4. 4. Data
  5. 5. DNA
  6. 6. Chromosome 11 : ACTN3 : rs1815739
  7. 7. Chromosome X : rs6625163
  8. 8. Chromosome 19 : FUT2 : rs601338
  9. 9. Chromosome 2 : rs10427255
  10. 10. Chromosome 10 : rs7903146 : TYPE II
  11. 11. Chromosome 15 : rs2472297 : +0.25
  12. 12. I know this, because...
  13. 13. ATCGGTCCAGG
  14. 14. [Diagram: paired DNA bases, with one strand copied to RNA. Label: Transcription]
  15. 15. [Diagram: paired DNA bases copied to RNA and read as the amino acids Ser, Glu, Val. Labels: Transcription, Translation]
  16. 16. Chromosome 11 : ACTN3 : rs1815739
  17. 17. Chromosome X : rs6625163
  18. 18. Chromosome 19 : FUT2 : rs601338
  19. 19. Chromosome 2 : rs10427255
  20. 20. Chromosome 10 : rs7903146 : TYPE II
  21. 21. Chromosome 15 : rs2472297 : +0.25
  22. 22. I know all that, because...
  23. 23. Human Genome Project
  24. 24. 40 species ensembl.org
  25. 25. Compare
  26. 26. Change
  27. 27. Less
  28. 28. Compare
  29. 29. Transformative
  30. 30. Data generation costs are falling everywhere
  31. 31. Customer segmentation, financial modeling, system analysis, line of sight, business intelligence.
  32. 32. Opportunity
  33. 33. Transformation
  34. 34. Innovation
  35. 35. Generation | Collection & storage | Analytics & computation | Collaboration & sharing
  36. 36. lower cost, increased throughput | Generation | Collection & storage | Analytics & computation | Collaboration & sharing
  37. 37. lower cost, increased throughput | Generation | highly constrained | Collection & storage | Analytics & computation | Collaboration & sharing
  38. 38. Barrier
  39. 39. Data generation X challenge
  40. 40. Analytics challenge
  41. 41. Accessibility challenge
  42. 42. Enter the AWS Cloud
  43. 43. Utility
  44. 44. Remove constraints
  45. 45. Data-driven innovation
  46. 46. Distributed
  47. 47. 2
  48. 48. Software for distributed storage & analysis 2
  49. 49. Software for distributed storage & analysis 2 | Infrastructure for distributed storage & analysis
  50. 50. Software: Frameworks for data-intensive workloads. Distributed by design.
  51. 51. Infrastructure: Platform for data-intensive workloads. Distributed by design.
  52. 52. Support the data timeline
  53. 53. Generation | highly constrained | Collection & storage | Analytics & computation | Collaboration & sharing
  54. 54. Generation | Collection & storage | Analytics & computation | Collaboration & sharing
  55. 55. Lower the barrier to entry
  56. 56. Agility
  57. 57. Responsive
  58. 58. Generation | Collection & storage | Analytics & computation | Collaboration & sharing
  59. 59. Generation | DynamoDB | Analytics & computation | Collaboration & sharing
  60. 60. Generation | DynamoDB | EC2, Elastic MapReduce | Collaboration & sharing
  61. 61. Generation | DynamoDB | EC2, Elastic MapReduce | S3, Public Datasets
  62. 62. Tools and techniques for working productively with data
  63. 63. Scale
  64. 64. Secure
  65. 65. Software for distributed storage & analysis 2 | Infrastructure for distributed storage & analysis
  66. 66. Amazon EC2
  67. 67. Scale-out systems | Embarrassingly parallel | Queue-based distribution | Small, medium and high scale
  68. 68. High performance
  69. 69. Compute performance | High performance
  70. 70. Cluster Compute | Intel Xeon E5-2670 | 10 gigabit, non-blocking network | 60.5 Gb | Placement groupings
  71. 71. Cluster Compute | Intel Xeon E5-2670 | 10 gigabit, non-blocking network | 60.5 Gb | Placement groupings | +GPU
  72. 72. 240 TFLOPS
  73. 73. Compute performance | High performance | IO performance
  74. 74. Unstructured
  75. 75. Variable
  76. 76. Amazon DynamoDB | Predictable, consistent performance | Unlimited storage | Single digit millisecond latencies | No schema. Zero admin.
  77. 77. ...and SSDs for all
  78. 78. hi1.4xlarge | 2 x 1 TB SSD storage | 10 gigabit networking | HVM: 90k IOPS read, 9k to 75k write | PV: 120k IOPS read, 10k to 85k write
  79. 79. “The hi1.4xlarge configuration is about half the system cost for the same throughput.” Netflix, http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
  80. 80. Provisioned IOPS | Provision required IO performance | EBS optimized instances
  81. 81. Cost optimization
  82. 82. Reserved capacity
  83. 83. On-demand | Reserved capacity
  84. 84. On-demand | Reserved capacity
  85. 85. Spot instances
  86. 86. $0.2530 vs $2.40
  87. 87. Software for distributed storage & analysis 2 | Infrastructure for distributed storage & analysis
  88. 88. map/reduce
  89. 89. Map. Reduce.
  90. 90. Write functions. Scale up.
  91. 91. Hadoop
  92. 92. Undifferentiated heavy lifting
  93. 93. Amazon Elastic MapReduce | Managed Hadoop clusters | Easy to provision and monitor | Write two functions. Scale up. | Choice of Hadoop flavors
  94. 94. Amazon Elastic MapReduce | Integrates with S3 | Analytics for DynamoDB | Perfect for Spot pricing
  95. 95. S3: Input data
  96. 96. S3: Input data | Code | Elastic MapReduce
  97. 97. S3: Input data | Code | Elastic MapReduce | Name node
  98. 98. S3: Input data | Code | Elastic MapReduce | Name node | Elastic cluster
  99. 99. S3: Input data | Code | Elastic MapReduce | Name node | Elastic cluster | HDFS
  100. 100. S3: Input data | Code | Elastic MapReduce | Name node | Elastic cluster | HDFS | Queries + BI via JDBC, Pig, Hive
  101. 101. S3: Input data | Code | Elastic MapReduce | Name node | Elastic cluster | HDFS | Queries + BI via JDBC, Pig, Hive | Output: S3 + SimpleDB
  102. 102. S3: Input data | Output: S3 + SimpleDB
  103. 103. CDC: Centers for Disease Control and Prevention
  104. 104. “BioSense 2.0 protects the health of the American people by providing timely insight into the health of communities, regions, and the nation by offering a variety of features to improve data collection, standardization, storage, analysis, and collaboration”
  105. 105. Health data | Collection & storage | Analytics & computation | Collaboration & sharing
  106. 106. Health data | highly constrained | Collection & storage | Analytics & computation | Collaboration & sharing
  107. 107. HIPAA, HITECH, FISMA Moderate
  108. 108. GovCloud
  109. 109. Beyond a definition of Big Data
  110. 110. Chromosome 11 : ACTN3 : rs1815739
  111. 111. Chromosome X : rs6625163
  112. 112. Chromosome 19 : FUT2 : rs601338
  113. 113. Chromosome 2 : rs10427255
  114. 114. Chromosome 10 : rs7903146 : TYPE II
  115. 115. Chromosome 15 : rs2472297 : +0.25
  116. 116. Thank you. matthew@amazon.com | aws.amazon.com | @mza
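
Slides 89 to 93 describe map/reduce as writing two functions and letting the framework scale them out. The following is a minimal local sketch of that idea, not code from the deck: it counts bases in the sequence shown on slide 13. The names `map_read`, `reduce_counts`, and `run_local` are hypothetical, and `run_local` only stands in for the grouping and distribution that a framework such as Hadoop would presumably handle across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_read(read):
    """Map step: emit a (base, 1) pair for every base in one read."""
    return [(base, 1) for base in read]


def reduce_counts(base, counts):
    """Reduce step: sum all counts emitted for one base."""
    return base, sum(counts)


def run_local(reads):
    """Hypothetical single-process stand-in for the framework:
    run the map step, group values by key, then run the reduce step."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(map_read(r) for r in reads):
        grouped[key].append(value)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Sequence taken from slide 13 of the deck.
    print(run_local(["ATCGGTCCAGG"]))
```

Only the two small functions carry the analysis logic. In this sketch everything else lives in `run_local`, which is the part a managed cluster would replace.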
