Your SlideShare is downloading. ×
0
Friday, April 23, 2010
Moving from Narrative to Design
           with Open Source and Open Data

           Jeff Hammerbacher
           Chief S...
Presentation Outline
          ▪   Who am I and what am I talking about?
              ▪   My Background
              ▪  ...
My Background
          Thanks for Asking
          ▪   hammer@cloudera.com
          ▪   Studied Mathematics at Harvard
 ...
Friday, April 23, 2010
“Genomic Advances of the 2000s Will
          Demand an Informatics Revolution in the
          2010s”
                   ...
From Narrative to Design
          Learning from past revolutions
          ▪   Leaving technology aside, once you have a ...
Two Stories
Friday, April 23, 2010
Friday, April 23, 2010
Finance
          A Cautionary Tale
          ▪   Strengths
              ▪   Complex models
              ▪   Low latency...
Friday, April 23, 2010
The Web
          A Success Story
          ▪   Strengths
              ▪   Open data
              ▪   Open source
      ...
Another Story
Friday, April 23, 2010
Friday, April 23, 2010
Extreme Neuroanatomy
          DIY Brain Imaging
          ▪   Built by Ken Hayworth, ex-JPL engineer
          ▪   Now a ...
A New Story
Friday, April 23, 2010
Friday, April 23, 2010
Biology
          Finance or the Web?
          ▪   Large drug companies not that different from banks
          ▪   Recen...
Last Story
Friday, April 23, 2010
Friday, April 23, 2010
High-Energy Physics
          Science at the Petabyte scale
          ▪   Tremendous investment in measurement
           ...
The Promise of Sage
          Delivering on the Excitement
          ▪   An information platform for disease modeling and ...
Action Items
Friday, April 23, 2010
1. Share Data


Friday, April 23, 2010
2. Share Tools


Friday, April 23, 2010
3. Share Results


Friday, April 23, 2010
(c) 2009 Cloudera, Inc. or its licensors.  "Cloudera" is a registered trademark of Cloudera, Inc.. All rights reserved. 1....
Upcoming SlideShare
Loading in...5
×

20100423sage

982

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
982
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript of "20100423sage"

  1. 1. Friday, April 23, 2010
  2. 2. Moving from Narrative to Design with Open Source and Open Data Jeff Hammerbacher Chief Scientist and Vice President of Products, Cloudera April 23, 2010 Friday, April 23, 2010
  3. 3. Presentation Outline ▪ Who am I and what am I talking about? ▪ My Background ▪ Defining “Narrative” and “Design” ▪ Two stories ▪ Finance ▪ Web ▪ Making the Sage story a success ▪ Open Source ▪ Open Data ▪ The Unreasonable Effectiveness of Data Friday, April 23, 2010
  4. 4. My Background Thanks for Asking ▪ hammer@cloudera.com ▪ Studied Mathematics at Harvard ▪ Worked as a Quant at Bear Stearns ▪ Conceived, built, and led Data team at Facebook ▪ Nearly 30 amazing engineers and data scientists ▪ Several open source projects and research papers ▪ Founder of Cloudera ▪ Vice President of Products and Chief Scientist ▪ Also, check out the book “Beautiful Data” Friday, April 23, 2010
  5. 5. Friday, April 23, 2010
  6. 6. “Genomic Advances of the 2000s Will Demand an Informatics Revolution in the 2010s” -- Eric Schadt Friday, April 23, 2010
  7. 7. From Narrative to Design Learning from past revolutions ▪ Leaving technology aside, once you have a narrative: ▪ collect and structure data ▪ build tools, models, and a domain vocabulary (“understand”) ▪ use effective models to achieve your goal (“automate”) ▪ Pinnacle: Boeing 777, designed entirely on a computer ▪ Thanks to “Biology is Technology” by Robert Carlson for the “Narrative” and “Design” terminology Friday, April 23, 2010
  8. 8. Two Stories Friday, April 23, 2010
  9. 9. Friday, April 23, 2010
  10. 10. Finance A Cautionary Tale ▪ Strengths ▪ Complex models ▪ Low latency decisions ▪ Compute grids ▪ Weaknesses ▪ Data needed for decisions is expensive ▪ Code unrelated to automating decisions is highly guarded ▪ Storing and manipulating large data sets ▪ Narrative to Design: trading strategies, financial products Friday, April 23, 2010
  11. 11. Friday, April 23, 2010
  12. 12. The Web A Success Story ▪ Strengths ▪ Open data ▪ Open source ▪ Scalable data management and analysis ▪ Weaknesses ▪ Low latency ▪ Complex models ▪ Style of dress ▪ Narrative to Design: search quality, advertising targeting Friday, April 23, 2010
  13. 13. Another Story Friday, April 23, 2010
  14. 14. Friday, April 23, 2010
  15. 15. Extreme Neuroanatomy DIY Brain Imaging ▪ Built by Ken Hayworth, ex-JPL engineer ▪ Now a critical contributor to “Connectome” project ▪ Synapse-resolution connectivity diagram of the brain ▪ Image data for a single rat brain: 900 PB ▪ This looks familiar! ▪ hackers ▪ large data sets ▪ machine learning Friday, April 23, 2010
  16. 16. A New Story Friday, April 23, 2010
  17. 17. Friday, April 23, 2010
  18. 18. Biology Finance or the Web? ▪ Large drug companies not that different from banks ▪ Recent infusion of open source and open data initiatives! ▪ Learn from the web ▪ Open source ▪ Open data ▪ “The Unreasonable Effectiveness of Data” ▪ Use existing open source software ▪ Cambrian explosion in open source data management ▪ Data storage and processing at petabyte scale is solved Friday, April 23, 2010
  19. 19. Last Story Friday, April 23, 2010
  20. 20. Friday, April 23, 2010
  21. 21. High-Energy Physics Science at the Petabyte scale ▪ Tremendous investment in measurement ▪ Large Hadron Collider cost estimated at $4.4B ▪ Teams spend years performing analysis ▪ Shared toolset for analysis and visualization has emerged ▪ Lessons for Sage ▪ Invest in new measurement technologies and store everything ▪ Use existing open source tools and share those you build Friday, April 23, 2010
  22. 22. The Promise of Sage Delivering on the Excitement ▪ An information platform for disease modeling and drug discovery ▪ More importantly: a community of developers and researchers ▪ Developers, developers, developers! ▪ Wait, that’s you guys -- be hackers ▪ The most important person in an open source project ▪ Not the creator ▪ The first power user with no ties to the creator Friday, April 23, 2010
  23. 23. Action Items Friday, April 23, 2010
  24. 24. 1. Share Data Friday, April 23, 2010
  25. 25. 2. Share Tools Friday, April 23, 2010
  26. 26. 3. Share Results Friday, April 23, 2010
  27. 27. (c) 2009 Cloudera, Inc. or its licensors.  "Cloudera" is a registered trademark of Cloudera, Inc.. All rights reserved. 1.0 Friday, April 23, 2010
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×