20100608sigmod

3,150 views

Published on

Published in: Technology
0 Comments
5 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,150
On SlideShare
0
From Embeds
0
Number of Embeds
153
Actions
Shares
0
Downloads
38
Comments
0
Likes
5
Embeds 0
No embeds

No notes for slide

20100608sigmod

  1. 1. Tuesday, June 8, 2010
  2. 2. Evolving a New Analytical Platform What Works and What’s Missing Jeff Hammerbacher Chief Scientist and Vice President of Products, Cloudera June 8, 2010 Tuesday, June 8, 2010
  3. 3. My Background Thanks for Asking ▪ hammer@cloudera.com ▪ Studied Mathematics at Harvard ▪ Worked as a Quant on Wall Street ▪ Conceived, built, and led Data team at Facebook ▪ Nearly 30 amazing engineers and data scientists ▪ Several open source projects and research papers ▪ Founder of Cloudera ▪ Chief Scientist ▪ Also, check out the book “Beautiful Data” Tuesday, June 8, 2010
  4. 4. Presentation Outline ▪ BI: Science for Profit ▪ Need tools for whole research cycle ▪ SQL Server 2008 R2: defining the platform ▪ State of the Platform Ecosystem ▪ New Foundations: Hadoop ▪ Boiling the Frog ▪ Future developments ▪ Questions and Discussion Tuesday, June 8, 2010
  5. 5. BI is looking more like science (for profit) Tuesday, June 8, 2010
  6. 6. Jim Gray: Science entering Fourth Paradigm “We have to do better at producing tools to support the whole research cycle” Tuesday, June 8, 2010
  7. 7. RDBMS only a small part of this tool set Tuesday, June 8, 2010
  8. 8. Example: SQL Server 2008 R2 Tuesday, June 8, 2010
  9. 9. RDBMS: SQL Server Tuesday, June 8, 2010
  10. 10. ETL: SQL Server Integration Services RDBMS: SQL Server Tuesday, June 8, 2010
  11. 11. ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Tuesday, June 8, 2010
  12. 12. ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Tuesday, June 8, 2010
  13. 13. ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search Tuesday, June 8, 2010
  14. 14. CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search Tuesday, June 8, 2010
  15. 15. CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search OLAP: PowerPivot Tuesday, June 8, 2010
  16. 16. MDM: Master Data Services CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search OLAP: PowerPivot Tuesday, June 8, 2010
  17. 17. Collaboration: SharePoint MDM: Master Data Services CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search OLAP: PowerPivot Tuesday, June 8, 2010
  18. 18. What do we call this unified suite? Tuesday, June 8, 2010
  19. 19. For today: Analytical Data Platform Tuesday, June 8, 2010
  20. 20. Who makes up the platform ecosystem? Tuesday, June 8, 2010
  21. 21. Platform Providers Tuesday, June 8, 2010
  22. 22. Infrastructure Providers Platform Providers Tuesday, June 8, 2010
  23. 23. Infrastructure Providers Platform Providers Application Developers Tuesday, June 8, 2010
  24. 24. Content Providers Infrastructure Providers Platform Providers Application Developers Tuesday, June 8, 2010
  25. 25. Content Providers Infrastructure Providers Platform Providers Application Developers End Users Tuesday, June 8, 2010
  26. 26. What is new about the ecosystem today? Tuesday, June 8, 2010
  27. 27. Content Providers 1. > 95% of enterprise data is unstructured 2. Data volumes growing rapidly Tuesday, June 8, 2010
  28. 28. Infrastructure Providers 1. Cloud 2. Warehouse-Scale Computers Tuesday, June 8, 2010
  29. 29. Platform Providers 1. Open source 2. Driven by consumer web properties Tuesday, June 8, 2010
  30. 30. Application Developers 1. Data Scientists 2. Diversity of languages Tuesday, June 8, 2010
  31. 31. End Users 1. Move beyond reporting to analytics 2. Make use of all enterprise data Tuesday, June 8, 2010
  32. 32. New foundations: HDFS and MapReduce Tuesday, June 8, 2010
  33. 33. (This is what boiling a frog feels like) Tuesday, June 8, 2010
  34. 34. 2005: Doug/Mike start project inside Nutch Tuesday, June 8, 2010
  35. 35. 2006: Doug joins Yahoo! Tuesday, June 8, 2010
  36. 36. 2007: Make Hadoop scale Tuesday, June 8, 2010
  37. 37. 2007: Make Hadoop scale Yahoo! makes Pig open source Tuesday, June 8, 2010
  38. 38. Jim Gray’s “Fourth Paradigm” lecture 2007: Make Hadoop scale Yahoo! makes Pig open source Tuesday, June 8, 2010
  39. 39. Randy Bryant’s “DISC” lecture Jim Gray’s “Fourth Paradigm” lecture 2007: Make Hadoop scale Yahoo! makes Pig open source Tuesday, June 8, 2010
  40. 40. Randy Bryant’s “DISC” lecture Jim Gray’s “Fourth Paradigm” lecture 2007: Make Hadoop scale Yahoo! makes Pig open source Powerset makes HBase open source Tuesday, June 8, 2010
  41. 41. 2008: Make Hadoop fast Tuesday, June 8, 2010
  42. 42. 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Tuesday, June 8, 2010
  43. 43. First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Tuesday, June 8, 2010
  44. 44. First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Yahoo! builds production webmap with Hadoop Tuesday, June 8, 2010
  45. 45. Facebook makes Hive open source First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Yahoo! builds production webmap with Hadoop Tuesday, June 8, 2010
  46. 46. “MapReduce: A Major Step Backwards” Facebook makes Hive open source First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Yahoo! builds production webmap with Hadoop Tuesday, June 8, 2010
  47. 47. 2009: Insert Hadoop into the enterprise Tuesday, June 8, 2010
  48. 48. 2009: Insert Hadoop into the enterprise Cloudera releases CDH Tuesday, June 8, 2010
  49. 49. First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Tuesday, June 8, 2010
  50. 50. Yahoo! sorts a petabyte with Hadoop First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Tuesday, June 8, 2010
  51. 51. Yahoo! sorts a petabyte with Hadoop First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Cloudera adds training, support, services Tuesday, June 8, 2010
  52. 52. “The Unreasonable Effectiveness of Data” Yahoo! sorts a petabyte with Hadoop First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Cloudera adds training, support, services Tuesday, June 8, 2010
  53. 53. 2010: Integrate Hadoop into the enterprise Tuesday, June 8, 2010
  54. 54. 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Tuesday, June 8, 2010
  55. 55. Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Tuesday, June 8, 2010
  56. 56. Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Datameer and Karmasphere funded Tuesday, June 8, 2010
  57. 57. Teradata, Pentaho, and others integrate Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Datameer and Karmasphere funded Tuesday, June 8, 2010
  58. 58. Hive adds JDBC and ODBC Teradata, Pentaho, and others integrate Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Datameer and Karmasphere funded Tuesday, June 8, 2010
  59. 59. Hadoop will be an Analytical Data Platform Tuesday, June 8, 2010
  60. 60. What’s Next? Tuesday, June 8, 2010
  61. 61. Capture: Log collection and CEP Tuesday, June 8, 2010
  62. 62. Curate: Workflow and Scheduling Tuesday, June 8, 2010
  63. 63. Curate: Secondary and Full-Text Indexing Tuesday, June 8, 2010
  64. 64. Curate: Learn Structure from Data Tuesday, June 8, 2010
  65. 65. Analyze: Mesos-enabled frameworks Tuesday, June 8, 2010
  66. 66. Analyze: Link local and global data Tuesday, June 8, 2010
  67. 67. All behind a single pane of glass Tuesday, June 8, 2010
  68. 68. Cloudera Desktop Making Many Computers Feel Like One Tuesday, June 8, 2010
  69. 69. (c) 2009 Cloudera, Inc. or its licensors.  "Cloudera" is a registered trademark of Cloudera, Inc.. All rights reserved. 1.0 Tuesday, June 8, 2010

×