Your SlideShare is downloading. ×
20100608sigmod
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

20100608sigmod

2,678
views

Published on

Published in: Technology

0 Comments
5 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,678
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
36
Comments
0
Likes
5
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Tuesday, June 8, 2010
  • 2. Evolving a New Analytical Platform What Works and What’s Missing Jeff Hammerbacher Chief Scientist and Vice President of Products, Cloudera June 8, 2010 Tuesday, June 8, 2010
  • 3. My Background Thanks for Asking ▪ hammer@cloudera.com ▪ Studied Mathematics at Harvard ▪ Worked as a Quant on Wall Street ▪ Conceived, built, and led Data team at Facebook ▪ Nearly 30 amazing engineers and data scientists ▪ Several open source projects and research papers ▪ Founder of Cloudera ▪ Chief Scientist ▪ Also, check out the book “Beautiful Data” Tuesday, June 8, 2010
  • 4. Presentation Outline ▪ BI: Science for Profit ▪ Need tools for whole research cycle ▪ SQL Server 2008 R2: defining the platform ▪ State of the Platform Ecosystem ▪ New Foundations: Hadoop ▪ Boiling the Frog ▪ Future developments ▪ Questions and Discussion Tuesday, June 8, 2010
  • 5. BI is looking more like science (for profit) Tuesday, June 8, 2010
  • 6. Jim Gray: Science entering Fourth Paradigm “We have to do better at producing tools to support the whole research cycle” Tuesday, June 8, 2010
  • 7. RDBMS only a small part of this tool set Tuesday, June 8, 2010
  • 8. Example: SQL Server 2008 R2 Tuesday, June 8, 2010
  • 9. RDBMS: SQL Server Tuesday, June 8, 2010
  • 10. ETL: SQL Server Integration Services RDBMS: SQL Server Tuesday, June 8, 2010
  • 11. ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Tuesday, June 8, 2010
  • 12. ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Tuesday, June 8, 2010
  • 13. ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search Tuesday, June 8, 2010
  • 14. CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search Tuesday, June 8, 2010
  • 15. CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search OLAP: PowerPivot Tuesday, June 8, 2010
  • 16. MDM: Master Data Services CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search OLAP: PowerPivot Tuesday, June 8, 2010
  • 17. Collaboration: SharePoint MDM: Master Data Services CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search OLAP: PowerPivot Tuesday, June 8, 2010
  • 18. What do we call this unified suite? Tuesday, June 8, 2010
  • 19. For today: Analytical Data Platform Tuesday, June 8, 2010
  • 20. Who makes up the platform ecosystem? Tuesday, June 8, 2010
  • 21. Platform Providers Tuesday, June 8, 2010
  • 22. Infrastructure Providers Platform Providers Tuesday, June 8, 2010
  • 23. Infrastructure Providers Platform Providers Application Developers Tuesday, June 8, 2010
  • 24. Content Providers Infrastructure Providers Platform Providers Application Developers Tuesday, June 8, 2010
  • 25. Content Providers Infrastructure Providers Platform Providers Application Developers End Users Tuesday, June 8, 2010
  • 26. What is new about the ecosystem today? Tuesday, June 8, 2010
  • 27. Content Providers 1. > 95% of enterprise data is unstructured 2. Data volumes growing rapidly Tuesday, June 8, 2010
  • 28. Infrastructure Providers 1. Cloud 2. Warehouse-Scale Computers Tuesday, June 8, 2010
  • 29. Platform Providers 1. Open source 2. Driven by consumer web properties Tuesday, June 8, 2010
  • 30. Application Developers 1. Data Scientists 2. Diversity of languages Tuesday, June 8, 2010
  • 31. End Users 1. Move beyond reporting to analytics 2. Make use of all enterprise data Tuesday, June 8, 2010
  • 32. New foundations: HDFS and MapReduce Tuesday, June 8, 2010
  • 33. (This is what boiling a frog feels like) Tuesday, June 8, 2010
  • 34. 2005: Doug/Mike start project inside Nutch Tuesday, June 8, 2010
  • 35. 2006: Doug joins Yahoo! Tuesday, June 8, 2010
  • 36. 2007: Make Hadoop scale Tuesday, June 8, 2010
  • 37. 2007: Make Hadoop scale Yahoo! makes Pig open source Tuesday, June 8, 2010
  • 38. Jim Gray’s “Fourth Paradigm” lecture 2007: Make Hadoop scale Yahoo! makes Pig open source Tuesday, June 8, 2010
  • 39. Randy Bryant’s “DISC” lecture Jim Gray’s “Fourth Paradigm” lecture 2007: Make Hadoop scale Yahoo! makes Pig open source Tuesday, June 8, 2010
  • 40. Randy Bryant’s “DISC” lecture Jim Gray’s “Fourth Paradigm” lecture 2007: Make Hadoop scale Yahoo! makes Pig open source Powerset makes HBase open source Tuesday, June 8, 2010
  • 41. 2008: Make Hadoop fast Tuesday, June 8, 2010
  • 42. 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Tuesday, June 8, 2010
  • 43. First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Tuesday, June 8, 2010
  • 44. First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Yahoo! builds production webmap with Hadoop Tuesday, June 8, 2010
  • 45. Facebook makes Hive open source First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Yahoo! builds production webmap with Hadoop Tuesday, June 8, 2010
  • 46. “MapReduce: A Major Step Backwards” Facebook makes Hive open source First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Yahoo! builds production webmap with Hadoop Tuesday, June 8, 2010
  • 47. 2009: Insert Hadoop into the enterprise Tuesday, June 8, 2010
  • 48. 2009: Insert Hadoop into the enterprise Cloudera releases CDH Tuesday, June 8, 2010
  • 49. First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Tuesday, June 8, 2010
  • 50. Yahoo! sorts a petabyte with Hadoop First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Tuesday, June 8, 2010
  • 51. Yahoo! sorts a petabyte with Hadoop First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Cloudera adds training, support, services Tuesday, June 8, 2010
  • 52. “The Unreasonable Effectiveness of Data” Yahoo! sorts a petabyte with Hadoop First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Cloudera adds training, support, services Tuesday, June 8, 2010
  • 53. 2010: Integrate Hadoop into the enterprise Tuesday, June 8, 2010
  • 54. 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Tuesday, June 8, 2010
  • 55. Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Tuesday, June 8, 2010
  • 56. Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Datameer and Karmasphere funded Tuesday, June 8, 2010
  • 57. Teradata, Pentaho, and others integrate Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Datameer and Karmasphere funded Tuesday, June 8, 2010
  • 58. Hive adds JDBC and ODBC Teradata, Pentaho, and others integrate Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Datameer and Karmasphere funded Tuesday, June 8, 2010
  • 59. Hadoop will be an Analytical Data Platform Tuesday, June 8, 2010
  • 60. What’s Next? Tuesday, June 8, 2010
  • 61. Capture: Log collection and CEP Tuesday, June 8, 2010
  • 62. Curate: Workflow and Scheduling Tuesday, June 8, 2010
  • 63. Curate: Secondary and Full-Text Indexing Tuesday, June 8, 2010
  • 64. Curate: Learn Structure from Data Tuesday, June 8, 2010
  • 65. Analyze: Mesos-enabled frameworks Tuesday, June 8, 2010
  • 66. Analyze: Link local and global data Tuesday, June 8, 2010
  • 67. All behind a single pane of glass Tuesday, June 8, 2010
  • 68. Cloudera Desktop Making Many Computers Feel Like One Tuesday, June 8, 2010
  • 69. (c) 2009 Cloudera, Inc. or its licensors.  "Cloudera" is a registered trademark of Cloudera, Inc.. All rights reserved. 1.0 Tuesday, June 8, 2010