Experiences Evolving a New Analytical Platform: What Works and What's Missing
 

Experiences Evolving a New Analytical Platform: What Works and What's Missing

on

  • 4,086 views

Presentation from the Industry track of the 2010 ACM SIGMOD conference.

Presentation from the Industry track of the 2010 ACM SIGMOD conference.

Statistics

Views

Total Views
4,086
Views on SlideShare
3,628
Embed Views
458

Actions

Likes
7
Downloads
128
Comments
0

9 Embeds 458

http://www.cloudera.com 425
http://practicals-datascience.blogspot.com 14
http://www.slideshare.net 10
http://practicals-datascience.blogspot.in 3
http://www.techgig.com 2
http://translate.googleusercontent.com 1
http://test.cloudera.com 1
http://blog.cloudera.com 1
http://practicals-datascience.blogspot.com.br 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Experiences Evolving a New Analytical Platform: What Works and What's Missing Experiences Evolving a New Analytical Platform: What Works and What's Missing Presentation Transcript

  • Saturday, June 12, 2010
  • Evolving a New Analytical Platform What Works and What’s Missing Jeff Hammerbacher Chief Scientist, Cloudera June 8, 2010 Saturday, June 12, 2010
  • My Background Thanks for Asking ▪ hammer@cloudera.com ▪ Studied Mathematics at Harvard ▪ Worked as a Quant on Wall Street ▪ Conceived, built, and led Data team at Facebook ▪ Nearly 30 amazing engineers and data scientists ▪ Several open source projects and research papers ▪ Founder of Cloudera ▪ Chief Scientist ▪ Also, check out the book “Beautiful Data” Saturday, June 12, 2010
  • Presentation Outline ▪ BI: Science for Profit ▪ Need tools for whole research cycle ▪ SQL Server 2008 R2: defining the platform ▪ State of the Platform Ecosystem ▪ New Foundations: Hadoop ▪ Boiling the Frog ▪ Future developments ▪ Questions and Discussion Saturday, June 12, 2010
  • BI is looking more like science (for profit) Saturday, June 12, 2010
  • Jim Gray: Science entering Fourth Paradigm “We have to do better at producing tools to support the whole research cycle” Saturday, June 12, 2010
  • RDBMS only a small part of this tool set Saturday, June 12, 2010
  • Example: SQL Server 2008 R2 Saturday, June 12, 2010
  • RDBMS: SQL Server Saturday, June 12, 2010
  • ETL: SQL Server Integration Services RDBMS: SQL Server Saturday, June 12, 2010
  • ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Saturday, June 12, 2010
  • ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Saturday, June 12, 2010
  • ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search Saturday, June 12, 2010
  • CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search Saturday, June 12, 2010
  • CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search OLAP: PowerPivot Saturday, June 12, 2010
  • MDM: Master Data Services CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search OLAP: PowerPivot Saturday, June 12, 2010
  • Collaboration: SharePoint MDM: Master Data Services CEP: StreamInsight ETL: SQL Server Integration Services RDBMS: SQL Server Reporting: SQL Server Reporting Services Analysis: SQL Server Analysis Services Search: Full-Text Search OLAP: PowerPivot Saturday, June 12, 2010
  • What do we call this unified suite? Saturday, June 12, 2010
  • For today: Analytical Data Platform Saturday, June 12, 2010
  • Who makes up the platform ecosystem? Saturday, June 12, 2010
  • Platform Providers Saturday, June 12, 2010
  • Infrastructure Providers Platform Providers Saturday, June 12, 2010
  • Infrastructure Providers Platform Providers Application Developers Saturday, June 12, 2010
  • Content Providers Infrastructure Providers Platform Providers Application Developers Saturday, June 12, 2010
  • Content Providers Infrastructure Providers Platform Providers Application Developers End Users Saturday, June 12, 2010
  • What is new about the ecosystem today? Saturday, June 12, 2010
  • Content Providers 1. > 95% of enterprise data is unstructured 2. Data volumes growing rapidly Saturday, June 12, 2010
  • Infrastructure Providers 1. Cloud 2. Warehouse-Scale Computers Saturday, June 12, 2010
  • Platform Providers 1. Open source 2. Driven by consumer web properties Saturday, June 12, 2010
  • Application Developers 1. Data Scientists 2. Diversity of languages Saturday, June 12, 2010
  • End Users 1. Move beyond reporting to analytics 2. Make use of all enterprise data Saturday, June 12, 2010
  • New foundations: HDFS and MapReduce Saturday, June 12, 2010
  • (This is what boiling a frog feels like) Saturday, June 12, 2010
  • 2005: Doug/Mike start project inside Nutch Saturday, June 12, 2010
  • 2006: Doug joins Yahoo! Saturday, June 12, 2010
  • 2007: Make Hadoop scale Saturday, June 12, 2010
  • 2007: Make Hadoop scale Yahoo! makes Pig open source Saturday, June 12, 2010
  • Jim Gray’s “Fourth Paradigm” lecture 2007: Make Hadoop scale Yahoo! makes Pig open source Saturday, June 12, 2010
  • Randy Bryant’s “DISC” lecture Jim Gray’s “Fourth Paradigm” lecture 2007: Make Hadoop scale Yahoo! makes Pig open source Saturday, June 12, 2010
  • Randy Bryant’s “DISC” lecture Jim Gray’s “Fourth Paradigm” lecture 2007: Make Hadoop scale Yahoo! makes Pig open source Powerset makes HBase open source Saturday, June 12, 2010
  • 2008: Make Hadoop fast Saturday, June 12, 2010
  • 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Saturday, June 12, 2010
  • First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Saturday, June 12, 2010
  • First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Yahoo! builds production webmap with Hadoop Saturday, June 12, 2010
  • Facebook makes Hive open source First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Yahoo! builds production webmap with Hadoop Saturday, June 12, 2010
  • “MapReduce: A Major Step Backwards” Facebook makes Hive open source First Hadoop Summit 2008: Make Hadoop fast Yahoo! wins Daytona terabyte sort benchmark Yahoo! builds production webmap with Hadoop Saturday, June 12, 2010
  • 2009: Insert Hadoop into the enterprise Saturday, June 12, 2010
  • 2009: Insert Hadoop into the enterprise Cloudera releases CDH Saturday, June 12, 2010
  • First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Saturday, June 12, 2010
  • Yahoo! sorts a petabyte with Hadoop First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Saturday, June 12, 2010
  • Yahoo! sorts a petabyte with Hadoop First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Cloudera adds training, support, services Saturday, June 12, 2010
  • “The Unreasonable Effectiveness of Data” Yahoo! sorts a petabyte with Hadoop First Hadoop World NYC 2009: Insert Hadoop into the enterprise Cloudera releases CDH Cloudera adds training, support, services Saturday, June 12, 2010
  • 2010: Integrate Hadoop into the enterprise Saturday, June 12, 2010
  • 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Saturday, June 12, 2010
  • Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Saturday, June 12, 2010
  • Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Datameer and Karmasphere funded Saturday, June 12, 2010
  • Teradata, Pentaho, and others integrate Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Datameer and Karmasphere funded Saturday, June 12, 2010
  • Hive adds JDBC and ODBC Teradata, Pentaho, and others integrate Yahoo! completes enterprise-class security 2010: Integrate Hadoop into the enterprise IBM announces InfoSphere BigInsights Datameer and Karmasphere funded Saturday, June 12, 2010
  • Hadoop will be an Analytical Data Platform Saturday, June 12, 2010
  • What’s Next? Saturday, June 12, 2010
  • Capture: Log collection and CEP Saturday, June 12, 2010
  • Curate: Workflow and Scheduling Saturday, June 12, 2010
  • Curate: Secondary and Full-Text Indexing Saturday, June 12, 2010
  • Curate: Learn Structure from Data Saturday, June 12, 2010
  • Analyze: Mesos-enabled frameworks Saturday, June 12, 2010
  • Analyze: Link local and global data Saturday, June 12, 2010
  • All behind a single pane of glass Saturday, June 12, 2010
  • Cloudera Desktop Making Many Computers Feel Like One Saturday, June 12, 2010
  • (c) 2010 Cloudera, Inc. or its licensors.  "Cloudera" is a registered trademark of Cloudera, Inc.. All rights reserved. 1.0 Saturday, June 12, 2010