Tuesday, June 8, 2010
Evolving a New Analytical Platform
         What Works and What’s Missing


         Jeff Hammerbacher
         Chief Scientist and Vice President of Products, Cloudera
         June 8, 2010



My Background
         Thanks for Asking
         ▪   hammer@cloudera.com
         ▪   Studied Mathematics at Harvard
         ▪   Worked as a Quant on Wall Street
         ▪   Conceived, built, and led Data team at Facebook
             ▪   Nearly 30 amazing engineers and data scientists
             ▪   Several open source projects and research papers
         ▪   Founder of Cloudera
             ▪   Chief Scientist
             ▪   Also, check out the book “Beautiful Data”

Presentation Outline
         ▪   BI: Science for Profit
             ▪   Need tools for whole research cycle
             ▪   SQL Server 2008 R2: defining the platform
         ▪   State of the Platform Ecosystem
         ▪   New Foundations: Hadoop
             ▪   Boiling the Frog
             ▪   Future developments
         ▪   Questions and Discussion




BI is looking more like science (for profit)




Jim Gray: Science entering Fourth Paradigm
            “We have to do better at producing tools to
                 support the whole research cycle”




RDBMS only a small part of this tool set




Example: SQL Server 2008 R2




RDBMS: SQL Server




ETL: SQL Server Integration Services
                               RDBMS: SQL Server




ETL: SQL Server Integration Services
                               RDBMS: SQL Server
                 Reporting: SQL Server Reporting Services




ETL: SQL Server Integration Services
                               RDBMS: SQL Server
                 Reporting: SQL Server Reporting Services
                  Analysis: SQL Server Analysis Services




ETL: SQL Server Integration Services
                               RDBMS: SQL Server
                 Reporting: SQL Server Reporting Services
                  Analysis: SQL Server Analysis Services
                         Search: Full-Text Search



CEP: StreamInsight
                        ETL: SQL Server Integration Services
                               RDBMS: SQL Server
                 Reporting: SQL Server Reporting Services
                  Analysis: SQL Server Analysis Services
                         Search: Full-Text Search



CEP: StreamInsight
                        ETL: SQL Server Integration Services
                               RDBMS: SQL Server
                 Reporting: SQL Server Reporting Services
                  Analysis: SQL Server Analysis Services
                         Search: Full-Text Search
                             OLAP: PowerPivot


MDM: Master Data Services
                                 CEP: StreamInsight
                        ETL: SQL Server Integration Services
                               RDBMS: SQL Server
                 Reporting: SQL Server Reporting Services
                  Analysis: SQL Server Analysis Services
                         Search: Full-Text Search
                             OLAP: PowerPivot


Collaboration: SharePoint
                             MDM: Master Data Services
                                 CEP: StreamInsight
                        ETL: SQL Server Integration Services
                               RDBMS: SQL Server
                 Reporting: SQL Server Reporting Services
                  Analysis: SQL Server Analysis Services
                         Search: Full-Text Search
                             OLAP: PowerPivot


What do we call this unified suite?




For today: Analytical Data Platform




Who makes up the platform ecosystem?




Platform Providers




Infrastructure Providers
                          Platform Providers




Infrastructure Providers
                          Platform Providers
                        Application Developers




Content Providers
                        Infrastructure Providers
                          Platform Providers
                        Application Developers




Content Providers
                        Infrastructure Providers
                          Platform Providers
                        Application Developers
                              End Users




What is new about the ecosystem today?




Content Providers
            1. > 95% of enterprise data is unstructured
                  2. Data volumes growing rapidly




Infrastructure Providers
                                  1. Cloud
                        2. Warehouse-Scale Computers




Platform Providers
                                   1. Open source
                        2. Driven by consumer web properties




Application Developers
                           1. Data Scientists
                        2. Diversity of languages




End Users
                        1. Move beyond reporting to analytics
                         2. Make use of all enterprise data




New foundations: HDFS and MapReduce




(This is what boiling a frog feels like)




2005: Doug/Mike start project inside Nutch




2006: Doug joins Yahoo!




2007: Make Hadoop scale




2007: Make Hadoop scale
                        Yahoo! makes Pig open source




Jim Gray’s “Fourth Paradigm” lecture
                            2007: Make Hadoop scale
                           Yahoo! makes Pig open source




Randy Bryant’s “DISC” lecture
                        Jim Gray’s “Fourth Paradigm” lecture
                            2007: Make Hadoop scale
                           Yahoo! makes Pig open source




Randy Bryant’s “DISC” lecture
                        Jim Gray’s “Fourth Paradigm” lecture
                            2007: Make Hadoop scale
                           Yahoo! makes Pig open source
                         Powerset makes HBase open source




2008: Make Hadoop fast




2008: Make Hadoop fast
            Yahoo! wins Daytona terabyte sort benchmark




First Hadoop Summit
                        2008: Make Hadoop fast
            Yahoo! wins Daytona terabyte sort benchmark




First Hadoop Summit
                        2008: Make Hadoop fast
            Yahoo! wins Daytona terabyte sort benchmark
            Yahoo! builds production webmap with Hadoop




Facebook makes Hive open source
                              First Hadoop Summit
                           2008: Make Hadoop fast
            Yahoo! wins Daytona terabyte sort benchmark
            Yahoo! builds production webmap with Hadoop




“MapReduce: A Major Step Backwards”
                          Facebook makes Hive open source
                                First Hadoop Summit
                             2008: Make Hadoop fast
            Yahoo! wins Daytona terabyte sort benchmark
            Yahoo! builds production webmap with Hadoop




2009: Insert Hadoop into the enterprise




2009: Insert Hadoop into the enterprise
                           Cloudera releases CDH




First Hadoop World NYC
                  2009: Insert Hadoop into the enterprise
                           Cloudera releases CDH




Yahoo! sorts a petabyte with Hadoop
                              First Hadoop World NYC
                  2009: Insert Hadoop into the enterprise
                              Cloudera releases CDH




Yahoo! sorts a petabyte with Hadoop
                              First Hadoop World NYC
                  2009: Insert Hadoop into the enterprise
                         Cloudera releases CDH
               Cloudera adds training, support, services




“The Unreasonable Effectiveness of Data”
                   Yahoo! sorts a petabyte with Hadoop
                          First Hadoop World NYC
                  2009: Insert Hadoop into the enterprise
                         Cloudera releases CDH
               Cloudera adds training, support, services




2010: Integrate Hadoop into the enterprise




2010: Integrate Hadoop into the enterprise
                        IBM announces InfoSphere BigInsights




Yahoo! completes enterprise-class security
             2010: Integrate Hadoop into the enterprise
                        IBM announces InfoSphere BigInsights




Yahoo! completes enterprise-class security
             2010: Integrate Hadoop into the enterprise
                        IBM announces InfoSphere BigInsights
                          Datameer and Karmasphere funded




Teradata, Pentaho, and others integrate
              Yahoo! completes enterprise-class security
             2010: Integrate Hadoop into the enterprise
                        IBM announces InfoSphere BigInsights
                          Datameer and Karmasphere funded




Hive adds JDBC and ODBC
               Teradata, Pentaho, and others integrate
              Yahoo! completes enterprise-class security
             2010: Integrate Hadoop into the enterprise
                        IBM announces InfoSphere BigInsights
                          Datameer and Karmasphere funded




Hadoop will be an Analytical Data Platform




What’s Next?




Capture: Log collection and CEP




Curate: Workflow and Scheduling




Curate: Secondary and Full-Text Indexing




Curate: Learn Structure from Data




Analyze: Mesos-enabled frameworks




Analyze: Link local and global data




All behind a single pane of glass




Cloudera Desktop
                        Making Many Computers Feel Like One




(c) 2009 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0




20100608sigmod

  • 2. Evolving a New Analytical Platform What Works and What’s Missing Jeff Hammerbacher Chief Scientist and Vice President of Products, Cloudera June 8, 2010 Tuesday, June 8, 2010
  • 3. My Background Thanks for Asking ▪ hammer@cloudera.com ▪ Studied Mathematics at Harvard ▪ Worked as a Quant on Wall Street ▪ Conceived, built, and led Data team at Facebook ▪ Nearly 30 amazing engineers and data scientists ▪ Several open source projects and research papers ▪ Founder of Cloudera ▪ Chief Scientist ▪ Also, check out the book “Beautiful Data” Tuesday, June 8, 2010
  • 4. Presentation Outline ▪ BI: Science for Profit ▪ Need tools for whole research cycle ▪ SQL Server 2008 R2: defining the platform ▪ State of the Platform Ecosystem ▪ New Foundations: Hadoop ▪ Boiling the Frog ▪ Future developments ▪ Questions and Discussion Tuesday, June 8, 2010
  • 5. BI is looking more like science (for profit) Tuesday, June 8, 2010
  • 6. Jim Gray: Science entering Fourth Paradigm “We have to do better at producing tools to support the whole research cycle” Tuesday, June 8, 2010
  • 7. RDBMS only a small part of this tool set Tuesday, June 8, 2010
  • 8. Example: SQL Server 2008 R2 Tuesday, June 8, 2010
  • 10. ETL: SQL Server Integration Services; RDBMS: SQL Server
  • 11. ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services
  • 12. ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services
  • 13. ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search
  • 14. CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search
  • 15. CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search; OLAP: PowerPivot
  • 16. MDM: Master Data Services; CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search; OLAP: PowerPivot
  • 17. Collaboration: SharePoint; MDM: Master Data Services; CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search; OLAP: PowerPivot
  • 18. What do we call this unified suite?
  • 19. For today: Analytical Data Platform
  • 20. Who makes up the platform ecosystem?
  • 22. Infrastructure Providers; Platform Providers
  • 23. Infrastructure Providers; Platform Providers; Application Developers
  • 24. Content Providers; Infrastructure Providers; Platform Providers; Application Developers
  • 25. Content Providers; Infrastructure Providers; Platform Providers; Application Developers; End Users
  • 26. What is new about the ecosystem today?
  • 27. Content Providers: 1. > 95% of enterprise data is unstructured; 2. Data volumes growing rapidly
  • 28. Infrastructure Providers: 1. Cloud; 2. Warehouse-Scale Computers
  • 29. Platform Providers: 1. Open source; 2. Driven by consumer web properties
  • 30. Application Developers: 1. Data Scientists; 2. Diversity of languages
  • 31. End Users: 1. Move beyond reporting to analytics; 2. Make use of all enterprise data
  • 32. New foundations: HDFS and MapReduce
  • 33. (This is what boiling a frog feels like)
  • 34. 2005: Doug/Mike start project inside Nutch
  • 35. 2006: Doug joins Yahoo!
  • 36. 2007: Make Hadoop scale
  • 37. 2007: Make Hadoop scale; Yahoo! makes Pig open source
  • 38. Jim Gray’s “Fourth Paradigm” lecture; 2007: Make Hadoop scale; Yahoo! makes Pig open source
  • 39. Randy Bryant’s “DISC” lecture; Jim Gray’s “Fourth Paradigm” lecture; 2007: Make Hadoop scale; Yahoo! makes Pig open source
  • 40. Randy Bryant’s “DISC” lecture; Jim Gray’s “Fourth Paradigm” lecture; 2007: Make Hadoop scale; Yahoo! makes Pig open source; Powerset makes HBase open source
  • 41. 2008: Make Hadoop fast
  • 42. 2008: Make Hadoop fast; Yahoo! wins Daytona terabyte sort benchmark
  • 43. First Hadoop Summit; 2008: Make Hadoop fast; Yahoo! wins Daytona terabyte sort benchmark
  • 44. First Hadoop Summit; 2008: Make Hadoop fast; Yahoo! wins Daytona terabyte sort benchmark; Yahoo! builds production webmap with Hadoop
  • 45. Facebook makes Hive open source; First Hadoop Summit; 2008: Make Hadoop fast; Yahoo! wins Daytona terabyte sort benchmark; Yahoo! builds production webmap with Hadoop
  • 46. “MapReduce: A Major Step Backwards”; Facebook makes Hive open source; First Hadoop Summit; 2008: Make Hadoop fast; Yahoo! wins Daytona terabyte sort benchmark; Yahoo! builds production webmap with Hadoop
  • 47. 2009: Insert Hadoop into the enterprise
  • 48. 2009: Insert Hadoop into the enterprise; Cloudera releases CDH
  • 49. First Hadoop World NYC; 2009: Insert Hadoop into the enterprise; Cloudera releases CDH
  • 50. Yahoo! sorts a petabyte with Hadoop; First Hadoop World NYC; 2009: Insert Hadoop into the enterprise; Cloudera releases CDH
  • 51. Yahoo! sorts a petabyte with Hadoop; First Hadoop World NYC; 2009: Insert Hadoop into the enterprise; Cloudera releases CDH; Cloudera adds training, support, services
  • 52. “The Unreasonable Effectiveness of Data”; Yahoo! sorts a petabyte with Hadoop; First Hadoop World NYC; 2009: Insert Hadoop into the enterprise; Cloudera releases CDH; Cloudera adds training, support, services
  • 53. 2010: Integrate Hadoop into the enterprise
  • 54. 2010: Integrate Hadoop into the enterprise; IBM announces InfoSphere BigInsights
  • 55. Yahoo! completes enterprise-class security; 2010: Integrate Hadoop into the enterprise; IBM announces InfoSphere BigInsights
  • 56. Yahoo! completes enterprise-class security; 2010: Integrate Hadoop into the enterprise; IBM announces InfoSphere BigInsights; Datameer and Karmasphere funded
  • 57. Teradata, Pentaho, and others integrate; Yahoo! completes enterprise-class security; 2010: Integrate Hadoop into the enterprise; IBM announces InfoSphere BigInsights; Datameer and Karmasphere funded
  • 58. Hive adds JDBC and ODBC; Teradata, Pentaho, and others integrate; Yahoo! completes enterprise-class security; 2010: Integrate Hadoop into the enterprise; IBM announces InfoSphere BigInsights; Datameer and Karmasphere funded
  • 59. Hadoop will be an Analytical Data Platform
  • 61. Capture: Log collection and CEP
  • 62. Curate: Workflow and Scheduling
  • 63. Curate: Secondary and Full-Text Indexing
  • 64. Curate: Learn Structure from Data
  • 66. Analyze: Link local and global data
  • 67. All behind a single pane of glass
  • 68. Cloudera Desktop: Making Many Computers Feel Like One
  • 69. (c) 2009 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0