Cloudera:
Hadoop for the Enterprise
September 2008
Data Growing Much Faster than Moore’s Law
04/21/17
Cloudera Confidential
Source: Richard Winter, “Why Are Data Warehouses Growing so Fast?”, April 2008
Uniprocessor Performance
Founding Team
• Mike Olson, CEO
– CEO Sleepycat
– Britton Lee, Illustra, Informix, Oracle
– BA, MS CS, Berkeley
• Amr Awadallah, CTO, VP Engineering
– Founder Aptivia/VivaSmart
– 8 years at Yahoo! running BI infrastructure, including Hadoop
– PhD EE, Stanford
• Christophe Bisciglia, VP Technology
– Created Google/NSF Hadoop cluster and program
– BA CS, U Washington
• Jeff Hammerbacher, VP Product
– Ran world’s largest operational BI support system on Hadoop, at Facebook
– BA Mathematics, Harvard
What Is Hadoop?
• Core engine:
– Open source implementation of Google’s MapReduce and GFS
– Hundreds or thousands of servers parallelize a data analysis task
• Higher-level interfaces built on top of MapReduce
• Storage layer beneath (HDFS)
• Hadoop co-creators Doug Cutting and Mike Cafarella are advisors
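The MapReduce model named above can be illustrated with the classic word-count job. The sketch below is a single-process Python simulation, not Hadoop’s API (Hadoop’s native interface is Java); the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration. It shows how a job decomposes into a map step, a group-by-key shuffle, and a reduce step, which a framework like Hadoop could spread across many servers.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values that share a key."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    # On a real cluster each input split would be mapped on a different server;
    # here the three phases simply run one after another in one process.
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    logs = ["hadoop stores data", "hadoop processes data"]
    print(word_count(logs))
```

Because mappers only see one record at a time and reducers only see one key at a time, both phases can be parallelized independently; the shuffle is the only step that needs to move data between machines.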
Hadoop is Open Source
• Hadoop is distributed under the Apache License:
– Reduces concern about lock-in
– Low-cost, effective distribution strategy
– Allows innovation by partners, customers
– Third-party inspection of source code provides assurances on security, product quality
• Business-friendly license encourages commercial development
– “Open core” licensing
– Closed-source components, applications
Hadoop Users
Momentum: Google Trends
Netezza: $127M in FY08, $79M in FY07
Teradata: $830M in 1H08, $1.7B in FY07
Worldwide Phenomenon
Source: Google Insights world map for searches on “hadoop”, Sept 2008.
Why is Hadoop Successful?
• Brings computation closer to data, allowing both I/O and compute scalability.
• MapReduce forces developers to think in a parallel way.
• Operates on unstructured data, and on structured data (HBase, Hive).
• Prescriptive development: grows with you without needing to re-architect.
• Procedural languages offer more expressive power than SQL alone.
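The first point, moving computation to the data, can be sketched as a scheduling rule. The toy Python scheduler below is an assumption for illustration and is not Hadoop’s actual scheduler; `assign_tasks`, the block ids, and the node names are hypothetical. It places each map task on a node that already stores a replica of the task’s input block when that node has a free slot, and only otherwise falls back to a node that must read the block over the network.

```python
from typing import Dict, Set, Tuple


def assign_tasks(
    block_locations: Dict[str, Set[str]],
    free_slots: Dict[str, int],
) -> Tuple[Dict[str, str], int]:
    """Assign one map task per block, preferring nodes that store the block.

    block_locations maps a (hypothetical) block id to the nodes holding a replica.
    free_slots maps a node name to how many more tasks it can run.
    Returns (block -> chosen node, number of data-local assignments).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: Dict[str, str] = {}
    local_count = 0
    for block, replicas in sorted(block_locations.items()):
        # First choice: a node that already has the block on its local disk.
        local = [n for n in sorted(replicas) if slots.get(n, 0) > 0]
        if local:
            node = local[0]
            local_count += 1
        else:
            # Fallback: any node with spare capacity; the input must then
            # be read over the network instead of from local disk.
            remote = [n for n in sorted(slots) if slots[n] > 0]
            if not remote:
                raise RuntimeError("no free slots left for block " + block)
            node = remote[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment, local_count


if __name__ == "__main__":
    blocks = {
        "blk_1": {"node-a", "node-b"},
        "blk_2": {"node-b", "node-c"},
        "blk_3": {"node-a", "node-c"},
    }
    slots = {"node-a": 1, "node-b": 1, "node-c": 1}
    print(assign_tasks(blocks, slots))
```

Counting data-local assignments makes the I/O benefit visible: each local assignment is a block that never crosses the network, so aggregate read bandwidth grows with the number of nodes.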
Current Systems Isolate Users from the Event-Level Raw Data
[Diagram: Live Web Site (MySQL, MemCached); Instrumentation; Log Collection; File Server Farm for Warehouse (non-queryable); Warehouse Pre-Processing; Datamart Database; BI Reporting; Data Mining (R, Weka, SAS, SPSS); multiple ETL steps between stages. Callouts: “Non-Consumption”, “Expensive ETL Grids”.]
Solution: “Smart” Storage Service
[Diagram: Smart Storage: Grid for File Storage & Data Processing, alongside Live Web Site (MySQL, MemCached); Instrumentation; Log Collection; Warehouse Pre-Processing; Datamart Database; BI Reporting; Data Mining (R, Weka, SAS, SPSS). Callouts: “Enable Consumption”, “Eliminate Expensive ETL Grids”.]
BDP versus OLAP/OLTP
[Chart comparing OLAP/OLTP with Batch Data Processing (BDP) along eight axes: Schema Complexity (Structured to Unstructured), Processing Freedom (SQL to Generic Data Processing), Table Join Complexity (marked 100 Tables), Concurrent Jobs (marked 1000), Responsiveness (Interactive to Batch), Per Job Data Volume, Total Data Volume (volume axes marked 100TB, 1PB, 10PB, 100PB), Data Update Pattern (Read/Write to Append Only).]
Source: Merrill Lynch Industry Overview, May 7, 2008
Cloudera Differentiators
• Enabling Hadoop as an elastic platform with statistical multiplexing over many customers
• Multi-Tenant Support: Concurrency, Priority, Namespace Isolation, Performance Isolation.
• Monitoring, Reliability, and Availability
• Resilience and Fast Recovery: a non-sexy problem that is critical to enterprises; there is no time to restart an ETL job from scratch, since doing so misses the SLA.
• IDE to easily debug, deploy, and tune.
• Integration with data mining and analysis functionality (R, Weka, SAS, SPSS)
• Connector certification: another non-sexy problem that the community ignores; make sure the system is compatible with other enterprise systems.


Cloudera's Original Pitch Deck from 2008

  • 1. Cloudera:Cloudera: Hadoop for the EnterpriseHadoop for the Enterprise September 2008September 2008
  • 2. Data Growing Much Faster thanData Growing Much Faster than Moore’s LawMoore’s Law 04/21/17 Cloudera ConfidentialCloudera Confidential 22 Source: Richard Winter, Why Are Data Warehouses Growing so Fast?, April 2008
  • 4. Founding TeamFounding Team • Mike Olson, CEOMike Olson, CEO – CEO SleepycatCEO Sleepycat – Britton Lee, Illustra,Britton Lee, Illustra, Informix, OracleInformix, Oracle – BA, MS CS, BerkeleyBA, MS CS, Berkeley • Amr Awadallah, CTO, VPAmr Awadallah, CTO, VP EngineeringEngineering – Founder Aptivia/VivaSmartFounder Aptivia/VivaSmart – 8 years at Yahoo! running8 years at Yahoo! running BI infrastructure, includingBI infrastructure, including HadoopHadoop – PhD EE, StanfordPhD EE, Stanford • Christophe Bisciglia, VPChristophe Bisciglia, VP TechnologyTechnology – Created Google/NSFCreated Google/NSF Hadoop cluster andHadoop cluster and programprogram – BA CS, U WashingtonBA CS, U Washington • Jeff Hammerbacher, VPJeff Hammerbacher, VP ProductProduct – Ran world’s largestRan world’s largest operational BI supportoperational BI support system on Hadoop, atsystem on Hadoop, at FacebookFacebook – BA Mathematics, HarvardBA Mathematics, Harvard 04/21/17 44Cloudera ConfidentialCloudera Confidential
  • 5. What Is Hadoop?What Is Hadoop? • Core engine:Core engine: – Open source implementation of Google’sOpen source implementation of Google’s MapReduce and GFSMapReduce and GFS – Hundreds or thousands of serversHundreds or thousands of servers parallelize a data analysis taskparallelize a data analysis task • Interfaces built on top of MapReduceInterfaces built on top of MapReduce • Storage layer beneath (HDFS)Storage layer beneath (HDFS) • Doug Cutting, Mike Cafarella areDoug Cutting, Mike Cafarella are advisorsadvisors 04/21/17 55Cloudera ConfidentialCloudera Confidential
  • 6. Hadoop is Open SourceHadoop is Open Source • Hadoop is distributed under the Apache License:Hadoop is distributed under the Apache License: – Reduces concern about lock-inReduces concern about lock-in – Low-cost, effective distribution strategyLow-cost, effective distribution strategy – Allows innovation by partners, customersAllows innovation by partners, customers – Third-party inspection of source code providesThird-party inspection of source code provides assurances on security, product qualityassurances on security, product quality • Business-friendly license encourages commercialBusiness-friendly license encourages commercial developmentdevelopment – ““Open core” licensingOpen core” licensing – Closed-source components, applicationsClosed-source components, applications 04/21/17 66Cloudera ConfidentialCloudera Confidential
  • 7. Hadoop Users
  • 8. Momentum: Google Trends – Netezza: $127M in FY08, $79M in FY07 – Teradata: $830M in 1H08, $1.7B in FY07
  • 9. Worldwide Phenomenon – Source: Google Insights world map for searches on “hadoop”, Sept 2008.
  • 10. Why Is Hadoop Successful? • Brings computation closer to data, allowing both I/O and compute to scale. • MapReduce forces developers to think in a parallel way. • Operates on both unstructured data and structured data (HBase, Hive). • Prescriptive development: grows with you without needing to re-architect. • A procedural language offers power.
  • 11. Current Systems Isolate Users from the Event-Level Raw Data – [Diagram components: Instrumentation; Log Collection; File Server Farm for Warehouse (non-queryable); Warehouse Pre-Processing; Datamart Database; BI Reporting; Data Mining (R, Weka, SAS, SPSS); Live Web Site (MySQL, MemCached); ETL steps between the stages. Callouts: “Non-Consumption”, “Expensive ETL Grids”.]
  • 12. Solution: “Smart” Storage Service – [Diagram: the same components as slide 11 (Instrumentation; Log Collection; Warehouse Pre-Processing; Datamart Database; BI Reporting; Data Mining with R, Weka, SAS, SPSS; Live Web Site with MySQL and MemCached), organized around “Smart Storage: Grid for File Storage & Data Processing”. Callouts: “Enable Consumption”, “Eliminate Expensive ETL Grids”.]
  • 13. BDP versus OLAP/OLTP – [Chart comparing OLAP/OLTP with Batch Data Processing along eight axes: schema complexity (structured vs. unstructured), processing freedom (SQL vs. generic data processing), table join complexity, concurrent jobs, responsiveness (interactive vs. batch), per-job data volume, data update pattern (read/write vs. append only), and total data volume. Scale labels range from 100 to 1000 (tables, jobs) and from 100TB to 100PB (data volume).]
  • 14. Source: Merrill Lynch Industry Overview, May 7, 2008
  • 15. Cloudera Differentiators • Enabling Hadoop as an elastic platform with statistical multiplexing over many customers • Multi-tenant support: concurrency, priority, namespace isolation, performance isolation • Monitoring, reliability, and availability • Resilience and fast recovery: a non-sexy problem that is critical to enterprises; there is no time to restart an ETL job from scratch, or it misses its SLA. • An IDE to easily debug, deploy, and tune • Integration with data mining and analysis functionality (R, Weka, SAS, SPSS) • Connector certification: another non-sexy problem that is ignored by the community; ensures the system is compatible with other enterprise systems.
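Slide 5 describes Hadoop’s core engine as an open-source implementation of Google’s MapReduce, and slide 10 credits MapReduce with forcing developers to think in parallel. The toy sketch below illustrates only the shape of the programming model, using word count as an assumed example job. It runs on one machine with Python’s standard library; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and are not Hadoop APIs. In Hadoop the framework performs the shuffle and spreads map and reduce tasks across many servers reading from HDFS.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    # Map: emit an intermediate (key, value) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key so each key reaches one reducer.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(item: Tuple[str, List[int]]) -> Tuple[str, int]:
    # Reduce: combine all values collected for a single key.
    key, values = item
    return key, sum(values)


def run_job(lines: List[str], workers: int = 4) -> Dict[str, int]:
    # Map tasks are independent of each other, as are reduce tasks, so each phase
    # can be fanned out; worker threads here merely stand in for cluster servers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_phase, lines)
        pairs = [pair for chunk in mapped for pair in chunk]
        grouped = shuffle(pairs)
        return dict(pool.map(reduce_phase, grouped.items()))


if __name__ == "__main__":
    docs = ["Hadoop runs MapReduce", "MapReduce runs on Hadoop clusters"]
    print(run_job(docs))
```

The point of the structure is that the developer writes only the per-record map and per-key reduce functions; the split into independent tasks is what lets a framework scale the same job from one machine to thousands.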

Editor's Notes

  1. (Moore’s law is failing: the only way to speed up going forward is massive parallelism on grids and multicores.)
  2. Furthermore, these expensive ETL grids are needed for only a couple of hours each morning to meet the loading SLA.
  3. Another pain point is resilience to failure: currently, when a Hadoop job fails, you have to restart it from the very beginning. The community is not spending much time on this problem because it is not "sexy", but it is critical for enterprises with strict SLAs to meet. You don't want to restart your ETL job from scratch when a failure occurs; there is no time for that. Jobs need to be snapshotted at intermediate checkpoints so that a failure does not force a restart from the beginning.
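The checkpointing idea in note 3 can be sketched as a pipeline that records each completed stage, so a rerun after a failure resumes at the first unfinished stage instead of starting over. This is a minimal single-process illustration under stated assumptions: the stage names, the JSON checkpoint file, and the helpers `load_checkpoint`, `save_checkpoint`, and `run_pipeline` are all hypothetical, and nothing here describes a mechanism that exists in Hadoop or in Cloudera’s products.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple

# A stage is a (name, function) pair; the function transforms an int "dataset".
Stage = Tuple[str, Callable[[int], int]]


def load_checkpoint(path: str) -> Dict:
    # Return the last saved state, or an empty state if no checkpoint exists yet.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"completed": [], "value": None}


def save_checkpoint(path: str, state: Dict) -> None:
    # Write to a temp file, then atomically rename, so a crash mid-write
    # cannot leave a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_pipeline(stages: List[Stage], initial: int, path: str) -> Tuple[int, List[str]]:
    # Run stages in order, skipping any already recorded in the checkpoint.
    # Returns the final value and the names of the stages executed in this run.
    state = load_checkpoint(path)
    value = initial if state["value"] is None else state["value"]
    executed: List[str] = []
    for name, fn in stages:
        if name in state["completed"]:
            continue  # finished before the failure; do not redo it
        value = fn(value)
        executed.append(name)
        state["completed"].append(name)
        state["value"] = value
        save_checkpoint(path, state)  # snapshot after every stage
    return value, executed


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    attempts = {"n": 0}

    def flaky_load(x: int) -> int:
        # Fails on the first attempt to simulate a node failure late in the job.
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise RuntimeError("simulated node failure")
        return x * 10

    stages: List[Stage] = [("extract", lambda x: x + 1),
                           ("transform", lambda x: x * 2),
                           ("load", flaky_load)]
    try:
        run_pipeline(stages, 1, ckpt)
    except RuntimeError as err:
        print("first run failed:", err)
    print(run_pipeline(stages, 1, ckpt))  # resumes at "load" only
```

In the demo, the second run reuses the checkpointed results of "extract" and "transform" and executes only "load". Snapshotting after every stage trades some extra I/O for the shortest possible recovery; a real system would choose the checkpoint interval against the cost of redoing work.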