Cloudera:
Hadoop for the Enterprise
September 2008
Data Growing Much Faster than Moore’s Law
04/21/17
Cloudera Confidential 2
Source: Richard Winter, Why Are Data Warehouses Growing so Fast?, April 2008
Uniprocessor Performance
Founding Team
• Mike Olson, CEO
– CEO Sleepycat
– Britton Lee, Illustra, Informix, Oracle
– BA, MS CS, Berkeley
• Amr Awadallah, CTO, VP Engineering
– Founder Aptivia/VivaSmart
– 8 years at Yahoo! running BI infrastructure, including Hadoop
– PhD EE, Stanford
• Christophe Bisciglia, VP Technology
– Created Google/NSF Hadoop cluster and program
– BA CS, U Washington
• Jeff Hammerbacher, VP Product
– Ran world’s largest operational BI support system on Hadoop, at Facebook
– BA Mathematics, Harvard
What Is Hadoop?
• Core engine:
– Open source implementation of Google’s MapReduce and GFS
– Hundreds or thousands of servers parallelize a data analysis task
• Interfaces built on top of MapReduce
• Storage layer beneath (HDFS)
• Doug Cutting, Mike Cafarella are advisors
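The MapReduce model named on this slide can be illustrated with a minimal single-process sketch. This is not Hadoop's API: the function names (`map_words`, `shuffle`, `reduce_counts`, `run_job`) and the sample log lines are hypothetical, and the map calls run one after another here, whereas a cluster would spread them across servers.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    # Each map_words call depends only on its own line, which is what
    # lets a real cluster run the map step on many servers at once.
    emitted = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(emitted)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["GET /home", "GET /about", "POST /home"]  # hypothetical input
    print(run_job(logs))
```

Because every map call is independent and every reduce call sees only one key, the same job could be split across machines without changing the functions themselves.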
Hadoop is Open Source
• Hadoop is distributed under the Apache License:
– Reduces concern about lock-in
– Low-cost, effective distribution strategy
– Allows innovation by partners, customers
– Third-party inspection of source code provides assurances on security, product quality
• Business-friendly license encourages commercial development
– “Open core” licensing
– Closed-source components, applications
Hadoop Users
Momentum: Google Trends
Netezza: $127M in FY08, $79M in FY07
Teradata: $830M in 1H08, $1.7B in FY07
Worldwide Phenomenon
Source: Google Insights world map for searches on “hadoop”, Sept 2008.
Why is Hadoop Successful?
• Brings computation closer to data, allowing both IO and compute scalability.
• Map-Reduce forces developers to think in a parallel way
• Operates on unstructured data, and structured data (HBASE, HIVE)
• Prescriptive development, grows with you without needing to re-architect
• Procedural language offers power
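The first bullet, bringing computation closer to data, can be sketched as a scheduler that prefers to run each task on a server that already stores the block it needs. This is a simplified greedy illustration, not Hadoop's scheduler: `assign_tasks`, the block names, and the node names are all hypothetical.

```python
def assign_tasks(block_replicas, free_slots):
    """Assign one task per block, preferring nodes that hold a replica.

    block_replicas: dict mapping block id -> list of nodes storing that block
    free_slots: dict mapping node name -> number of free task slots
    Returns a dict mapping block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    plan = {}
    for block, replicas in block_replicas.items():
        # First choice: a node that already stores the block (no network read).
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fallback: any node with a free slot; the block must be
            # copied over the network before the task can read it.
            remote = [node for node, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free task slots")
            node, is_local = remote[0], False
        slots[node] -= 1
        plan[block] = (node, is_local)
    return plan


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}  # hypothetical
    slots = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
    for block, (node, is_local) in assign_tasks(blocks, slots).items():
        print(block, node, "local" if is_local else "remote")
```

The greedy choice is deliberately naive: in the sample data, `b1` takes the only slot on `n1`, which forces `b2` onto a remote node. A production scheduler would weigh such conflicts, but the sketch shows why co-locating storage and compute reduces data movement.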
Current Systems Isolate Users from the Event Level Raw Data
[Diagram. Components: Live Web Site, MySQL, MemCached, Instrumentation, Log Collection, File Server Farm for Warehouse (non-queryable), Warehouse Pre-Processing, Datamart Database, BI Reporting, Data Mining (R, Weka, SAS, SPSS), and repeated ETL steps. Callouts: Non-Consumption; Expensive ETL Grids.]
Solution: “Smart” Storage Service
[Diagram. Components: Smart Storage: Grid For File Storage & Data Processing, Live Web Site, MySQL, MemCached, Instrumentation, Log Collection, Warehouse Pre-Processing, Datamart Database, BI Reporting, Data Mining (R, Weka, SAS, SPSS). Callouts: Enable Consumption; Eliminate Expensive ETL Grids.]
BDP versus OLAP/OLTP
[Chart comparing two series, OLAP/OLTP and Batch Data Processing. Axes: Schema Complexity (Structured, Unstructured); Processing Freedom (SQL, Generic Data Processing); Table Join Complexity; Concurrent Jobs; Responsiveness (Batch, Interactive); Per Job Data Volume; Data Update Pattern (Append Only, Read/Write); Total Data Volume. Tick labels shown: 100 Tables, 1000, 100TB, 1PB, 10PB, 100PB.]
Source: Merrill Lynch Industry Overview, May 7, 2008
Cloudera Differentiators
• Enabling Hadoop as an elastic platform with statistical multiplexing over many customers
• Multi-Tenant Support: Concurrency, Priority, Namespace Isolation, Performance Isolation.
• Monitoring, Reliability, and Availability
• Resilience and Fast Recovery: A non-sexy problem that is critical to enterprises, no time to restart ETL job from scratch, otherwise misses SLA.
• IDE to easily debug, deploy, and tune.
• Integration with data mining and analysis functionality (R, Weka, SAS, SPSS)
• Connector certification: another non-sexy problem that is ignored by community, make sure system is compatible with other enterprise systems.
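The statistical multiplexing claim in the first bullet can be made concrete with a small calculation. When customers peak at different times, a shared cluster only has to cover the peak of their combined demand, which is no larger than the sum of their individual peaks. The function name `capacity_needed` and the demand figures below are made up for illustration; they do not come from the deck.

```python
def capacity_needed(demands):
    """Compare dedicated and shared capacity for several customers.

    demands: list of per-customer demand series, all of the same length
             (for example, servers needed in each hour).
    Returns (dedicated, shared):
      dedicated = sum of each customer's own peak (one cluster per customer)
      shared    = peak of the combined demand (one multiplexed cluster)
    """
    dedicated = sum(max(series) for series in demands)
    shared = max(sum(step) for step in zip(*demands))
    return dedicated, shared


if __name__ == "__main__":
    # Hypothetical demand for three customers over three time periods;
    # each customer peaks in a different period.
    demands = [
        [10, 2, 1],
        [1, 9, 2],
        [2, 1, 8],
    ]
    dedicated, shared = capacity_needed(demands)
    print(f"dedicated clusters: {dedicated} servers")
    print(f"shared cluster:     {shared} servers")
```

With these sample numbers the dedicated approach needs 27 servers and the shared cluster needs 13. The saving shrinks as customers' peaks line up in time.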

Cloudera's Original Pitch Deck from 2008

  • 1. Cloudera:Cloudera: Hadoop for the EnterpriseHadoop for the Enterprise September 2008September 2008
  • 2. Data Growing Much Faster thanData Growing Much Faster than Moore’s LawMoore’s Law 04/21/17 Cloudera ConfidentialCloudera Confidential 22 Source: Richard Winter, Why Are Data Warehouses Growing so Fast?, April 2008
  • 4. Founding TeamFounding Team • Mike Olson, CEOMike Olson, CEO – CEO SleepycatCEO Sleepycat – Britton Lee, Illustra,Britton Lee, Illustra, Informix, OracleInformix, Oracle – BA, MS CS, BerkeleyBA, MS CS, Berkeley • Amr Awadallah, CTO, VPAmr Awadallah, CTO, VP EngineeringEngineering – Founder Aptivia/VivaSmartFounder Aptivia/VivaSmart – 8 years at Yahoo! running8 years at Yahoo! running BI infrastructure, includingBI infrastructure, including HadoopHadoop – PhD EE, StanfordPhD EE, Stanford • Christophe Bisciglia, VPChristophe Bisciglia, VP TechnologyTechnology – Created Google/NSFCreated Google/NSF Hadoop cluster andHadoop cluster and programprogram – BA CS, U WashingtonBA CS, U Washington • Jeff Hammerbacher, VPJeff Hammerbacher, VP ProductProduct – Ran world’s largestRan world’s largest operational BI supportoperational BI support system on Hadoop, atsystem on Hadoop, at FacebookFacebook – BA Mathematics, HarvardBA Mathematics, Harvard 04/21/17 44Cloudera ConfidentialCloudera Confidential
  • 5. What Is Hadoop?What Is Hadoop? • Core engine:Core engine: – Open source implementation of Google’sOpen source implementation of Google’s MapReduce and GFSMapReduce and GFS – Hundreds or thousands of serversHundreds or thousands of servers parallelize a data analysis taskparallelize a data analysis task • Interfaces built on top of MapReduceInterfaces built on top of MapReduce • Storage layer beneath (HDFS)Storage layer beneath (HDFS) • Doug Cutting, Mike Cafarella areDoug Cutting, Mike Cafarella are advisorsadvisors 04/21/17 55Cloudera ConfidentialCloudera Confidential
  • 6. Hadoop is Open Source • Hadoop is distributed under the Apache License: – Reduces concern about lock-in – Low-cost, effective distribution strategy – Allows innovation by partners, customers – Third-party inspection of source code provides assurances on security, product quality • Business-friendly license encourages commercial development – “Open core” licensing – Closed-source components, applications
  • 7. Hadoop Users
  • 8. Momentum: Google Trends. Netezza: $127M in FY08, $79M in FY07. Teradata: $830M in 1H08, $1.7B in FY07
  • 9. Worldwide Phenomenon. Source: Google Insights world map for searches on “hadoop”, Sept 2008.
  • 10. Why is Hadoop Successful? • Brings computation closer to data, allowing both IO and compute scalability. • Map-Reduce forces developers to think in a parallel way • Operates on unstructured data, and structured data (HBASE, HIVE) • Prescriptive development, grows with you without needing to re-architect • Procedural language offers power
  • 11. Current Systems Isolate Users from the Event Level Raw Data [Diagram components: Instrumentation, Log Collection, File Server Farm for Warehouse (non-queryable), Warehouse Pre-Processing, Datamart Database, BI Reporting, MySQL, MemCached, Live Web Site, Data Mining (R, Weka, SAS, SPSS), with six ETL steps. Annotations: Non-Consumption; Expensive ETL Grids.]
  • 12. Solution: “Smart” Storage Service [Diagram components: Smart Storage: Grid For File Storage & Data Processing, Instrumentation, Log Collection, Warehouse Pre-Processing, Datamart Database, BI Reporting, MySQL, MemCached, Live Web Site, Data Mining (R, Weka, SAS, SPSS). Annotations: Enable Consumption; Eliminate Expensive ETL Grids.]
  • 13. BDP versus OLAP/OLTP [Chart comparing OLAP/OLTP with Batch Data Processing. Axes: Schema Complexity, Processing Freedom, Table Join Complexity, Concurrent Jobs, Responsiveness, Per Job Data Volume, Data Update Pattern, Total Data Volume. Axis labels include Structured, Unstructured, SQL, Generic Data Processing, Batch, Interactive, Read/Write, Append Only, 100 and 1000, Tables, and data volumes from 100TB to 100PB.]
  • 14. Source: Merrill Lynch Industry Overview, May 7, 2008
  • 15. Cloudera Differentiators • Enabling Hadoop as an elastic platform with statistical multiplexing over many customers • Multi-Tenant Support: Concurrency, Priority, Namespace Isolation, Performance Isolation. • Monitoring, Reliability, and Availability • Resilience and Fast Recovery: A non-sexy problem that is critical to enterprises, no time to restart ETL job from scratch, otherwise misses SLA. • IDE to easily debug, deploy, and tune. • Integration with data mining and analysis functionality (R, Weka, SAS, SPSS) • Connector certification: another non-sexy problem that is ignored by community, make sure system is compatible with other enterprise systems.
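
Slides 5 and 10 describe Hadoop's core engine as a MapReduce implementation in which many servers parallelize one analysis task. The sketch below is not Hadoop code and does not use any Hadoop API; it is a minimal single-machine illustration of the map, shuffle, and reduce phases, using a thread pool as a stand-in for a cluster. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for key, value in chunk:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines, workers=4):
    # Each line is mapped independently of the others, which is what
    # lets the map step be spread across workers (threads here,
    # servers in a real cluster).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, lines))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["GET /home", "GET /about", "POST /home"]
    print(word_count(logs))
```

Because the map function sees one record at a time and the reduce function sees one key at a time, neither depends on shared state, which is the "think in a parallel way" constraint that slide 10 refers to.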

Editor's Notes

  1. (Moore’s law is failing; the only way to speed up going forward is massive parallelism on grids/multicores.)
  2. Furthermore, these expensive ETL grids are only needed for a couple of hours in the morning to meet the loading SLA.
  3. Another pain point is resilience to failure: currently, when a Hadoop job fails, you have to restart it all the way from the beginning. The community is not spending much time addressing this problem since it is not "sexy", but it is critical for enterprises with strict SLAs to meet. You don't want to have to restart your ETL job from scratch when a failure occurs; there is no time for that. There is a need to snapshot jobs at intermediate checkpoints so that you don't have to restart all the way from the beginning in case of failure.
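
The checkpointing idea in note 3 can be illustrated with a small sketch. This is an assumption-laden toy, not how Hadoop or any Cloudera product implements recovery: it processes a list of records in order and periodically saves its progress to a JSON file, so that a rerun after a failure resumes from the last snapshot. The names `run_with_checkpoints`, `save_checkpoint`, and the checkpoint file layout are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    # Write to a temporary file and rename it into place, so a crash
    # during the write cannot leave a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def run_with_checkpoints(records, process, checkpoint_path, every=100):
    """Sum process(record) over records, resuming from a saved checkpoint."""
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            state = json.load(handle)

    for i in range(state["next_index"], len(records)):
        state["total"] += process(records[i])
        state["next_index"] = i + 1
        # Snapshot the position and the partial result together, so
        # they always describe the same point in the job.
        if state["next_index"] % every == 0:
            save_checkpoint(state, checkpoint_path)

    save_checkpoint(state, checkpoint_path)
    return state["total"]
```

If the job fails between snapshots, the records processed since the last snapshot are processed again on the rerun, so the work lost is bounded by `every` records rather than the whole job. A caller would normally delete the checkpoint file once the job has completed successfully, since a leftover file makes the next run return the saved total without reprocessing.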