SlideShare a Scribd company logo
1 of 24
Download to read offline
Cloudera’s Distribution
     for Hadoop
        Oct 2, 2009
        Todd Lipcon
      (todd@cloudera.com)
What is CDH?
What’s a Distribution?
     How many of you get your apache
     httpd from apache.org?
What’s a Distribution?
     How many of you get your apache
     httpd from apache.org?
     Pretty much everyone uses Linux
     distributions to get software
     CDH is a Hadoop distribution in the
     same way that Ubuntu is a Linux
     distribution
What is CDH?
    Apache Hadoop and its ecosystem,
    packaged up and easier to install
    RPM, Debian, and tarball installs
    Better Linux citizenship
    Maintained and tested patch series on
    top of upstream
    Ecosystem compatibility guarantees
What’s in CDH?
CDH - Included Packages
    Apache Hadoop (MR, HDFS, and
    Common)
    Apache Pig
    Apache Hive
    Cloudera Desktop
    HBase and ZooKeeper (contributed by
    HBase team)
    ... more to come
Installation Options
     APT and Yum repositories
     apt-get install hadoop
     yum install hadoop
     hadoop-conf-pseudo package to get
     started
     tarball
CDH on Amazon EC2
    hadoop-ec2 launch-cluster
    todd-cluster 20
    Support for HDFS on EBS volumes
    (better performance than S3)
    Cloudera Desktop automatically
    installed and launched
    Great if your data is already on EBS or
    S3
CDH on Amazon EC2
    hadoop-ec2 launch-cluster
    todd-cluster 20
    Support for HDFS on EBS volumes
    (better performance than S3)
    Cloudera Desktop automatically
    installed and launched
    Great if your data is already on EBS or
    S3
    Soon to come: VMware (vCloud) and
    Rackspace
Linux citizenship
     Hadoop should act like other software
     you’re used to
     Configuration using alternatives in
     /etc
     Logs in /var/log
     Start/stop with init.d services
Patches in CDH
    Get bug fixes early
    Backport “Safe” new features
        Sqoop, MRUnit
        Fair Scheduler on 18
        /metrics servlet
        S3 fixes
        etc...
    Backport “Really Safe” performance
    patches
What exactly am I getting?
     Hadoop in CDH is still Apache 2.0
     Read the changelog:
     ...hadoop-0.20/cloudera/CHANGES.cloudera.txt

     Read the patches:
     ...hadoop-0.20/cloudera/patches/

     Build it yourself:
     ...hadoop-0.20/cloudera/do-release-build
Is this a fork?
Is this a fork?



             No way!
Is this a fork?
No way!

          All functionality patches submitted
          upstream (some build-system patches
          only apply to our build)
          We employ 2 committers fulltime, plus
          several contributors
          We regularly meet and work with other
          community members from Yahoo!,
          Facebook, etc.
My one commercial plug
...gotta pay the bills

           We provide paid support for CDH
           Someone to call if your cluster is down
           Access to knowledgeable Hadoop
           engineers
           Configuration and tuning help
           Process design reviews
           Prioritize patches you need (and hot
           fixes for critical issues)
           </salesman>
Versions of CDH
Versions of CDH
    Debian versioning scheme
    stable
        no new features, lots of “soak time”
        comparable to RHEL 5, Ubuntu LTS, or
        Debian stable
        recommended for critical production
        deployments
Versions of CDH
    Debian versioning scheme
    testing
        considered usable - testing, not
        untested!
        has whiz-bang features and newer
        versions
        recommended for shops who like the
        bleeding edge, or for those in PoC/dev
        stage
Versions of CDH
    CDH1 (stable)
       Released March ’09
       Hadoop 0.18.3, Hive 0.3, Pig 0.2
       Will become oldstable this winter
    CDH2 (testing)
       Released June ’09
       Hadoop 0.18.3, Hadoop 0.20.1, Pig 0.5,
       Hive 0.4, HBase 0.20
       Can install 0.18 and 0.20 at the same
       time
       Will become stable this winter
CDH2 Package Versioning

 hadoop-0.18-0.18.3+65-1.cloudera.noarch.rpm
 A hadoop package based on Apache Hadoop
 0.18.3 with 65 patches


 hadoop-0.20-0.20.0+4.4-1.cloudera.noarch.rpm
 A hadoop package based on Apache Hadoop
 0.20.0 with 4 patches in testing, 4
 security/critical fixes
Where do I get CDH?
http://archive.cloudera.com/
Questions?

More Related Content

What's hot

Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
prabakaranbrick
 

What's hot (20)

Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop Cluster
 
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in HadoopOctober 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Introduction to apache hadoop
Introduction to apache hadoopIntroduction to apache hadoop
Introduction to apache hadoop
 
알쓸신잡
알쓸신잡알쓸신잡
알쓸신잡
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - Overview
 
Hadoop introduction seminar presentation
Hadoop introduction seminar presentationHadoop introduction seminar presentation
Hadoop introduction seminar presentation
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase client
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an example
 
Hadoop migration and upgradation
Hadoop migration and upgradationHadoop migration and upgradation
Hadoop migration and upgradation
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache Hadoop
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Apache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentechApache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentech
 

Viewers also liked

Viewers also liked (20)

Apache hadoop and cdh(cloudera distribution) introduction 基本介紹
Apache hadoop and cdh(cloudera distribution) introduction 基本介紹Apache hadoop and cdh(cloudera distribution) introduction 基本介紹
Apache hadoop and cdh(cloudera distribution) introduction 基本介紹
 
Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Seri...
Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Seri...Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Seri...
Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Seri...
 
Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloude...
Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloude...Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloude...
Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloude...
 
Case study: Hadoop as ELT for Leading US Retailer - Happiest Minds
Case study: Hadoop as ELT for Leading US Retailer - Happiest MindsCase study: Hadoop as ELT for Leading US Retailer - Happiest Minds
Case study: Hadoop as ELT for Leading US Retailer - Happiest Minds
 
A short introduction to Spark and its benefits
A short introduction to Spark and its benefitsA short introduction to Spark and its benefits
A short introduction to Spark and its benefits
 
Hadoop.TW : Now and Future
Hadoop.TW : Now and FutureHadoop.TW : Now and Future
Hadoop.TW : Now and Future
 
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data HubCloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
 
What is Spark
What is SparkWhat is Spark
What is Spark
 
Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...
Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...
Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Serie...
 
An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14
An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14
An Introduction to Hadoop and Cloudera: Nashville Cloudera User Group, 10/23/14
 
cdh
cdhcdh
cdh
 
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1

 
Apache Spark An Overview
Apache Spark An OverviewApache Spark An Overview
Apache Spark An Overview
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRHadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
 
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudData Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
 
eris:db -- Typical Account Types
eris:db -- Typical Account Typeseris:db -- Typical Account Types
eris:db -- Typical Account Types
 
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 

Similar to Hw09 Clouderas Distribution For Hadoop

Single node setup
Single node setupSingle node setup
Single node setup
KBCHOW123
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)
Søren Lund
 

Similar to Hw09 Clouderas Distribution For Hadoop (20)

Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments
 
Deploying Hadoop-Based Bigdata Environments
Deploying Hadoop-Based Bigdata EnvironmentsDeploying Hadoop-Based Bigdata Environments
Deploying Hadoop-Based Bigdata Environments
 
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data EverywhereApache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
 
Instant hadoop of your own
Instant hadoop of your ownInstant hadoop of your own
Instant hadoop of your own
 
Hbase in action - Chapter 09: Deploying HBase
Hbase in action - Chapter 09: Deploying HBaseHbase in action - Chapter 09: Deploying HBase
Hbase in action - Chapter 09: Deploying HBase
 
Unit 5
Unit  5Unit  5
Unit 5
 
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
 
Lecture 2 part 2
Lecture 2 part 2Lecture 2 part 2
Lecture 2 part 2
 
Big Data Course - BigData HUB
Big Data Course - BigData HUBBig Data Course - BigData HUB
Big Data Course - BigData HUB
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
 
The Gory Details of Debian packages
The Gory Details of Debian packagesThe Gory Details of Debian packages
The Gory Details of Debian packages
 
Single node setup
Single node setupSingle node setup
Single node setup
 
The Future of Hbase
The Future of HbaseThe Future of Hbase
The Future of Hbase
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)
 
Bigdata
BigdataBigdata
Bigdata
 
Talend openstudio bigdata_gettingstarted_6.3.0_en
Talend openstudio bigdata_gettingstarted_6.3.0_enTalend openstudio bigdata_gettingstarted_6.3.0_en
Talend openstudio bigdata_gettingstarted_6.3.0_en
 
LCA 2014 project-builder.org presentation
LCA 2014 project-builder.org presentationLCA 2014 project-builder.org presentation
LCA 2014 project-builder.org presentation
 
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexMaking sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
 
Maintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoopMaintainable cloud architecture_of_hadoop
Maintainable cloud architecture_of_hadoop
 
Picking a distro_1_
Picking a distro_1_Picking a distro_1_
Picking a distro_1_
 

More from Cloudera, Inc.

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Hw09 Clouderas Distribution For Hadoop

  • 1. Cloudera’s Distribution for Hadoop Oct 2, 2009 Todd Lipcon (todd@cloudera.com)
  • 3. What’s a Distribution? How many of you get your apache httpd from apache.org?
  • 4. What’s a Distribution? How many of you get your apache httpd from apache.org? Pretty much everyone uses Linux distributions to get software CDH is a Hadoop distribution in the same way that Ubuntu is a Linux distribution
  • 5. What is CDH? Apache Hadoop and its ecosystem, packaged up and easier to install RPM, Debian, and tarball installs Better Linux citizenship Maintained and tested patch series on top of upstream Ecosystem compatibility guarantees
  • 7. CDH - Included Packages Apache Hadoop (MR, HDFS, and Common) Apache Pig Apache Hive Cloudera Desktop HBase and ZooKeeper (contributed by HBase team) ... more to come
  • 8. Installation Options APT and Yum repositories apt-get install hadoop yum install hadoop hadoop-conf-pseudo package to get started tarball
  • 9. CDH on Amazon EC2 hadoop-ec2 launch-cluster todd-cluster 20 Support for HDFS on EBS volumes (better performance than S3) Cloudera Desktop automatically installed and launched Great if your data is already on EBS or S3
  • 10. CDH on Amazon EC2 hadoop-ec2 launch-cluster todd-cluster 20 Support for HDFS on EBS volumes (better performance than S3) Cloudera Desktop automatically installed and launched Great if your data is already on EBS or S3 Soon to come: VMware (vCloud) and Rackspace
  • 11. Linux citizenship Hadoop should act like other software you’re used to Configuration using alternatives in /etc Logs in /var/log Start/stop with init.d services
  • 12. Patches in CDH Get bug fixes early Backport “Safe” new features Sqoop, MRUnit Fair Scheduler on 18 /metrics servlet S3 fixes etc... Backport “Really Safe” performance patches
  • 13. What exactly am I getting? Hadoop in CDH is still Apache 2.0 Read the changelog: ...hadoop-0.20/cloudera/CHANGES.cloudera.txt Read the patches: ...hadoop-0.20/cloudera/patches/ Build it yourself: ...hadoop-0.20/cloudera/do-release-build
  • 14. Is this a fork?
  • 15. Is this a fork? No way!
  • 16. Is this a fork? No way! All functionality patches submitted upstream (some build-system patches only apply to our build) We employ 2 committers fulltime, plus several contributors We regularly meet and work with other community members from Yahoo!, Facebook, etc.
  • 17. My one commercial plug ...gotta pay the bills We provide paid support for CDH Someone to call if your cluster is down Access to knowledgeable Hadoop engineers Configuration and tuning help Process design reviews Prioritize patches you need (and hot fixes for critical issues) </salesman>
  • 19. Versions of CDH Debian versioning scheme stable no new features, lots of “soak time” comparable to RHEL 5, Ubuntu LTS, or Debian stable recommended for critical production deployments
  • 20. Versions of CDH Debian versioning scheme testing considered usable - testing, not untested! has whiz-bang features and newer versions recommended for shops who like the bleeding edge, or for those in PoC/dev stage
  • 21. Versions of CDH CDH1 (stable) Released March ’09 Hadoop 0.18.3, Hive 0.3, Pig 0.2 Will become oldstable this winter CDH2 (testing) Released June ’09 Hadoop 0.18.3, Hadoop 0.20.1, Pig 0.5, Hive 0.4, HBase 0.20 Can install 0.18 and 0.20 at the same time Will become stable this winter
  • 22. CDH2 Package Versioning hadoop-0.18-0.18.3+65-1.cloudera.noarch.rpm A hadoop package based on Apache Hadoop 0.18.3 with 65 patches hadoop-0.20-0.20.0+4.4-1.cloudera.noarch.rpm A hadoop package based on Apache Hadoop 0.20.0 with 4 patches in testing, 4 security/critical fixes
  • 23. Where do I get CDH? http://archive.cloudera.com/