Hw09 Cloudera’s Distribution for Hadoop

  • Cloudera’s Distribution for Hadoop
    Oct 2, 2009, Todd Lipcon (todd@cloudera.com)
  • What is CDH?
  • What’s a Distribution?
    - How many of you get your Apache httpd from apache.org?
    - Pretty much everyone uses Linux distributions to get software.
    - CDH is a Hadoop distribution in the same way that Ubuntu is a Linux distribution.
  • What is CDH?
    - Apache Hadoop and its ecosystem, packaged up and easier to install
    - RPM, Debian, and tarball installs
    - Better Linux citizenship
    - A maintained and tested patch series on top of upstream
    - Ecosystem compatibility guarantees
  • What’s in CDH?
  • CDH: Included Packages
    - Apache Hadoop (MapReduce, HDFS, and Common)
    - Apache Pig
    - Apache Hive
    - Cloudera Desktop
    - HBase and ZooKeeper (contributed by the HBase team)
    - ... more to come
  • Installation Options
    - APT and Yum repositories: apt-get install hadoop / yum install hadoop
    - hadoop-conf-pseudo package (pseudo-distributed configuration) to get started
    - Tarball
  • CDH on Amazon EC2
    - hadoop-ec2 launch-cluster todd-cluster 20
    - Support for HDFS on EBS volumes (better performance than S3)
    - Cloudera Desktop automatically installed and launched
    - Great if your data is already on EBS or S3
    - Soon to come: VMware (vCloud) and Rackspace
  • Linux citizenship: Hadoop should act like other software you’re used to
    - Configuration using alternatives in /etc
    - Logs in /var/log
    - Start/stop with init.d services
  • Patches in CDH
    - Get bug fixes early
    - Backport “safe” new features: Sqoop, MRUnit, Fair Scheduler on 0.18, /metrics servlet, S3 fixes, etc.
    - Backport “really safe” performance patches
  • What exactly am I getting?
    - Hadoop in CDH is still Apache 2.0 licensed
    - Read the changelog: ...hadoop-0.20/cloudera/CHANGES.cloudera.txt
    - Read the patches: ...hadoop-0.20/cloudera/patches/
    - Build it yourself: ...hadoop-0.20/cloudera/do-release-build
  • Is this a fork? No way!
    - All functionality patches are submitted upstream (some build-system patches only apply to our build)
    - We employ 2 committers full-time, plus several contributors
    - We regularly meet and work with other community members from Yahoo!, Facebook, etc.
  • My one commercial plug (...gotta pay the bills)
    - We provide paid support for CDH:
      - Someone to call if your cluster is down
      - Access to knowledgeable Hadoop engineers
      - Configuration and tuning help
      - Process design reviews
      - Prioritize patches you need (and hot fixes for critical issues)
    </salesman>
  • Versions of CDH
  • Versions of CDH: Debian versioning scheme
    - stable
      - no new features, lots of “soak time”
      - comparable to RHEL 5, Ubuntu LTS, or Debian stable
      - recommended for critical production deployments
  • Versions of CDH: Debian versioning scheme
    - testing
      - considered usable (testing, not untested!)
      - has whiz-bang features and newer versions
      - recommended for shops who like the bleeding edge, or for those in the PoC/dev stage
  • Versions of CDH
    - CDH1 (stable)
      - Released March ’09
      - Hadoop 0.18.3, Hive 0.3, Pig 0.2
      - Will become oldstable this winter
    - CDH2 (testing)
      - Released June ’09
      - Hadoop 0.18.3, Hadoop 0.20.1, Pig 0.5, Hive 0.4, HBase 0.20
      - Can install 0.18 and 0.20 at the same time
      - Will become stable this winter
  • CDH2 Package Versioning
    - hadoop-0.18-0.18.3+65-1.cloudera.noarch.rpm: a hadoop package based on Apache Hadoop 0.18.3 with 65 patches
    - hadoop-0.20-0.20.0+4.4-1.cloudera.noarch.rpm: a hadoop package based on Apache Hadoop 0.20.0 with 4 patches in testing and 4 security/critical fixes
  • Where do I get CDH?
    - http://archive.cloudera.com/
  • Questions?
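The CDH2 package versioning slide encodes the upstream Apache release and the patch count directly in each RPM filename. As a minimal sketch (not Cloudera tooling), the Python below splits such a filename using the standard RPM name-version-release.arch.rpm layout. The names CdhPackage and parse_cdh_rpm are hypothetical, and reading an optional second number after the patch count (as in 0.20.0+4.4) as the number of security/critical fixes is an assumption generalized from the single example on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdhPackage:
    """Fields recovered from a CDH2 RPM filename (hypothetical helper type)."""

    name: str      # package name, e.g. "hadoop-0.18"
    upstream: str  # Apache Hadoop release it is based on, e.g. "0.18.3"
    patches: int   # patches applied on top of upstream, e.g. 65
    fixes: int     # ASSUMPTION: optional ".N" after the patch count = security/critical fixes
    release: str   # packaging release, e.g. "1.cloudera"
    arch: str      # architecture field, e.g. "noarch"


def parse_cdh_rpm(filename: str) -> CdhPackage:
    """Parse name-version-release.arch.rpm, where version is upstream+patches[.fixes]."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an RPM filename: {filename!r}")
    stem = filename[: -len(".rpm")]
    # The architecture is the last dot-separated field of the stem.
    nvr, arch = stem.rsplit(".", 1)
    # Version and release contain no '-', but the name can ("hadoop-0.18"),
    # so split from the right exactly twice.
    name, version, release = nvr.rsplit("-", 2)
    # "0.18.3+65" -> upstream "0.18.3", counts "65"; "0.20.0+4.4" -> "0.20.0", "4.4"
    upstream, _, counts = version.partition("+")
    parts = counts.split(".") if counts else ["0"]
    patches = int(parts[0])
    fixes = int(parts[1]) if len(parts) > 1 else 0
    return CdhPackage(name, upstream, patches, fixes, release, arch)


if __name__ == "__main__":
    for rpm in (
        "hadoop-0.18-0.18.3+65-1.cloudera.noarch.rpm",
        "hadoop-0.20-0.20.0+4.4-1.cloudera.noarch.rpm",
    ):
        pkg = parse_cdh_rpm(rpm)
        print(f"{pkg.name}: Apache Hadoop {pkg.upstream}, "
              f"{pkg.patches} patches, {pkg.fixes} security/critical fixes")
```

Splitting from the right is the key design choice here: the package name itself carries a hyphen (hadoop-0.18, hadoop-0.20), which is what lets the 0.18 and 0.20 packages be installed side by side, while the version and release fields never contain one.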