Hadoop Versioning

        July 9, 2012
          Anty Rao
 Big Data Engineering Team
        Hanborq Inc.


Development Convention
• Trunk
  – The main codeline; new features are developed on
    trunk
• Branch
  – Occasionally, very large features are developed on
    their own branches with the expectation that they will
    later be merged into trunk.
• Release
  – Candidate releases are branched from trunk
  – The release branch stops accepting new features
  – Bugs get fixed and, after a vote, a release is declared
    for that particular branch.


Hadoop 0.20 branch
• Two major features were added to branches
  off 0.20.2
  – Authentication
     • Enabling strong security for core Hadoop
  – Append
     • Enabling users to run Apache HBase without risk of data
       loss




Confusion about Version
• Releases off the 0.20 branches had features
  that releases off the trunk did not have, and
  vice versa.
• Apache Hadoop 0.23 is a strict superset of
  the features in 0.22, but it was actually released
  a month before 0.22.
• The 0.20 branch release formerly known as 0.20.205
  was renumbered 1.0. This is just a
  renumbering, with no functional difference.
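
The ordering problem described above can be illustrated with a small sketch. It uses only the two facts stated on this slide (0.23 was released before 0.22, and 0.20.205 was renumbered 1.0). The helper names `parse_version`, `canonical`, `RENUMBERED`, and `release_order` are hypothetical, and the `release_order` values are relative positions, not actual dates.

```python
# Facts from the slide: 0.23 shipped about a month before 0.22, and
# 0.20.205 was renumbered to 1.0 with no functional change.
# release_order holds relative positions only (an assumption), not dates.

def parse_version(text):
    """Turn '0.20.205' into (0, 20, 205) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

# Hypothetical alias table for renumbered releases.
RENUMBERED = {"0.20.205": "1.0"}

def canonical(text):
    """Map an old version label to its renumbered name, if it has one."""
    return RENUMBERED.get(text, text)

release_order = {"0.23": 1, "0.22": 2}  # 0.23 came out first

by_number = sorted(release_order, key=parse_version)
by_release = sorted(release_order, key=release_order.get)

print(by_number)              # ['0.22', '0.23']
print(by_release)             # ['0.23', '0.22']
print(canonical("0.20.205"))  # 1.0
```

The two sorted lists disagree, which is the point: for these releases, the version number alone says nothing reliable about release order.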

Status
• For an 18-month period, no single Apache
  release contained all the committed features
  of Apache Hadoop!
• The recently released Hadoop 1.0 includes the
  following features
  – 0.20 Append
  – 0.20 Security



Cloudera CDH
• CDH1
  – 0.18.3 Apache Hadoop Release
• CDH2
  – 0.20.1 Apache Hadoop Release
• CDH3
  – 0.20.2 Apache Hadoop Release
  – 0.20 Append
  – 0.20 Security
• CDH4
  – 0.23.X Apache Hadoop Release
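
As a rough sketch, the mapping on this slide can be written as a lookup table, which makes it easy to check which CDH release carries a given 0.20 feature. The table contents come from the slide; the names `CDH_BASE` and `has_feature` are hypothetical and are not part of any Cloudera tooling.

```python
# CDH release -> Apache Hadoop base version and extra 0.20 features,
# transcribed from the slide. Names here are illustrative only.
CDH_BASE = {
    "CDH1": {"base": "0.18.3", "extras": []},
    "CDH2": {"base": "0.20.1", "extras": []},
    "CDH3": {"base": "0.20.2", "extras": ["append", "security"]},
    "CDH4": {"base": "0.23.X", "extras": []},
}

def has_feature(cdh, feature):
    """Return True if the slide lists `feature` as an extra for `cdh`."""
    return feature.lower() in CDH_BASE[cdh]["extras"]

print(CDH_BASE["CDH3"]["base"])       # 0.20.2
print(has_feature("CDH3", "Append"))  # True
print(has_feature("CDH2", "Append"))  # False
```

The table records only what the slide lists; an empty `extras` entry for CDH4 means the slide names no separate add-on features, not that the features are absent from the 0.23 line.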

Reference
• http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/



