Hadoop Summit 2010: Challenges and Uniqueness of QE and RE Processes in Hadoop

Challenges and Uniqueness of QE and RE Processes in Hadoop
Jayant Mahajan
Grid Computing, Yahoo! Bangalore
Feb 2010

Agenda
  • Quality checks for a patch in Hadoop
  • Additional QE at Yahoo!
  • Tools used for Hadoop QE and RE
  • Challenges

Quality checks for a patch commit in Hadoop
  • Static quality analysis of the patch attached to Jira
     – Verify Findbugs warnings
     – Verify Javadoc warnings
     – Verify ReleaseAudit warnings
     – Verify whether unit tests were added
  • Committer review
  • Unit tests
     – JUnit
     – Mini MR tests

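The static checks above come down to comparing the patched build against trunk and voting per check. A minimal sketch of that bookkeeping, assuming the warning counts have already been collected from Findbugs, Javadoc, and ReleaseAudit runs; the `PatchReport` and `BuildWarnings` names and the rule "a check fails only if the patch adds warnings" are illustrative assumptions, not the actual Hadoop patch-testing script:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: class names and the "no new warnings" rule are assumptions,
// not the actual Hadoop patch-testing implementation.
final class PatchReport {
    private PatchReport() {}

    /** Warning counts collected from one build (Findbugs, Javadoc, ReleaseAudit). */
    record BuildWarnings(int findbugs, int javadoc, int releaseAudit) {}

    /** One "+1"/"-1" line per check; a warning check fails only if the patch adds warnings. */
    static List<String> votes(BuildWarnings trunk, BuildWarnings patched, boolean testsAdded) {
        List<String> out = new ArrayList<>();
        out.add(vote("findbugs", patched.findbugs() <= trunk.findbugs()));
        out.add(vote("javadoc", patched.javadoc() <= trunk.javadoc()));
        out.add(vote("release audit", patched.releaseAudit() <= trunk.releaseAudit()));
        out.add(vote("tests included", testsAdded));
        return out;
    }

    /** The patch passes overall only if no individual check voted -1. */
    static boolean overallPass(List<String> votes) {
        return votes.stream().noneMatch(v -> v.startsWith("-1"));
    }

    private static String vote(String check, boolean ok) {
        return (ok ? "+1 " : "-1 ") + check;
    }

    public static void main(String[] args) {
        BuildWarnings trunk = new BuildWarnings(12, 3, 0);
        BuildWarnings patched = new BuildWarnings(13, 3, 0); // the patch adds one Findbugs warning
        List<String> result = votes(trunk, patched, true);
        result.forEach(System.out::println);
        System.out.println("overall: " + (overallPass(result) ? "+1" : "-1"));
    }
}
```

Running `main` prints `-1 findbugs`, `+1 javadoc`, `+1 release audit`, `+1 tests included`, then `overall: -1`. Comparing against trunk rather than against zero keeps pre-existing warnings from blocking unrelated patches.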
Quality checks for a patch commit in Hadoop (contd.)
  • COMMUNITY workflow: Development → Jira raised → Patch attached to JIRA → Set JIRA status to “Patch Available” → Patch picked up for testing by HUDSON → Committer review → Patch committed to SVN
  • HUDSON patch testing
     – Static analysis – Findbugs
     – ReleaseAudit warnings
     – Fast unit tests – TestNG
     – Fast contrib unit tests (if the patch touches contrib)
  • Secondary build
     – Static analysis – Findbugs
     – JDiff
     – All core unit tests with code coverage
     – All contrib unit tests with code coverage

Additional QE @ Y! for Hadoop
  • We are the largest test team for Hadoop
  • More than 1000 nodes dedicated to QE
  • Hadoop testing at Yahoo!
     – Patch testing
     – Automated testing
     – Manual testing

Additional QE @ Y! for Hadoop (contd.)
  • COMMUNITY: Development → Jira raised → Patch attached to JIRA → Set JIRA status to “Patch Available” → Patch picked up for testing by HUDSON → Patch committed to SVN → GIT
  • YAHOO! test environment
     – Manual patch testing
     – Manual functional testing
     – HUDSON – Benchmark and automation
     – HUDSON – Release build
     – GIT – Y!Hadoop

Tools used for Hadoop QE and RE
  • Hudson – Build automation
  • SVN and GIT – Source code management (SCM)
  • Ant & Ivy – Build and dependency management
  • Checkstyle – Coding standard checker
  • Clover – Code coverage
  • Forrest – Documentation
  • JDiff – Tracks API changes
  • Findbugs – Static analysis to find bugs
  • JUnit – Unit tests
  • Bugzilla & Jira – Issue tracking

Hudson
  • Hudson is a continuous integration server used to execute and monitor jobs (Hudson jobs)
  • Used for:
     – Build
     – Unit tests
     – Deployment
     – Validation jobs
     – Automated tests
  • http://hudson-ci.org/

Challenges in Hadoop QE and RE
  • Reliability
     – Loss of nodes
     – Data corruption
     – Loss of data blocks
  • Scale
     – Network issues
     – Disk issues
  • Performance
  • Corner cases
  • Repeatability
     – Deployment
     – Continuous integration

Reliability
  • MapReduce reliability
     – Failing tasks
     – Lost TaskTrackers (TTs)
  • HDFS reliability
     – Bringing a rack down
     – Corrupting data blocks
     – Loss of data blocks

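Bringing a rack down is a useful HDFS reliability test because replica placement is rack-aware. The toy model below, a sketch and not HDFS code, places three replicas per block loosely following HDFS's default policy (one replica on the writer's node, two on different nodes of one other rack) and counts blocks that lose every replica when a single rack fails. The rack and node counts, the seed, and the single-rack comparison policy are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy model only: loosely follows HDFS's default replica placement, not its actual code.
final class RackFailureModel {
    private RackFailureModel() {}

    /** A DataNode identified by rack number and position within the rack. */
    record Node(int rack, int index) {}

    /** Replica 1 on the writer's node; replicas 2 and 3 on two different nodes of one other rack. */
    static List<List<Node>> placeRackAware(int racks, int nodesPerRack, int blocks, long seed) {
        if (racks < 2 || nodesPerRack < 2) {
            throw new IllegalArgumentException("need >= 2 racks and >= 2 nodes per rack");
        }
        Random rnd = new Random(seed);
        List<List<Node>> placement = new ArrayList<>();
        for (int b = 0; b < blocks; b++) {
            Node writer = new Node(rnd.nextInt(racks), rnd.nextInt(nodesPerRack));
            int remote = (writer.rack() + 1 + rnd.nextInt(racks - 1)) % racks; // any rack except the writer's
            int n1 = rnd.nextInt(nodesPerRack);
            int n2 = (n1 + 1 + rnd.nextInt(nodesPerRack - 1)) % nodesPerRack; // a different node on that rack
            placement.add(List.of(writer, new Node(remote, n1), new Node(remote, n2)));
        }
        return placement;
    }

    /** Comparison policy (assumed, not used by HDFS): all replicas on consecutive nodes of one rack. */
    static List<List<Node>> placeSingleRack(int racks, int nodesPerRack, int blocks, long seed) {
        if (nodesPerRack < 3) {
            throw new IllegalArgumentException("need >= 3 nodes per rack");
        }
        Random rnd = new Random(seed);
        List<List<Node>> placement = new ArrayList<>();
        for (int b = 0; b < blocks; b++) {
            int rack = rnd.nextInt(racks);
            int n = rnd.nextInt(nodesPerRack);
            placement.add(List.of(new Node(rack, n),
                                  new Node(rack, (n + 1) % nodesPerRack),
                                  new Node(rack, (n + 2) % nodesPerRack)));
        }
        return placement;
    }

    /** Number of blocks whose replicas were all on the rack that went down. */
    static int missingBlocksAfterRackLoss(List<List<Node>> placement, int downRack) {
        int missing = 0;
        for (List<Node> replicas : placement) {
            if (replicas.stream().allMatch(node -> node.rack() == downRack)) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<List<Node>> aware = placeRackAware(4, 10, 1000, 7L);
        List<List<Node>> naive = placeSingleRack(4, 10, 1000, 7L);
        for (int r = 0; r < 4; r++) {
            System.out.printf("rack %d down: rack-aware lost %d, single-rack lost %d%n",
                    r, missingBlocksAfterRackLoss(aware, r), missingBlocksAfterRackLoss(naive, r));
        }
    }
}
```

With rack-aware placement no block loses all of its replicas when one rack goes down, whereas with the single-rack policy every block stored on the failed rack becomes unavailable. A real test would also cover corrupted and missing blocks, which this model does not.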
Scale
  • Testing at scale when hardware resources are limited
  • If we need more nodes for testing than we have, what do we do?
     – Use simulation
        ▪ DataNode simulation
        ▪ TaskTracker simulation
     – For example
        ▪ We need an environment equivalent to a 3000-node cluster
        ▪ Run 3 instances of TaskTracker (TT) and DataNode (DN) per node on a 1000-node cluster
        ▪ This simulates an environment equivalent to a 3000-node cluster

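The simulation arithmetic is simulated nodes = physical nodes × instances per node, so 3 instances on 1000 nodes behave like 3000 nodes. Co-locating several DataNodes and TaskTrackers on one host only works if each instance gets its own ports and directories. A sketch that derives per-instance settings, assuming Hadoop 0.20-era property names with their default ports as the base, plus a hypothetical port stride of 100 and a hypothetical `/grid/sim` base directory:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: property names are assumed to be the Hadoop 0.20-era keys; the port stride
// and directory layout are hypothetical choices, not a documented Yahoo! setup.
final class SimulatedClusterConfig {
    private SimulatedClusterConfig() {}

    /** Hypothetical spacing between instances' ports on the same host. */
    static final int PORT_STRIDE = 100;

    static int simulatedNodes(int physicalNodes, int instancesPerNode) {
        return physicalNodes * instancesPerNode;
    }

    /** Per-instance overrides so DataNode/TaskTracker instances on one host do not collide. */
    static Map<String, String> instanceConf(int instance, String baseDir) {
        int off = instance * PORT_STRIDE;
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("dfs.datanode.address", "0.0.0.0:" + (50010 + off));
        conf.put("dfs.datanode.ipc.address", "0.0.0.0:" + (50020 + off));
        conf.put("dfs.datanode.http.address", "0.0.0.0:" + (50075 + off));
        conf.put("dfs.data.dir", baseDir + "/dn-" + instance + "/data");
        conf.put("mapred.task.tracker.http.address", "0.0.0.0:" + (50060 + off));
        conf.put("mapred.local.dir", baseDir + "/tt-" + instance + "/local");
        return conf;
    }

    public static void main(String[] args) {
        System.out.println("simulated nodes: " + simulatedNodes(1000, 3));
        for (int i = 0; i < 3; i++) {
            System.out.println("instance " + i + ": " + instanceConf(i, "/grid/sim"));
        }
    }
}
```

Instance 1 gets `dfs.datanode.address = 0.0.0.0:50110` and `dfs.data.dir = /grid/sim/dn-1/data`, so no two instances on a host share a port or data directory. Each instance would likely also need its own log and PID directories, which this sketch leaves out.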
Performance
  • Benchmark execution on 20 and 500 nodes
     – E.g.: Sort, Shuffle, DFSIO
  • GridMix
     – V1: A standard mix of MR jobs of varying types and sizes, measuring throughput on a cluster
     – V2: A customized mix of MR jobs where the number of small, medium, and large jobs can be controlled
     – V3
        ▪ Simulates user load patterns
        ▪ The workload is generated from job history trace analysis

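GridMix V2's controllable mix can be illustrated with a generator that produces an exact number of small, medium, and large jobs in a repeatable order. This is a hypothetical sketch: the size classes and their input sizes are assumptions, not GridMix's actual job definitions or configuration keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a V2-style controllable job mix; not GridMix code.
final class JobMix {
    private JobMix() {}

    /** Assumed size classes; the input sizes are illustrative, not GridMix defaults. */
    enum Size {
        SMALL(1L << 30),    // 1 GiB input
        MEDIUM(10L << 30),  // 10 GiB input
        LARGE(100L << 30);  // 100 GiB input

        final long inputBytes;

        Size(long inputBytes) {
            this.inputBytes = inputBytes;
        }
    }

    /** Builds a job list with exact counts per class, shuffled with a fixed seed so runs repeat. */
    static List<Size> build(int small, int medium, int large, long seed) {
        List<Size> jobs = new ArrayList<>();
        jobs.addAll(Collections.nCopies(small, Size.SMALL));
        jobs.addAll(Collections.nCopies(medium, Size.MEDIUM));
        jobs.addAll(Collections.nCopies(large, Size.LARGE));
        Collections.shuffle(jobs, new Random(seed));
        return jobs;
    }

    static long totalInputBytes(List<Size> jobs) {
        return jobs.stream().mapToLong(s -> s.inputBytes).sum();
    }

    public static void main(String[] args) {
        List<Size> jobs = build(5, 3, 1, 42L);
        System.out.println(jobs);
        System.out.println("total input GiB: " + (totalInputBytes(jobs) >> 30));
    }
}
```

`build(5, 3, 1, 42L)` yields 9 jobs totalling 135 GiB of input, and the fixed seed makes the order identical on every run, which matters when comparing benchmark results across releases. A V3-style generator would instead derive the mix and its timing from job history traces.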
Corner Cases
  • Challenges in reproducing a problem related to
     – Timing issues
     – Race conditions
     – Out-of-memory issues
     – Reproducing in the exact environment where it occurred
  • AspectJ
     – AspectJ taps into the code and can run simulated scenarios before, after, or around (during) a method
     – It can reproduce timing issues by introducing sleep statements
     – It can reproduce out-of-memory issues by reducing the memory available at runtime
     – Exact environments can be reproduced by changing job configurations on the fly when the original configuration cannot be replicated directly

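AspectJ weaves advice into existing code, so a test can delay a method without editing it. The same "before" advice idea can be shown with no extra toolchain using a JDK dynamic proxy. This swaps in `java.lang.reflect.Proxy` for AspectJ, works only for calls made through an interface, and the `BlockReporter` interface is hypothetical:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Sketch using a JDK dynamic proxy in place of AspectJ; BlockReporter is a hypothetical interface.
final class DelayInjector {
    private DelayInjector() {}

    /** Hypothetical component whose timing we want to perturb in a test. */
    interface BlockReporter {
        String report(String blockId);
    }

    /**
     * Returns a proxy that sleeps delayMillis before delegating calls to methodName,
     * mimicking AspectJ "before" advice. Other methods pass straight through.
     */
    static <T> T withDelay(T target, Class<T> iface, String methodName, long delayMillis) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals(methodName)) {
                Thread.sleep(delayMillis); // injected delay widens the timing window
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the target's own exception, not the reflective wrapper
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        BlockReporter real = id -> "reported " + id;
        BlockReporter slow = withDelay(real, BlockReporter.class, "report", 200);
        long start = System.nanoTime();
        String result = slow.report("blk_1");
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(result + " after " + elapsedMs + " ms");
    }
}
```

The proxied call returns the same result as the real one, but only after the injected delay, which is enough to widen a race window in a test. AspectJ can additionally intercept join points that do not go through an interface, such as static methods or constructors.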
Repeatability - Deployment
  • Deployment challenges
     – Deploying on a multi-node cluster
     – Deciding on the JobTracker (JT) node and NameNode
     – Building configurations for a variety of clusters
  • Solution
     – YUM repo for deployment
     – Backup host for the JobTracker node and NameNode
     – Source code build & configuration build

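Generating configurations from a short cluster description makes deployments repeatable across many clusters, and a backup host list covers the case where the preferred NameNode or JobTracker host is unavailable. A sketch assuming the Hadoop 0.20-era keys `fs.default.name` and `mapred.job.tracker`; the host names, ports 8020/8021, and the caller-supplied health check are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch: config keys assumed from Hadoop 0.20; hosts, ports, and health check are hypothetical.
final class ClusterConfigBuilder {
    private ClusterConfigBuilder() {}

    /** Returns the first candidate the health check accepts (primary first, then backups). */
    static String pickMaster(List<String> candidates, Predicate<String> isHealthy) {
        for (String host : candidates) {
            if (isHealthy.test(host)) {
                return host;
            }
        }
        throw new IllegalStateException("no healthy host among " + candidates);
    }

    static Map<String, String> build(List<String> nameNodeHosts, List<String> jobTrackerHosts,
                                     Predicate<String> isHealthy) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.default.name", "hdfs://" + pickMaster(nameNodeHosts, isHealthy) + ":8020");
        conf.put("mapred.job.tracker", pickMaster(jobTrackerHosts, isHealthy) + ":8021");
        return conf;
    }

    /** Renders properties in Hadoop's XML configuration layout (no escaping; assumes plain values). */
    static String toXml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        conf.forEach((k, v) -> sb.append("  <property><name>").append(k)
                .append("</name><value>").append(v).append("</value></property>\n"));
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Predicate<String> up = host -> !host.equals("nn1.example.com"); // pretend the primary NameNode host is down
        Map<String, String> conf = build(
                List.of("nn1.example.com", "nn2.example.com"),
                List.of("jt1.example.com", "jt2.example.com"),
                up);
        System.out.print(toXml(conf));
    }
}
```

With `nn1.example.com` marked unhealthy, the builder selects `nn2.example.com` for `fs.default.name` and keeps `jt1.example.com` for `mapred.job.tracker`. In practice the health check might be a ping or an SSH probe; it is left as a parameter here.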
Repeatability - CI
  • Continuous Integration (CI)
     – A software development practice where members of a team integrate their work frequently, usually daily
     – Each integration is verified by an automated build (including tests) to detect integration errors as quickly as possible
  • CI @ Y!
     – Commit build
     – Secondary build
     – Secondary smoke test build
     – Automated deployment

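The CI stages listed above are ordered, and a later stage only makes sense once the earlier ones pass. A minimal fail-fast runner illustrating that gating; the stage names come from the slide, while the `CiPipeline` class is a hypothetical sketch and not Hudson's API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical fail-fast pipeline; stage names from the slide, runner is illustrative only.
final class CiPipeline {
    private final Map<String, BooleanSupplier> stages = new LinkedHashMap<>();

    /** Registers a stage; stages run in the order they were added. */
    CiPipeline stage(String name, BooleanSupplier task) {
        stages.put(name, task);
        return this;
    }

    /** Runs stages in order, stops at the first failure, and returns a status line per stage that ran. */
    List<String> run() {
        List<String> ran = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> e : stages.entrySet()) {
            boolean ok = e.getValue().getAsBoolean();
            ran.add(e.getKey() + (ok ? ": PASS" : ": FAIL"));
            if (!ok) {
                break; // later stages (e.g. deployment) are skipped
            }
        }
        return ran;
    }

    public static void main(String[] args) {
        List<String> result = new CiPipeline()
                .stage("commit build", () -> true)
                .stage("secondary build", () -> true)
                .stage("secondary smoke test build", () -> false) // simulated failure
                .stage("automated deployment", () -> true)
                .run();
        result.forEach(System.out::println);
    }
}
```

Here the smoke-test stage fails, so automated deployment never runs and the returned list ends at `secondary smoke test build: FAIL`. In Hudson a similar effect is typically achieved by chaining jobs so that a downstream job is triggered only by a successful upstream build.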
Thank you
