July 2012 HUG: Overview of Oozie Qualification Process

The talk covers the Oozie QE practice and process at Yahoo!, the types of tests that QE performs before a release, and the roadmap.

Presentation Transcript

  • Overview of Oozie QE Qualification Process
      Michelle Chiang, 07/18/2012
  • Agenda
      • What is Oozie
      • Qualification stages
      • Challenges
      • Future tasks
      • Q&A
  • What is Oozie?
      • Scalable, secure workflow scheduling system for Hadoop.
          – Three levels of jobs
              • Workflow job
                  – Supports actions such as MR, Pig, Java, Distcp
              • Coordinator job
                  – Scheduling
              • Bundle job
                  – Monitors status of coordinator jobs
  • Job Submission to Hadoop
      • Oozie Client
          1. CLI
          2. Java Client API
          3. WS API
      • [Diagram components: Oozie Client, Oozie Server, Hadoop Cluster (Job Tracker, Launcher Mapper, Actual M/R Job)]
  • QE Qualification Process
      • Develop test plan in design cycle
      • Design and implement test cases
      • Execute tests
      • Prepare release notes & certification
      • Support production deployment and customers’ FAQs
  • Develop Test Plan
      • Prepare test plan for new features defined in PRD, or
      • Prepare test plan for the selected new features checked into the Apache source
      • Define test strategy
      • Test plan is reviewed by QE and Dev
  • Test plan example: test plan for “shell” action
      • Table columns: Case ID | Execution | Expected results | Comment
      • Ticket #: Shell action
          1. Read env var, compare action data
          2. Read config env var
          3. hadoop fs -ls; hadoop fs -cp
      • Result and comment entries: Pass/Fail, bug#/JIRA#
      • Test_sh* cases and the executions they cover
          – Bash shell: 1, 2, 3
          – Perl script: 1, 2
          – Python script: 1, 2
          – Java: 1, 2
          – C++: 1, 2
  • Design and implement test cases
      • [Diagram stages: Design test case, Prepare test data, Build, Verify/Bug, Automate, Demo]
  • Unit tests
      • Unit tests
          – 784 unit tests
          – Code coverage: 72%
          – Checked in with code by developers
          – Executed by CI build as a Jenkins job
  • Functional tests
      • Functional tests (including regression tests) as of 3.2.0:
          – Use real systems (hadoop, oozie), not minicluster or minioozie
          – 1129 shell-based tests
          – 146 Java OozieClient API tests (in testNG)
          – Runtime: 36 hours, on 2 servers/clusters
              • Manual setup time: 20 min
  • Shell-based tests
      • Assumptions:
          – Secure hadoop cluster is up
          – Oozie server is configured and up
      • 2 types of tests
          – Individualized feature tests
              • Customized validation
              • Self-contained
          – 1 script drives many tests
              • Good for repetitive testing, e.g., schema tests
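
The "1 script drives many tests" pattern might look roughly like the sketch below. This is not the team's actual driver: the function name `run_suite` and the idea of passing the per-case runner as a command are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical driver: run one command against many test cases and tally
# the results. Names are illustrative and do not come from the talk.

# run_suite RUNNER CASE...
#   RUNNER is any command that takes a case name and exits 0 on success.
run_suite() {
  local runner="$1"; shift
  local passed=0 failed=0 case_name
  for case_name in "$@"; do
    if "$runner" "$case_name" >/dev/null 2>&1; then
      passed=$((passed + 1))
    else
      failed=$((failed + 1))
      echo "FAIL: $case_name" >&2
    fi
  done
  echo "passed=$passed failed=$failed"
  # Exit status reflects whether every case passed.
  [ "$failed" -eq 0 ]
}
```

For schema tests, the cases could be a list of workflow definitions and the runner a function that submits each one, for example `run_suite check_schema wf-0.1.xml wf-0.2.xml`, where `check_schema` is a placeholder you would write yourself.
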
  • Example: run.sh
      • Prepare: generate job property file based on given conf and template
      • Upload: delete existing upload data, and upload application/data to hdfs
      • Submit: submit oozie jobs
      • Verify: check jobs finish successfully
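
A minimal sketch of the four stages follows, under stated assumptions: the `@TOKEN@` template syntax, the variable names, and the default URL and HDFS path are invented for illustration and are not the actual run.sh. The `oozie job -config ... -run` and `oozie job -info` commands are the standard Oozie CLI; `hadoop fs -rm -r` is the Hadoop 2.x spelling (older releases used `-rmr`).

```shell
#!/usr/bin/env bash
# Sketch of the four run.sh stages. Paths, variable names and the @TOKEN@
# template syntax are assumptions for illustration only.

OOZIE_URL=${OOZIE_URL:-http://localhost:11000/oozie}   # assumed server URL
APP_NAME=${APP_NAME:-shell-test}                       # hypothetical app name
HDFS_APP_DIR=${HDFS_APP_DIR:-/tmp/oozie-qe/$APP_NAME}  # hypothetical HDFS dir

# Prepare: read a template on stdin and fill in @NAME_NODE@ and @APP_PATH@.
prepare() {
  sed -e "s|@NAME_NODE@|$1|g" -e "s|@APP_PATH@|$2|g"
}

# Upload: remove any previous copy, then push the local app dir to HDFS.
upload() {
  hadoop fs -rm -r "$HDFS_APP_DIR" || true
  hadoop fs -put "$1" "$HDFS_APP_DIR"
}

# Submit: start the job and print its ID (the CLI prints "job: <id>").
submit() {
  oozie job -oozie "$OOZIE_URL" -config "$1" -run | sed -n 's/^job: //p'
}

# Verify: succeed only if the job info reports SUCCEEDED.
verify() {
  oozie job -oozie "$OOZIE_URL" -info "$1" | grep -Eq '^Status *: *SUCCEEDED'
}
```

A caller would chain them, for example `prepare hdfs://nn:8020 "$HDFS_APP_DIR" < job.properties.tmpl > job.properties`, then `upload ./app`, then `verify "$(submit job.properties)"` once the job has had time to finish.
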
  • Test validation (1)
      • Add validation into the workflow.xml
          – Apply decision node to check
              • wf:actionData
              • fs:exists
              • Other EL functions
          – Apply Java action to verify
              • capture-output
  • Test validation (2)
      • Add validation into run.sh
          – Apply oozie client commands to check
              • Job status, log, configurations, definition, dryrun
          – Apply shell commands to parse results
              • Download output data, parse and compare
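
The two run.sh checks above could be sketched as small helpers. The `Status : VALUE` line format is an assumption about what `oozie job -info` prints, and both function names are hypothetical.

```shell
# job_status: read `oozie job -info` output on stdin and print the value of
# the first "Status : VALUE" line (assumed output format).
job_status() {
  awk -F: '/^Status[ \t]*:/ {
    sub(/^[ \t]+/, "", $2); sub(/[ \t]+$/, "", $2)
    print $2
    exit
  }'
}

# same_lines A B: compare two downloaded files while ignoring line order,
# since the order of job output lines is not guaranteed.
same_lines() {
  [ "$(sort "$1")" = "$(sort "$2")" ]
}
```

Typical use would be `oozie job -oozie "$OOZIE_URL" -info "$JOB_ID" | job_status`, and `same_lines expected.txt actual.txt` after fetching the output with `hadoop fs -get`.
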
  • Integration
      • Integration tests:
          – 15 tests, within the hadoop ecosystem
              • Including Hadoop, Pig, Hcatalog, Distcp
          – Runtime: ~5 hours (oozie tests only)
              • Manual setup time: 30 min
              • Plus test package preparation & test run: 3 hr
          – Examples
              • Oozie and MapReduce
              • Oozie and Pig
              • Oozie and Hcatalog
  • Stress tests (1)
      • Performance/stress/longevity tests:
          – 10 tests
          – Runtime:
              • 12 hours for performance/stress tests
              • 7 days for longevity testing
              • Manual setup & analysis time: ~10 min per test
  • Stress tests (2)
      • Performance metrics:
          – Job submission rate
          – Status update
          – No failed jobs
          – Number of jobs submitted vs. completed
      • Longevity tests:
          – 300 wf jobs/min for 7 days ~= 3M jobs
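
The longevity figure can be checked with simple arithmetic, assuming a steady submission rate. The helper name is illustrative.

```shell
# jobs_over_days RATE_PER_MIN DAYS: total jobs submitted at a steady rate.
jobs_over_days() {
  echo $(( $1 * 60 * 24 * $2 ))
}

jobs_over_days 300 7   # prints 3024000, i.e. roughly 3M jobs
```
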
  • Memory tests
      • Memory/stress tests:
          – 3 tests
          – Runtime: ~10 hours
              • Manual setup & analysis: 30 min per test
          – Examples:
              • Purge a large number of wf/coord/bundle jobs
              • Query a coord job with 100k actions
              • Query a coord job with 8k actions by N threads
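
One way to approximate the "N threads" query from a shell script is to start N background processes and wait for all of them. This sketch uses processes rather than threads, and `run_parallel` is a hypothetical helper, not something shown in the talk.

```shell
# run_parallel N CMD...: start N background copies of CMD, wait for all of
# them, and return non-zero if any copy failed.
run_parallel() {
  local n="$1"; shift
  local pids="" pid rc=0 i=0
  while [ "$i" -lt "$n" ]; do
    "$@" >/dev/null 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done
  for pid in $pids; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

For example, `run_parallel 8 oozie job -oozie "$OOZIE_URL" -info "$COORD_ID"` would issue eight concurrent queries against the same coordinator job while server memory is being monitored.
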
  • Upgrade/installation tests
      • Upgrade tests:
          – 14 tests
          – Runtime: 4 hours (manual setup: 2 hr)
          – Steps:
              • Submit wf/coord/bundle jobs
              • Shut down oozie server
              • Upgrade database schema, oozie version, oozie config
              • Restart oozie server
  • Release notes and certification
      • Release notes
          – New features
          – Package version and new settings
          – New db schema
      • Certification
          – Number of tests executed and pass rate
          – Known issues
  • Production and customer support
      • Document FAQs, e.g., usage of new features
      • Support production deployment issues
      • Meet customers’ SLA requirements
  • Experiences learned (1)
      • Add a “time-out” to the test script
          – In case the test fails to reach the expected status
      • Carefully time the verification step to catch transient states
          – Job status transition, e.g., from PREP to RUNNING to PAUSED
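
A time-out around the verification step could be written as a polling loop like the one below. This is a sketch, not the team's script: `wait_for_status` and its argument order are assumptions, and elapsed time is approximated by counting sleep intervals only.

```shell
# wait_for_status EXPECTED TIMEOUT_SEC INTERVAL_SEC CMD...
#   Runs CMD repeatedly until it prints EXPECTED, or gives up once roughly
#   TIMEOUT_SEC seconds of polling have passed.
wait_for_status() {
  local expected="$1" timeout="$2" interval="$3"; shift 3
  local elapsed=0 current
  while :; do
    current=$("$@")
    if [ "$current" = "$expected" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s; last status: $current" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

CMD is any command that prints the current job status. A short interval makes it more likely that a transient state such as PREP is observed before the job moves on, at the cost of more requests to the server.
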
  • Experiences learned (2)
      • Increase hadoop capacity
          – Modify hadoop queue capacity property
          – Modify user limit
      • Increase database active connections
  • Experiences learned (3)
      • Accumulate large number of jobs for testing
          – Increase materialization window
          – Reduce materialization look up interval
          – Coordinator job’s frequency, duration
      • Also, check database memory usage
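
To pick a frequency and duration that yield a target number of coordinator actions, a rough estimate is duration divided by frequency, rounded up. The helper below assumes one action per frequency interval starting at the coordinator's start time; its name is hypothetical.

```shell
# coord_action_count DURATION_MIN FREQUENCY_MIN
#   Estimated number of actions a coordinator materializes, assuming one
#   action per frequency interval beginning at the start time.
coord_action_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

coord_action_count 1440 5   # one day at a 5-minute frequency: prints 288
```

By the same estimate, a coordinator with a 1-minute frequency needs a duration of about 100000 minutes (roughly 69 days) to reach 100k actions.
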
  • Experiences learned (4)
      • Check oozie job log, tomcat server log, hadoop jobtracker log for debugging
      • Dev adds debugging statements
  • Challenges - production issues
      • Reproduce and debug issues in QE environment.
      • Set up QE environment as close to production as possible.
          – Recent story: using CNAME for oozie URL.
  • Challenges – backward compatibility
      • Oozie always guarantees backward compatibility
          – Web-service API
          – Job definitions
          – Client API
      • Verify old jobs continue to run in new release
  • Challenges – multiple versions
      • Compatibility of multiple versions of other components
          – Hadoop API
          – Pig
          – Hcatalog
  • Work in Progress (1)
      • Increase test coverage
          – Java based, testNG framework
          – Server-side oozie white box testing
          – Improved web service API testing
  • Work in Progress (2)
      • Hadoop 2.x integration testing, including HDFS federation
      • Memory monitoring framework
      • Performance benchmark framework
      • Of course, new oozie releases
  • Open sourcing
      • Short term: Shell based tests
          – Review file/data structure
          – Add readme, copyright, etc
          – Work in progress
      • Long term: Java based tests
          – oozie-core, oozie-client, oozie-ws
  • Y! Oozie QE team
      • QE Architect: Jane Q. Chen, qianchen@yahoo-inc.com
      • QE Engineer: Marcy Chen, marchen@yahoo-inc.com
      • QE Engineer: Michelle Chiang, mchiang@yahoo-inc.com
  • Acknowledgement
      • All oozie developers in the community!
          – http://incubator.apache.org/oozie/
          – oozie-dev@incubator.apache.org
  • Thank you!
      • Q&A
      • oozie-users@incubator.apache.org
  • Back up slides
  • An Oozie Workflow
      • [Diagram nodes: start, fork, MapReduce streaming job, Pig job, join, FS job (mkdir), decision (Case1, Case2), MapReduce job, Java action, FS job (chmod), end; transitions are labeled OK]
  • Oozie ‘Wordcount’ Workflow Example
      • Non-Oozie (single map-reduce job), from the gateway:
          [yourid@gwgd2211 ~]$ hadoop jar hadoop-examples.jar wordcount -Dmapred.job.queue.name=queue_name inputDir outputDir
      • Oozie (Workflow.xml)
          – [Diagram: Start, MapReduce wordcount, then End on OK or Kill on ERROR]
  • Example: shell-action workflow.xml
      <shell xmlns="uri:oozie:shell-action:0.1">
          <!-- skip lines -->
          <exec>${SCRIPT}</exec>
          <argument>-classpath</argument>
          <argument>./${SCRIPTFILE}:$CLASSPATH</argument>
          <argument>script</argument>
          <file>${SCRIPTFILE}#${SCRIPTFILE}</file>
          <capture-output/>
      </shell>
      <decision name="decision1">
          <switch>
              <case to="end">${wf:actionData('shell-sh')['PATH1'] == 'Reset'}</case>
              <default to="fail" />
          </switch>
      </decision>
      • [Diagram: Shell action, then wf:actionData matches? true goes to end, false goes to kill]
  • Integration tests
      • Compatible with other components
      • No system failures, e.g., NN, JT, Hcat_server
      • Run standalone utility to narrow down issues
          – For example, pig, distcp
      • Check oozie’s launcher log on Jobtracker
  • Production environment
      • Total number of nodes: 42K+
      • Total number of clusters: 25+
          – 1 oozie server per cluster
      • Total number of processed jobs ≈ 750K/month