Yahoo! Workflow Engine for Hadoop
Alejandro Abdelnur, Yahoo!
Session Agenda
- Oozie workflow engine (Oozie 1)
- Oozie coordinator engine (Oozie 2)
- Getting Oozie
What was Oozie? (Oozie 1, Workflow)
- An Oozie workflow is a DAG of MR/Pig/FS/Java/sub-workflow actions
- Workflow applications are written in a PDL (Process Definition Language) in XML
- Workflow applications are parameterized
- Oozie is a server; it is transactional, reliable, and it scales
- HTTP REST API only (Java API, CLI, and console built on top of it)
- Implementation: Java web-app + SQL DB
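A minimal workflow definition in the XML PDL might look like the sketch below: one parameterized map-reduce action wired between a start and an end node. All names and paths here (my-wf, firstjob, the `${inputDir}`/`${outputDir}` parameters) are hypothetical, and element details vary by workflow schema version.

```xml
<!-- Sketch of an Oozie 1 workflow: a DAG with a single parameterized
     map-reduce action. Names and parameters are hypothetical. -->
<workflow-app name="my-wf" xmlns="uri:oozie:workflow:0.1">
  <start to="firstjob"/>
  <action name="firstjob">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <!-- Transitions define the DAG edges -->
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Map-reduce action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The `${...}` placeholders are what "workflow applications are parameterized" refers to: they are resolved from properties supplied at submission time.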
Users' Experience
"Oozie has enabled us to reduce our index building operation from a manually intensive 4-day process to a 6-hour fully automated process..." (Keyword Research Service team)
"... It saved us a tremendous amount of time and resources not to develop an alternative custom solution to manage our complex workflows on the Grid..." (Segment Manager team)
Some Numbers
- Oozie users: 50
- Workflow applications: 4,868
- Largest workflow: 2,000 action nodes
- Average action nodes per workflow: 18
- Workflow jobs in the last month: 55K
- Longest running workflow job: 17 hours
- Workflow action nodes by type: Map-Red 23%, Pig 30%, File System 19%, Java 18%, Sub-Workflow 4%
The First Year …
- Releases: 4 feature releases, 6 patches, 1 DB schema change (from Oozie 1 to 2)
- Failures? YES (recovered from them? YES): servlet container (Tomcat) and database (MySQL)
- Did we lose workflow jobs data? NO
- Code issues that caused failures:
  - DB connection leaks (fix: use the command pattern all over)
  - Thread pool starvation (fix: added a thread quota per command type)
  - HDFS connection leaks (fix: 2nd-level caching)
… The First Year …
- Deployment model: started with 1 Tomcat / multiple Oozies; now multiple instances of 1 Tomcat / 1 Oozie
- Database: migrated from MySQL to Oracle
… The First Year …
Co-existence with Hadoop:
- When the JT/NN are slow, Oozie users complain that Oozie is slow
- Bad workflows can overload the JT/NN: a fork of 2,000+ MR jobs, or a Java action looping while waiting for files to become available
- Hadoop patching requires a synchronized patching of Oozie (because of Hadoop RPC compatibility issues)
- Different Y! clusters use different Hadoop versions (it requires juggling with the Oozie code to avoid more branches)
… The First Year
Implementation changes:
- Deprecated the SSH action, added the Java action
- MR/Pig actions are started via a launcher job with 1 map and 0 reduces
- Improved user logging (especially for Pig)
- Removed external calls from within DB transactions (a nasty one)
- Using (Open)JPA for DB access
Got right from the beginning:
- Backward compatibility for the API and PDL: ALWAYS KEPT
- Heavy use of asynchronous command execution (queue + thread pool)
- Instrumentation data (for monitoring)
RULE for Oozie Workflows: a workflow job MUST NOT be started until all external input is available
What is Oozie 2? It is Oozie 1 PLUS …
- Time- and data-driven execution of workflow jobs
- A workflow job is scheduled at a regular frequency
- A workflow job is started when all its input data is available
[Diagram: Oozie 2 Coordinator — a coordinator app with frequency f drives a workflow from IN to OUT]
Use Cases: Data Pipelines
[Diagram: a pipeline running 01JAN to 31DEC. A 5-minute job turns the WS feed into PH1 instances (1:05, 1:10, 1:15, ..., 2:00); an hourly job combines the PH1 and LOG instances (1:05, 1:10, ..., 2:00) into an hourly PH2 instance (2:00).]
Coordinator Applications
- A coordinator application can be parameterized
- Coordinator jobs have a frequency and start & end dates
- On every tick of the frequency a coordinator action is created
- The coordinator action starts a workflow job only when all input data is available
- Coordinator applications define their input/output data
- Input/output data is (normally) relative to the action creation time (the job frequency); it is expressed as URI templates: hdfs://.../ph1/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MIN}
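The parameterization typically comes from a properties file handed to the Oozie client at submission time. A minimal sketch, where all host names, paths, and values are hypothetical (only the property-file mechanism itself comes from Oozie):

```properties
# Hypothetical job.properties for a coordinator job submission.
nameNode=hdfs://nn.example.com:9000
jobTracker=jt.example.com:9001

# Coordinator window, corresponding to the frequency/start/end dates above
start=2010-01-01T01:00Z
end=2010-12-31T23:00Z

# Where the coordinator application XML lives, plus workflow parameters
oozie.coord.application.path=${nameNode}/apps/ph2-coord
queueName=default
```

Any `${...}` placeholder left in the coordinator or workflow XML is resolved against these properties when the job is submitted.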
Coordinator Input and Output Data
[Diagram: an hourly coordinator job f_j (60min), running 01JAN to 31DEC, consumes the twelve most recent instances of a 5-minute input dataset f_i, from ${current(-11)} to ${current(0)} (PH1 1:05 ... 2:00), and produces the ${current(0)} instance of an hourly output dataset f_o (PH2 2:00). IN → Workflow → OUT]
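The input/output resolution above can be sketched as a coordinator application definition. Everything below is a hedged example: the app name, dataset names, and paths are hypothetical, the URI template mirrors the slide's elided `hdfs://.../` form, and element details vary by coordinator schema version.

```xml
<!-- Sketch of a coordinator app: an hourly job that consumes twelve
     5-minute PH1 instances and produces one hourly PH2 instance.
     Names and paths are hypothetical. Frequencies are in minutes. -->
<coordinator-app name="ph2-coord" frequency="60"
                 start="2010-01-01T01:00Z" end="2010-12-31T23:00Z"
                 timezone="America/Los_Angeles"
                 xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="ph1" frequency="5"
             initial-instance="2010-01-01T00:00Z"
             timezone="America/Los_Angeles">
      <uri-template>hdfs://.../ph1/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MIN}</uri-template>
    </dataset>
    <dataset name="ph2" frequency="60"
             initial-instance="2010-01-01T01:00Z"
             timezone="America/Los_Angeles">
      <uri-template>hdfs://.../ph2/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- Twelve 5-minute instances relative to the action creation time -->
    <data-in name="input" dataset="ph1">
      <start-instance>${coord:current(-11)}</start-instance>
      <end-instance>${coord:current(0)}</end-instance>
    </data-in>
  </input-events>
  <output-events>
    <!-- One hourly output instance -->
    <data-out name="output" dataset="ph2">
      <instance>${coord:current(0)}</instance>
    </data-out>
  </output-events>
  <action>
    <workflow>
      <app-path>hdfs://.../apps/ph2-wf</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The coordinator materializes an action every hour, but the workflow only starts once all twelve PH1 input instances exist.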
Daylight Saving is Evil
- Minutes and hours in a day change on a per-TZ basis
- Hours in March == 31 * 24? YES & NO
- Is a day of hourly datasets always 24 instances? YES & NO
- How about mixing datasets from different US TZs?
- How about mixing datasets from different TZs from different regions/countries?
- SOLUTION: built-in support for TZ/DS
What is Next?
- Automatic temporary back-off from the JT/NN when down or too slow
- Map-Reduce and Pig job submission over HTTP (without a workflow)
- High availability (via ZooKeeper)
- Improved workflow schema
- Complete coordinator specification support (asynchronous datasets and apps)
- More user-friendly functions
- Integration with the metadata system
- Coordinator reprocessing features
- Coordinator application bundles (manage many coordinator jobs as one unit)
Getting Oozie
http://developer.yahoo.com/hadoop
http://yahoo.github.com/oozie
Questions? Alejandro Abdelnur [email_address]

Workflow on Hadoop Using Oozie (Hadoop Summit 2010)


Editor's Notes

  • #15 Let me try to formalize things a bit. A coordinator application can be parameterized: the locations of its inputs and outputs, and special settings for the workflow jobs. It has a start date, an end date, and a frequency, which can also be parameterized. On every frequency tick a workflow job is scheduled, but the workflow stays in WAITING state until all of its input is available. A coordinator application defines all of its inputs and an output, and these are normally relative to the frequency and the time the workflow jobs are scheduled.