Oozie Hug May 2011

Presented at HUG, 2011.

Published in: Technology, Business

Transcript of "Oozie Hug May 2011"

  1. Oozie 3: Improved Scheduling and Control of Workflows
 Mohammad K Islam
 kamrul@yahoo-inc.com

  2. Introductions
•  Who I am
 •  Technical Lead at Yahoo!
•  Oozie Team
•  Architecture, Development, Management
 –  Mayank Bansal
 –  Angelo Huang
 –  Mohammad Islam
 –  Amol Kekre
 –  Andreas Newman
 –  Lei Zhang
•  External contributors.
•  QE
 –  Marcy Chang
 –  Michelle Chiang

  3. Agenda
•  Oozie Overview
•  Oozie 3.0 features:
 –  Bundle
 –  Scalability
 –  Usability
•  Future Plan
•  Q&A


  4. Overview: Workflow
•  Oozie executes a workflow defined as a DAG of jobs.
•  The job types include: Map-Reduce/Pipes/Streaming/Pig/Custom Java Code etc.
•  Introduced in Oozie 1.x.
[Diagram: example workflow DAG. Node labels: start, fork, M/R streaming job, M/R job, join, Pig job, decision (branches MORE and ENOUGH), M/R job, FS job, Java job, end]
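
As an illustrative sketch only (not Oozie's implementation), a workflow defined as a DAG of jobs can be modelled as a mapping from each node to the nodes that must finish before it, and one valid run order can be derived with a topological sort. The node names loosely follow the diagram labels, but the edges are assumptions, and the decision node is left out for brevity.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key maps to the set of nodes that must
# complete before it can run. The edges are assumed for illustration;
# the slide only names the nodes.
PREDECESSORS = {
    "start": set(),
    "mr_streaming_job": {"start"},            # first branch after the fork
    "mr_job": {"start"},                      # second branch after the fork
    "join": {"mr_streaming_job", "mr_job"},   # waits for both branches
    "pig_job": {"join"},
    "fs_job": {"pig_job"},
    "end": {"fs_job"},
}


def execution_order(predecessors):
    """Return one valid run order; raises graphlib.CycleError if not a DAG."""
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(PREDECESSORS)
print(order)
```

The two forked jobs may appear in either order, since nothing in the graph orders them relative to each other; only the join is guaranteed to come after both.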

  5. Overview: Coordinator
•  Oozie executes workflows based on:
 –  Time Dependency (Frequency)
 –  Data Dependency
•  Introduced in Oozie 2.x.
[Diagram components: Oozie Client, WS API, Oozie Server, Oozie Coordinator, Oozie Workflow, Check Data Availability, Hadoop]
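
The two kinds of dependency can be sketched as follows. This is a minimal model under stated assumptions, not Oozie code: `nominal_times`, `ready_actions`, `hourly_path`, and the `/data/logs/...` paths are all hypothetical names chosen for the example. The time dependency produces the moments at which an action is due, and the data dependency filters them down to those whose input is present.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """Yield the times at which an action is due (time dependency)."""
    t = start
    while t < end:
        yield t
        t += frequency


def ready_actions(start, end, frequency, available_datasets, dataset_for):
    """Return the due times whose input data exists (data dependency)."""
    return [
        t for t in nominal_times(start, end, frequency)
        if dataset_for(t) in available_datasets
    ]


def hourly_path(t):
    # Hypothetical dataset naming scheme: one directory per hour.
    return t.strftime("/data/logs/%Y%m%d%H")


start = datetime(2011, 5, 1, 0)
end = datetime(2011, 5, 1, 3)
available = {"/data/logs/2011050100", "/data/logs/2011050102"}

# The 01:00 action is due but not ready, because its input is missing.
ready = ready_actions(start, end, timedelta(hours=1), available, hourly_path)
print(ready)
```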

  6. Bundle
•  What is Bundle?
 –  A new abstraction layer on top of Coordinator.
 –  Users can define and execute a bunch of coordinator applications.
 –  Introduced in Oozie 3.x.
•  Why is it required?
 –  Data pipeline: A set of inter-related coordinator applications is required for large data processing.
 –  Operational nightmare: These pipelines are hard for the Service Engineering team to maintain and control.

  7. Bundle Cont.
•  User defines the bundle through a new XML.
•  User can start/stop/suspend/resume/rerun at the bundle level.
•  Bundle is optional.
[Diagram components: Oozie Client, WS API, Oozie Server, Bundle, Coordinator, Oozie Workflow, Check Data Availability, Hadoop]
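
Bundle-level control can be pictured as one command fanning out to every coordinator in the bundle. The sketch below is a toy model, not Oozie's API: the `Coordinator` and `Bundle` classes, their methods, and the coordinator names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    name: str
    status: str = "RUNNING"


@dataclass
class Bundle:
    name: str
    coordinators: list = field(default_factory=list)

    def _apply(self, from_status, to_status):
        # Only coordinators currently in from_status are affected.
        for coord in self.coordinators:
            if coord.status == from_status:
                coord.status = to_status

    def suspend(self):
        """Suspend every running coordinator with a single call."""
        self._apply("RUNNING", "SUSPENDED")

    def resume(self):
        """Resume every suspended coordinator with a single call."""
        self._apply("SUSPENDED", "RUNNING")


pipeline = Bundle("daily-pipeline", [
    Coordinator("ingest"),
    Coordinator("aggregate"),
    Coordinator("old-report", status="KILLED"),
])
pipeline.suspend()
print([(c.name, c.status) for c in pipeline.coordinators])
```

The operator issues one command for the whole pipeline instead of one per coordinator, which is the operational problem the bundle is meant to address.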

  8. Oozie Abstraction Layers
 Layer 1: Bundle
 Layer 2: Coord Job 1, Coord Job 2 (each with Coord Action 1 and Coord Action 2)
 Layer 3: WF Job 1, WF Job 2 (made up of PIG, M/R, and FS jobs)

  9. Enhanced Stability and Scalability
•  Issue: At very high load, Oozie becomes slow.
•  Impact: 90% of the total Oozie support incidents.
•  Reason:
 –  Many active but non-progressing jobs.
 –  Non-progressing jobs consume a lot of resources.
 –  Oozie internal queue is full.
•  Resolution:
 –  Throttle the number of active jobs per coordinator.
 –  Put the job into timeout state.
 –  Enforce uniqueness of Oozie queue elements.
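
Two of the resolutions above, throttling and queue uniqueness, can be sketched in a few lines. This is an assumed design for illustration, not Oozie's internal queue: `UniqueQueue`, `can_materialize`, and the key format are hypothetical.

```python
from collections import deque


class UniqueQueue:
    """Bounded FIFO queue that rejects a key already waiting in the queue."""

    def __init__(self, max_size):
        self._items = deque()
        self._keys = set()
        self._max_size = max_size

    def offer(self, key):
        """Enqueue key; return False if it is a duplicate or the queue is full."""
        if key in self._keys or len(self._items) >= self._max_size:
            return False
        self._items.append(key)
        self._keys.add(key)
        return True

    def poll(self):
        """Remove and return the oldest key, allowing it to be queued again."""
        key = self._items.popleft()
        self._keys.discard(key)
        return key


def can_materialize(active_count, throttle):
    """Throttle: allow a new active job only while below the limit."""
    return active_count < throttle


queue = UniqueQueue(max_size=2)
print(queue.offer("check-input:coord-1"))   # accepted
print(queue.offer("check-input:coord-1"))   # rejected as a duplicate
```

Rejecting duplicates keeps a stuck job from filling the queue with repeated copies of the same command.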


  10. Improved Usability
•  Issue: Coordinator job status is not intuitive and confuses Oozie users.
•  Impact: User confusion and related Oozie support.
•  Reason:
 –  Status SUCCEEDED doesn't mean the job is successful!!
 –  Status PREMATER is for Oozie internal use only, but it was exposed to the user.
•  Resolution:
 –  Redesign Coordinator status

  11. Coordinator Status Redesign
 Current: SUSPENDED, KILLED, PREP, PREMATER, Running, SUCCEEDED, FAILED
 New: SUSPENDED, KILLED, SUCCEEDED, PREP, Running, DONE_WITH_ERROR, PAUSED, FAILED
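
The slides do not spell out how the new terminal statuses are assigned. One plausible reading, shown here purely as an assumption, is that SUCCEEDED is reserved for jobs whose actions all succeeded, while DONE_WITH_ERROR marks a finished job with a mix of outcomes. The function name and the rule itself are hypothetical.

```python
def final_coordinator_status(action_statuses):
    """Derive a terminal job status from finished action statuses.

    Assumed rule, not taken from the slides:
      - every action succeeded  -> SUCCEEDED
      - some succeeded, some not -> DONE_WITH_ERROR
      - none succeeded           -> FAILED
    """
    if not action_statuses:
        raise ValueError("at least one action status is required")
    succeeded = [s == "SUCCEEDED" for s in action_statuses]
    if all(succeeded):
        return "SUCCEEDED"
    if any(succeeded):
        return "DONE_WITH_ERROR"
    return "FAILED"


print(final_coordinator_status(["SUCCEEDED", "FAILED", "SUCCEEDED"]))
```

Under this rule a user who sees SUCCEEDED can rely on every action having succeeded, which removes the ambiguity the previous slide describes.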

  12. Future Plan
•  Higher Scalability: Change the polling-based data-dependency check to a push model through HCatalog and a Notification system.
•  Adaptability: Graceful handling of Hadoop downtime:
 –  If Hadoop is down, block submission.
 –  When Hadoop becomes available
  •  Submit the blocked job
  •  Auto-resubmit the untraced job.
•  Monitoring: Rich WS API for application Monitoring/Alerting.

  13. Future Plan Cont.
•  Automatic Failover: Using ZooKeeper.
•  Load Balancing: Through server replication
•  Improved Usability:
 –  Distcp action
 –  Hive action
•  Asynchronous data processing.
•  Incremental data processing.
•  Apache Migration: Work initiated.


  14. Q&A
•  Github link: http://yahoo.github.com/oozie
•  Mailing list: Oozie-users@yahoogroups.com
 Mohammad K Islam
 kamrul@yahoo-inc.com
