Oozie at Yahoo
Purshotam Shah, Ryota Egashira
Oozie Meetup 06/03 2014

Oozie at Yahoo! Jun 3rd 2014

by Ryota Egashira and Purshotam Shah (Yahoo)

  1. Oozie at Yahoo
     Purshotam Shah, Ryota Egashira
     Oozie Meetup 06/03 2014
  2. Table of Contents
     Yahoo Confidential & Proprietary
     ▪ Scale at Yahoo!
     ▪ Scale and Performance
     ▪ Features - Usability
     ▪ High Availability and Load Balancing
     ▪ Customer Asks
  3. Scale at Yahoo!
     ▪ Busiest cluster
       › 1 million+ workflows per month
       › 45 - 55K workflows per day
       › 40 - 50K coord actions per day
       › 800 - 900 coordinators (5m, 15m, 30m, hourly, daily and weekly)
       › 30 - 40 bundles
     ▪ Most complex bundle - 230 coordinators
     ▪ Most complex workflow - 85 forks
     ▪ Video Transcoding - 100-300 workflows per min
  4. Scale and Performance
     ▪ Database
       › CLOB to BLOB to compress and store inline
       › Remove unnecessary Hadoop config stored in protoActionConf
       › Select only needed columns instead of loading whole row
       › Partition tables by created time (in Oracle)
     ▪ Other
       › Huge improvements to materialization of coordinator actions
       › Reduce Launcher overhead
         • Merge the small files created per action into one sequence file
         • Launcher libraries shipped only once to HDFS
         • Uber mode launcher with Hadoop 2.x
       › Synchronously execute commands without queueing to speed up action transition
       › Automatically kill abandoned coordinator jobs
       › gzip compression for REST API
  5. Features - Usability
     ▪ UI improvements
       › Active Jobs, Custom Global Filters, Child Jobs for Pig/Hive actions
     ▪ Faster log streaming with more filters
     ▪ Updating coordinator definition on the fly
     ▪ Rerun workflows without having to specify all properties again
     ▪ Mark coordinator and actions as ignored
     ▪ Sharelib Enhancements
       › Update on the fly without failing jobs
       › Command to list the available sharelibs
       › Specify directories using metafile instead of single sharelib directory
  6. High Availability and Load Balancing
     ▪ HCat integration
     ▪ SLA
     ▪ Sharelib
     ▪ Server-server authentication
     ▪ Distributed sequence
  7. Customer Asks
     ▪ Coordinator dependency management
       › Ability to view dependencies and rerun part of a pipeline
     ▪ Better error handling and automatic retries
     ▪ Ability to Suspend/Turn off SLA alerting
     ▪ One-click launcher log viewing
     ▪ Zero downtime
  8. Yahoo Oozie Team
     Rohini Palaniswamy
     Mona Chitnis
     Michelle Chiang
     Purshotam Shah
     Ryota Egashira
     Olga L. Natkovich
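
As a rough cross-check of the scale figures on slide 3, the daily workflow range can be scaled up to a month and compared with the "1 million+ workflows per month" figure. The 30-day month and the function name below are assumptions made for illustration; they do not come from the slides.

```python
def monthly_range(daily_low, daily_high, days_per_month=30):
    """Scale a per-day range to a per-month range.

    days_per_month=30 is an assumed average, not a figure from the slides.
    """
    return daily_low * days_per_month, daily_high * days_per_month


# 45 - 55K workflows per day, as quoted on slide 3
low, high = monthly_range(45_000, 55_000)
print(f"{low:,} to {high:,} workflows per month")
```

Under that assumption the daily range works out to roughly 1.35 to 1.65 million workflows per month, which is consistent with the "1 million+" figure.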
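
The "CLOB to BLOB to compress and store inline" item on slide 4 can be illustrated with a minimal sketch: text that would otherwise sit in a character column is compressed to bytes before it is stored and decompressed when it is read back. This uses Python's `zlib` purely for illustration. Oozie itself is written in Java, and the function names and the sample XML here are hypothetical, not Oozie's actual implementation or schema.

```python
import zlib


def to_blob(text: str) -> bytes:
    """Encode text as UTF-8 and compress it, yielding bytes for a binary column."""
    return zlib.compress(text.encode("utf-8"))


def from_blob(blob: bytes) -> str:
    """Reverse of to_blob: decompress the bytes and decode them back to text."""
    return zlib.decompress(blob).decode("utf-8")


# Hypothetical, highly repetitive configuration XML standing in for a job conf.
sample = (
    "<configuration>"
    + "<property><name>k</name><value>v</value></property>" * 200
    + "</configuration>"
)

blob = to_blob(sample)
assert from_blob(blob) == sample  # the round trip is lossless
print(len(sample.encode("utf-8")), "bytes as text,", len(blob), "bytes compressed")
```

Repetitive XML such as job configuration tends to compress well, which is presumably why a compressed binary column is attractive here; the actual savings depend on the data.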