Oozie at Yahoo! Jun 3rd 2014
by Ryota Egashira and Purshotam Shah (Yahoo)

Published in: Engineering, Technology

Transcript

  • 1. Oozie at Yahoo
      Purshotam Shah, Ryota Egashira
      Oozie Meetup, 06/03/2014
  • 2. Table of Contents
      ▪ Scale at Yahoo!
      ▪ Scale and Performance
      ▪ Features - Usability
      ▪ High Availability and Load Balancing
      ▪ Customer Asks
  • 3. Scale at Yahoo!
      ▪ Busiest cluster
          › 1 million+ workflows per month
          › 45-55K workflows per day
          › 40-50K coord actions per day
          › 800-900 coordinators (5m, 15m, 30m, hourly, daily and weekly)
          › 30-40 bundles
      ▪ Most complex bundle: 230 coordinators
      ▪ Most complex workflow: 85 forks
      ▪ Video transcoding: 100-300 workflows per minute
  • 4. Scale and Performance
      ▪ Database
          › CLOB to BLOB, to compress and store inline
          › Remove unnecessary Hadoop config stored in protoActionConf
          › Select only the needed columns instead of loading the whole row
          › Partition tables by created time (in Oracle)
      ▪ Other
          › Huge improvements to materialization of coordinator actions
          › Reduce launcher overhead
              • Merge the small files created per action into one sequence file
              • Launcher libraries shipped only once to HDFS
              • Uber mode launcher with Hadoop 2.x
          › Synchronously execute commands without queueing to speed up action transition
          › Automatically kill abandoned coordinator jobs
          › gzip compression for the REST API
  • 5. Features - Usability
      ▪ UI improvements
          › Active Jobs, Custom Global Filters, Child Jobs for Pig/Hive actions
      ▪ Faster log streaming with more filters
      ▪ Updating coordinator definition on the fly
      ▪ Rerun workflows without having to specify all properties again
      ▪ Mark coordinators and actions as ignored
      ▪ Sharelib enhancements
          › Update on the fly without failing jobs
          › Command to list the available sharelibs
          › Specify directories using a metafile instead of a single sharelib directory
  • 6. High Availability and Load Balancing
      ▪ HCat integration
      ▪ SLA
      ▪ Sharelib
      ▪ Server-server authentication
      ▪ Distributed sequence
  • 7. Customer Asks
      ▪ Coordinator dependency management
          › Ability to view dependencies and rerun part of a pipeline
      ▪ Better error handling and automatic retries
      ▪ Ability to suspend/turn off SLA alerting
      ▪ One-click launcher log viewing
      ▪ Zero downtime
  • 8. Yahoo Oozie Team
      Rohini Palaniswamy, Mona Chitnis, Michelle Chiang, Purshotam Shah, Ryota Egashira, Olga L. Natkovich
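
Slide 4 lists "CLOB to BLOB to compress and store inline" among the database improvements. The slides do not show how this was done, so the following is only a minimal Python sketch of the general idea: a text configuration is gzip-compressed into bytes that could be stored in a binary column instead of a character column. It is not Oozie's actual implementation; the function names and the sample configuration are hypothetical.

```python
import gzip


def compress_conf(conf_xml: str) -> bytes:
    """Compress a text configuration into bytes for a binary (BLOB-style) column."""
    return gzip.compress(conf_xml.encode("utf-8"))


def decompress_conf(blob: bytes) -> str:
    """Restore the original text configuration from the compressed bytes."""
    return gzip.decompress(blob).decode("utf-8")


# Hypothetical, repetitive XML configuration used only for illustration.
sample_conf = (
    "<configuration>"
    + "".join(
        f"<property><name>example.key.{i}</name><value>value-{i}</value></property>"
        for i in range(200)
    )
    + "</configuration>"
)

blob = compress_conf(sample_conf)
raw_size = len(sample_conf.encode("utf-8"))
print(f"raw bytes: {raw_size}, compressed bytes: {len(blob)}")
```

Repetitive XML such as job configuration tends to compress well, which is one plausible reason a compressed binary value is more likely to fit inline in a row than the equivalent uncompressed text.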