Oozie meetup Hadoop Summit 2014

by Bowen Zhang (Hortonworks)

Oozie meetup Hadoop Summit 2014 Presentation Transcript

  • 1. Oozie Meetup Bowen Zhang
  • 2. Agenda ● Oozie Log HA ● Oozie cron scheduling ○ Use cases ○ Troubleshooting ● Prospective 4.1 release
  • 3. Oozie Log HA ● HA implemented already ○ Server ○ HCat ○ SLA ○ Sharelib ● Remaining piece ○ log streaming: if a node is down, all oozie.log content on that node is not accessible
  • 4. Proposed Solution ● YARN faced the same issue with log streaming and retrieval ● Currently, YARN writes container logs to HDFS when log aggregation is enabled ● Oozie can duplicate its logs onto HDFS ○ Log directly to HDFS ○ Copy the log onto HDFS during log rotation
  • 5. Direct logging to HDFS ● Pros ○ Complete log HA with 100% accuracy of the content ● Cons ○ Could be hard to implement and may need significant changes to the Oozie logging mechanism ○ Introduces a strict dependency on HDFS ○ Potential server performance issues
  • 6. Copy on log rotation ● Pros ○ Easy to implement without significant changes to the Oozie logging structure ○ Fewer performance issues ● Cons ○ There is always a window of up to one hour of log unavailability because of the rotation schedule
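The copy-on-rotation option could be sketched roughly as follows. This is a hypothetical illustration, not Oozie code: a local directory stands in for HDFS, the function name `copy_rotated_logs` is made up, and the rotated file name used in the demo is only an assumed example.

```python
import shutil
import tempfile
from pathlib import Path

ACTIVE_LOG = "oozie.log"  # assumed name of the file still being written


def copy_rotated_logs(log_dir, archive_dir):
    """Copy rotated log files that are not yet in the archive.

    archive_dir is a local stand-in for an HDFS directory (assumption).
    The active log is skipped, which is why the newest, not yet rotated,
    lines stay unavailable if the node goes down.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(log_dir.glob(ACTIVE_LOG + "*")):
        if path.name == ACTIVE_LOG:
            continue  # still being written; copied only after rotation
        target = archive_dir / path.name
        if not target.exists():
            shutil.copy2(path, target)
            copied.append(path.name)
    return copied


# Demo with throwaway files; the rotated file name is a made-up example.
with tempfile.TemporaryDirectory() as tmp:
    logs = Path(tmp) / "logs"
    logs.mkdir()
    (logs / "oozie.log").write_text("current hour\n")
    (logs / "oozie.log-2014-06-03-09").write_text("previous hour\n")
    first_pass = copy_rotated_logs(logs, Path(tmp) / "archive")
    second_pass = copy_rotated_logs(logs, Path(tmp) / "archive")
```

The second pass copies nothing because the rotated file is already archived; the active `oozie.log` is never copied, which corresponds to the unavailability window listed under Cons.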
  • 7. Other ideas? Other ideas are always welcome. E.g., putting the logs into the DB? E.g., integrating ZooKeeper to solve this problem?
  • 8. Coordinator Cron Scheduling Various cron syntaxes exist, and Unix cron syntax is only one of them. ● Oozie cron has 5 fields, since Oozie operates on a per-minute basis ● Weekday numbering starts at 2, which is Monday ● Complicated overflowing ranges are discouraged
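As a rough illustration of the five-field layout and the weekday numbering, the sketch below splits a frequency string into named fields and translates day names to numbers. The field names and the helper `split_frequency` are illustrative choices, not Oozie identifiers; Sunday = 1 and Saturday = 7 are inferred from "Monday is 2" rather than stated in the talk.

```python
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")

# The slides state Monday = 2 and Friday = 6.
# Sunday = 1 and Saturday = 7 are inferred from that (assumption).
DAY_NUMBERS = {"SUN": 1, "MON": 2, "TUE": 3, "WED": 4,
               "THU": 5, "FRI": 6, "SAT": 7}


def split_frequency(expr):
    """Split a five-field cron frequency string into named fields."""
    parts = expr.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(FIELD_NAMES, parts))


fields = split_frequency("0 9 * * MON-FRI")
start, end = fields["day_of_week"].split("-")
numeric_range = f"{DAY_NUMBERS[start]}-{DAY_NUMBERS[end]}"
print(fields)
print(numeric_range)
```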
  • 9. Use cases ● A job running at 9am every weekday ○ frequency="0 9 * * 2-6" or "0 9 * * MON-FRI" ○ Notice that the first expression uses 2-6 instead of 1-5 ● A job running every 15 minutes from 9am to 11am every day ○ frequency="0/15 9,10 * * *" or "0,15,30,45 9,10 * * *" ○ Notice that the hour field should be 9,10 instead of 9-11
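To see why the hour field should be 9,10 rather than 9-11, the fire times can be enumerated by hand. This is a minimal sketch with explicit lists, not a cron parser; `fire_times` is a hypothetical helper.

```python
def fire_times(minutes, hours):
    """Return (hour, minute) pairs for explicit minute and hour lists."""
    return [(h, m) for h in hours for m in minutes]


every_15 = [0, 15, 30, 45]  # the minutes "0/15" selects within an hour

correct = fire_times(every_15, [9, 10])        # hour field "9,10"
too_long = fire_times(every_15, [9, 10, 11])   # hour field "9-11"

print(correct[-1])    # last run of the day with "9,10"
print(too_long[-1])   # last run of the day with "9-11"
```

With 9-11 the job keeps firing throughout the 11 o'clock hour, so the last run lands at 11:45 rather than 10:45.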
  • 10. Use cases continued ● A job running at 9am on the last Friday of every month ○ frequency = "0 9 * * 6L" ● A job running at 9am on the 2nd Friday of every month ○ frequency = "0 9 * * 6#2"
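The dates that "6L" and "6#2" are meant to select can be checked with the standard library. The sketch below uses June 2014 purely as an example month, and `weekday_dates` is a hypothetical helper, not part of Oozie.

```python
import calendar
from datetime import date


def weekday_dates(year, month, weekday):
    """All dates in a month that fall on the given weekday.

    weekday uses Python's numbering (calendar.MONDAY == 0), which
    differs from the Oozie numbering where Friday is 6.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return [
        date(year, month, d)
        for d in range(1, days_in_month + 1)
        if date(year, month, d).weekday() == weekday
    ]


fridays = weekday_dates(2014, 6, calendar.FRIDAY)
second_friday = fridays[1]   # the day "6#2" is meant to select
last_friday = fridays[-1]    # the day "6L" is meant to select
print(second_friday, last_friday)
```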
  • 11. General mistakes ● The Oozie timezone is UTC by default, so your cron expression must account for the timezone difference ○ If you live in LA and want to run a job at 9am every day: ■ frequency = "0 9 * * *" is wrong!!! ■ frequency = "0 16 * * *" is the right one (during daylight saving time, when LA is UTC-7) ○ We understand the inconvenience, but that’s the Oozie server timezone.
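The hour shift can be worked out with simple modular arithmetic. The helper below is a hypothetical sketch that takes a fixed UTC offset as input; it does not look up daylight saving rules and it ignores the case where the conversion also moves the job to a different calendar day.

```python
def local_hour_to_utc(local_hour, utc_offset_hours):
    """Convert a local wall-clock hour to the UTC hour for the cron hour field.

    utc_offset_hours is the local offset from UTC, e.g. -7 for PDT.
    """
    return (local_hour - utc_offset_hours) % 24


pdt_hour = local_hour_to_utc(9, -7)  # Los Angeles during daylight saving time
pst_hour = local_hour_to_utc(9, -8)  # Los Angeles during standard time
print(f"0 {pdt_hour} * * *")
print(f"0 {pst_hour} * * *")
```

The value 16 on the slide corresponds to the UTC-7 case; with a UTC-8 offset the same 9am job would need hour 17.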
  • 12. General mistakes continued ● Day of month and day of week are combined as a union, not an intersection ○ "0 10 12-18 * 2-6" DOES NOT MEAN running a job at 10am on weekdays that fall between the 12th and 18th of the month ○ It MEANS running a job at 10am on every weekday AND on every day from the 12th to the 18th of the month. Only one of the two conditions needs to be satisfied for the job to fire.
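The difference between the two readings is easy to see by counting days. The sketch below evaluates both interpretations of "12-18" and "2-6" for June 2014, chosen only as an example month; `matching_days` is a hypothetical helper and uses Python's weekday numbering (Monday = 0).

```python
import calendar
from datetime import date


def matching_days(year, month, dom_range, weekdays):
    """Return (union, intersection) of day-of-month and day-of-week matches."""
    days_in_month = calendar.monthrange(year, month)[1]
    union, intersection = [], []
    for d in range(1, days_in_month + 1):
        in_dom = d in dom_range
        in_dow = date(year, month, d).weekday() in weekdays
        if in_dom or in_dow:
            union.append(d)         # how the slide says Oozie behaves
        if in_dom and in_dow:
            intersection.append(d)  # what people often expect
    return union, intersection


# Days 12-18; Monday..Friday are 0..4 in Python (2-6 in Oozie numbering).
union, intersection = matching_days(2014, 6, range(12, 19), {0, 1, 2, 3, 4})
print(len(union), intersection)
```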
  • 13. General Mistakes Continued ● An overflow range goes wild ○ "0 23-1 * * *" is reasonable ○ "0 23-1 * DEC-MAR FRI-MON". What does this mean? ■ Cron can produce many different edge cases in the above scenario. Do it at your own peril!
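One plausible reading of an overflow range is a wrap-around expansion, sketched below. This is an assumption made for illustration, not a description of how the Oozie cron parser resolves such ranges; `expand_range` is a hypothetical helper, and the day numbers follow the Monday = 2 convention from the earlier slide.

```python
def expand_range(start, end, lowest, highest):
    """Expand "start-end", wrapping past `highest` back to `lowest` if needed."""
    if start <= end:
        return list(range(start, end + 1))
    return list(range(start, highest + 1)) + list(range(lowest, end + 1))


hours = expand_range(23, 1, 0, 23)    # "23-1"
months = expand_range(12, 3, 1, 12)   # "DEC-MAR"
days = expand_range(6, 2, 1, 7)       # "FRI-MON" with FRI = 6, MON = 2
print(hours, months, days)
```

Even under this simple reading, the second expression on the slide stacks three wrapped ranges on top of each other, which is where the edge cases come from.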
  • 14. 4.1 Release We have around 200 patches for this release and it’s been a while! ● All HA related work ● Cron scheduling for coordinator jobs ● Consolidation of JPAExecutors (not user facing) ● Introduction of SharelibService (some user-facing backward incompatibility issues)
  • 15. 4.1 Release continued ● Better integration with YARN RM HA and restart ● Oozie Sqoop CLI functionality ● More versatile coordinator job functionality ● Major overhaul of coordinator job execution order