Oozie High Availability (Hadoop Summit 2014 meetup)

by Robert Kanter (Cloudera)

Published in: Engineering, Technology

  1. Oozie High Availability (HA), Robert Kanter
  2. High Availability
     • A system without unplanned downtime when partial failures occur
     • Typically achieved by adding redundancy and removing single points of failure
     • Our Goals
       • Don’t change the API or usage patterns
       • Users don’t even have to know it’s HA
  3. The HA Solution: Architectural Overview
  4. The HA Solution: Database
     • Oozie stores all state in a database (submitted jobs, workflow definitions, etc.)
     • Instead of a failover model, we want to run many Oozie servers against the same database
       • Active-Active HA
       • Also provides horizontal scalability
     • ZooKeeper for coordination
  5. The HA Solution: Database
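
The active-active idea on the Database slide can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Oozie's implementation: a Python dict stands in for the shared database, `threading.Lock` objects stand in for ZooKeeper-backed per-job locks, and the names `SharedState`, `run_server`, and `simulate` are hypothetical. The status strings are used purely for illustration.

```python
import threading


class SharedState:
    """In-memory stand-in for the shared database plus per-job locks.

    Assumption: in a real deployment the state lives in the database and
    the locks are coordinated through ZooKeeper; here both are local.
    """

    def __init__(self, job_ids):
        self.status = {job_id: "PREP" for job_id in job_ids}
        self.processed_by = {}
        self._locks = {job_id: threading.Lock() for job_id in job_ids}

    def try_lock(self, job_id):
        # Non-blocking: returns False if another server holds the lock.
        return self._locks[job_id].acquire(blocking=False)

    def unlock(self, job_id):
        self._locks[job_id].release()


def run_server(name, state):
    """One simulated server: any server may pick up any job."""
    for job_id in list(state.status):
        if not state.try_lock(job_id):
            continue  # another server is handling this job right now
        try:
            # Re-check the state under the lock so a job is started once.
            if state.status[job_id] == "PREP":
                state.status[job_id] = "RUNNING"
                state.processed_by[job_id] = name
        finally:
            state.unlock(job_id)


def simulate(server_names, job_ids):
    """Run all simulated servers concurrently against the same state."""
    state = SharedState(job_ids)
    threads = [
        threading.Thread(target=run_server, args=(name, state))
        for name in server_names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state


if __name__ == "__main__":
    result = simulate(["hostA", "hostB", "hostC"], [f"job-{i}" for i in range(10)])
    for job_id, server in sorted(result.processed_by.items()):
        print(job_id, "started by", server)
```

Because every server checks the job state only while holding that job's lock, each job is started by exactly one server, whichever one gets there first. There is no designated primary to fail over from.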
  6. The HA Solution: Access
     • Users and client programs need a single address to connect to (Web UI, REST/Java API, JobTracker callbacks, etc.)
     • A load balancer, virtual IP, or DNS round-robin can be used to provide a single entry point to the Oozie servers
       • Technically, this entry point also needs to be HA
  7. The HA Solution: Access
  8. The HA Solution: Log Streaming
     • Oozie’s log files are not in the database
     • Each Oozie Server only has access to its own logs
     • Jobs are not assigned to a specific Oozie server
     • What if Oozie Server A wants to get logs for a job processed by Oozie Server B?
       • Oozie Server A can ask Oozie Server B for its logs
     • Caveat: If an Oozie Server goes down, any logs from it will be unavailable until it is brought back up
  9. The HA Solution: Log Streaming
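
The log-streaming behaviour, including the caveat about servers that are down, can be sketched as a merge over per-server logs. This is an illustrative model only: the function name `collect_job_logs`, the `(timestamp, job_id, message)` tuple layout, and the use of `None` to mark an unreachable server are all assumptions, not Oozie's wire format or API.

```python
def collect_job_logs(job_id, servers):
    """Gather one job's log lines from every reachable server.

    servers maps a server name to its list of (timestamp, job_id, message)
    entries, or to None when that server is down (hypothetical layout).
    Returns the merged lines in timestamp order and the names of the
    servers whose logs could not be read.
    """
    lines = []
    unavailable = []
    for name, log in sorted(servers.items()):
        if log is None:
            # Logs live only on the server that wrote them, so a server
            # that is down leaves a gap until it is brought back up.
            unavailable.append(name)
            continue
        lines.extend((ts, name, msg) for ts, jid, msg in log if jid == job_id)
    lines.sort()
    return lines, unavailable


if __name__ == "__main__":
    servers = {
        "hostA": [(2, "job-1", "action started"), (5, "job-2", "job submitted")],
        "hostB": [(1, "job-1", "job submitted")],
        "hostC": None,  # down
    }
    merged, missing = collect_job_logs("job-1", servers)
    for ts, server, message in merged:
        print(ts, server, message)
    print("no logs available from:", missing)
```

Whichever server receives the request plays the role of "Oozie Server A" here: it asks every peer, merges what it gets, and can only report which servers it could not reach.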
  10. How to Enable HA: Configuration and Security
  11. How to Enable HA
     • Set up a load balancer, ZooKeeper ensemble, HA database, and multiple identically configured Oozie servers
     • Enable Oozie HA services:
         <property>
           <name>oozie.services.ext</name>
           <value>
             org.apache.oozie.service.ZKLocksService,
             org.apache.oozie.service.ZKXLogStreamingService,
             org.apache.oozie.service.ZKJobsConcurrencyService
           </value>
         </property>
  12. How to Enable HA
     • Point Oozie to the ZooKeeper ensemble:
         <property>
           <name>oozie.zookeeper.connection.string</name>
           <value>ZK_HOST1:2181,ZK_HOST2:2181</value>
         </property>
     • Point the environment variable for callbacks to the load balancer:
         export OOZIE_BASE_URL="http://loadbalancer:11000/oozie"
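
Since every Oozie server must be configured identically, it can help to generate the HA properties from one place. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the property names and service classes are the ones from the slides, while the helper names `build_ha_properties` and `render`, and the enclosing `<configuration>` element (the usual Hadoop-style site file wrapper), are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Service classes as listed on the "How to Enable HA" slide.
HA_SERVICES = [
    "org.apache.oozie.service.ZKLocksService",
    "org.apache.oozie.service.ZKXLogStreamingService",
    "org.apache.oozie.service.ZKJobsConcurrencyService",
]


def build_ha_properties(zk_hosts, zk_port=2181):
    """Build the two HA-related properties as an XML element tree."""
    conf = ET.Element("configuration")

    def add(name, value):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    add("oozie.services.ext", ",".join(HA_SERVICES))
    add(
        "oozie.zookeeper.connection.string",
        ",".join(f"{host}:{zk_port}" for host in zk_hosts),
    )
    return conf


def render(conf):
    """Serialize the tree; pretty-print where ET.indent exists (3.9+)."""
    if hasattr(ET, "indent"):
        ET.indent(conf)
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    print(render(build_ha_properties(["ZK_HOST1", "ZK_HOST2"])))
```

The output is meant to be merged into each server's existing site configuration by hand or by a deployment tool; it does not replace the other settings a server needs.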
  13. How to Enable HA: Security
     • Extra step to configure Kerberos with the load balancer:
         <property>
           <name>oozie.authentication.kerberos.principal</name>
           <value>HTTP/loadbalancer@REALM</value>
         </property>
     • Note: this currently prevents clients from talking directly to any Oozie server
  14. How to Enable HA: Security
     • Enable Kerberos connection to ZooKeeper and ACLs:
         <property>
           <name>oozie.zookeeper.secure</name>
           <value>true</value>
         </property>
     • ACLs prevent malicious users or programs from interfering with Oozie’s znodes
  15. Using Oozie with HA
  16. Using Oozie with HA
     • New Oozie CLI/REST API command to list all servers:
         $ oozie admin -oozie http://loadbalancer:11000/oozie -servers
         hostA : http://hostA:11000/oozie
         hostB : http://hostB:11000/oozie
         hostC : http://hostC:11000/oozie
     • Log messages now include which server wrote them:
         2013-09-29 16:46:20,182 WARN org.apache.oozie.command.wf.ActionStartXCommand: SERVER[hostA] USER[root] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-130925230553293-oozie-oozi-W] ACTION[0000000-130925230553293-oozie-oozi-W@streaming-node] [***0000000-130925230553293-oozie-oozi-W@streaming-node***]Action status=RUNNING
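
With several servers writing logs, the new `SERVER[...]` field is what lets you tell which server handled a given message. A small sketch of pulling the `NAME[value]` fields out of a log line with a regular expression follows; the helper name `parse_log_fields` is hypothetical, the pattern is inferred only from the single example line on the slide (with the slide's line-wrap spaces removed), and other log layouts may need a different pattern.

```python
import re

# Matches NAME[value] pairs such as SERVER[hostA] or TOKEN[].
# The unlabeled "[***...***]" marker has no NAME prefix and is skipped.
FIELD = re.compile(r"\b([A-Z]+)\[([^\]]*)\]")


def parse_log_fields(line):
    """Return the NAME[value] fields of one log line as a dict."""
    return dict(FIELD.findall(line))


if __name__ == "__main__":
    sample = (
        "2013-09-29 16:46:20,182 WARN "
        "org.apache.oozie.command.wf.ActionStartXCommand: "
        "SERVER[hostA] USER[root] GROUP[-] TOKEN[] APP[demo-wf] "
        "JOB[0000000-130925230553293-oozie-oozi-W] "
        "ACTION[0000000-130925230553293-oozie-oozi-W@streaming-node] "
        "[***0000000-130925230553293-oozie-oozi-W@streaming-node***]"
        "Action status=RUNNING"
    )
    fields = parse_log_fields(sample)
    print(fields["SERVER"], fields["JOB"])
```

Grouping parsed lines by the `SERVER` field is one way to see how work for a single job was spread across the servers.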
  17. To Do: What’s left
  18. To Do
     • HA support for SLAs and HCatalog integration
     • Sharelib purging with HA
     • Log Streaming HA
     • With Kerberos, Oozie servers can’t talk to each other
       • Breaks log streaming, sharelibupdate
     • Other misc improvements