Push jobs: an orchestration building block for private Chef


Published on

Push jobs is a new feature in Opscode Private Chef that will allow a user to run commands across hundreds of chef managed servers. Push Jobs leverages Erlang/OTP and ZeroMQ to provide scalable and fault tolerant execution.

In this talk I’ll cover the general motivation behind the design and an architectural overview of the system. This will include details of we used Erlang and ZeroMQ to build a robust, scalable system. I’ll also do a demo of the push job feature in action, covering the push jobs server, execution client and knife command line interface.

Published in: Technology, Business
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Push jobs: an orchestration building block for private Chef

  1. 1. The Opscode Push Jobs ServiceMark AndersonApril 28, 2013
  2. 2. Push jobsin a command line• knife job start -quorum 90% chef-client --search role:webapp• Finds all nodes with role webapp• Submits a job with quorum of 90% to the pushy server.• Checks quorum• Starts job on available nodes• Gathers success and failures• And will do this for ten nodes...or a thousand
  3. 3. Push jobsWhy not use X?• We wanted to build a tool that could be deeply integrated into chef.• Integrated with authentication model• Clients use their client key to authenticate to the server• Users use their keys to send commands to the api• Integrated with the authorization model• Groups control access now• Eventually there will be fine grained ACLs• Integrated with search and other Chef features• Scalability
  4. 4. Push jobsServer• Erlang service• Extends the Chef REST API• Job creation and tracking• Push client configuration• Controls the clients via ZeroMQ• Heartbeating to track node availability• Command execution
  5. 5. Push jobsClient• Simple ruby client• Receives heartbeats from the server• Sends back heartbeats to the server• Executes commands• Configuration requirements are minimal• The client initiates all connections to the server• Most configuration is via chef API call using the client key• Opens ZeroMQ connections to server for all other communication
  6. 6. Push jobsThe lifecycle of a jobServerClientJobAcceptedSendCommandClientsACKWait forQuorumStart ExecClientsExecCollectResults
  7. 7. Push jobsKnife extension• All control for pushy jobs is via extensions to the chef API• Node status• Job control• start• stop• status• Job listing
  8. 8. Pushy Demo
  9. 9. Push JobsDemoChef/PushyServerChicoHarpoGrouchoGummoZeppo
  10. 10. The nitty gritty
  11. 11. Internals:Client server interaction• The client initiates all connections to the server• The client authenticates to the server and receives• A session key and TTL• ZeroMQ connection information (ports, heartbeat rate, etc)• Subscribes via ZeroMQ to server heartbeats (1 to many)• Connects via ZeroMQ to the server (1-1)• Sends heartbeats to the server as long as it receives server heartbeats• Awaits commands from the server
  12. 12. Security• Protocol security• We leverage the existing API signing mechanism to exchange session keys• All ZeroMQ messages are signed• HMAC SHA256 signing protocol protects point to point messages• RSA 2048/SHA1 protects broadcast messages (just like the chef API)• Relies on the SSL chain of trust to the server.
  13. 13. Access control• Access rights controlled by groups• ‘push_job_writers’ group controls job creation and deletion• ‘push_job_readers’ group controls read access to job status and results• Whitelist for commands• The client rejects commands that aren’t on the whitelist• In the future we’d like to do finer grained access control• Perhaps persistent job templates with their own access rights and commands
  14. 14. Implementation
  15. 15. Why erlang?• Chef 11 work has been very successful• Easier integration• The process (think threads) model allows a great deal of parallelism• Every node has an erlang process to track its state• Every job has a process to track its state
  16. 16. Server process structureMessageswitchHeartbeatgeneratorREST APIClientsJobMonitorJobMonitorJobMonitorJobMonitorJobMonitorClientMonitorClientMonitorClientMonitorClientMonitorClientMonitor
  17. 17. Why ZeroMQ?• Abstracts away much of the pain of socket libraries• Reliable delivery• Portable• Broad language support• Proven scalability
  18. 18. Some impedance mismatch• ZeroMQ provides a lot of goodness for asynchronous execution• That is really helpful in many languages• Erlang doesn’t need that so much, and encourages finer grained tasks• ZeroMQ hides a lot of interesting state• It turns out we care about whether a node dies and comes back• Need a better ZeroMQ/Erlang glue library• ZeroMQ 3 offers some very interesting options for future work
  19. 19. Performance and scalability results• We can run a job over 2000 nodes• 15 sec heartbeats• c1.medium• Bottlenecks• Heartbeats consume a lot of resources• Everything goes through router process for zeromq messages
  20. 20. Moving towards a federated architectureChef APISQLErchefSolr SQLPush ReportingSQLSQLAuth
  21. 21. Where are we now?
  22. 22. Availability: Limited release• Time to get people using it• Private Chef only for now• Hosted Chef deferred• Scalability: hosted has more than 2k nodes• Security: ZeroMQ messages aren’t encrypted• Open Source: Eventually
  23. 23. Future directions• Scalability improvements• Our biggest customers will want more• Jobs as first class, persistent objects• More control of execution of jobs,• Inside a job: (push jobs makes an excellent DDOS tool)• Between jobs: Chaining results between jobs• Push jobs as a building block• Conducting experiments using it for building block for continuous delivery
  24. 24. Questions?