
Opencast Job Dispatching


A deep dive into the job and workflow architecture of Opencast.

Published in: Software

Opencast Job Dispatching

  1. Opencast Job Dispatching
     Greg Logan, gregorydlogan@gmail.com
     February 15, 2018
  6. Housekeeping
     • This is going to be a deeply technical talk
     • If reality seems to be imploding...
       • Feel free to zone out for a bit
       • Ask questions
     • This presentation abuses UML
     • This is being recorded
     • Shout questions as you think of them
  7. Opencast Job Dispatching Overview
     • Quick review: Services, and how they are registered
     • Anatomy of a job
     • How is a job created?
     • How does a job get dispatched?
     • What is a workflow? How does it differ from a job?
     • How is a workflow created?
     • (Relatively) complete workflow in steps, a descent into madness
  8. Quick Review: Service and Service Registration
     • Opencast services register themselves with the service registry
     • This registry is local
     • The database synchronizes the registrations through the cluster
     • Local services talk directly to the local service registry
     • Remote services talk to their remote, which talks to its local registry
     • The architecture of how this all works was explained last talk
  13. Anatomy of a Job
      What is an Opencast Job?
      • Database object
      • A representation of a unit of work within Opencast
      • A way to asynchronously keep track of your operations!
      • Contains the data for a full operation (i.e., encode of a stream)
      • 19 fields!
        • Status
        • Creating Service
        • Type
        • Operation
        • Dispatchable
        • Job Load
        • Blocking Job
        • Blocked By
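To make the field list concrete, here is a minimal Python sketch of a job as a plain record. It is not Opencast's actual Job class (Opencast is written in Java); the attribute names, types, default values, and the `Status` members are assumptions chosen for illustration, and only the eight fields named on the slide are modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Illustrative subset; the talk mentions queued, running and failed jobs.
    QUEUED = "queued"
    RUNNING = "running"
    FAILED = "failed"
    FINISHED = "finished"


@dataclass
class Job:
    id: int
    job_type: str                       # "Type": which kind of service runs it
    operation: str                      # "Operation": what that service should do
    creating_service: str               # "Creating Service": who asked for the work
    status: Status = Status.QUEUED      # "Status"
    dispatchable: bool = True           # "Dispatchable"
    job_load: float = 1.0               # "Job Load": arbitrary cost counter
    blocking_job: Optional[int] = None  # "Blocking Job" (exact semantics not given here)
    blocked_by: Optional[int] = None    # "Blocked By" (exact semantics not given here)


# Hypothetical values: a freshly created encode job.
encode = Job(id=1, job_type="encoder", operation="encode", creating_service="workflow")
```

A new job in this sketch starts out queued and dispatchable, which is the state the dispatcher described later looks for.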
  20. Job Creation
      How is a job created?
      • A job is created by the service registry (SR) when an operation is started
      • Each encode generates a job, as does each publish
      • These jobs may spawn subjobs
        • An encode nearly always spawns an inspect job
      • Jobs can block waiting for their children
      • Jobs can block waiting for resources(*)
      • An undispatchable job is handled by the host which created it
        • Ingest
  21. Job Dispatching: The basics
      Job dispatching
      • This is where the sausage gets made
      • This is very simplified from the actual code
  22. Job Dispatching: The (initial) sausage factory
      function dispatchJobs(List[] jobs)
        for all job in jobs do
          serviceType ← job.serviceType
          candidateServices ← getServicesOfType(serviceType)
          serviceId ← dispatchJob(job, candidateServices)

      function dispatchJob(Job job, List services)
        for all service in services do
          accepter ← HTTP.POST(job, service)
          if accepter ≠ null then
            return accepter.id
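The pseudocode on this slide can be turned into a small runnable sketch. This is Python rather than Opencast's Java, and everything beyond the two function names is an assumption: the `Service` record, the `accept` callable that stands in for the HTTP POST, and the dictionary used as a registry are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Service:
    id: str
    # Stand-in for the HTTP POST: returns True when the service accepts the job.
    accept: Callable[[dict], bool]


def dispatch_job(job: dict, services: List[Service]) -> Optional[str]:
    """Offer the job to each candidate in turn; return the first accepter's id."""
    for service in services:
        if service.accept(job):
            return service.id
    return None  # nobody took it; the job stays undispatched


def dispatch_jobs(jobs: List[dict],
                  registry: Dict[str, List[Service]]) -> Dict[int, Optional[str]]:
    """Map each job id to the service that accepted it (or None)."""
    assignments = {}
    for job in jobs:
        candidates = registry.get(job["service_type"], [])
        assignments[job["id"]] = dispatch_job(job, candidates)
    return assignments


registry = {
    "encode": [
        Service("worker1", accept=lambda job: False),  # busy, refuses
        Service("worker2", accept=lambda job: True),
    ]
}
assignments = dispatch_jobs(
    [{"id": 1, "service_type": "encode"}, {"id": 2, "service_type": "publish"}],
    registry,
)
```

In this run job 1 lands on `worker2` and job 2 stays unassigned because no service of its type is registered. The sketch always tries candidates in the same order and ignores load, which is exactly the kind of weakness a naive dispatcher has.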
  23. Job Dispatching: Weak Sausages
      There are a number of issues here
      • Service fairness
      • Service load
      • Job load
      • Priority/Failed jobs
  26. Job Dispatching: Service and Job Load
      Job Load values
      • ... are not the actual hardware cost to run a job
      • ... are completely arbitrary
      • ... should be thought of as a counter, rather than a load average
  28. Job Dispatching: Service and Job Load
      Service Load values
      • ... are the sum of the loads of the Jobs currently in the RUNNING state
      • ... do not represent the real load on the system
  35. Job Dispatching: Service and Job Load
      So what’s the point of the load value?
      • Each node/host defines a maximum load for itself
        • Typically this is equal to the number of processor cores
      • The node will be assigned at most that much load
        • Σ(jobs.load) <= node.maxload
      • If node.maxload = 8
        • job.load = 2 → 4 jobs
        • job.load = 4 → 2 jobs
        • job.load > 4 → 1 job
      • Job load can be fractional!
      • Job load can be negative!
        • Don’t do this...
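The arithmetic on this slide is a floor division: with identical jobs, a node holds as many as fit under its maximum load. A minimal Python sketch, assuming identical job loads and a hypothetical helper name `jobs_that_fit`:

```python
import math


def jobs_that_fit(job_load: float, max_load: float) -> int:
    """How many identical jobs satisfy sum(job loads) <= max_load."""
    if job_load <= 0:
        # A zero or negative load puts no bound on the job count.
        raise ValueError("job_load must be positive for this calculation")
    return math.floor(max_load / job_load)


# Reproduce the slide's table for node.maxload = 8, plus a fractional load.
capacity = {load: jobs_that_fit(load, max_load=8) for load in (2, 4, 5, 0.5)}
```

Here `capacity` comes out as 4 jobs at load 2, 2 jobs at load 4, 1 job at load 5, and 16 jobs at load 0.5. A load of 9 gives 0, which is the case where a job cannot be placed on that node under the plain rule.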
  43. Job Dispatching: Service and Job Load
      Aside: Neat Tricks
      • Specialist nodes
        • Really really good at one thing
        • On the specialist, set that job’s cost to very small (zero?)
        • Set that job’s cost to greater than node.maxload everywhere else
        • On the specialist, set the rest of the costs to greater than node.maxload
        • That job will only run on that hardware
      • This can block processing!
      • Current bug: Cheaper encoding not prioritized (MH-12493)
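The specialist-node trick can be illustrated with a per-node cost table. This Python sketch is an assumption throughout: the node names, the `NODES` structure, and the rule that a node never takes a job whose cost exceeds its maximum load are all hypothetical stand-ins for real cluster configuration.

```python
from typing import Dict, List

# Hypothetical per-node settings: a maximum load and the node's own cost table.
NODES: Dict[str, dict] = {
    "gpu-node": {"max_load": 8, "costs": {"encode": 0.1, "inspect": 9, "publish": 9}},
    "worker-a": {"max_load": 8, "costs": {"encode": 9, "inspect": 1, "publish": 1}},
    "worker-b": {"max_load": 8, "costs": {"encode": 9, "inspect": 1, "publish": 1}},
}


def eligible_nodes(operation: str) -> List[str]:
    """Nodes whose configured cost for this operation fits under their max load."""
    return sorted(
        name
        for name, node in NODES.items()
        if node["costs"][operation] <= node["max_load"]
    )


encode_nodes = eligible_nodes("encode")
inspect_nodes = eligible_nodes("inspect")
```

With these numbers only `gpu-node` is eligible for encodes, and it is eligible for nothing else. The flip side is visible too: if `gpu-node` goes away, no node can take an encode, which is how this setup can block processing.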
  49. Job Dispatching: Service and Job Load
      Taking the safeties off
      • Each node/host defines a maximum load for itself
      • If the cost for a job exceeds maxload for all nodes the job never processes
      • org.opencastproject.job.load.acceptexceeding
        • This is true by default
        • Setting this to false is safe
        • Set this to false prior to changing job loads
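The slide does not spell out what the property changes. The Python sketch below assumes one plausible reading: when the flag is on, an otherwise idle node may take a job whose load alone exceeds its maximum, and when it is off such a job is never accepted. The function `can_accept` and its parameters are hypothetical and this is not Opencast's actual acceptance code.

```python
def can_accept(job_load: float, current_load: float, max_load: float,
               accept_exceeding: bool = True) -> bool:
    """Decide whether a node takes a job (assumed semantics, see the text above)."""
    if current_load + job_load <= max_load:
        return True  # fits under the normal rule
    # Oversized job: only an idle node takes it, and only if the flag allows it.
    return accept_exceeding and job_load > max_load and current_load == 0


oversized_default = can_accept(job_load=10, current_load=0, max_load=8)
oversized_strict = can_accept(job_load=10, current_load=0, max_load=8,
                              accept_exceeding=False)
```

Under this reading the oversized job runs with the default setting and is refused once the flag is false, which is why the flag matters before job loads are raised above `node.maxload` on purpose.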
  50. Job Dispatching: Accounting for Load
      function mainDispatch( )
        repeat
          jobs ← getAllJobs( )
          dispatchJobs(jobs)
        until shutdown

      function dispatchJobs(List[] jobs)
        for all job in jobs do
          serviceType ← job.serviceType
          candidateServices ← getServicesOfType(serviceType)
          candidateServices ← filterServicesByLoad(candidateServices, job.load)
          serviceId ← dispatchJob(job, candidateServices)
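The load filter named in the pseudocode could look roughly like this in Python. The slide gives no body for it, so the signature, the dictionary shapes, and the least-loaded-first ordering are assumptions; the ordering in particular is added here only as one simple way to address service fairness.

```python
from typing import Dict, List


def filter_services_by_load(services: List[dict], job_load: float,
                            host_loads: Dict[str, float]) -> List[dict]:
    """Keep services whose host still has room, least-loaded host first."""
    fitting = [
        s for s in services
        if host_loads.get(s["host"], 0.0) + job_load <= s["max_load"]
    ]
    return sorted(fitting, key=lambda s: host_loads.get(s["host"], 0.0))


services = [
    {"host": "w1", "max_load": 4},
    {"host": "w2", "max_load": 4},
    {"host": "w3", "max_load": 4},
]
host_loads = {"w1": 3.5, "w2": 1.0}  # w3 is idle
candidates = filter_services_by_load(services, job_load=1.0, host_loads=host_loads)
candidate_hosts = [s["host"] for s in candidates]
```

In this example `w1` is dropped because the job would push it to 4.5, and the idle `w3` is offered the job before `w2`.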
  57. Job Dispatching: Priority
      One thing people always want: How can I make this recording process in front of that one?
      • This isn’t that
        • MH-6850
      • This is for handling undispatchable, failed, and queued jobs
        • Undispatchable: No service accepted them
        • Failed: Did not complete successfully
        • Queued: New jobs
  58. Job Dispatching: Accounting for Priority
      function mainDispatch( )
        repeat
          jobs ← getPriorityJobs( )
          dispatchJobs(jobs)
          jobs ← getRestartJobs( )
          dispatchJobs(jobs)
          jobs ← getQueuedJobs( )
          dispatchJobs(jobs)
          jobs ← getAllJobs( )
          dispatchJobs(jobs)
        until shutdown
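One iteration of this loop can be sketched in Python as a walk over job buckets in a fixed order. The helper `dispatch_pass` is hypothetical, and skipping a job that an earlier bucket already offered is an assumption made here so the final "all jobs" bucket does not retry the same job within one pass; the slide does not say how overlaps are handled.

```python
from typing import Callable, Iterable, List


def dispatch_pass(buckets: Iterable[Callable[[], List[int]]],
                  dispatch: Callable[[int], bool]) -> List[int]:
    """Run one loop iteration: visit buckets in order, try each job id once."""
    attempted = set()
    order = []
    for get_jobs in buckets:
        for job_id in get_jobs():
            if job_id in attempted:
                continue  # already offered by an earlier bucket in this pass
            attempted.add(job_id)
            dispatch(job_id)
            order.append(job_id)
    return order


# Hypothetical job ids standing in for the four queries in the pseudocode.
def get_priority_jobs() -> List[int]:
    return [7]


def get_restart_jobs() -> List[int]:
    return [3]


def get_queued_jobs() -> List[int]:
    return [5, 9]


def get_all_jobs() -> List[int]:
    return [3, 5, 7, 9, 11]


order = dispatch_pass(
    [get_priority_jobs, get_restart_jobs, get_queued_jobs, get_all_jobs],
    dispatch=lambda job_id: True,
)
```

The resulting order is 7, 3, 5, 9, 11: the priority job goes first, and job 11 is only reached through the catch-all bucket.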
  62. On to workflows
      What is a workflow?
      • It’s a recording?
      • It’s a processing run for a recording?
      • It’s a collection of jobs
      • It’s a job with some metadata
  67. The Workflow Service
      The Workflow Service
      • Keeps track of all workflows
      • Organizes the creation of jobs
      • Organizes the sequence of jobs
        • Note that this is creation, not execution
      • The origin point of all work in the system
  68. So how does this work?
      Who calls the workflow service?
      • You do
        • Created via the admin UI
        • Created via ingest
      • You get a WorkflowInstance
      • Updating the workflow service takes the job ID!
  69. What does this look like?
      [Sequence diagram. Participants: User, AdminUI, WorkflowService, ServiceRegistry. Messages, in order: Start, .Start(), .createJob]
  72. Wait, what?
      • Some of you might have noticed that the previous sequence has problems
        • It just creates a job, then it stops
        • It does not actually do any processing
      • That’s because your workflow is a job
        • Job type: workflow
        • Job operation: START WORKFLOW
      • This gets dispatched just like any other job
  73. What does this look like?
      [Sequence diagram. Participants: User, AdminUI, WorkflowService, ServiceRegistry. Messages, in order: Start, .Start(), .createJob(ST WORKFLOW)]
  75. What does this look like?
      [Sequence diagram. Participants: ServiceRegistry, WorkflowService. Messages, in order: .createJob(START WORKFLOW), .process(), .createJob(START OPERATION)]
  77. But wait, there’s more!
      • It begins
      • Everything is a job
      • It’s jobs all the way down
      • What is START OPERATION?
        • It is a Workflow Job
  78. We need to go deeper...
      [Sequence diagram. Participants: ServiceRegistry, WorkflowService. Messages, in order: .createJob(START WORKFLOW), .process(), .createJob(START OPERATION)]
  79. We need to go deeper...
      [Sequence diagram. Participants: ServiceRegistry, WorkflowService, SomeService. Messages, in order: createJob(START WORKFLOW), .process(), .createJob(START OPERATION), process. Loop: for each workflow step]
  80. And deeper...
      [Sequence diagram. Participants: ServiceRegistry, WorkflowService, SomeWOH, SomeService. Messages, in order: .process(), .createJob(), process, .start(), .foo(). Loop: for each workflow step]
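The "jobs all the way down" idea in these sequence slides can be shown with a toy in Python. Nothing here is Opencast code: the two classes, the in-memory queue, the handler table, and the step names are hypothetical, and the operation handler layer (SomeWOH in the last diagram) is collapsed into a single call for brevity. The operation names are written as they appear on the slides.

```python
from collections import deque


class ServiceRegistry:
    """Toy registry: creating a job only queues it; dispatching runs it later."""

    def __init__(self):
        self.queue = deque()
        self.handlers = {}  # service type -> callable(operation, payload)
        self.log = []

    def create_job(self, service_type, operation, payload=None):
        self.queue.append((service_type, operation, payload))

    def dispatch_all(self):
        while self.queue:
            service_type, operation, payload = self.queue.popleft()
            self.log.append((service_type, operation))
            self.handlers[service_type](operation, payload)


class WorkflowService:
    def __init__(self, registry, steps):
        self.registry, self.steps = registry, steps
        registry.handlers["workflow"] = self.process

    def start(self):
        # Starting a workflow only creates a job; no processing happens here.
        self.registry.create_job("workflow", "START WORKFLOW")

    def process(self, operation, step_index):
        if operation == "START WORKFLOW" and self.steps:
            self.registry.create_job("workflow", "START OPERATION", 0)
        elif operation == "START OPERATION":
            # Stand-in for the operation handler asking a service to do the work.
            self.registry.create_job(self.steps[step_index], "process")
            if step_index + 1 < len(self.steps):
                self.registry.create_job("workflow", "START OPERATION", step_index + 1)


registry = ServiceRegistry()
for name in ("encode", "publish"):
    registry.handlers[name] = lambda operation, payload: None  # pretend to work
workflow = WorkflowService(registry, steps=["encode", "publish"])
workflow.start()
queued_before_dispatch = len(registry.queue)
registry.dispatch_all()
```

After `start()` there is exactly one queued job and nothing has run. Only when the registry dispatches does the workflow job create a START OPERATION job, which in turn creates the service job for that step, once per workflow step.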
  81. Wrapup
      • This was a long, complex talk
        • I hope I was clear
        • Please ask any questions you might have
      • This was actually simplified, there are at least two layers missing
        • Bonus points if you can guess what they are!
