12. How is fork/join implemented?
• One Thread Per Fork (inefficient!)
• Thread Pool (contention on a single global task queue!)
• Internal (per-worker) Task Queues (less contention)
• Task Stealing (load balancing)
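The slides do not name a specific runtime. Assuming Java, its `java.util.concurrent.ForkJoinPool` is one concrete example of the later designs: a pool of worker threads that use work stealing. The minimal sketch below sums an array by recursive splitting. The class name `ForkJoinSum` and the `THRESHOLD` value are illustrative choices, not part of the original material.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer array sum run on Java's ForkJoinPool.
// THRESHOLD is an illustrative cutoff, not a tuned constant.
public class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] data;
    private final int lo, hi; // half-open range [lo, hi)

    public ForkJoinSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: sum sequentially instead of forking more tasks.
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, lo, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, hi);
        left.fork();                     // schedule left asynchronously; an idle worker may steal it
        long rightSum = right.compute(); // process the right half in the current thread
        long leftSum = left.join();      // wait for (or run) the forked left half
        return leftSum + rightSum;
    }

    public static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data)); // 5000050000
    }
}
```

Forking one half and computing the other directly keeps the current worker busy instead of leaving it idle while it waits on both subtasks.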
16. Then, suppose that Task 2 spawns three subtasks: Task 3, Task 4, and Task 5. These tasks are placed on the local queue of Worker Thread 1.
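The spawning step can be modeled with a single worker whose local queue is a double-ended queue. In this hypothetical sketch, the `SpawnDemo` and `Worker` classes, the `spawn` method, and the use of `ArrayDeque` are assumptions for illustration only. Tasks are plain string labels, and each new subtask is appended at the tail of the spawning worker's own queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical model of one worker's local task queue; tasks are plain labels.
public class SpawnDemo {
    static final class Worker {
        // Double-ended queue: new subtasks are added at the tail.
        final Deque<String> localQueue = new ArrayDeque<>();

        // Spawning appends the subtask to this worker's own queue,
        // not to a shared global queue.
        void spawn(String task) { localQueue.addLast(task); }
    }

    // Task 2, running on Worker Thread 1, spawns Tasks 3, 4 and 5.
    static List<String> queueAfterSpawning() {
        Worker workerThread1 = new Worker();
        for (String t : List.of("Task 3", "Task 4", "Task 5")) {
            workerThread1.spawn(t);
        }
        return List.copyOf(workerThread1.localQueue); // head-to-tail order
    }

    public static void main(String[] args) {
        System.out.println(queueAfterSpawning()); // [Task 3, Task 4, Task 5]
    }
}
```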
17. Next, assume Worker Thread 1 completes Task 2. It looks at its local queue and takes the last task (Task 5) off to process. It takes the last task on purpose: the most recently spawned task is likely still in the cache, whereas the first task (Task 3) has more likely been evicted. Processing the local queue in LIFO order therefore improves performance through better cache locality.
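The LIFO rule can be illustrated on the same three-task queue. This single-threaded sketch assumes a common work-stealing design in which the owning worker takes from the tail (LIFO) while a thief steals from the head (FIFO). The `LifoDemo` class and its `ownerTake`/`steal` methods are hypothetical. A real work-stealing deque also needs careful synchronization, which is omitted here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, single-threaded sketch: owner and thief use opposite ends
// of one worker's deque. No synchronization is shown.
public class LifoDemo {
    // Owner takes from the tail: most recently spawned task first (LIFO).
    static String ownerTake(Deque<String> q) { return q.pollLast(); }

    // A thief takes from the head: oldest task first (FIFO).
    static String steal(Deque<String> q) { return q.pollFirst(); }

    // Worker Thread 1's queue after Task 2 spawned Tasks 3, 4 and 5.
    static Deque<String> afterSpawning() {
        Deque<String> q = new ArrayDeque<>();
        q.addLast("Task 3");
        q.addLast("Task 4");
        q.addLast("Task 5");
        return q;
    }

    public static void main(String[] args) {
        Deque<String> q = afterSpawning();
        System.out.println(ownerTake(q)); // Task 5 (newest, likely still cached)
        System.out.println(steal(q));     // Task 3 (oldest)
        System.out.println(ownerTake(q)); // Task 4
    }
}
```

Taking from opposite ends also means the owner and a thief rarely compete for the same task.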