Fork/Join for Fun and Profit!

Nov. 8, 2011


  1. Fork/Join for Fun and Profit! @Sander_Mak
  2. What is the problem anyway?
  3. What is the problem anyway?
  4. What is the problem anyway?
  5. What is the problem anyway? ‣ So let the compiler figure out the hard stuff! if (n < 2) n else fib(n - 1) + fib(n - 2) ‣ Or maybe not ...
  6. Fork/Join in pictures ‣ Fork: recursively decompose a large task into subtasks ‣ Join: await results of the recursive tasks and combine [Diagram: tree of Task 1 to Task 7 producing a Result]
  7. Fork/Join in pseudocode
       compute(problem) {
         if (problem.size < threshold)
           directlySolve(problem)
         else {
           do-forked {
             leftResult = compute(left(problem))
             rightResult = compute(right(problem))
           }
           join(leftResult, rightResult)
           return combine(leftResult, rightResult)
         }
       }
  8. Introducing: java.util.concurrent.ForkJoinPool and java.util.concurrent.ForkJoinTask (subclasses: RecursiveAction, RecursiveTask) ‣ ForkJoinPool methods: void execute(ForkJoinTask<?>), T invoke(ForkJoinTask<T>), ForkJoinTask<T> submit(ForkJoinTask<T>)
  9. Introducing: java.util.concurrent.ForkJoinPool ‣ Implements ExecutorService ‣ Autosizing workers ‣ Double-ended queue ‣ Workstealing algorithm [Diagram: Worker 1, Worker 2, Worker 3]
  10. Sorting demo Mergesort
  11. ForkJoinTasks ‣ 100 < ‘basic computational steps’ < 10,000 ‣ Acyclic, typically decreasing in size ‣ Join doesn’t block thread! ‣ Do: ‣ Optimize sequential threshold ‣ Share, don’t copy input (task locality) ‣ Don’t: ‣ Synchronize/lock (but use: Phaser) ‣ Do blocking I/O
  12. Work stealing [Diagram: ForkJoinPool with Worker 1, Worker 2, Worker 3]
  13. Speedups
  14. What about threads? ‣ Heavyweight (try starting a million) ‣ Implicit dependencies between tasks ‣ Manual synchronization ‣ Deadlock/livelock/race conditions ‣ Hard to scale to available parallelism
  15. Pooling/ExecutorService then? ‣ ForkJoinPool implements ExecutorService ‣ Coarse-grained independent tasks ‣ Recursively decomposed tasks spend most time waiting ‣ In normal threadpool: starvation ‣ Task-queue of threadpool-backed ExecutorService not optimized for many small tasks ‣ No workstealing
  16. Map/Reduce?
                              Fork/Join            Map/Reduce
        Environment           Single JVM           Cluster
        Model                 Recursive forking    Often single map
        Scales with           Cores/CPUs           Nodes
        Worker interaction    Workstealing         No inter-node communication
  17. Demo Sony’s been hacked... Are we compromised...?
  18. Fork/Join and ‣ ForkJoinPool starts threads ‣ Illegal in EJBs ‣ Fair game in servlets/CDI beans ‣ Don’t create ForkJoinPool for each request ‣ Idea: WorkManager to create pool threads ‣ Single pool, async submit(ForkJoinTask) ‣ Don’t tie up request thread: Servlet 3.0
  19. Alternatives outside Java? ‣ Actors ‣ Software transactional memory ‣ Dataflow concurrency ‣ Agents ‣ ... Some of this built on F/J: Akka, GPars
  20. Criticism ‣ Implementation too complex (uses Unsafe) ‣ Some assumptions questionable ‣ 1-1 mapping workerthread/OS thread ‣ Workstealing best option? ‣ Scalability 100+ cores?
  21. Future ‣ Java 1.0 - 1.4: threads, synchronized, volatile ‣ Java 5: j.u.concurrent (high-level locks, concurrent collections) ‣ Java 7: Fork/Join ‣ Java 8: Parallel Collections ‣ Beyond: GPU? OpenCL?
  22. Future
        int scanLog(List<String> lines, String query) {
          Pattern p = ...
          return lines.parallel()
                      .filter(s => p.matcher(s).matches())
                      .count();
        }
  23. ForkJoinPool.shutdownNow() Questions? Code @
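
The pseudocode on slide 7 maps fairly directly onto RecursiveTask. The following is a minimal sketch, not the code from the talk: the class name SumTask, the array-summing problem and the THRESHOLD value are all assumptions chosen for illustration. The static ForkJoinTask.invokeAll(left, right) call plays the role of the do-forked block, and the two join() calls collect the results.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums a slice of a long[] by recursive halving.
class SumTask extends RecursiveTask<Long> {
    // Assumed cutoff; a good value has to be found by measurement.
    static final int THRESHOLD = 1_000;
    // One shared pool instead of a new pool per call.
    private static final ForkJoinPool POOL = new ForkJoinPool();

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;   // shared, not copied (task locality)
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from < THRESHOLD) {
            // Small enough: solve directly.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        invokeAll(left, right);             // fork both halves and wait for them
        return left.join() + right.join();  // combine the two results
    }

    static long sum(long[] data) {
        return POOL.invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(sum(data));
    }
}
```

A common variant forks only one half and computes the other in the current thread, which saves one task hand-off per level.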
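
Slide 8 lists three ways to hand a task to a ForkJoinPool. The sketch below runs the same task through all three to show how they differ from the caller's side. The class names SubmitStyles and Fib are hypothetical, and the naive Fibonacci task has no sequential threshold purely to keep the example short.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

class SubmitStyles {
    // Hypothetical task: naive Fibonacci, as in the expression on slide 5.
    static class Fib extends RecursiveTask<Integer> {
        private final int n;

        Fib(int n) {
            this.n = n;
        }

        @Override
        protected Integer compute() {
            if (n < 2) {
                return n;
            }
            Fib f1 = new Fib(n - 1);
            f1.fork();                                   // run fib(n - 1) asynchronously
            return new Fib(n - 2).compute() + f1.join(); // compute fib(n - 2) here, then join
        }
    }

    // Returns the result obtained via invoke, submit and execute, in that order.
    static int[] runAllThree(int n) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // invoke: blocks the caller and returns the result directly.
            int viaInvoke = pool.invoke(new Fib(n));

            // submit: returns immediately; the ForkJoinTask acts as a Future.
            ForkJoinTask<Integer> pending = pool.submit(new Fib(n));
            int viaSubmit = pending.join();

            // execute: returns nothing; keep a reference to the task to read its result.
            Fib task = new Fib(n);
            pool.execute(task);
            int viaExecute = task.join();

            return new int[] { viaInvoke, viaSubmit, viaExecute };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int result : runAllThree(20)) {
            System.out.println(result);
        }
    }
}
```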
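
Slide 22 uses the lambda syntax as it was proposed in 2011. For comparison, here is a sketch of the same method against the API that shipped in Java 8, where the arrow is written -> and the parallel entry point is parallelStream(). The slide elides how the Pattern is built, so the construction below (match any line containing the query literally) is an assumption, as is the class name LogScan. The return type is long because count() returns long.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LogScan {
    static long scanLog(List<String> lines, String query) {
        // Assumption: the slide does not show the pattern; this one matches
        // any line that contains the query as a literal substring.
        Pattern p = Pattern.compile(".*" + Pattern.quote(query) + ".*");
        return lines.parallelStream()                    // runs on the common ForkJoinPool
                    .filter(s -> p.matcher(s).matches())
                    .count();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("login ok", "login failed", "logout", "login failed");
        System.out.println(scanLog(lines, "failed"));
    }
}
```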

Editor's Notes

  1. Sander Mak, Info Support. Going to talk about Doug Lea's fork/join (in the works since 1998), now part of Java 7. Fun & Profit: profit because of faster apps; fun depends on your interests :)
  2. No choice but to churn on each task sequentially. Pentium 4, 3.4 GHz -> hitting physical limits! (approach: speed, smarter instructions on-chip). Threading alleviates I/O contention, not CPU contention.
  3. No choice but to churn on each task sequentially. Pentium 4, 3.4 GHz -> hitting physical limits! (approach: speed, smarter instructions on-chip). Threading alleviates I/O contention, not CPU contention.
  4. More cores, lower speed. Do nothing -> your app could be slower! Embrace parallelism. In webapps we get a lot of request-oriented parallelism for free. What about other applications/algorithms?
  5. CPU architectures changed all the time without having to change Java code... why is this different? -> Implicit data dependencies in code (also, shared memory/state problems).
  6. Builds a tree with explicit dependencies between tasks! Divide-and-conquer algorithms. Here: 6 fork actions, 6 join actions.
  7. Threshold is key: overhead vs. effective work per unit. Applicable to, for example: search, sort, aggregating data.
  8. ForkJoinTask is a Future.
  9. Default size comes from Runtime.getRuntime().availableProcessors(). Possible to set your own size (e.g. restrict to 4 of n cores).
  12. Forked tasks are pushed on top of the worker's local deque. An idle worker steals from the bottom of another worker's deque -> prevents contention on the deque, and the largest tasks get stolen! No work stolen? Eventually the worker thread yields. Typically 0-2% of tasks are stolen. Workstealing == load balancing without central coordination!
  13. Looked at Fibonacci at the beginning. All compute-intensive. Telling example: checking password hashes against a rainbow table.
  14. Still, threads are better when doing blocking I/O, networking etc.
  16. F/J: recursively apply same task. M/R: apply different map/reduce steps, not necessarily recursive (but could happen).
  18. WebSphere, for example, does not allow thread creation in servlets either. Possibly mention the JCA adapter.
  22. Needs lambdas! But already available in jsr166y with anonymous inner classes.
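
The note on slide 12 describes forked tasks landing on the worker's own deque and idle workers stealing from others. One way to observe this is ForkJoinPool.getStealCount(), which reports an estimate of how many tasks were taken from another worker's queue. The sketch below is an illustration only: the class names, the pool size of 4 and the SEQUENTIAL_BELOW cutoff are assumptions, and the printed steal count varies from run to run.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class StealCountDemo {
    // Hypothetical compute-heavy task: Fibonacci with a sequential cutoff.
    static class Fib extends RecursiveTask<Long> {
        static final int SEQUENTIAL_BELOW = 15; // assumed threshold
        private final int n;

        Fib(int n) {
            this.n = n;
        }

        static long sequential(int n) {
            return n < 2 ? n : sequential(n - 1) + sequential(n - 2);
        }

        @Override
        protected Long compute() {
            if (n < SEQUENTIAL_BELOW) {
                return sequential(n);
            }
            Fib big = new Fib(n - 1);
            big.fork();                            // pushed onto this worker's own deque
            long small = new Fib(n - 2).compute(); // keep working in this thread
            return big.join() + small;             // big may have been stolen in the meantime
        }
    }

    static long fibOnPool(ForkJoinPool pool, int n) {
        return pool.invoke(new Fib(n));
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(4);
        long result = fibOnPool(pool, 32);
        System.out.println("fib(32) = " + result);
        System.out.println("tasks stolen (estimate): " + pool.getStealCount());
        pool.shutdown();
    }
}
```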
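
The note on slide 9 mentions that the pool sizes itself from the number of available processors but can also be restricted. A small sketch of both options follows; the class name PoolSizing and the helper restrictedPool are hypothetical names introduced here.

```java
import java.util.concurrent.ForkJoinPool;

class PoolSizing {
    // Hypothetical helper: a pool that uses at most maxWorkers of the available cores.
    static ForkJoinPool restrictedPool(int maxWorkers) {
        int cores = Runtime.getRuntime().availableProcessors();
        return new ForkJoinPool(Math.min(maxWorkers, cores));
    }

    public static void main(String[] args) {
        // The no-argument constructor sizes the pool to the number of available processors.
        ForkJoinPool defaultPool = new ForkJoinPool();
        // Explicit parallelism, e.g. restrict to 4 of n cores.
        ForkJoinPool smallPool = restrictedPool(4);

        System.out.println("default parallelism:    " + defaultPool.getParallelism());
        System.out.println("restricted parallelism: " + smallPool.getParallelism());

        defaultPool.shutdown();
        smallPool.shutdown();
    }
}
```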