ForkJoin




@Sander_Mak
About
Coding at ( )
Writing: Blog @ branchandbound.net
Speaking
Agenda
ForkJoin   Setting the scene
           ForkJoin API & Patterns
           Comparisons & Future

           break

 Akka      Introduction
           Actors
           Async IO
Problem?
Architectural mismatch
Problem?
So let the compiler/runtime solve it!
if (n < 2) { return n; }
else { return fib(n - 1) + fib(n - 2); }
Unfortunately not in , but feasible in pure functional languages like
First attempt
	
  	
  	
  	
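public static int fib(final int n) {
    if (n < 2) { return n; }
    else {
        // One-element arrays so the anonymous classes can write their results
        final int[] t1 = {0}, t2 = {0};

        Thread thread1 = new Thread() {
            public void run() {
                t1[0] = fib(n - 1);
            } };
        Thread thread2 = new Thread() {
            public void run() {
                t2[0] = fib(n - 2);
            } };
        thread1.start(); thread2.start();
        try {
            thread1.join(); thread2.join();
        } catch (InterruptedException e) {
            // join() throws a checked exception; restore the flag and give up
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return t1[0] + t2[0];
    }
}

Compare with the sequential version:

if (n < 2) { return n; }
else { return fib(n - 1) + fib(n - 2); }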
Threads
Threads are mostly waiting
Improve with threadpooling
 but not by much
 starvation
What if we sum in a new thread?
 synchronization
 visibility issues (Java Memory Model)
ForkJoin
Fork: recursively decompose large task into subtasks
Join: await results of recursive tasks and combine

Example: Fib(4), result = 3

Fib(4)
  Fib(3)
    Fib(2)
      Fib(1)
      Fib(0)
    Fib(1)
  Fib(2)
    Fib(1)
    Fib(0)
ForkJoin
Introducing:
  java.util.concurrent.ForkJoinPool
  java.util.concurrent.ForkJoinTask

ForkJoinTask subclasses: RecursiveAction, RecursiveTask

ForkJoinPool:
  void execute(ForkJoinTask<?>)
  T    invoke(ForkJoinTask<T>)
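
As a minimal sketch of how these classes fit together, the Fibonacci example from the earlier slides can be written as a RecursiveTask (the ForkJoinTask variant that returns a value). The class name FibTask and its helper methods are made up for this illustration; only ForkJoinPool, RecursiveTask, fork(), join(), invoke() and execute() come from java.util.concurrent.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: Fibonacci as a value-returning ForkJoinTask.
class FibTask extends RecursiveTask<Integer> {
    private final int n;

    FibTask(int n) { this.n = n; }

    @Override
    protected Integer compute() {
        if (n < 2) { return n; }
        FibTask left = new FibTask(n - 1);
        FibTask right = new FibTask(n - 2);
        left.fork();                        // schedule fib(n - 1) asynchronously
        int rightResult = right.compute();  // compute fib(n - 2) in this thread
        return left.join() + rightResult;   // wait for the forked half, then combine
    }

    // invoke(): submit the task and block until its result is available.
    static int fibWithInvoke(int n) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new FibTask(n));
        } finally {
            pool.shutdown();
        }
    }

    // execute(): submit the task and return immediately; join() fetches the result later.
    static int fibWithExecute(int n) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            FibTask task = new FibTask(n);
            pool.execute(task);
            return task.join();
        } finally {
            pool.shutdown();
        }
    }
}
```

This version has no sequential cutoff, so it forks a task for every call; it is meant to show the API shape, not to be fast.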

	
  	
  
ForkJoin
compute(problem) {
  if (problem.size < threshold)
    directlySolve(problem)
  else {
    do-forked {
       leftResult  = compute(left(problem))
       rightResult = compute(right(problem))
    }
    join(leftResult, rightResult)
    return combine(leftResult, rightResult)
  }
}

Keep overhead down: sequential cutoff
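
The pseudocode above could translate to Java roughly as follows. This is a sketch under assumptions: the problem is summing a long[] array, and the class name SumTask and the THRESHOLD value of 1,000 are invented for the example, not taken from the talk.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: sum an array with a sequential cutoff.
class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000; // assumed cutoff; tune by experiment

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        // problem.size < threshold: solve directly, no forking overhead
        if (to - from < THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) { sum += data[i]; }
            return sum;
        }
        // otherwise split into a left and a right half
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                          // left half may run on another worker
        long rightResult = right.compute();   // right half runs in this thread
        long leftResult = left.join();        // join
        return leftResult + rightResult;      // combine
    }

    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```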
ForkJoinPool
Implements ExecutorService
Autosize workers (overridable)
Work queue per worker
 Work-stealing between queues
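
A small sketch of the first two points, assuming nothing beyond the JDK: the default constructor sizes the pool to the number of available processors, an explicit parallelism overrides that, and because the pool is an ExecutorService it also accepts plain Callable jobs. The class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

// Hypothetical example class showing pool sizing and ExecutorService usage.
class PoolConfigSketch {

    // Parallelism of a pool created with an explicit worker count.
    static int configuredParallelism(int requested) {
        ForkJoinPool pool = new ForkJoinPool(requested); // overrides the default sizing
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    // Parallelism of a pool created with the default constructor.
    static int defaultParallelism() {
        ForkJoinPool pool = new ForkJoinPool(); // sized to the available processors
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    // A ForkJoinPool can be used through the ExecutorService interface.
    static int answerViaExecutorService() {
        ExecutorService executor = new ForkJoinPool(2);
        try {
            Callable<Integer> job = () -> 6 * 7;
            Future<Integer> future = executor.submit(job);
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```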
ForkJoinPool
Work stealing
[Animation: task 1 forks subtasks 2 and 3 onto its worker's queue; an idle worker steals task 2. Tasks 3 and 2 fork subtasks 4, 5 and 6, 7; a third worker steals task 6, and forking continues with tasks 8 to 13.]
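
One way to observe work stealing from the outside is ForkJoinPool.getStealCount(), which returns an estimate of the number of tasks taken from another worker's queue. The sketch below is an illustration only: the task (counting the leaves of a binary recursion tree), the class names and the pool size of 4 are assumptions, and the printed steal count varies from run to run.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: run a forking task and report the pool's steal estimate.
class StealCountSketch {

    // Counts the leaves of a complete binary recursion tree of the given depth.
    static class CountTask extends RecursiveTask<Integer> {
        private final int depth;

        CountTask(int depth) { this.depth = depth; }

        @Override
        protected Integer compute() {
            if (depth == 0) { return 1; }
            CountTask left = new CountTask(depth - 1);
            CountTask right = new CountTask(depth - 1);
            left.fork();                  // queued; an idle worker may steal it
            int rightResult = right.compute();
            return left.join() + rightResult;
        }
    }

    static int countLeaves(int depth) {
        ForkJoinPool pool = new ForkJoinPool(4); // assumed worker count
        try {
            int leaves = pool.invoke(new CountTask(depth));
            // getStealCount() is an estimate and differs between runs
            System.out.println("Estimated steals: " + pool.getStealCount());
            return leaves;
        } finally {
            pool.shutdown();
        }
    }
}
```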
ForkJoinPool
Scalability
Patterns
#1 Problem structure

Acyclic, CPU bound
- I/O upfront
- No webcrawlers please...
Patterns
#2 Sequential cutoff

Guidelines
> 100 and < 10,000 'basic computational steps'
Experiment and tune
Never lock/synchronize

compute(problem) {
  if (problem.size < threshold)
    directlySolve(problem)
  else {
    do-forked {
       ...
    }
    join ...
  }
}
Patterns
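#3 Fork once, fool me twice

Why?
Implementation specific
Avoids overhead
Especially on smallish tasks

// Fork one half, compute the other half in the current thread, then join
left.fork()
rightResult = right.compute()
leftResult = left.join()
return leftResult + rightResult

// Joining right after the fork: the right half starts only once the left half is done
left.fork()
leftResult = left.join()
rightResult = right.compute()
return leftResult + rightResult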
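
Both orderings above give the same result; they differ only in how much work can overlap. The following sketch makes that concrete for a hypothetical task that finds the maximum of an int array. MaxTask, its THRESHOLD of 500 and the joinTooEarly flag are all invented for the illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: the same task with both fork/join orderings.
class MaxTask extends RecursiveTask<Integer> {
    static final int THRESHOLD = 500; // assumed sequential cutoff

    private final int[] data;
    private final int from;              // inclusive
    private final int to;                // exclusive
    private final boolean joinTooEarly;  // true selects the slower ordering

    MaxTask(int[] data, int from, int to, boolean joinTooEarly) {
        this.data = data;
        this.from = from;
        this.to = to;
        this.joinTooEarly = joinTooEarly;
    }

    @Override
    protected Integer compute() {
        if (to - from <= THRESHOLD) {
            int max = Integer.MIN_VALUE;
            for (int i = from; i < to; i++) { max = Math.max(max, data[i]); }
            return max;
        }
        int mid = (from + to) >>> 1;
        MaxTask left = new MaxTask(data, from, mid, joinTooEarly);
        MaxTask right = new MaxTask(data, mid, to, joinTooEarly);
        left.fork();
        int leftResult;
        int rightResult;
        if (joinTooEarly) {
            leftResult = left.join();        // left finishes before right even starts
            rightResult = right.compute();
        } else {
            rightResult = right.compute();   // right runs while left is in flight
            leftResult = left.join();
        }
        return Math.max(leftResult, rightResult);
    }

    static int max(int[] data, boolean joinTooEarly) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new MaxTask(data, 0, data.length, joinTooEarly));
        } finally {
            pool.shutdown();
        }
    }
}
```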
Patterns
#4 Use convenience methods

invoke
fjTask.invoke();
// Equivalent to
fjTask.fork();
fjTask.join();
// But always tries execution
// in current thread first!

invokeAll
invokeAll(fjTask1, fjTask2,
   fjTaskN);
// Or:
invokeAll(collectionOfTasks);
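
A short sketch of invokeAll in use, assuming a hypothetical RecursiveAction that squares every element of an array in place. invokeAll forks and joins both halves in one call, so the task body does not need explicit fork() and join() calls. SquareAction and its THRESHOLD are made up for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example: in-place update with RecursiveAction and invokeAll.
class SquareAction extends RecursiveAction {
    static final int THRESHOLD = 1_000; // assumed sequential cutoff

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SquareAction(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            for (int i = from; i < to; i++) { data[i] = data[i] * data[i]; }
            return;
        }
        int mid = (from + to) >>> 1;
        // Runs both halves and returns when both have completed
        invokeAll(new SquareAction(data, from, mid),
                  new SquareAction(data, mid, to));
    }

    static void squareAll(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new SquareAction(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

The two halves touch disjoint index ranges, so no locking is needed.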
Demo
2,797,245 world cities (Maxmind.com)
Demo 1: simple search
Demo 2: lat/long bounded
Search range: i.e. 0.4°
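
The demo code itself is not part of these slides (see the link on the last slide). As a rough idea of what a simple search could look like, here is a sketch that counts the entries of a list matching a regular expression. Everything in it is an assumption: the class name CitySearchTask, the THRESHOLD, and the representation of cities as plain strings.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.regex.Pattern;

// Hypothetical sketch of a simple parallel search; not the actual demo code.
class CitySearchTask extends RecursiveTask<Integer> {
    static final int THRESHOLD = 100; // assumed sequential cutoff

    private final List<String> cities;
    private final Pattern pattern; // Pattern is immutable, safe to share between tasks
    private final int from;        // inclusive
    private final int to;          // exclusive

    CitySearchTask(List<String> cities, Pattern pattern, int from, int to) {
        this.cities = cities;
        this.pattern = pattern;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Integer compute() {
        if (to - from <= THRESHOLD) {
            int matches = 0;
            for (int i = from; i < to; i++) {
                // A fresh Matcher per element; Matcher instances are not thread-safe
                if (pattern.matcher(cities.get(i)).matches()) { matches++; }
            }
            return matches;
        }
        int mid = (from + to) >>> 1;
        CitySearchTask left = new CitySearchTask(cities, pattern, from, mid);
        CitySearchTask right = new CitySearchTask(cities, pattern, mid, to);
        left.fork();
        int rightResult = right.compute();
        return left.join() + rightResult;
    }

    static int countMatches(List<String> cities, String regex) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(
                new CitySearchTask(cities, Pattern.compile(regex), 0, cities.size()));
        } finally {
            pool.shutdown();
        }
    }
}
```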
ForkJoin & Java EE
   ForkJoinPool creates threads
    Illegal in EJBs
    CDI/Servlet is a gray area
   JCA/WorkManager could work
   @Asynchronous as alternative

But: Java EE 7 may contain javax.enterprise.concurrent (JSR 236)
Comparison
                  ExecutorService

Thread pooling (bounded or unbounded)
Single work queue (no workstealing)
Coarse-grained independent tasks
Blocking I/O ok
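
For contrast with the fork/join examples, this is a sketch of the kind of job a plain ExecutorService handles well: a handful of coarse-grained tasks that do not depend on each other, all fed from one shared queue. The class name, the pool size of 4 and the string-length workload are assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: independent coarse-grained jobs on a fixed thread pool.
class ExecutorComparisonSketch {

    static int totalLength(List<String> documents) {
        ExecutorService executor = Executors.newFixedThreadPool(4); // bounded pool, single queue
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (final String doc : documents) {
                jobs.add(() -> doc.length()); // each job stands alone; no forking, no stealing
            }
            int total = 0;
            // invokeAll blocks until every job has completed
            for (Future<Integer> result : executor.invokeAll(jobs)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```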
Comparison
ForkJoin vs. MapReduce

                     ForkJoin            MapReduce
Environment          Single JVM          Cluster
Model                Recursive forking   Often single map
Scales with          Cores/CPUs          Nodes
Worker interaction   Workstealing        No inter-node communication
Criticism
Complex implementation (uses sun.misc.Unsafe)
Scalability > 100 cores?
Questionable assumption: 1-1 mapping worker thread to core
Too low-level
Future
InfoQ: “What is supported out of the box?”

“Almost nothing. We chickened out; we are not going to release the layers on top of this. That means that right now, people who are using this framework are going to be the people who actually get into this parallel recursive decomposition and know how to use it.”
Future
JDK 8 plans

   Parallel collections
     Depends on Project Lambda
   CountedCompleter for I/O



Some already available in jsr166extra
Future
int findCities(List<String> cities, String query) {
   Pattern p = Pattern.compile(query);
   return cities.parallel()
                .filter(c => p.matcher(c).matches());
}

int findNearestCities(List<String> lines, int lat, int lng) {
   return lines.parallel()
               .map(c => toCity(c))
               .filter(c => c.isNear(lat, lng))
               .sort();
}
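
The snippets above use the syntax that was being discussed before JDK 8 was final. In the API that eventually shipped, lambdas use -> and a collection exposes parallelStream(); parallel streams run on the common ForkJoinPool by default. The sketch below shows how the first example might look in released Java 8. The class name and the choice to return a count or a sorted list are assumptions, since the slide does not say what the int result stands for.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical rewrite of the slide's first example against the released Java 8 API.
class ParallelStreamSketch {

    // Number of cities whose name matches the regular expression.
    static long countCities(List<String> cities, String query) {
        Pattern p = Pattern.compile(query);
        return cities.parallelStream()
                     .filter(c -> p.matcher(c).matches())
                     .count();
    }

    // Matching cities in natural (alphabetical) order.
    static List<String> findCitiesSorted(List<String> cities, String query) {
        Pattern p = Pattern.compile(query);
        return cities.parallelStream()
                     .filter(c -> p.matcher(c).matches())
                     .sorted()
                     .collect(Collectors.toList());
    }
}
```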
Questions?


        Code @ bit.ly/bejug-fj

Fork Join (BeJUG 2012)
