ParaForming - Patterns and Refactoring for Parallel Programming
Despite Moore's "law", uniprocessor clock speeds have now stalled. Rather than single processors running at ever higher clock speeds, it is
common to find dual-, quad- or even hexa-core processors, even in consumer laptops and desktops.
Future hardware will not be slightly parallel, however, as in today's multicore systems, but will be
massively parallel, with manycore and perhaps even megacore systems
becoming mainstream.
This means that programmers need to start thinking parallel. To achieve this they must move away
from traditional programming models where parallelism is a
bolted-on afterthought. Rather, programmers must use languages where parallelism is deeply embedded into the programming model
from the outset.

By providing a high-level model of computation, without explicit ordering of computations,
declarative languages in general, and functional languages in particular, offer many advantages for parallel programming.
One of the most fundamental advantages of the functional paradigm is purity.
In a purely functional language, as exemplified by Haskell, there are simply no side effects: it is therefore impossible for parallel computations to conflict with each
other in ways that are not well understood.
ParaForming aims to radically improve the process
of parallelising purely functional programs through a comprehensive set of high-level parallel refactoring patterns for Parallel Haskell,
supported by advanced refactoring tools.
By matching parallel design patterns with appropriate algorithmic skeletons
using advanced software refactoring techniques and novel cost information, we will bridge the gap between fully automatic
and fully explicit approaches to parallelisation, helping programmers "think parallel" in a systematic,
guided way. This talk introduces the ParaForming approach, gives some examples and shows how
effective parallel programs can be developed using advanced refactoring technology.
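The abstract's central claim is that purity makes parallelisation safe: a refactoring that adds parallelism cannot change a pure program's result. As a minimal illustrative sketch in GHC Haskell (assuming the `parallel` package; `fib`, `sumFibsSeq` and `sumFibsPar` are hypothetical names, not ParaForming's own code), the only change between the sequential and parallel versions is the evaluation strategy:

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- A pure function: no side effects, so evaluating calls
-- in parallel cannot interfere with anything.
fib :: Integer -> Integer
fib n | n < 2     = n
      | otherwise = fib (n - 1) + fib (n - 2)

-- Sequential version ...
sumFibsSeq :: [Integer] -> Integer
sumFibsSeq ns = sum (map fib ns)

-- ... and the parallel "refactoring": only map is replaced by
-- parMap; the meaning of the program is unchanged.
sumFibsPar :: [Integer] -> Integer
sumFibsPar ns = sum (parMap rdeepseq fib ns)

main :: IO ()
main = print (sumFibsPar [20 .. 28] == sumFibsSeq [20 .. 28])
```

Compiled with `-threaded` and run with `+RTS -N`, the `parMap` version evaluates the calls to `fib` on multiple cores while provably returning the same sum.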



  • 1. ParaForming: Forming Parallel (Functional) Programs from High-Level Patterns using Advanced Refactoring. Kevin Hammond, Chris Brown, Vladimir Janjic, University of St Andrews, Scotland. Build Stuff, Vilnius, Lithuania, December 10 2013. T: @paraphrase_fp7, @khstandrews
  • 2. The Present: Pound versus Dollar
  • 3. The Future: "megacore" computers? Hundreds of thousands, or millions, of (small) cores. [figure: a chip drawn as a large grid of "Core" cells]
  • 4. What will "megacore" computers look like? Probably not just scaled versions of today's multicore: perhaps hundreds of dedicated lightweight integer units; hundreds of floating-point units (enhanced GPU designs); a few heavyweight general-purpose cores; some specialised units for graphics, authentication, network etc; possibly soft cores (FPGAs etc); highly heterogeneous.
  • 5. What will "megacore" computers look like? Probably not uniform shared memory: NUMA is likely, even hardware distributed shared memory, or even message-passing systems on a chip; shared memory will not be a good abstraction. int arr[x][y];
  • 6. Laki (NEC Nehalem Cluster) and hermit (XE6). Laki: 700 dual-socket Xeon 5560 2.8GHz ("Gainestown") nodes; 12 GB DDR3 RAM per node; nodes with 32GB and 64GB memory reflecting different user needs; Infiniband (QDR); 32 nodes with additional Nvidia Tesla S1070; Scientific Linux 6.0. hermit (phase 1 step 1): each compute node has 2 sockets AMD Interlagos @ 2.3GHz with 16 cores each, leading to 113,664 cores; 96 service nodes and 3552 compute nodes; 38 racks with 96 nodes each; 2.7PB storage capacity @ 150GB/s IO bandwidth; external access, pre-/postprocessing and remote visualization nodes. [HLRS in ParaPhrase, Turin, 4th/5th October 2011]
  • 7. The Biggest Computer in the World. Tianhe-2, Chinese National University of Defence Technology: 33.86 petaflops (June 17, 2013); 16,000 nodes, each with 2 Ivy Bridge multicores and 3 Xeon Phis; 3,120,000 x86 cores in total!!!
  • 8. It's not just about large systems. Even mobile phones are multicore: the Samsung Exynos 5 Octa has 8 cores, 4 of which are "dark". Performance/energy tradeoffs mean systems will be increasingly parallel. If we don't solve the multicore challenge, then no other advances will matter! ALL future programming will be parallel!
  • 9. The Manycore Challenge. "Ultimately, developers should start thinking about tens, hundreds, and thousands of cores now in their algorithmic development and deployment pipeline." — Anwar Ghuloum, Principal Engineer, Intel Microprocessor Technology Lab. The ONLY important challenge in Computer Science (Intel); also recognised as thematic priorities by EU and national funding bodies. "The dilemma is that a large percentage of mission-critical enterprise applications will not ``automagically'' run faster on multi-core servers. In fact, many will actually run slower. We must make it as easy as possible for applications programmers to exploit the latest developments in multi-core/many-core architectures, while still making it easy to target future (and perhaps unanticipated) hardware developments." — Patrick Leonard, Vice President for Product Development, Rogue Wave Software
  • 10. But doesn't that mean millions of threads on a megacore machine??
  • 11. How to build a wall (with apologies to Ian Watson, Univ. Manchester)
  • 12. How to build a wall faster
  • 13. How NOT to build a wall. Typical CONCURRENCY approaches require the programmer to solve these. Task identification is not the only problem: must also consider coordination, communication, placement, scheduling, …
  • 14. We need structure. We need abstraction. We don't need another brick in the wall.
  • 15. Thinking Parallel. Fundamentally, programmers must learn to "think parallel": this requires new high-level programming constructs, perhaps dealing with hundreds of millions of threads. You cannot program effectively while worrying about deadlocks etc.: they must be eliminated from the design! You cannot program effectively while fiddling with communication etc.: this needs to be packaged/abstracted! You cannot program effectively without performance information: this needs to be included as part of the design!
  • 16. A Solution? "The only thing that works for parallelism is functional programming" — Bob Harper, Carnegie Mellon University
  • 17. Parallel Functional Programming. No explicit ordering of expressions. Purity means no side effects: impossible for parallel processes to interfere with each other. Can debug sequentially but run in parallel: enormous saving in effort; programmers concentrate on solving the problem, not porting a sequential algorithm into an (ill-defined) parallel domain. No locks, deadlocks or race conditions!! Huge productivity gains!
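The "debug sequentially but run in parallel" point can be made concrete with GHC's basic parallelism combinators (a minimal sketch assuming the `parallel` package; `parFibPair` is a hypothetical name, not code from the talk): `par` sparks one argument for parallel evaluation, `pseq` forces the other, and purity guarantees the answer matches the sequential one.

```haskell
import Control.Parallel (par, pseq)

fib :: Integer -> Integer
fib n | n < 2     = n
      | otherwise = fib (n - 1) + fib (n - 2)

-- Spark the evaluation of x in parallel with y, force y,
-- then combine. Because fib is pure, the parallel result is
-- guaranteed to equal the sequential one.
parFibPair :: Integer -> Integer -> Integer
parFibPair n m =
  let x = fib n
      y = fib m
  in  x `par` (y `pseq` x + y)

main :: IO ()
main = print (parFibPair 25 24 == fib 25 + fib 24)
```

Run single-threaded for debugging, or with `+RTS -N` for parallel execution; the observable behaviour is identical either way.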
  • 18. ParaPhrase Project: Parallel Patterns for Heterogeneous Multicore Systems (ICT-288570), 2011-2014, €4.2M budget. 13 partners, 8 European countries: UK, Italy, Germany, Austria, Ireland, Hungary, Poland, Israel. Coordinated by Kevin Hammond, St Andrews.
  • 19. The ParaPhrase Approach. Start bottom-up: identify (strongly hygienic) COMPONENTS using semi-automated refactoring (both legacy and new programs). Think about the PATTERN of parallelism, e.g. map(reduce), task farm, parallel search, parallel completion, ... STRUCTURE the components into a parallel program: turn the patterns into concrete (skeleton) code. Take performance, energy etc. into account (multi-objective optimisation), also using refactoring. RESTRUCTURE if necessary! (also using refactoring)
  • 20. Some Common Patterns. High-level abstract patterns of common parallel algorithms. Google map-reduce combines two of these! Generally, we need to nest/combine patterns in arbitrary ways.
  • 21. The Skel Library for Erlang. Skeletons implement specific parallel patterns; pluggable templates. Skel is a new (AND ONLY!) skeleton library in Erlang: map, farm, reduce, pipeline, feedback; instantiated using skel:run; fully nestable (https://…). A DSL for parallelism:
    OutputItems = skel:run(Skeleton, InputItems).
  • 22. The Parallel Pipeline Skeleton. Each stage of the pipeline can be executed in parallel. The input and output are streams. [diagram: a stream Tn … T1 flows through Skel1, Skel2, …, SkelN]
    {pipe, [Skel1, Skel2, ..., SkelN]}
    skel:run([{pipe, [Skel1, Skel2, ..., SkelN]}], Inputs).
    Inc = {seq, fun(X) -> X + 1 end},
    Double = {seq, fun(X) -> X * 2 end},
    skel:run({pipe, [Inc, Double]}, [1,2,3,4,5,6]).
  • 23. The Farm Skeleton. Each worker is executed in parallel. A bit like a 1-stage pipeline. [diagram: a stream Tn … T1 fans out over workers Skel1 … SkelM]
    {farm, Skel, M}
    skel:do([{farm, Skel, M}], Inputs).
  • 24. Using the Right Pattern Matters. [figure: speedups for matrix multiplication on up to 24 cores, comparing Naive Parallel, Farm, and Farm with Chunk 16]
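The slide's point is granularity: sparking one task per element drowns the gains in overhead, while chunking amortises it. A minimal Haskell Strategies sketch of the same idea (not the talk's Erlang Skel code; assumes the `parallel` package, and `expensive` is a hypothetical stand-in for real per-element work):

```haskell
import Control.Parallel.Strategies
  (withStrategy, parList, parListChunk, rdeepseq)

-- Stand-in for genuinely expensive per-element work.
expensive :: Int -> Int
expensive x = x * x

-- Naive: one spark per element (fine-grained, high overhead).
naive :: [Int] -> [Int]
naive = withStrategy (parList rdeepseq) . map expensive

-- Chunked "farm": one spark per 16-element chunk, amortising
-- the per-spark overhead, much like "Farm with Chunk 16".
chunked :: [Int] -> [Int]
chunked = withStrategy (parListChunk 16 rdeepseq) . map expensive

main :: IO ()
main = print (naive [1 .. 100] == chunked [1 .. 100])
```

Both versions compute the same list; only the work partitioning differs, which is exactly what the matrix-multiplication speedup comparison on this slide demonstrates.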
  • 25. The ParaPhrase Approach. [diagram: sequential code (Erlang, C/C++, Java, Haskell, …) is refactored, using a generic pattern library plus costing/profiling information, into parallel code (Erlang, C/C++, Java, Haskell, …) that targets a heterogeneous machine: AMD Opteron and Intel Core CPUs, Nvidia Tesla and Intel GPUs, Intel Xeon Phi, connected by Mellanox Infiniband]
  • 26. Refactoring. Refactoring changes the structure of the source code: using well-defined rules; semi-automatically, under programmer guidance.
  • 27. Refactoring: Farm Introduction. Figure 3.3: Some Standard Skeleton Equivalences:
    S1; S2 ≡ Pipe(S1, S2) (pipe seq)
    Map(S1; S2, d, r) ≡ Map(S1, d, r); Map(S2, d, r) (map fission/fusion)
    S ≡ Farm(S) (farm intro/elim)
    Map(F, d, r) ≡ Pipe(Decomp(d), Farm(F), Recomp(r)) (data2stream)
    S1 ≡ Map(S1′, d, r) (map intro/elim)
    The following describes each of the patterns in turn: a MAP is made up of three OPERATIONs (a worker, a partitioner, and a combiner), followed by an INPUT; a SEQ is made up of a single OPERATION denoting the sequential computation to be performed, followed by an INPUT; a FARM is made up of a single OPERATION denoting the worker, an INPUT …
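Two of these equivalences can be sanity-checked in a toy Haskell model (all names here — `Skel`, `seqS`, `pipeS`, `farmS` — are hypothetical, not the ParaPhrase tools' API): skeletons are modelled as list transformers, so farm introduction is semantically the identity, and a pipe of two seq stages equals a single map of the composed function.

```haskell
-- Toy semantic model: a skeleton transforms a stream (list).
type Skel a b = [a] -> [b]

-- seq: apply a sequential computation to every stream item.
seqS :: (a -> b) -> Skel a b
seqS = map

-- pipe: feed the output stream of s1 into s2.
pipeS :: Skel a b -> Skel b c -> Skel a c
pipeS s1 s2 = s2 . s1

-- farm intro/elim: replicating a skeleton over workers changes
-- only how it is evaluated, not what it computes.
farmS :: Int -> Skel a b -> Skel a b
farmS _nworkers s = s

main :: IO ()
main = do
  let inc    = seqS (+ 1) :: Skel Int Int
      double = seqS (* 2)
  -- pipe of two seq stages == map of the composed function
  print (pipeS inc double [1 .. 6] == seqS ((* 2) . (+ 1)) [1 .. 6])
  -- S == Farm(S)
  print (farmS 4 inc [1 .. 6] == inc [1 .. 6])
```

This is only a semantic check; the real refactorings additionally carry the cost information that decides *whether* introducing a farm or splitting a map pays off.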
  • 28. Image Processing Example. Read Image 1 and Read Image 2 feed "white screening" and Merge Images, followed by Write Image.
  • 29. Basic Erlang Structure.
    [ writeImage(convertImage(readImage(X))) || X <- images() ]

    readImage({In1, In2, Out}) ->
        …
        {Image1, Image2, Out}.

    convertImage({Image1, Image2, Out}) ->
        Image1P = whiteScreen(Image1),
        Image2P = mergeImages(Image1P, Image2),
        {Image2P, Out}.

    writeImage({Image, Out}) -> …
  • 30. Refactoring Demo
  • 31. Refactoring Demo
  • 32. Speedup Results (Image Processing). [figure: speedups for the Haar transform with a Skel task farm on up to 24 farm workers: 1D task farm, 1D task farm with chunk size 4, and 2D task farm]
  • 33. Large-Scale Demonstrator Applications. ParaPhrase tools are being used by commercial/end-user partners: SCCH (SME, Austria); Erlang Solutions Ltd (SME, UK); Mellanox (Israel); ELTE-Soft, Hungary (SME); AGH (University, Poland); HLRS (High Performance Computing Centre, Germany).
  • 34. Speedup Results (demonstrators). [figure: speedups for Ant Colony, BasicN2 and Graphical Lasso on up to 24 workers, refactored versions vs manual versions] Speedup close to or better than manual optimisation.
  • 35. Bowtie2: most widely used DNA alignment tool. [figures: speedup vs read length and vs quality for the ParaPhrase version (Bt2FF-pin+int) against the original Bowtie2 (Bt2)] C. Misale. Accelerating Bowtie2 with a lock-less concurrency approach and memory affinity. IEEE PDP 2014. To appear.
  • 36. Comparison of Development Times.
    Use case        | Manual time | Refactoring time | LOC introduced
    Convolution     | 3 days      | 3 hours          | 58
    Ant Colony      | 1 day       | 1 hour           | 32
    BasicN2         | 5 days      | 5 hours          | 40
    Graphical Lasso | 15 hours    | 2 hours          | 53
    (Figure 3: approximate manual implementation time of use-cases vs. refactoring time, with lines of code introduced by the refactoring tool.)
  • 37. Heterogeneous Parallel Programming. [diagram of the tool-chain: 1. Identify the application's initial structure (using profile information); 2. Enumerate skeleton configurations; 3. Filter using the cost model; 4. Apply MCTS; 5. Choose the optimal mapping/configuration; 6. Refactor the application (e.g. Farm1 = Farm(f, 8, 2); Pipe(farm1, GPU(g));); 7. Execute on the heterogeneous machine (CPU and GPU components)] [RGU / USTAN]
  • 38. Example: Enumerate Skeleton Configurations for Image Convolution. r: read image file; p: process image file. Candidate configurations include: r; p, r ‖ p, Δ(r); p, r; Δ(p), Δ(r); Δ(p), Δ(r ‖ p), r ‖ Δ(p), and Δ(r; p).
  • 39. Results on Benchmark: Image Convolution. MCTS mapping (C, G): (6, 0) ‖ (0, 3). Speedup 39.12. Best speedup: 40.91.
  • 40. Conclusions. The manycore revolution is upon us. Computer hardware is changing very rapidly (more than in the last 50 years). The megacore era is here (aka exascale, BIG data); heterogeneity and energy are both important. Most programming models are too low-level: concurrency-based; need to expose mass parallelism. Patterns and functional programming help with abstraction: millions of threads, easily controlled.
  • 41. Conclusions (2). Functional programming makes it easy to introduce parallelism: no side effects means any computation could be parallel; matches pattern-based parallelism; much detail can be abstracted. Lots of problems can be avoided, e.g. freedom from deadlock; parallel programs give the same results as sequential ones! Automation is very important: refactoring dramatically reduces development time (while keeping the programmer in the loop); machine learning is very promising for determining complex performance settings.
  • 42. But isn't this all just wishful thinking? Rampant-Lambda-Men in St Andrews
  • 43. NO! C++11 has lambda functions (and some other nice functional-inspired features). Java 8 will have lambda (closures). Apple uses closures in Grand Central Dispatch.
  • 44. ParaPhrase Parallel C++ Refactoring. Integrated into Eclipse. Supports the full C++(11) standard. Uses strongly hygienic components: functional encapsulation (closures).
  • 45. Image Convolution.
    Step 1: Introduce Components:
    Component<ff_im> genStage(generate);
    Component<ff_im> filterStage(filter);
    for (int i = 0; i < NIMGS; i++) {
      r1 = genStage.callWorker(new ff_im(images[i]));
      results[i] = filterStage.callWorker(new ff_im(r1));
    }
    Step 2: Introduce Pipeline:
    ff_pipeline pipe;
    StreamGen streamgen(NIMGS, images);
    pipe.add_stage(streamgen);
    pipe.add_stage(new genStage);
    pipe.add_stage(new filterStage);
    pipe.run_and_wait_end();
    Step 3: Introduce Farm:
    ff_farm<> gen_farm;
    gen_farm.add_collector(NULL);
    std::vector<ff_node*> gw;
    for (int i = 0; i < nworkers; i++)
      gw.push_back(new gen_stage);
    gen_farm.add_workers(gw);
    ff_pipeline pipe;
    StreamGen streamgen(NIMGS, images);
    pipe.add_stage(streamgen);
    pipe.add_stage(gen_farm);
    pipe.add_stage(new filterStage);
    pipe.run_and_wait_end();
    Step 4: Introduce (second) Farm:
    ff_farm<> filter_farm;
    filter_farm.add_collector(NULL);
    std::vector<ff_node*> gw2;
    for (int i = 0; i < nworkers2; i++)
      gw2.push_back(new CPU_Stage);
    filter_farm.add_workers(gw2);
    StreamGen streamgen(NIMGS, images);
    ff_pipeline pipe;
    pipe.add_stage(streamgen);
    pipe.add_stage(gen_farm);
    pipe.add_stage(filter_farm);
    pipe.run_and_wait_end();
  • 46. Refactoring C++ in Eclipse
  • 47. Funded by: ParaPhrase (EU FP7), Patterns for heterogeneous multicore, €4.2M, 2011-2014. SCIEnce (EU FP6), Grid/Cloud/Multicore coordination, €3.2M, 2005-2012. Advance (EU FP7), Multicore streaming, €2.7M, 2010-2013. HPC-GAP (EPSRC), Legacy system on thousands of cores, £1.6M, 2010-2014. Islay (EPSRC), Real-time FPGA streaming implementation, £1.4M, 2008-2011. TACLE: European Cost Action on Timing Analysis, €300K, 2012-2015.
  • 48. Some of our Industrial Connections. Mellanox Inc.; Erlang Solutions Ltd; SAP GmbH, Karlsruhe; BAe Systems; Selex Galileo; BioId GmbH, Stuttgart; Philips Healthcare; Software Competence Centre, Hagenberg; Microsoft Research; Well-Typed LLC.
  • 49. ParaPhrase Needs You! Please join our mailing list and help grow our user community: news items; access to free development software; chat to the developers; free developer workshops; bug tracking and fixing; tools for both Erlang and C++. Subscribe at https://…/listinfo/paraphrase-news. We're also looking for open source developers... We also have 8 PhD studentships...
  • 50. Further Reading.
    Chris Brown, Vladimir Janjic, Kevin Hammond, Mehdi Goli and John McCall. "Bridging the Divide: Intelligent Mapping for the Heterogeneous Parallel Programmer". Submitted to IPDPS 2014.
    Chris Brown, Marco Danelutto, Kevin Hammond, Peter Kilpatrick and Sam Elliot. "Cost-Directed Refactoring for Parallel Erlang Programs". To appear in International Journal of Parallel Programming, 2013.
    Vladimir Janjic, Chris Brown, Max Neunhoffer, Kevin Hammond, Steve Linton and Hans-Wolfgang Loidl. "Space Exploration using Parallel Orbits". Proc. PARCO 2013: International Conf. on Parallel Computing, Munich, Sept. 2013.
    Chris Brown, Hans-Wolfgang Loidl and Kevin Hammond. "ParaForming: Forming Parallel Haskell Programs using Novel Refactoring Techniques". Proc. 2011 Trends in Functional Programming (TFP), Madrid, Spain, May 2011.
    Henrique Ferreiro, David Castro, Vladimir Janjic and Kevin Hammond. "Repeating History: Execution Replay for Parallel Haskell Programs". Proc. 2012 Trends in Functional Programming (TFP), St Andrews, UK, June 2012.
    Ask me for copies! Many technical results are also on the project web site: free for download!
  • 51. In Preparation
  • 52. THANK YOU!!! @paraphrase_fp7