Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Application parallelisation Android - Klaas Vangend

726 views

Published on

Application parallelisation Android - Klaas Vangend

Published in: Devices & Hardware
  • Be the first to comment

  • Be the first to like this

Application parallelisation Android - Klaas Vangend

  1. 1. Appiication paralielization for muiti-core Android devices Klaas van Gend k| aas@vectorfabrics. com LinuxCon I ELC Europe Nov. 6, 2012, Barcelona, Spain _ _ A V “'7, , ,, _ , _, Y/ ‘filtffil ii ‘xiii 1! ii}.
  2. 2. Let me introduce myself + Vector Fabrics Name: Klaas van Gend f “ Founded: 1973 History I Claim to fame: - Started programming C at 14 . - First experience with Linux in 1993 - Co—founder of ELC—Europe -——-3 1— Name: Vector Fabrics Founded:2007 Claim to fame: 2012: Pareon tool - analyse & optimize for multicore - 2 | NoV_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 <, </ Vector Fabrics
  3. 3. Multicore ARM and Android conquer the world G009“? NGXUS 10 Asus Transformer Prime H-I-C J Butterfly 2'C0Ve 33mSU“9 A15 “4”-core Nvidia Tegra3 4_Core Qualcomm Samsung Galaxy Slll Sony Experia P Huawei Honor2 4-core Samsung 2-core sT. Erics5on 4-core Huawei 3 | NoV_ 6, 2012 LinuxCon Europe I ELC—Europe 2012 / ,<>Vect0r Fabrics
  4. 4. M ! :!ai; ifi—-«*: ~i*~lf'e l. t€v; r?ai; ’jiF* ii’! aiv: ‘ai~:1*~i*i-'! ie“= 2 core processors: Assume the OS has multiple processes andlor kernel threads to occupy the two cores. Easy! 4 core processors (and beyond): Requires multi—threaded applications Hard! , To obtain sufficient concurrent workload : To obtain top user experience Who makes such applications? ? 4 | Nov_ 6, 2012 LinuxCon Europe I ELC—Europe 2012 V/ Vector Fabrics
  5. 5. Creating parallel programs is hard. .. Herb Sutter, chair of the ISO C++ standards committee, Microsoft: “Everybody who learns concurrency thinks they understand it, ends up finding mysterious races they thought weren’t possible, and discovers that they didn’t actually understand it yet after all” Steve Jobs, Apple: “The way the processor industry is going, is to add more and more cores, but nobody g knows how to program those things. I mean, two yeah; four not really; eight, forget it. ” 5 | Nov_ 6, 2012 LinuxCon Europe I ELC—Europe 2012 / ,‘>Vect0r Fabrics
  6. 6. Presentation index . Introduction . Multi-threaded concurrency: Data- versus Task-partitioning . Parallelization with dependencies: Reduction expressions or Streaming . Multi-threading: difficult. .. 1 . Android: help from Pareon and Perf . Conclusion 6 | Nov_ 6, 2012 LinuxCon Europe I ELC—Europe 2012 / ,<$/ Vector Fabrics
  7. 7. 1! , I"':4r. JI C I :4 ~51! C 2 I "H C72 L! ,I*‘_If’§-I‘, III' C W Basic fork-join pattern, created through different higher—| eve| programming constructs Main program thread Concurrent computation threads Creation of threads is application responsibility. ‘V; Operating System handles run-time scheduling l ii3'1"- across available processors! Main thread continues 7 | Nov_ 6, 2012 LinuxCon Europe I ELC—Europe 2012 W/ Vector Fabrics
  8. 8. .F”L’«'1_l. _I"5I_ , l?4,lfE1ifi. I:lfl! ; -Jr! /~ : i“rtr'I; » ; .PT. I.. ,I‘lfI; lfl*“; T', IIIII’_ILCI *i§>'I, iI‘lfI*‘ijI'_I‘_t~TI Source code: Sequential execution order: for (i=0; i<4; i++) { A<o> , <1) , (2) , <3) A(i); B(0)/ B(1)/ ’ B(2»' 13(3) 3(1); c<, o$ c(,1’> c(,2’) C(3) C(i); } Data partitioning: i Task partitioning: I 541'"! i i Mo) (1) <2) (3) B(0) B(1) B(2) B(3) T C(0) C(1) C(2) C(3) l . icl'III I 8 | Nov_ 6, 2012 LinuxCon Europe I ELC—Europe 2012 W/ Vector Fabrics
  9. 9. Issue: Data dependencies & (3) 13(0) 13(1) B(2) B(3) c(o) c(1) C(2) C(3) Maybe, B (i) produces a value that is used by A(i+1) Adjust program source for parallelization: a When feasible, remove inter—thread data dependencies o Implement required data synchronization 3 i LinuxCon Euroge I ELC-Euroge 2012
  10. 10. Example Data dependencies Variable assigned in loop body, used in later iteration / / search linked—list for matching items / / save matches in ‘found’ array of pointers for (p = head, n_found = 0; p; p = p—>next) if (match_criterion (p) ) found[n_found++] = p; Cannot (easily/ trivially) spawn data-parrallel tasks! . No direct parallel access to list members *p . No direct way to assign index to matched item n_found . Maybe more problems hidden in match_criterion () 10 Nov_ 6 2012 LinuxCon Europe I ELC—Europe 2012 VectorFabrics
  11. 11. Presentation index . Parallelization with dependencies: Reduction expressions or Streaming . Multi-threading: difficult. .. . Android: help from Pareon and Perf U . Conclusion 11 I Nov_ 6, 2012 LinuxCon Europe I ELC—Europe 2012 <, & Vector Fabrics
  12. 12. Can do: reduction data dependencies . Reduction expressions: accumulate results of loop bodies with commutative operations 0 Freedom of re-ordering allows to break sequential constraints / / conditionally accumulate results int acc = 0; for (i=0; i<N; i++) { int result = some_work(i); if (some condition(i)) acc += result; } . ..use of acc . .. 0 Commutative operations are basic math like +, *, &&, &, II, but also more complex operations like ‘add item to set’. . Three(? ) different methods to handle these 12 Nov_6 2012 LinuxCon Euro e/ ELC-Euro e2012 / & F
  13. 13. Three methods for reduction dependencies 0 Create thread-local copies of the accumulator. Accumulate over local copy in each thread. Merge the partial accumulators after thread—join. Eg. created automatically by: #pragma omp parallel for reduction(. ..) 0 Maintain single accumulator, synchronize updates through atomic operations. Eg. in C11 or C++11: atomic_add_fetch( &acc, result); std: :atomic<int> acc; acc += result; 0 Maintain single accumulator, synchronize updates through protection by acquiring and releasing semaphores. Eg. Used by Intel “Threaded Building Blocks’’ (C++): concurrent_unordered_set<. ..> s; s. insert(. ..); 13 I NoV_ 6, 2012 LinuxCon Europe I ELC—Europe 2012 ‘Vector Fabrics
  14. 14. PAREON: Parallelization Analysis — 1 ‘Labs View Help Close Statuszready My changes " P; p:-tlu " Dynamic Help ‘ ‘select cheat sheet v Property Value Memory dependency cluster 278 (carried by Loop_16429) I‘ ' 'nt'°d“cfi°" 0 OI‘ dme, come, “ 735k W429 (cV: comQu[eDI5Qa, :~ ‘ ‘his guide will walk you through the process of parallelizing a mependende, B _ in yourprogram. - I . . . . _ n h r the helpiconon the right to get rnoreinforrnationclick lgnore Wm‘ d3 P l l n g l ll at lgnme g g y kipieun to skip an optional step or click thezheck icon to "l0°P‘C3"l9dI ll "3”5f9'5/5 ' ' 0 ' the ste as com Ieted and roceed to the next ste . . a Iication 100 0 width P P P P FVOV" ! D°f5WIlT95 ’ [ Ialocpforparallelizarion 9 T° “P°“2'““* means active 100% of run. vanmonins 0 Me"‘°'Yd9Pe"d9"‘Y . na narallialimtlnn Iv zoeroliie " S(h¢d: |c " Ardxitecluro ‘ . ~II int 3 ; + 4 D : : Q Name Coverage Delay '2: ParaIIeI_1642a 94.05!‘ 9 "'5‘rask_1u29 94.05 , . 9'5"’ IT“: _ , _ __ , Ln 7-‘ Matzzpti 0.00 I a_mi. ._. « IL! ' : : ParaI| e|_I 5430 27.44 I: h; I;“j'“‘; " l "ETTask_16431 27.43 "“‘~‘“” '@Laop_417a 0.34 f , vf, memset 0.00 I. " _vf_mem§e[ o_oo . *” T’ I. " Kvtfmemset 0.00 . .. Task_16431 Loop_4‘I 79 ParaI| e|_l 6432 _ _memset 0.00 , ,E, Lmp74179 4809 III _¢_; :- Compute dependency ¢- -— Memory dependency = Streaming pattern _I = Antl~dependericy 9 '5‘ L0op_4215 ‘E7’ Laop_421 6 ,3ParaM Oops, there are moon-» dependencies between I ’ 5‘ Loop_42. , ,,, ,m iterations of Loop_4179! "Hinton 4714 ‘I. gI I pi. “ er gcrcg ircftoini _= |.l'lOl',0l= ' I ; lI_. ‘Iu; tIiin 01:. gcrtg
  15. 15. PAREON: Parallelization Analysis - 2 I "'1 Latisilfiew Help ESQ; ’ L 3 9 d (9 5.; _M_.1_, —,, oop . xspee up. Partitionirigcandidates—Loop_38 Total 2.3x speedup, yay! pg , , , ::'"r-ig-, - . ..euuIeoverview Niimberofthreads| :B I Apply Gobalspeeduix 2.3 Exuaw’ -. tnreads: 3 (Julia! averlieatt 6 96 Thread creation delay: 420 us I. Invocation Speedup Overhead Streams ‘I Loopja 39 1% I p = p->neXt dependency at the beginning of each iteration ' Property Value ‘ Loop_38(sget'f2_) — Loop Loop 38(sgetf2 ' I '' Iteration count 150 Iteration time Iteration statistics “" : ’ , =}.2-'~. ‘«1«, Z; . A / I I A 95/ ‘ ' ' ' ’ ' 4.-Az'»', :$/ z """""""""""""" ‘' Computation time 85.3 us (927 %) l l 7 Memory penalty 6.8 us (73 9(2) Load count 15770 Store count 7802 Instruction count 104658 Mapped tolnstance ARM-A9 _ _ _ Scurcelocation sgetf2.c:141~l85 a Line coverage 791 96 a potential parallelization Uncovered lines 15 I Not/ _ 6, 2012 LinuxCon Europe I ELC—Europe 2012 / ,<§/ Vector Fabrics
  16. 16. Pipelining: Data deps & functional partitioning Functional partitioning with inter-thread dependencies: Producer-Consumer pattern: Application Thread A() Queue Thread B() Queue Thread C() Queue implementation solves dependencies: o Solve Data dependencies: Consumer thread waits for available data (stalls until queue is non-empty) o Solve Anti dependencies: Producer thread creates next item in next memory location (prevents overwriting previous value) 16 | Nov_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 <, & VectorFabrics
  17. 17. FL rl I’ r': z:- f—-rim iii ~: in ii , Multi-threading: difficult. .. . Android: help from Pareon and Perf Conclusion 17 i N01/_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 V/ Vector Fabrics
  18. 18. Concurrent C/ C++ programming: Pitfalls Risc introduction of functional errors: 0 Overlooking use of shared/ global variables (deep down inside called functions, or inside 3rd party library) 0 Overlooking exceptions that are raised and catched outside studied scope o Incorrect use of semaphores: flawed protection, deadlocks Unexpected performance issues: a Underestimation of time spent in added multi-threading or synchronization code and libraries a Underestimation of other penalties in OS and HW (inter-core cache penalties, context switches, clock-frequency reductions) Parallel programming remains hard! 18 i Nov_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 Vector Fabrics
  19. 19. Development of parallel code Guidelines: . Base upon a sequential program: functional and performance reference . Apply higher—| eve| parallelization patterns and primitives: clear semantics, re—use code, reduce risk 0 Use tooling for analysis and verification o Prevent introduction of hard-to-find bugs 0 Prevent recoding effort that does not perform Managable development process! 19 Noi/ _6 2012 LinuxCon Euro e/ ELC-Euro e2012 / & F
  20. 20. Presentation index . Android: help from Pareon and Perf 8 . Conclusion 20 Nov_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 / & Vector Fabrics
  21. 21. Android Application 1: Plain, just Java Many apps have no critical CPU load For now, no Java support in Pareon Android Libraries JNI System Libraries Linux Kernel 21 i Nov_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 Q‘/ Vector Fabrics
  22. 22. Android Application 2: with native libraries Apps can include “native” binary code for best performance Android Libraries JNI System Libraries Your custom native library from C/ C++ source code Trace to Pareon Linux Kernel 22 i NoV_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 / ,<>Vector Fabrics
  23. 23. Android Application 3: NativeActivity “Native activities” are created without Java source code Android Libraries JNI Trace to Pareon Your custom C/ C++ ‘Application’ System Libraries Linux Kernel 23 i NoV_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 / ,<>Vector Fabrics
  24. 24. Application Analysis on Android target Application instrumented Real Target Hardware (ARM) 24 i Nov_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 <, </ Vector Fabrics
  25. 25. System Setup using Android Simulator Application instrumented Android Simulator 25 i N01/_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 <, </ Vector Fabrics
  26. 26. NDK plasma demo app analyzed on Android 26 i NoV_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 Vector Fabrics
  27. 27. Finding data parallelism on Android : ['Labs View Help Close Profile ' Partitions ‘ 2D-Profile ‘V Schedule " pIasma. c ' Partitioning candidates - Loop_313 0 ' l' . CPU data partitioning - vffasks Number ofthreads Global speedup: 24 Extra workerthreads; Global overhead: 1 % Thread creation delay: ii Invocation Speedup Overhead ii Loop_120 4.0 1% Q Loop_189 4.0 1% ii Loop_313 3.9 1% My dlanges ' Property Value compute dependency 313,46 ,9 — Compute dependency — Memory dependency ,9 . _ Streaming pattern Ignore with data partitioning induction expression y = Anti-dEP€"dE”CY SOWC9 _lA| l0]_I, j_: i‘KyI-; I)V= l1'l¢l¢, ‘E-Q= lIfi= iI¥lI= Il| ~1Ell= l(: ) :11-‘1.wIll= lIL1‘f; I|~, "L‘ Operation (+) Loop 313(fill plasma) mu-. .-aunnu mj (: mIdlIL1(: is1(OA0I u: imt= as1L3;«- Ir t-,1,-1:: -’t= urg'--IImt: wILm= mimtarsitsy Location E| a5ma, c,2n ', J(0lIIl, II| |(‘-II'lHg, 'FlIii(HH1131.‘HI((Ill= lIL1f= Jt1£? )flP= |II314IIlilatIIlylllI5l= j9l= lIlil= lIl§”'4l| |~1(HR , , &: ,-v moi Destinations _J 0.) ttloopcarried transfers 68.2 Ki transfers/ s Loop carried yes 2'? l mom a germ‘ l_. lnl»: c$tmi Elliroipxe V _= l_. U4_= !IK0l,0I= 31671;’ 4©'1 V[: tci'(C]i‘r‘; ';ll']i’[«"fi§.
  28. 28. Finding data parallelism on Android : ['Labs View Help Close Profile ' Partitions ‘ 2D-Profile ‘V Schedule " pIasma. c ' Partitioning candidates - Loop_313 0 ' l' . CPU data partitioning - vffasks Number ofthreads Global speedup: 24 Extra workerthreads; Global overhead: 1 % Thread creation delay: ii Invocation Speedup Overhead 9] Loop_120 4.0 1% ii Loop_189 4.0 1% .4 Loop_313 3.9 1:; My dlanges ' Property Value compute dependency 313,46 ,9 — Compute dependency — Memory dependency ,9 E2 Streaming pattern Ignore with data partitioning induction expression y = Anti-dEP€"dE”CY ‘ *I"Y SOWC9 _uIo1g, __: iKI't-inmoiqE-mind: -¥InInL1EInita :11-1§Ill= iIL1‘t; i|~, "L‘ Operation (+) Loop 313(fill plasma) mu-. .-aunnu mj (: iIIIdlIL1(: Is1(OA0I uzimtaasitsyu Ir t-,1,-1:: -’t= urg'-¢II! nt: w-Lexi: ll: |IL1?= lb1L3y Location E| a5ma, c,2n ', J(0iIIl, Ii| l(‘-Iilflfifllii(HIli'l31IK(flKIll= lIL1f= Jt1£? )flhill? »4IIiiIatlIl9ilIl5t= j9i= II(il= lIl§"'dlllxflfik , , &: ,-v moi Destinations _d. 0.) ttloopcarried transfers 68.2 Ki transfers/ s Loop carried yes 21;‘, l Nun at gcrtg‘ l_. lnl»: c$tmi Elliroipxe 1 _= l_. U4_= !IK0l,0I= 31671;’ 4©'1 V[: tci'(C]i‘r‘, —'; ll']i’[«"fi§.
  29. 29. Finding data parallelism on Android : ['Labs View Help Close Profile ' Partitions ‘ 2D-Profile ‘V Schedule " pIasma. c ' Partitioning candidates - Loop_313 0 ' l' . CPU data partitioning - vfiasks Number oftltreads ' Global speedL p: 24 Extra workerthreads; Global overhead: 1 % Thread creation delay: 11 Invocation Speedup Overhead 9] Loop_120 4.0 1% Q Loop_189 4.0 1% ii Loop_313 3.9 1% My dlanges ' Property Value compute dependency 313,46 ,9 — Compute dependency — Memory dependency ,9 . _ Streaming pattern Ignore with data partitioning induction expression 1v = Anti-dEP€"dE”CY SOWC9 _lA9I0]_I, j_§i‘KyI-; IiV= l1'l¢I0,‘E-Q= iIFl= iI¥l! =IIl~1Ell: l(: ) ill-‘1.wIll= iIL1‘t; i|~. "L‘ Operation (+) Loop 313(fill plasma) mu-.1-aunnu mj (: iIIIdlIL1(: Is1(OA0I u: imt= as1L3;«- Ir t-,1,-r: i -’t= urg'--IIunt: is-Ital: uczimtaisstsy Location oiasmactzn ', uoiui, -iuita-‘I-. ig, 'iniita: mat0Km£(Imimf: u»1£1;aInInif~«urinalii91-: uiiqgi: urii= mgr~dIIL1(ar , , &: ,-v cw Destinations _J 0.) itloopcarried transfers 68.2 Ki transfers/ s Loop carried yes 221 i Nun =11 gcrtg I_. ini»: m.m at-. iror, ox=1 _= l_. U4_= !IK0l,0I= alerts’ Vl: tcIi'(c)1'r*. ;-; |g"]i’[«1‘: :.
  30. 30. ' u uifi__, l0Lg_, IiflIi'I ‘u‘b.1‘v. =Is-. _,-1‘v. =Iii= r-, Iut-. . '*' 410143. Ur-iiilur-: ""l ill, __IihImIn1 ‘E-l'2,Ir-. iIt4I_, :m 5' :1NilWl= l'i'llIiilo| '.'g_, io:4(' . "' : ln‘I= ifi'r4‘I'llIIiiIl'. '._. IiIiio: I1il: iuIi| i'oL1 “~'lu‘nis-. ant-‘laniur-. I | 'Ih». '.ll_lIL: 'L}’ " l—""i! I=ll| ' :1!I: Ifi'l: i'I'lluIoioI'. '._, IIuiioIIlZl:1IIo| ' illlflwllo Iurrozzlilm-iiui-. aitiunit-: o' ‘toimtniini llliloiifiltlillml‘-ll11ll$'I I'Ik]g)oI: iI(cIiIL1r. IIl(; Fr-illicit-itim 30 | Nov. 6, 2012 Willi: . V , V . u . i . ..1 , m,1!: ,.! II, :i. ..u-. Rim‘- . -iv; . In I1 dela 1 mini '1'“ '1w<». .«'. a|. xl r. °'-arwra-r‘-V HVVYFVI mi III 1.. .. l'—'. .F1‘, '. ,,. .'1.. '._ . r;t man: may . Par-al| :|_3fi3 : rginc aI. .. A , i ANo. . i , _J__. i '7_4l'L Ti-‘RH , E ;1t‘r-Ii. .. ‘. - ‘Hi. .. ;1t‘I.1'z. ir. Ili'-i___£7fl zldhllfi1'A'ill. |l| i'. '.A, |llll| iA1. — "llIl'l'lI. JIld?1-il"liJI'§' — I'I'. :III-4‘! -la, -:‘-InIaI4a'q' 7 —‘ -1-I-r-‘II'ir_-| ,-r-Iu-in —« ;1Ili‘s. -l. ioiHa‘-l4I[q' Select (left~click] a loop orfunction in the Profile or 2D~Profi| e tab. Right-click i. e. "Show internal dependencies" to view the internal data d pendencies for the selected loop/ function. Select the type of dependencies to display using the check boxes in th legend above. Select (left-click) a dependency to see its properties (load and store op rations. variables. etc). I 1 I ~ LinuxCon Europe / ELC—Europe 2012 §/ Vector Fabrics
  31. 31. Performance Verification For example: PERF ‘flame graph’ ° sampling-based profiling ° multi-thread supprt ° with view into kernel-level _ov_open? R_PaIs UV spew Lfll R, Load. ov I: IpenFIIe l’1l1Vl{1I’_}(> idWaveFile: :.. idImageM. . I lVV'c1/ elllk Ci idMaterI. . IclDec| P.. Iclsoundsample iclMateII. . It‘lDet| P. ILlSuiIIIr1i’atl~e- F IdMateiI. . IclDe(| L.. idsoundshader l4'll)@(lLO I idsoundshader “I. ‘M, II1» 'I. .1 11 V >I IciDeclMa. . idRenderModelM. . IdL)etlManaqeILo. IciRenr‘ieIWo iciRen<1eIMorlelM idDec| ManagerLo. . idReI1deIWoIl. . idRendeIWorldLo. . I : s.11.- III» II I: I. iL'. -I . i.—-Ivv IIIIiI 4: LII: II HKRII ldSoundWoildLoc. . / ~ . I I»- I -I ‘I Irllnteiac . ‘i 1- III I. :k. —III: -I lClSeSSIOl1LOCal‘ TrmeRendeI'[)emo(_i: har R_Ai‘lFl‘_ H AI1*. MI: =r1ulx. . §PS5lOr‘I_Tll11l3DPIT10C)| llt_t(lf‘l(lTIClAl’(: |< R_RendeIVIew(vIewDef. . do_generic. . Ii» : - 15w. "Ii LII -I Ex»-III! »-I'. Ix»>IIzI: I1‘siI1I. III-£I' idRender. . idRenderWoI| dLocal: ... geI1eIit_fi. . ldCmdSy5ternLoca| ::E><ecuteComrnandBuffer() IdSessIonLoc . II-l<o<<II: II: ILcII: aI: -iriaw nit. _pIaIIrIIs I‘: .I. I 'l . ; I1.I I III. iil r.1<ec«: .IrIni Irml Updates II1i omIII. I:mI ()1 .3: II: (‘lO_‘= ynr_I@ ll‘ IIIII III_III. I,»I,3l I ’l"I>4 1’ I III 1. , « 1 . .1» I»—_—. If main </ S_Iead __lllIi_‘Il.1lt_l111l’l , ~:y: mItpI_i1I<pat
  32. 32. I A I ~17 al Lt~, ‘~T‘l -5» Today’s gap: , Mu| ti—core CPUs are everywhere, Yet multi—threaded programming remains hard: , Risk of creating hard-to-locate bugs regarding dynamic data races and semaphore issues , Obtained speedup is lower then expected Asequential functional reference implementation helps to set a baseline for parallelization A Android sets a new record in development complexity _. Proper tooling is needed to save on edit—verify development cycles 32 | NoV_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 V/ Vector Fabrics
  33. 33. Q Li ; :~r‘f til <1 : ~r, t~rs P Today’s gap: : . Mu| ti—core CPUs are eve‘ "= . Yet multi—threaded pry ming , Risk of creating H -to-locai races and semaphore issur , Obtained speedup is loy l Asequential functional reference implementation helps to set a baselinei parallelization ains hard: ugs regarding dynamic data en expected A Android sets a new record in development complexity _. Proper tooling is needed to save on edit—verify development cycles 33 | Nov_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 V/ Vector Fabrics
  34. 34. Check www. vectorfabrics. com for a free demo on concurrency analysis Vector Fabrics Thank you! 34 | Nov_ 6, 2012 LinuxCon Europe / ELC—Europe 2012 Vector Fabrics

×