IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

More at http://sites.google.com/site/cudaiap2009 and http://pinto.scripts.mit.edu/Classes/CUDAIAP2009

Note that some slides were borrowed from Matthew Bolitho (Johns Hopkins) and NVIDIA.

  1. 1. 6.963 CUDA@MIT IAP09. Supercomputing on your desktop: Programming the next generation of cheap and massively parallel hardware using CUDA. Lecture 03: CUDA - Basics #2. Nicolas Pinto (MIT). Tuesday, January 13, 2009
  2. 2. During this course, we'll try to adapt ("adapted for 6.963") and use existing material ;-)
  3. 3. Today yey!!
  4. 4. 6.963 CUDA@MIT IAP09. Outline: Language, Compilation, API, Threading Model, Memory Model
  5. 5. 6.963 CUDA@MIT IAP09. CUDA Language
  6. 6. Language: CUDA defines a language that is similar to C/C++. Allows programmers to easily move existing code to CUDA. Lessens learning curve. (Slide footer: Spring 2008, Johns Hopkins University. Copyright © Matthew Bolitho 2008)
  7. 7. Language: CUDA defines a language that is similar to C/C++. How is CUDA C/C++ different from standard C/C++?
  8. 8. Language: CUDA defines a language that is similar to C/C++. Syntactic extensions: declaration qualifiers; built-in variables; built-in types; execution configuration
  9. 9. Language: Declspec = declaration specifier / declaration qualifier. A modifier applied to declarations of variables and functions. Examples: const, extern, static
  10. 10. Language: CUDA uses the following declaration qualifiers for variables: __device__, __shared__, __constant__. They only apply to global variables
  11. 11. Language [__device__]: Declares that a global variable is stored on the device. The data resides in global memory. Has lifetime of the entire application. Accessible to all GPU threads. Accessible to the CPU via the API
  12. 12. Language [__shared__]: Declares that a global variable is stored on the device. The data resides in shared memory. Has lifetime of the thread block. Accessible to all threads, one copy per thread block
  13. 13. Language [__shared__, continued]: If not declared as volatile, reads of data written by different threads may not see the new values unless a synchronization barrier is used. Not accessible from the CPU
  14. 14. Language [__constant__]: Declares that a global variable is stored on the device. The data resides in constant memory. Has lifetime of the entire application. Accessible to all GPU threads (read only). Accessible to the CPU via the API (read-write)
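The three variable qualifiers described on the preceding slides can be seen side by side in a small program. This is only a sketch, not code from the lecture: the names `scale`, `result`, `tile` and `scaleAndReverse` are made up for illustration, and it assumes a current CUDA toolkit with the runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__constant__ float scale;        // constant memory: read-only for device threads
__device__ float result[64];     // global memory: lives for the whole application

__global__ void scaleAndReverse(const float *in)
{
    __shared__ float tile[64];   // shared memory: one copy per thread block

    int i = threadIdx.x;
    tile[i] = in[i] * scale;
    __syncthreads();             // barrier: makes every thread's write to tile visible

    result[i] = tile[blockDim.x - 1 - i];
}

int main()
{
    float host[64], s = 2.0f, *devIn;
    for (int i = 0; i < 64; ++i) host[i] = (float)i;

    cudaMalloc(&devIn, sizeof(host));
    cudaMemcpy(devIn, host, sizeof(host), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(scale, &s, sizeof(s));          // host writes constant memory via the API

    scaleAndReverse<<<1, 64>>>(devIn);

    cudaMemcpyFromSymbol(host, result, sizeof(host));  // host reads __device__ data via the API
    printf("result[0] = %f\n", host[0]);

    cudaFree(devIn);
    return 0;
}
```

Note that the host never touches `tile`: shared memory is only reachable from device code, which matches the "not accessible from the CPU" remark above.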
  15. 15. Language: CUDA uses the following declspecs for functions: __device__, __host__, __global__
  16. 16. Language [__device__]: Declares that a function is compiled to, and executes on, the device. Callable only from another function on the device
  17. 17. Language [__host__]: Declares that a function is compiled to, and executes on, the host. Callable only from the host. Functions without any CUDA declspec are host by default. Can use __host__ and __device__ together
  18. 18. Language [__global__]: Declares that a function is compiled to, and executes on, the device. Callable from the host. Used as the entry point from host to device
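A minimal sketch of the three function qualifiers working together may help here. The function names (`square`, `squarePlusOne`, `apply`) are hypothetical and the example assumes the CUDA runtime API; it is an illustration rather than the lecture's own code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compiled twice: once for the host, once for the device.
__host__ __device__ float square(float x) { return x * x; }

// Device-only helper: callable only from other device code.
__device__ float squarePlusOne(float x) { return square(x) + 1.0f; }

// Kernel: executes on the device, launched from the host.
__global__ void apply(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = squarePlusOne(data[i]);
}

int main()
{
    const int n = 8;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    apply<<<1, n>>>(dev, n);

    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    printf("host square(3) = %f, device squarePlusOne(3) = %f\n",
           square(3.0f), host[3]);

    cudaFree(dev);
    return 0;
}
```

Calling `squarePlusOne` from `main` would be rejected by the compiler, since a `__device__`-only function has no host version.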
  19. 19. Language: CUDA provides a set of built-in vector types: char1, uchar1, char2, uchar2, char3, uchar3, char4, uchar4; short1, ushort1, short2, ushort2, short3, ushort3, short4, ushort4; int1, uint1, int2, uint2, int3, uint3, int4, uint4; long1, ulong1, long2, ulong2, long3, ulong3, long4, ulong4; float1, float2, float3, float4
  20. 20. Language: Can construct a vector type with a special function, make_<typename> (e.g. make_float4). Can access elements of a vector type with .x, .y, .z, .w, e.g. vecvar.x
  21. 21. Language: dim3 is a special vector type. Same as uint3, except it can be constructed from a scalar to form a vector: (scalar, 1, 1)
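The vector types and `dim3` can be tried on the host alone, without launching a kernel. The following is a small sketch assuming only the definitions that `cuda_runtime.h` provides; the variable names are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Vector types are built with make_<typename> and read through .x/.y/.z/.w.
    float4 v = make_float4(1.0f, 2.0f, 3.0f, 4.0f);
    int2   p = make_int2(7, 9);

    // dim3 fills unspecified dimensions with 1.
    dim3 grid(16);       // (16, 1, 1)
    dim3 block(8, 8);    // (8, 8, 1)

    printf("v = (%f, %f, %f, %f)\n", v.x, v.y, v.z, v.w);
    printf("p = (%d, %d)\n", p.x, p.y);
    printf("grid = (%u, %u, %u), block = (%u, %u, %u)\n",
           grid.x, grid.y, grid.z, block.x, block.y, block.z);
    return 0;
}
```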
  22. 22. Language: CUDA provides four global, built-in variables: threadIdx, blockIdx, blockDim, gridDim (of type uint3 or dim3). Accessible only from device code. Cannot take their address. Cannot assign a value
  23. 23. Language: CUDA provides syntactic sugar to launch the execution of kernels: Func<<<GridDim, BlockDim>>>(Arguments, ...); where Func is a __global__ function. The compiler turns this type of statement into a block of code that configures and launches the kernel
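To make the launch syntax and the built-in variables concrete, here is a sketch of a vector addition. The kernel name `vecAdd`, the array size and the block size of 256 are arbitrary choices for illustration, not values from the slides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // Global index built from the built-in variables.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // guard: the last block may be partly idle
}

int main()
{
    const int n = 1000;
    const size_t bytes = n * sizeof(float);
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);   // round up so every element is covered
    vecAdd<<<grid, block>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %f\n", hc[10]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```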
  24. 24. Language: CUDA defines a language that is similar to C/C++. Important differences: runtime library; functions; classes, structs, unions
  25. 25. Language: Code that runs on the device can't use normal C/C++ runtime library functions: no printf, fread, malloc, etc. Most math functions have a device equivalent
  26. 26. Language: On a CUDA device, there is no stack. By default, all function calls are inlined; can use __noinline__ to prevent this (CUDA 1.1). All local variables and function arguments are stored in registers. No function recursion. No function pointers
  27. 27. Language: CUDA supports some C++ features for device code, e.g. template functions. Classes are supported inside .cu source, but must be host only. Structs/unions work in device code as per C
  28. 28. Language: Common Runtime Component: Mathematical Functions. • pow, sqrt, cbrt, hypot • exp, exp2, expm1 • log, log2, log10, log1p • sin, cos, tan, asin, acos, atan, atan2 • sinh, cosh, tanh, asinh, acosh, atanh • ceil, floor, trunc, round • Etc. – When executed on the host, a given function uses the C runtime implementation if available – These functions are only supported for scalar types, not vector types. (Slide footer: © David Kirk/NVIDIA and Wen-mei W. Hwu, Urbana, Illinois, August 18-22, 2008)
  29. 29. Language: Device Runtime Component: Mathematical Functions. • Some mathematical functions (e.g. sinf(x)) have a less accurate, but faster device-only version (e.g. __sinf(x)) – __powf – __logf, __log2f, __log10f – __expf – __sinf, __cosf, __tanf
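The trade-off between the standard and the fast device-only math functions can be observed directly. The sketch below compares `sinf` with the `__sinf` intrinsic on a few inputs; the kernel name `compare` and the sample values are assumptions chosen for illustration, and the size of the difference depends on the hardware and on the input range.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void compare(const float *x, float *precise, float *fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        precise[i] = sinf(x[i]);    // standard single-precision sine
        fast[i]    = __sinf(x[i]);  // faster, less accurate device-only intrinsic
    }
}

int main()
{
    const int n = 4;
    float hx[n] = {0.1f, 1.0f, 10.0f, 100.0f};
    float hp[n], hf[n];
    float *dx, *dp, *df;

    cudaMalloc(&dx, sizeof(hx));
    cudaMalloc(&dp, sizeof(hp));
    cudaMalloc(&df, sizeof(hf));
    cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);

    compare<<<1, n>>>(dx, dp, df, n);

    cudaMemcpy(hp, dp, sizeof(hp), cudaMemcpyDeviceToHost);
    cudaMemcpy(hf, df, sizeof(hf), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("x = %8.3f  sinf = % .8f  __sinf = % .8f  diff = % .3e\n",
               hx[i], hp[i], hf[i], hp[i] - hf[i]);

    cudaFree(dx); cudaFree(dp); cudaFree(df);
    return 0;
}
```

Compiling with `-use_fast_math` (listed among the nvcc flags later in this lecture) makes the compiler substitute the fast versions automatically, in which case both columns would show the intrinsic result.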
  30. 30. 6.963 CUDA@MIT IAP09. CUDA Compilation
  31. 31. Compilation: CUDA source files end in .cu. They contain a mix of device and host code/data. Compiled by nvcc. nvcc is really a wrapper around a more complex compilation process
  32. 32. Compilation: Input: normal .c, .cpp source files; CUDA .cu source code files. Output: object/executable code for the host; .cubin executable code for the device
  33. 33. Compilation: For .c and .cpp files, nvcc invokes the native C/C++ compiler for the system (e.g. gcc/cl). For .cu files, it is a little more complicated...
  34. 34. Compilation: [Figure: nvcc compilation flow for .cu files. Recoverable stage labels: cpp, cudafe, nvopencc, ptxas, .cubin, linker]
  35. 35. Compilation: To see the steps performed by nvcc, use the --dryrun and --keep command line options
  36. 36. Compilation: How is a .cubin linked with the rest of the program? It can be: loaded as a file at runtime; embedded in the data segment; embedded as a resource
  37. 37. Compilation: Programming is fun, until... The program crashes. It produces the wrong result. But there are many debugging techniques: debugging software (e.g. gdb, Visual Studio); printf
  38. 38. Debug: CUDA programming is even less fun: there is no debugger; there is no printf. Debugging code on the device is very hard: can try to write intermediate results to memory and copy them back to the host to examine; emulation mode
  39. 39. Emu: By using a compiler flag, you can emulate by running all code on the host. Compiler flag: --device-emulation. Good for most debugging: can use gdb/printf. Not a true emulation: race conditions, memory model differences, etc.
  40. 40. Emu: Device Emulation Mode Pitfalls. • Emulated device threads execute sequentially, so simultaneous accesses of the same memory location by multiple threads could produce different results. • Dereferencing device pointers on the host or host pointers on the device can produce correct results in device emulation mode, but will generate an error in device execution mode
  41. 41. Emu: Floating Point. • Results of floating-point computations will slightly differ because of: – Different compiler outputs, instruction sets – Use of extended precision for intermediate results • There are various options to force strict single precision on the host
  42. 42. Toolkit: CUDA Toolkit. [Figure: software stack. Application Software (Industry Standard C Language); Libraries: cuFFT, cuBLAS, cuDPP; CUDA Compiler; CUDA Tools; GPU: card, system; Multicore CPU: 4 cores. The remaining labels are not recoverable.] (Slide footer: M02: High Performance Computing with CUDA)
  43. 43. Toolkit: CUDA Many-core + Multi-core support. [Figure: a C CUDA application is compiled either with NVCC into many-core PTX code, which the PTX-to-target compiler turns into many-core code, or with NVCC --multicore into multi-core CPU C code, which gcc or MSVC turns into multi-core code.]
  44. 44. Toolkit: CUDA Compiler: nvcc. Any source file containing CUDA language extensions (.cu) must be compiled with nvcc. NVCC is a compiler driver: it works by invoking all the necessary tools and compilers like cudacc, g++, cl, ... NVCC can output either C code (CPU code), which must then be compiled with the rest of the application using another tool, or PTX or object code directly. An executable with CUDA code requires the CUDA core library (cuda) and the CUDA runtime library (cudart)
  45. 45. Toolkit: CUDA Compiler: nvcc. Important flags: -arch sm_13: enable double precision (on compatible hardware); -G: enable debug for device code; --ptxas-options=-v: show register and memory usage; --maxrregcount <N>: limit the number of registers; -use_fast_math: use fast math library
  46. 46. Toolkit: Compiling CUDA for Multi-Core. Using the "--multicore" compile switch with the NVCC compiler generates C code for multi-core CPUs. Performance scales linearly with more cores. Control the number of cores with the environment variable CUDA_NROF_CORES=n. [Figure: C/C++ CUDA Application, then NVCC --multicore, then Multicore CPU C Code, then gcc / MSVC, then Multicore Optimized Application]
  47. 47. Toolkit: GPU Tools. Profiler: available now for all supported OSs; command-line or GUI; sampling signals on GPU for memory access parameters and execution (serialization, divergence). Debugger: runs on the GPU. Emulation mode: compile and execute in emulation on CPU; allows CPU-style debugging in GPU source
  48. 48. 6.963 CUDA@MIT IAP09. CUDA API
  49. 49. API: The CUDA API consists of three parts: the host API; the device API; the common API
  50. 50. API: The CUDA host API provides functions for: device management; memory management; stream management; event management; texture management; OpenGL/DirectX interoperability
  51. 51. API: The host API is exposed as two different APIs: the low-level Device API (prefix: cu); the high-level Runtime API (prefix: cuda). Some things can be done through both APIs, others are specialized. They can be mixed together (with care)
  52. 52. API: All GPU computing is performed on a device. To allocate memory, run a program, etc. on the hardware, we need a device context. Device contexts are bound 1:1 with host threads (just like OpenGL). So, each host thread may have at most one device context. And each device context is accessible from only one host thread
  53. 53. API: All device API calls return an error/success code of type CUresult. All runtime API calls return an error/success code of type cudaError_t. An integer value, with zero = no error. cudaGetLastError, cudaGetErrorString
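One common way to use these return codes with the runtime API is to wrap every call in a checking macro. The sketch below is an illustration under assumptions: the macro name `CHECK` and the kernel `noop` are hypothetical, and `cudaDeviceSynchronize` is the synchronization call of recent toolkits rather than something named on the slide. The launch deliberately requests more threads per block than current hardware is expected to allow, so that `cudaGetLastError` has something to report.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void noop() {}

int main()
{
    float *dev;
    CHECK(cudaMalloc(&dev, 1024 * sizeof(float)));

    // Kernel launches return no status themselves; query it afterwards.
    noop<<<1, 1 << 20>>>();                  // block size chosen to be too large
    cudaError_t err = cudaGetLastError();
    printf("launch status: %s\n", cudaGetErrorString(err));

    noop<<<1, 32>>>();                       // a valid launch
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());          // surfaces errors raised during execution

    CHECK(cudaFree(dev));
    return 0;
}
```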
  54. 54. API: Runtime API calls automatically initialize. With the Device API, you must call cuInit
  55. 55. API: The first (optional?) step is to enumerate the available devices: cuDeviceGetCount, cuDeviceGet, cuDeviceGetName, cuDeviceTotalMem, cuDeviceGetAttribute, ...
  56. 56. API: Once we choose a device with cuDeviceGet, we get a device handle of type CUdevice. Can now create a context with cuCtxCreate
  57. 57. API: The Runtime API provides a simplified interface for creating a context: cudaGetDeviceCount, cudaSetDevice. And the useful: cudaChooseDevice
  58. 58. API: Device Management. The CPU can query and select GPU devices: cudaGetDeviceCount( int* count ); cudaSetDevice( int device ); cudaGetDevice( int *current_device ); cudaGetDeviceProperties( cudaDeviceProp* prop, int device ); cudaChooseDevice( int *device, cudaDeviceProp* prop ). Multi-GPU setup: device 0 is used by default; one CPU thread can control one GPU; multiple CPU threads can control the same GPU (calls are serialized by the driver)
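The device-management calls listed above can be combined into a short enumeration program. This is a sketch assuming the runtime API; which `cudaDeviceProp` fields to print is an arbitrary choice made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA device found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d: %s, compute capability %d.%d, %d multiprocessors, %zu bytes of global memory\n",
               dev, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount, prop.totalGlobalMem);
    }

    cudaSetDevice(0);            // device 0 is also what is used by default
    int current = -1;
    cudaGetDevice(&current);
    printf("current device: %d\n", current);
    return 0;
}
```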
  59. 59. API: Once we have a context (CUcontext), we can allocate memory, call a GPU function, etc. The context is implicitly associated with the creating thread. To synchronize all threads (CPU host with GPU threads), call cuCtxSynchronize: it waits for all GPU tasks to finish
  60. 60. API: Allocate/free memory: cuMemAlloc, cuMemFree. Initialize memory: cuMemset. Copy memory: cuMemcpyHtoD, cuMemcpyDtoH, cuMemcpyDtoD
  61. 61. API: When allocating memory for the host, can use malloc/new, or use cuMemAllocHost, cuMemFreeHost. These functions allocate host memory that is page-locked. Performance is improved for copies to/from page-locked host memory
  62. 62. API: Allocate/free memory: cudaMalloc, cudaFree. Initialize memory: cudaMemset. Copy memory: cudaMemcpy
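The four runtime memory calls named on this slide fit into a few lines. A minimal sketch, with the buffer size and fill byte chosen arbitrarily:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int n = 16;
    unsigned char host[n] = {0};
    unsigned char *dev = NULL;

    cudaMalloc(&dev, n);                                 // allocate device memory
    cudaMemset(dev, 0xAB, n);                            // fill every byte with 0xAB
    cudaMemcpy(host, dev, n, cudaMemcpyDeviceToHost);    // copy back to the host
    cudaFree(dev);                                       // release device memory

    printf("host[0] = 0x%02X, host[%d] = 0x%02X\n", host[0], n - 1, host[n - 1]);
    return 0;
}
```

`cudaMemset` works byte-wise, like `memset`, so it is mainly useful for zeroing or for byte patterns rather than for filling an array with a float value.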
  63. 63. API: cuMemAlloc allocates linear memory. Can also allocate array memory. Arrays are created with a specific width and height and element type. Memory layout is optimized (e.g. packing) by the runtime. cuArrayCreate, cuArrayDestroy, cuMemcpyDtoA, cuMemcpyHtoA, ...
  64. 64. API: A module is a blob of GPU code/data along with some type information (.cubin files). A module is created by loading a cubin with cuModuleLoad or cuModuleLoadData. A module can be unloaded with cuModuleUnload
  65. 65. API: Loading a module also copies it to the device. Can then get the address of functions and global variables: cuModuleGetFunction, cuModuleGetGlobal, cuModuleGetTexRef
  66. 66. API: Once a module is loaded, and we have a function pointer, we can call a function. We must set up the execution environment first
  67. 67. API: The execution environment includes: thread block size; shared memory size; function parameters; grid size
  68. 68. API: Thread block size: cuFuncSetBlockShape. Shared memory size: cuFuncSetSharedSize. Function parameters: cuParamSetSize, cuParamSeti, cuParamSetf, cuParamSetv
  69. 69. API: Grid size is set at the same time as the function invocation: cuLaunchGrid
  70. 70. API: [The heading and first sentence of this slide are not recoverable from the transcript; they refer to the <<<...>>> launch syntax.] The compiler generates calls to the device API to set up the execution environment
  71. 71. API: A stream is a sequence of operations that occur in order. E.g. 1. Copy data from host to device 2. Execute device function 3. Copy data from device to host
  72. 72. API: A stream is a sequence of operations that occur in order. Different streams can be used to manage concurrency. E.g. overlapping a memory copy from one stream with the function execution from another
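The copy, execute, copy sequence and its overlap across streams can be sketched with the runtime API. The slides do not name the stream functions, so `cudaStreamCreate`, `cudaMemcpyAsync`, `cudaMallocHost` and the kernel `scale` are choices made for this illustration; whether the two streams actually overlap depends on the hardware.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *d, int n, float f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

int main()
{
    const int n = 1 << 20, half = n / 2;
    const size_t halfBytes = half * sizeof(float);

    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float));   // page-locked host memory, needed for async copies
    cudaMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s[2];
    for (int k = 0; k < 2; ++k) cudaStreamCreate(&s[k]);

    // Each stream runs its own in-order sequence on its half of the data.
    for (int k = 0; k < 2; ++k) {
        int off = k * half;
        cudaMemcpyAsync(d + off, h + off, halfBytes, cudaMemcpyHostToDevice, s[k]);
        scale<<<(half + 255) / 256, 256, 0, s[k]>>>(d + off, half, 2.0f);
        cudaMemcpyAsync(h + off, d + off, halfBytes, cudaMemcpyDeviceToHost, s[k]);
    }

    for (int k = 0; k < 2; ++k) cudaStreamSynchronize(s[k]);
    printf("h[0] = %f, h[n-1] = %f\n", h[0], h[n - 1]);

    for (int k = 0; k < 2; ++k) cudaStreamDestroy(s[k]);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```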
  73. 73. API: Events are a way of determining the progress of a stream. Events provide a "marker" in a stream at a specific position. A holder of an event handle can: wait for an event to occur; measure the time that occurred between two events
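Both uses of events, waiting and timing, appear in the usual kernel-timing pattern. This sketch assumes the runtime API event functions; the kernel `busy` and its sizes are hypothetical and serve only to give the events something to bracket.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busy(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = sqrtf((float)i) * 0.5f;
}

int main()
{
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);               // marker before the kernel in stream 0
    busy<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop, 0);                // marker after the kernel

    cudaEventSynchronize(stop);              // wait for the stop event to occur
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // time between the two events, in milliseconds
    printf("kernel took %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```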
  74. 74. 6.963 CUDA@MIT IAP09. CUDA Execution and Threading Model
  75. 75. Threading: Execution Model (software to hardware). Thread to Thread Processor: threads are executed by thread processors. Thread Block to Multiprocessor: thread blocks are executed on multiprocessors; thread blocks do not migrate; several concurrent thread blocks can reside on one multiprocessor, limited by multiprocessor resources (shared memory and register file). Grid to Device: a kernel is launched as a grid of thread blocks; only one kernel can execute on a device at one time. © 2008 NVIDIA Corporation.
  76. 76. Threading: CUDA Uses Extensive Multithreading. • CUDA threads express fine-grained data parallelism – Map threads to GPU threads or CPU vector elements – Virtualize the processors – You must rethink your algorithms to be aggressively parallel • CUDA thread blocks express coarse-grained parallelism – Map blocks to GPU thread arrays or CPU threads – Scale transparently to any number of processors • GPUs execute thousands of lightweight threads – One DX10 graphics thread computes one pixel fragment – One CUDA thread computes one result (or several results) – Provide hardware multithreading & zero-overhead scheduling
  77. 77. Threading: CUDA Programming Model. Parallel code (kernel) is launched and executed on a device by many threads. Threads are grouped into thread blocks. Parallel code is written for a thread: each thread is free to execute a unique code path; built-in thread and block ID variables
  78. 78. Threading: Thread Hierarchy. Threads launched for a parallel section are partitioned into thread blocks. Grid = all blocks for a given launch. A thread block is a group of threads that can: synchronize their execution; communicate via shared memory
  79. 79. Threading: IDs and Dimensions. Threads: 3D IDs, unique within a block. Blocks: 2D IDs, unique within a grid. Dimensions are set at launch time and can be unique for each section. Built-in variables: threadIdx, blockIdx, blockDim, gridDim. [Figure: a device running Grid 1, a 3x2 grid of blocks (0, 0) to (2, 1); Block (1, 1) is expanded into a 5x3 array of threads (0, 0) to (4, 2)]
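The layout in the figure, a 3x2 grid of blocks with 5x3 threads each, can be reproduced to see how the IDs combine into a global 2D coordinate. This is a sketch: the kernel name `fillIndex` is made up, and the 15x6 problem size is simply what the figure's dimensions multiply out to.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fillIndex(int *out, int width, int height)
{
    // Global 2D coordinate from block ID, block dimensions and thread ID.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = blockIdx.y * gridDim.x + blockIdx.x;  // which block handled this cell
}

int main()
{
    const int width = 15, height = 6;          // 3 blocks x 5 threads, 2 blocks x 3 threads
    int host[width * height];
    int *dev;
    cudaMalloc(&dev, sizeof(host));

    dim3 block(5, 3);                          // threads per block, as in the figure
    dim3 grid(3, 2);                           // blocks per grid, as in the figure
    fillIndex<<<grid, block>>>(dev, width, height);

    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) printf("%d ", host[y * width + x]);
        printf("\n");
    }

    cudaFree(dev);
    return 0;
}
```

Each printed cell shows the linear number of the block that wrote it, so the output makes the partitioning of the grid into blocks visible.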
  80. 80. Threading: Programming Model. A kernel is executed as a grid of thread blocks. A thread block is a batch of threads that can cooperate with each other by: sharing data through shared memory; synchronizing their execution. Threads from different blocks cannot cooperate. [Figure: the host launches Kernel 1 on Grid 1 and Kernel 2 on Grid 2 on the device; Grid 1 is a 3x2 grid of blocks; Block (1, 1) is expanded into a 5x3 array of threads.] © NVIDIA Corporation 2006
  81. 81. Threading: Blocks must be independent. Any possible interleaving of blocks should be valid: presumed to run to completion without pre-emption; can run in any order; can run concurrently OR sequentially. Blocks may coordinate but not synchronize: shared queue pointer: OK; shared lock: BAD … can easily deadlock. Independence requirement gives scalability
  82. 82. Threading: Hardware Multithreading. Hardware allocates resources to blocks: blocks need thread slots, registers, shared memory; blocks don't run until resources are available. Hardware schedules threads: threads have their own registers; any thread not waiting for something can run; context switching is free, every cycle. Hardware relies on threads to hide latency, i.e., parallelism is necessary for performance
  83. 83. Threading: SIMT Thread Execution. Groups of 32 threads formed into warps: always executing same instruction; shared instruction fetch/dispatch; some become inactive when code path diverges; hardware automatically handles divergence. Warps are the primitive unit of scheduling. SIMT execution is an implementation choice: sharing control logic leaves more space for ALUs; largely invisible to programmer; must understand for performance, not correctness
  84. 84. Threading: Transparent Scalability. Hardware is free to schedule thread blocks on any processor. A kernel scales across parallel multiprocessors. [Figure: a kernel grid of Blocks 0 to 7 scheduled on two devices with different numbers of multiprocessors; one device runs the blocks four at a time, the other two at a time]
