Talk About Performance

A talk I gave at the IT Weekend Rivne event two years ago.


  1. 1. Talk About Performance @YaroslavBunyak Senior Software Engineer, SoftServe Inc.
  2. 2. What is Performance?
  3. 3. What is a Program? data xform data
  4. 4. What is a Program? data xform data
  5. 5. What is a Program? data xform THIS!! data
  6. 6. What is a Program? data xform data
  7. 7. What is a Program? data xform data
  8. 8. How to Create a Program?
  9. 9. Simple
  10. 10. Simple Write code Your favorite programming language: C, C++, Objective-C, Java etc.
  11. 11. Simple Write code Your favorite programming language: C, C++, Objective-C, Java etc. Compile Compiler will transform your code into machine code
  12. 12. Simple Write code Your favorite programming language: C, C++, Objective-C, Java etc. Compile Compiler will transform your code into machine code Run on target hardware Hardware is a black box
  13. 13. Simple Write code Your favorite programming language: C, C++, Objective-C, Java etc. Compile Compiler will transform your code into machine code Run on target hardware Hardware is a black box <- Right?
  14. 14. Simple Write code Your favorite programming language: C, C++, Objective-C, Java etc. Compile Compiler will transform your code into machine code Run on target hardware Hardware is a black box <- Right? Wrong!
  15. 15. Simple Write code Your favorite programming language: C, C++, Objective-C, Java etc. Compile Compiler will transform your code into machine code Run on target hardware Hardware is a black box
  16. 16. Bad Programs
  17. 17. Bad Programs Sloppy Using the program is like trying to swim in jelly
  18. 18. Bad Programs Sloppy Using the program is like trying to swim in jelly Use memory inefficiently
  19. 19. Bad Programs Sloppy Using the program is like trying to swim in jelly Use memory inefficiently Battery is dead already
  20. 20. Good Programs
  21. 21. Good Programs Run fast
  22. 22. Good Programs Run fast Use little memory
  23. 23. Good Programs Run fast Use little memory Save battery
  24. 24. Good Programs Run fast Use little memory Save battery I write them!
  25. 25. Good Programs Run fast Use little memory Save battery I write them! It was a joke :)
  26. 26. Good Programs Run fast Use little memory Save battery
  27. 27. How to Create a Good Program?
  28. 28. What is a Program? data xform data
  29. 29. What is a Program?
  30. 30. What is a Program?
  31. 31. What is a Program? code hardware
  32. 32. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b;
  33. 33. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b;
  34. 34. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b; Q: How fast is this code?
  35. 35. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b; Q: How fast is this code? A: Depends...
  36. 36. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b;
  37. 37. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b;
  38. 38. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b; ... on how fast the CPU adds two integers?
  39. 39. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b; ... on how fast the CPU adds two integers? NO
  40. 40. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b; ... on how fast the CPU adds two integers? NO Any modern CPU can add integers very fast! ~1 cycle
  41. 41. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b;
  42. 42. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b;
  43. 43. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b; ... on whether 'a' and 'b' are ready for processing
  44. 44. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b; ... on whether 'a' and 'b' are ready for processing i.e. loaded into CPU registers
  45. 45. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b; ... on whether 'a' and 'b' are ready for processing i.e. loaded into CPU registers Load data from memory into a register ~600 cycles
  46. 46. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b;
  47. 47. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b;
  48. 48. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b; Q: What is the CPU doing in the meantime?
  49. 49. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b; Q: What is the CPU doing in the meantime? A: Nothing! It’s waiting for data
  50. 50. Code Sample int a = ...; int b = ...; /* more code... */ int c = a + b;
  51. 51. You Ask
  52. 52. You Ask Can we do better?
  53. 53. You Ask Can we do better? Yes. And your hardware will help you
  54. 54. CPU
  55. 55. CPU Operation
  56. 56. CPU Operation Load & decode instruction(s)
  57. 57. CPU Operation Load & decode instruction(s) Load data memory -> registers
  58. 58. CPU Operation Load & decode instruction(s) Load data memory -> registers Execute instruction(s)
  59. 59. CPU Operation Load & decode instruction(s) Load data memory -> registers Execute instruction(s) Store results registers -> memory
  60. 60-67. (Not) Pipeline (diagram built up one cycle per slide; final state shown; rows are cycles, columns are pipeline stages)

      cycle   IL         ID         DL         EX         DS
      1       instr. 1   -          -          -          -
      2       -          instr. 1   -          -          -
      3       -          -          instr. 1   -          -
      4       -          -          -          instr. 1   -
      5       -          -          -          -          instr. 1
      6       instr. 2   -          -          -          -
      7       -          instr. 2   -          -          -
  68. 68-76. Pipeline (diagram built up one cycle per slide; final state shown; rows are cycles, columns are pipeline stages)

      cycle   IL         ID         DL         EX         DS
      1       instr. 1   -          -          -          -
      2       instr. 2   instr. 1   -          -          -
      3       instr. 3   instr. 2   instr. 1   -          -
      4       instr. 4   instr. 3   instr. 2   instr. 1   -
      5       -          instr. 4   instr. 3   instr. 2   instr. 1
      6       -          -          instr. 4   instr. 3   instr. 2
      7       -          -          -          instr. 4   instr. 3
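The difference between the two diagrams can be put into numbers. The sketch below is a minimal model, assuming an ideal five-stage pipeline with no stalls and no mispredicted branches; the function names are made up for this illustration and do not come from the talk.

```cpp
#include <cassert>
#include <cstdint>

// "(Not) Pipeline": an instruction must leave the last stage before the
// next one may enter the first, so every instruction costs `stages` cycles.
std::uint64_t unpipelined_cycles(std::uint64_t instructions, std::uint64_t stages) {
    return instructions * stages;
}

// Ideal pipeline: the first instruction needs `stages` cycles to drain,
// every later instruction finishes one cycle after its predecessor.
std::uint64_t pipelined_cycles(std::uint64_t instructions, std::uint64_t stages) {
    assert(stages > 0);
    if (instructions == 0) return 0;
    return stages + instructions - 1;
}
```

For the four instructions and five stages (IL, ID, DL, EX, DS) used on the slides, this model gives 20 cycles without pipelining and 8 cycles with it.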
  77. 77. Branch Prediction if (day == Monday) dose = kDouble; else dose = kStandard; make_coffee(dose);
  78. 78. Branch Prediction if (day == Monday) /* 1 */ dose = kDouble; /* 2 */ else dose = kStandard; /* 3 */ make_coffee(dose); /* 4 */
  79. 79. Branch Prediction if (day == Monday) /* 1 */ dose = kDouble; /* 2 */ else dose = kStandard; /* 3 */ make_coffee(dose); /* 4 */ What instruction to load & decode next?
  80. 80. Branch Prediction if (day == Monday) /* 1 */ dose = kDouble; /* 2 */ else dose = kStandard; /* 3 */ make_coffee(dose); /* 4 */ What instruction to load & decode next: two or three?
  81. 81. Branch Prediction if (day == Monday) /* 1 */ dose = kDouble; /* 2 */ else dose = kStandard; /* 3 */ make_coffee(dose); /* 4 */
  82. 82. Branch Prediction if (day == Monday) /* 1 */ dose = kDouble; /* 2 */ else dose = kStandard; /* 3 */ make_coffee(dose); /* 4 */
  83. 83. Branch Prediction if (day == Monday) /* 1 */ dose = kDouble; /* 2 */ else dose = kStandard; /* 3 */ make_coffee(dose); /* 4 */ CPU will try to predict and start load & decode
  84. 84. Branch Prediction if (day == Monday) /* 1 */ dose = kDouble; /* 2 */ else dose = kStandard; /* 3 */ make_coffee(dose); /* 4 */ CPU will try to predict and start load & decode If it was wrong: discard results, flush pipeline
  85. 85. Branch Prediction if (day == Monday) /* 1 */ dose = kDouble; /* 2 */ else dose = kStandard; /* 3 */ make_coffee(dose); /* 4 */
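One way to sidestep the question of which instruction to load next is to compute the dose without a branch. The sketch below is only an illustration: the `Day` enum and the numeric values of `kStandard` and `kDouble` are assumptions, since the slides do not define them, and an optimizing compiler may already turn the branchy version into a conditional move on its own.

```cpp
#include <cassert>
#include <initializer_list>

enum class Day { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday };

// Hypothetical dose values; the slides leave them undefined.
constexpr int kStandard = 1;
constexpr int kDouble = 2;

// Branchy version, as written on the slide.
int dose_with_branch(Day day) {
    if (day == Day::Monday)
        return kDouble;
    else
        return kStandard;
}

// Branch-free version: the comparison yields 0 or 1, which selects how
// much of the difference between the two doses is added.
int dose_branchless(Day day) {
    const int is_monday = static_cast<int>(day == Day::Monday);
    return kStandard + is_monday * (kDouble - kStandard);
}

// Usage: both versions must agree for every day they are given.
void check_doses() {
    for (Day d : {Day::Monday, Day::Tuesday, Day::Sunday}) {
        assert(dose_with_branch(d) == dose_branchless(d));
    }
}
```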
  86. 86-93. Pipeline, correct prediction (diagram built up one cycle per slide; final state shown; rows are cycles, columns are pipeline stages)

      cycle   IL         ID         DL         EX         DS
      1       instr. 1   -          -          -          -
      2       instr. 2   instr. 1   -          -          -
      3       instr. 4   instr. 2   instr. 1   -          -
      4       -          instr. 4   instr. 2   instr. 1   -          <- instr. 1 executed, prediction was correct
      5       -          -          instr. 4   instr. 2   instr. 1
      6       -          -          -          instr. 4   instr. 2
      7       -          -          -          -          instr. 4
  94. 94-101. Pipeline, wrong prediction (diagram built up one cycle per slide; final state shown; rows are cycles, columns are pipeline stages)

      cycle   IL         ID         DL         EX         DS
      1       instr. 1   -          -          -          -
      2       instr. 2   instr. 1   -          -          -
      3       instr. 4   instr. 2   instr. 1   -          -
      4       -          instr. 4   instr. 2   instr. 1   -          <- instr. 1 executed, wrong prediction detected
      5       instr. 3   -          -          -          instr. 1
      6       instr. 4   instr. 3   -          -          -
      7       -          instr. 4   instr. 3   -          -
  102. 102. Takeaways
  103. 103. Takeaways Branches are bad for the pipeline
  104. 104. Takeaways Branches are bad for the pipeline Avoid if possible
  105. 105. Takeaways Branches are bad for the pipeline Avoid if possible Help branch predictor to help you
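Both pieces of advice can be sketched on a small loop. The example below is hypothetical (the function names and the threshold-sum task are not from the talk): the first function has a data-dependent branch, the second removes it, and the third helper sorts the input so that the branch in the first function becomes a long run of "not taken" followed by a long run of "taken", which a predictor handles well. Sorting has its own cost, so it tends to pay off only when the data is reused.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Branchy: with randomly ordered input the `if` is hard to predict.
std::int64_t sum_at_least(const std::vector<int>& values, int threshold) {
    std::int64_t sum = 0;
    for (int v : values) {
        if (v >= threshold) sum += v;
    }
    return sum;
}

// Avoiding the branch: the comparison result (0 or 1) scales the value.
std::int64_t sum_at_least_branchless(const std::vector<int>& values, int threshold) {
    std::int64_t sum = 0;
    for (int v : values) {
        sum += static_cast<std::int64_t>(v >= threshold) * v;
    }
    return sum;
}

// Helping the predictor: after sorting, the branchy loop above sees all
// "false" outcomes first and all "true" outcomes afterwards.
std::vector<int> sorted_copy(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;
}

// Usage: all three approaches must produce the same sum.
void check_threshold_sums() {
    const std::vector<int> values{5, 1, 9, 3, 7};
    const std::int64_t expected = sum_at_least(values, 5);
    assert(sum_at_least_branchless(values, 5) == expected);
    assert(sum_at_least(sorted_copy(values), 5) == expected);
}
```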
  106. 106. Memory
  107. 107. Workflow
  108. 108. Workflow Program data is stored in memory
  109. 109. Workflow Program data is stored in memory CPU requests data for processing
  110. 110. Workflow Program data is stored in memory CPU requests data for processing Typical cycle: load, process, store
  111. 111. Architecture CPU Memory Controller Memory Banks
  112. 112. Architecture CPU Memory Controller Memory Banks
  113. 113. Architecture CPU Memory Controller Memory Banks
  114. 114. Architecture CPU Memory Controller Memory Banks
  115. 115. Architecture CPU Memory Controller Memory Banks
  116. 116. Parameters
  117. 117. Parameters There are two main parameters of memory subsystem:
  118. 118. Parameters There are two main parameters of memory subsystem: latency
  119. 119. Parameters There are two main parameters of memory subsystem: latency bandwidth
  120. 120. Latency
  121. 121. Latency Shows how much time passes between data request and its delivery
  122. 122. Latency Shows how much time passes between data request and its delivery Very important concept (see further)
  123. 123. Bandwidth
  124. 124. Bandwidth Shows how much data can be accessed per second
  125. 125. Bandwidth Shows how much data can be accessed per second Also important
  126. 126. History Lesson

                                VAX-11 (1980)   Modern Desktop   Improvement
      Clock Speed, MHz          6               3000             +500x
      Memory Size, MB           2               2000             +1000x
      Memory Bandwidth, MB/s    13              7000             +540x
      Memory Latency, ns        225             70               +3x
      Memory Latency, cycles    1.4             210              -150x

      Data from “Machine Architecture” talk by Herb Sutter
  127. 127. History Lesson
  128. 128. History Lesson For the past 30+ years we have seen huge improvements in CPU processing power and data sizes
  129. 129. History Lesson For the past 30+ years we have seen huge improvements in CPU processing power and data sizes ... but
  130. 130. History Lesson For the past 30+ years we have seen huge improvements in CPU processing power and data sizes Memory speeds couldn’t keep up with the progress
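The "latency in cycles" row of the History Lesson table follows from the clock speed and the latency in nanoseconds. As a worked check, using only the figures quoted on that slide (the symbol names are chosen here for illustration):

```latex
\[
  \text{latency}_{\text{cycles}}
    = \text{latency}_{\text{ns}} \times \frac{f_{\text{MHz}}}{1000}
\]
\[
  \text{VAX-11: } 225 \times \frac{6}{1000} = 1.35 \approx 1.4
  \qquad
  \text{Modern desktop: } 70 \times \frac{3000}{1000} = 210
\]
```

So although latency measured in nanoseconds improved about 3x, measured in CPU cycles it became roughly 150 times worse.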
  131. 131. Takeaways
  132. 132. Takeaways Latency is the king!
  133. 133. Takeaways Latency is the king! You can trade CPU time for memory, i.e. calculate more - load/store less
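A small sketch of "calculate more, load/store less", under stated assumptions: the task (summing squares of 16-bit values) and all names are hypothetical. The lookup table has 65536 four-byte entries, 256 KiB in total, which is far larger than an L1 data cache of a few tens of kilobytes, so random lookups can miss the cache, while the computed version needs one multiplication on a value that is already loaded.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Precomputed squares for every 16-bit value: 65536 * 4 bytes = 256 KiB.
std::vector<std::uint32_t> make_square_table() {
    std::vector<std::uint32_t> table(65536);
    for (std::uint32_t i = 0; i < 65536; ++i) table[i] = i * i;
    return table;
}

// Load-heavy: every element costs an extra memory access into the table.
std::uint64_t sum_of_squares_lookup(const std::vector<std::uint16_t>& xs,
                                    const std::vector<std::uint32_t>& table) {
    std::uint64_t sum = 0;
    for (std::uint16_t x : xs) sum += table[x];
    return sum;
}

// Compute-heavy: one multiplication instead of a table load.
std::uint64_t sum_of_squares_computed(const std::vector<std::uint16_t>& xs) {
    std::uint64_t sum = 0;
    for (std::uint16_t x : xs) sum += static_cast<std::uint32_t>(x) * x;
    return sum;
}

// Usage: both variants must agree, including for the largest input value.
void check_squares() {
    const std::vector<std::uint16_t> xs{3, 4, 65535};
    assert(sum_of_squares_lookup(xs, make_square_table()) == sum_of_squares_computed(xs));
}
```

Which variant is faster depends on the hardware and the access pattern; the point is only that the computed version touches less memory.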
  134. 134. Memory types
  135. 135. Memory types There are two main memory types:
  136. 136. Memory types There are two main memory types: Static RAM - fast, but very expensive
  137. 137. Memory types There are two main memory types: Static RAM - fast, but very expensive Dynamic RAM - slow, but cheaper
  138. 138. Memory types There are two main memory types: Static RAM - fast, but very expensive Dynamic RAM - slow, but cheaper Which one to use?
  139. 139. Memory types There are two main memory types: Static RAM - fast, but very expensive Dynamic RAM - slow, but cheaper
  140. 140. Solution
  141. 141. Solution Build memory hierarchy which utilizes large amounts of cheap DRAM storage and small amounts of fast SRAM cache
  142. 142. Memory Hierarchy L1i/L1d L2 Cache Memory
  143. 143. Memory Hierarchy L1i/L1d L2 Cache Memory iPhone 4s: 32 KB L1i, 32 KB L1d, 1 MB L2, 512 MB DRAM
  144. 144. Memory Hierarchy L1i/L1d L2 Cache Memory iPhone 4s: 32 KB L1i, 32 KB L1d, 1 MB L2, 512 MB DRAM Access: registers - 1 cycle, L1 - 5 cycles, L2 - 40 cycles, DRAM - 610
  145. 145. Memory Hierarchy L1i/L1d L2 Cache Memory
  146. 146. Cache Miss
  147. 147. Cache Miss If data requested by CPU is not in the cache it has to be loaded from the main (slow) memory
  148. 148. Cache Line
  149. 149. Cache Line Minimum amount of data that can be read from and written to memory
  150. 150. Cache Line Minimum amount of data that can be read from and written to memory Usually 64-128 bytes
  151. 151. Cache Line
  152. 152. Cache Line What does it mean?
  153. 153. Cache Line What does it mean? Suppose you have an array of 16 floats and you want the first float for calculations
  154. 154. Cache Line What does it mean? Suppose you have an array of 16 floats and you want the first float for calculations If it’s not in cache already, you will pay the “full price” to load the entire cache line
  155. 155. Cache Line What does it mean? Suppose you have an array of 16 floats and you want the first float for calculations If it’s not in cache already, you will pay the “full price” to load the entire cache line Access the remaining 15 floats “for free”
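The "16 floats" in the example correspond to one 64-byte cache line filled with 4-byte floats. The sketch below assumes exactly those sizes, and the `Matrix` type and function names are hypothetical. It contrasts a traversal that uses every float of each loaded line with one that jumps a whole row between accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cache line size: 64 bytes, the lower end of the range on the slide.
constexpr std::size_t kCacheLineBytes = 64;
static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");
constexpr std::size_t kFloatsPerLine = kCacheLineBytes / sizeof(float);  // 16

// Hypothetical row-major matrix: element (r, c) lives at data[r * cols + c].
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;
};

// Walks memory in address order: after one miss, the following floats of
// the same cache line are already there.
float sum_row_by_row(const Matrix& m) {
    float sum = 0.0f;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            sum += m.data[r * m.cols + c];
    return sum;
}

// Jumps `cols` floats between accesses: for wide matrices each access
// lands in a different cache line and uses only one float of it.
float sum_column_by_column(const Matrix& m) {
    float sum = 0.0f;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            sum += m.data[r * m.cols + c];
    return sum;
}

// Usage: a 3x4 matrix holding 0..11; both orders give the same sum.
void check_traversals() {
    Matrix m{3, 4, std::vector<float>(12)};
    for (std::size_t i = 0; i < m.data.size(); ++i) m.data[i] = static_cast<float>(i);
    assert(sum_row_by_row(m) == 66.0f);
    assert(sum_column_by_column(m) == 66.0f);
}
```

Both functions return the same result; they differ only in how many cache lines they have to load along the way.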
  156. 156. Prefetch
  157. 157. Prefetch Modern CPUs and compilers are able to detect memory access patterns and preload data in caches speculatively
  158. 158. Prefetch Modern CPUs and compilers are able to detect memory access patterns and preload data in caches speculatively So, data will be ready when you need it
  159. 159. Prefetch Modern CPUs and compilers are able to detect memory access patterns and preload data in caches speculatively So, data will be ready when you need it But your data access patterns must be very simple - linear is a good one
  160. 160. Prefetch Modern CPUs and compilers are able to detect memory access patterns and preload data in caches speculatively So, data will be ready when you need it But your data access patterns must be very simple - linear is a good one BTW, C++ operator -> is sometimes referred to as “cache miss” operator
  161. 161. Prefetch Modern CPUs and compilers are able to detect memory access patterns and preload data in caches speculatively So, data will be ready when you need it But your data access patterns must be very simple - linear is a good one BTW, C++ operator -> is sometimes referred to as “cache miss” operator Can you guess why?
  162. 162. Prefetch Modern CPUs and compilers are able to detect memory access patterns and preload data in caches speculatively So, data will be ready when you need it But your data access patterns must be very simple - linear is a good one
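A rough illustration of a "simple, linear" access pattern versus pointer chasing. The `Node` type and function names are made up for this sketch. In the linked version the address of the next element is known only after the current one has been loaded through `->`, which leaves a prefetcher little to work with.

```cpp
#include <cassert>
#include <vector>

// Contiguous storage: elements sit next to each other, so the access
// pattern is linear and easy for a hardware prefetcher to follow.
long long sum_contiguous(const std::vector<int>& values) {
    long long sum = 0;
    for (int v : values) sum += v;
    return sum;
}

// Hypothetical singly linked node, used only for this illustration.
struct Node {
    int value;
    const Node* next;
};

// Pointer chasing: each step dereferences `n->next`, and that load may
// miss the cache if the nodes are scattered in memory.
long long sum_linked(const Node* head) {
    long long sum = 0;
    for (const Node* n = head; n != nullptr; n = n->next) sum += n->value;
    return sum;
}

// Usage: both containers hold 1, 2, 3 and must produce the same sum.
void check_linear_vs_linked() {
    const std::vector<int> values{1, 2, 3};
    const Node c{3, nullptr};
    const Node b{2, &c};
    const Node a{1, &b};
    assert(sum_contiguous(values) == sum_linked(&a));
}
```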
  163. 163. How to Create a Good Program?
  164. 164. Simple
  165. 165. Simple Know your target hardware
  166. 166. Simple Know your target hardware Know your data
  167. 167. Simple Know your target hardware Know your data Use your brain
  168. 168. One More Thing...
  169. 169. One More Thing... Data-Oriented Design
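The slide names Data-Oriented Design without showing an example. One common illustration, given here as an assumption rather than as the talk's own material, is switching from an array of structures to a structure of arrays so that a loop loads only the field it needs. The `Particle` and `Particles` types are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Array of structures (AoS): the fields of each particle are interleaved,
// so a loop over `mass` also drags x, y, z and id through the cache.
struct Particle {
    float x, y, z;
    float mass;
    int id;
};

float total_mass_aos(const std::vector<Particle>& particles) {
    float total = 0.0f;
    for (const Particle& p : particles) total += p.mass;
    return total;
}

// Structure of arrays (SoA): each field is stored contiguously, so the
// same loop reads one dense array of floats.
struct Particles {
    std::vector<float> x, y, z;
    std::vector<float> mass;
    std::vector<int> id;

    void add(const Particle& p) {
        x.push_back(p.x);
        y.push_back(p.y);
        z.push_back(p.z);
        mass.push_back(p.mass);
        id.push_back(p.id);
    }
};

float total_mass_soa(const Particles& particles) {
    float total = 0.0f;
    for (float m : particles.mass) total += m;
    return total;
}

// Usage: the same two particles in both layouts give the same total mass.
void check_layouts() {
    const std::vector<Particle> aos{{0.0f, 0.0f, 0.0f, 1.5f, 1},
                                    {1.0f, 2.0f, 3.0f, 2.5f, 2}};
    Particles soa;
    for (const Particle& p : aos) soa.add(p);
    assert(total_mass_aos(aos) == 4.0f);
    assert(total_mass_soa(soa) == 4.0f);
}
```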
  170. 170. Thank You!
  171. 171. Questions?
  172. 172. References Ulrich Drepper, “What Every Programmer Should Know About Memory”; Kris Kaspersky, “Code Optimization: Effective Memory Usage” (Крис Касперски, “Техника оптимизации программ. Эффективное использование памяти”); @mike_acton
