The Quantum Physics of Java

"If we were able to take a microscope and observe how our programs work on the lowest level, we would be surprised and shocked. Close to the wire, programs behave very differently from what we expect.

In this session we will go through code examples that show the counter-intuitive behavior of Java on the microscopic scale. We will take a detailed look at the underlying technology that causes the surprising behavior and at how we can measure our programs on the lowest level. Topics covered will be the cache hierarchy, false sharing, pipelining, branch prediction, and out-of-order execution.

After this talk you will have a good understanding of how modern CPUs work and how this can affect the performance of your programs."

Transcript

  • 1. The Quantum Physics of Java. Michael Heinrichs, Canoo Engineering AG
  • 2. for (int i = 0; i < n; i++) { a[i] *= 3; } vs. for (int i = 0; i < n; i += 16) { a[i] *= 3; } Timings: 26.1 ms, 25.8 ms
  • 3. Michael Heinrichs, http://blog.netopyr.com, @net0pyr. Canoo: delivering end-user happiness
  • 4. for (int i = 0; i < n; i++) { a[i] *= 3; } vs. for (int i = 0; i < n; i += 16) { a[i] *= 3; } Timings: 26.1 ms, 25.8 ms
  • 5. CPU Main Memory
  • 6-12. [Animation: an array of 1s in main memory; the CPU loads one element at a time, multiplies it by 3, and writes the 3 back before moving on to the next element.]
  • 13. [Chart: performance of CPU and memory by year, 1980 to 2010; performance axis logarithmic, from 1 to 100000.]
  • 14. Main Memory CPU
  • 15. Main Memory Cache CPU
  • 16-23. [Animation: the same array with a cache between CPU and main memory; a block of neighboring elements is copied into the cache, and the CPU then multiplies elements from that block by 3 one after another.]
  • 24. for (int i = 0; i < n; i++) { a[i % a.length] *= 3; }
  • 25. [Chart: x-axis from 1 KB to 256 MB in doubling steps, y-axis from 0 to 7; axis titles and data series not preserved.]
  • 26. CPU Main Memory Cache
  • 27. Main Memory L1 Cache L2 Cache L3 Cache CPU
  • 28. [Chart: x-axis from 1 KB to 256 MB in doubling steps, y-axis from 0 to 7; axis titles and data series not preserved.]
  • 29. Not quite pints By radioedit (CC BY-SA 2.0)
  • 30. CPU; L1 Cache: < 1 ns, 32 KB; L2 Cache: 7 ns, 256 KB; L3 Cache: 25 ns, 8 MB; Main Memory: 100 ns, 16 GB
  • 31. for (int i = 0; i < n; i++) { a[i % a.length] *= 3; } for (int i = 0; i < n; i++) { a[rnd()] *= 3; }
  • 32. [Chart: x-axis from 1 KB to 256 MB in doubling steps, y-axis from 0 to 7; axis titles and data series not preserved.]
  • 33. [Chart: x-axis from 1 KB to 256 MB in doubling steps, y-axis from 0 to 25; axis titles and data series not preserved.]
  • 34. [Chart: x-axis from 1 KB to 256 MB in doubling steps, y-axis from 0 to 25; axis titles and data series not preserved.]
  • 35. perf
  • 36. perf stat application
  • 37. perf stat -p 1234 sleep 5
  • 38. Main Memory L1 Cache L2 Cache L3 Cache CPU
  • 39-42. [Animation: successive blocks of array elements are loaded from memory toward the CPU; finer detail not preserved in the transcript.]
  • 43. [Chart: x-axis from 1 KB to 256 MB in doubling steps, y-axis from 0 to 25; axis titles and data series not preserved.]
  • 44. int a; Thread 1: a++; Thread 2: return a; Throughput: 2.1 ops/ns, 1.2 ops/ns
  • 45. Main Memory L2 Cache L3 Cache L1 Cache L1 Cache Core 1 Core 2
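  • 46. [Diagram: Core 1 and Core 2, each with its own L1 cache; one cache holds a.]
  • 47. [Diagram: Core 1 and Core 2, each with its own L1 cache; both caches hold a copy of a, marked with a question mark.]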
  • 48-54. [Animation over seven slides: the four cache-line states Modified, Exclusive, Shared, and Invalid (the MESI protocol).]
  • 55. [Diagram: Core 1 and Core 2, each with its own L1 cache; both caches hold a copy of a.]
  • 56. int a; int b; Thread 1: a++; Thread 2: return b; Throughput: 2.1 ops/ns, 1.2 ops/ns
  • 57. [Diagram: Core 1 and Core 2, each with its own L1 cache; the caches hold a and b.]
  • 58. [Diagram: Core 1 and Core 2, each with its own L1 cache; each cache holds both a and b.]
  • 59. Given: sorted int[16]. Linear or Binary Search? Timings: 21.8 ns, 28.2 ns
  • 60-69. [Animation: instructions A to E pass through a four-stage pipeline (Fetch, Decode, Execute, Write-back); once the pipeline is full, each stage works on a different instruction.]
  • 70-77. [Animation: the same pipeline where A is followed by a question mark; B and C enter the pipeline behind A, and in the last step only A and E remain.]
  • 78. Binary Search while (low <= high) … if (midVal < needle) … else if (midVal > needle) … else …
  • 79. Linear Search for element : haystack if element == needle … else if (element > needle) …
  • 80. int a; int b; Variant 1: a *= 3; a *= 5; Variant 2: a *= 3; b *= 5; Timings: 2.3 ns, 2.1 ns
  • 81-94. [Animation: the pipeline (Fetch, Decode, Execute, Write-back) is extended with additional parallel Fetch and Execute units; instructions A, B, and C pass through the widened pipeline.]
  • 95. int a; int b; Variant 1: a *= 3; a *= 5; Variant 2: a *= 3; b *= 5; [Diagram: the two instructions of each variant, a *= 3 then a *= 5, and a *= 3 alongside b *= 5.]
  • 96. Know thy CPU (because sometimes it matters)
  • 97. This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
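
Slides 2 and 4 compare a loop that touches every array element with one that touches only every 16th element, and report nearly identical times. The following is a minimal sketch of that comparison, not the speaker's benchmark. The class name StrideSketch, the method name, and the array size are hypothetical. That 16 ints (64 bytes) span exactly one cache line is an assumption about the hardware; the slides do not state it. The timing in main ignores JIT warm-up, so the printed numbers are rough at best; a harness such as JMH would be the more careful choice.

```java
final class StrideSketch {
    // Multiplies every step-th element by 3 and returns how many elements were written.
    static int multiplyWithStride(int[] a, int step) {
        int touched = 0;
        for (int i = 0; i < a.length; i += step) {
            a[i] *= 3;
            touched++;
        }
        return touched;
    }

    public static void main(String[] args) {
        // Assumption: 16M ints (64 MB) is larger than the last-level cache.
        int n = 16 * 1024 * 1024;
        for (int step : new int[] {1, 16}) {
            int[] a = new int[n];
            java.util.Arrays.fill(a, 1);
            long start = System.nanoTime();
            int touched = multiplyWithStride(a, step);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("step " + step + ": " + touched + " writes, " + elapsedMs + " ms");
        }
    }
}
```

If the cache-line assumption holds, both loops pull the same number of cache lines from memory even though the second performs one sixteenth of the multiplications, which would be consistent with the near-equal times on the slide.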
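
Slides 24 and 31 show two access patterns over an array: sequential with wrap-around (a[i % a.length]) and random (a[rnd()]). The sketch below runs both over growing array sizes. The slides do not show rnd(), so a simple xorshift generator stands in for it here as an assumption; the class name WorkingSetSketch, the access count, and the size range (capped at 64 MB to stay within a default heap) are also hypothetical. The modulo operation and JIT warm-up both add to the measured time, so only the trend across sizes is meaningful.

```java
final class WorkingSetSketch {
    // Sequential access that wraps around, as on slide 24.
    static void sequential(int[] a, int n) {
        for (int i = 0; i < n; i++) {
            a[i % a.length] *= 3;
        }
    }

    // Random access, as on slide 31. The xorshift generator is a stand-in for the slide's rnd().
    static void random(int[] a, int n) {
        int x = 0x9E3779B9; // any non-zero seed works for xorshift
        for (int i = 0; i < n; i++) {
            x ^= x << 13;
            x ^= x >>> 17;
            x ^= x << 5;
            a[(x & 0x7fffffff) % a.length] *= 3; // mask keeps the index non-negative
        }
    }

    public static void main(String[] args) {
        int accesses = 1 << 26; // hypothetical access count, identical for every size
        for (int bytes = 1 << 10; bytes <= 1 << 26; bytes <<= 1) {
            int[] a = new int[bytes / 4];
            java.util.Arrays.fill(a, 1);
            long start = System.nanoTime();
            sequential(a, accesses);
            long seq = System.nanoTime() - start;
            start = System.nanoTime();
            random(a, accesses);
            long rnd = System.nanoTime() - start;
            System.out.printf("%8d KB: sequential %.2f ns/access, random %.2f ns/access%n",
                    bytes / 1024, (double) seq / accesses, (double) rnd / accesses);
        }
    }
}
```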
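
Slides 44 and 56 describe one thread incrementing a while a second thread reads either a or a neighboring field b. The sketch below covers the second case (slide 56), which is the setup usually called false sharing. It is an illustration under assumptions, not the speaker's benchmark: the class name FalseSharingSketch and the iteration count are hypothetical, the fields are declared volatile here so that the JIT cannot hoist the accesses out of the loops, and whether a and b actually share a cache line depends on the JVM's field layout, which this code cannot guarantee.

```java
final class FalseSharingSketch {
    // Two adjacent fields; likely, but not guaranteed, to sit on the same cache line.
    static final class Shared {
        volatile long a;
        volatile long b;
    }

    // Runs one writer on a and one reader on b; returns the final value of a.
    static long run(int iterations) {
        Shared s = new Shared();
        long[] sink = new long[1]; // keeps the reader's result alive
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                s.a++; // Thread 1: a++ (single writer, so the non-atomic ++ is safe here)
            }
        });
        Thread reader = new Thread(() -> {
            long sum = 0;
            for (int i = 0; i < iterations; i++) {
                sum += s.b; // Thread 2: return b
            }
            sink[0] = sum;
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.a;
    }

    public static void main(String[] args) {
        int iterations = 100_000_000; // hypothetical iteration count
        long start = System.nanoTime();
        long result = run(iterations);
        long elapsed = System.nanoTime() - start;
        System.out.println("a = " + result + ", " + (double) iterations / elapsed + " ops/ns per thread");
    }
}
```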
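
Slides 59, 78, and 79 ask whether linear or binary search is faster on a sorted int[16] and outline both algorithms in pseudocode. The sketch below fills in those outlines as runnable Java. The class and method names are hypothetical, and returning -1 for a missing needle is a choice made here, since the slides elide the bodies. The code shows only the branch structure that the slides contrast; it makes no claim about which of the two timings on slide 59 belongs to which algorithm.

```java
final class SearchSketch {
    // Linear search over a sorted array, following the outline on slide 79.
    static int linearSearch(int[] haystack, int needle) {
        for (int i = 0; i < haystack.length; i++) {
            int element = haystack[i];
            if (element == needle) {
                return i;
            } else if (element > needle) {
                return -1; // sorted input: the needle cannot appear further right
            }
        }
        return -1;
    }

    // Binary search, following the outline on slide 78.
    static int binarySearch(int[] haystack, int needle) {
        int low = 0;
        int high = haystack.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids overflow of low + high
            int midVal = haystack[mid];
            if (midVal < needle) {
                low = mid + 1;
            } else if (midVal > needle) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] haystack = new int[16];
        for (int i = 0; i < haystack.length; i++) {
            haystack[i] = 2 * i;
        }
        System.out.println(linearSearch(haystack, 10) + " " + binarySearch(haystack, 10));
    }
}
```

In the linear loop both comparisons go the same way on almost every iteration, whereas the direction taken in the binary search depends on the needle at every step; this is the kind of difference the branch prediction part of the talk is concerned with.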
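
Slides 80 and 95 contrast two instruction pairs: a *= 3; a *= 5; where the second multiplication needs the result of the first, and a *= 3; b *= 5; where the two are independent. A minimal sketch of both variants in a loop follows. The class name DependencySketch, the method names, and the iteration count are hypothetical. The JIT compiler is free to rewrite either loop (for example by combining the two multiplications), so this naive timing may not reproduce the gap on the slide.

```java
final class DependencySketch {
    // Variant 1: each multiplication depends on the previous one.
    static int dependent(int iterations) {
        int a = 1;
        for (int i = 0; i < iterations; i++) {
            a *= 3;
            a *= 5;
        }
        return a;
    }

    // Variant 2: the two multiplications in one iteration are independent of each other.
    static int independent(int iterations) {
        int a = 1;
        int b = 1;
        for (int i = 0; i < iterations; i++) {
            a *= 3;
            b *= 5;
        }
        return a + b; // combine both so neither computation is dead code
    }

    public static void main(String[] args) {
        int iterations = 100_000_000; // hypothetical iteration count
        long start = System.nanoTime();
        int r1 = dependent(iterations);
        long t1 = System.nanoTime() - start;
        start = System.nanoTime();
        int r2 = independent(iterations);
        long t2 = System.nanoTime() - start;
        System.out.println("dependent:   " + (double) t1 / iterations + " ns/iteration (result " + r1 + ")");
        System.out.println("independent: " + (double) t2 / iterations + " ns/iteration (result " + r2 + ")");
    }
}
```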