The Quantum Physics of Java
Michael Heinrichs
Canoo Engineering AG
for (int i = 0; i < n; i++) {
    a[i] *= 3;
}
26.1 ms

for (int i = 0; i < n; i += 16) {
    a[i] *= 3;
}
25.8 ms
Michael Heinrichs
http://blog.netopyr.com
@net0pyr
Canoo
delivering end-user happiness
[Animation: CPU and Main Memory. Without a cache, the CPU loads one array element from main memory, multiplies it by 3, writes the result back to main memory, and then repeats this for the next element.]
[Chart: CPU and memory performance by year, 1980 to 2010; performance axis logarithmic, from 1 to 100000.]
[Diagram: a Cache is placed between CPU and Main Memory.]
[Animation: on the first access, several neighbouring array elements are copied from main memory into the cache together. The CPU then loads, multiplies and writes back one element after another in the cache, while main memory still shows the old values.]
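The two loops from the opening slide can be turned into a small program to try on your own machine. This is a minimal sketch, not the benchmark used in the talk: the class name StrideDemo, the array size and the warm-up count are assumptions. The stride of 16 fits a common hardware layout, since 16 ints of 4 bytes each span 64 bytes, a typical cache line size, so both loops would touch every cache line of the array once.

```java
import java.util.Arrays;

final class StrideDemo {

    /** Multiplies every stride-th element by 3 and returns how many elements were touched. */
    static int multiplyEvery(int[] a, int stride) {
        int touched = 0;
        for (int i = 0; i < a.length; i += stride) {
            a[i] *= 3;   // int overflow simply wraps around in Java
            touched++;
        }
        return touched;
    }

    /** Runs one pass over the array and returns the elapsed wall-clock time in milliseconds. */
    static double timeMillis(int[] a, int stride) {
        long start = System.nanoTime();
        multiplyEvery(a, stride);
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        int n = 16 * 1024 * 1024;   // 64 MB of ints; assumed size, not taken from the slides
        int[] a = new int[n];
        Arrays.fill(a, 1);

        // A few warm-up passes so that the JIT compiler has compiled the loop before timing.
        for (int round = 0; round < 5; round++) {
            multiplyEvery(a, 1);
            multiplyEvery(a, 16);
        }

        System.out.printf("stride 1:  %.1f ms%n", timeMillis(a, 1));
        System.out.printf("stride 16: %.1f ms%n", timeMillis(a, 16));
    }
}
```

Absolute numbers depend on the CPU, the JVM and the array size. For anything beyond a quick experiment, a benchmark harness such as JMH is the safer choice.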
for (int i = 0; i < n; i++) {
    a[i % a.length] *= 3;
}
[Chart: results of the loop above for array sizes from 1 KB to 256 MB, doubling at each step; vertical axis from 0 to 7.]
[Diagram: the single Cache between CPU and Main Memory is refined into L1 Cache, L2 Cache and L3 Cache.]
Not quite pints
By radioedit (CC BY-SA 2.0)
Level          Latency    Size
L1 Cache       < 1 ns     32 KB
L2 Cache       7 ns       256 KB
L3 Cache       25 ns      8 MB
Main Memory    100 ns     16 GB
for (int i = 0; i < n; i++) {
    a[i % a.length] *= 3;
}

for (int i = 0; i < n; i++) {
    a[rnd()] *= 3;
}
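To reproduce the comparison between the two loops, the sketch below runs both over arrays of growing size. Several things here are assumptions rather than part of the slides: the class name AccessPatternDemo, the number of accesses per measurement, the upper limit of 64 MB (chosen to stay within a default heap), and above all the inline xorshift generator that stands in for the unspecified rnd().

```java
import java.util.Arrays;

final class AccessPatternDemo {

    /** Sequential variant from the slide: walks the array in order, wrapping around. */
    static void sequential(int[] a, int n) {
        for (int i = 0; i < n; i++) {
            a[i % a.length] *= 3;
        }
    }

    /**
     * Random variant. The slide's rnd() is not specified; a xorshift64 generator
     * is used here as a cheap stand-in. The seed must be non-zero.
     */
    static void random(int[] a, int n, long seed) {
        long x = seed;
        for (int i = 0; i < n; i++) {
            x ^= x << 13;
            x ^= x >>> 7;
            x ^= x << 17;
            int index = (int) ((x >>> 1) % a.length);   // x >>> 1 is never negative
            a[index] *= 3;
        }
    }

    public static void main(String[] args) {
        int n = 1 << 24;   // accesses per measurement; assumed value
        for (int bytes = 1024; bytes <= 64 * 1024 * 1024; bytes *= 2) {
            int[] a = new int[bytes / Integer.BYTES];
            Arrays.fill(a, 1);

            long t0 = System.nanoTime();
            sequential(a, n);
            long t1 = System.nanoTime();
            random(a, n, 42L);
            long t2 = System.nanoTime();

            System.out.printf("%8d KB  sequential %.2f ns/access  random %.2f ns/access%n",
                    bytes / 1024, (t1 - t0) / (double) n, (t2 - t1) / (double) n);
        }
    }
}
```

The first few sizes also include JIT warm-up, so the early rows are less trustworthy than the later ones.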
[Charts: results of the two loops above for array sizes from 1 KB to 256 MB, doubling at each step. The first chart has a vertical axis from 0 to 7, the following two from 0 to 25.]
perf
perf stat application
perf stat -p 1234 sleep 5
[Diagram: CPU, L1 Cache, L2 Cache, L3 Cache, Main Memory.]
[Animation: a group of four elements is copied into the cache and the CPU reads the first of them; a second group of four elements is then copied into the cache as well.]
[Chart: results for array sizes from 1 KB to 256 MB; vertical axis from 0 to 25.]
int a;
Thread 1: a++;
Thread 2: return a;
2.1 ops/ns   1.2 ops/ns
[Diagram: Core 1 and Core 2, each with its own L1 Cache, in front of L2 Cache, L3 Cache and Main Memory.]
[Animation: a is loaded into one L1 cache; then a appears in both L1 caches, marked with a question mark.]
Cache line states: Modified, Exclusive, Shared, Invalid
[Animation: the four states are stepped through over several slides, ending with a present in both L1 caches.]
int a;
int b;
Thread 1: a++;
Thread 2: return b;
2.1 ops/ns   1.2 ops/ns
[Diagram: one L1 cache holds a, the other holds b. In the next step each L1 cache holds a and b together.]
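The effect on the last two diagrams, two unrelated variables ending up in the same cache line, can be explored with the sketch below. It is not the code behind the slide's numbers. Instead of two plain int fields it uses slots of an AtomicLongArray, because array elements are laid out contiguously while the JVM is free to arrange fields, and because atomic accesses cannot be optimised away by the JIT compiler. The class name FalseSharingDemo, the slot indices and the iteration count are assumptions.

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class FalseSharingDemo {

    /**
     * One thread increments slot writeIndex while a second thread reads slot readIndex.
     * Returns the elapsed time in nanoseconds until both threads have finished.
     */
    static long run(AtomicLongArray slots, int writeIndex, int readIndex, int iterations) {
        long[] sink = new long[1];
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                slots.incrementAndGet(writeIndex);
            }
        });
        Thread reader = new Thread(() -> {
            long sum = 0;
            for (int i = 0; i < iterations; i++) {
                sum += slots.get(readIndex);
            }
            sink[0] = sum;   // store the result so the reads have a visible effect
        });

        long start = System.nanoTime();
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker threads", e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 50_000_000;   // assumed value
        for (int round = 0; round < 3; round++) {
            // Slots 0 and 1 are 8 bytes apart and very likely share a cache line.
            long near = run(new AtomicLongArray(64), 0, 1, iterations);
            // Slots 0 and 32 are 256 bytes apart, further than a typical 64-byte line.
            long far = run(new AtomicLongArray(64), 0, 32, iterations);
            System.out.printf("adjacent slots: %d ms, distant slots: %d ms%n",
                    near / 1_000_000, far / 1_000_000);
        }
    }
}
```

If the adjacent-slot run is noticeably slower than the distant-slot run, the reader is paying for cache line invalidations caused by a variable it never touches.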
Given: sorted int[16]
Linear or Binary Search?
21.8 ns   28.2 ns
[Animation: instructions A to E and a CPU with a four-stage pipeline. The stages Fetch, Decode, Execute and Write-back are introduced one by one with instruction A. The instructions then move through the pipeline in an overlapping fashion: while A is decoded, B is fetched, and so on. In the final sequence a question mark appears next to A; B and C follow A into the pipeline; in the last frame only A and E remain.]
Binary Search

while (low <= high)
    …
    if (midVal < needle)
        …
    else if (midVal > needle)
        …
    else
        …

Linear Search

for element : haystack
    if (element == needle)
        …
    else if (element > needle)
        …
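The two outlines above leave their bodies open. One way to fill them in is sketched below; the bodies, the class name SearchDemo and the convention of returning -1 for a missing needle are assumptions, not taken from the slides. In the linear version, both comparisons have the same outcome on every iteration until the search ends, which makes the branches easy to predict. In the binary version, the direction taken at each step depends on the data.

```java
final class SearchDemo {

    /** Linear search in a sorted array; gives up as soon as an element exceeds the needle. */
    static int linearSearch(int[] haystack, int needle) {
        for (int i = 0; i < haystack.length; i++) {
            int element = haystack[i];
            if (element == needle) {
                return i;
            } else if (element > needle) {
                return -1;   // sorted input: the needle cannot appear further right
            }
        }
        return -1;
    }

    /** Classic binary search in a sorted array. */
    static int binarySearch(int[] haystack, int needle) {
        int low = 0;
        int high = haystack.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;   // unsigned shift avoids overflow of low + high
            int midVal = haystack[mid];
            if (midVal < needle) {
                low = mid + 1;
            } else if (midVal > needle) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] haystack = new int[16];   // sorted int[16], as on the slide
        for (int i = 0; i < haystack.length; i++) {
            haystack[i] = 2 * i;
        }
        for (int needle = -1; needle <= 31; needle++) {
            int linear = linearSearch(haystack, needle);
            int binary = binarySearch(haystack, needle);
            if (linear != binary) {
                throw new AssertionError("mismatch for needle " + needle);
            }
        }
        System.out.println("linear and binary search agree on all needles");
    }
}
```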
int a;
int b;

a *= 3;
a *= 5;
2.3 ns

a *= 3;
b *= 5;
2.1 ns
[Animation: the four-stage pipeline (Fetch, Decode, Execute, Write-back) is extended with additional Execute units and then additional Fetch units, three of each. Instructions A, B and C are fetched together and move through the pipeline; in the later frames they leave it one after another.]
[Diagram: a *= 3 and a *= 5 shown one after the other; a *= 3 and b *= 5 shown in parallel.]
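The difference between the two instruction pairs can be tried out with the sketch below. In the first method the second multiplication needs the result of the first, so the two form a dependency chain; in the second method they work on different variables and a CPU with several execution units is free to overlap them. The class name DependencyDemo, the loop around the statements and the iteration count are assumptions. The JIT compiler may also transform these loops, so a harness such as JMH is preferable for serious measurements.

```java
final class DependencyDemo {

    /** Both multiplications update a: the second has to wait for the first. */
    static int dependent(int a, int iterations) {
        for (int i = 0; i < iterations; i++) {
            a *= 3;
            a *= 5;
        }
        return a;
    }

    /** The multiplications update different variables and do not depend on each other. */
    static int independent(int a, int b, int iterations) {
        for (int i = 0; i < iterations; i++) {
            a *= 3;
            b *= 5;
        }
        return a + b;   // combine both so that neither computation is dead code
    }

    public static void main(String[] args) {
        int iterations = 100_000_000;   // assumed value
        int sink = 0;
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            sink += dependent(1, iterations);
            long t1 = System.nanoTime();
            sink += independent(1, 1, iterations);
            long t2 = System.nanoTime();
            System.out.printf("dependent: %.2f ns/iteration, independent: %.2f ns/iteration%n",
                    (t1 - t0) / (double) iterations, (t2 - t1) / (double) iterations);
        }
        System.out.println("checksum (ignore): " + sink);
    }
}
```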
Know thy CPU
(because sometimes it matters)
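This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.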
"If we were able to take a microscope and observe how our programs work on the lowest level, we would be surprised and shocked. Close to the wire, programs behave very differently from what we expect.

In this session we will go through code examples that show the counter-intuitive behavior of Java on the microscopic scale. We will take a detailed look at how the underlying technology works that causes the surprising behavior and how we can measure our programs on the lowest level. Topics covered will be the cache hierarchy, false sharing, pipelining, branch prediction, and out-of-order execution.

After this talk you will have a good understanding of how modern CPUs work and how this can affect the performance of your programs."
