Top 10 - Performance Folklore

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1esNaCc.

Martin Thompson discusses Java, concurrency, operating systems, and functional programming in the context of designing and testing high-performance systems. Filmed at qconsf.com.

Martin Thompson is a high-performance and low-latency specialist, with experience gained over two decades working on large scale transactional and big-data systems. He believes in Mechanical Sympathy, i.e. applying an understanding of the hardware to the creation of software as being fundamental to delivering elegant high-performance solutions.


Transcript

  • 1. Top 10 Performance Folklore Martin Thompson - @mjpt777
  • 2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/top-10-performance-myths InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  • 3. Presented at QCon San Francisco www.qconsf.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. “… as I’d like to show Galileo our world, I must show him something with a great deal of shame. If we look away from the science and look at the world around us, we find out something rather pitiful: that the environment that we live in is so actively, intensely unscientific.” – Richard Feynman
  • 5. “How would you design an experiment to answer that?” – Melville Arthur Feynman
  • 6. Folklore • A body of widely held but false or unsubstantiated beliefs
  • 7. Performance Folklore
  • 8. 10 “Disk is random access”
  • 9. Zone Bit Recording (ZBR)
  • 10. Zone Bit Recording (ZBR)
  • 11. IO = Cmd + Seek + Rotation + Transfer
  • 12. IO = Cmd + Seek + Rotation + Transfer Random 4K Block ~6ms = 0.1 + 3 + 3 + 0.02
  • 13. IO = Cmd + Seek + Rotation + Transfer Random 4K Block ~6ms = 0.1 + 3 + 3 + 0.02 ~664KB/s vs. ~200MB/s !!!
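The arithmetic on slides 11 to 13 can be checked with a few lines of Python. This is a minimal sketch using only the component times quoted on the slide; the constant and function names are illustrative, and taking 1 KB as 1000 bytes is an assumption, so the result lands in the same range as the slide's ~664KB/s rather than matching it exactly.

```python
# Component times for one random 4K read, in milliseconds (from the slide).
COMMAND_MS = 0.1
SEEK_MS = 3.0
ROTATION_MS = 3.0
TRANSFER_MS = 0.02

BLOCK_BYTES = 4 * 1024          # one 4K block
SEQUENTIAL_MB_PER_S = 200.0     # sequential figure quoted on the slide


def random_io_ms():
    """IO = Cmd + Seek + Rotation + Transfer."""
    return COMMAND_MS + SEEK_MS + ROTATION_MS + TRANSFER_MS


def random_throughput_kb_per_s():
    """Throughput when every 4K block pays the full random-access cost.

    Assumes 1 KB = 1000 bytes.
    """
    seconds_per_block = random_io_ms() / 1000.0
    return BLOCK_BYTES / seconds_per_block / 1000.0


random_kb = random_throughput_kb_per_s()
ratio = (SEQUENTIAL_MB_PER_S * 1000.0) / random_kb

print(f"random 4K IO: {random_io_ms():.2f} ms")
print(f"random throughput: {random_kb:.0f} KB/s")
print(f"sequential is roughly {ratio:.0f}x faster")
```

Seek and rotation account for almost all of the time, which is why the same disk is hundreds of times slower when accessed randomly.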
  • 14. 9 “CPUs are not getting any faster”
  • 15. “CPUs are not getting any faster?” Text tokenization of “Alice in Wonderland”
  • 16. “CPUs are not getting any faster?” Text tokenization of “Alice in Wonderland”
        Micro Architecture | Model               | Ops/sec
        Core               | P8600 @ 2.40GHz     | 1434
        Nehalem            | E5620 @ 2.40GHz     | 1768
        Sandy Bridge       | i7-2720QM @ 2.20GHz | 2674
  • 17. Nehalem 2.8GHz
        $ perf stat <program>
           6975.000345 task-clock                #    1.166 CPUs utilized
                 2,065 context-switches          #    0.296 K/sec
                   126 CPU-migrations            #    0.018 K/sec
                14,348 page-faults               #    0.002 M/sec
        22,952,576,506 cycles                    #    3.291 GHz
         7,035,973,150 stalled-cycles-frontend   #   30.65% frontend cycles idle
         8,778,857,971 stalled-cycles-backend    #   38.25% backend cycles idle
        35,420,228,726 instructions              #    1.54  insns per cycle
                                                 #    0.25  stalled cycles per insn
         6,793,566,368 branches                  #  973.988 M/sec
           285,888,040 branch-misses             #    4.21% of all branches
           5.981211788 seconds time elapsed
  • 18. Nehalem 2.8GHz: same perf stat output, highlighting “5.981211788 seconds time elapsed”
  • 19. Nehalem 2.8GHz: same perf stat output, highlighting 30.65% frontend cycles idle, 38.25% backend cycles idle and 1.54 insns per cycle
  • 20. Sandy Bridge 2.4GHz
        $ perf stat <program>
           5888.817958 task-clock                #    1.180 CPUs utilized
                 2,091 context-switches          #    0.355 K/sec
                   211 CPU-migrations            #    0.036 K/sec
                14,148 page-faults               #    0.002 M/sec
        19,026,773,297 cycles                    #    3.231 GHz
         5,117,688,998 stalled-cycles-frontend   #   26.90% frontend cycles idle
         4,006,936,100 stalled-cycles-backend    #   21.06% backend cycles idle
        35,396,514,536 instructions              #    1.86  insns per cycle
                                                 #    0.14  stalled cycles per insn
         6,793,131,675 branches                  # 1153.565 M/sec
           186,362,065 branch-misses             #    2.74% of all branches
           4.988868680 seconds time elapsed
  • 21. Sandy Bridge 2.4GHz: same perf stat output, highlighting “4.988868680 seconds time elapsed”
  • 22. Sandy Bridge 2.4GHz: same perf stat output, highlighting 26.90% frontend cycles idle, 21.06% backend cycles idle and 1.86 insns per cycle
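The derived figures in the two perf stat runs can be recomputed from the raw counters. The sketch below copies the counter values from the slides; the dictionary layout and function names are illustrative only. It suggests where the improvement comes from: both runs retire nearly the same number of instructions at a similar clock rate, but the Sandy Bridge run stalls less and so needs fewer cycles.

```python
# Raw counters copied from the perf stat output on the slides.
RUNS = {
    "Nehalem": {
        "cycles": 22_952_576_506,
        "instructions": 35_420_228_726,
        "stalled_backend": 8_778_857_971,
        "elapsed_s": 5.981211788,
    },
    "Sandy Bridge": {
        "cycles": 19_026_773_297,
        "instructions": 35_396_514_536,
        "stalled_backend": 4_006_936_100,
        "elapsed_s": 4.988868680,
    },
}


def instructions_per_cycle(run):
    """IPC = instructions retired / cycles consumed."""
    return run["instructions"] / run["cycles"]


def backend_stall_fraction(run):
    """Share of cycles in which the backend was stalled."""
    return run["stalled_backend"] / run["cycles"]


for name, run in RUNS.items():
    print(
        f"{name}: IPC {instructions_per_cycle(run):.2f}, "
        f"backend stalls {backend_stall_fraction(run):.2%}"
    )

speedup = RUNS["Nehalem"]["elapsed_s"] / RUNS["Sandy Bridge"]["elapsed_s"]
print(f"elapsed-time speedup: {speedup:.2f}x")
```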
  • 23. 8 “Memory is random access”
  • 24. Temporal
  • 25. Temporal Spatial
  • 26. Temporal Spatial Pattern
  • 27. Latencies measured by SiSoftware, Intel i7-3960X (Sandy Bridge E)
               | Sequential | In-Page Random | Full Random
        L1D    | 3 clocks   | 3 clocks       | 3 clocks
        L2     | 11 clocks  | 11 clocks      | 11 clocks
        L3     | 14 clocks  | 18 clocks      | 38 clocks
        Memory | 6.0 ns     | 22.0 ns        | 65.8 ns
  • 28. Memory Access Patterns Really Matter!!!
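To make the three column headings in the latency table concrete, the sketch below generates one index order per access pattern over the same array. The definitions are my reading of the headings, not taken from the benchmark itself: "in-page random" is assumed to mean random order within a 4 KB page while pages are visited in order. The page size, element size and all names are assumptions. In CPython the interpreter overhead hides most of the hardware effect, so this shows the patterns rather than reproducing the latencies.

```python
import random
from array import array

PAGE_BYTES = 4096                      # assumed page size
ELEMENT_BYTES = 8                      # array('q') stores 8-byte integers
PER_PAGE = PAGE_BYTES // ELEMENT_BYTES


def sequential(n):
    """Visit elements in address order."""
    return list(range(n))


def in_page_random(n, rng):
    """Visit pages in order, but elements within each page in random order."""
    order = []
    for start in range(0, n, PER_PAGE):
        page = list(range(start, min(start + PER_PAGE, n)))
        rng.shuffle(page)
        order.extend(page)
    return order


def full_random(n, rng):
    """Visit elements in a random order across the whole array."""
    order = list(range(n))
    rng.shuffle(order)
    return order


def walk(data, order):
    """Touch every element once in the given order and return the sum."""
    total = 0
    for index in order:
        total += data[index]
    return total


n = 4 * PER_PAGE
data = array("q", range(n))
rng = random.Random(42)
patterns = {
    "sequential": sequential(n),
    "in-page random": in_page_random(n, rng),
    "full random": full_random(n, rng),
}
for name, order in patterns.items():
    print(name, walk(data, order))
```

Each pattern does the same work and produces the same sum; only the order of memory touches differs, which is the variable the table isolates.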
  • 29. 7 “Macs are a good development platform”
  • 30. 6 “Garbage Collection takes away the worry of memory management”
  • 31. $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | wc -l
  • 32. $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | wc -l ... 770
  • 33. $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | wc -l ... 770 At least 272 are GC related!!!
  • 34. G1GC
  • 35. G1GC
  • 36. Native Heap – Beyond the city walls...
  • 37. 5 “Functional Programming solves the concurrency problem”
  • 38. “The runtime can optimize immutable values...” – FP Fanboi
  • 39. Persistent Data Structures
  • 40. [Diagram: a Ref pointing to nested HAMTs: HAMT → * Market → HAMT → * Venue → HAMT → * Instrument → HAMT → * Order]
  • 41. [Same diagram, with “<< CAS Failure? >>” marked at the Ref]
  • 42. [Same diagram, with “<< CAS Failure? >>” at the Ref and the labels “Threads” and “Model Depth”]
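The "CAS Failure?" question can be illustrated with a small simulation. This is a sketch under stated assumptions, not the speaker's code: plain dicts copied along the update path stand in for HAMTs, and the `Ref` class is a hypothetical lock-based stand-in for an atomic reference, since Python has no compare-and-set primitive. Every update rebuilds the path from the order up to the root and then tries to swap the single root reference; if another thread got there first, all of that work is thrown away and repeated.

```python
import threading


class Ref:
    """Hypothetical stand-in for an atomic reference with compare-and-set."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


def with_order(root, market, venue, instrument, order_id, order):
    """Path-copying update: returns a new root and leaves the old one intact."""
    venues = root.get(market, {})
    instruments = venues.get(venue, {})
    orders = instruments.get(instrument, {})
    new_orders = {**orders, order_id: order}
    new_instruments = {**instruments, instrument: new_orders}
    new_venues = {**venues, venue: new_instruments}
    return {**root, market: new_venues}


def add_order(ref, market, venue, instrument, order_id, order):
    """Retry loop: rebuild the path and CAS the root until it succeeds."""
    retries = 0
    while True:
        current = ref.get()
        updated = with_order(current, market, venue, instrument, order_id, order)
        if ref.compare_and_set(current, updated):
            return retries
        retries += 1  # another thread won; the copied path is now garbage


model = Ref({})
wasted = add_order(model, "MKT", "VENUE", "INSTR", "order-1", {"qty": 100})
print(model.get(), "retries:", wasted)
```

Two costs grow with the labels on slide 42: more threads raise the chance that the swap fails, and a deeper model makes each failed attempt more expensive in copying and in garbage.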
  • 43. 10 X Throughput
  • 44. 10 X Throughput && Garbage Collection
  • 45. 4 “Domain Models do not perform...”
  • 46. “Object models do not perform...” – Translation
  • 47. “Indirection is bad...” – Distillation
  • 48. Cohesion => Cache Friendly
  • 49. Feature Envy => Cache Misses
  • 50. Clean Code => Good Performance
  • 51. Specialised Indices for access paths
  • 52. Choose your data structures well
  • 53. 3 “Go parallel to scale...”
  • 54. Locks Work Pools Map Reduce Fork Join Parallel Collections
  • 55. Single Threaded Rocks!
  • 56. ... apply message passing ... ... then consider pipelining ...
  • 57. Go parallel only when you really need to!
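One way to read slides 55 and 56 is the single-writer pattern: one thread owns the state and everyone else sends it messages, so the state itself needs no locks. The sketch below is an illustration under that assumption; the function name, message format and counts are all hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling the owner thread to finish


def run_single_writer(producers=4, messages_per_producer=1000):
    """Several producers send integers; a single thread owns and mutates the totals."""
    inbox = queue.Queue()
    totals = {"count": 0, "sum": 0}  # only ever touched by the owner thread

    def owner():
        while True:
            message = inbox.get()
            if message is STOP:
                break
            totals["count"] += 1
            totals["sum"] += message

    def producer(base):
        for i in range(messages_per_producer):
            inbox.put(base + i)

    owner_thread = threading.Thread(target=owner)
    owner_thread.start()
    producer_threads = [
        threading.Thread(target=producer, args=(p * messages_per_producer,))
        for p in range(producers)
    ]
    for thread in producer_threads:
        thread.start()
    for thread in producer_threads:
        thread.join()
    inbox.put(STOP)      # all messages are queued before the sentinel
    owner_thread.join()
    return totals


result = run_single_writer()
print(result)
```

Pipelining extends the same idea: each stage is a single-threaded owner of its own state, connected to the next stage by a queue.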
  • 58. 2 “Logging is cheap...”
  • 59. Should a logger be able to generate data faster than a disk can write?
  • 60. Should more threads be able to generate more data to write?
  • 61. [Chart: Average Log Event Duration. Y-axis: Time (nanoseconds), 0 to 160,000. X-axis: 1 to 8, unlabeled in the transcript]
  • 62. Should logging be synchronous?
  • 63. Should we really be logging Strings?
  • 64. Should logging need guards?
  • 65. How reliable should logging be?
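Two of the questions on slides 62 to 64 map onto features of Python's standard `logging` module, used here purely as an illustration since the talk targets the JVM. A guard (`isEnabledFor`) avoids building an expensive message that would be discarded, and `QueueHandler` with `QueueListener` moves the final handler work onto a separate thread. The logger name, `ListHandler` and `expensive_snapshot` are hypothetical.

```python
import logging
import logging.handlers
import queue

calls = {"expensive": 0}


def expensive_snapshot():
    """Stands in for costly message construction."""
    calls["expensive"] += 1
    return "state=" + ",".join(str(i) for i in range(1000))


records = []


class ListHandler(logging.Handler):
    """Collects messages in memory; a real handler would write to disk."""

    def emit(self, record):
        records.append(record.getMessage())


log_queue = queue.Queue()
logger = logging.getLogger("folklore.demo")
logger.setLevel(logging.INFO)
logger.propagate = False
# The calling thread only enqueues the record ...
logger.addHandler(logging.handlers.QueueHandler(log_queue))
# ... and the listener thread runs the real handler.
listener = logging.handlers.QueueListener(log_queue, ListHandler())
listener.start()

# Guard: skip building the message when DEBUG is disabled.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("snapshot %s", expensive_snapshot())

logger.info("order %d accepted", 42)

listener.stop()  # drains the queue and joins the listener thread
print(records, calls)
```

This still logs strings, and an unbounded queue only hides the problem raised on slide 59: if events are produced faster than the disk can absorb them, something has to block, drop or grow.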
  • 66. 1 “Parsing code is highly optimised...”
  • 67. SOAP JSON XML HTTP FIX
  • 68. How much text is not UTF-8 or ASCII?
  • 69. We need primitive support for dealing with bytes and chars
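As an illustration of working on bytes directly, the sketch below pulls an integer field out of a FIX-style `tag=value` message without decoding anything to a string first. It is a minimal sketch, not a FIX parser: the sample message and function names are made up, only unsigned decimal fields are handled, and there is no checksum or length validation. FIX fields are delimited by the SOH byte (0x01).

```python
SOH = 0x01          # FIX field delimiter
EQUALS = ord("=")
ZERO = ord("0")


def parse_ascii_int(buf, start, end):
    """Parse an unsigned decimal integer from buf[start:end] without decoding."""
    value = 0
    for i in range(start, end):
        digit = buf[i] - ZERO
        if not 0 <= digit <= 9:
            raise ValueError(f"non-digit byte at offset {i}")
        value = value * 10 + digit
    return value


def find_int_field(buf, tag):
    """Return the integer value of the first field with this tag, or None."""
    pos = 0
    length = len(buf)
    while pos < length:
        eq = buf.find(EQUALS, pos)
        if eq == -1:
            break
        end = buf.find(SOH, eq + 1)
        if end == -1:
            end = length
        if parse_ascii_int(buf, pos, eq) == tag:
            return parse_ascii_int(buf, eq + 1, end)
        pos = end + 1
    return None


# Hypothetical message: tag 38 carries a quantity, tag 44 a price in ticks.
message = b"8=FIX.4.2\x0135=D\x0138=100\x0144=1250\x01"
print(find_int_field(message, 38))
print(find_int_field(message, 44))
```

Only the requested field's value is converted, and no intermediate string objects are created for the fields that are skipped.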
  • 70. What’s horrible in the JDK when parsing?
  • 71. “Where do I start?”
  • 72. Performance Testing
  • 73. Questions? Blog: http://mechanical-sympathy.blogspot.com/ Twitter: @mjpt777 “Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius, and a lot of courage, to move in the opposite direction.” - Albert Einstein