Maximizing Your SPARC T4 Oracle Solaris Application Performance

In this presentation, learn how Oracle Solaris customers and ISV partners have reached peak performance on Oracle’s new SPARC T4 servers and engineered systems with Oracle Solaris Studio. Learn about the latest Oracle Solaris Studio development tools for analyzing, reporting, and improving runtime performance, such as:

• Autoparallelizing, high-performance compilers

• Performance Analyzer (used to find performance hotspots)

• Thread Analyzer (to expose data races and deadlocks)

• Code Analyzer (used to discover latent memory corruption issues)

Explore the ways developers have been taking advantage of the full potential of the SPARC T4 multicore architecture and Oracle Solaris 11.

Published in: Technology

  1. Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  2. Maximizing Your SPARC T4 Oracle Solaris Application Performance
     § Darryl Gove, Senior Principal Software Engineer
  3. Program Agenda
     § Hardware
     § Correctness
     § Performance
     § Parallelism
  4. More Information
     § Download, technical articles and more: http://oracle.com/goto/solarisstudio
     OpenWorld Sessions
     § Mon, Oct 1, 10:45 - 11:45 AM: Maximizing Your SPARC T4 Oracle Solaris Application Performance, CON 6382 (Marriott Marquis - Golden Gate)
     § Mon, Oct 1, 3:15 - 4:15 PM: Technical Panel: Developing High Performance Applications on Oracle Solaris, CON 7196 (Marriott Marquis - Golden Gate)
     Hands-on Lab
     § Wed, Oct 3, 1:15 - 2:15 PM: Develop C/C++ Applications for the Cloud with Oracle Tuxedo and Oracle Solaris Studio, HOL 10276 (Marriott Marquis - Salon 5/6)
     JavaOne Sessions
     § Mon, Oct 1, 8:30 – 9:30 AM: Mixed-Language Development: Leveraging Native Code from Java, CON 6714 (Hilton San Francisco - Continental Ballroom 6)
     § Tues, Oct 2, 1:00 – 2:00 PM: Take Performance Tuning of Your Enterprise Java Applications to the Next Level, CON 10213 (Hilton San Francisco - Continental Ballroom 6)
  5. Oracle Solaris Studio
     Compiler Suite
     § C, C++ Compilers utilize advanced code generation technology to optimize apps for highest performance on SPARC & x86
     § Fortran Compiler optimizes compute intensive app performance
     § Debugger ensures app stability with event handling & multi-thread support
     § Performance Library maximizes compute-intensive app performance using advanced numeric solver libraries
     § Integrated Development Environment increases developer efficiency
     Analysis Suite
     § Performance Analyzer provides unparalleled insight into your app, allowing you to identify bottlenecks and improve performance by orders of magnitude
     § New Code Analyzer ensures app reliability by detecting app vulnerabilities, including memory leaks and memory access violations
     § Thread Analyzer simplifies complex parallel programming errors by detecting hard to pinpoint race and deadlock conditions
  6. Oracle Solaris Studio 12.3 Highlights
     Accelerate Performance
     Ø 3x faster code on SPARC T4 than GCC; 40% faster than Sun Studio 12
     Ø 1.5x faster code on Intel x86 than GCC; 20% faster than Sun Studio 12
     Gain Extreme Observability
     Ø New Code Analyzer for more reliable applications; reports common coding & memory access errors faster than competitive alternatives
     Ø Enhanced Performance Analyzer with system-wide performance analysis
     Improve Productivity
     Ø Remote access to Solaris Studio tools from local desktop (Oracle Solaris, Linux, Microsoft Windows, Mac)
     Ø Streamlined Oracle DB application development
     Ø Simplify Oracle Tuxedo development with IDE plug-in
     Ø IPS distribution on Solaris 11 for simplified management
     Ø 20% faster compile time
  7. SPARC T4 Hardware
  8. SPARC T4 - Overview
     § Not like T1 – T3 (only shares the T-series name)
     § Single thread performance
     § Multithread throughput
  9. SPARC T4 - Details
     § 1 to 4 chips per system
     § 8 cores per chip
       ● Dual issue
       ● Out-of-order
     § 8 threads per core
     § 3.0 GHz clock
       ● 48B (3.0GHz * 8 * 2) instructions / sec / chip
  10. SPARC T4 - Capacity
      § Chip capacity: 48 B instructions / sec
      § For fully active threads:
        ● Single thread: 6 B instructions / sec
        ● Each of eight threads: 0.75 B instructions / sec
      § Threads rarely fully active:
        ● I/O wait
        ● Processor stall (fetch from memory = 300-400 cycles)
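The capacity figures on the two hardware slides follow from simple arithmetic, which the sketch below reproduces. The function names are illustrative only and are not part of any Oracle tool; the inputs (3.0 GHz, 8 cores, dual issue, up to 8 threads per core) are the numbers quoted on the slides, and the per-thread figure assumes fully active threads sharing one core's issue slots evenly.

```c
#include <assert.h>

/* Peak instruction rate of one core: clock rate times issue width. */
double core_peak_ips(double clock_hz, int issue_width)
{
    assert(clock_hz > 0.0 && issue_width > 0);
    return clock_hz * issue_width;
}

/* Peak instruction rate of one chip: every core issuing at its peak. */
double chip_peak_ips(double clock_hz, int cores, int issue_width)
{
    assert(cores > 0);
    return core_peak_ips(clock_hz, issue_width) * cores;
}

/* Upper bound per thread when `active_threads` fully active threads
 * share one core evenly (an idealised assumption). */
double thread_peak_ips(double clock_hz, int issue_width, int active_threads)
{
    assert(active_threads > 0);
    return core_peak_ips(clock_hz, issue_width) / active_threads;
}
```

Calling chip_peak_ips(3.0e9, 8, 2) gives the 48 B instructions / sec chip figure, and thread_peak_ips(3.0e9, 2, 1) and thread_peak_ips(3.0e9, 2, 8) give the 6 B and 0.75 B per-thread figures. These are upper bounds; as the slide notes, I/O waits and memory stalls keep real threads well below them.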
  11. Developing for T4
      § Make it correct
      § Remove obvious performance issues
      § Make it scale (correctly)
  12. Application Correctness
  13. Debug information
      § Always use -g
      § No optimisation flags:
        ● Full debug
        ● Lower performance
      § Optimised binaries:
        ● Best effort debug
        ● No/minimal performance impact
      § Debug what you ship!
  14. Automatic Error Detection
      § Static/compile time error detection
        ● Code Analyzer
      § Dynamic/runtime memory access error detection
        ● Discover
  15. Code Analyzer
      § Static analysis for common coding errors
        ● Uninitialised variables, etc.
      § Compile with:
        ● -xanalyze=code
      § View results with:
        ● code-analyzer <a.out>
  16. Code Analyzer – example output
  17. Memory Error Detection - discover
      § Common memory allocation and use errors:
        ● Uninitialised memory
        ● Access past bounds
        ● Memory leaks
      § Usage:
        ● discover <a.out>
        ● <a.out>
        ● Default = html output
  18. Example of discover
      $ ./a.out
      ERROR 1 (ABR): reading memory beyond array bounds at address 0xffbff278 (8 bytes) on the stack at:
          average() + 0x228 <disc.c:8>
              6: for (int i=1; i<=len; i++)
              7: {
              8:=> total+=array[i];
              9: }
          _start() + 0xd8
      ...
      double array[20];
      ...
      printf(" Average = %f\n", average(array,20) );
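The report points at a loop that runs from 1 to len inclusive, so with a 20-element array it reads array[20], one element past the end. A possible corrected version is sketched below. The slide shows only the loop body, so the function signature and the final division by len are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the first `len` elements of `array`.
 * Valid indices are 0 .. len-1, so the loop starts at 0 and stops
 * before len; the flagged loop ran from 1 to len inclusive.
 * The signature and the division are assumed, not taken from the slide. */
double average(const double *array, int len)
{
    assert(array != NULL && len > 0);
    double total = 0.0;
    for (int i = 0; i < len; i++)
    {
        total += array[i];
    }
    return total / len;
}
```

With this loop every access stays inside the array, so a rerun under discover should no longer report the ABR error for this function.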
  19. Application Performance
  20. Optimisation – the Basics
      § No optimisation flags == no optimisation
      § Good optimisation: -O
      § Advanced optimisations:
        ● Guided by profile of application
        ● Knowledge of deployment systems
  21. Profiling
      § Profiling with the performance analyzer
        ● collect <a.out>
        ● collect -P <pid>
        ● analyzer test.1.er
      § Report generation with spot
        ● spot <a.out>
        ● spot -P <pid>
  22. Performance Analyzer
      § Demo
  23. Performance Analyzer
      § Demo
  24. Aggressive Optimisation
      § One stop flag: -fast
      § Enables multiple optimisations
        ● Build machine = deployment machine
        ● Floating point simplification and optimisation
        ● Pointers to different types do not alias
        ● Function inlining
      § Investigate performance gain
  25. Profile Drives Flag Selection: Floating Point
      § Significant time in floating point computation:
        ● Floating point simplification
        ● -fsimple=2
      § Significant time in floating point library code:
        ● Optimised floating point libraries
        ● -xlibmopt, -xlibmil
      § Use FP optimisations if performance improves and FP optimisations are acceptable
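Whether floating point simplification is "acceptable" depends on whether the application tolerates small changes in results. The sketch below illustrates the underlying reason with plain C: floating point addition is not associative, so a compiler that is allowed to reorder an expression can change its value. The functions and values are hypothetical, chosen for illustration, and are not taken from the slides or from compiler documentation.

```c
#include <assert.h>

/* The same three values summed in two different orders. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

/* Under strict IEEE 754 double arithmetic the two orders disagree here:
 * 1.0 is too small to survive being added to -1e20, so the right-hand
 * grouping loses it. A build that permits reassociation may not keep
 * this distinction, which is exactly the trade-off being illustrated. */
void reassociation_demo(void)
{
    assert(sum_left(1e20, -1e20, 1.0) == 1.0);
    assert(sum_right(1e20, -1e20, 1.0) == 0.0);
}
```

If results like these matter to the application, compare output with and without the floating point flags before adopting them.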
  26. Profile Drives Flag Selection: Flat profile
      § Many hot small functions
        ● At least -xO4 optimisation level
        ● -xipo for cross-file optimisations
      § Conditional code or inlining
        ● Profile feedback
        ● -xprofile=collect:
        ● Training run of application
        ● -xprofile=use:
  27. Profile Drives Flag Selection: Pointers
      § Pointers inhibit compiler optimisations
      § Compiler needs more information
      § restrict qualified pointers in C
        ● Localised action
      § Flags:
        ● -xrestrict (restrict qualified pointers passed into functions)
        ● -xalias_level=std [C]
        ● -xalias_level=compatible [C++]
        ● Actions at file level
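As an illustration of the "localised action" on the slide, the sketch below marks two pointer parameters with the C99 restrict qualifier. The function itself is a hypothetical example. The qualifier is a promise from the programmer that, within the function, the two arrays do not overlap, which gives the compiler the extra information it needs to reorder or combine loads and stores.

```c
#include <assert.h>
#include <stddef.h>

/* dst[i] += k * src[i] for i in [0, n).
 * `restrict` asserts that dst and src never refer to overlapping memory;
 * passing overlapping arrays would be undefined behaviour. */
void scaled_add(double *restrict dst, const double *restrict src,
                double k, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
    {
        dst[i] += k * src[i];
    }
}
```

The qualifier affects only this function, whereas the flags listed on the slide apply the assumption across a whole file.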
  28. Processor Specific Optimisations
      § Default: -xtarget=generic often good enough
      § T4 has useful instructions
        ● Compare and branch
        ● Floating point multiply add
      § One stop flag: -xtarget=T4
      § Schedules for T4, uses entire T4 instruction set
      § Only runs on T4 (or later) processors
  29. SPARC Instruction Sets
  30. Multi-threaded Applications
  31. Multi-thread or Multi-process
      § Multiprocess (throughput):
        ● Isolation
        ● Independence
        ● Large virtual memory footprint
        ● Potentially high synchronisation costs
      § Multithread (latency):
        ● Low synchronisation costs
        ● Minimal memory footprint
  32. Multi-threaded Application Development
      § POSIX threads (C11, C++11)
        ● Low level: Great control, significant complexity
      § OpenMP
        ● High abstraction: Easy to use, flexible
      § Automatic parallelisation
        ● Trivial to use: -xautopar -xreduction
        ● Works best for loop-intensive code (typically FP)
  33. OpenMP Parallel For
      § Distributes iterations across CPUs
          #pragma omp parallel for
          for (int i=0; i<length; i++)
          {
            // Do work
          }
  34. OpenMP Tasks
      § Distributes work across CPUs
          for (int i=0; i<length; i++)
          {
            #pragma omp task
            {
              // Do work for task ‘i’
            }
          }
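The task slide shows only the loop. For the tasks to be shared among threads they need to be created inside a parallel region, and a common pattern is to have a single thread create them while the rest of the team executes them. The sketch below shows that pattern. The squaring workload and the function name are hypothetical. OpenMP must be enabled at compile time (for example -xopenmp with Oracle Solaris Studio or -fopenmp with GCC); without it the pragmas are ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Squares each element of in[] into out[], one OpenMP task per element. */
void square_all(const double *in, double *out, int length)
{
    assert(length >= 0);
    #pragma omp parallel
    {
        /* One thread creates the tasks; any thread in the team may run them. */
        #pragma omp single
        {
            for (int i = 0; i < length; i++)
            {
                /* Each task captures its own copy of i. */
                #pragma omp task firstprivate(i)
                {
                    out[i] = in[i] * in[i];
                }
            }
        }
        /* The barrier at the end of the parallel region ensures that
         * all tasks have completed before the function returns. */
    }
}
```

Each task writes a different element of out[], so no two tasks update the same memory and no extra synchronisation is needed.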
  35. Parallel Program Correctness
      § Distributes work across CPUs
          int total=0;
          #pragma omp parallel for
          for (int i=0; i<length; i++)
          {
            total += i;
          }
      § Data race: Multiple threads updating the same variable
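One common way to remove this particular data race is an OpenMP reduction clause, which gives each thread a private copy of the accumulator and combines the copies when the loop ends. The sketch below is one possible fix, not necessarily the one shown in the talk. The function wrapper is added for illustration, and total is widened to long so that larger values of length do not overflow.

```c
#include <assert.h>

/* Sum of 0 .. length-1, accumulated safely across threads. */
long sum_indices(int length)
{
    assert(length >= 0);
    long total = 0;
    /* reduction(+:total): each thread accumulates into a private total,
     * and the private totals are added together after the loop. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < length; i++)
    {
        total += i;
    }
    return total;
}
```

For example, sum_indices(100) returns 4950 regardless of how many threads run the loop.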
  36. Thread Analyzer
      § Instrument application
        ● Compiler flag: -xinstrument=datarace
        ● Binary instrumentation: discover -i datarace <a.out>
      § Gather data:
        ● collect -r on <a.out>
      § View data:
        ● tha tha.1.er
  37. Thread Analyzer - Example
      § Demo
  38. Scaling to Many Threads
      § Minimise serial code
        ● Amdahl’s Law
      § Minimise lock contention
      § Minimise writes of shared data
      § Evenly distribute work
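Amdahl's Law makes the cost of serial code concrete: if a fraction p of the run time can be parallelised over n threads, the best possible speedup is 1 / ((1 - p) + p / n). The helper below is a small sketch of that formula; its name and parameters are illustrative.

```c
#include <assert.h>

/* Ideal speedup under Amdahl's Law for a program whose parallelisable
 * share of run time is `parallel_fraction`, run on `threads` threads. */
double amdahl_speedup(double parallel_fraction, int threads)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(threads >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads);
}
```

With 95% of the run time parallelised, the 64 hardware threads of one T4 chip (8 cores, 8 threads per core) give an ideal speedup of only about 15x, and no thread count can exceed 20x, which is why minimising serial code heads the list.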
  39. Scaling to Many Threads
      § Demo
  40. Limits of Performance
      § Threads
        ● vmstat
      § Instruction Issue Width
        ● pgstat / cputrack / cpustat / ripc
      § Bandwidth
        ● busstat / bw
  41. Conclusion: Optimising for T4
      § Step 1: Profile and remove inefficient code
      § Step 2: Explore benefits of increased optimisation
      § Step 3: Identify opportunities for parallelisation
      § Step 4: Profile and tune parallel code
      § Step 5: Watch for hitting hardware limits