SlideShare a Scribd company logo
1 of 10
The lesser known GCC optimizations
Mark Veltzer
mark@veltzer.net
Using profile information for optimization
The CPU pipe line
● Most CPUs today are built using pipelines
● This enables the CPU to clock at a higher rate
● This also enables the CPU to use more of it's
infrastructure at the same time and so utilizing
the hardware better.
● It also speed things up because computation is
done in parallel.
● This is even more so in multi-core CPUs.
The problems with pipe lines
● The pipe line idea is heavily tied to the idea of branch
prediction
● If the CPU has not yet finished certain instructions
and sees a branch following them then it needs to be
able to guess where the branch will go
● Otherwise the pipeline idea itself becomes
problematic.
● That is because the execution of instructions in
parallel at the CPU level is halted every time there is
a branch.
Hardware prediction
● The hardware itself does a rudimentary form of
prediction.
● The assumption of the hardware is that what
happened before will happen again (order)
● This is OK and hardware does a good job even
without assistance.
● This means that a random program will make the
hardware miss-predict and will cause execution
speed to go down.
● Example: "if(random()<0.5) {"
Software prediction
● The hardware manufacturers allow the software layer
to tell the hardware which way a branch could go using
hints left inside the assembly.
● Special instructions were created for this by the
hardware manufacturers
● The compiler decides whether to use hinted branches
or non hinted ones.
● It will use the hinted ones only when it can guess where
the branch will go.
● For instance: in a loop the branch will tend to go back
to the loop.
The problem of branching
● When the compiler sees a branch not as part of
a loop (if(condition)) it does not know what are
the chances that the condition will evaluate to
true.
● Therefor it will usually use a hint-less branch
statement.
● Unless you tell it otherwise.
● There are two ways to tell the compiler which
way the branch will go.
First way - explicit hinting in the
software
● You can use the __builtin_expect construct to
hint at the right path.
● Instead of writing "if(x) {" you write:
"if(__builtin_expect((x),1)) {"
● You can wrap this in a nice macro like the Linux
kernel folk did.
● The compiler will plant a hint to the CPU telling
it that the branch is likely to succeed.
● See example
Second way - using profile
information
● You leave your code alone.
● You compile the code with -fprofile-arcs.
● Then you run it on a typical scenario creating files which
show which way branches went (auxname.gcda for each
file).
● Then you compile your code again with -fbranch-
probabilities which uses the gathered data to plant the
right hints.
● In GCC you must compile the phases using the exact
same flags (otherwise expect problems)
● See example
Second way - PGO
● This whole approach is a subset of a bigger
concept called PGO – Profile Generated
Optimizations
● This includes the branch prediction we saw
before but also other types of optimization
(reordering of switch cases as an example).
● This is why you should use the more general
flags -fprofile-generate and -fprofile-use which
imply the previous flags and add even more
profile generated optimization.

More Related Content

What's hot

[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling
Douglas Chen
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUG
slandelle
 

What's hot (20)

Trace kernel code tips
Trace kernel code tipsTrace kernel code tips
Trace kernel code tips
 
How to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationHow to write a well-behaved Python command line application
How to write a well-behaved Python command line application
 
[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling
 
Python debugging techniques
Python debugging techniquesPython debugging techniques
Python debugging techniques
 
Java Performance & Profiling
Java Performance & ProfilingJava Performance & Profiling
Java Performance & Profiling
 
C under Linux
C under LinuxC under Linux
C under Linux
 
GNU GCC - what just a compiler...?
GNU GCC - what just a compiler...?GNU GCC - what just a compiler...?
GNU GCC - what just a compiler...?
 
Flux: A Language for Programming High-Performance Servers
Flux: A Language for Programming High-Performance ServersFlux: A Language for Programming High-Performance Servers
Flux: A Language for Programming High-Performance Servers
 
Test driving QML
Test driving QMLTest driving QML
Test driving QML
 
Concurrency patterns
Concurrency patternsConcurrency patterns
Concurrency patterns
 
Compiling Under Linux
Compiling Under LinuxCompiling Under Linux
Compiling Under Linux
 
Gatling workshop lets test17
Gatling workshop lets test17Gatling workshop lets test17
Gatling workshop lets test17
 
30 Minutes To CPAN
30 Minutes To CPAN30 Minutes To CPAN
30 Minutes To CPAN
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using Tracing
 
Emulation Error Recovery
Emulation Error RecoveryEmulation Error Recovery
Emulation Error Recovery
 
OpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-SideOpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-Side
 
Top five reasons why every DV engineer will love the latest systemverilog 201...
Top five reasons why every DV engineer will love the latest systemverilog 201...Top five reasons why every DV engineer will love the latest systemverilog 201...
Top five reasons why every DV engineer will love the latest systemverilog 201...
 
Gatling
GatlingGatling
Gatling
 
Three Lessons about Gatling and Microservices
Three Lessons about Gatling and MicroservicesThree Lessons about Gatling and Microservices
Three Lessons about Gatling and Microservices
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUG
 

Viewers also liked

Principles of compiler design
Principles of compiler designPrinciples of compiler design
Principles of compiler design
Janani Parthiban
 
GEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions FrameworkGEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions Framework
Alexey Smirnov
 

Viewers also liked (10)

HRM - PM in GCC
HRM - PM in GCCHRM - PM in GCC
HRM - PM in GCC
 
NetBeans para Java, C, C++
NetBeans para Java, C, C++NetBeans para Java, C, C++
NetBeans para Java, C, C++
 
Principles of compiler design
Principles of compiler designPrinciples of compiler design
Principles of compiler design
 
Introduction to Perl - Day 1
Introduction to Perl - Day 1Introduction to Perl - Day 1
Introduction to Perl - Day 1
 
GCC, GNU compiler collection
GCC, GNU compiler collectionGCC, GNU compiler collection
GCC, GNU compiler collection
 
How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)
 
MinGw Compiler
MinGw CompilerMinGw Compiler
MinGw Compiler
 
GEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions FrameworkGEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions Framework
 
Deep C
Deep CDeep C
Deep C
 
GCC
GCCGCC
GCC
 

Similar to Gcc opt

Similar to Gcc opt (20)

Multicore
MulticoreMulticore
Multicore
 
Computer architecture
Computer architectureComputer architecture
Computer architecture
 
Oversimplified CA
Oversimplified CAOversimplified CA
Oversimplified CA
 
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
 
Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processors
 
Java under the hood
Java under the hoodJava under the hood
Java under the hood
 
Realtime
RealtimeRealtime
Realtime
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
 
Introduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimizationIntroduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimization
 
Introduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimizationIntroduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimization
 
Rust kafka-5-2019-unskip
Rust kafka-5-2019-unskipRust kafka-5-2019-unskip
Rust kafka-5-2019-unskip
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
 
On component interface
On component interfaceOn component interface
On component interface
 
Introduction to Parallelization and performance optimization
Introduction to Parallelization and performance optimizationIntroduction to Parallelization and performance optimization
Introduction to Parallelization and performance optimization
 
Spectre and meltdown
Spectre and meltdownSpectre and meltdown
Spectre and meltdown
 
Oracle: Binding versus caging
Oracle: Binding versus cagingOracle: Binding versus caging
Oracle: Binding versus caging
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patterns
 
Debugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to LinuxDebugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to Linux
 

More from Mark Veltzer

More from Mark Veltzer (10)

Gcc
GccGcc
Gcc
 
Linux io
Linux ioLinux io
Linux io
 
Linux logging
Linux loggingLinux logging
Linux logging
 
Linux monitoring
Linux monitoringLinux monitoring
Linux monitoring
 
Linux multiplexing
Linux multiplexingLinux multiplexing
Linux multiplexing
 
Linux tmpfs
Linux tmpfsLinux tmpfs
Linux tmpfs
 
Pthreads linux
Pthreads linuxPthreads linux
Pthreads linux
 
Streams
StreamsStreams
Streams
 
Volatile
VolatileVolatile
Volatile
 
Effective cplusplus
Effective cplusplusEffective cplusplus
Effective cplusplus
 

Recently uploaded

Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
Muhammad Subhan
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
FIDO Alliance
 

Recently uploaded (20)

Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptxCyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 

Gcc opt

  • 1. The lesser known GCC optimizations Mark Veltzer mark@veltzer.net
  • 2. Using profile information for optimization
  • 3. The CPU pipe line ● Most CPUs today are built using pipelines ● This enables the CPU to clock at a higher rate ● This also enables the CPU to use more of it's infrastructure at the same time and so utilizing the hardware better. ● It also speed things up because computation is done in parallel. ● This is even more so in multi-core CPUs.
  • 4. The problems with pipe lines ● The pipe line idea is heavily tied to the idea of branch prediction ● If the CPU has not yet finished certain instructions and sees a branch following them then it needs to be able to guess where the branch will go ● Otherwise the pipeline idea itself becomes problematic. ● That is because the execution of instructions in parallel at the CPU level is halted every time there is a branch.
  • 5. Hardware prediction ● The hardware itself does a rudimentary form of prediction. ● The assumption of the hardware is that what happened before will happen again (order) ● This is OK and hardware does a good job even without assistance. ● This means that a random program will make the hardware miss-predict and will cause execution speed to go down. ● Example: "if(random()<0.5) {"
  • 6. Software prediction ● The hardware manufacturers allow the software layer to tell the hardware which way a branch could go using hints left inside the assembly. ● Special instructions were created for this by the hardware manufacturers ● The compiler decides whether to use hinted branches or non hinted ones. ● It will use the hinted ones only when it can guess where the branch will go. ● For instance: in a loop the branch will tend to go back to the loop.
  • 7. The problem of branching ● When the compiler sees a branch not as part of a loop (if(condition)) it does not know what are the chances that the condition will evaluate to true. ● Therefor it will usually use a hint-less branch statement. ● Unless you tell it otherwise. ● There are two ways to tell the compiler which way the branch will go.
  • 8. First way - explicit hinting in the software ● You can use the __builtin_expect construct to hint at the right path. ● Instead of writing "if(x) {" you write: "if(__builtin_expect((x),1)) {" ● You can wrap this in a nice macro like the Linux kernel folk did. ● The compiler will plant a hint to the CPU telling it that the branch is likely to succeed. ● See example
  • 9. Second way - using profile information ● You leave your code alone. ● You compile the code with -fprofile-arcs. ● Then you run it on a typical scenario creating files which show which way branches went (auxname.gcda for each file). ● Then you compile your code again with -fbranch- probabilities which uses the gathered data to plant the right hints. ● In GCC you must compile the phases using the exact same flags (otherwise expect problems) ● See example
  • 10. Second way - PGO ● This whole approach is a subset of a bigger concept called PGO – Profile Generated Optimizations ● This includes the branch prediction we saw before but also other types of optimization (reordering of switch cases as an example). ● This is why you should use the more general flags -fprofile-generate and -fprofile-use which imply the previous flags and add even more profile generated optimization.