SlideShare a Scribd company logo
The lesser known GCC optimizations
Mark Veltzer
mark@veltzer.net
Using profile information for optimization
The CPU pipe line
● Most CPUs today are built using pipelines
● This enables the CPU to clock at a higher rate
● This also enables the CPU to use more of it's
infrastructure at the same time and so utilizing
the hardware better.
● It also speed things up because computation is
done in parallel.
● This is even more so in multi-core CPUs.
The problems with pipe lines
● The pipe line idea is heavily tied to the idea of branch
prediction
● If the CPU has not yet finished certain instructions
and sees a branch following them then it needs to be
able to guess where the branch will go
● Otherwise the pipeline idea itself becomes
problematic.
● That is because the execution of instructions in
parallel at the CPU level is halted every time there is
a branch.
Hardware prediction
● The hardware itself does a rudimentary form of
prediction.
● The assumption of the hardware is that what
happened before will happen again (order)
● This is OK and hardware does a good job even
without assistance.
● This means that a random program will make the
hardware miss-predict and will cause execution
speed to go down.
● Example: "if(random()<0.5) {"
Software prediction
● The hardware manufacturers allow the software layer
to tell the hardware which way a branch could go using
hints left inside the assembly.
● Special instructions were created for this by the
hardware manufacturers
● The compiler decides whether to use hinted branches
or non hinted ones.
● It will use the hinted ones only when it can guess where
the branch will go.
● For instance: in a loop the branch will tend to go back
to the loop.
The problem of branching
● When the compiler sees a branch not as part of
a loop (if(condition)) it does not know what are
the chances that the condition will evaluate to
true.
● Therefor it will usually use a hint-less branch
statement.
● Unless you tell it otherwise.
● There are two ways to tell the compiler which
way the branch will go.
First way - explicit hinting in the
software
● You can use the __builtin_expect construct to
hint at the right path.
● Instead of writing "if(x) {" you write:
"if(__builtin_expect((x),1)) {"
● You can wrap this in a nice macro like the Linux
kernel folk did.
● The compiler will plant a hint to the CPU telling
it that the branch is likely to succeed.
● See example
Second way - using profile
information
● You leave your code alone.
● You compile the code with -fprofile-arcs.
● Then you run it on a typical scenario creating files which
show which way branches went (auxname.gcda for each
file).
● Then you compile your code again with -fbranch-
probabilities which uses the gathered data to plant the
right hints.
● In GCC you must compile the phases using the exact
same flags (otherwise expect problems)
● See example
Second way - PGO
● This whole approach is a subset of a bigger
concept called PGO – Profile Generated
Optimizations
● This includes the branch prediction we saw
before but also other types of optimization
(reordering of switch cases as an example).
● This is why you should use the more general
flags -fprofile-generate and -fprofile-use which
imply the previous flags and add even more
profile generated optimization.

More Related Content

What's hot

Trace kernel code tips
Trace kernel code tipsTrace kernel code tips
Trace kernel code tips
Viller Hsiao
 
How to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationHow to write a well-behaved Python command line application
How to write a well-behaved Python command line application
gjcross
 
[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling
Douglas Chen
 
Python debugging techniques
Python debugging techniquesPython debugging techniques
Python debugging techniques
Tuomas Suutari
 
Java Performance & Profiling
Java Performance & ProfilingJava Performance & Profiling
Java Performance & Profiling
Isuru Perera
 
C under Linux
C under LinuxC under Linux
GNU GCC - what just a compiler...?
GNU GCC - what just a compiler...?GNU GCC - what just a compiler...?
GNU GCC - what just a compiler...?
Saket Pathak
 
Flux: A Language for Programming High-Performance Servers
Flux: A Language for Programming High-Performance ServersFlux: A Language for Programming High-Performance Servers
Flux: A Language for Programming High-Performance Servers
Emery Berger
 
Test driving QML
Test driving QMLTest driving QML
Test driving QML
Artem Marchenko
 
Concurrency patterns
Concurrency patternsConcurrency patterns
Concurrency patterns
Aaron Schlesinger
 
Compiling Under Linux
Compiling Under LinuxCompiling Under Linux
Compiling Under Linux
PierreMASURE
 
Gatling workshop lets test17
Gatling workshop lets test17Gatling workshop lets test17
Gatling workshop lets test17
Gerald Muecke
 
30 Minutes To CPAN
30 Minutes To CPAN30 Minutes To CPAN
30 Minutes To CPAN
daoswald
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using Tracing
ScyllaDB
 
Emulation Error Recovery
Emulation Error RecoveryEmulation Error Recovery
Emulation Error Recovery
somnathb1
 
OpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-SideOpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-Side
Tim Burks
 
Top five reasons why every DV engineer will love the latest systemverilog 201...
Top five reasons why every DV engineer will love the latest systemverilog 201...Top five reasons why every DV engineer will love the latest systemverilog 201...
Top five reasons why every DV engineer will love the latest systemverilog 201...
Srinivasan Venkataramanan
 
Gatling
GatlingGatling
Three Lessons about Gatling and Microservices
Three Lessons about Gatling and MicroservicesThree Lessons about Gatling and Microservices
Three Lessons about Gatling and Microservices
Dragos Manolescu
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUG
slandelle
 

What's hot (20)

Trace kernel code tips
Trace kernel code tipsTrace kernel code tips
Trace kernel code tips
 
How to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationHow to write a well-behaved Python command line application
How to write a well-behaved Python command line application
 
[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling[COSCUP 2020] How to use llvm frontend library-libtooling
[COSCUP 2020] How to use llvm frontend library-libtooling
 
Python debugging techniques
Python debugging techniquesPython debugging techniques
Python debugging techniques
 
Java Performance & Profiling
Java Performance & ProfilingJava Performance & Profiling
Java Performance & Profiling
 
C under Linux
C under LinuxC under Linux
C under Linux
 
GNU GCC - what just a compiler...?
GNU GCC - what just a compiler...?GNU GCC - what just a compiler...?
GNU GCC - what just a compiler...?
 
Flux: A Language for Programming High-Performance Servers
Flux: A Language for Programming High-Performance ServersFlux: A Language for Programming High-Performance Servers
Flux: A Language for Programming High-Performance Servers
 
Test driving QML
Test driving QMLTest driving QML
Test driving QML
 
Concurrency patterns
Concurrency patternsConcurrency patterns
Concurrency patterns
 
Compiling Under Linux
Compiling Under LinuxCompiling Under Linux
Compiling Under Linux
 
Gatling workshop lets test17
Gatling workshop lets test17Gatling workshop lets test17
Gatling workshop lets test17
 
30 Minutes To CPAN
30 Minutes To CPAN30 Minutes To CPAN
30 Minutes To CPAN
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using Tracing
 
Emulation Error Recovery
Emulation Error RecoveryEmulation Error Recovery
Emulation Error Recovery
 
OpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-SideOpenAPI and gRPC Side by-Side
OpenAPI and gRPC Side by-Side
 
Top five reasons why every DV engineer will love the latest systemverilog 201...
Top five reasons why every DV engineer will love the latest systemverilog 201...Top five reasons why every DV engineer will love the latest systemverilog 201...
Top five reasons why every DV engineer will love the latest systemverilog 201...
 
Gatling
GatlingGatling
Gatling
 
Three Lessons about Gatling and Microservices
Three Lessons about Gatling and MicroservicesThree Lessons about Gatling and Microservices
Three Lessons about Gatling and Microservices
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUG
 

Viewers also liked

HRM - PM in GCC
HRM - PM in GCCHRM - PM in GCC
HRM - PM in GCC
Muhammad Danish Azad
 
NetBeans para Java, C, C++
NetBeans para Java, C, C++NetBeans para Java, C, C++
NetBeans para Java, C, C++
Manuel Antonio
 
Principles of compiler design
Principles of compiler designPrinciples of compiler design
Principles of compiler design
Janani Parthiban
 
Introduction to Perl - Day 1
Introduction to Perl - Day 1Introduction to Perl - Day 1
Introduction to Perl - Day 1
Dave Cross
 
GCC, GNU compiler collection
GCC, GNU compiler collectionGCC, GNU compiler collection
GCC, GNU compiler collection
Alberto Bustamante Reyes
 
How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)
Sławomir Zborowski
 
MinGw Compiler
MinGw CompilerMinGw Compiler
MinGw Compiler
Avnish Patel
 
GEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions FrameworkGEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions Framework
Alexey Smirnov
 
Deep C
Deep CDeep C
Deep C
Olve Maudal
 
GCC
GCCGCC

Viewers also liked (10)

HRM - PM in GCC
HRM - PM in GCCHRM - PM in GCC
HRM - PM in GCC
 
NetBeans para Java, C, C++
NetBeans para Java, C, C++NetBeans para Java, C, C++
NetBeans para Java, C, C++
 
Principles of compiler design
Principles of compiler designPrinciples of compiler design
Principles of compiler design
 
Introduction to Perl - Day 1
Introduction to Perl - Day 1Introduction to Perl - Day 1
Introduction to Perl - Day 1
 
GCC, GNU compiler collection
GCC, GNU compiler collectionGCC, GNU compiler collection
GCC, GNU compiler collection
 
How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)
 
MinGw Compiler
MinGw CompilerMinGw Compiler
MinGw Compiler
 
GEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions FrameworkGEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions Framework
 
Deep C
Deep CDeep C
Deep C
 
GCC
GCCGCC
GCC
 

Similar to Gcc opt

Multicore
MulticoreMulticore
Multicore
Mark Veltzer
 
Computer architecture
Computer architectureComputer architecture
Computer architecture
PrabhanshuKatiyar1
 
Oversimplified CA
Oversimplified CAOversimplified CA
Oversimplified CA
PrabhanshuKatiyar1
 
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
cscpconf
 
Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processors
csandit
 
Java under the hood
Java under the hoodJava under the hood
Java under the hood
Vachagan Balayan
 
Cuda Without a Phd - A practical guick start
Cuda Without a Phd - A practical guick startCuda Without a Phd - A practical guick start
Cuda Without a Phd - A practical guick start
LloydMoore
 
Realtime
RealtimeRealtime
Realtime
Mark Veltzer
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
💻 Anton Gerdelan
 
Introduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimizationIntroduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimization
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Introduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimizationIntroduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimization
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Rust kafka-5-2019-unskip
Rust kafka-5-2019-unskipRust kafka-5-2019-unskip
Rust kafka-5-2019-unskip
Gerard Klijs
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
Tanel Poder
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Anne Nicolas
 
On component interface
On component interfaceOn component interface
On component interface
Laurence Chen
 
Introduction to Parallelization and performance optimization
Introduction to Parallelization and performance optimizationIntroduction to Parallelization and performance optimization
Introduction to Parallelization and performance optimization
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Spectre and meltdown
Spectre and meltdownSpectre and meltdown
Spectre and meltdown
Aditya Kamat
 
Oracle: Binding versus caging
Oracle: Binding versus cagingOracle: Binding versus caging
Oracle: Binding versus caging
BertrandDrouvot
 
Concurrency
ConcurrencyConcurrency
Concurrency
Sri Prasanna
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patterns
Matthew Dennis
 

Similar to Gcc opt (20)

Multicore
MulticoreMulticore
Multicore
 
Computer architecture
Computer architectureComputer architecture
Computer architecture
 
Oversimplified CA
Oversimplified CAOversimplified CA
Oversimplified CA
 
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
 
Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processors
 
Java under the hood
Java under the hoodJava under the hood
Java under the hood
 
Cuda Without a Phd - A practical guick start
Cuda Without a Phd - A practical guick startCuda Without a Phd - A practical guick start
Cuda Without a Phd - A practical guick start
 
Realtime
RealtimeRealtime
Realtime
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
 
Introduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimizationIntroduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimization
 
Introduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimizationIntroduction to Parallelization ans performance optimization
Introduction to Parallelization ans performance optimization
 
Rust kafka-5-2019-unskip
Rust kafka-5-2019-unskipRust kafka-5-2019-unskip
Rust kafka-5-2019-unskip
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
 
On component interface
On component interfaceOn component interface
On component interface
 
Introduction to Parallelization and performance optimization
Introduction to Parallelization and performance optimizationIntroduction to Parallelization and performance optimization
Introduction to Parallelization and performance optimization
 
Spectre and meltdown
Spectre and meltdownSpectre and meltdown
Spectre and meltdown
 
Oracle: Binding versus caging
Oracle: Binding versus cagingOracle: Binding versus caging
Oracle: Binding versus caging
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patterns
 

More from Mark Veltzer

Gcc
GccGcc
Linux io
Linux ioLinux io
Linux io
Mark Veltzer
 
Linux logging
Linux loggingLinux logging
Linux logging
Mark Veltzer
 
Linux monitoring
Linux monitoringLinux monitoring
Linux monitoring
Mark Veltzer
 
Linux multiplexing
Linux multiplexingLinux multiplexing
Linux multiplexing
Mark Veltzer
 
Linux tmpfs
Linux tmpfsLinux tmpfs
Linux tmpfs
Mark Veltzer
 
Pthreads linux
Pthreads linuxPthreads linux
Pthreads linux
Mark Veltzer
 
Streams
StreamsStreams
Streams
Mark Veltzer
 
Volatile
VolatileVolatile
Volatile
Mark Veltzer
 
Effective cplusplus
Effective cplusplusEffective cplusplus
Effective cplusplus
Mark Veltzer
 

More from Mark Veltzer (10)

Gcc
GccGcc
Gcc
 
Linux io
Linux ioLinux io
Linux io
 
Linux logging
Linux loggingLinux logging
Linux logging
 
Linux monitoring
Linux monitoringLinux monitoring
Linux monitoring
 
Linux multiplexing
Linux multiplexingLinux multiplexing
Linux multiplexing
 
Linux tmpfs
Linux tmpfsLinux tmpfs
Linux tmpfs
 
Pthreads linux
Pthreads linuxPthreads linux
Pthreads linux
 
Streams
StreamsStreams
Streams
 
Volatile
VolatileVolatile
Volatile
 
Effective cplusplus
Effective cplusplusEffective cplusplus
Effective cplusplus
 

Recently uploaded

BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecaseIntegrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase
shyamraj55
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
bhumivarma35300
 
Figma AI Design Generator_ In-Depth Review.pdf
Figma AI Design Generator_ In-Depth Review.pdfFigma AI Design Generator_ In-Depth Review.pdf
Figma AI Design Generator_ In-Depth Review.pdf
Management Institute of Skills Development
 
Pigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending PlantPigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending Plant
LINUS PROJECTS (INDIA)
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer
 
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesBLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
SAI KAILASH R
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Nicolás Lopéz
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
Ivanti
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
aakash malhotra
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
HackersList
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
Google Developer Group - Harare
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
digitalxplive
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
shanihomely
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
SynapseIndia
 

Recently uploaded (20)

BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecaseIntegrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
 
Figma AI Design Generator_ In-Depth Review.pdf
Figma AI Design Generator_ In-Depth Review.pdfFigma AI Design Generator_ In-Depth Review.pdf
Figma AI Design Generator_ In-Depth Review.pdf
 
Pigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending PlantPigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending Plant
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesBLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 

Gcc opt

  • 1. The lesser known GCC optimizations Mark Veltzer mark@veltzer.net
  • 2. Using profile information for optimization
  • 3. The CPU pipe line ● Most CPUs today are built using pipelines ● This enables the CPU to clock at a higher rate ● This also enables the CPU to use more of it's infrastructure at the same time and so utilizing the hardware better. ● It also speed things up because computation is done in parallel. ● This is even more so in multi-core CPUs.
  • 4. The problems with pipe lines ● The pipe line idea is heavily tied to the idea of branch prediction ● If the CPU has not yet finished certain instructions and sees a branch following them then it needs to be able to guess where the branch will go ● Otherwise the pipeline idea itself becomes problematic. ● That is because the execution of instructions in parallel at the CPU level is halted every time there is a branch.
  • 5. Hardware prediction ● The hardware itself does a rudimentary form of prediction. ● The assumption of the hardware is that what happened before will happen again (order) ● This is OK and hardware does a good job even without assistance. ● This means that a random program will make the hardware miss-predict and will cause execution speed to go down. ● Example: "if(random()<0.5) {"
  • 6. Software prediction ● The hardware manufacturers allow the software layer to tell the hardware which way a branch could go using hints left inside the assembly. ● Special instructions were created for this by the hardware manufacturers ● The compiler decides whether to use hinted branches or non hinted ones. ● It will use the hinted ones only when it can guess where the branch will go. ● For instance: in a loop the branch will tend to go back to the loop.
  • 7. The problem of branching ● When the compiler sees a branch not as part of a loop (if(condition)) it does not know what are the chances that the condition will evaluate to true. ● Therefor it will usually use a hint-less branch statement. ● Unless you tell it otherwise. ● There are two ways to tell the compiler which way the branch will go.
  • 8. First way - explicit hinting in the software ● You can use the __builtin_expect construct to hint at the right path. ● Instead of writing "if(x) {" you write: "if(__builtin_expect((x),1)) {" ● You can wrap this in a nice macro like the Linux kernel folk did. ● The compiler will plant a hint to the CPU telling it that the branch is likely to succeed. ● See example
  • 9. Second way - using profile information ● You leave your code alone. ● You compile the code with -fprofile-arcs. ● Then you run it on a typical scenario creating files which show which way branches went (auxname.gcda for each file). ● Then you compile your code again with -fbranch- probabilities which uses the gathered data to plant the right hints. ● In GCC you must compile the phases using the exact same flags (otherwise expect problems) ● See example
  • 10. Second way - PGO ● This whole approach is a subset of a bigger concept called PGO – Profile Generated Optimizations ● This includes the branch prediction we saw before but also other types of optimization (reordering of switch cases as an example). ● This is why you should use the more general flags -fprofile-generate and -fprofile-use which imply the previous flags and add even more profile generated optimization.