The lesser known GCC optimizations
Mark Veltzer
mark@veltzer.net
Using profile information for optimization
The CPU pipeline
● Most CPUs today are built using pipelines
● This enables the CPU to clock at a higher rate
● It also enables the CPU to use more of its
infrastructure at the same time, utilizing
the hardware better.
● It also speeds things up because computation is
done in parallel.
● This is even more so in multi-core CPUs.
The problems with pipelines
● The pipeline idea is heavily tied to the idea of branch
prediction
● If the CPU has not yet finished certain instructions
and sees a branch following them, it needs to be
able to guess where the branch will go
● Otherwise the pipeline idea itself becomes
problematic.
● That is because the parallel execution of instructions
at the CPU level is halted every time there is
a branch.
Hardware prediction
ā— The hardware itself does a rudimentary form of
prediction.
ā— The assumption of the hardware is that what
happened before will happen again (order)
ā— This is OK and hardware does a good job even
without assistance.
ā— This means that a random program will make the
hardware miss-predict and will cause execution
speed to go down.
ā— Example: "if(random()<0.5) {"
Software prediction
ā— The hardware manufacturers allow the software layer
to tell the hardware which way a branch could go using
hints left inside the assembly.
ā— Special instructions were created for this by the
hardware manufacturers
ā— The compiler decides whether to use hinted branches
or non hinted ones.
ā— It will use the hinted ones only when it can guess where
the branch will go.
ā— For instance: in a loop the branch will tend to go back
to the loop.
The problem of branching
ā— When the compiler sees a branch not as part of
a loop (if(condition)) it does not know what are
the chances that the condition will evaluate to
true.
ā— Therefor it will usually use a hint-less branch
statement.
ā— Unless you tell it otherwise.
ā— There are two ways to tell the compiler which
way the branch will go.
First way - explicit hinting in the software
● You can use the __builtin_expect construct to
hint at the right path.
● Instead of writing "if(x) {" you write:
"if(__builtin_expect((x),1)) {"
● You can wrap this in a nice macro, like the Linux
kernel folks did.
● The compiler will plant a hint to the CPU telling
it that the branch is likely to be taken.
● See example
Second way - using profile information
● You leave your code alone.
● You compile the code with -fprofile-arcs.
● Then you run it on a typical scenario, creating files which
record which way branches went (an auxname.gcda file for
each source file).
● Then you compile your code again with
-fbranch-probabilities, which uses the gathered data to
plant the right hints.
● In GCC you must compile both phases using the exact
same flags (otherwise expect problems)
● See example
Second way - PGO
● This whole approach is a subset of a bigger
concept called PGO – Profile-Guided
Optimization
● This includes the branch prediction we saw
before, but also other types of optimization
(reordering of switch cases, for example).
● This is why you should use the more general
flags -fprofile-generate and -fprofile-use, which
imply the previous flags and add even more
profile-guided optimization.