The lesser known GCC optimizations
Mark Veltzer
mark@veltzer.net

Gcc opt

  1. The lesser known GCC optimizations (Mark Veltzer, mark@veltzer.net)
  2. Using profile information for optimization
  3. The CPU pipeline
     ● Most CPUs today are built using pipelines.
     ● This enables the CPU to clock at a higher rate.
     ● It also lets the CPU use more of its infrastructure at the same time, utilizing the hardware better.
     ● It also speeds things up because several instructions are in flight at once, each in a different stage, so computation is done in parallel.
     ● This is even more so in multi-core CPUs.
  4. The problems with pipelines
     ● The pipeline idea is heavily tied to the idea of branch prediction.
     ● If the CPU has not yet finished certain instructions and sees a branch following them, it needs to be able to guess where the branch will go.
     ● Otherwise the pipeline idea itself becomes problematic.
     ● That is because, without a guess, the parallel execution of instructions at the CPU level is halted every time there is a branch.
  5. Hardware prediction
     ● The hardware itself does a rudimentary form of prediction.
     ● The hardware assumes that what happened before will happen again (order).
     ● This is OK, and the hardware does a good job even without assistance.
     ● It also means that a program whose branches behave randomly will make the hardware mispredict, and execution speed will go down.
     ● Example: "if(random()<0.5) {"
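To make the point about predictable and random branches concrete, here is a minimal sketch. The function name count_at_least is hypothetical and not from the slides. Called on sorted input, the branch inside the loop goes the same way for long runs and is easy to predict; called on shuffled input, the same branch behaves almost randomly. The result is identical in both cases; only the running time may differ, and how much depends on the CPU and on whether the compiler keeps a real branch there at all (at higher optimization levels it may generate branch-free code).

```c
#include <assert.h>
#include <stddef.h>

/* Counts the elements that are >= threshold.
 * The `if` inside the loop is the branch the hardware has to predict. */
size_t count_at_least(const int *data, size_t n, int threshold)
{
    assert(data != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= threshold) {   /* data-dependent branch */
            count++;
        }
    }
    return count;
}
```

Timing this function over a large sorted array and then over the same values shuffled is one way to observe the effect on a given machine.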
  6. Software prediction
     ● The hardware manufacturers allow the software layer to tell the hardware which way a branch is likely to go, using hints left inside the assembly.
     ● The hardware manufacturers created special instructions for this.
     ● The compiler decides whether to use hinted or non-hinted branches.
     ● It will use the hinted ones only when it can guess where the branch will go.
     ● For instance, in a loop the branch will tend to go back to the start of the loop.
  7. The problem of branching
     ● When the compiler sees a branch that is not part of a loop (if(condition)), it does not know how likely the condition is to evaluate to true.
     ● Therefore it will usually use a hint-less branch instruction.
     ● Unless you tell it otherwise.
     ● There are two ways to tell the compiler which way the branch will go.
  8. First way - explicit hinting in the software
     ● You can use the __builtin_expect construct to hint at the right path.
     ● Instead of writing "if(x) {" you write "if(__builtin_expect((x),1)) {".
     ● You can wrap this in a nice macro, like the Linux kernel folk did.
     ● The compiler will plant a hint to the CPU telling it that the branch is likely to succeed.
     ● See example
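The original example from the talk is not part of this transcript, so what follows is only a sketch of the idea. The likely and unlikely macros follow the style used in the Linux kernel; the function get_or_default is a hypothetical helper invented for illustration. It assumes that an out-of-range index is the rare case.

```c
#include <assert.h>
#include <stddef.h>

/* Wrappers in the style of the Linux kernel macros.
 * The !! turns any non-zero value into exactly 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: returns table[idx], or `fallback` when idx is
 * out of range. We assume (and tell the compiler) that this is rare. */
int get_or_default(const int *table, size_t len, size_t idx, int fallback)
{
    assert(table != NULL || len == 0);
    if (unlikely(idx >= len)) {
        return fallback;          /* rare path */
    }
    return table[idx];            /* common path */
}
```

The hint changes nothing about what the function returns. It only tells GCC which path to favor when it lays out the code, so a wrong hint costs speed, not correctness.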
  9. Second way - using profile information
     ● You leave your code alone.
     ● You compile the code with -fprofile-arcs.
     ● Then you run it on a typical scenario, creating files which show which way the branches went (auxname.gcda for each file).
     ● Then you compile your code again with -fbranch-probabilities, which uses the gathered data to plant the right hints.
     ● In GCC you must compile both phases using the exact same flags (otherwise expect problems).
     ● See example
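The example from the talk is not included here, so the following is a sketch of the two-phase build under stated assumptions. The file name classify.c and the function count_below are hypothetical, and the commands in the comment are one plausible way to arrange the steps described above; -O2 is an arbitrary choice that is kept identical in both phases.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed build steps for this sketch (classify.c is a made-up name):
 *
 *   gcc -O2 -fprofile-arcs -c classify.c           instrumented object file
 *   (link the program with -fprofile-arcs as well and run it on a typical
 *    workload; when it exits it writes a .gcda data file for classify.c)
 *   gcc -O2 -fbranch-probabilities -c classify.c   recompile using that data
 */

/* Counts the readings below `limit`. Nothing in the source says whether
 * the branch is usually taken; only the recorded profile can tell. */
size_t count_below(const int *readings, size_t n, int limit)
{
    assert(readings != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (readings[i] < limit) {   /* probability learned from the run */
            count++;
        }
    }
    return count;
}
```

Note that the source contains no hints at all. The quality of the result depends on how representative the profiling run is of real use.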
  10. Second way - PGO
      ● This whole approach is a subset of a bigger concept called PGO: Profile-Guided Optimization.
      ● It includes the branch prediction we saw before, but also other types of optimization (reordering of switch cases, for example).
      ● This is why you should use the more general flags -fprofile-generate and -fprofile-use, which imply the previous flags and add even more profile-guided optimizations.
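As a sketch of the more general flags, consider a switch statement whose cases are hit with very different frequencies. The file name dispatch.c, the enum op and the function apply are all hypothetical. Whether GCC actually changes how this particular switch is compiled depends on the GCC version and on the recorded profile; the code only shows where such an optimization could apply.

```c
#include <assert.h>

/*
 * Assumed build steps for this sketch (dispatch.c is a made-up name):
 *
 *   gcc -O2 -fprofile-generate -c dispatch.c   instrumented object file
 *   (link with -fprofile-generate as well, then run a typical workload)
 *   gcc -O2 -fprofile-use -c dispatch.c        recompile using the profile
 */

enum op { OP_ADD, OP_SUB, OP_MUL, OP_NEG };

/* Applies one operation. If the profile shows that one case dominates,
 * the compiler is free to test for that case first. */
int apply(enum op op, int a, int b)
{
    switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    case OP_NEG: return -a;
    default:
        assert(!"unknown op");    /* not reachable with a valid enum op */
        return 0;
    }
}
```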