Gcc opt

Transcript

  • 1. The lesser known GCC optimizations Mark Veltzer mark@veltzer.net
  • 2. Using profile information for optimization
  • 3. The CPU pipeline ● Most CPUs today are built using pipelines ● This enables the CPU to clock at a higher rate ● It also lets the CPU use more of its infrastructure at the same time, utilizing the hardware better. ● It speeds things up because computation is done in parallel. ● This is even more so in multi-core CPUs.
  • 4. The problems with pipelines ● The pipeline idea is heavily tied to the idea of branch prediction ● If the CPU has not yet finished certain instructions and sees a branch following them, it needs to be able to guess where the branch will go ● Otherwise the pipeline idea itself becomes problematic. ● That is because the parallel execution of instructions at the CPU level is halted every time there is a branch.
  • 5. Hardware prediction ● The hardware itself does a rudimentary form of prediction. ● The hardware's assumption is that what happened before will happen again (order) ● This is fine, and the hardware does a good job even without assistance. ● It also means that a program with random branch behavior will make the hardware mispredict and cause execution speed to drop. ● Example: "if(random()<0.5) {"
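    A minimal sketch of such a predictor-hostile branch (the loop bound and the use of rand() are illustrative, not taken from the slides):

        /* The branch below depends on random data, so the hardware
         * predictor has no pattern to learn and will mispredict
         * roughly half of the time. */
        #include <stdio.h>
        #include <stdlib.h>

        int main(void) {
            long taken = 0;
            for (long i = 0; i < 10000000; i++) {
                if (rand() % 2 == 0) {  /* effectively a coin flip */
                    taken++;
                }
            }
            printf("taken: %ld\n", taken);
            return 0;
        }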
  • 6. Software prediction ● Hardware manufacturers allow the software layer to tell the hardware which way a branch is likely to go, using hints left inside the assembly. ● Special instructions were created for this by the hardware manufacturers ● The compiler decides whether to use hinted branches or non-hinted ones. ● It will use the hinted ones only when it can guess where the branch will go. ● For instance: in a loop, the branch will tend to go back to the top of the loop.
  • 7. The problem of branching ● When the compiler sees a branch that is not part of a loop (if(condition)), it does not know what the chances are that the condition will evaluate to true. ● Therefore it will usually use a hint-less branch statement. ● Unless you tell it otherwise. ● There are two ways to tell the compiler which way the branch will go.
  • 8. First way - explicit hinting in the software ● You can use the __builtin_expect construct to hint at the right path. ● Instead of writing "if(x) {" you write: "if(__builtin_expect((x),1)) {" ● You can wrap this in a nice macro, as the Linux kernel folks did. ● The compiler will plant a hint telling the CPU that the branch is likely to be taken. ● See the example below
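    A minimal sketch of explicit hinting (the likely()/unlikely() macro names follow the Linux kernel convention; process() is a made-up function):

        #include <stdio.h>

        /* Wrap __builtin_expect in macros, as the Linux kernel does. */
        #define likely(x)   __builtin_expect(!!(x), 1)
        #define unlikely(x) __builtin_expect(!!(x), 0)

        int process(int fd) {
            /* Tell the compiler the error path is rare, so it keeps
             * the hot path as the straight-line fall-through code. */
            if (unlikely(fd < 0)) {
                fprintf(stderr, "bad descriptor\n");
                return -1;
            }
            /* hot path */
            return 0;
        }

        int main(void) {
            return process(3) == 0 ? 0 : 1;
        }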
  • 9. Second way - using profile information ● You leave your code alone. ● You compile the code with -fprofile-arcs. ● Then you run it on a typical scenario, creating files that record which way branches went (auxname.gcda for each file). ● Then you compile your code again with -fbranch-probabilities, which uses the gathered data to plant the right hints. ● In GCC you must compile both phases with the exact same flags (otherwise expect problems) ● See the example below
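    A minimal sketch of the two-phase build as shell commands (the program name and input file are hypothetical; keep the flags identical in both compiles):

        # Phase 1: build with instrumentation that records branch outcomes.
        gcc -O2 -fprofile-arcs -o myprog myprog.c

        # Phase 2: run a representative workload; this writes .gcda files.
        ./myprog typical-input.dat

        # Phase 3: rebuild; GCC reads the recorded data and plants the hints.
        gcc -O2 -fbranch-probabilities -o myprog myprog.c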
  • 10. Second way - PGO ● This whole approach is a subset of a bigger concept called PGO (Profile-Guided Optimization) ● This includes the branch prediction we saw before, but also other types of optimization (reordering of switch cases, for example). ● This is why you should use the more general flags -fprofile-generate and -fprofile-use, which imply the previous flags and add even more profile-guided optimizations.
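    A minimal sketch of the more general PGO workflow (same hypothetical program as above):

        # Instrumented build, training run, then optimized rebuild.
        gcc -O2 -fprofile-generate -o myprog myprog.c
        ./myprog typical-input.dat      # writes the .gcda profile data
        gcc -O2 -fprofile-use -o myprog myprog.c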