How it's made: C++ compilers (GCC)

Presentation slides about internals of GCC C++ compiler. It covers transformation from source code to output binary, compiler optimizations, register transfer language, etc.

  1. 1. How it's made: C++ compilers. Created by Sławomir Zborowski
  2. 2. "Average journalist documents what has happened. Good journalist explains why that happened." - Mariusz Max Kolonko
  3. 3. Agenda: GCC; Preprocessor; Compiler (front-end, AST; middle-end, optimization passes; back-end, RTL); Linker; Tests
  4. 4. GCC - compilation controller. Why GCC? Because we use it. Multiple languages: C, C++, Fortran, Java, Mercury, … Multiple architectures: ARM, MN10300, PDP-10, AVR32, …
  5. 5. Before we go: what happens when developers design a logo? "Do what you do best and outsource the rest"
  6. 6. GCC - compilation controller. cc1 - preprocessor and compiler; output → AT&T/Intel assembler file (*.s); use the -E flag to preprocess only; use the -S flag to preprocess and compile. as - assembler (from binutils); output → object file (*.o); use the -c flag to skip the linker. collect2 - linker; output → shared object/ELF (*.so, *)
  7. 7. The preprocessor. Entry point. Almost no safety. The language standards define interesting minimum limits (the figures here are C99's; the C++ standard recommends higher ones): #include nesting levels - 15; macros in one translation unit - 4095; characters in a logical source line - 4095. GCC's preprocessor is limited only by memory
  8. 8. Preprocessor on steroids. People use the preprocessor to do a variety of things. Usually it is just a bad habit. Some people use more than one preprocessor :-) (@Gynvael Coldwind)
      float fast_sin(int deg) {
        static const float sin_table[] = { <?php
          for ($i = 0; $i < 359; $i++)
            echo(sin(deg2rad($i)) . ",");
          echo(sin(deg2rad($i)));
        ?> };
        return sin_table[deg % 360];
      }
      php my.c | gcc -x c -
      Hmm... good idea, but kind of naïve. Surely we can do better!
  9. 9. Let's replace the preprocessor. Example motivation: diab & #pragma once
  10. 10. Time to hack
      #!/usr/bin/env python
      import random, re, subprocess, sys; x = sys.argv
      try:
          # input file follows -D_GNU_SOURCE, output file follows -o
          i, o = x[x.index('-D_GNU_SOURCE') + 1], x[x.index('-o') + 1] + '_'
          if not re.search('.hp?p?$', i): raise RuntimeError
          # random include-guard name derived from the file name
          g = '_{0}_{1}'.format(random.randrange(2**32), i.replace('.', '_'))
          with open(i) as h, open(o, 'w') as f:
              f.write('#ifndef {0}\n#define {0}\n{1}\n#endif'.format(
                  g, h.read().replace('#pragma once', '')))
          # pass the rewritten file to the real compiler instead of the original
          n = [[e, o][e == i] for e in x[1:]]
      except (ValueError, RuntimeError): n = x[1:][:]
      p = subprocess.Popen(['/usr/lib/gcc/x86_64-linux-gnu/4.8/cc1plus'] + n)
      p.communicate(); sys.exit(p.returncode)
  11. 11. Let's use it!
      g++ -no-integrated-cpp -std=c++11 -B/path/to/script example.cpp
      #ifndef _3121294961_example_cpp
      #define _3121294961_example_cpp
      template <typename T>
      T add(T a, T b) { return a + b; }
      #endif
      int main(void) { return add(1, 2); }
      Okay, get back to the topic
  12. 12. cc1 - From input to output IN → Front-end → Middle-end → Back-end → OUT
  13. 13. Frontend overview: C/C++ → AST → GENERIC. It all starts with the lexer & parser. Intermediate representation - AST. At the end - language-independent
  14. 14. Parsing. Simple example: x = a + b * 3; Basic lexers are based on regular expressions. Statements are tokenized: x can be mapped to {id, 1}, where 1 is an index in the symbol table; a, b → {id, 2}, {id, 3}; +, * can be mapped to the token table; 3 can be mapped to the constant table. The lexer does not define any order. It's just tokenization
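The tokenization step described on this slide can be sketched roughly as follows. This is a minimal hand-written scanner, not GCC's lexer; `Kind`, `Token` and `tokenize` are hypothetical names, and symbol, token and constant tables are left out.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds; a real lexer distinguishes many more.
enum class Kind { Id, Number, Op };

struct Token {
    Kind kind;
    std::string text;
};

// Splits a statement into identifiers, integer literals and
// single-character operators. Whitespace is skipped.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            out.push_back({Kind::Id, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            out.push_back({Kind::Number, src.substr(start, i - start)});
        } else {
            // Everything else is treated as a one-character operator.
            out.push_back({Kind::Op, std::string(1, src[i])});
            ++i;
        }
    }
    return out;
}
```

Calling `tokenize("x = a + b * 3;")` yields a flat token sequence; as the slide says, no precedence or ordering is decided at this stage.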
  15. 15. AST. Eventually the parser emits an AST. AST stands for Abstract Syntax Tree. Example expression: a + (b * 3)
  16. 16. AST
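As a rough illustration of the tree for a + (b * 3), here is a small sketch. The `Node` layout and the helpers `var`, `lit`, `bin` and `eval` are assumptions made for this example; GCC's own tree nodes look quite different.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST node: inner nodes carry an operator,
// leaves carry either a variable name or a literal value.
struct Node {
    char op = 0;                     // '+' or '*' for inner nodes, 0 for leaves
    std::string name;                // variable name (leaf); empty for literals
    int value = 0;                   // literal value (leaf)
    std::unique_ptr<Node> lhs, rhs;  // children of an inner node
};

std::unique_ptr<Node> var(const std::string& n) {
    auto p = std::make_unique<Node>();
    p->name = n;
    return p;
}

std::unique_ptr<Node> lit(int v) {
    auto p = std::make_unique<Node>();
    p->value = v;
    return p;
}

std::unique_ptr<Node> bin(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto p = std::make_unique<Node>();
    p->op = op;
    p->lhs = std::move(l);
    p->rhs = std::move(r);
    return p;
}

// Evaluates the tree bottom-up; env maps variable names to values.
int eval(const Node& n, const std::map<std::string, int>& env) {
    if (n.op == 0) return n.name.empty() ? n.value : env.at(n.name);
    assert(n.op == '+' || n.op == '*');
    int l = eval(*n.lhs, env);
    int r = eval(*n.rhs, env);
    return n.op == '+' ? l + r : l * r;
}

// The example expression from the slides: a + (b * 3).
std::unique_ptr<Node> example_tree() {
    return bin('+', var("a"), bin('*', var("b"), lit(3)));
}
```

The multiplication sits deeper in the tree than the addition, which is how the tree encodes operator precedence that the token stream alone does not express.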
  31. 31. Semantic analysis. The compiler needs to check the syntax tree against the language definition. This analysis saves type information in the symbol table. Type checking is also performed (e.g. array[1.f] is ill-formed). Implicit conversions are likely to happen
  32. 32. Symbol table. GCC must record variables in a so-called symbol table. It contains information about type, storage, scope, etc. It is built incrementally by the analysis phases. Scopes are very important
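Why scopes matter can be seen in a small sketch of a scoped symbol table. This is only an illustration with hypothetical names (`SymbolTable`, `Symbol`, `declare`, `lookup`), not the structure GCC uses internally.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record stored per declared name.
struct Symbol {
    std::string type;  // e.g. "int"
    int depth;         // nesting depth of the scope that declared it
};

class SymbolTable {
public:
    SymbolTable() { enter_scope(); }  // the global scope

    void enter_scope() { scopes_.emplace_back(); }

    void leave_scope() {
        assert(scopes_.size() > 1);   // never pop the global scope
        scopes_.pop_back();
    }

    // Returns false when the name already exists in the innermost scope.
    bool declare(const std::string& name, const std::string& type) {
        int depth = static_cast<int>(scopes_.size()) - 1;
        return scopes_.back().emplace(name, Symbol{type, depth}).second;
    }

    // Searches from the innermost scope outwards, so inner
    // declarations shadow outer ones.
    const Symbol* lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unordered_map<std::string, Symbol>> scopes_;
};
```

The table grows and shrinks as the analysis enters and leaves blocks, which matches the "built incrementally" remark on the slide.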
  33. 33. GENERIC. The code is correct with regard to syntax & language semantics. It is also stored as an AST. Although the AST is abstract, it is not generic enough. Language-specific AST nodes are replaced. From this point on, the middle-end kicks in
  34. 34. Middle-end overview: GENERIC → GIMPLE → SSA → Optimize → RTL. Steps: GENERIC → GIMPLE; SSA transformation; optimization passes; un-SSA transformation; RTL, suitable for the back-end
  35. 35. GIMPLE. Modified GENERIC form. Only 3 operands per expression. Why 3? Three-address instructions. Function calls are an exception. No nested function calls. Some control structures are represented with ifs and gotos
  36. 36. GIMPLE. Expressions that are too complex are broken down into expression temporaries. Example: a = b + c + d becomes T1 = b + c; a = T1 + d
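The breakdown into temporaries can be sketched for the simplest case, a chain of additions. `lower_sum` is a hypothetical helper written for this example; the real gimplifier handles arbitrary expression trees.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Lowers "target = ops[0] + ops[1] + ... + ops[n-1]" into statements
// that each have at most two operands on the right-hand side.
// Intermediate results go to temporaries named T1, T2, ...
std::vector<std::string> lower_sum(const std::string& target,
                                   const std::vector<std::string>& ops) {
    assert(ops.size() >= 2);
    std::vector<std::string> out;
    std::string acc = ops[0];
    for (std::size_t i = 1; i < ops.size(); ++i) {
        bool last = (i + 1 == ops.size());
        // The final statement writes the real target, earlier ones a temporary.
        std::string dst = last ? target : "T" + std::to_string(i);
        out.push_back(dst + " = " + acc + " + " + ops[i]);
        acc = dst;
    }
    return out;
}
```

For the slide's example, `lower_sum("a", {"b", "c", "d"})` produces the two statements shown above.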
  37. 37. GIMPLE. Another example: a = b ? c : d becomes if (b != 0) T1 = c; else T1 = d; a = T1
  39. 39. Static Single Assignment (SSA). Every variable is assigned only once. It can be used as a read-only value multiple times. In if statements merging takes place (PHI function). GCC performs over 20 optimizations on the SSA tree
  40. 40. GIMPLE vs SSA
      GIMPLE: a = 3; b = 9; c = a + b; a = b + 1; d = a + c; return d;
      SSA:    a_1 = 3; b_2 = 9; c_3 = a_1 + b_2; a_4 = b_2 + 1; d_5 = a_4 + c_3; _6 = d_5; return _6;
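The renaming shown on this slide can be sketched for straight-line code. The sketch assumes no branches, so PHI functions are not needed; `Stmt` and `to_ssa` are hypothetical names, and a single global version counter is used so that the numbering matches the slide.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One assignment: dst = f(operands). Operands are variable names or literals.
struct Stmt {
    std::string dst;
    std::vector<std::string> operands;
};

// Gives every assignment a fresh versioned name and rewrites later
// uses to refer to the most recent version. Straight-line code only.
std::vector<Stmt> to_ssa(const std::vector<Stmt>& code) {
    std::map<std::string, std::string> current;  // variable -> latest SSA name
    int next_version = 1;
    std::vector<Stmt> out;
    for (const Stmt& s : code) {
        Stmt r;
        for (const std::string& op : s.operands) {
            auto it = current.find(op);
            // Literals and names without a prior definition pass through unchanged.
            r.operands.push_back(it != current.end() ? it->second : op);
        }
        r.dst = s.dst + "_" + std::to_string(next_version++);
        current[s.dst] = r.dst;
        out.push_back(r);
    }
    return out;
}
```

Note that operands are rewritten before the destination is versioned, so a statement such as a = a + 1 reads the old version and defines a new one.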
  41. 41. Optimizations. Why optimize? Why in this phase? Requirements: an optimization must not change program behaviour; it must improve overall program performance; compilation time must be kept reasonable; engineering effort has to be feasible
  42. 42. Optimizations & middle-end: dead code elimination, constant propagation, strength reduction, tail recursion elimination, inlining, vectorization
  43. 43. Dead code elimination. The task is simple: remove unreachable code. Simplify if statements with constant conditions. Remove exception handling constructs surrounding non-throwing code. …
  44. 44. Constant propagation
      Start:  a_1 = 3; b_2 = 9; c_3 = a_1 + b_2; a_4 = b_2 + 1; d_5 = a_4 + c_3; _6 = d_5; return _6;
      Step 1: a_1 = 3; b_2 = 9; c_3 = 12; a_4 = b_2 + 1; d_5 = a_4 + c_3; _6 = d_5; return _6;
      Step 2: a_1 = 3; b_2 = 9; c_3 = 12; a_4 = 10; d_5 = a_4 + c_3; _6 = d_5; return _6;
      Step 3: a_1 = 3; b_2 = 9; c_3 = 12; a_4 = 10; d_5 = 22; _6 = d_5; return _6;
      It could just be: return 22;
      SSA helps here a lot
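The folding steps can be sketched as a single forward pass over SSA statements. Because every name is defined once and before its uses, one pass is enough for straight-line code. `Assign`, `is_number` and `propagate` are hypothetical names, and only '+' and '*' are supported in this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// dst = lhs            when op == 0
// dst = lhs op rhs     when op is '+' or '*'
struct Assign {
    std::string dst;
    std::string lhs;
    char op;
    std::string rhs;
};

bool is_number(const std::string& s) {
    return !s.empty() && std::all_of(s.begin(), s.end(), [](unsigned char c) {
        return std::isdigit(c) != 0;
    });
}

// Returns the constant value of every SSA name that could be folded.
std::map<std::string, long> propagate(const std::vector<Assign>& code) {
    std::map<std::string, long> known;
    // Resolves an operand to a constant if it is a literal or an already folded name.
    auto value_of = [&](const std::string& s, long& out) {
        if (is_number(s)) { out = std::stol(s); return true; }
        auto it = known.find(s);
        if (it == known.end()) return false;
        out = it->second;
        return true;
    };
    for (const Assign& a : code) {
        assert(a.op == 0 || a.op == '+' || a.op == '*');
        long l = 0, r = 0;
        if (!value_of(a.lhs, l)) continue;            // not a constant, leave as is
        if (a.op == 0) { known[a.dst] = l; continue; }
        if (!value_of(a.rhs, r)) continue;
        known[a.dst] = (a.op == '+') ? l + r : l * r;
    }
    return known;
}
```

Since each SSA name has exactly one definition, a folded value never has to be invalidated later, which is one reason SSA helps here.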
  45. 45. Strength reduction. Goal: reduce the strength of an expression. Example:
      unsigned foo(unsigned a) { return a / 4; }
      shrl $2, %edi
      …and a less intuitive one:
      unsigned bar(unsigned a) { return a * 9 + 17; }
      leal 17(%rdi,%rdi,8), %eax
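The two rewrites above can be checked at source level. The sketch below spells out what the shift and the lea compute; the function names are made up for this example, and the equivalence holds for unsigned arithmetic, where overflow wraps around.

```cpp
#include <cassert>
#include <cstdint>

// Division by a power of two versus a right shift.
std::uint32_t div4_plain(std::uint32_t a)   { return a / 4; }
std::uint32_t div4_reduced(std::uint32_t a) { return a >> 2; }

// a * 9 + 17 versus base + index * 8 + displacement,
// which is what "leal 17(%rdi,%rdi,8), %eax" evaluates.
std::uint32_t mul9_plain(std::uint32_t a)   { return a * 9 + 17; }
std::uint32_t mul9_reduced(std::uint32_t a) { return a + (a << 3) + 17; }

// True when both pairs agree for the given input.
bool agree(std::uint32_t a) {
    return div4_plain(a) == div4_reduced(a) && mul9_plain(a) == mul9_reduced(a);
}
```

For signed operands the division case needs an extra correction for negative values, so the compiler emits a slightly longer sequence there.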
  46. 46. Tail recursion elimination
      int factorial(int x) {
        return (x > 1)
          ? x * factorial(x - 1)
          : 1;
      }
      becomes
      int factorial(int x) {
        int result = 1;
        while (x > 1) {
          result *= x--;
        }
        return result;
      }
      Why? Recursion running in constant space.
  47. 47. Inlining. Based on memory-space/time costs. Not possible when: the -fno-inline switch is used; __attribute__s conflict. Forbidden when: call to alloca, setjmp, or longjmp; non-local goto instruction; recursion; variadic argument list
  48. 48. Vectorization. One of GCC's concurrency models. The compiler uses SSE, SSE2, SSE3, … to make the program faster. Enabled by -O3 or -ftree-vectorize. There are more than 25 cases where vectorization can be done, e.g. backward access, multidimensional arrays, conditions, nested loops, … With the -ftree-vectorizer-verbose=N switch, vectorization can be debugged
  49. 49. Vectorization
      int a[256], b[256], c[256];
      void foo() {
        for (int i = 0; i < 256; i++) {
          a[i] = b[i] + c[i];
        }
      }
      Scalar:
      .L3:
        movl -4(%rbp), %eax
        cltq
        movl b(,%rax,4), %edx
        movl -4(%rbp), %eax
        cltq
        movl c(,%rax,4), %eax
        addl %eax, %edx
        movl -4(%rbp), %eax
        cltq
        movl %edx, a(,%rax,4)
        addl $1, -4(%rbp)
      Vectorized:
      .L3:
        movdqa b(%rax), %xmm0
        addq $16, %rax
        paddd c-16(%rax), %xmm0
        movdqa %xmm0, a-16(%rax)
        cmpq $1024, %rax
        jne .L3
  50. 50. Outsmarting GCC
      unsigned int foo(unsigned char i) {
        return i | (i << 8) | (i << 16) | (i << 24);
      } // 3*SHL, 3*OR
      Human:
      unsigned int bar(unsigned char i) {
        unsigned int j = i | (i << 8);
        return j | (j << 16);
      } // 2*SHL, 2*OR
      GCC:
      unsigned int baz(unsigned char i) {
        return i * 0x01010101;
      } // 1*IMUL
  51. 51. Outsmarting GCC
      int fsincos_(double arg) {
        return sin(arg) + cos(arg);
      }
      leaq 8(%rsp), %rdi
      movq %rsp, %rsi
      call sincos
      movsd 8(%rsp), %xmm0
      addsd (%rsp), %xmm0
      addq $24, %rsp
      cvttsd2si %xmm0, %eax
      Only on architectures with an FPU. Actually, this is FPU + SSE
  52. 52. Outsmarting GCC. Which way is the best to reset the accumulator?
      mov $0, %eax      # b8 00 00 00 00
      and $0, %eax      # 83 e0 00
      sub %eax, %eax    # 29 c0
      xor %eax, %eax    # 31 c0
      Answer: xor (sub is just as short, but xor is the idiom GCC emits). Did you know it? GCC did.
  53. 53. Outsmarting GCC. Compilers are good at optimization. Let them optimize. The programmer should focus on writing readable code
  54. 54. Back-end
  55. 55. Register Transfer Language (RTL). Inspired by Lisp. It describes the instructions to be output
  56. 56. GIMPLE → RTL
      GIMPLE:
      unsigned int baz(unsigned char) (unsigned char i) {
        unsigned int D.2202;
        int D.2203;
        int D.2204;
        D.2203 = (int) i;
        D.2204 = D.2203 * 16843009;
        D.2202 = (unsigned int) D.2204;
        return D.2202;
      }
      RTL:
      (insn # 0 0 2 (parallel [
            (set (reg:SI 0 ax [orig:60 D.2207] [60])
                (mult:SI (reg:SI 0 ax [orig:59 D.2207] [59])
                    (const_int 16843009 [0x1010101])))
            (clobber (reg:CC 17 flags))
          ]) rtl.cpp:2 # {*mulsi3_1}
        (expr_list:REG_DEAD (reg:SI 0 ax [orig:59 D.2207] [59])
          (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
  57. 57. RTL objects. There are multiple types of RTL objects: expressions; integers, wide integers; strings; vectors
  59. 59. Register allocation. The task: ensure that machine resources (registers) are used optimally. There are two types of register allocators: the Local Register Allocator and the Global Register Allocator. Since GCC 4.8 the messy reload.c has been replaced with LRA
  60. 60. Register allocation. The problem: interference graph coloring. Colors == registers. Assign registers (colors) to temporaries. Finding a k-coloring of a graph is NP-complete, so GCC uses a heuristic method. In case of failure some of the variables are stored in memory. Two variables can share a register only when at most one of them is live at any point of the program
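To make the coloring idea concrete, here is a sketch of a simple greedy heuristic. It is not the algorithm GCC's allocators implement; it only shows the shape of the problem. The function name `color` and the adjacency-list input are assumptions for this example.

```cpp
#include <cassert>
#include <vector>

// adjacency[i] lists the temporaries that are live at the same time
// as temporary i (its neighbours in the interference graph).
// Returns one register index in [0, k) per temporary, or -1 when no
// register is free and the temporary has to be spilled to memory.
std::vector<int> color(const std::vector<std::vector<int>>& adjacency, int k) {
    assert(k > 0);
    std::vector<int> reg(adjacency.size(), -1);
    for (std::size_t v = 0; v < adjacency.size(); ++v) {
        std::vector<bool> taken(k, false);
        // Registers already given to interfering temporaries are off limits.
        for (int n : adjacency[v])
            if (reg[n] >= 0) taken[reg[n]] = true;
        // Pick the lowest free register, if any.
        for (int c = 0; c < k; ++c)
            if (!taken[c]) { reg[v] = c; break; }
    }
    return reg;
}
```

Greedy coloring depends on the visiting order and can spill where an optimal coloring would not, which is the price of avoiding the NP-complete exact problem.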
  61. 61. Register allocation - example
      Instructions     Live variables
      (entry)          a
      b = a + 2        b, a
      c = b * b        a, c
      b = c + 1        a, b
      return a * b
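The live sets in this example can be derived by walking the instructions backwards: a definition kills a variable, a use makes it live. The sketch below covers straight-line code only; `Instr` and `live_before` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One instruction: the variable it defines (empty if none) and the variables it reads.
struct Instr {
    std::string def;
    std::vector<std::string> uses;
};

// Returns, for each instruction, the set of variables that are live
// immediately before it. Assumes nothing is live after the last instruction.
std::vector<std::set<std::string>> live_before(const std::vector<Instr>& code) {
    std::vector<std::set<std::string>> result(code.size());
    std::set<std::string> live;
    for (std::size_t i = code.size(); i-- > 0;) {
        if (!code[i].def.empty()) live.erase(code[i].def);      // definition kills
        live.insert(code[i].uses.begin(), code[i].uses.end());  // uses generate
        result[i] = live;
    }
    return result;
}
```

In the example, b and c are never live at the same point, so they can share one register, while a needs its own for the whole sequence.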
  62. 62. We can mess with the compiler: register int variable asm("rbx"); However… this is not a good idea (unless you have a very good reason). The variable can be optimized away. The register can still be used by other variables
  63. 63. Instruction scheduling. Goal: minimize the length of the critical path. Goal: maximize parallelism opportunities. How does it work? 1. Build the data dependence graph 2. Calculate priorities for each instruction 3. Iteratively schedule ready instructions. Used before and after register allocation
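The three steps can be sketched as a basic list scheduler. This is a simplified model under stated assumptions (unit latency for every instruction, a fixed issue width, an acyclic dependence graph), not GCC's scheduler; `schedule` and its parameters are names made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// deps[i] lists the instructions that must finish before instruction i.
// The graph is assumed to be acyclic. Returns the cycle in which each
// instruction is issued, with at most `width` instructions per cycle.
std::vector<int> schedule(const std::vector<std::vector<int>>& deps, int width) {
    assert(width > 0);
    const int n = static_cast<int>(deps.size());

    // Step 1: dependence graph, stored as successor lists.
    std::vector<std::vector<int>> succ(n);
    for (int i = 0; i < n; ++i)
        for (int d : deps[i]) succ[d].push_back(i);

    // Step 2: priority = length of the longest chain starting at the instruction.
    std::vector<int> prio(n, -1);
    std::function<int(int)> height = [&](int i) -> int {
        if (prio[i] >= 0) return prio[i];
        int h = 0;
        for (int s : succ[i]) h = std::max(h, height(s));
        prio[i] = h + 1;
        return prio[i];
    };
    for (int i = 0; i < n; ++i) height(i);

    // Step 3: each cycle, issue the highest-priority ready instructions.
    std::vector<int> cycle(n, -1);
    int done = 0;
    for (int now = 0; done < n; ++now) {
        std::vector<int> ready;
        for (int i = 0; i < n; ++i) {
            if (cycle[i] >= 0) continue;
            bool ok = true;
            for (int d : deps[i])
                if (cycle[d] < 0 || cycle[d] >= now) ok = false;
            if (ok) ready.push_back(i);
        }
        std::stable_sort(ready.begin(), ready.end(),
                         [&](int a, int b) { return prio[a] > prio[b]; });
        for (std::size_t j = 0; j < ready.size() && j < static_cast<std::size_t>(width); ++j) {
            cycle[ready[j]] = now;
            ++done;
        }
    }
    return cycle;
}
```

Prioritizing by chain length keeps the critical path moving first, which is the first goal named on the slide; independent instructions then fill the remaining issue slots.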
  64. 64. Instruction scheduling. Works well in case of unrelated expressions: a = x + 1; b = y + 2; c = z + 3; Software pipelining:
      IF RF EX ME WB
         IF RF EX ME WB
            IF RF EX ME WB
  65. 65. Instruction selection. GCC picks an instruction from the set available for the given target. Each instruction has its cost. The addressing mode is also selected
  66. 66. RTL → ASM. Registers - allocated. Expressions - ordered. Instructions - selected
  67. 67. RTL optimizations. Optimizations performed on the RTL form
  68. 68. Rematerialization. Re-compute the value of a particular variable multiple times. Lower register pressure, more CPU work. Should happen only when the computation takes less time than a load. The expression must not have side effects. Experimental results show a 1-6% gain in execution performance
  69. 69. Common Subexpression Elimination. Finds subexpressions that occur in multiple places. Decides whether an additional temporary would make the program faster. Example:
      k = i + j + 10;
      r = i + j + 30;
      Becomes:
      movl 8(%rsp), %esi
      addl 12(%rsp), %esi
      xorl %eax, %eax
      leal 30(%rsi), %edx
      addl $10, %esi
      CSE also works with functions
  70. 70. Loop-invariant code motion. Move computations that do not depend on the loop outside its body. Benefits: fewer calculations & constants kept in registers. Example:
      for (int i = 0; i < n; i++) {
        x = y + z;
        a[i] = 6 * i + x * x;
      }
      Becomes:
      x = y + z;
      t1 = x * x;
      for (int i = 0; i < n; i++) {
        a[i] = 6 * i + t1;
      }
      Can introduce high register pressure → rematerialization
  71. 71. More RTL optimizations: jump bypassing, control flow graph cleanup, loop optimizations, instruction combination, …
  72. 72. Linker (collect2). collect2 really uses ld. Performs consolidation of multiple object files. gold - a better linker, but only for ELF
  73. 73. Link time optimizations. GCC optimizations are constrained to a single translation unit. When LTO is enabled, object files include GIMPLE trees. Local optimizations are applied globally: dead code elimination, constant propagation, …
  74. 74. GCC test suites. GCC is tested by over 19k tests. Test suites employ DejaGnu, Tcl, and expect tools. Each test is a C file with special comments. Test results are: PASS: the test passed as expected; XPASS: the test unexpectedly passed; FAIL: the test unexpectedly failed; XFAIL: the test failed as expected; ERROR: the test suite detected an error; WARNING: the test suite detected a possible problem; UNSUPPORTED: the test is not supported on this platform
  75. 75. string-1.C
      // Test location of diagnostics for interpreting strings. Bug 17964.
      // Origin: Joseph Myers <>
      // { dg-do compile }
      const char *s = "\q"; // { dg-error "unknown escape sequence" }
      const char *t = "\ "; // { dg-error "unknown escape sequence" }
      const char *u = "";
  76. 76. ambig2.C
      // PR c++/57948
      struct Base { };
      struct Derived : Base
      {
        struct Derived2 : Base
        {
          struct ConvertibleToBothDerivedRef
          {
            operator Derived&();
            operator Derived2&();
            void bind_lvalue_to_conv_lvalue_ambig(ConvertibleToBothDerivedRef both)
            {
              Base &br1 = both; // { dg-error "ambiguous" }
            }
          };
        };
      };
  77. 77. dependent-name3.C
      // { dg-do compile }
      // Dependent arrays of invalid size generate appropriate error messages
      template<int I> struct A
      {
        static const int zero = 0;
        static const int minus_one = -1;
      };
      template<int N> struct B
      {
        int x[A<N>::zero];       // { dg-error "zero" }
        int y[A<N>::minus_one];  // { dg-error "negative" }
      };
      B<0> b;
  78. 78. DG commands: dg-do (preprocess, compile, assemble, link, run), dg-options, dg-error, dg-warning, dg-bogus, …
  79. 79. Auxiliary tools. Tools every developer should be aware of: nm - helps examine symbols in object files; objdump - displays information from object files; c++filt - demangles C++ symbols; addr2line - converts offsets to lines and filenames; …, see binutils
  80. 80. Bonus slide. Which came first, the chicken or the egg? The first compilers were written in… assembly. It was challenging because of poor hardware resources. It is believed that the first compiler was created by Grace Hopper, for A-0. First complete compiler - FORTRAN, IBM, 1957. First multi-architecture compiler - COBOL, 1960
  81. 81. Resources: Register Allocation - Graph coloring; Compilers - Principles, Techniques & Tools; From sources to binary, Red Hat mag; GCC: how to prepare a test case; Parallel Programming and Optimization with GCC; Source code optimization; Register rematerialization in GCC [1] [2] [3]; Treegion Instruction Scheduling in GCC; Introduction to instruction scheduling; Addressing mode selection in GCC; Link Time Optimization in GCC [1]
  82. 82. areThereAnyQuestions() ? pleaseAsk() : thankYouForYourAttention();
