GNU GCC - what just a compiler...?


Published on

A quick overview of GCC, the GNU Compiler Collection.

Published in: Technology


Saket Kr. Pathak
Software developer, 3D Graphics
2012
Most of us have studied compilers, and the languages they support, at some point over the last few days (whatever "days" might be, in multiples of 365 ... :) ), and most of us had a specific paper entitled "Compiler Design", or another name with a similar syllabus. That's really great, mate, because I never got the chance to study compiler design at all. Whatever ... it is down to my own interest and some fortunate free time that I found something valuable and sensible to learn and study.

How many times has someone asked you which compiler you work with? This question comes to every student and working professional. I have been asked it a few times myself, and most of the time I answered it.

My answer was GNU GCC or VC++, depending on the environment. Today I realized this is quite a foolish answer. It is as if someone asked you about the flavor of the coffee (on a date with a friend) and you replied, "The coffee was good" ... what a sense of humor ... haaa haa. :)

So I went back and studied my answer with reference to the GNU GCC compiler.

GCC stands for GNU Compiler Collection. It was originally named the GNU C Compiler, because it only handled the C programming language. GCC 1.0 was released in 1987, and the compiler was extended to compile C++ in December of that year. Later on, compilers were added for languages such as Objective-C, Objective-C++, Fortran, Java, Ada, and Go.

Since GCC is a collection of compilers, "GCC" cannot be the exact answer when the question is about the type and version of your compiler. For C++, the compiler within the collection has a specific name, G++ (the g++ command). Similarly, for C it is the GNU C Compiler (the gcc command). Many other languages and their compilers are supported by GCC, as listed below:

Seq.  Language     Compiler
1.    C            gcc
2.    C++          g++
3.    Objective-C  gobjc
4.    Fortran      gfortran
5.    Java         gcj
6.    Ada          GNAT
7.    Go           gccgo
8.    Pascal       gpc
9.    Mercury      Mercury
10.   PL/I         PL/1
11.   VHDL         GHDL
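Since the point above is to name the exact compiler and its version, here is a minimal Python sketch that reports which drivers are installed and what version each one prints. The names `DRIVERS`, `driver_version` and `describe_toolchain` are hypothetical, chosen only for this illustration, and the mapping covers just a few rows of the table. The sketch assumes the drivers are on `PATH` and relies on the `-dumpversion` option of the GCC drivers.

```python
import shutil
import subprocess

# Language -> driver command, copied from a few rows of the table above.
DRIVERS = {
    "C": "gcc",
    "C++": "g++",
    "Fortran": "gfortran",
    "Go": "gccgo",
}


def driver_version(driver):
    """Return what `<driver> -dumpversion` prints, or None if the driver is unavailable."""
    path = shutil.which(driver)
    if path is None:
        return None  # not installed, or not on PATH
    result = subprocess.run([path, "-dumpversion"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def describe_toolchain():
    """Map each language to '<driver> <version>' or '<driver> (not found)'."""
    report = {}
    for language, driver in DRIVERS.items():
        version = driver_version(driver)
        report[language] = f"{driver} {version}" if version else f"{driver} (not found)"
    return report


if __name__ == "__main__":
    for language, description in describe_toolchain().items():
        print(f"{language}: {description}")
```

The output depends entirely on the machine it runs on, so none is shown here. On systems where `gcc` is an alias for another compiler, the reported version is that compiler's.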
The GNU project has some component modules that I want to discuss here as an add-on, because I wondered how this long list of compilers, with all their compiler logic, programming libraries and syntax, could be handled within a single GNU toolchain. I then found a clear overview of the GCC architecture: each compiler is organized into three stages, a Front End, a Middle End, and a Back End. GCC compiles one file at a time, and a source file goes through all three components one after another. The three components are described in a bit more detail below.

GCC basic components

Front-End
The purpose of the front end is to read the source file, parse it, and convert it into the standard Abstract Syntax Tree (AST) representation. There is one front end for each programming language. Because of the differences between languages, the format of the generated ASTs is slightly different for each language. The next step after AST generation is the unification step, in which the AST is converted into a unified form called GENERIC.

Middle-End
Next, the middle end of the compiler takes control. First, the tree is converted into another representation called GIMPLE. In this form, each expression contains no more than three operands, all control flow constructs are represented as combinations of conditional statements and goto operators, and arguments of a function call can only be variables. GIMPLE is a convenient representation for optimizing the code. After GIMPLE, the code is converted into the Static Single Assignment (SSA) representation, in which each variable is assigned only once but can be used on the right-hand side of an expression any number of times. GCC performs more than 20 different optimizations on SSA trees. The tree is then converted back to the GIMPLE form, which is used to generate a Register-Transfer Language (RTL) form of the tree. RTL is a hardware-based representation that corresponds to an abstract target architecture with an infinite number of registers. An RTL optimization pass optimizes the tree in the RTL form.

Back-End
Finally, a GCC back end generates the assembly code for the target architecture using the RTL representation. Examples of back ends are the x86 back end, the MIPS back end, etc.
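The path from source text to AST to a GIMPLE-like form to SSA can be sketched on a toy scale. The Python below is only an illustration under stated assumptions, not GCC's implementation: the parser is a small hand-written recursive-descent parser for arithmetic expressions, the "three-address" statements are plain strings rather than real GIMPLE, and the SSA renaming handles straight-line code only (no branches, so no phi nodes). All names (`tokenize`, `Parser`, `parse`, `lower`, `to_ssa`) are hypothetical.

```python
import re


def tokenize(text):
    """Split an expression into numbers, names, and operator/parenthesis tokens."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)


class Parser:
    """Front-end sketch: builds an AST of nested tuples (op, left, right)."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):  # expr -> term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.take(), node, self.term())
        return node

    def term(self):  # term -> factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.take(), node, self.factor())
        return node

    def factor(self):  # factor -> NUMBER | NAME | '(' expr ')'
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token  # a number or a variable name is a leaf


def parse(text):
    """Parse a whole expression and reject trailing tokens."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


def lower(node, code):
    """Middle-end sketch: flatten an AST into three-address statements.

    Appends statements to `code` and returns the name that holds the result.
    """
    if isinstance(node, str):
        return node
    op, left, right = node
    lhs = lower(left, code)
    rhs = lower(right, code)
    temp = f"t{len(code) + 1}"  # a fresh temporary for every operation
    code.append(f"{temp} = {lhs} {op} {rhs}")
    return temp


def to_ssa(statements):
    """Rename straight-line 'x = a op b' statements so each name is assigned once."""
    version = {}
    renamed = []
    for statement in statements:
        target, expression = [part.strip() for part in statement.split("=", 1)]
        # A use refers to the latest version of a variable defined so far.
        tokens = [f"{t}_{version[t]}" if t in version else t for t in expression.split()]
        version[target] = version.get(target, 0) + 1
        renamed.append(f"{target}_{version[target]} = {' '.join(tokens)}")
    return renamed
```

For example, lowering `parse("a + b * (c - 2)")` into an empty list yields `t1 = c - 2`, `t2 = b * t1`, `t3 = a + t2`, and `to_ssa(["x = a + b", "x = x * 2", "y = x - a"])` returns `x_1 = a + b`, `x_2 = x_1 * 2`, `y_1 = x_2 - a`. Each statement has at most three operands, which is the property the GIMPLE description above relies on.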
The short descriptions above give an overview of all three components.

Front-End:
Front ends vary internally, but each has to produce trees that the back end can handle. Currently, the parsers are all hand-coded recursive descent parsers, though there is no reason why a parser generator could not be used for new front ends in the future (version 2 of the C compiler used a Bison-based grammar). A recursive descent parser is a top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent), where each procedure usually implements one of the production rules of the grammar. GNU bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser (in C, C++, or Java) that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.

The front end converts the source file to an abstract syntax tree. The tree used to have a somewhat different meaning for each language front end, and front ends could provide their own tree codes. This was simplified with the introduction of GENERIC and GIMPLE, two forms of language-independent trees that arrived with GCC 4.0. GENERIC is the more complex form, based on the intermediate representation of the GCC 3.x Java front end. GIMPLE is a simplified GENERIC, in which various constructs are lowered to multiple GIMPLE instructions. The C, C++ and Java front ends produce GENERIC directly. Other front ends have different intermediate representations after parsing and convert these to GENERIC.

Middle-end:
When the middle end takes control, it works on GENERIC, an intermediate representation language used while compiling source code into executable binaries. A subset of GENERIC, called GIMPLE, is targeted by all the front ends of GCC. The middle end is therefore responsible for all the code analysis and optimization, working independently of both the compiled language and
the target architecture, starting from the GENERIC representation and expanding it to Register Transfer Language (RTL). The GENERIC representation contains only the subset of the imperative programming constructs optimized by the middle end. In transforming the source code to GIMPLE, complex expressions are split into three-address code using temporary variables. This representation was inspired by the SIMPLE representation proposed in the McCAT compiler by Laurie J. Hendren for simplifying the analysis and optimization of imperative programs.

Optimization can occur during any phase of compilation; however, the bulk of optimizations are performed after the syntax and semantic analysis of the front end and before the code generation of the back end. The exact set of GCC optimizations varies from release to release, but it includes the standard algorithms, such as loop optimization, jump threading, common sub-expression elimination, instruction scheduling, and so forth. The RTL optimizations have become less important with the addition of global SSA-based optimizations on GIMPLE trees. The optimizations performed at this level include dead code elimination, partial redundancy elimination, global value numbering, sparse conditional constant propagation, and scalar replacement of aggregates. Array-dependence-based optimizations such as automatic vectorization and automatic parallelization are also performed.

Back-end:
The behavior of GCC's back end is partly specified by preprocessor macros and functions specific to a target architecture, for instance to define the endianness, word size, and calling conventions. The front part of the back end uses these to help decide RTL generation, so although GCC's RTL is nominally processor-independent, the initial sequence of abstract instructions is already adapted to the target. At any moment, the actual RTL instructions forming the program representation have to comply with the machine description of the target architecture. At the end of compilation, valid RTL is further reduced to a strict form in which each instruction refers to real machine registers and real instructions from the target's instruction set. Forming strict RTL is a very complicated task, done mostly by register allocation first but completed only by a separate "reloading" phase, which must account for the vagaries of all of GCC's targets. The final phase is somewhat anticlimactic, because the patterns to match were generally chosen during reloading, and so the assembly code is simply built by running substitutions of registers and addresses into the strings specifying the instructions.

Compatible IDEs
Integrated development environments written for GNU/Linux, and some for other operating systems, support GCC. These include:
  • Anjuta
  • Code::Blocks
  • CodeLite
  • Dev-C++
  • Eclipse
  • Geany
  • KDevelop
  • NetBeans
  • Qt Creator
  • Xcode

Hmmm ... So please never tell anyone, like a fool, "I work on GNU GCC" ... be specific: the GNU C Compiler for C and G++ for C++, respectively. Here are a few links I would like to mention. If any of you would like to read about the GNU project in a bit more detail, you can definitely enjoy your time with all of these ... But I know ... you are quite busy ... :) ... whatever ...

References: