4 compiler lab - Syntax Ana


Published in: Technology
  • The syntax analyzer would first look at the string "int", check it against the defined keywords, and find that it is the type for integers. It would then treat the next token as an identifier and check that it is a valid identifier name. Next it would look at the token after that. Because that token is an opening parenthesis, it would treat "main" as a function; a semicolon would instead have meant a variable declaration, and an equals sign the initialization of an integer variable. After the opening parenthesis it would find a closing parenthesis, meaning that the function has 0 parameters. Then it would see that the next token is an opening brace, so it would conclude that this is the implementation of the function main rather than a declaration of main, which a semicolon would have indicated (even though you cannot declare main in C++). It would probably also create a counter to keep track of the nesting level of the statement blocks, to make sure the braces come in pairs.
    After that it would look at the next token, "std", and probably not do anything with it yet, but then it would see the :: operator and check that "std" is a valid namespace. Then it would see the next token, "cout", as the name of an identifier in the namespace "std", and see that it names an output stream object. The analyzer would see the << operator next, so it would check that the << operator can be used with cout, and also that the next token can be used with the << operator. The same thing would happen with the << operator that follows the "hello world" token.
    Then it would get to the "std" token again, look past it to see the :: operator token, check again that the namespace exists, and then check that "endl" is in that namespace. Then it would see the semicolon and treat it as the end of the statement. Next it would see the keyword return and expect an integer value as the next token, because main returns an integer, and it would find 0, which is an integer. The next symbol is a semicolon, so that is the end of the statement. The next token is a closing brace, so that is the end of the function. There are no more tokens, so if the syntax analyzer did not find any errors in the code, it would pass its result on to the later compiler phases so that the program can be converted to machine language. This is a simplified view of syntax analysis; real syntax analyzers do not work exactly this way, but the idea is the same.
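The lookahead decision described in these notes can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `classify_declaration`, the `TYPES` set, and the use of plain strings as tokens are all hypothetical and do not come from the slides.

```python
# Hypothetical set of type keywords; the slides only mention "int".
TYPES = {"int", "float", "char"}

def classify_declaration(tokens):
    """Decide what a declaration is by looking at the tokens after its name.

    tokens is a list of token strings that starts with a type and a name,
    for example ["int", "main", "(", ")", "{"].
    """
    if len(tokens) < 3:
        raise SyntaxError("declaration is too short")
    type_name, name, follower = tokens[0], tokens[1], tokens[2]
    if type_name not in TYPES:
        raise SyntaxError(f"unknown type {type_name!r}")
    if not name.isidentifier():
        raise SyntaxError(f"invalid identifier {name!r}")
    if follower == ";":
        return "variable declaration"
    if follower == "=":
        return "initialized variable"
    if follower == "(":
        if ")" not in tokens[3:]:
            raise SyntaxError("missing closing parenthesis")
        close = tokens.index(")", 3)
        after = tokens[close + 1] if close + 1 < len(tokens) else None
        if after == "{":
            return "function definition"
        if after == ";":
            return "function declaration"
    raise SyntaxError(f"unexpected token after {name!r}")

print(classify_declaration(["int", "main", "(", ")", "{"]))  # function definition
```

The same three-way choice (parenthesis, semicolon, equals sign) is what the notes describe for the token that follows "main".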

    2. LEXICAL ANALYSIS SUMMARY
       1. Start a new token.
       2. Read the 1st character to start recognizing its type, according to the algorithm specified in Slide 3.
       3. Pass the token (lexeme type) and its value attribute to the parser.
       4. Repeat steps 1-3.
       5. Repeat until the end of the input.
       Department of Computer Science - 10-14/3/12 - Compiler Engineering Lab
    3. Start New TOKEN
       • Read the 1st character.
       • Is it a digit? TOKEN = NUM
       • Is it a letter? Read the following characters.
           • If any of them is a digit or _ : TOKEN = ID
           • If all of them are letters: is the word a keyword? If so, TOKEN = KEYWORD (otherwise TOKEN = ID)
       • Is it a RELOP (>, <, !, =)? Check whether the 2nd character is (=): TOKEN = RELOP
       • Is it an AROP (+, -, /, *, =)? TOKEN = AROP
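The decision steps of this slide can be sketched as a small Python tokenizer. It is a minimal illustration, not the lab's reference solution. Several choices are assumptions: the `KEYWORDS` set, skipping whitespace, reading a whole run of digits as one NUM, treating a lone "=" as AROP and "==" as RELOP, and rejecting a lone "!".

```python
# Assumed keyword list; the slide does not enumerate the keywords.
KEYWORDS = {"int", "return", "while", "if", "else"}

def tokenize(text):
    """Return a list of (token_type, lexeme) pairs for the given text."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("NUM", text[i:j]))
            i = j
        elif ch.isalpha():
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            word = text[i:j]
            # A word is a keyword only if it is in the keyword list.
            tokens.append(("KEYWORD" if word in KEYWORDS else "ID", word))
            i = j
        elif ch in "<>!=" and text[i + 1:i + 2] == "=":
            tokens.append(("RELOP", text[i:i + 2]))  # >=, <=, !=, ==
            i += 2
        elif ch in "<>":
            tokens.append(("RELOP", ch))
            i += 1
        elif ch in "+-*/=":
            tokens.append(("AROP", ch))
            i += 1
        else:
            raise ValueError(f"illegal character {ch!r} at position {i}")
    return tokens

print(tokenize("position = initial + rate * 60"))
```

Checking the two-character operators before the one-character ones is what lets ">=" come out as a single RELOP instead of ">" followed by "=".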
    4. SYNTAX ANALYSIS (PARSING)
       • Is the process of analyzing a text, made of a sequence of tokens, to determine its grammatical structure with respect to a given (more or less) formal grammar
       • Builds an Abstract Syntax Tree (AST)
       • Is part of an interpreter or a compiler
       • Creates some form of Internal Representation (IR)
       • Programming languages tend to be specified with context-free grammars, because efficient, fast parsers can be written for them
    5. PHASE 2 : SYNTAX ANALYSIS
       • Sometimes also called Syntax Checking
       • Ensures that:
           • the code is grammatically valid (without worrying about the meaning)
           • and will sequence into an executable program
       • The syntax analyzer applies rules to the code, for example:
           • checking that each opening brace has a corresponding closing brace,
           • that each declaration has a type,
           • and that the type exists, etc.
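The brace rule mentioned on this slide can be checked with a simple nesting counter. The sketch below assumes the tokens arrive as plain strings; the function name `braces_balanced` is made up for illustration.

```python
def braces_balanced(tokens):
    """Return True if every "{" has a matching "}" in the right order."""
    depth = 0
    for tok in tokens:
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
            if depth < 0:  # a closing brace appeared before its opening brace
                return False
    return depth == 0      # any brace still open at the end is an error

print(braces_balanced(["int", "main", "(", ")", "{", "return", "0", ";", "}"]))  # True
```

A single counter is enough here because only one kind of bracket is checked; matching several bracket kinds at once would need a stack.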
    6. CONTEXT-FREE GRAMMAR
       • Defines the components that form an expression and the order in which they must appear
       • A context-free grammar is a set of rules specifying how syntactic elements in some language can be formed from simpler ones
       • The grammar specifies the allowable ways to combine tokens (called terminals) into higher-level syntactic elements (called non-terminals)
    7. CONTEXT-FREE GRAMMAR
       • Ex.:
           • Any ID is an expression (preferred to say TOKEN)
           • Any Number is an expression (preferred to say TOKEN)
           • If Expr1 and Expr2 are expressions, then:
               • Expr1 + Expr2 is an expression
               • Expr1 * Expr2 is an expression
           • If id1 is an ID and Expr2 is an expression, then:
               • id1 = Expr2 is a statement
           • If Expr1 is an expression and Statement2 is a statement, then:
               • while (Expr1) do Statement2
               • if (Expr1) then Statement2
             are statements
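The example rules above can be turned into a small recursive-descent parser. This is a sketch under assumptions, not the method the slides prescribe: tokens are plain strings, trees are nested tuples, and "*" is given higher precedence than "+" (the slide's rules do not state a precedence). All function names are hypothetical.

```python
KEYWORDS = {"while", "do", "if", "then"}

def expect(tokens, index, wanted):
    found = tokens[index] if index < len(tokens) else None
    if found != wanted:
        raise SyntaxError(f"expected {wanted!r}, found {found!r}")

def parse_factor(tokens):
    # Rule: any ID or Number is an expression.
    tok = tokens[0] if tokens else None
    is_id = tok is not None and tok.isidentifier() and tok not in KEYWORDS
    if tok is not None and (tok.isdigit() or is_id):
        return tok, tokens[1:]
    raise SyntaxError(f"ID or number expected, found {tok!r}")

def parse_term(tokens):
    # Rule: Expr1 * Expr2 is an expression.
    left, rest = parse_factor(tokens)
    while rest and rest[0] == "*":
        right, rest = parse_factor(rest[1:])
        left = ("*", left, right)
    return left, rest

def parse_expr(tokens):
    # Rule: Expr1 + Expr2 is an expression.
    left, rest = parse_term(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_term(rest[1:])
        left = ("+", left, right)
    return left, rest

def parse_statement(tokens):
    """Parse one statement; return (tree, remaining_tokens)."""
    head = tokens[0] if tokens else None
    if head in ("while", "if"):
        # Rules: while (Expr1) do Statement2, if (Expr1) then Statement2.
        follow = "do" if head == "while" else "then"
        expect(tokens, 1, "(")
        cond, rest = parse_expr(tokens[2:])
        expect(rest, 0, ")")
        expect(rest, 1, follow)
        body, rest = parse_statement(rest[2:])
        return (head, cond, body), rest
    if head is not None and head.isidentifier() and head not in KEYWORDS:
        # Rule: id1 = Expr2 is a statement.
        expect(tokens, 1, "=")
        value, rest = parse_expr(tokens[2:])
        return ("=", head, value), rest
    raise SyntaxError(f"statement expected, found {head!r}")

tree, rest = parse_statement("position = initial + rate * 60".split())
print(tree)  # ('=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```

Each grammar rule maps to one function, which is why context-free grammars lend themselves to compact hand-written parsers.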
    8. GRAMMAR & AST
       • TOKEN (terminals) = Leaf
       • Expressions, Statements (non-terminals) = Nodes
       • Stream of Characters → Lexical Analysis → Stream of TOKENs
       • Stream of TOKENs → Syntax Analysis → Abstract Syntax Tree (AST)
    9. PHASE 2 : SYNTAX ANALYSIS
    10. PHASE 2 : SYNTAX ANALYSIS
       [Figure: tokens flowing into the Syntax Analyzer (Parser)]
    11. SYMBOL TABLE
       • A Symbol Table (ST) is a data structure containing a record for each identifier, with fields for the attributes of the ID
       • The tokens that are formed are recorded in the ST
       • Purpose:
           • Analyzing expressions and statements requires a hierarchical (nesting) structure
           • The data structure allows us to find, retrieve, and store a record for an ID quickly
           • For example: the Semantic Analysis and Code Generation phases retrieve the ID type for type checking and implementation purposes
    12. SYMBOL TABLE MANAGEMENT
       • The Symbol Table may contain any of the following information:
           • For an ID:
               • The storage allocated for the ID
               • Its type
               • Its scope (where it is valid in the program)
           • For a function, also:
               • Number of arguments
               • Types of arguments
               • Passing method (by reference or by value)
               • Return type
       • Identifiers are added only if they do not already exist
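One way to hold the attributes listed on this slide is a dictionary of records keyed by identifier name. The sketch below is an assumption about layout, not the lab's required design; the class names `SymbolRecord` and `SymbolTable` and every field name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SymbolRecord:
    name: str
    type: Optional[str] = None          # e.g. "int"
    scope: Optional[str] = None         # where the ID is valid
    storage: Optional[int] = None       # filled in by a later phase
    # The fields below apply to functions only.
    arg_types: Optional[List[str]] = None   # number of arguments = len(arg_types)
    passing: Optional[str] = None           # "value" or "reference"
    return_type: Optional[str] = None

class SymbolTable:
    def __init__(self):
        self._records = {}

    def insert(self, name, **attrs):
        """Add the identifier if it is new, then update the given attributes."""
        record = self._records.get(name)
        if record is None:
            record = SymbolRecord(name)
            self._records[name] = record
        for key, value in attrs.items():
            if not hasattr(record, key):
                raise AttributeError(f"unknown attribute {key!r}")
            setattr(record, key, value)
        return record

    def lookup(self, name):
        """Return the record for name, or None if it was never inserted."""
        return self._records.get(name)

table = SymbolTable()
table.insert("main", scope="global", arg_types=[], return_type="int")
table.insert("main")  # already present, so no second record is created
print(table.lookup("main"))
```

A dictionary gives the fast find, retrieve, and store operations that the previous slide asks of the data structure.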
    13. SYMBOL TABLE MANAGEMENT
       • Not all attributes can always be determined by the lexical analyzer, because of its linear nature
       • E.g. dim a, x as integer
       • In this example, at the time the analyzer sees the IDs it has not yet reached the type keyword
       • So the following phases complete the IDs' attributes, and use them as well
       • For example: the storage location attribute is assigned by the code generator phase
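The `dim a, x as integer` example can be sketched as follows: the identifiers are entered with an unknown type as they are read, and the type is filled in once the type keyword is finally reached. The function name `process_dim` and the dictionary layout are assumptions made for illustration.

```python
def process_dim(tokens, table):
    """Process tokens such as ["dim", "a", ",", "x", "as", "integer"]."""
    stream = iter(tokens)
    if next(stream, None) != "dim":
        raise SyntaxError("declaration must start with 'dim'")
    pending = []  # IDs seen so far whose type is still unknown
    for tok in stream:
        if tok == "as":
            type_name = next(stream, None)
            if type_name is None:
                raise SyntaxError("type expected after 'as'")
            for name in pending:
                table[name]["type"] = type_name  # back-fill the attribute
            return table
        if tok != ",":
            # The type is not known yet; storage is left for the code generator.
            table.setdefault(tok, {"type": None, "storage": None})
            pending.append(tok)
    raise SyntaxError("missing 'as' in declaration")

table = process_dim(["dim", "a", ",", "x", "as", "integer"], {})
print(table["a"]["type"], table["x"]["type"])  # integer integer
```

The `storage` field stays empty here, matching the slide's point that the storage location is assigned later by the code generator.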
    14. ERROR DETECTION & REPORTING
       • For the compilation process to proceed correctly, each phase must:
           • Detect any error
           • Deal with the detected error(s)
       • Error detection:
           • Most errors are detected in Syntax + Semantic Analysis
           • In Lexical Analysis: the characters are not legal for token formation
           • In Syntax Analysis: the token stream violates the structure rules
           • In Semantic Analysis: the structure is correct but the meaning is invalid (e.g. ID = Array Name + Function Name)
    15. COMPILER PHASES
    16. LEXICAL ANALYZER & SYMBOL TABLE
       Location   Token ID   Token Type   Token Value
                  Id1        ID           position
                  Expr1      AROP         ASS
                  Id2        ID           initial
                  Expr2      AROP         SUM
                  Id3        ID           rate
                  Expr3      AROP         MUL
                  N1         NUM          60
    17. SYNTAX ANALYZER & SYMBOL TABLE
    18. SYNTAX ANALYZER & SYMBOL TABLE
       • A LEAF is a record with two or more fields: one to identify the TOKEN and others to hold its attributes
       Location   Token ID   Token Type   Token Value
                  Id1        ID           position
                  Expr1      AROP         ASS
                  Id2        ID           initial
                  Expr2      AROP         SUM
                  Id3        ID           rate
                  Expr3      AROP         MUL
                  N1         NUM          60
    19. SYNTAX ANALYZER & SYMBOL TABLE
       • An interior NODE is a record with a field for the operator and two fields of pointers to the left and right children
       Operator   Left Child (Pointer)   Right Child (Pointer)
       Expr1      id1                    Expr2
       Expr2      id2                    Expr3
       Expr3      id3                    N1
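The leaf and interior-node records of slides 18 and 19 can be sketched as two small Python classes. The tree built here follows the rows of the tables (Expr1 = ASS, Expr2 = SUM, Expr3 = MUL); the class names, field names, and the `to_text` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    token_type: str   # identifies the TOKEN, e.g. "ID" or "NUM"
    value: str        # attribute, e.g. the lexeme

@dataclass
class Node:
    operator: str                  # e.g. "ASS", "SUM", "MUL"
    left: Union["Node", Leaf]      # reference to the left child
    right: Union["Node", Leaf]     # reference to the right child

id1 = Leaf("ID", "position")
id2 = Leaf("ID", "initial")
id3 = Leaf("ID", "rate")
n1 = Leaf("NUM", "60")

expr3 = Node("MUL", id3, n1)
expr2 = Node("SUM", id2, expr3)
expr1 = Node("ASS", id1, expr2)

def to_text(tree):
    """Render the tree with explicit parentheses to show its shape."""
    if isinstance(tree, Leaf):
        return tree.value
    return f"({to_text(tree.left)} {tree.operator} {to_text(tree.right)})"

print(to_text(expr1))  # (position ASS (initial SUM (rate MUL 60)))
```

In Python the "pointers" of the slide are ordinary object references, so a child field simply holds another `Node` or `Leaf`.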
    20. TASK 1: THINK AS A COMPILER!
       • Analyze the following program syntactically:
           int main()
           {
               std::cout << "hello world" << std::endl;
               return 0;
           }
    21. LEXICAL ANALYZER OUTPUT (each distinct token, numbered in order of first appearance)
       • 1 = string "int"
       • 2 = string "main"
       • 3 = opening parenthesis
       • 4 = closing parenthesis
       • 5 = opening brace
       • 6 = string "std"
       • 7 = namespace operator
       • 8 = string "cout"
       • 9 = << operator
       • 10 = string ""hello world""
       • 11 = string "endl"
       • 12 = semicolon
       • 13 = string "return"
       • 14 = number 0
       • 15 = closing brace
    22. TASK 2: A STATEMENT AST
       • Create an abstract syntax tree for the following code for the Euclidean algorithm:
           while b ≠ 0
               if a > b
                   a := a − b
               else
                   b := b − a
           return a
    23. TASK 2: A STATEMENT AST
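One possible answer to Task 2 can be written as nested Python tuples, with a tiny interpreter to confirm that the tree really encodes the Euclidean algorithm. The node labels ("block", "assign", and so on) and the helper names are assumptions; the slide's own figure may label the nodes differently.

```python
# Each tuple is a node: (kind, child, child, ...). Leaves are ("var", name) or ("num", n).
euclid_ast = (
    "block",
    ("while",
        ("!=", ("var", "b"), ("num", 0)),
        ("if",
            (">", ("var", "a"), ("var", "b")),
            ("assign", "a", ("-", ("var", "a"), ("var", "b"))),
            ("assign", "b", ("-", ("var", "b"), ("var", "a"))))),
    ("return", ("var", "a")),
)

def evaluate(node, env):
    """Evaluate an expression node against the variable values in env."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    left, right = evaluate(node[1], env), evaluate(node[2], env)
    if kind == "-":
        return left - right
    if kind == ">":
        return left > right
    if kind == "!=":
        return left != right
    raise ValueError(f"unknown expression node {kind!r}")

def execute(node, env):
    """Run a statement node; return the value of a 'return', else None."""
    kind = node[0]
    if kind == "block":
        for stmt in node[1:]:
            result = execute(stmt, env)
            if result is not None:
                return result
        return None
    if kind == "while":
        while evaluate(node[1], env):
            result = execute(node[2], env)
            if result is not None:
                return result
        return None
    if kind == "if":
        branch = node[2] if evaluate(node[1], env) else node[3]
        return execute(branch, env)
    if kind == "assign":
        env[node[1]] = evaluate(node[2], env)
        return None
    if kind == "return":
        return evaluate(node[1], env)
    raise ValueError(f"unknown statement node {kind!r}")

print(execute(euclid_ast, {"a": 48, "b": 18}))  # 6
```

Statements (while, if, assign, return) and expressions (comparisons, subtraction) are interior nodes, while variables and numbers are the leaves, which mirrors the node and leaf distinction from the earlier slides.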
    24. LAB ASSIGNMENT
       Write the Syntax Analyzer components and ensure that they fulfill the following:
       • Create a Symbol Table (for all types, including IDs, functions, etc.)
       • Fill the Symbol Table with the tokens extracted in the Lexical Analysis phase
       • Differentiate between Node and Leaf
       • Apply the grammar rules (tokens, expressions, statements)
    25. QUESTIONS?
       Thank you for listening