Definition A parser is a computer program used to analyze scripts according to their respective grammar. In computer science, parsing is the process of analyzing a text to determine whether or not it belongs to a certain language. Parsers are mostly used as part of high-level-language translators (interpreters or compilers), but some standalone parsers are available for specific purposes (e.g. http://nlp.stanford.edu/software/lex-parser.shtml).
What is it and how does it work? A parser first receives its input in the form of a sequential source program or a stream of characters (a string). It breaks the input into small chunks or parts (tokens). For example, in the English language, “He is running to school” is broken into Subject, Verb and Object. It then produces a data structure, which is usually a tree.
What is not Parsing? A parser does not evaluate one thing into another; it just changes the representation from one form to another. A parser is not responsible for summarizing or extracting the original body of text. Word processing programs also use a parser to check spelling and grammatical mistakes.
But what is a Compiler? A compiler is a language translator: a computer program that translates a human-understandable instruction set into a computer-understandable form. Put simply, the output of a compiler is an .exe file, object code, or RTL (Register Transfer Language).
Compiler vs. Parser From the previous discussion it can be concluded that the output of a parser is a data structure that is of no direct use to either liveware (human users) or hardware, but is a very useful input for other components of the compiler. The output of the compiler, by contrast, is understandable to computer hardware and can be executed by it.
Phases of Compiler These 3 phases can be called the parser, as they are the ones actually parsing the source program.
Lexical Analysis As the name suggests, this is done by the lexical analyzer. It reads the characters from the input stream or source program and groups them into tokens. Each token describes some important element. Confused? Let's see an example.
Token Example int a = 10 ; That’s the code for declaring and initializing a variable “a”. When this line of code passes through the lexical analyzer, it is split as follows: “int” (Data Type), “a” (Identifier), “=” (Assignment Operator), “10” (Constant), “;” (Symbol). Each of these terms is called a token. For example, “a” is a token of type “Identifier”. The character sequence that makes up a token, such as “int”, is called its lexeme. http://en.wikipedia.org/wiki/Token_(parser)#Token
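The split described above can be sketched in a few lines of Python. This is only an illustration, not how any particular compiler does it: the token type names follow the labels on the slide, while the regular expressions and the function name tokenize are assumptions made for this example.

```python
import re

# Token types follow the labels on the slide; the regular expressions
# are assumptions chosen to cover the single example "int a = 10 ;".
TOKEN_SPEC = [
    ("DATA_TYPE", r"\bint\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("CONSTANT", r"\d+"),
    ("ASSIGNMENT_OPERATOR", r"="),
    ("SYMBOL", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token type, lexeme) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token_type, lexeme in tokenize("int a = 10 ;"):
        print(f"{lexeme!r:8} {token_type}")
```

Each pair holds the token type first and the lexeme second, so “a” comes out as an IDENTIFIER token whose lexeme is the single character a.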
Syntax Analyzer According to many sources, this is the phase that is actually responsible for parsing. Its input is the tokens formed in the previous phase, i.e. lexical analysis. In this phase, the newly formed tokens are arranged into a hierarchical data structure, which is called a tree in computer science.
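As a rough sketch of this phase, the Python snippet below takes the tokens of int a = 10 ; and arranges them into a small tree. The one-rule grammar (data type, identifier, assignment operator, constant, symbol), the shape of the tree and the names parse_declaration and show are all assumptions made for illustration; a real syntax analyzer handles a far larger grammar.

```python
# Assumed grammar for this sketch: a declaration is exactly
# DATA_TYPE IDENTIFIER ASSIGNMENT_OPERATOR CONSTANT SYMBOL.
EXPECTED = ["DATA_TYPE", "IDENTIFIER", "ASSIGNMENT_OPERATOR", "CONSTANT", "SYMBOL"]


def parse_declaration(tokens):
    """Check (type, lexeme) tokens against the grammar and build a tree."""
    if len(tokens) != len(EXPECTED):
        raise SyntaxError(f"expected {len(EXPECTED)} tokens, got {len(tokens)}")
    for (token_type, lexeme), wanted in zip(tokens, EXPECTED):
        if token_type != wanted:
            raise SyntaxError(f"expected {wanted}, got {token_type} ({lexeme!r})")
    lexemes = [lexeme for _, lexeme in tokens]
    # The tree is built from nested dictionaries: each node has a name and
    # either a value (leaf) or a list of children (inner node).
    return {
        "node": "declaration",
        "children": [
            {"node": "data type", "value": lexemes[0]},
            {
                "node": "assignment",
                "children": [
                    {"node": "identifier", "value": lexemes[1]},
                    {"node": "constant", "value": lexemes[3]},
                ],
            },
        ],
    }


def show(node, depth=0):
    """Print the tree with one level of indentation per level of depth."""
    label = node["node"] + (f": {node['value']}" if "value" in node else "")
    print("  " * depth + label)
    for child in node.get("children", []):
        show(child, depth + 1)


if __name__ == "__main__":
    tokens = [
        ("DATA_TYPE", "int"),
        ("IDENTIFIER", "a"),
        ("ASSIGNMENT_OPERATOR", "="),
        ("CONSTANT", "10"),
        ("SYMBOL", ";"),
    ]
    show(parse_declaration(tokens))
```

The input is a flat list and the output is hierarchical, which is the change of representation this phase is responsible for. A token sequence that does not fit the grammar is rejected with a SyntaxError instead of producing a tree.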
Before moving further, first explore this new buzzword: Tree. Definition: It's a hierarchical data structure, also described as a collection of related nodes.
Terminology A node is a structure that contains some value or condition. In the tree on the right, “2” is a parent node that has two child nodes, “7” and “5”.
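The parent and child relationship can be written down directly in code. The Python sketch below rebuilds the three nodes mentioned above; the class name Node and its add_child method are assumptions chosen for this example, and other representations of a tree are equally valid.

```python
class Node:
    """A tree node holding a value, a link to its parent and a list of children."""

    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

    def add_child(self, child):
        """Attach child below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


# Rebuild the example: "2" is the parent of "7" and "5".
root = Node(2)
seven = root.add_child(Node(7))
five = root.add_child(Node(5))

if __name__ == "__main__":
    print([child.value for child in root.children])  # [7, 5]
    print(seven.parent.value)                        # 2
```

A node with no parent, like root here, is the top of the tree, and a node with an empty children list is a leaf.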