Regular Expression
R.Rajkumar
Asst.Professor
CSE
Lexical analyzer
• Lexical analysis, also called scanning, is the phase of the compilation
process that deals with the actual program being compiled, character by
character. The higher-level parts of the compiler call the lexical
analyzer with the command "get the next word from the input", and it is
the scanner's job to sort through the input characters and find this word.
• The types of "words" commonly found in a program are:
    • programming language keywords, such as if, while, struct, int
    • operator symbols, such as =, +, -, &&, !, <=
    • other special symbols, such as ( ), { }, [ ], ;, &
    • constants, such as 1, 2, 3, 'a', 'b', 'c', "any quoted string"
    • variable and function names (called identifiers), such as x, i, t1
• Some languages (such as C) are case sensitive, in that they differentiate
between e.g. if and IF; thus the former would be a keyword, the latter an
identifier (for example, a variable name).
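As a rough illustration of the word types listed above, the following C sketch sorts a word that has already been isolated into one of those categories. The names `word_kind`, `in_table` and `classify_word` are assumptions made for this example, and the tables hold only the sample words from the slide, not a complete language definition.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical categories, named after the word types on the slide. */
enum word_kind { KEYWORD, OPERATOR, SPECIAL, CONSTANT, IDENTIFIER, UNKNOWN };

/* Return 1 if word equals one of the n table entries (case-sensitive). */
static int in_table(const char *word, const char *const table[], size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(word, table[i]) == 0)
            return 1;
    return 0;
}

/* Classify one complete word; only the slide's sample words are known. */
enum word_kind classify_word(const char *word) {
    static const char *const keywords[]  = { "if", "while", "struct", "int" };
    static const char *const operators[] = { "=", "+", "-", "&&", "!", "<=" };
    static const char *const specials[]  = { "(", ")", "{", "}", "[", "]", ";", "&" };

    if (in_table(word, keywords, sizeof keywords / sizeof keywords[0]))
        return KEYWORD;
    if (in_table(word, operators, sizeof operators / sizeof operators[0]))
        return OPERATOR;
    if (in_table(word, specials, sizeof specials / sizeof specials[0]))
        return SPECIAL;

    unsigned char first = (unsigned char)word[0];
    if (isdigit(first) || first == '\'' || first == '"')
        return CONSTANT;      /* 1, 'a', "any quoted string" */
    if (isalpha(first) || first == '_')
        return IDENTIFIER;    /* x, i, t1, and also IF */
    return UNKNOWN;
}
```

Because the keyword lookup uses the case-sensitive `strcmp`, `classify_word("if")` yields `KEYWORD` while `classify_word("IF")` falls through to `IDENTIFIER`, which mirrors the behaviour described for C.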
Tokens
• Most languages also insist that identifiers cannot be keywords or contain
operator symbols (some versions of Fortran do not, which makes lexical
analysis quite difficult).
• In addition to the basic grouping process, lexical analysis usually performs the
following tasks:
    • Since there are only a finite number of types of words, instead of passing the
    actual word to the next phase we can save space by passing a suitable
    representation. This representation is known as a token.
    • If the language isn't case sensitive, we can eliminate differences of case at
    this point by using just one token per keyword, irrespective of case; e.g.
        #define IF_TOKEN 1
        #define WHILE_TOKEN 2
        ...
        if we meet "IF", "If", "iF", "if" then return IF_TOKEN
        if we meet "WHILE", "While", "WHile", ... then return WHILE_TOKEN
    • We can pick out mistakes in the lexical syntax of the program, such as the use
    of a character which is not valid in the language. (Note that we do not worry
    about the combination of patterns; e.g. the sequence of characters "+*" would be
    returned as PLUS_TOKEN, MULT_TOKEN, and it would be up to the next phase to see
    that these should not follow in sequence.)
    • We can eliminate pieces of the program that are no longer relevant, such as
    spaces, tabs, carriage returns (in most languages), and comments.
• In order to specify the lexical analysis process, what we need is some method of
describing which patterns of characters correspond to which words.
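The tasks above can be sketched as a small hand-written scanner in C. This is only an illustration under stated assumptions: the function `next_token`, the helper `same_word`, and the tokens `IDENT_TOKEN`, `ERROR_TOKEN` and `END_TOKEN` are invented for the example (the slide names only IF_TOKEN, WHILE_TOKEN, PLUS_TOKEN and MULT_TOKEN), an enum stands in for the `#define` lines, and comment removal is left out for brevity.

```c
#include <ctype.h>
#include <stddef.h>

/* Token codes; IF_TOKEN and WHILE_TOKEN keep the values used on the slide. */
enum token {
    IF_TOKEN = 1, WHILE_TOKEN = 2,
    IDENT_TOKEN, PLUS_TOKEN, MULT_TOKEN,
    ERROR_TOKEN,   /* character not valid in the language */
    END_TOKEN      /* end of input */
};

/* Compare the n characters at s with lower-case keyword kw, ignoring case. */
static int same_word(const char *s, size_t n, const char *kw) {
    for (size_t i = 0; i < n; i++)
        if (kw[i] == '\0' || tolower((unsigned char)s[i]) != kw[i])
            return 0;
    return kw[n] == '\0';
}

/* "Get the next word from the input": returns its token and advances *input. */
enum token next_token(const char **input) {
    const char *p = *input;

    /* Drop pieces that are no longer relevant: spaces, tabs, line ends. */
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;

    if (*p == '\0') {
        *input = p;
        return END_TOKEN;
    }

    if (isalpha((unsigned char)*p)) {
        const char *start = p;
        while (isalnum((unsigned char)*p))
            p++;
        *input = p;
        size_t n = (size_t)(p - start);
        /* One token per keyword, irrespective of case. */
        if (same_word(start, n, "if"))    return IF_TOKEN;
        if (same_word(start, n, "while")) return WHILE_TOKEN;
        return IDENT_TOKEN;
    }

    *input = p + 1;   /* every remaining case consumes one character */
    switch (*p) {
    case '+': return PLUS_TOKEN;
    case '*': return MULT_TOKEN;
    default:  return ERROR_TOKEN;   /* lexical error */
    }
}
```

Calling `next_token` repeatedly on the text `iF x +* @` would yield IF_TOKEN, IDENT_TOKEN, PLUS_TOKEN, MULT_TOKEN, ERROR_TOKEN and then END_TOKEN. The scanner hands back PLUS_TOKEN followed by MULT_TOKEN without complaint; rejecting that sequence is left to the next phase.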
Regular Expressions
• Regular expressions are used to define patterns of characters; they are used in UNIX
tools such as awk, grep, vi and, of course, lex.
• A regular expression is just a form of notation, used for describing sets of words. For
any given set of characters Σ, a regular expression over Σ is defined by:
    • The empty string, ε, which denotes a string of length zero, and means "take nothing
    from the input". It is most commonly used in conjunction with other regular
    expressions, e.g. to denote optionality.
    • Any character in Σ may be used in a regular expression. For instance, if we write a
    as a regular expression, this means "take the letter a from the input"; i.e. it
    denotes the (singleton) set of words {"a"}.
    • The union operator, "|", which denotes the union of two sets of words. Thus the
    regular expression a|b denotes the set {"a", "b"}, and means "take either the letter
    a or the letter b from the input".
    • Writing two regular expressions side by side is known as concatenation; thus the
    regular expression ab denotes the set {"ab"} and means "take the character a followed
    by the character b from the input".
    • The Kleene closure of a regular expression, denoted by "*", indicates zero or more
    occurrences of that expression. Thus a* is the (infinite) set {ε, "a", "aa", "aaa", ...}
    and means "take zero or more a's from the input".
    • Brackets may be used in a regular expression to enforce precedence or increase
    clarity; e.g. (a|b)c denotes {"ac", "bc"}, whereas a|bc denotes {"a", "bc"}.
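The operators defined above can be tried out with a small matcher in C. This is a sketch written for illustration, not the algorithm used by lex and not Thompson's construction: it walks the expression recursively and tracks the set of input positions reachable so far as bits of a 64-bit integer. The names `posset`, `parse_atom`, `parse_star`, `parse_concat`, `parse_union` and `re_match` are assumptions of this example; words are limited to 63 characters, ε is written as an empty alternative such as (|a), and malformed expressions are not diagnosed.

```c
#include <string.h>

typedef unsigned long long posset;  /* bit i set = i input characters taken */

static const char *re;              /* cursor into the regular expression */
static const char *text;            /* the word being tested */
static size_t text_len;

static posset parse_union(posset in);

/* atom: a single character, or a bracketed sub-expression */
static posset parse_atom(posset in) {
    if (*re == '(') {
        re++;
        posset inner = parse_union(in);
        if (*re == ')')
            re++;
        return inner;
    }
    char c = *re++;                 /* "take the letter c from the input" */
    posset out = 0;
    for (size_t i = 0; i < text_len; i++)
        if (((in >> i) & 1) && text[i] == c)
            out |= 1ULL << (i + 1);
    return out;
}

/* Kleene closure: zero or more occurrences of the preceding atom */
static posset parse_star(posset in) {
    const char *start = re;
    posset out = parse_atom(in);
    if (*re != '*')
        return out;
    posset all = in;                /* zero occurrences */
    posset frontier = out & ~all;
    while (frontier) {              /* repeat until no new position appears */
        all |= frontier;
        re = start;                 /* re-read the same atom */
        frontier = parse_atom(frontier) & ~all;
    }
    re++;                           /* skip the '*' */
    return all;
}

/* concatenation: expressions written side by side (none at all means ε) */
static posset parse_concat(posset in) {
    posset out = in;
    while (*re != '\0' && *re != '|' && *re != ')')
        out = parse_star(out);
    return out;
}

/* union: alternatives separated by '|' */
static posset parse_union(posset in) {
    posset out = parse_concat(in);
    while (*re == '|') {
        re++;
        out |= parse_concat(in);
    }
    return out;
}

/* Return 1 if word belongs to the set denoted by pattern, else 0. */
int re_match(const char *pattern, const char *word) {
    text = word;
    text_len = strlen(word);
    if (text_len > 63)
        return 0;                   /* limit of this bitset sketch */
    re = pattern;
    posset ends = parse_union(1ULL);
    return *re == '\0' && ((ends >> text_len) & 1);
}
```

For example, `re_match("a|b", "b")` and `re_match("a*", "")` return 1, while `re_match("ab", "a")` returns 0. Optionality via ε shows up in `re_match("(|a)b", "b")` and `re_match("(|a)b", "ab")`, which both return 1.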
Thompson's Algorithm for converting RE to NFA