More on LEX
LEX
   In addition to scanner development, Lex/Flex is also used for
    some other applications in system administration where text
    processing is needed.
Patterns
   r?       zero or one occurrence of r

   r{2,5}   two to five occurrences of r

   r{2,}    two or more occurrences of r

   r{4}     exactly 4 occurrence of r

   0       a NUL character (ASCII code 0)

   123     the character with octal value 123

   x2a     the character with hexadecimal value 2a
Patterns
   <s>r             an r, but only in start condition s

   <s1,s2,s3>r      same, but in any of start conditions s1, s2,
                     or s3

   <*>r             an r in any start condition, even an
                     exclusive one.

   <<EOF>>          an end-of-file

   <s1,s2><<EOF>>   an end-of-file when in start condition s1

                     or s2
Special Directives
   ECHO copies yytext to the scanner's output.

   BEGIN followed by the name of a start condition places the
    scanner in the corresponding start condition.

   REJECT directs the scanner to proceed on to the second best rule
    which matched the input (or a prefix of the input). Multiple
    REJECT's are allowed, each one finding the next best choice to
    the currently active rule.
Example   %{     int s=0;

          %}

          %%

          she {s++; REJECT;}

          he {s++}

          %%

          main() { yylex();

          printf(“%dn",s);

          return 0; }
Lex Variables
   yyin       Of the type FILE*. This points to the current file
               being parsed by the lexer.
   yyout      Of the type FILE*. This points to the location where
               the output of the lexer will be written. By default,
               both yyin and yyout point to standard input and
               output.
   yytext     The text of the matched pattern is stored in this
               variable (char*).
   yyleng     Gives the length of the matched pattern.
   yylineno   Provides current line number
Lex Functions
   yylex()    The function that starts the analysis.

   yywrap()   In the C program generated by the lex
               specification file, when yylex() finished with the
               first file, it calls yywrap(), which opens the next file,
               and yylex() continues. When yywrap() has
               exhausted all the command line arguments, it
               returns 1, and yylex() returns with value 0.

   unput(c)   puts the character c back onto the input stream. It
               will be the next character scanned.

   input()    reads the next character from the input stream
Lex functions- yyless(n)
   Returns all but the first n characters of the current token back
    to the input stream, where they will be rescanned when the
    scanner looks for the next match.

   yytext and yyleng are adjusted appropriately

   An argument of 0 to yyless will cause the entire current input
    string to be scanned again.
For example, on the input "foobar" the following will write out
  "foobarbar“

%% foobar       ECHO; yyless(3);

    [a-z]+      ECHO;
Lex functions- yymore()
   yymore() tells the scanner that the next time it matches a
    rule, the corresponding token should be appended onto the
    current value of yytext rather than replacing it.
For example, given the input "mega-kludge" the following will
  write "mega-mega- kludge" to the output


%% mega-       ECHO; yymore();
    kludge     ECHO;


First "mega-" is matched and echoed to the output. Then
  "kludge" is matched, but the previous "mega-" is still
  hanging around at the beginning of yytext so the ECHO for
  the "kludge" rule will actually write "mega-kludge".
Start Conditions
   Flex provides a mechanism for conditionally activating rules.
   Any rule whose pattern is prefixed with "<sc>" will only be
    active when the scanner is in the start condition named "sc".


    <STRING>[^"]*        { /* eat up the string body ... */ ... }
    will be active only when the scanner is in the "STRING" start
    condition
    <INITIAL,STRING,QUOTE>. { /* handle an escape ... */ ... } will
    be active only when the current start condition is either
    "INITIAL", "STRING", or "QUOTE".
   A start condition is activated using the BEGIN action.

   Until the next BEGIN action is executed, rules with the given start
    condition will be active and others are inactive.

   Start conditions declared using %s are inclusive, then rules with
    no start conditions at all will also be active.

   Start conditions declared using %x are inclusive, then only rules
    qualified with the start condition will be active.

   Exclusive start conditions make it easy to specify "mini-scanners"
    which scan portions of the input that are syntactically different
    from the rest (e.g., comments).
Begin
   BEGIN(0)       returns to the original state where only the rules with no
    start conditions are active.

   BEGIN(INITIAL) is equivalent to BEGIN(0).

    int enter_special;

    %x SPECIAL

    %%

    if ( enter_special )

    BEGIN(SPECIAL);

    <SPECIAL>[a-z]          {printf(“letter”);} %%
Example   %s TABL REC DATA

          %%

          <INITIAL>"<table>"       BEGIN TABL;

          <TABL>"</table>"        BEGIN INITIAL;

          <TABL><tr>               BEGIN REC;

          <REC></tr>              BEGIN TABL;

          <REC><td>                BEGIN DATA;
                                                    %%
          <DATA></td>             BEGIN REC;       main()
          <DATA>.                  ECHO;            {
          <DATA,REC,TABL>[ tn]   ;                yylex();
                                                    }
Line no in text file
%{
/*
*/
int lineno=1;
%}
line .*n
%%
{line} { printf("%5d %s",lineno++,yytext); }
%%
main(int argc, char*argv[])
{
extern FILE *yyin;
yyin=fopen(argv[1],"r");
yylex();
return 0;
}
A Lex program that finds a word from the given file and replaces
  it with other word.
%{                                  {eol} {fprintf(fr,yytext);}
#include<stdio.h>                   .        {fprintf(fr,yytext);}
#include<string.h>                  %%
FILE *ff,*fr;                       int main(int argc,char
char p[20],q[20],r[20],fname[20];   *argv[])
%}                                  {
word [a-zA-Z]+                      strcpy(fname,argv[1]);
%%                                  strcpy(p,argv[2]);
{word} {                            strcpy(q,argv[3]);
         if(strcmp(p,yytext)==0)    ff=fopen(fname,"r+");
         fprintf(fr,q);             fr=fopen("rep.txt","w+");
         else fprintf(fr,yytext);   yyin=ff; yylex();
         }                          return(0);
                                    }
Replace Arabic Nos with Roman
%{
#include<stdio.h>
#include<ctype.h>
FILE *ff,*fr;
char fname[20];
int z;
%}
%%
1 {fprintf(fr,"I ");}
2|3|4 {           z=atoi(yytext);
        while(z!=0)
        {
                  {fprintf(fr,"I "); z--;
        }
    }
5 {fprintf(fr,"V ");}
6|7|8 {               z=atoi(yytext);
          {fprintf(fr,"V ");}
          while(z!=5)
          {
                      {fprintf(fr,"I "); z--;
          }
    }
9 {fprintf(fr,"IX ");}
10 {fprintf(fr,"X ");}
. {fprintf(fr,yytext);}
%%
int main(int argc,char *argv[])
{
strcpy(fname,argv[1]);
ff=fopen(fname,"r+");
fr=fopen("rep.txt","w+");
yyin=ff;
yylex();
return(0);
}
Date Format Change
%{
#include<stdio.h>
#include<string.h>
FILE *ff,*fr;
char fname[20];
int z;
%}
digit [0-9]
%%
{digit}{digit}""{digit}{digit}"" {yymore();}
[5-9][1-9]                 {fprintf(fr,"19"); fprintf(fr,yytext);}
[0-5]{digit}      {fprintf(fr,"20"); fprintf(fr,yytext);}
%%
int main(int argc,char *argv[])
{
strcpy(fname,argv[1]);
ff=fopen(fname,"r+");
fr=fopen("rep.txt","w+");
yyin=ff;
yylex();
return(0);
}
The following lex specification file is used to generate a C
    program which is used to generate a html file from a data file.

A0 = "Name"

B0 = "SSN"

C0 = "HW1"

D0 = "HW2”

…

Name SSN                   HW1               HW2

Scott Smith 123445678      82                44.2

Sam Sneed 999887777        92                84
%%
^A0                     {printf("<table>n<tr>n");}
^A[09]*                 {printf("</tr>n<tr>n");}
^[BZ][09]*      ;
[09]*           {printf("<td> "); ECHO; printf("</td>n");}
[09]*"."[09]*   {printf("<td> "); ECHO; printf("</td>n");}
"[^"]*"       {printf("<td> "); yytext[yyleng1]= ' ';
                         printf("%s </td>n",yytext+1);}
=               ;
[ n]           ;
.               ECHO;
%%
main() {
printf("<html>n");
yylex();
printf("</tr>n</table>n</html>n");
}
This lex specification file is used to generate a C program

which counts how many times a given alphabet is next to
another alphabet in the given input
%{
#include<stdlib.h>
int F[26][26];
%}
%%
[AZaz][AZaz]
{
F[toupper(yytext[0])65][toupper(yytext[1])65]
++;REJECT;
}
.|n ;
%%
main(int argc, char*argv[])
{
extern FILE *yyin;
int i,j;
for(i=0;i<26;i++)
for(j=0;j<26;j++)F[i][j]=0;
yyin=fopen(argv[1],"r");
yylex();
for(i=0;i<26;i++){
for(j=0;j<26;j++) printf("%d", F[i][j]);
printf("n");
}
return 0;
}
%{
#include <stdio.h>
#include <errno.h>
                                  Yywrap() Example
int file_num;
int file_num_max;
char **files;
extern int errno;
%}
%%
(ftp|http)://[^ n<>"]* printf("%sn",yytext);
.|n ;
%%
int main(int argc, char *argv[]) {
file_num=1;
file_num_max = argc;
files = argv;
if ( argc > 1 ) {
if ( (yyin = fopen(argv[file_num],"r")) ==
0){
perror(argv[file_num]);
exit(1);
}
}
while( yylex() )
;
return 0;
}
int yywrap() {
fclose(yyin);
if ( ++file_num < file_num_max ) {
if ( (yyin = fopen(files[file_num],"r")) ==
0){
perror(files[file_num]);
exit(1);
}
return 0;
} else {
return 1;
A lex-based squid redirector
%x SKIP
%x COPY
%option always-interactive
%%
"http://www.debian.org/"            |
"http://www.yahoo.com/"                     {
                 yytext[yyleng-1] = '0';
                 printf("%s.au/",yytext);
                 BEGIN COPY;
                 }
"http://"(www.)?"altavista.digital.com/" {
                 printf("http://www.altavista.yellowpages.com.au/");
                 BEGIN COPY;
                 }
"ftp://sunsite.anu.edu.au/pub/linux/"       |
"ftp://sunsite.unc.edu/pub/Linux/"          |
"ftp://ftp.funet.fi/pub/Linux/"             {
                printf("ftp://ftp.monash.edu.au/pub/linux/");
                         BEGIN COPY;
                         }
"http://weather/"        {
                         printf("http://www.bom.gov.au/");
                         BEGIN COPY;
                         }
"http://bypass."                  { printf("http://"); BEGIN COPY;
   }
"ftp://bypass."          { printf("ftp://"); BEGIN COPY; }
<COPY>[^ n]             ECHO;
<COPY>" "                BEGIN SKIP;
.                        BEGIN SKIP;
<SKIP>.                           ;
<*>n                    { putchar('n'); fflush(stdout); BEGIN 0; }
%%
Write a LEX program to count the
 number of palindrome string in a given
 file and write them into a separate file.
Write a LEX program to read 3 Roman
  numbers and output its equivalent
  numbers and sum in decimal.
Ex:
Input: VI IX L
Output: 6 + 9 + 50 = 65
Write a LEX program to read an
 expression in postfix form and
 evaluate the value of the expression
Ex: 342+* is 18
Write a lex program to read a text file
 and reverse all words that begin with
 letter ‘a’ and end with letter ‘d’. Display
 the longest of such words.
Write a LEX program to count number
 of even and odd numbers. Output the
 odd numbers to odd.txt and even
 numbers to even.txt files. Also display
 the largest odd and even numbers
Write a lex program to read a text file
 and remove all words that begin with
 letter ‘a’ and end with letter‘d’. Write
 the remaining words to a separate file.
Write a lex program to read a text file
 and do the following
Add 1 to all integers.
Add 0.5 to all fractional numbers.
Convert all words to uppercase words
Delete all alphanumeric words
Write resulting content to a separate
 file.
   Encryption and decryption
Write a Lex program that converts a file
  to "Pig latin."
Specifically, assume the file is a sequence
  of words (groups of letters) separated by
  whitespace. Every time you encounter a
  word:
1. If the first letter is a consonant, move
  it to the end of the word and then add
  ay.
2. If the first letter is a vowel, just add
  ay to the end of the word.
All nonletters are copied intact to the
  output.
Thank yu…!



   3 September 2012   42

More on Lex

  • 1.
  • 2.
    LEX  In addition to scanner development, Lex/Flex is also used for some other applications in system administration where text processing is needed.
  • 3.
    Patterns  r? zero or one occurrence of r  r{2,5} two to five occurrences of r  r{2,} two or more occurrences of r  r{4} exactly 4 occurrence of r  0 a NUL character (ASCII code 0)  123 the character with octal value 123  x2a the character with hexadecimal value 2a
  • 4.
    Patterns  <s>r an r, but only in start condition s  <s1,s2,s3>r same, but in any of start conditions s1, s2, or s3  <*>r an r in any start condition, even an exclusive one.  <<EOF>> an end-of-file  <s1,s2><<EOF>> an end-of-file when in start condition s1 or s2
  • 5.
    Special Directives  ECHO copies yytext to the scanner's output.  BEGIN followed by the name of a start condition places the scanner in the corresponding start condition.  REJECT directs the scanner to proceed on to the second best rule which matched the input (or a prefix of the input). Multiple REJECT's are allowed, each one finding the next best choice to the currently active rule.
  • 6.
    Example %{ int s=0; %} %% she {s++; REJECT;} he {s++} %% main() { yylex(); printf(“%dn",s); return 0; }
  • 7.
    Lex Variables  yyin Of the type FILE*. This points to the current file being parsed by the lexer.  yyout Of the type FILE*. This points to the location where the output of the lexer will be written. By default, both yyin and yyout point to standard input and output.  yytext The text of the matched pattern is stored in this variable (char*).  yyleng Gives the length of the matched pattern.  yylineno Provides current line number
  • 8.
    Lex Functions  yylex() The function that starts the analysis.  yywrap() In the C program generated by the lex specification file, when yylex() finished with the first file, it calls yywrap(), which opens the next file, and yylex() continues. When yywrap() has exhausted all the command line arguments, it returns 1, and yylex() returns with value 0.  unput(c) puts the character c back onto the input stream. It will be the next character scanned.  input() reads the next character from the input stream
  • 9.
    Lex functions- yyless(n)  Returns all but the first n characters of the current token back to the input stream, where they will be rescanned when the scanner looks for the next match.  yytext and yyleng are adjusted appropriately  An argument of 0 to yyless will cause the entire current input string to be scanned again.
  • 10.
    For example, onthe input "foobar" the following will write out "foobarbar“ %% foobar ECHO; yyless(3); [a-z]+ ECHO;
  • 11.
    Lex functions- yymore()  yymore() tells the scanner that the next time it matches a rule, the corresponding token should be appended onto the current value of yytext rather than replacing it.
  • 12.
    For example, giventhe input "mega-kludge" the following will write "mega-mega- kludge" to the output %% mega- ECHO; yymore(); kludge ECHO; First "mega-" is matched and echoed to the output. Then "kludge" is matched, but the previous "mega-" is still hanging around at the beginning of yytext so the ECHO for the "kludge" rule will actually write "mega-kludge".
  • 13.
    Start Conditions  Flex provides a mechanism for conditionally activating rules.  Any rule whose pattern is prefixed with "<sc>" will only be active when the scanner is in the start condition named "sc". <STRING>[^"]* { /* eat up the string body ... */ ... } will be active only when the scanner is in the "STRING" start condition <INITIAL,STRING,QUOTE>. { /* handle an escape ... */ ... } will be active only when the current start condition is either "INITIAL", "STRING", or "QUOTE".
  • 14.
    A start condition is activated using the BEGIN action.  Until the next BEGIN action is executed, rules with the given start condition will be active and others are inactive.  Start conditions declared using %s are inclusive, then rules with no start conditions at all will also be active.  Start conditions declared using %x are inclusive, then only rules qualified with the start condition will be active.  Exclusive start conditions make it easy to specify "mini-scanners" which scan portions of the input that are syntactically different from the rest (e.g., comments).
  • 15.
    Begin  BEGIN(0) returns to the original state where only the rules with no start conditions are active.  BEGIN(INITIAL) is equivalent to BEGIN(0). int enter_special; %x SPECIAL %% if ( enter_special ) BEGIN(SPECIAL); <SPECIAL>[a-z] {printf(“letter”);} %%
  • 16.
    Example %s TABL REC DATA %% <INITIAL>"<table>" BEGIN TABL; <TABL>"</table>" BEGIN INITIAL; <TABL><tr> BEGIN REC; <REC></tr> BEGIN TABL; <REC><td> BEGIN DATA; %% <DATA></td> BEGIN REC; main() <DATA>. ECHO; { <DATA,REC,TABL>[ tn] ; yylex(); }
  • 17.
    Line no intext file %{ /* */ int lineno=1; %} line .*n %% {line} { printf("%5d %s",lineno++,yytext); } %% main(int argc, char*argv[]) { extern FILE *yyin; yyin=fopen(argv[1],"r"); yylex(); return 0; }
  • 18.
    A Lex programthat finds a word from the given file and replaces it with other word.
  • 19.
    %{ {eol} {fprintf(fr,yytext);} #include<stdio.h> . {fprintf(fr,yytext);} #include<string.h> %% FILE *ff,*fr; int main(int argc,char char p[20],q[20],r[20],fname[20]; *argv[]) %} { word [a-zA-Z]+ strcpy(fname,argv[1]); %% strcpy(p,argv[2]); {word} { strcpy(q,argv[3]); if(strcmp(p,yytext)==0) ff=fopen(fname,"r+"); fprintf(fr,q); fr=fopen("rep.txt","w+"); else fprintf(fr,yytext); yyin=ff; yylex(); } return(0); }
  • 20.
    Replace Arabic Noswith Roman %{ #include<stdio.h> #include<ctype.h> FILE *ff,*fr; char fname[20]; int z; %} %% 1 {fprintf(fr,"I ");} 2|3|4 { z=atoi(yytext); while(z!=0) { {fprintf(fr,"I "); z--; } }
  • 21.
    5 {fprintf(fr,"V ");} 6|7|8{ z=atoi(yytext); {fprintf(fr,"V ");} while(z!=5) { {fprintf(fr,"I "); z--; } } 9 {fprintf(fr,"IX ");} 10 {fprintf(fr,"X ");} . {fprintf(fr,yytext);} %% int main(int argc,char *argv[]) { strcpy(fname,argv[1]); ff=fopen(fname,"r+"); fr=fopen("rep.txt","w+"); yyin=ff; yylex(); return(0); }
  • 22.
    Date Format Change %{ #include<stdio.h> #include<string.h> FILE*ff,*fr; char fname[20]; int z; %} digit [0-9] %% {digit}{digit}""{digit}{digit}"" {yymore();} [5-9][1-9] {fprintf(fr,"19"); fprintf(fr,yytext);} [0-5]{digit} {fprintf(fr,"20"); fprintf(fr,yytext);}
  • 23.
    %% int main(int argc,char*argv[]) { strcpy(fname,argv[1]); ff=fopen(fname,"r+"); fr=fopen("rep.txt","w+"); yyin=ff; yylex(); return(0); }
  • 24.
    The following lexspecification file is used to generate a C program which is used to generate a html file from a data file. A0 = "Name" B0 = "SSN" C0 = "HW1" D0 = "HW2” … Name SSN HW1 HW2 Scott Smith 123445678 82 44.2 Sam Sneed 999887777 92 84
  • 25.
    %% ^A0 {printf("<table>n<tr>n");} ^A[09]* {printf("</tr>n<tr>n");} ^[BZ][09]* ; [09]* {printf("<td> "); ECHO; printf("</td>n");} [09]*"."[09]* {printf("<td> "); ECHO; printf("</td>n");} "[^"]*" {printf("<td> "); yytext[yyleng1]= ' '; printf("%s </td>n",yytext+1);} = ; [ n] ; . ECHO; %% main() { printf("<html>n"); yylex(); printf("</tr>n</table>n</html>n"); }
  • 26.
    This lex specificationfile is used to generate a C program which counts how many times a given alphabet is next to another alphabet in the given input
  • 27.
  • 28.
    main(int argc, char*argv[]) { externFILE *yyin; int i,j; for(i=0;i<26;i++) for(j=0;j<26;j++)F[i][j]=0; yyin=fopen(argv[1],"r"); yylex(); for(i=0;i<26;i++){ for(j=0;j<26;j++) printf("%d", F[i][j]); printf("n"); } return 0; }
  • 29.
    %{ #include <stdio.h> #include <errno.h> Yywrap() Example int file_num; int file_num_max; char **files; extern int errno; %} %% (ftp|http)://[^ n<>"]* printf("%sn",yytext); .|n ; %% int main(int argc, char *argv[]) { file_num=1; file_num_max = argc; files = argv; if ( argc > 1 ) { if ( (yyin = fopen(argv[file_num],"r")) == 0){ perror(argv[file_num]); exit(1);
  • 30.
    } } while( yylex() ) ; return0; } int yywrap() { fclose(yyin); if ( ++file_num < file_num_max ) { if ( (yyin = fopen(files[file_num],"r")) == 0){ perror(files[file_num]); exit(1); } return 0; } else { return 1;
  • 31.
    A lex-based squidredirector %x SKIP %x COPY %option always-interactive %% "http://www.debian.org/" | "http://www.yahoo.com/" { yytext[yyleng-1] = '0'; printf("%s.au/",yytext); BEGIN COPY; } "http://"(www.)?"altavista.digital.com/" { printf("http://www.altavista.yellowpages.com.au/"); BEGIN COPY; } "ftp://sunsite.anu.edu.au/pub/linux/" | "ftp://sunsite.unc.edu/pub/Linux/" |
  • 32.
    "ftp://ftp.funet.fi/pub/Linux/" { printf("ftp://ftp.monash.edu.au/pub/linux/"); BEGIN COPY; } "http://weather/" { printf("http://www.bom.gov.au/"); BEGIN COPY; } "http://bypass." { printf("http://"); BEGIN COPY; } "ftp://bypass." { printf("ftp://"); BEGIN COPY; } <COPY>[^ n] ECHO; <COPY>" " BEGIN SKIP; . BEGIN SKIP; <SKIP>. ; <*>n { putchar('n'); fflush(stdout); BEGIN 0; } %%
  • 33.
    Write a LEXprogram to count the number of palindrome string in a given file and write them into a separate file.
  • 34.
    Write a LEXprogram to read 3 Roman numbers and output its equivalent numbers and sum in decimal. Ex: Input: VI IX L Output: 6 + 9 + 50 = 65
  • 35.
    Write a LEXprogram to read an expression in postfix form and evaluate the value of the expression Ex: 342+* is 18
  • 36.
    Write a lexprogram to read a text file and reverse all words that begin with letter ‘a’ and end with letter ‘d’. Display the longest of such words.
  • 37.
    Write a LEXprogram to count number of even and odd numbers. Output the odd numbers to odd.txt and even numbers to even.txt files. Also display the largest odd and even numbers
  • 38.
    Write a lexprogram to read a text file and remove all words that begin with letter ‘a’ and end with letter‘d’. Write the remaining words to a separate file.
  • 39.
    Write a lexprogram to read a text file and do the following Add 1 to all integers. Add 0.5 to all fractional numbers. Convert all words to uppercase words Delete all alphanumeric words Write resulting content to a separate file.
  • 40.
    Encryption and decryption
  • 41.
    Write a Lexprogram that converts a file to "Pig latin." Specifically, assume the file is a sequence of words (groups of letters) separated by whitespace. Every time you encounter a word: 1. If the first letter is a consonant, move it to the end of the word and then add ay. 2. If the first letter is a vowel, just add ay to the end of the word. All nonletters are copied intact to the output.
  • 42.
    Thank yu…! 3 September 2012 42