Introduction of flex

  1. Introduction to Flex
     天官
     2011-08-25
  2. Introduction
     Flex and Bison are tools for building programs that handle structured input.
     They were originally designed for writers of compilers and interpreters.
     They are replacements for the classic lex and yacc (developed at Bell Laboratories in the 1970s).
     http://www.gnu.org/software/bison/
     http://flex.sourceforge.net/
  3. Flex and Bison
     One of the key insights was to break the job into two parts:
     Lexical analysis (scanning) divides the input into meaningful chunks, called tokens.
     Syntax analysis (parsing) figures out how the tokens relate to each other.
     [Diagram: flex turns lexical rules into source code for yylex(); bison turns grammar rules into yyparse(); input flows through yylex() and yyparse() to output.]
  4. Structure of a Flex Specification
         ... definition section ...
         %%
         ... rules section ...
         %%
         ... user subroutines ...
     Example:
         /* example.l: just like Unix wc */
         %{
         #include <string.h>   /* for strlen() */
         int chars = 0;
         int words = 0;
         int lines = 0;
         %}
         %%
         [a-zA-Z]+  { words++; chars += strlen(yytext); }
         \n         { chars++; lines++; }
         .          { chars++; }
         %%
         int main(int argc, char **argv)
         {
             yylex();
             printf("%8d%8d%8d\n", lines, words, chars);
             return 0;
         }
     Build and run:
         $ flex example.l
         $ cc lex.yy.c -lfl
         $ ./a.out
  5. Structure of a Flex Specification
     This part covers the definition section.
         ... definition section ...
         %%
         ... rules section ...
         %%
         ... user subroutines ...
  6. Definition Section
     Options: compile-time options for flex.
     The literal block: can contain function declarations, variable definitions, and so on; it is copied verbatim to the output.
     Definitions: token definitions based on regular expressions.
     Start conditions
     Translations
     See the MySQL lexer source for examples.
  7. Options
     %option yylineno: tells flex to define an integer variable called yylineno and to maintain the current line number in it.
     %option case-insensitive: tells flex to build a scanner that treats upper- and lowercase the same.
     %option noyywrap: the scanner does not call yywrap(), so no yywrap() definition is needed.
     %option batch: the scanner always looks ahead.
     %option interactive: the scanner looks ahead only when it needs to (which is slightly slower).
     %option prefix="foo": names that start with yy start with foo instead.
     %option outfile="foolex.c": writes the scanner to foolex.c instead of the default lex.yy.c.
     %option nodefault: does not add the default rule that ECHOes unmatched input.
     %option warn: prints warnings.
     %option noinput: does not generate the input() function.
     %option nounput: does not generate the unput() function.
     ...
  8. Definitions
         space           [ \t\n\r\f]
         newline         [\n\r]
         non_newline     [^\n\r]
         comment         ("--"{non_newline}*)
         whitespace      ({space}+|{comment})
         digit           [0-9]
         ident_start     [A-Za-z\200-\377_]
         ident_cont      [A-Za-z\200-\377_0-9\$]
         identifier      {ident_start}{ident_cont}*
         self            [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
         integer         {digit}+
         decimal         (({digit}*\.{digit}+)|({digit}+\.{digit}*))
         real            ((({digit}*\.{digit}+)|({digit}+\.{digit}*)|({digit}+))([Ee][-+]?{digit}+))
         param           \${integer}
         other           .
  9. Regular Expressions
     .        Matches any single character except the newline character (\n).
     []       A character class that matches any character within the brackets.
     []{-}[]  A differenced character class: the characters in the first class, omitting the characters in the second class.
     ^        Matches the beginning of a line as the first character of a regular expression.
     $        Matches the end of a line as the last character of a regular expression.
     {}       If the braces contain one or two numbers, they indicate the minimum and maximum number of times the previous pattern can match.
     \        Used to escape metacharacters.
     *        Matches zero or more copies of the preceding expression.
     +        Matches one or more occurrences of the preceding regular expression.
     ?        Matches zero or one occurrence of the preceding regular expression.
     |        The alternation operator: matches either the preceding or the following regular expression.
     "..."    Anything within the quotation marks is treated literally.
  10. Regular Expressions
      ()        Groups a series of regular expressions together into a new regular expression.
      /         Trailing context: matches the regular expression preceding the slash, but only if it is followed by the regular expression after the slash.
      <>        A name or list of names in angle brackets at the beginning of a pattern makes that pattern apply only in the given start states.
      <<EOF>>   Matches the end of file.
      (?# comment)   Perl-style expression comments.
      (?a:pattern) or (?a-x:pattern)   Perl-style modifiers.
  11. Structure of a Flex Specification
      This part covers the rules section.
          ... definition section ...
          %%
          ... rules section ...
          %%
          ... user subroutines ...
  12. Rule Section
      Pattern lines
        Patterns can refer to the definitions from the definition section.
        Each pattern line contains a pattern followed by some C code to execute when the input matches the pattern.
      C code
        C code is copied verbatim to the generated C file.
        Lines at the beginning of the rules section, before the first rule, are placed near the beginning of the generated yylex() function.
        C code that spans more than one line must be enclosed in braces ({ } or %{ %}).
        When an input character matches no pattern, the lexer acts as though it matched a pattern whose code is ECHO, unless %option nodefault is set.
  13. Rule Section Example
          %{
                  /* code to execute at the start of each call of yylex() */
                  token_start = NULL;
          %}
          {self}          { return yytext[0]; }
          {integer}       {
                              /* convert the matched digits (needs <stdlib.h>) */
                              yylval.ival = strtol(yytext, NULL, 10);
                              return ICONST;
                          }
          {decimal}       {
                              yylval.str = pstrdup(yytext);
                              return FCONST;
                          }
          {identifier}    {
                              const ScanKeyword *keyword;
                              char *ident;

                              /* Is it a keyword? */
                              keyword = ScanKeywordLookup(yytext);
                              if (keyword != NULL)
                              {
                                  yylval.keyword = keyword->name;
                                  return keyword->value;
                              }
                              /*
                               * No. Convert the identifier to lower
                               * case, and truncate if necessary.
                               */
                              ident = downcase_truncate_identifier(yytext, yyleng, true);
                              yylval.str = ident;
                              return IDENT;
                          }
          {other}
  14. Start States and Nested Input Files
      The BEGIN macro switches among start states.
      The scanner starts in state 0, also known as INITIAL. All other states must be named in %s or %x lines in the definition section.
          /* Example: */
          %x xc
          ...
          xcstart         \/\*{op_chars}*
          xcstop          \*+\/
          ...
          %%
          ...
          {xcstart}       { BEGIN(xc); }
          <xc>{xcstart}   { /* do nothing */ }
          <xc>{xcstop}    { BEGIN(INITIAL); }
          ...
          %%
          ...
          BEGIN statename;
  15. REJECT
      If an action executes REJECT, flex conceptually puts back the text matched by the pattern and finds the next best match for it.
          /* Example: */
          ...
          %%
          pink    { npink++; REJECT; }
          ink     { nink++; REJECT; }
          pin     { npin++; REJECT; }
          .|\n    ; /* discard other characters */
          ...
          %%
          ...
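  16. Conflict
      Flex can handle ambiguous specifications. When more than one pattern can match the current input, flex chooses as follows:
      Prefer the longest match.
      Among rules that match the same number of characters, prefer the rule listed first.
          bird    { ; }
          [a-z]+  { ; }
          /*
           * 1. If the input is birds, the second pattern wins, because
           *    [a-z]+ matches 5 characters while bird matches only 4.
           * 2. If the input is bird, both rules match 4 characters, so
           *    the first pattern wins.
           */
      .* is a dangerous pattern: it matches everything up to the end of the line, so it tends to win the longest-match contest against other rules.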
  17. Structure of a Flex Specification
      This part covers the user subroutines section.
          ... definition section ...
          %%
          ... rules section ...
          %%
          ... user subroutines ...
  18. User Subroutines
      The contents of the user subroutines section are copied verbatim by flex to the C file.
      This section typically includes routines called from the rules.
  19. Input Buffer
          /* Input buffer */
          YY_BUFFER_STATE bp;
          FILE *f;
          f = fopen(..., "r");
          // new buffer reading from f
          bp = yy_create_buffer(f, YY_BUF_SIZE);
          // use the buffer we just made
          yy_switch_to_buffer(bp);
          ...
          // discard buffer contents
          yy_flush_buffer(bp);
          ...
          // free buffer
          yy_delete_buffer(bp);

          /* Input from strings */
          // 1. scan a copy of len bytes
          bp = yy_scan_bytes(bytes, len);
          // 2. scan a copy of a null-terminated string
          bp = yy_scan_string("string");
          // 3. scan (size - 2) bytes in place; the last two bytes
          //    of the buffer must be NULs (\0)
          bp = yy_scan_buffer(base, size);
  20. Stdio File Chaining
      You can tell the lexer to read from any stdio file by calling yyrestart(file).
      When a lexer built with %option yywrap reaches the end of the input file, it calls yywrap() to find out what to do next; yywrap() can switch to a different input file.
      If yywrap() returns 0, the scanner continues scanning; if it returns 1, the scanner returns a zero token to report end-of-file.
      If your lexer doesn't use yywrap() to switch files, %option noyywrap removes the calls to yywrap().
  21. Reentrant Scanners
      The normal code for a flex scanner places its state information in static variables so that each call to yylex() resumes where the previous one left off, using the existing input buffer, input file, start state, and so forth.
      In some situations, it can be useful to have multiple copies of the scanner active at once, typically in threaded programs that handle multiple independent input sources.
          %option reentrant
          ...
          %%
          ...
          %%
          yyscan_t scanner;
          if (yylex_init(&scanner))
          {
              printf("no scanning today\n");
              abort();
          }
          while (yylex(scanner))
              ... do something ...;
          yylex_destroy(scanner);
  22. Warning
      Flex's C++ support is poor.
      "Although flex has an option to create a C++ scanner, the manual says it's experimental and the code is buggy and doesn't work very well. But you can tell a C++ bison parser to call a C lexer." (from "flex & bison")
      So how should we cope with C++?
  23. References
      B. W. Kernighan and D. M. Ritchie, The C Programming Language, Prentice-Hall, N. J. (1978).
      A. V. Aho and M. J. Corasick, Efficient String Matching: An Aid to Bibliographic Search, Comm. ACM 18, 333-340 (1975).
      B. W. Kernighan, D. M. Ritchie and K. L. Thompson, QED Text Editor, Computing Science Technical Report No. 5, 1972, Bell Laboratories, Murray Hill, NJ 07974.
      D. M. Ritchie, private communication. See also M. E. Lesk, The Portable C Library, Computing Science Technical Report No. 31, Bell Laboratories, Murray Hill, NJ 07974.
      J. R. Levine, T. Mason and D. Brown, lex & yacc, 2nd ed., O'Reilly (1992).
      J. R. Levine, flex & bison, O'Reilly (2009).
  24. Discuss Everything