Documente Academic
Documente Profesional
Documente Cultură
state machines.
lex.yy.c y.tab.c
call
Input yylex() yyparse() Parsed Input
return token
1
Versions and Reference Books General Format of Lex Source
2
Default Rules and Actions Default Input and Output
• The first and second part must exist, but • If you don’t write your own main() to deal
may be empty, the third part and the with the input and the output of yylex(), the
second %% are optional. default input of default main() is stdin and
• If the third part dose not contain a main(), - the default output of default main() is
ll will link a default main() which calls stdout.
yylex() then exits.
– stdin usually is to be keyboard input
• Unmatched patterns will perform a default stdout usually is to be screen output
action, which consists of copying the input – cs20: %./a.out < inputfile > outputfile
to the output
{bin_digit}* { %%
/* match all strings of 0's and 1's */ /*
/* Print out message with matching * Now this is where you want your main
* text program
*/ */
printf("BINARY: %s\n", yytext); int main(int argc, char *argv[]) {
} /*
([ab]*aa[ab]*bb[ab]*)|([ab]*bb[ab]*aa[ab]*) { * call yylex to use the generated lexer
/* match all strings over */
* (a,b) containing aa and bb yylex();
*/ /*
printf("AABB\n"); * make sure everything was printed
} */
\n ; /* ignore newlines */ fflush(yyout);
exit(0);
}
3
Token Definitions • Elementary Operations (cont.)
( Extended Regular Expression ) – NOTE: . matches any character except the
newline
• Elementary Operations
– * -- Kleene Closure
– single characters
• except “ \ . $ ^ [ ] - ? * + | ( ) / { } % < >
– + -- Positive Closure
– concatenation (put characters together)
– alternation (a|b|c) • Examples:
• [ab] == a|b – [0-9]+"."[0-9]+
• [a-k] == a|b|c|...|i|j|k • note: without the quotes it could be any
character
• [a-z0-9] == any letter or digit
• [^a] == any character but a – [ \t]+ -- is whitespace
• (except CR).
• There is a blank space character before the \t
4
Transition Rules Tokens and Actions
• ERE <one or more blanks> { program statement • Example:
program – {real} return FLOAT;
statement }
– begin return BEGIN;
• A null statement ; will ignore the input
– {newline} linecount++;
• Four special options:
– {integer} {
| ECHO; REJECT; BEGIN;
• printf("I found an integer\n");
• The unmatched token is using a default action
• return INTEGER;
that ECHO from the input to the output
• }
• | indicates that the action for this rule is from the
action for the next rule
5
User Written Code More Example 1
int lengs[100];
%%
• The actions associated with any given [a-z]+ lengs[yyleng]++;
token are normally specified using .|
statements in C. But occasionally the \n ;
actions are complicated enough that it is %%
yywrap()
better to describe them with a function call,
{
and define the function elsewhere. int i;
• Definitions of this sort go in the last section printf("Length No. words\n");
of the Lex input. for(i=0; i<100; i++)
if (lengs[i] > 0)
printf("%5d%10d\n",i,lengs[i]); return(1);
}