Gh. Grigoras
Lecture 8 - Outline
Remember: passing requirements
Compilation/Interpretation
Phases of compilation
Language implementation - history
Designing a lexical analyzer
Tokens, patterns, lexemes
Specifying patterns for tokens
Regular expressions
Regular definitions
Shorthand notations
Regular definitions and grammars
Transition diagrams (finite automata)
Implementation. Example
The Lex (Flex) generator. Examples

Passing requirements
AS = seminar activity (max. 10 points)
AL = lab activity (max. 10 points)
T1, T2 = written tests in weeks 8, 13-16
The final score is computed as: 3*AS + 3*AL + 2*T1 + 2*T2
Mandatory passing requirements: AS >= 3, AL >= 3
The final grade is set according to the ECTS criteria (minimum 30 points to pass)

Lab activity
There is one lab test, graded from 1 to 10, and 2 homework assignments, each graded from 1 to 10.
AL is the arithmetic mean of these 3 grades.
Assignments are presented in the lab group the student belongs to.
Late assignments lose at least 2 points (2 points for each week of delay; an assignment can be at most 2 weeks late).
The first assignment cannot be presented late.
Any attempt to copy an assignment is penalized by deducting one point from the final lab grade.
http://profs.info.uaic.ro/~otto/lfac.html

Lecture 8 - Bibliography
http://www.stanford.edu/class/cs143/
http://www.cs.fsu.edu/~engelen/courses/COP5621/
http://dickgrune.com/Books/PTAPG_2nd_Edition/

Compilation and interpretation
Compilation: translating a program written in a source language into an equivalent program written in a target language.
[Diagram: Source Program -> Compiler -> Target Program, with the Compiler also producing Error messages; Input -> Target Program -> Output]

Compilation and interpretation (cont.)
Interpretation: carrying out the operations described in the source program.
[Diagram: Source Program and Input -> Interpreter -> Output, with the Interpreter also producing Error messages]

Compilation - the Analysis-Synthesis model
Compilation involves two major phases:
Analysis: determines the operations implied by the source program; these operations are recorded in a tree structure.
Synthesis: takes the resulting tree as input and translates the operations recorded in it into the target program.

Preprocessors, Compilers, Assemblers, and Linkers
Skeletal Source Program
  -> Preprocessor -> Source Program
  -> Compiler -> Target Assembly Program
  -> Assembler -> Relocatable Object Code
  -> Linker (together with Libraries and Relocatable Object Files) -> Absolute Machine Code

The Phases of a Compiler
Phase: Programmer (source code producer)
  Output: source string
  Sample: A=B+C;
Phase: Scanner (performs lexical analysis)
  Output: token string, and symbol table with names
  Sample: A, =, B, +, C, ;
Phase: Parser (performs syntax analysis based on the grammar of the programming language)
  Output: parse tree or abstract syntax tree
  Sample:
        ;
        |
        =
       / \
      A   +
         / \
        B   C
Phase: Semantic analyzer (type checking, etc.)
  Output: annotated parse tree or abstract syntax tree
Phase: Intermediate code generator
  Output: three-address code, quads, or RTL
  Sample:
      int2fp B t1
      + t1 C t2
      := t2 A
Phase: Optimizer
  Output: three-address code, quads, or RTL
  Sample:
      int2fp B t1
      + t1 #2.3 A
Phase: Code generator
  Output: assembly code
  Sample:
      MOVF #2.3,r1
      ADDF2 r1,r2
      MOVF r2,A
Phase: Peephole optimizer
  Output: assembly code
  Sample:
      ADDF2 #2.3,r2
      MOVF r2,A

Grouping the phases of a compiler
Compiler front end and back ends:
Front end: analysis (machine independent)
Back end: synthesis (machine dependent)
Compiler passes:
Some phases are executed only once (single pass), others several times (multi pass)
Single pass: usually requires everything to be defined before being used in the source program
Multi pass: the compiler may have to keep the entire program representation in memory

Compiler-Construction Tools
Software development tools are available to implement one or more compiler phases
Scanner generators (Lex, Flex, ...)
Parser generators (Yacc, Bison, ...)
Syntax-directed translation engines
Automatic code generators
Data-flow engines

Language implementation - history
1957 Fortran: the first compiler (arithmetic expressions, statements, procedures)
1960 Algol: first use of formal definitions (grammars, BNF, block structure, recursion)
1970 Pascal: user-defined types, virtual machines (P-code)
1972 C: dynamic variables, multitasking, interrupt handling
1983 Ada: the first standardized language
1983 C++: object orientation, exceptions, templates
1995 Java: just-in-time compilation
2000 C#: .NET technology
Present: language frameworks (K, PLanCompS, Rascal, PLT Redex, Spoofax)

Lexical analysis - why a separate phase?
It simplifies the design of a compiler
removal of whitespace and comments
data conversion
the LL(1) and LR(1) parsing methods need 1 token of lookahead
It enables an efficient implementation
Systematic techniques exist for implementing lexical analyzers by hand or automatically, starting from specifications
Stream-buffering methods for scanning the input
It improves portability
The encoding of non-standard symbols and special characters can be normalized

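Interaction of the lexical analyzer with the parser
[Diagram: Source Program -> Lexical Analyzer -> Parser. The Parser sends "Get next token" requests to the Lexical Analyzer, which answers with (Token, tokenval). Both components access the Symbol Table, and both can report errors.]
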
Tokens, Patterns, Lexemes
A token is a class of lexical units (lexemes)
Examples: id, num (these are token names)
A lexeme is a string of characters that makes up a particular token
Examples: abc, 123
Pattern: a rule describing the set of lexemes that belong to a token
Examples: "a letter followed by letters or digits", "a non-empty sequence of digits"

Tasks of a lexical analyzer
Turn a source program into a stream of tokens
Associate a lexeme with each token
Some tokens can carry attributes
The token stream is used by the parser to recover the structure of the program
Ambiguities:
FORTRAN: DO 5 I = 1,25 vs DO5I = 1.25 (blanks are not significant, so only the comma or the period tells a DO loop apart from an assignment to the variable DO5I)
C++: vector<vector<int>> myVector (the closing >> can also be read as the shift operator)
     vector < vector < int >> myVector
     (vector < (vector < (int >> myVector)))

Token attributes
Input: y := 31 + 28*x
The lexical analyzer passes the parser a stream of (token, tokenval) pairs, where tokenval is the token attribute:
<id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">
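A minimal sketch of how a scanner might produce such (token, attribute) pairs for the input y := 31 + 28*x. The enum names, the struct layout and the function next_token are assumptions made for illustration; they do not come from the slides.

```c
#include <ctype.h>
#include <stddef.h>

/* Token names; hypothetical, chosen for this sketch. */
enum token_name { TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_PLUS, TOK_STAR, TOK_EOF, TOK_ERROR };

struct token {
    enum token_name name;
    char text[32];   /* attribute of TOK_ID: the lexeme itself */
    int value;       /* attribute of TOK_NUM: the numeric value */
};

/* Scans one token starting at *p and advances *p past it. */
struct token next_token(const char **p)
{
    struct token t = { TOK_EOF, "", 0 };
    const char *s = *p;

    while (isspace((unsigned char)*s)) s++;            /* skip whitespace */

    if (*s == '\0') {
        t.name = TOK_EOF;
    } else if (isalpha((unsigned char)*s)) {            /* id: letter (letter|digit)* */
        size_t n = 0;
        while (isalnum((unsigned char)*s)) {
            if (n + 1 < sizeof t.text) t.text[n++] = *s; /* truncate long names */
            s++;
        }
        t.text[n] = '\0';
        t.name = TOK_ID;
    } else if (isdigit((unsigned char)*s)) {            /* num: digit+ */
        while (isdigit((unsigned char)*s))
            t.value = t.value * 10 + (*s++ - '0');
        t.name = TOK_NUM;
    } else if (s[0] == ':' && s[1] == '=') {
        t.name = TOK_ASSIGN; s += 2;
    } else if (*s == '+') {
        t.name = TOK_PLUS; s++;
    } else if (*s == '*') {
        t.name = TOK_STAR; s++;
    } else {
        t.name = TOK_ERROR; s++;                         /* unknown character */
    }

    *p = s;
    return t;
}
```

Calling next_token repeatedly on a pointer to the input string yields one token per call until TOK_EOF; tokens such as := or + carry no attribute, which corresponds to the empty second component in <assign, >.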
Specifying token patterns: regular expressions
Basis:
$\varepsilon$ is a regular expression denoting the language $\{\varepsilon\}$
$a \in \Sigma$ is a regular expression denoting the language $\{a\}$
If $r$ and $s$ are regular expressions denoting the languages $L(r)$ and $L(s)$ respectively, then
$r \mid s$ is a regular expression denoting the language $L(r) \cup L(s)$
$rs$ is a regular expression denoting the language $L(r)L(s)$
$r^*$ is a regular expression denoting the language $L(r)^*$
$(r)$ is a regular expression denoting the language $L(r)$
A language defined by a regular expression is called a regular language
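As a worked instance of these rules, assume the alphabet $\Sigma = \{a, b\}$ (chosen here only for illustration). Applying the definitions one operator at a time gives:

```latex
\begin{aligned}
L(a \mid b)     &= L(a) \cup L(b) = \{a, b\} \\
L(ab)           &= L(a)L(b) = \{ab\} \\
L(a^*)          &= L(a)^* = \{\varepsilon, a, aa, aaa, \dots\} \\
L((a \mid b)^*) &= (L(a) \cup L(b))^* = \{a, b\}^*
\end{aligned}
```

The last line is the set of all strings over $\Sigma$, including the empty string.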
Specifying token patterns: regular definitions
Regular definitions introduce conventional names $d_i$:
$d_1 \to r_1$
$d_2 \to r_2$
$\dots$
$d_n \to r_n$
where each $r_i$ is a regular expression over $\Sigma \cup \{d_1, d_2, \dots, d_{i-1}\}$
Any $d_j$ occurring in $r_i$ can be replaced in $r_i$ by its definition, yielding an equivalent set of definitions
Lexical description: a set of regular definitions

Specifying token patterns: regular definitions
Examples:
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter ( letter | digit )*
Regular definitions are not recursive:
digits → digit digits | digit    (wrong!)

Specifying token patterns: shorthand notations
Frequently used:
$r^+ = rr^*$
$r? = r \mid \varepsilon$
[a-z] = a | b | c | ... | z
Examples:
digit → [0-9]
num → digit+ (. digit+)? ( E (+|-)? digit+ )?
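The num pattern, digit+ (. digit+)? ( E (+|-)? digit+ )?, can be recognized by hand with one helper for digit+ and one check per optional part. The sketch below is only an illustration; the function names digits and match_num are hypothetical and not part of the course material.

```c
#include <ctype.h>
#include <stddef.h>

/* Consumes digit* starting at s[i]; returns the index just past the digits
   (equal to i when no digit is present). */
static size_t digits(const char *s, size_t i)
{
    while (isdigit((unsigned char)s[i])) i++;
    return i;
}

/* Returns the length of the longest prefix of s matching
   digit+ (. digit+)? (E (+|-)? digit+)?   or 0 if s does not start with a num. */
size_t match_num(const char *s)
{
    size_t end = digits(s, 0);
    size_t j;

    if (end == 0) return 0;                 /* the leading digit+ is mandatory */

    if (s[end] == '.') {                    /* optional fraction */
        j = digits(s, end + 1);
        if (j > end + 1) end = j;           /* accept only if digits follow '.' */
    }
    if (s[end] == 'E') {                    /* optional exponent */
        size_t k;
        j = end + 1;
        if (s[j] == '+' || s[j] == '-') j++;
        k = digits(s, j);
        if (k > j) end = k;                 /* accept only if digits follow */
    }
    return end;
}
```

An optional part is accepted only when it is complete, so for an input such as 12. the function reports a match of length 2 and leaves the period for the next token.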
Ambiguities
T_For → for
T_Identifier → [A-Za-z_][A-Za-z0-9_]*
Input: fort
Possible ways to split it into lexemes:
fort        f ort
for t       f or t
fo rt       f o rt
fo r t      f o r t
The "maximal munch" rule: take the longest prefix that matches some rule
The priority rule: rules are applied in the order in which they were written
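The two rules can be combined in a few lines of C. The following is a minimal sketch for the two patterns T_For and T_Identifier only; the helper names (match_for, match_identifier, longest_match) and the NO_MATCH value are assumptions introduced for this illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rules in the order in which they are written (priority order). */
enum rule { T_FOR, T_IDENTIFIER, NO_MATCH };

/* Length of the prefix of s matching the keyword "for" (0 if none). */
static size_t match_for(const char *s)
{
    return strncmp(s, "for", 3) == 0 ? 3 : 0;
}

/* Length of the longest prefix of s matching [A-Za-z_][A-Za-z0-9_]* (0 if none). */
static size_t match_identifier(const char *s)
{
    size_t n = 0;
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
    return n;
}

/* Maximal munch: pick the rule with the longest match.
   Priority: on a tie, the rule written first wins. */
enum rule longest_match(const char *s, size_t *len)
{
    size_t lens[2];
    enum rule best = NO_MATCH;
    int r;

    lens[T_FOR] = match_for(s);
    lens[T_IDENTIFIER] = match_identifier(s);
    *len = 0;
    for (r = T_FOR; r <= T_IDENTIFIER; r++) {
        if (lens[r] > *len) {      /* strictly longer, so earlier rules keep ties */
            *len = lens[r];
            best = (enum rule)r;
        }
    }
    return best;
}
```

On the input fort the identifier rule matches 4 characters and the keyword rule only 3, so the result is T_IDENTIFIER; on for both rules match 3 characters and the earlier rule, T_FOR, is chosen.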
Regular definitions and grammars
Grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
Regular definitions:
if → if
then → then
else → else
relop → < | <= | <> | > | >= | =
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+|-)? digit+ )?

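Regular definitions -> transition diagrams
relop → < | <= | <> | > | >= | =
[Transition diagram for relop, start state 0:
  0 --'<'--> 1
  1 --'='--> 2: return(relop, LE)
  1 --'>'--> 3: return(relop, NE)
  1 --other--> 4*: return(relop, LT)
  0 --'='--> 5: return(relop, EQ)
  0 --'>'--> 6
  6 --'='--> 7: return(relop, GE)
  6 --other--> 8*: return(relop, GT)]
id → letter ( letter | digit )*
[Transition diagram for id, start state 9:
  9 --letter--> 10
  10 --letter or digit--> 10
  10 --other--> 11*: return(gettoken(), install_id())]
(* marks a state in which the last character read is pushed back to the input)
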
The code
token nexttoken()
{
    while (1) {
        switch (state) {
        case 0:
            c = nextchar();
            if (c == blank || c == tab || c == newline) {
                /* skip whitespace and move the start of the lexeme forward */
                state = 0;
                lexeme_beginning++;
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();    /* not a relop: try the next diagram */
            break;
        case 1:
            ...
        case 9:
            c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10:
            c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        ...

/* Decides the next start state to check */
int fail()
{
    forward = token_beginning;      /* rewind to the start of the token */
    switch (start) {
    case 0: start = 9; break;
    case 9: start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover(); break;
    default: /* error */ ;
    }
    return start;
}
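For comparison, here is a self-contained sketch of the relop diagram alone, written as a function over a string instead of the global state used above. The state numbers in the comments follow the usual textbook numbering of this diagram, which is an assumption here; the names scan_relop and NOT_RELOP are hypothetical.

```c
#include <stddef.h>

enum relop { LT, LE, EQ, NE, GT, GE, NOT_RELOP };

/* Simulates the relop transition diagram on the start of s.
   Stores the lexeme length in *len; "other" edges consume no input,
   which plays the role of retracting the lookahead character. */
enum relop scan_relop(const char *s, size_t *len)
{
    int state = 0;
    size_t i = 0;

    for (;;) {
        char c = s[i];
        switch (state) {
        case 0:
            if (c == '<') { state = 1; i++; }
            else if (c == '=') { *len = i + 1; return EQ; }   /* state 5 */
            else if (c == '>') { state = 6; i++; }
            else { *len = 0; return NOT_RELOP; }
            break;
        case 1:
            if (c == '=') { *len = i + 1; return LE; }        /* state 2 */
            if (c == '>') { *len = i + 1; return NE; }        /* state 3 */
            *len = i;
            return LT;                                        /* state 4 */
        case 6:
            if (c == '=') { *len = i + 1; return GE; }        /* state 7 */
            *len = i;
            return GT;                                        /* state 8 */
        }
    }
}
```

Returning the length instead of moving a shared forward pointer keeps the sketch free of global state; a full scanner would fall back to the next diagram, as fail() does, when NOT_RELOP is returned.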
Given a lexical description E over the alphabet Σ, designing a lexical analyzer involves the following steps:
1. Build the finite automaton A (with ε-transitions) such that L(E) = L(A).
2. Apply the described algorithm to obtain the deterministic automaton equivalent to E; call it A'.
3. (Optional) Apply a suitable procedure to obtain the minimal automaton equivalent to A'.
4. Write a program that implements the behavior of the resulting automaton.
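Step 4 is often implemented as a table-driven loop. The sketch below assumes a small hand-built deterministic automaton for identifiers and optionally signed integers; the table, the state numbers and all names are assumptions for illustration and were not derived by the construction in steps 1 to 3.

```c
#include <ctype.h>
#include <stddef.h>

enum char_class { CC_LETTER, CC_DIGIT, CC_SIGN, CC_OTHER, CC_COUNT };
enum { DEAD = -1, NUM_STATES = 4 };

/* delta[state][class]; states: 0 = start, 1 = in identifier,
   2 = in number, 3 = just read a sign */
static const int delta[NUM_STATES][CC_COUNT] = {
    /* letter  digit  sign   other */
    {  1,      2,     3,     DEAD },
    {  1,      1,     DEAD,  DEAD },
    {  DEAD,   2,     DEAD,  DEAD },
    {  DEAD,   2,     DEAD,  DEAD },
};
static const int accepting[NUM_STATES] = { 0, 1, 1, 0 };

static enum char_class classify(char c)
{
    if (isalpha((unsigned char)c)) return CC_LETTER;
    if (isdigit((unsigned char)c)) return CC_DIGIT;
    if (c == '+' || c == '-') return CC_SIGN;
    return CC_OTHER;
}

/* Runs the automaton on s and returns the length of the longest accepted
   prefix (0 if none); *final_state receives the accepting state reached. */
size_t dfa_longest_prefix(const char *s, int *final_state)
{
    int state = 0;
    size_t best = 0;
    size_t i;

    *final_state = DEAD;
    for (i = 0; s[i] != '\0'; i++) {
        state = delta[state][classify(s[i])];
        if (state == DEAD) break;              /* no transition: stop */
        if (accepting[state]) {                /* remember the last accepting point */
            best = i + 1;
            *final_state = state;
        }
    }
    return best;
}
```

Remembering the last accepting position is what gives the longest match: the loop keeps reading while transitions exist and reports the last point at which the automaton was in a final state.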
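Example: consider the lexical description:
litera → a | b | ... | z
cifra → 0 | 1 | ... | 9
identificator → litera (litera | cifra)*
semn → + | -
numar → (semn | ε) cifra+
operator → + | - | * | / | < | > | <= | >= | <>
asignare → :=
doua_puncte → :
cuvinte_rezervate → if | then | else
paranteze → ) | (
(The names are Romanian: litera = letter, cifra = digit, identificator = identifier, semn = sign, numar = number, asignare = assignment, doua_puncte = colon, cuvinte_rezervate = reserved words, paranteze = parentheses.)
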
[Diagram: a new start state q0 with ε-transitions to the automata A_i, A_a, A_o, A_n, A_d, A_c and A_p built for the individual definitions]

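[Diagram: the resulting deterministic automaton, with states 0 to 7. Edge labels: litera; cifra; "litera, cifra"; "), ("; "="; ":"; "operator-{+,-}"; "+, -". Final states are marked with the class they recognize: p, d, a, o, n, and "i or c".]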
Implementation
buffer: the memory area into which a lexical unit is loaded; helper routines getnext(), store(c)
...
c = getnext(); empty(); stare = 0;
while (1) {
    switch (stare) {
    case 0:
        if (isspace(c)) {
            /* skip whitespace and stay in the start state */
            c = getnext(); stare = 0;
        }
        else if (isalpha(c)) {
            store(c); c = getnext(); stare = 1;
        }
        else if (isdigit(c)) {
            store(c); c = getnext(); stare = 2;
        }
        else if (c == '+' || c == '-') {
            store(c); c = getnext(); stare = 3;
        }
        ...

Lex (Flex)
Lex: Bell Laboratories, 1975, M. E. Lesk and E. Schmidt
A standard Unix tool starting with Version 7
The Free Software Foundation's GNU project distributes FLEX (Fast LEXical Analyzer Generator)
Versions exist for the DOS and Windows operating systems; one of them is PCLEX, released by Abraxas Software Inc.
YooLex (yet another Object-Oriented Lex): http://yoolex.sourceforge.net/
Flex++: http://www.kohsuke.org/flex++bison++/ (Bison and Flex variants that generate C++ code)

A lexical analyzer with Lex (Flex)
lex source program (lex.l) -> lex or flex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
input stream -> a.out -> sequence of tokens

Lex specifications
A Lex specification consists of 3 parts:
regular definitions, C declarations in a %{ %} block
%%
translation rules
%%
user-defined auxiliary procedures
The translation rules have the form:
p1 { action1 }
p2 { action2 }
...
pn { actionn }

Regular expressions in Lex
x          match the character x
\.         match the character .
"string"   match contents of string of characters
.          match any character except newline
^          match beginning of a line
$          match the end of a line
[xyz]      match one character x, y, or z (use \ to escape -)
[^xyz]     match any character except x, y, and z
[a-z]      match one of a to z
r*         closure (match zero or more occurrences)
r+         positive closure (match one or more occurrences)
r?         optional (match zero or one occurrence)
r1r2       match r1 then r2 (concatenation)
r1|r2      match r1 or r2 (union)
( r )      grouping
r1/r2      match r1 when followed by r2
{d}        match the regular expression defined by d

Example expressions
Expression      Matches
abc             abc
abc*            ab abc abcc abccc ...
abc+            abc abcc abccc ...
a(bc)+          abc abcbc abcbcbc ...
a(bc)?          a abc
[abc]           one of the characters: a, b, c
[a-z]           any letter, a-z
[a\-z]          one of the characters: a, -, z
[-az]           one of the characters: -, a, z
[A-Za-z0-9]+    one or more alphanumeric characters
[ \t\n]+        whitespace
[^ab]           anything except the characters: a, b
[a^b]           one of the characters: a, ^, b
[a|b]           one of the characters: a, |, b
a|b             one of the characters: a, b
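Several of these rows can be checked quickly with the POSIX regular expression functions regcomp and regexec. This is only an approximation: POSIX extended syntax agrees with Lex on the operators used below, but not everywhere (inside a POSIX bracket expression a backslash is an ordinary character, so [a\-z] is left out). The helper name full_match and the 256-byte buffer are assumptions of this sketch.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the whole of text matches the POSIX extended regex pattern,
   0 if it does not, and -1 if the pattern is too long or does not compile. */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Anchor the pattern so that a match must cover the entire text. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored) return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, full_match("abc*", "ab") reports a match while full_match("abc+", "ab") does not, which mirrors the difference between the second and third rows of the table.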
Predefined Lex variables
Name               Purpose
int yylex(void)    the lexer; it is called from main()
char *yytext       pointer to the lexical unit (token) that was matched
yyleng             length of the matched token
yylval             value associated with the token
FILE *yyout        output file
FILE *yyin         input file

Example 1
%{
#include <stdio.h>
%}
%%
[0-9]+  { printf("%s\n", yytext); /* yytext contains the matching lexeme */ }
.|\n    { }
%%
int main(void)
{
    yylex();    /* invokes the lexical analyzer */
    return 0;
}
The lines between the two %% markers are the translation rules.
Build and run:
lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l

Example 2
%{
#include <stdio.h>
int ch = 0, wd = 0, nl = 0;    /* character, word and line counters */
%}
delim     [ \t]+
%%
\n        { ch++; wd++; nl++; }
^{delim}  { ch+=yyleng; }
{delim}   { ch+=yyleng; wd++; }
.         { ch++; }
%%
int main(void)
{
    yylex();
    printf("%8d%8d%8d\n", nl, wd, ch);
    return 0;
}
delim is a regular definition; the lines between the two %% markers are the translation rules.

Example 3
%{
#include <stdio.h>
%}
digit     [0-9]
letter    [A-Za-z]
id        {letter}({letter}|{digit})*
%%
{digit}+  { printf("number: %s\n", yytext); }
{id}      { printf("ident: %s\n", yytext); }
.         { printf("other: %s\n", yytext); }
%%
int main(void)
{
    yylex();
    return 0;
}
digit, letter and id are regular definitions; the lines between the two %% markers are the translation rules.

Example 4
%{
#include <stdio.h>
%}
litera    [a-zA-Z]
cifra     [0-9]
cifre     ({cifra})+
semn      [+-]
operator  [+*/<>=-]
spatiu    [ \t\n]
%%
"if"|"then"|"else"            {printf("%s cuvant rezervat\n", yytext);}
({litera})({litera}|{cifra})* {printf("%s identificator\n", yytext);}
{cifre}|({semn})({cifre})     {printf("%s numar intreg\n", yytext);}
{operator}                    {printf("%c operator\n", yytext[0]);}
\:\=                          {printf("%s asignare\n", yytext);}
\:                            {printf("%c doua puncte\n", yytext[0]);}
(\()|(\))                     {printf("%c paranteza\n", yytext[0]);}
{spatiu}                      {}
.                             {printf("%c caracter ilegal\n", yytext[0]);}
%%
int main(void) {
    yylex();
    return 0;
}
The printed labels are Romanian: cuvant rezervat = reserved word, identificator = identifier, numar intreg = integer, asignare = assignment, doua puncte = colon, paranteza = parenthesis, caracter ilegal = illegal character.

Example 5
%{ /* definitions of manifest constants */
#define LT (256)
...
%}
delim     [ \t\n]
ws        {delim}+
letter    [A-Za-z]
digit     [0-9]
id        {letter}({letter}|{digit})*
number    {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
{ws}      { }
if        {return IF;}
then      {return THEN;}
else      {return ELSE;}
{id}      {yylval = install_id(); return ID;}
{number}  {yylval = install_num(); return NUMBER;}
"<"       {yylval = LT; return RELOP;}
"<="      {yylval = LE; return RELOP;}
"="       {yylval = EQ; return RELOP;}
"<>"      {yylval = NE; return RELOP;}
">"       {yylval = GT; return RELOP;}
">="      {yylval = GE; return RELOP;}
%%
int install_id()
...
Each action returns the token to the parser (for example return ID) and stores the token attribute in yylval; install_id() installs yytext as an identifier in the symbol table.

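Example 6
%{
#include <stdio.h>
int lineno = 0;    /* own line counter; flex already defines yylineno */
%}
%%
^(.*)\n    printf("%4d\t%s", ++lineno, yytext);
%%
int main(int argc, char *argv[]) {
    /* open the file named on the command line as the scanner input */
    if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    yylex();
    fclose(yyin);
    return 0;
}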