Compiler Khata

Compiler Design
Problem No.   Name of the problem
01.   Write a C program for developing a lexical analyzer (LA) that will eliminate white
      spaces from a source program in C and collect numbers.
02.   Write a C program for developing a lexical analyzer (LA) that will eliminate white
      spaces from a source program in C and collect numbers as tokens and then also
      display the token value as attribute.
03.   Write a C program for developing a lexical analyzer (LA) that will recognize all
      basic data types of C.
04.   Write a C program for developing a lexical analyzer (LA) that will recognize all
      keywords of C.
05.   Write a C program for developing a lexical analyzer (LA) that will eliminate white
      spaces and comments from a source program in C.
06.   Write a C program for developing a lexical analyzer (LA) that will generate tokens
      for a given statement of a C source program.
07.   Write a C program for developing a lexical analyzer (LA) that will recognize
      variables of a C source program.
08.   Design a compiler front-end based on syntax-directed translation technique that will
      function as an infix to postfix translator for a language consisting of a sequence of
      expressions terminated by semicolons.
Problem No.01
Problem Name:
Write a C program for developing a lexical analyzer(LA) that will eliminate
white spaces from a source program in C and collect numbers.
Problem analysis:
Linear analysis is called lexical analysis or scanning. For example, in lexical analysis the
characters in the assignment statement
position := initial + rate * 60
would be grouped into the following tokens:
1. The identifier position
2. The assignment symbol :=
3. The identifier initial
4. The plus sign
5. The identifier rate
6. The multiplication sign
7. The number 60
The blanks separating the characters of these tokens would normally be eliminated during
lexical analysis.
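The grouping described above can be sketched in a few lines of C. This is only an illustration: the function name split_tokens and its parameters are hypothetical, and it assumes that identifiers are a letter followed by letters and digits, that numbers are runs of digits, and that := is the only two-character symbol.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Splits s into tokens and returns how many were found.
   For i < max, token i starts at starts[i] and is lens[i] characters long.
   White space separates tokens and is not reported. */
size_t split_tokens(const char *s, const char *starts[], size_t lens[], size_t max)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s != '\0') {
        const char *begin = s;

        if (isspace((unsigned char)*s)) {       /* blanks are dropped */
            s++;
            continue;
        }
        if (isalpha((unsigned char)*s)) {       /* identifier */
            while (isalnum((unsigned char)*s))
                s++;
        } else if (isdigit((unsigned char)*s)) { /* number */
            while (isdigit((unsigned char)*s))
                s++;
        } else if (s[0] == ':' && s[1] == '=') { /* assignment symbol */
            s += 2;
        } else {                                 /* any other single character */
            s++;
        }
        if (n < max) {
            starts[n] = begin;
            lens[n] = (size_t)(s - begin);
        }
        n++;
    }
    return n;
}
```

Called on the statement position := initial + rate * 60, the function reports seven tokens, matching the list above.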
The lexical analyzer is the first phase of a compiler. Its main task is to read the input characters
and produce as output a sequence of tokens that the parser uses for syntax analysis. This
interaction, summarized schematically in Fig.(a), is commonly implemented by making the
lexical analyzer a subroutine or a coroutine of the parser. Upon receiving a "get next
token" command from the parser, the lexical analyzer reads input characters until it can
identify the next token.
[Figure: the source program feeds the lexical analyzer, which hands tokens to the parser; both use the symbol table]
Fig.(a): Interaction of lexical analyzer with parser.
Since the lexical analyzer is the part of the compiler that reads the source text, it may also
perform certain secondary tasks at the user interface. One such task is stripping out from the
source program comments and white space in the form of blank, tab, and newline characters. The
lexical analyzer may keep track of the number of newline characters seen, so that a line number
can be associated with an error message.
The purpose of the lexical analyzer is to allow white space and numbers to appear within
expressions.
[Figure: the lexical analyzer lexan() uses getchar() to read characters, pushes back c using ungetc(c,stdin), returns the token to its caller, and sets tokenval]
Fig.(b): Implementing the interaction of the lexical analyzer with the source program.

Figure (b) suggests how the lexical analyzer, written as the function lexan in C, implements
this interaction. The routines getchar and ungetc from the standard include-file <stdio.h> take
care of input buffering; lexan reads and pushes back input characters by calling the routines
getchar and ungetc respectively. With c declared to be a character, the pair of statements
c = getchar(); ungetc(c, stdin);
leaves the input stream undisturbed. The call of getchar assigns the next input character to c;
the call of ungetc pushes back the value of c onto the standard input stdin.
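The read-then-push-back pair can be wrapped in a small helper. The sketch below is an illustration only: the name peek_char is hypothetical, it takes a FILE * instead of using stdin so that it can be tried on any stream, and it declares c as int (not char) so that EOF can be told apart from real characters.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the next character of the stream without consuming it,
   or EOF if no character is left. */
int peek_char(FILE *in)
{
    int c;

    assert(in != NULL);
    c = getc(in);          /* read one character ... */
    if (c != EOF)
        ungetc(c, in);     /* ... and push it straight back */
    return c;
}
```

Calling peek_char twice in a row yields the same character, and a following getc still reads that character, which is what "leaves the input stream undisturbed" means in practice.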
If the implementation language does not allow data structures to be returned from functions,
then tokens and their attributes have to be passed separately. The function lexan returns an
integer encoding of a token. A token, such as num, can then be encoded by an integer larger than
any integer encoding a character, say 256. We define the statement:
#define NUM 256
The function lexan returns NUM when a sequence of digits is seen in the input. A global
variable tokenval is set to the value of the sequence of digits. Thus, if a 7 is followed
immediately by a 6 in the input, tokenval is assigned the integer value 76.
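A minimal sketch of such a lexan function follows. It is not the listing used in this problem: it reads from a FILE * passed as a parameter rather than from stdin, and the names NONE and lineno are assumptions introduced for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define NUM  256      /* token code for a sequence of digits */
#define NONE (-1)     /* assumed marker: token carries no attribute */

int tokenval = NONE;  /* attribute of the most recent token */
int lineno = 1;       /* assumed counter of newline characters seen */

/* Returns the next token: NUM for a digit sequence (value in tokenval),
   EOF at end of input, otherwise the character itself. */
int lexan(FILE *in)
{
    int c;

    assert(in != NULL);
    for (;;) {
        c = getc(in);
        if (c == ' ' || c == '\t')
            continue;                 /* strip blanks and tabs */
        if (c == '\n') {
            lineno++;                 /* count lines for error messages */
            continue;
        }
        if (isdigit(c)) {
            tokenval = 0;
            while (isdigit(c)) {
                tokenval = tokenval * 10 + (c - '0');
                c = getc(in);
            }
            if (c != EOF)
                ungetc(c, in);        /* the delimiter belongs to the next token */
            return NUM;
        }
        tokenval = NONE;
        return c;
    }
}
```

With the input 76 + the first call returns NUM with tokenval set to 76, and the second call returns the character '+'.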
Code
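#include <stdio.h>
#include <ctype.h>

int main(void)
{
    int t;              /* current character; int so that EOF is representable */
    int in_ident = 0;   /* nonzero while inside an identifier such as f1 or a_1 */
    int n;
    FILE *f1, *f2;

    f1 = fopen("c:\\compile\\input.txt", "r");
    f2 = fopen("c:\\compile\\output.txt", "w");
    if (f1 == NULL || f2 == NULL) {
        perror("fopen");
        return 1;
    }
    while ((t = getc(f1)) != EOF) {
        if (t == ' ' || t == '\t') {        /* eliminate blanks and tabs */
            in_ident = 0;
            continue;
        }
        if (isdigit(t) && !in_ident) {      /* a digit that starts a number */
            n = 0;
            while (isdigit(t)) {
                putc(t, f2);
                n = n * 10 + (t - '0');
                t = getc(f1);
            }
            printf("%d\n", n);              /* report the collected number */
            if (t != EOF)
                ungetc(t, f1);              /* let the loop handle the delimiter */
            continue;
        }
        /* letters, '_' and digits that follow them belong to an identifier */
        in_ident = isalnum(t) || t == '_';
        putc(t, f2);
    }
    fclose(f1);
    fclose(f2);
    return 0;
}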
INPUT:
void main()
{
FILE *f1,*f2;
long int a;
char c[100];
f1=fopen ("testinput.cpp","r");
f2=fopen("testoutput.cpp","w");
while(fscanf(f1,"%s",c)!=EOF)
{
int line=1;
if(c[0]=='\n')
{fprintf(f2,"\n",);line++;}
else if(!isdigit(c[0]))
/* reading value from file
*/
fprintf(f2,"%s",c);
else if(isdigit(c[0])
{
a=c[0]-'10';
int i=1;
j=120;
while(isdigit(c[i]))
{
a=a*10+c[i]-'0';
i++;
}
printf("Number %ld in line no. %d\n",a,line);
}}}
OUTPUT:
voidmain()
{
FILE*f1,*f2;
longinta;
charc[100];
f1=fopen("testinput.cpp","r");
while(fscanf(f1,"%s",c)!=EOF)/*readingvaluefromfile*/
{
intline=Num(1);
if(c[0]=='\n')
elseif(!isdigit(c[0]))
fprintf(f2,"%s",c);
elseif(isdigit(c[0])
{
a=c[0]-'Num(10)';
inti=Num(1);
j=Num(120);
while(isdigit(c[i]))
{
a=a*Num(10)+c[i]-'Num(0)';
i++;
}
printf("Number%ldinlineno.%d\n",a,line);
}}}
Result and Discussion:

This program has been written in C/C++ and successfully eliminates white space from
a source program and collects numbers as Num.
Problem No.02
Problem Name:
Write a C program for developing a lexical analyzer(LA) that will eliminate white
spaces from a source program in C and collect numbers as token and then also
display the token and token value attribute.
Problem analysis:
As in Problem No.01, lexical analysis groups the characters of a statement such as
position := initial + rate * 60 into tokens, for example the plus sign and the number 60.
The blanks separating the characters of these tokens would normally be eliminated during
lexical analysis.
[Figure: interaction of lexical analyzer with parser, as in Fig.(a)]
Tokens: The smallest individual units in a source program are known as tokens.
CODE
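#include <stdio.h>
#include <ctype.h>

int main(void)
{
    int t;              /* current character; int so that EOF is representable */
    int in_ident = 0;   /* nonzero while inside an identifier such as f1 or a_1 */
    int n;
    FILE *f1, *f2;

    /* The fopen calls are missing from this listing; the paths are assumed
       to be the same as in Problem No.01. */
    f1 = fopen("c:\\compile\\input.txt", "r");
    f2 = fopen("c:\\compile\\output.txt", "w");
    if (f1 == NULL || f2 == NULL) {
        perror("fopen");
        return 1;
    }
    printf(" Token        Token value as attributes\n");
    printf("-----------------------------------------");
    while ((t = getc(f1)) != EOF) {
        if (t == ' ' || t == '\t') {        /* eliminate blanks and tabs */
            in_ident = 0;
            continue;
        }
        if (isdigit(t) && !in_ident) {      /* a digit that starts a number */
            n = 0;
            while (isdigit(t)) {
                putc(t, f2);
                n = n * 10 + (t - '0');
                t = getc(f1);
            }
            printf("\n NUM          %d", n); /* token and its attribute value */
            if (t != EOF)
                ungetc(t, f1);              /* let the loop handle the delimiter */
            continue;
        }
        /* letters, '_' and digits that follow them belong to an identifier */
        in_ident = isalnum(t) || t == '_';
        putc(t, f2);
    }
    printf("\n");
    fclose(f1);
    fclose(f2);
    return 0;
}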
INPUT:
void main()
{
FILE *f1,*f2;
long int a;
char c[100];
f1=fopen ("testinput.cpp","r");
{
int line=1;
if(c[0]=='\n')
else if(!isdigit(c[0]))
fprintf(f2,"%s",c);
else if(isdigit(c[0])
{
a=c[0]-'10';
int i=1;
j=120;
}
}
}
OUTPUT:
voidmain()
{
FILE*f1,*f2;
longinta;
charc[100];
f1=fopen("testinput.cpp","r");
{
intline=1;
if(c[0]=='\n')
elseif(!isdigit(c[0]))
fprintf(f2,"%s",c);
elseif(isdigit(c[0])
{
a=c[0]-'10';
inti=1;
j=120;
}
}
}
NUM          1
NUM          10
NUM          1
NUM          120
NUM          10
NUM          0

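Result and Discussion:
This program has been written in C/C++ and successfully eliminates white space from
a source program, collects numbers as tokens, and displays each token with its token value
as attribute.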
Problem No.03
Problem Name:
Write a C program for developing a lexical analyzer(LA) that will recognize all
basic data types of C.
Problem analysis:
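[Figure: interaction of lexical analyzer with parser, as in Fig.(a)]
The lexical analyzer may keep track of the number of newline characters seen, so that a line
number can be associated with an error message. The basic data types in a C program are int,
float, char, double, and long int.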
CODE
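#include <stdio.h>
#include <string.h>

int main(void)
{
    char ch[100];   /* one whitespace-delimited word of the input */
    FILE *f1;

    /* The fopen call is missing from this listing; the path is assumed
       to be the same as in Problem No.01. */
    f1 = fopen("c:\\compile\\input.txt", "r");
    if (f1 == NULL) {
        perror("fopen");
        return 1;
    }
    /* %99s reads at most 99 characters, so ch cannot overflow */
    while (fscanf(f1, "%99s", ch) == 1) {
        if (strcmp("int", ch) == 0 || strcmp("double", ch) == 0 ||
            strcmp("char", ch) == 0 || strcmp("float", ch) == 0)
            printf("%s\n", ch);
    }
    fclose(f1);
    return 0;
}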
INPUT:
int main()
{
int a,b,c;
float s;
chart s;
}
OUTPUT:
int
float
char
Result and Discussion:
This program has been written in C/C++ and successfully recognizes the basic
data types of C.
Problem No.04
Problem Name:
Write a C program for developing a lexical analyzer(LA) that will recognize all
Keywords of C.
Problem analysis:
[Figure: interaction of lexical analyzer with parser, as in Fig.(a)]
The lexical analyzer may keep track of the number of newline characters seen, so that a line
number can be associated with an error message. The keywords of the C language are
auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for,
goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union,
unsigned, void, volatile, while.
CODE
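#include <stdio.h>
#include <string.h>

int main(void)
{
    char t[100];    /* one whitespace-delimited word of the input */
    /* the 32 keywords listed in the problem analysis */
    const char *k[] = {"auto", "break", "case", "char", "const", "continue",
        "default", "do", "double", "else", "enum", "extern", "float", "for",
        "goto", "if", "int", "long", "register", "return", "short", "signed",
        "sizeof", "static", "struct", "switch", "typedef", "union",
        "unsigned", "void", "volatile", "while"};
    int n = (int)(sizeof k / sizeof k[0]);
    int i;
    FILE *f1;

    /* The fopen call is missing from this listing; the path is assumed
       to be the same as in Problem No.01. */
    f1 = fopen("c:\\compile\\input.txt", "r");
    if (f1 == NULL) {
        perror("fopen");
        return 1;
    }
    /* fscanf with %s splits on white space only, so a keyword glued to
       punctuation, as in for(j=0;..., is not matched by this version */
    while (fscanf(f1, "%99s", t) == 1) {
        for (i = 0; i < n; i++) {
            if (strcmp(t, k[i]) == 0) {
                printf("%s\n", t);
                break;
            }
        }
    }
    fclose(f1);
    return 0;
}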
INPUT:
#include <stdlib.h>
#include <stdio.h>
#include <values.h>
#include <time.h>
int main(void)
{
int i,j;
for(j=0;j<150;j++)
{
for(i=0;i<200;i++)
printf("%d\n", rand() % MAXINT);
}
return 0;
}
OUTPUT:
int
void
int
for
for
return

Result and Discussion:
This program has been written in C/C++ and successfully recognizes the
keywords of C.
Problem No.05
Problem Name:
Write a C program for developing a lexical analyzer(LA) that will eliminate white
spaces and comments from a source program in C .
Problem analysis:
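As in Problem No.01, lexical analysis groups the characters of a statement such as
position := initial + rate * 60 into tokens, for example the plus sign and the number 60.
The blanks separating the characters of these tokens would normally be eliminated during lexical
analysis.
The lexical analyzer is the first phase of a compiler. Its main task is to read the input characters
and produce as output a sequence of tokens that the parser uses for syntax analysis. This
interaction, summarized schematically in Fig.(a), is commonly implemented by making the lexical
analyzer a subroutine or a coroutine of the parser. Upon receiving a "get next token"
command from the parser, the lexical analyzer reads input characters until it can identify the next
token.
[Figure: interaction of lexical analyzer with parser, as in Fig.(a)]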
CODE
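#include <stdio.h>

int main(void)
{
    int t, t1;      /* int so that EOF is representable */
    int prev;
    FILE *f1, *f2;

    /* The fopen calls are missing from this listing; the paths are assumed
       to be the same as in Problem No.01. */
    f1 = fopen("c:\\compile\\input.txt", "r");
    f2 = fopen("c:\\compile\\output.txt", "w");
    if (f1 == NULL || f2 == NULL) {
        perror("fopen");
        return 1;
    }
    while ((t = getc(f1)) != EOF) {
        if (t == ' ')
            continue;                   /* eliminate blanks */
        if (t == '/') {
            t1 = getc(f1);
            if (t1 == '*') {
                /* inside a comment: skip up to the closing star-slash,
                   or to end of file if the comment is never closed */
                prev = 0;
                while ((t = getc(f1)) != EOF) {
                    if (prev == '*' && t == '/')
                        break;
                    prev = t;
                }
            } else {
                putc(t, f2);            /* a lone '/' is ordinary text */
                if (t1 != EOF)
                    ungetc(t1, f1);
            }
        } else {
            putc(t, f2);
        }
    }
    fclose(f1);
    fclose(f2);
    return 0;
}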
INPUT:
#include<stdio.h>
#include<conio.h>
void main()
{
clrscr();
int p,q,m,n;
printf("How many line ");
scanf("%d",&n);/* n is the number of input*/
printf("\n\n");
for(p=1;p<=n;p++)
{
for(q=1;q<=(n-p);q++)
printf(" ");
m=p;
for(q=1;q<=p;q++)
printf("%2d",(m++%10));
m-=2;
for(q=1;q<p;q++)
printf("%2d",(m--%10));
printf("\n");
}
getch();
}
OUTPUT:
#include<stdio.h>
#include<conio.h>
voidmain()
{
clrscr();
intp,q,m,n;
printf("Howmanyline");
scanf("%d",&n);
printf("\n\n");
for(p=1;p<=n;p++)
{
for(q=1;q<=(n-p);q++)
printf("");
m=p;
for(q=1;q<=p;q++)
printf("%2d",(m++%10));
m-=2;
for(q=1;q<p;q++)
printf("%2d",(m--%10));
printf("\n");
}
getch();
}

Result and Discussion:
This program has been written in C/C++ and successfully eliminates white space and
comments from a C source program.
Problem No.06
Problem Name:
Write a C program for developing a lexical analyzer(LA) that will generate token
for a given statement of C source program.
Problem analysis:
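[Figure: interaction of lexical analyzer with parser, as in Fig.(a)]
The lexical analyzer may keep track of the number of newline characters seen, so that a line
number can be associated with an error message. The smallest individual units in a source
program are known as tokens.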
CODING:
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int keyword(const char buf[]);

/* C keywords; the empty string marks the end of the table */
const char *key[] = {"auto", "break", "case", "char", "const", "continue",
    "default", "do", "double", "else", "enum", "extern", "float", "for",
    "goto", "if", "int", "long", "register", "return", "short", "signed",
    "sizeof", "static", "struct", "switch", "typedef", "union", "unsigned",
    "void", "volatile", "while", ""};

int main(void)
{
    int c;              /* int so that EOF is representable */
    char buf[100];
    FILE *f;

    f = fopen("c6input.cpp", "r");
    if (f == NULL) {
        perror("c6input.cpp");
        return 1;
    }
    c = getc(f);
    printf("Token        Attribute value:\n");
    while (c != EOF) {
        int i = 0;
        if (isalpha(c)) {
            /* identifier or keyword: a letter, then letters, digits or '_' */
            while (isalnum(c) || c == '_') {
                if (i < 99)
                    buf[i++] = (char)c;
                c = getc(f);
            }
            buf[i] = '\0';
            if (keyword(buf) == 0)
                printf("ID           %s\n", buf);
            else
                printf("%-12s %s\n", buf, buf);
        } else if (isdigit(c)) {
            /* number: integer part, optionally followed by a fraction */
            int a = 0;
            while (isdigit(c)) {
                a = a * 10 + (c - '0');
                c = getc(f);
            }
            if (c == '.') {
                char b[10];
                c = getc(f);
                while (isdigit(c)) {
                    if (i < 9)
                        b[i++] = (char)c;
                    c = getc(f);
                }
                b[i] = '\0';
                printf("Num          %d.%s\n", a, b);
            } else {
                printf("Num          %d\n", a);
            }
        } else if (c == '<' || c == '>' || c == '=') {
            /* relational operator, possibly followed by '=' */
            int k = c;
            c = getc(f);
            if (c == '=') {
                printf("RE           %c%c\n", k, c);
                c = getc(f);
            } else {
                printf("RE           %c\n", k);
            }
        } else {
            if (!isspace(c))
                printf("Punctuation  %c\n", c);
            c = getc(f);
        }
    }
    fclose(f);
    return 0;
}

/* Returns 1 if buf is a C keyword, 0 otherwise. */
int keyword(const char buf[])
{
    int i;

    for (i = 0; key[i][0] != '\0'; i++)
        if (strcmp(key[i], buf) == 0)
            return 1;
    return 0;
}
INPUT:
(i<j)
do{
s=s+20.04;
}
int a_1,a4;
for(i=0;i<n;i++)
a_first+s;
OUTPUT:
Token        Attribute value:
Punctuation  (
ID           i
RE           <
ID           j
Punctuation  )
do           do
Punctuation  {
ID           s
RE           =
ID           s
Punctuation  +
Num          20.04
Punctuation  ;
Punctuation  }
int          int
ID           a_1
Punctuation  ,
ID           a4
Punctuation  ;
for          for
Punctuation  (
ID           i
RE           =
Num          0
Punctuation  ;
ID           i
RE           <
ID           n
Punctuation  ;
ID           i
Punctuation  +
Punctuation  +
Punctuation  )
ID           a_first
Punctuation  +
ID           s
Punctuation  ;

Result and Discussion:
This program has been written in C/C++ and successfully generates tokens for a given
statement of a C source program.
Problem No.07
Problem Name:
Write a C program for developing a lexical analyzer (LA) that will recognize
variables of a C source program.
Problem analysis:
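[Figure: interaction of lexical analyzer with parser, as in Fig.(a)]
The lexical analyzer may keep track of the number of newline characters seen, so that a line
number can be associated with an error message. The smallest individual units in a source
program are known as tokens.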
CODING:
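#include <stdio.h>
#include <string.h>
#include <ctype.h>

int keyword(const char buf[]);

/* C keywords; the empty string marks the end of the table */
const char *key[] = {"auto", "break", "case", "char", "const", "continue",
    "default", "do", "double", "else", "enum", "extern", "float", "for",
    "goto", "if", "int", "long", "register", "return", "short", "signed",
    "sizeof", "static", "struct", "switch", "typedef", "union", "unsigned",
    "void", "volatile", "while", ""};

int main(void)
{
    int c;              /* int so that EOF is representable */
    char buf[100];
    FILE *f;

    f = fopen("c7input.cpp", "r");
    if (f == NULL) {
        perror("c7input.cpp");
        return 1;
    }
    c = getc(f);
    while (c != EOF) {
        if (isalpha(c)) {
            /* a letter followed by letters, digits or '_' */
            int i = 0;
            while (isalnum(c) || c == '_') {
                if (i < 99)
                    buf[i++] = (char)c;
                c = getc(f);
            }
            buf[i] = '\0';
            if (keyword(buf) == 0)      /* not a keyword, so a variable */
                printf("%s\n", buf);
        } else {
            c = getc(f);                /* skip everything else */
        }
    }
    fclose(f);
    return 0;
}

/* Returns 1 if buf is a C keyword, 0 otherwise. */
int keyword(const char buf[])
{
    int i;

    for (i = 0; key[i][0] != '\0'; i++)
        if (strcmp(key[i], buf) == 0)
            return 1;
    return 0;
}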
INPUT:
int a_1,a4;
for(i=0;i<n;i++)
a_first+s;
OUTPUT:
a_1
a4
i
i
n
i
a_first
s

Result and Discussion:
This program has been written in C/C++ and successfully recognizes the variables
of a C source program.
Problem No.08
Problem Name:
Design a compiler front-end based on syntax-directed translation technique that
will function as an infix to postfix translator for a language consisting of a sequence of
expressions terminated by semicolons.
Problem analysis: In a compiler, linear analysis is called lexical analysis or scanning. The
characters in the assignment
pos := init + rate * 60
would be grouped into the following tokens:
The identifier            pos
The assignment symbol     :=
The identifier            init
The plus sign             +
The identifier            rate
The multiplication sign   *
The number                60
The blanks separating the characters of these tokens would normally be eliminated during
lexical analysis.
Description of the Translator:
The translator is designed using the syntax-directed translation scheme in Fig.6. The token
id represents a nonempty sequence of letters and digits beginning with a letter, num a
sequence of digits, and eof an end-of-file character. Tokens are separated by sequences of
blanks, tabs, and newlines (white space). The attribute lexeme of the token id gives the
character string forming the token; the attribute value of the token num gives the integer
represented by the num.
start   -> list eof
list    -> expr ; list
         | ε
expr    -> expr + term       { print('+') }
         | expr - term       { print('-') }
         | term
term    -> term * factor     { print('*') }
         | term / factor     { print('/') }
         | term div factor   { print('DIV') }
         | term mod factor   { print('MOD') }
         | factor
factor  -> ( expr )
         | id                { print(id.lexeme) }
         | num               { print(num.value) }
Fig.6: Specification for infix-to-postfix translator.
The code for the translator is arranged into seven modules, each stored in a separate file.
Execution begins in the module main.c, which consists of a call to init() for initialization followed
by a call to parse() for the translation. The remaining six modules are shown in Fig.7. There is
also a global header file global.h.
[Figure: infix expressions enter the translator, which is built from the modules init.c, symbol.c, lexer.c, parser.c, error.c, and emitter.c, and postfix expressions come out]
Fig.7: Modules of infix to postfix translator.
The Lexical Analysis Module lexer.c
The lexical analyzer is a routine called lexan() that is called by the parser to find tokens. The
value of the attribute associated with the token is assigned to a global variable tokenval.
The following tokens are expected by the parser:
+ - * / DIV MOD ( ) ID NUM DONE
Here ID represents an identifier, NUM a number, and DONE the end-of-file character. White
space is silently stripped out by the lexical analyzer. The following table shows the token and
attribute value produced for each lexeme.
LEXEME                               TOKEN            ATTRIBUTE VALUE
white space                          (none)
sequence of digits                   NUM              numeric value of sequence
div                                  DIV
mod                                  MOD
other sequences of a letter then
  letters and digits                 ID               index into symbol table
end-of-file character                DONE
any other character                  that character   NONE
Fig.8: Description of tokens.

The lexical analyzer uses the symbol table routine lookup to determine whether an identifier
lexeme has been previously seen and the routine insert to store a new lexeme into the symbol
table. It also increments a global variable lineno every time it sees a newline character.
The parser module parser.c
We first eliminate left recursion from the translation scheme of Fig.6 so that the underlying
grammar can be parsed with a recursive descent parser. The transformed scheme is shown in
Fig.9.
start        -> list eof
list         -> expr ; list
              | ε
expr         -> term moreterms
moreterms    -> + term { print('+') } moreterms
              | - term { print('-') } moreterms
              | ε
term         -> factor morefactors
morefactors  -> * factor { print('*') } morefactors
              | / factor { print('/') } morefactors
              | div factor { print('DIV') } morefactors
              | mod factor { print('MOD') } morefactors
              | ε
factor       -> ( expr )
              | id  { print(id.lexeme) }
              | num { print(num.value) }
Fig.9: Syntax-directed translation scheme after eliminating left recursion.
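A scheme of this shape maps almost directly onto recursive procedures, one per nonterminal. The sketch below is an illustration under several assumptions and is not the seven-module translator described here: it works on a string instead of a file, handles a single expression with an optional closing semicolon, omits div and mod, writes the tail-recursive moreterms and morefactors as loops, and uses hypothetical names (infix_to_postfix, emit, skip_ws). The caller must supply an output buffer of at least 2 * strlen(input) + 1 characters.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *src;  /* next unread input character */
static char *out;        /* next free position in the output buffer */
static int ok;           /* cleared on the first syntax error */

static void skip_ws(void)
{
    while (isspace((unsigned char)*src))
        src++;
}

/* Appends n characters of s plus a separating blank to the output. */
static void emit(const char *s, size_t n)
{
    memcpy(out, s, n);
    out += n;
    *out++ = ' ';
}

static void expr(void);

/* factor -> ( expr ) | id { print(id.lexeme) } | num { print(num.value) } */
static void factor(void)
{
    const char *start;

    skip_ws();
    start = src;
    if (*src == '(') {
        src++;
        expr();
        skip_ws();
        if (*src == ')')
            src++;
        else
            ok = 0;
    } else if (isalpha((unsigned char)*src)) {
        while (isalnum((unsigned char)*src))
            src++;
        emit(start, (size_t)(src - start));
    } else if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src))
            src++;
        emit(start, (size_t)(src - start));
    } else {
        ok = 0;
    }
}

/* term -> factor morefactors, with morefactors written as a loop */
static void term(void)
{
    factor();
    for (;;) {
        skip_ws();
        if (ok && (*src == '*' || *src == '/')) {
            char op = *src++;
            factor();
            emit(&op, 1);    /* operator is printed after both operands */
        } else {
            return;
        }
    }
}

/* expr -> term moreterms, with moreterms written as a loop */
static void expr(void)
{
    term();
    for (;;) {
        skip_ws();
        if (ok && (*src == '+' || *src == '-')) {
            char op = *src++;
            term();
            emit(&op, 1);
        } else {
            return;
        }
    }
}

/* Translates one infix expression; returns 1 on success, 0 on a syntax error. */
int infix_to_postfix(const char *input, char *output)
{
    assert(input != NULL && output != NULL);
    src = input;
    out = output;
    ok = 1;
    expr();
    skip_ws();
    if (*src == ';') {       /* optional terminating semicolon */
        src++;
        skip_ws();
    }
    if (out > output)
        out--;               /* drop the trailing blank */
    *out = '\0';
    return ok && *src == '\0';
}
```

For the input a+(b*c) the function produces a b c * +, with operands separated by blanks so that multi-character identifiers and numbers stay readable.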
The Emitter Module emitter.c:
The emitter module consists of a single function emit(t, tval) that generates the output for token t
with attribute value tval.
The Symbol Table Modules symbol.c and init.c:

The symbol table module symbol.c implements the data structure shown in Fig.10.
[Figure: the array symtable, with fields lexptr, token, and attributes, holds entries for div, mod, and two ids; each lexptr points into the array lexemes, where the strings are stored one after another, each terminated by EOS]
Fig.10: Symbol table and array for storing strings.
The entries in the array symtable are pairs consisting of a pointer to the lexemes array and
an integer denoting the token stored there.
The operation insert(s,t) returns the symtable index for the lexeme s forming the
token t. The function lookup(s) returns the index of the entry in symtable for the lexeme s, or 0 if s
is not there.
The module init.c is used to preload symtable with keywords. The lexeme and token
representations for all the keywords are stored in the array keywords, which has the same type as
the symtable array. The function init() goes sequentially through the keywords array, using the
function insert to put the keywords in the symbol table. This arrangement allows us to change the
representation of the tokens for keywords in a convenient way.
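The symbol table operations described above could look roughly like the following sketch. The array sizes STRMAX and SYMMAX, the token codes DIV, MOD and ID, and the choice to return 0 when the table is full are assumptions made for the illustration; a complete translator would report an error through error.c instead.

```c
#include <assert.h>
#include <string.h>

#define STRMAX 999   /* assumed size of the lexemes array */
#define SYMMAX 100   /* assumed size of symtable */
#define DIV    257   /* assumed token codes, larger than any character */
#define MOD    258
#define ID     259

struct entry {
    const char *lexptr;  /* points at the lexeme, stored in lexemes */
    int token;
};

static char lexemes[STRMAX];
static int lastchar = -1;              /* last used position in lexemes */
static struct entry symtable[SYMMAX];
static int lastentry = 0;              /* last used slot; slot 0 stays unused */

/* Returns the symtable index of lexeme s, or 0 if s is not there. */
int lookup(const char *s)
{
    int p;

    assert(s != NULL);
    for (p = lastentry; p > 0; p--)
        if (strcmp(symtable[p].lexptr, s) == 0)
            return p;
    return 0;
}

/* Stores lexeme s with token tok and returns its symtable index,
   or 0 if either array is full. */
int insert(const char *s, int tok)
{
    int len;

    assert(s != NULL);
    len = (int)strlen(s);
    if (lastentry + 1 >= SYMMAX || lastchar + len + 1 >= STRMAX)
        return 0;
    lastentry++;
    symtable[lastentry].token = tok;
    /* copy the string, including its terminator, behind the previous one */
    symtable[lastentry].lexptr = strcpy(&lexemes[lastchar + 1], s);
    lastchar += len + 1;
    return lastentry;
}

/* Keywords to preload; a token of 0 marks the end of the list. */
static const struct entry keywords[] = {
    { "div", DIV },
    { "mod", MOD },
    { NULL, 0 }
};

/* Loads the keywords into the symbol table. */
void init(void)
{
    const struct entry *p;

    for (p = keywords; p->token != 0; p++)
        insert(p->lexptr, p->token);
}
```

Because the keywords are inserted first, a later lookup of div or mod finds a keyword entry, while an identifier that has not been seen yet gets the result 0 and can then be added with insert.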
Postfix Notation:
The postfix notation for an expression E can be defined inductively as follows:
1. If E is a variable or constant, the postfix
notation for E is E itself.
2. If E is an expression of the form E1 op E2,
where op is any binary operator, then the
postfix notation for E is E1' E2' op, where
E1' and E2' are the postfix notations for E1 and E2,
respectively.
3. If E is an expression of the form (E1), then the
postfix notation for E1 is also the postfix
notation for E.
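The inductive definition translates into a short recursive function over an expression tree. The sketch below is an illustration only: the struct expr layout and the function name postfix are hypothetical, and operands are single characters. Rule 3 needs no code, because parentheses only shape the tree and do not appear in it.

```c
#include <assert.h>
#include <stddef.h>

struct expr {
    char op;                     /* binary operator, or 0 for a leaf */
    char leaf;                   /* operand character when op == 0 */
    const struct expr *l, *r;    /* E1 and E2 when op != 0 */
};

/* Writes the postfix notation of e at out, terminates the string,
   and returns a pointer to the terminator. */
char *postfix(const struct expr *e, char *out)
{
    assert(e != NULL && out != NULL);
    if (e->op == 0) {
        *out++ = e->leaf;            /* rule 1: a variable or constant */
    } else {
        out = postfix(e->l, out);    /* rule 2: E1' ... */
        out = postfix(e->r, out);    /* ... then E2' ... */
        *out++ = e->op;              /* ... then op */
    }
    *out = '\0';
    return out;
}
```

For the tree of (9-5)+2 the function writes 95-2+, while the tree of 9-(5+2) gives 952+-, so the grouping survives without any parentheses.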
CODING:
#include <stdio.h>
#include <ctype.h>
#include <string.h>

/* Precedence of a binary operator; 0 for anything else, including '('. */
static int prec(int op)
{
    if (op == '*' || op == '/')
        return 2;
    if (op == '+' || op == '-')
        return 1;
    return 0;
}

int main(void)
{
    char infix[100], stack[100];
    int len, top = 0, i;        /* stack[1..top] holds pending operators */

    printf("Enter infix = ");
    if (scanf("%99s", infix) != 1)
        return 1;
    printf("Postfix is = ");
    len = (int)strlen(infix);
    for (i = 0; i < len; i++) {
        int ch = (unsigned char)infix[i];
        if (isalpha(ch)) {
            printf("%c", ch);                   /* operands go straight out */
        } else if (ch == '(') {
            stack[++top] = (char)ch;
        } else if (prec(ch) > 0) {
            /* pop operators of equal or higher precedence, then push */
            while (top > 0 && prec(stack[top]) >= prec(ch))
                printf("%c", stack[top--]);
            stack[++top] = (char)ch;
        } else if (ch == ')') {
            while (top > 0 && stack[top] != '(')
                printf("%c", stack[top--]);
            if (top > 0)
                top--;                          /* discard the '(' */
        }
    }
    while (top > 0)                             /* flush remaining operators */
        printf("%c", stack[top--]);
    printf("\n");
    return 0;
}
INPUT:
a+(b*c)
OUTPUT:
abc*+
Result and Discussion:
This program has been written in C/C++ and successfully functions as an infix
to postfix translator.
