COMPILER DESIGN
1. Answer the following questions:
2*10
a) What is a syntax-directed translation scheme? What are the different forms of
intermediate code used in the compilation process?
Ans- A syntax-directed translation scheme:
Describes the order and timing of attribute computation
Embeds semantic rules in the grammar
Allows each semantic rule to use only information computed by already executed
semantic rules
Is a convenient way of describing an L-attributed definition
The different forms of intermediate code used in the compilation process are:
Syntax trees
Postfix notation
Three-address code (quadruples, triples and indirect triples)
I. MAKE LIST (j): creates a new list containing only j, an index into the array of
quadruples being generated. MAKE LIST returns a pointer to the list it has made.
II. MERGE LIST (q1, q2): takes the lists pointed to by q1 and q2, concatenates them into one
list and returns a pointer to the concatenated list.
III. BACKPATCH (q, j): inserts j as the target label of each of the statements on the list
pointed to by q.
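The three list operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: quadruples are kept in a global list called quads whose fourth field is the jump target, and the helper emit and the lower-case function names are hypothetical, not taken from the source.

```python
# Assumption: each quadruple is a mutable list [op, arg1, arg2, target].
quads = []

def emit(op, arg1=None, arg2=None, target=None):
    """Append a quadruple and return its index (hypothetical helper)."""
    quads.append([op, arg1, arg2, target])
    return len(quads) - 1

def makelist(j):
    """Create a new list containing only the quadruple index j."""
    return [j]

def merge(q1, q2):
    """Concatenate two lists of quadruple indices."""
    return q1 + q2

def backpatch(q, j):
    """Insert j as the jump target of every quadruple on list q."""
    for index in q:
        quads[index][3] = j

# "if a < b goto ?" and "goto ?" are emitted before their targets are known.
truelist = makelist(emit("if<", "a", "b"))
falselist = makelist(emit("goto"))
# The targets are filled in once the branch code has been emitted.
backpatch(truelist, emit("=", "1", None, "x"))
backpatch(falselist, emit("=", "0", None, "x"))
```

The point of the design is that a jump can be generated in a single pass even though its destination is not yet known; the list remembers which quadruples still need a target.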
j) Differentiate between a phase and a pass in compiler construction.
Ans: Conceptually a compiler operates in phases, each of which transforms the source program from
one representation to another. There are six phases in compiler construction.
A pass, in contrast, is a group of phases. A compiler is broken into several passes, and the passes
communicate with each other via a temporary file. Creating executable
code from source code can involve several stages. For example, when a source program is input to
the compiler, the compiler reads the source program and stores the values, variables, functions etc. in a temporary
file. This is done in one pass. The other passes of the compiler read the data produced by the previous passes.
The number of passes is decided by the compiler designer.
3.
a) Discuss the construction of an LR parser. What are the various data structures used in LR
parser design? Discuss the construction of the ACTION [] and GOTO [] tables.
Ans) Construction of LR parser:
An LR parser is a general non-backtracking shift-reduce parser.
An LR parser is a parser for context-free grammars that reads input from left to right and produces a
rightmost derivation in reverse. The term LR (k) parser is also used; here k refers to the number of
unconsumed lookahead input symbols that are used in making parsing decisions. Usually k is 1 and
is often omitted. A context-free grammar is called LR (k) if there exists an LR (k) parser for it.
An LR parser is said to perform bottom-up parsing because it attempts to deduce the top-level
grammar productions by building up from the leaves.
The LR parser consists of an input tape, a stack, a parsing program and a parsing table.
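How the stack, the input and the parsing table cooperate can be sketched with a small driver loop. The grammar S → ( S ) | a and its hand-built SLR table below are assumptions chosen only to keep the example short; they do not come from the source.

```python
# Assumed toy grammar:  (1) S -> ( S )    (2) S -> a
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}   # number -> (head, body length)

# ACTION is indexed by (state, terminal); GOTO by (state, non-terminal).
ACTION = {
    (0, "("): ("shift", 2), (0, "a"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "a"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens):
    """Return True if the token sequence is accepted by the table above."""
    tokens = list(tokens) + ["$"]
    stack = [0]                       # stack of states, start state at the bottom
    pos = 0
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:             # empty table cell: syntax error
            return False
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]          # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])    # then follow GOTO on the head
        else:
            return True
```

For example, lr_parse("((a))") shifts three symbols, reduces by production 2 once and by production 1 twice, and then accepts.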
Construction of the ACTION [] and GOTO [] tables:
GOTO []:
The GOTO part of the table is indexed by non-terminals only.
GOTO part will be characterized as:
b) What is the role of an error detector in the compilation process? Discuss the different errors in
the lexical phase.
Ans) When the error detector encounters an error in any phase of the compiler, it does
not halt the parsing process; it lets the parsing process continue. The role of the error detector
in the compilation process is to:
Detect the errors
Handle and react
Notify the calling module
Notify the user
Be easy to program and maintain
The different errors in the lexical phase are:
Character streams that do not match any token pattern.
Ill-formed numeric literals and identifiers.
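Both kinds of lexical error can be illustrated with a small scanner that records the error and keeps going instead of halting. The token classes and patterns below are assumptions made up for this sketch.

```python
import re

# Assumed token patterns; BAD_NUMBER must come before NUMBER so that "12abc"
# is reported as one ill-formed literal rather than split into "12" and "abc".
TOKEN_SPEC = [
    ("BAD_NUMBER", r"\d+[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),               # any character that matches no pattern
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return (tokens, errors); scanning continues after each error."""
    tokens, errors = [], []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind in ("BAD_NUMBER", "MISMATCH"):
            errors.append((match.start(), lexeme))   # position and offending text
            continue
        tokens.append((kind, lexeme))
    return tokens, errors
```

Calling tokenize("x = 12abc + 7 @ y") reports the ill-formed literal 12abc and the stray character @ as errors, while the remaining tokens are still delivered to the parser.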
4.a) What is the necessity of optimization in compilation? Discuss the factors influencing
optimization.
Ans) The aim of code optimization is to rearrange the instructions of a program so as to gain
execution speed without changing the basic meaning (semantics) of the source program. There
are two types of code optimization.
Machine-independent optimizations can be performed independently of the target machine for
which the compiler is generating code; that is, the optimizations are not tied to the target machine's
specific language or platform. Examples of machine-independent optimizations are: elimination of
loop-invariant computations, induction variable elimination and elimination of common
subexpressions.
Machine-dependent optimization requires knowledge of the target machine. An
attempt to generate object code that will utilize the target machine's registers more efficiently is an
example of machine-dependent code optimization.
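One of the machine-independent optimizations named above, elimination of common subexpressions, can be sketched for a single basic block. The instruction format (dest, op, left, right) and the "copy" operator are assumptions of this sketch, not a format defined in the source.

```python
def eliminate_common_subexpressions(block):
    """Local CSE over one basic block of (dest, op, left, right) instructions."""
    available = {}    # (op, left, right) -> name that already holds the value
    result = []
    for dest, op, left, right in block:
        key = (op, left, right)
        if key in available:
            # The value was computed earlier: reuse it instead of recomputing.
            result.append((dest, "copy", available[key], None))
        else:
            result.append((dest, op, left, right))
        # Assigning to dest invalidates expressions that read dest or live in dest.
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in (k[1], k[2])}
        if dest not in (left, right):
            available.setdefault(key, dest)
    return result

block = [
    ("t1", "*", "a", "b"),
    ("t2", "+", "t1", "c"),
    ("t3", "*", "a", "b"),     # same as t1, becomes a copy
    ("a", "+", "a", "1"),      # redefines a, so a * b is no longer available
    ("t4", "*", "a", "b"),     # must be recomputed
]
optimized = eliminate_common_subexpressions(block)
```

Nothing in the transformation refers to registers or instruction sets, which is what makes it machine independent.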
The factors influencing code optimization are:
The machine
Architecture of the CPU i.e RISC or CISC
Sometimes, the time taken to undertake optimization in itself may be an issue.
Optimizing existing code usually does not add new features, and worse, it might add new bugs to
previously working code (as any change might). Because manually optimized code is sometimes
less readable than unoptimized code, optimization might also affect its maintainability.
Optimization comes at a price and it is important to be sure that the investment is worthwhile.
An automatic optimizer (or optimizing compiler, a program that performs code optimization) may
itself have to be optimized, either to further improve the efficiency of its target programs or else speed
up its own operation. A compilation performed with optimization 'turned on' usually takes longer,
although this is usually only a problem when programs are quite large.
In particular, for just-in-time compilers the performance of the run time compile component, executing
together with its target code, is the key to improving overall execution speed.
b) Explain the symbol table construction for a block-structured programming language.
Ans) Scoping is one of the applications of the symbol table. There are two types of scope:
Local
Global
The symbol table decides whether a particular symbol is local or global.
For example:
int X;
void add(void);
int main(void)
{
    int Y;
    add();
    return 0;
}
void add(void)
{
    int Z;
}
{ } indicates a block, i.e. the life span of a variable is limited to the block in which it is declared.
A separate symbol table is created for each block.
The global value is stored in the top block so that it can be accessed by all the blocks.
The leaf node of every block stores the address of the next block.
#include <iostream>
using namespace std;
int main()
{
    int x;
    cout << "enter x";
    cin >> x;
    {
        int y;
        cout << "enter y";
        cin >> y;
        {
            int z;
            cout << "enter z";
            cin >> z;
        }
    }
}
Block          Names visible from enclosing blocks    Names declared
MAIN BLOCK     (none)                                 x
BLOCK 1        x                                      y
BLOCK 2        x, y                                   z
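The per-block tables for the nested program above can be sketched as a chain of scopes, each holding a link to its enclosing block. The class name Scope and its methods are hypothetical; the block names follow the ones used in the text.

```python
class Scope:
    """Symbol table for one block, linked to the table of the enclosing block."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = {}            # identifier -> type name

    def declare(self, ident, type_name):
        self.symbols[ident] = type_name

    def lookup(self, ident):
        """Search this block first, then each enclosing block in turn."""
        scope = self
        while scope is not None:
            if ident in scope.symbols:
                return scope.name, scope.symbols[ident]
            scope = scope.parent
        return None                  # undeclared in every visible block

main_block = Scope("MAIN BLOCK")
main_block.declare("x", "int")
block1 = Scope("BLOCK 1", main_block)
block1.declare("y", "int")
block2 = Scope("BLOCK 2", block1)
block2.declare("z", "int")
```

A lookup of x from BLOCK 2 walks outward and finds the declaration in MAIN BLOCK, while a lookup of z from MAIN BLOCK fails because inner blocks are not visible from outside.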
I0:
E' → E .
L → E .
PROCESS (
I2 = GOTO (I0, ( )
I2:
E → ( . L )
L → . L , E
L → . E
PROCESS a
I3 = GOTO (I0, a)
I3:
E → a .
PROCESS L
I4 = GOTO (I0, L)
I4:
L → L . , E
E → ( L . )
PROCESS E
GOTO (I2, E)
Already processed in I1
In item set (3) no symbols are to be processed
In item set (4) the symbol to be processed is ,
PROCESS ,
I6 = GOTO (I4, ,)
I6:
L → L , . E
E → ( L ) .
PROCESS ,
Already processed in I6
In item set (6) the symbol to be processed is E
PROCESS E
I8 = GOTO (I6, E)
I8:
L → L , E .
[Figure: goto graph linking the item sets I1 to I8; edge layout not recoverable from the source.]
b)
[Tables: GOTO and ACTION parsing tables for states I0 to I8, with ACTION columns for ( , ) and a and GOTO columns for E and L; cell layout not recoverable from the source.]
c)
d) Yes, the grammar is an LR (0) grammar.
6
a) What is an activation record? Explain clearly the components of an activation record.
Ans) The information needed by a single execution (a single activation) of a procedure is
managed using a contiguous block of storage called an activation record or activation
frame, consisting of a collection of fields.
The components of an activation record are:
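The fields conventionally listed in textbooks for an activation record can be sketched as a data structure. The field names below follow the usual textbook presentation and are an assumption of this sketch, since the list itself did not survive in the source.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ActivationRecord:
    procedure: str
    actual_parameters: dict = field(default_factory=dict)   # values passed by the caller
    return_value: Any = None                                # result handed back to the caller
    control_link: Optional["ActivationRecord"] = None       # record of the caller
    access_link: Optional["ActivationRecord"] = None        # record of the enclosing procedure
    saved_machine_status: dict = field(default_factory=dict)  # e.g. return address, registers
    local_data: dict = field(default_factory=dict)          # local variables
    temporaries: dict = field(default_factory=dict)         # compiler-generated temporaries

def call_depth(record):
    """Count how many callers sit below this record on the control stack."""
    depth = 0
    while record.control_link is not None:
        record = record.control_link
        depth += 1
    return depth

main_record = ActivationRecord("main")
add_record = ActivationRecord("add", {"n": 3}, control_link=main_record)
```

Following the control links from add_record leads back to main_record, which is how the run-time stack is unwound when a procedure returns.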
7.
a) Consider the following context-free grammar, where S is the start symbol and the terminals
are a , ( )
S → ( )
S → a
S → ( A )
A → S
A → A , S
Show precisely why this grammar is not LL (1). Rewrite this grammar to make it
suitable for recursive descent parsing.
5
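One possible answer, offered as a sketch rather than the source's own solution: the grammar is not LL (1) because S → ( ) and S → ( A ) both begin with "(", so one lookahead symbol cannot choose between them, and because A → A , S is left recursive. Left factoring and removing the left recursion gives S → a | ( S', S' → ) | A ), A → S A', A' → , S A' | ε. The non-terminal names S' and A' and the function names below are assumptions of this sketch.

```python
def parse(tokens):
    """Recursive descent recognizer for the rewritten grammar."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_s():                 # S -> a | ( S'
        if peek() == "a":
            expect("a")
        else:
            expect("(")
            parse_s_tail()

    def parse_s_tail():            # S' -> ) | A )
        if peek() != ")":
            parse_a()
        expect(")")

    def parse_a():                 # A -> S A' ; the loop plays the role of A'
        parse_s()
        while peek() == ",":
            expect(",")
            parse_s()

    try:
        parse_s()
    except SyntaxError:
        return False
    return pos == len(tokens)      # all input must be consumed
```

Each function decides what to do from the single next token, which is exactly the property the original grammar lacked.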
b) Discuss the importance of the symbol table in compiler design. How is the symbol table
manipulated at the various phases of compilation?
Ans) A symbol table is a data structure used by a compiler to keep track of the scope, lifetime and
binding information about names. These names are used to identify the various program elements
like variables, constants, procedures and the labels of statements. The symbol table is searched
every time a name is encountered in the source text. When a new name or new information about
an existing name is discovered, the content of the symbol table changes.
Exactly what information is stored in the symbol table depends on many things. The programming
language will determine much of the information that is stored, but the target architecture will also
influence what data is stored. In fact some assumptions about how to produce code can affect what
values are stored in the table. Different information will need to be stored for constants, variables,
procedures, enumerations, type definitions and so on. What follows is a description of various
common declarative language constructs and the typical classes of information a symbol table would
record for those constructs.
CONSTANTS:
Constants are identifiers that represent a fixed value- one that can never be changed. Since
programmers will wish to access these values by name, the name must be stored. Finally, since the
values must be used properly in the type system, type information is also included. No run time
location needs to be stored for constants. These are typically stored right into the code stream by
the compiler at compilation time.
VARIABLES:
Variables are identifiers whose value may change between executions and during a single
execution of a program. They represent the contents of some memory location. The symbol table
needs to record the variable's name as well as its allocated storage space at run time. Typically this
location is stored as an offset relative to some position.
TYPES (user defined):
A user defined type is typically a conglomeration of 1 or more existing types. Types are accessed
by name and reference a type definition structure. Each structure will record important information
about itself, like its size, the name of its members or its upper or lower bounds. What information
is stored will depend on what type is being defined.
SUBPROGRAMS:
Procedures, functions and methods are named segments of code. Naturally, the symbol table
should record a procedure's name. The type it returns, if any, should be noted. When subprograms
are accessed at run time it is typically by their location in the code stream, so the location of the
code generated for a given procedure should also be recorded. The formal parameters and local
variables of a function are separate identifiers in their own right, and should be stored in separate
records. Thus, they are treated much like the fields of a user-defined record. They are stored as a
list of variable records separate from, but accessible from, the main procedure record.
CLASSES:
Classes are abstract data types which restrict access to their members and provide convenient
language-level polymorphism. They are really a special case of user-defined types, and are
structurally no different. But it may be convenient to store information about classes above and
beyond that required for other user-defined types. This includes the location of the default
constructor and destructor, and the address of the virtual function table.
INHERITANCE:
There may be different ways to perform inheritance, and a symbol table record is needed to keep
track of which classes are being inherited and exactly how inheritance is performed. A compiler
might consider whether shared or non-shared inheritance is performed. It might also
consider whether the keywords public, private and protected modify the visibility of inherited items;
this may be recorded with the inheritance information. A reference to the participating classes
could also be recorded in an inheritance structure.
ARRAYS:
Arrays represent a collection of uniformly typed elements that may be randomly accessed by
index. For each dimension of an array, the compiler will need to know about such things as the
lower boundary of the array i.e the lowest valid index, the upper boundary i.e the largest valid
index, the index size, index type, the total size and the type of the elements contained. When many
different types can be used to index an array, the index type and the size will also be recorded.
Finally, the total amount of space to be allocated for each dimension of an array should be stored.
RECORDS:
Records represent a collection of possibly heterogeneous members which can be accessed by
name. The symbol table probably needs to record each of the record's members. The compiler
also needs to know the size of the record, i.e. how much space to allocate for all the members. Each of
the fields of the record will probably be a reference to another symbol table record, like a variable
or a type, which may in turn reference another record or array.
CLASS:
Just like a record, the fields of a class can be conveniently stored in a separate record. Classes will
also store their methods, constructors, destructors and virtual function table in this complex
information structure.
MODULE:
Stores the module size, its name, parent, its members and a time stamp. The time stamp is used to
guarantee at load time that the modules have been compiled in the correct order or are all up to
date.
8.
a) Find the FIRST and FOLLOW sets for each of the non-terminals in the following
grammar (in the grammar below ε denotes epsilon, the empty string).
A → aBa
B → bCb / bcD
C → cCc / ε
D → Deb / ε
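The FIRST and FOLLOW sets can be computed by iterating the usual rules until nothing changes. The sketch below assumes the grammar reads A → aBa, B → bCb / bcD, C → cCc / ε, D → Deb / ε, with A as the start symbol; the function and variable names are hypothetical.

```python
EPS = "ε"
START = "A"
# Each non-terminal maps to its alternatives; [] stands for an ε-production.
GRAMMAR = {
    "A": [["a", "B", "a"]],
    "B": [["b", "C", "b"], ["b", "c", "D"]],
    "C": [["c", "C", "c"], []],
    "D": [["D", "e", "b"], []],
}

def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of the non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                 # every symbol can vanish (or the string is empty)
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_sequence(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:          # what follows sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

Under the stated reading of the grammar this gives FIRST(A) = {a}, FIRST(B) = {b}, FIRST(C) = {c, ε}, FIRST(D) = {e, ε} and FOLLOW(A) = {$}, FOLLOW(B) = {a}, FOLLOW(C) = {b, c}, FOLLOW(D) = {a, e}.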
b) Differentiate between a syntax-directed definition and a syntax-directed translation scheme.
Ans)
A syntax-directed definition generalizes a context-free grammar by associating a set of attributes with
each node in a parse tree. Each attribute gives some information about the node, i.e. a
syntax-directed definition is a generalization of a CFG in which each grammar symbol has an associated
set of attributes, partitioned into two subsets called the synthesized and inherited attributes of that
grammar symbol.
The value of an attribute at a parse tree node is defined by the semantic rule associated
with the production used at that node. Semantic rules set up dependencies between attributes that
will be represented by a graph.
Syntax-directed translation schemes indicate the order in which semantic rules
are to be evaluated, so they allow some implementation details to be shown. Syntax directed
translation is a method of translating a string into a sequence of actions by attaching one such
action to each rule of a grammar. Thus, parsing a string of the grammar produces a sequence of
rule applications and syntax directed translation provides a simple way to attach semantics to any
such syntax.
Syntax directed translation refers to a method of compiler implementation where the
source language translation is completely driven by the parser. The parsing process and parse trees
are used to direct semantic analysis and translation of the source program. This can be a separate
phase of the compiler or we can augment our conventional grammar with information to control
the semantic analysis and translations. Such grammars are called attributed grammars.
COMPILER DESIGN
1. Answer the following:
2* 10
In this code optimization we require the knowledge of the target machine architecture i.e. the
register, addressing mode, clock speed etc.
Machine Independent code optimization:
This optimization can be performed independently of the target machine. These are the program
transformations that improve the target code without taking into consideration any properties of
the target machine.
d. Explain the difference between bottom-up and top-down parsing.
Ans: Bottom-up parsing is the process of reducing an input string W to the start symbol of the
grammar by tracing out the rightmost derivation (RMD) of W in reverse order.
Bottom-up parsing involves the selection of a substring that matches the right side of a
production; reducing it to the non-terminal on the left side of the production represents one
step along the reverse of a rightmost derivation.
Top-down parsing, in contrast, attempts to find a leftmost derivation for the input string
W. The parser scans W from left to right, one symbol/token at a time, and a
leftmost derivation generates the leaves of the parse tree in left-to-right order, which matches
the input scan order.
e. What are the drawbacks of the SLR (1) parser?
Ans: If a cell of the SLR parsing table contains multiple entries, the parser can be left
in a non-deterministic situation, which is not allowed.
SLR is therefore a less powerful LR parser: SLR (1) grammars constitute only a
small subset of the context-free grammars.
f. What do you mean by porting of a compiler?
Ans: Porting is the process of moving code from one platform to another while making sure that
it also works on the target platform.
High-level languages are designed to be portable, that is, programs written in a high-level
language can be run on any computer that has a compiler or interpreter for that particular
language.
Porting of a compiler requires that the compiler be modular and support separate compilation.
g. Describe the structure of LL parser?
Ans:
[Figure: model of an LL parser, showing the stack, the input buffer (X + Y $), the parser program and the parse table.]
The main constituents of an LL parser are a stack, which holds grammar symbols, and an
input buffer, which contains the input string.
LEXICAL ANALYZER: This module has the task of separating the continuous string of
characters into distinct groups that make sense. Such a group is called a token. A token may
be composed of a single character or a sequence of characters. This sequence of characters is
called a lexeme.
SYNTAX ANALYZER: This is the module in which the overall structure is identified; it
involves an understanding of the order in which the symbols in a program may appear. In the
process of analyzing each sentence, the parser builds an abstract tree structure, i.e. the parser
generates a parse tree.
SEMANTIC ANALYZER: The semantic analyzer gathers the type information and checks
the tree produced by the syntax analyzer for semantic errors. This phase also generates a
tree called the annotated tree.
INTERMEDIATE CODE GENERATION: After passing through the above three phases, the
source program passes through intermediate code generation, where it is converted into
a compact form using one of the following three methods:
Three-address code
Quadruples
Postfix notation
CODE OPTIMIZATION: This is an optional phase that improves the intermediate code so that the
resulting program runs faster or uses memory more effectively. If the code is already optimized, no further optimization is required.
TARGET CODE GENERATION: The final phase of the compiler is the generation of the
target code, consisting normally of relocatable machine code or assembly code. Memory
locations are selected for each of the variables used in the program.
2.
a) For the following grammar, find the FIRST and FOLLOW sets of each of the non-terminals:
S → aAB / bA / ε
A → aAb / ε
B → bB / c
Ans)
FIRST (S) = {a, b, ε}
FIRST (A) = {a, ε}
FIRST (B) = {b, c}
        a            c            e
S       S → aAb
A                    A → cd       A → ef
As there are no multiple entries in the parsing table, this grammar is an LL (1) grammar.
by language to compile itself is the essence of bootstrapping. For bootstrapping purposes, a
compiler is characterized by three languages: the source language S that it compiles, the target
language T that it generates code for, and the implementation language I that it is written in. We represent the three
languages using a T-diagram, so called because of its shape. The three languages S, I, and T may all be
quite different. For example, a compiler may run on one machine and produce target code for
another machine. Such a compiler is often called a cross-compiler.
Suppose we write a cross-compiler for a new language L in implementation language S to
generate code for machine N; that is we create LSN. If an existing compiler for S runs on machine
M and generates code for M, it is characterized by SMM. If LSN is run through SMM, we get a
compiler LMN that is a compiler from L to N that runs on M.
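The composition of T-diagrams described above can be sketched as a small function: running a compiler through another compiler translates its implementation language and leaves its source and target unchanged. The class and function names are hypothetical.

```python
from typing import NamedTuple

class Compiler(NamedTuple):
    source: str           # language it compiles
    implementation: str   # language it is written in (or machine it runs on)
    target: str           # language it generates

def run_through(program, compiler):
    """Compile the compiler `program` using `compiler`."""
    if program.implementation != compiler.source:
        raise ValueError("the second compiler cannot read the first one's implementation language")
    return Compiler(program.source, compiler.target, program.target)

LSN = Compiler("L", "S", "N")     # cross-compiler for L, written in S, targeting N
SMM = Compiler("S", "M", "M")     # existing compiler for S, running on and targeting M
LMN = run_through(LSN, SMM)
```

The result LMN is a compiler from L to N that runs on M, matching the derivation in the text.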
3.
a) Use T-diagrams to describe the steps you would take to create a powerful compiler using a
quick-and-dirty compiler.
b) Define and discuss the objectives of SDTS. What do you mean by underlying source
grammar? Explain with an example.
Ans) Syntax-directed translation schemes (SDTS) describe the order and timing of attribute
computation. A syntax-directed translation scheme:
Embeds the semantic rules into the grammar
Allows each semantic rule to use only information computed by already executed semantic
rules
A translation scheme is a convenient way of describing an L-attributed definition.
It annotates each production of the CFG according to the following rules:
1. If there is a production of the form X → A B and X.i, A.i, B.i are the inherited
attributes of X, A, B respectively, then:
A.i = f(X.i)
B.i = g(X.i, A.i)
where A.i is the inherited attribute of A.
2. If X.s, A.s, B.s are the synthesized attributes, then: X.s = f(A.s, B.s)
3. If there is a production X → ε, then
X.s = X.i
They are independent of their successors.
4. The definitions must be written on the right side of the production by using
parentheses.
The underlying source grammar is the context-free grammar to which all the attributes and
semantic rules are attached.
c) Construct the DAG for the following statement
Z=XY+X*Y*UV/W+X+V
Ans)
t1=X
t2=t1Y
t 3 = t 2 *Y
t4=t3*U
t5=V
t6=t5/W
t7=t4t6
t 8 = t1 + V
t9=t7+t8
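[Figure: DAG for the statement, with root t9 (+) above the nodes t8, t7, t6, t5, t4, t3, t2 and t1; edge layout not recoverable from the source.]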
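The idea behind a DAG, that an identical subexpression is represented by a single shared node, can be sketched with a table of already-built nodes. Because the operators of the statement above did not survive extraction, the sketch uses an assumed expression, a + a * (b - c) + (b - c) * d; the class and method names are hypothetical.

```python
class DagBuilder:
    """Build expression nodes, reusing a node whenever the same one is requested again."""

    def __init__(self):
        self.nodes = []     # node id -> (op, left, right)
        self.index = {}     # (op, left, right) -> node id

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:         # only create a node the first time
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self.node("leaf", name)

dag = DagBuilder()
a, b, c, d = (dag.leaf(name) for name in "abcd")
first_diff = dag.node("-", b, c)
second_diff = dag.node("-", b, c)         # same node as first_diff
root = dag.node("+",
                dag.node("+", a, dag.node("*", a, first_diff)),
                dag.node("*", second_diff, d))
```

The expression tree would need 13 nodes, while the DAG needs 9 because the leaves a, b, c and the subexpression b - c are shared.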
4.
a) Describe the contents of a symbol table. How is the symbol table involved in the
interactions between the different components of the compiler and in the error detection?
Give a simple example in each case.
Ans) Exactly what information is stored in the symbol table depends on many things. The
programming language will determine much of the information that is stored, but the target
architecture will also influence what data is stored. In fact some assumptions about how to
produce code can affect what values are stored in the table. Different information will need
to be stored for constants, variables, procedures, enumerations, type definitions and so on.
What follows is a description of various common declarative language constructs and the typical
classes of information a symbol table would record for those constructs.
CONSTANTS:
Constants are identifiers that represent a fixed value- one that can never be changed. Since
programmers will wish to access these values by name, the name must be stored. Finally,
since the values must be used properly in the type system, type information is also included.
No run time location needs to be stored for constants. These are typically stored right into
the code stream by the compiler at compilation time.
VARIABLES:
Variables are identifiers whose value may change between executions and during a single
execution of a program. They represent the contents of some memory location. The symbol
table needs to record the variable's name as well as its allocated storage space at run time.
Typically this location is stored as an offset relative to some position.
TYPES (user defined):
A user defined type is typically a conglomeration of 1 or more existing types. Types are
accessed by name and reference a type definition structure. Each structure will record
important information about itself, like its size, the name of its members or its upper or
lower bounds. What information is stored will depend on what type is being defined.
SUBPROGRAMS:
Procedures, functions and methods are named segments of code. Naturally, the symbol table
should record a procedure's name. The type it returns, if any, should be noted. When
subprograms are accessed at run time it is typically by their location in the code stream, so
the location of the code generated for a given procedure should also be recorded. The formal
parameters and local variables of a function are separate identifiers in their own right, and
should be stored in separate records. Thus, they are treated much like the fields of a
user-defined record. They are stored as a list of variable records separate from, but accessible from,
the main procedure record.
CLASSES:
Classes are abstract data types which restrict access to their members and provide convenient
language-level polymorphism. They are really a special case of user-defined types, and are
structurally no different. But it may be convenient to store information about classes above
and beyond that required for other user-defined types. This includes the location of the
default constructor and destructor, and the address of the virtual function table.
INHERITANCE:
There may be different ways to perform inheritance, and a symbol table record is needed to
keep track of which classes are being inherited and exactly how inheritance is performed. A
compiler might consider whether shared or non-shared inheritance is performed. It
might also consider whether the keywords public, private and protected modify the
visibility of inherited items; this may be recorded with the inheritance information. A
reference to the participating classes could also be recorded in an inheritance structure.
ARRAYS:
Arrays represent a collection of uniformly typed elements that may be randomly accessed by
index. For each dimension of an array, the compiler will need to know about such things as
the lower boundary of the array i.e the lowest valid index, the upper boundary i.e the largest
valid index, the index size, index type, the total size and the type of the elements contained.
When many different types can be used to index an array, the index type and the size will
also be recorded. Finally, the total amount of space to be allocated for each dimension of an
array should be stored.
RECORDS:
Records represent a collection of possibly heterogeneous members which can be accessed by
name. The symbol table probably needs to record each of the record's members. The
compiler also needs to know the size of the record, i.e. how much space to allocate for all the
members. Each of the fields of the record will probably be a reference to another symbol
table record, like a variable or a type, which may in turn reference another record or array.
CLASS:
Just like a record, the fields of a class can be conveniently stored in a separate record.
Classes will also store their methods, constructors, destructors and virtual function table in
this complex information structure.
MODULE:
Stores the module size, its name, parent, its members and a time stamp. The time stamp is
used to guarantee at load time that the modules have been compiled in the correct order or
are all up to date.
b) Explain the machine dependent and machine independent code optimization. What are
their advantages?
Ans) Machine-independent optimizations can be performed independently of the target
machine for which the compiler is generating code; that is, the optimizations are not tied to
the target machine's specific language or platform. Examples of machine-independent
optimizations are: elimination of loop-invariant computations, induction variable elimination
and elimination of common subexpressions.
Machine-dependent optimization requires knowledge of the target machine.
An attempt to generate object code that will utilize the target machine's registers more
efficiently is an example of machine-dependent code optimization.
Advantages are:
5.
a) Explain the working principle of operator precedence parsing algorithm. Explain the
parsing action for the input string id 1 id 2 / id 3 * id 4 id 5 id 1 with reference to the
operator precedence relation table given below:
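[Table: operator precedence relations (<, >) between the operators, including *, /, id and $; the remaining operator symbols and the cell layout are not recoverable from the source.]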
b) What information is recorded in the symbol table of a compiler for a block structured
language? Give examples of how this information is created and/or used at each stage of
compilation.
Ans)
The symbol table is a scratch pad where the compiler stores information about the objects
in the program, such as variables, functions and procedures.
It enables the compiler to do type checking and determine the scope of a variable.
There are no type compatibility constraints or scoping rules at run time.
No type error will occur when the program runs.
A type system is said to be strongly typed if it passes only type-safe programs.
A language is strongly typed if its compiler is strongly typed.
Scope rules of a language are used for specifying which declaration of a variable is
associated with a specific occurrence of the variable
Scope rules apply to variables, constants, new type definitions and functions
A set of statements enclosed within blocking symbols (BEGIN and END, { and }, etc.)
is called a block (compound statement)
Blocks nest inside other blocks
Blocks are either disjoint or nested
A block-structured language allows procedures/functions to nest within other
procedures/functions
6.
a) Construct LL ( 1 ) parsing table for the following grammar:
S → aBDh
B → cC
C → bC / ε
D → EF
E → g / ε
F → f / ε
Ans)
S → aBDh
B → cC
C → bC / ε
D → EF
E → g / ε
F → f / ε
      a           b          c          f          g          h          $
S     S → aBDh
B                            B → cC
C                 C → bC                C → ε      C → ε      C → ε
D                                       D → EF     D → EF     D → EF
E                                       E → ε      E → g      E → ε
F                                       F → f                 F → ε
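A table-driven LL (1) parser for this grammar can be sketched as follows. The table entries are derived here from the FIRST and FOLLOW sets of the grammar S → aBDh, B → cC, C → bC / ε, D → EF, E → g / ε, F → f / ε; the function name and the use of single characters as tokens are assumptions of this sketch.

```python
NONTERMINALS = {"S", "B", "C", "D", "E", "F"}

# (non-terminal, lookahead) -> production body; [] stands for an ε-production.
TABLE = {
    ("S", "a"): ["a", "B", "D", "h"],
    ("B", "c"): ["c", "C"],
    ("C", "b"): ["b", "C"],
    ("C", "f"): [], ("C", "g"): [], ("C", "h"): [],
    ("D", "f"): ["E", "F"], ("D", "g"): ["E", "F"], ("D", "h"): ["E", "F"],
    ("E", "g"): ["g"], ("E", "f"): [], ("E", "h"): [],
    ("F", "f"): ["f"], ("F", "h"): [],
}

def ll1_parse(text):
    """Return True if the string is derivable from S."""
    tokens = list(text) + ["$"]
    stack = ["$", "S"]                        # start symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:                  # empty cell: syntax error
                return False
            stack.extend(reversed(body))      # leftmost body symbol ends up on top
        elif top == look:
            pos += 1                          # terminal (or $) matches the input
        else:
            return False
    return pos == len(tokens)
```

Because every cell of the table holds at most one production, the parser never has to guess or backtrack.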
b) Explain how the scope rules and the block structure of a programming language decide
the structure of the symbol table?
Ans) Scoping is one of the applications of the symbol table. There are two types of scope:
Local
Global
The symbol table decides whether a particular symbol is local or global.
For example:
int X;
void add(void);
int main(void)
{
    int Y;
    add();
    return 0;
}
void add(void)
{
    int Z;
}
{ } indicates a block, i.e. the life span of a variable is limited to the block in which it is declared.
A separate symbol table is created for each block.
The global value is stored in the top block so that it can be accessed by all the blocks.
[Figure: top block holding X together with the addresses of the nodes below it.]
The leaf node of every block stores the address of the next block.
#include <iostream>
using namespace std;
int main()
{
    int x;
    cout << "enter x";
    cin >> x;
    {
        int y;
        cout << "enter y";
        cin >> y;
        {
            int z;
            cout << "enter z";
            cin >> z;
        }
    }
}
Block          Names visible from enclosing blocks    Names declared
MAIN BLOCK     (none)                                 x
BLOCK 1        x                                      y
BLOCK 2        x, y                                   z
7.
a) Construct the SLR parsing table for the following grammar:
E → E + T
E → T
T → T * F
T → F
F → id
L → L , E / E
Ans) Augmented Grammar:
E' → E         (0)
E → E + T      (1)
E → T          (2)
T → T * F      (3)
T → F          (4)
F → id         (5)
L → L , E      (6)
L → E          (7)
Item Set I0:
E' → . E
E → . E + T
E → . T
T → . T * F
T → . F
F → . id
L → . L , E
L → . E
PROCESS E
I1 = GOTO (I0, E)
I1:
E' → E .
E → E . + T
L → E .
PROCESS T
I2 = GOTO (I0, T)
I2:
E → T .
T → T . * F
PROCESS F
I3 = GOTO (I0, F)
I3:
T → F .
PROCESS id
I4 = GOTO (I0, id)
I4:
F → id .
PROCESS L
I5 = GOTO (I0, L)
I5:
L → L . , E
PROCESS +
I6 = GOTO (I1, +)
I6:
E → E + . T
T → . T * F
T → . F
F → . id
PROCESS *
I7 = GOTO (I2, *)
I7:
T → T * . F
F → . id
PROCESS ,
I8 = GOTO (I5, ,)
I8:
L → L , . E
E → . E + T
E → . T
PROCESS T
I9 = GOTO (I6, T)
I9:
E → E + T .
PROCESS F
Already processed in I3
PROCESS id
Already processed in I4
PROCESS F
I10 = GOTO (I7, F)
I10:
T → T * F .
PROCESS id
Already processed in I4
In item set (8) the symbols to be processed are E, T
PROCESS E
I11 = GOTO (I8, E)
I11:
L → L , E .
PROCESS T
Already processed in I2
[Tables: GOTO and ACTION parts of the SLR parsing table for states I0 to I11, with ACTION columns including *, the comma, id and $ and GOTO columns E, T, F and L; cell layout not recoverable from the source. The surviving entries include the multiply defined cells S6/R7 and S7/R2.]
b) What is the objective of intermediate code generation? What are the different forms of
intermediate code generated by the intermediate code generation phase?
Ans) The different forms of intermediate code are:
Syntax trees
Postfix notation
Three-address code
Quadruples
Triples
Indirect triples
8.
a) What is the objective of intermediate code generation? Generate the three address code
for the following code segment:
Main ( )
{
int a = 1;
int b[10];
while (a<= 10)
b[a] = 2 ** a;
}
Ans)
(1) a = 1
(2) t1 = 10 * 4
(3) t2 = addr(b) - 4
(4) t3 = t2[t1]
(5) if a <= 10 goto (7)
(6) goto (12)
(7) t4 = a * 4
(8) t5 = addr(b) - 4
(9) t6 = t5[t4]
(10) t7 = 2 ** a
(11) t6 = t7
(12) exit
S → AaAb
S → BbBa
A → ε
B → ε
Ans) Augmented Grammar:
(0) S' → S
(1) S → AaAb
(2) S → BbBa
(3) A → ε
(4) B → ε
I0:
S' → .S, $
S → .AaAb, $
S → .BbBa, $
A → ., a
B → ., b
For [S' → .S, $]: β = ε and a = $, so FIRST(βa) = FIRST($) = {$}
PROCESS B
I3: GOTO (I0, B)
I3 = S → B.bBa, $
In item set (1) no symbols to be processed
In item set (2) symbols to be processed is a
PROCESS a
I4: GOTO (I2, a)
I4 = S → Aa.Ab, $
For [S → Aa.Ab, $]: β = b and a = $, so FIRST(βa) = FIRST(b$) = {b}
A → ., b
In item set (3) symbols to be processed is b
PROCESS b
I5: GOTO (I3, b)
I5 = S → Bb.Ba, $
For [S → Bb.Ba, $]: β = a and a = $, so FIRST(βa) = FIRST(a$) = {a}
B → ., a
PROCESS b
I8: GOTO (I6, b)
I8 = S → AaAb., $
State   ACTION                      GOTO
        a       b       $           S    A    B
I0      R3      R4                  1    2    3
I1                      Accepted
I2      S4
I3              S5
I4              R3                       6
I5      R4                                    7
I6              S8
I7      S9
I8                      R1
I9                      R2
c) Write the quadruples, triples and indirect triples for the following expression:
X[i] := Y
X:= Y[i]
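Ans)
QUADRUPLES
X[i] := Y
t1 = X[i]
t2 = Y
t1 = t2
       Operator  Operand 1  Operand 2  Result
(1)    []        X          i          t1
(2)    =         t2                    t1
X := Y[i]
t1 = Y[i]
t2 = X
t2 = t1
       Operator  Operand 1  Operand 2  Result
(1)    []        Y          i          t1
(2)    =         t1                    t2
TRIPLE
X[i] := Y
       Operator  Operand 1  Operand 2
(0)    []        X          i
(1)    =         (0)        Y
X := Y[i]
       Operator  Operand 1  Operand 2
(0)    []        Y          i
(1)    =         X          (0)
INDIRECT TRIPLE
X[i] := Y
Statement  POINTER
(0)        (100)
(1)        (200)
       Operator  Operand 1  Operand 2
(100)  []        X          i
(200)  =         (100)      Y
X := Y[i]
Statement  POINTER
(0)        (100)
(1)        (200)
       Operator  Operand 1  Operand 2
(100)  []        Y          i
(200)  =         X          (100)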
1. What is a compiler?
A compiler is a program that reads a program written in one language, the source language, and translates it into
an equivalent program in another language, the target language. The compiler reports to its user the presence of
errors in the source program.
2. What are the two parts of a compilation? Explain briefly.
Analysis and Synthesis are the two parts of compilation.
The analysis part breaks up the source program into constituent pieces and creates an intermediate
representation of the source program.
The synthesis part constructs the desired target program from the intermediate representation.
Linear Analysis.
Hierarchical Analysis.
Semantic Analysis.
Find a zero */
{
if(!strcmp(s,0))
return 0;
}
Ans-No of tokens is 22
7. Find the no. of tokens in the following code segments.
(a) printf("i=%d,&i=%d",i,&i);
(b) int max(i,j)
int i,j;
{return (i>j?i:j);}
Ans-(a)10
(b)25
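The two counts above can be checked with a rough tokenizer. This is only a sketch, not a full C lexer: the name count_tokens and the regular expression are assumptions made for illustration, the format string in (a) is assumed to be a quoted string literal, and every punctuation character is treated as a one-character token (so multi-character operators such as >= or ++ would be miscounted).

```python
import re

# Simplified token pattern (an assumption for illustration):
# string literal | identifier or keyword | integer | any single punctuation character
TOKEN_RE = re.compile(r'"[^"\n]*"|[A-Za-z_]\w*|\d+|[^\s\w]')

def count_tokens(source):
    """Return how many tokens the simplified pattern finds in source."""
    return len(TOKEN_RE.findall(source))

segment_a = 'printf("i=%d,&i=%d",i,&i);'
segment_b = 'int max(i,j)\nint i,j;\n{return (i>j?i:j);}'

print(count_tokens(segment_a))  # 10
print(count_tokens(segment_b))  # 25
```

The string literal counts as a single token, which is why (a) has only 10 tokens.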
8. What is a symbol table?
A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the
identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data
from that record quickly.
Whenever an identifier is detected by a lexical analyzer, it is entered into the symbol table. The attributes of an
identifier cannot be determined by the lexical analyzer.
9. Mention some of the cousins of a compiler.
Cousins of the compiler are:
Preprocessors
Assemblers
Loaders and Link-Editors
10. List the phases that constitute the front end of a compiler.
The front end consists of those phases or parts of phases that depend primarily on the source language and are
largely independent of the target machine. These include
Lexical and Syntactic analysis
The creation of symbol table
Semantic analysis
Generation of intermediate code
A certain amount of code optimization can be done by the front end as well. Also includes error handling that
goes along with each of these phases.
11. Mention the back-end phases of a compiler.
The back end of compiler includes those portions that depend on the target machine and generally those
portions do not depend on the source language, just the intermediate language. These include
(i)Code optimization
(ii)Code generation, along with error handling and symbol- table operations.
12. Define compiler-compiler.
Systems that help with the compiler-writing process are often referred to as compiler-compilers, compiler-generators or translator-writing systems.
They are largely oriented around a particular model of languages, and they are most suitable for generating
compilers for languages of a similar model.
Postfix
Three address code
3. Define backpatching.
Backpatching is the activity of filling in unspecified label information using appropriate semantic
actions during the code generation process. In the semantic actions the functions used are
mklist(i), merge_list(p1,p2) and backpatch(p,i).
4. Mention the functions that are used in backpatching.
(i) mklist(i) creates a new list containing only i, where i is
an index into the array of quadruples.
(ii) merge_list(p1,p2) concatenates the two lists pointed to by p1 and p2. It returns a
pointer to the concatenated list.
(iii) backpatch(p,i) inserts i as the target label for each of the statements on the list pointed to by p.
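The three functions can be sketched as follows. This is a minimal illustration under stated assumptions: the quadruple layout [op, arg1, arg2, target], the emit helper and the example condition (a < b or c < d) are not from the notes, and Python lists stand in for the pointer-linked lists.

```python
# Each quadruple is [op, arg1, arg2, target]; target None means "not yet known".
quads = []

def emit(op, arg1=None, arg2=None, target=None):
    """Append a quadruple and return its index (hypothetical helper)."""
    quads.append([op, arg1, arg2, target])
    return len(quads) - 1

def mklist(i):
    """Create a new list containing only the quadruple index i."""
    return [i]

def merge_list(p1, p2):
    """Concatenate two lists of quadruple indices."""
    return p1 + p2

def backpatch(p, i):
    """Insert i as the jump target of every quadruple on list p."""
    for index in p:
        quads[index][3] = i

# Jumps for the condition (a < b or c < d), targets left open at first.
true1 = mklist(emit("if<", "a", "b"))    # quad 0
false1 = mklist(emit("goto"))            # quad 1
backpatch(false1, len(quads))            # first test failed: try the second test
true2 = mklist(emit("if<", "c", "d"))    # quad 2
false_list = mklist(emit("goto"))        # quad 3
true_list = merge_list(true1, true2)

backpatch(true_list, len(quads))         # true exits jump to the next quad
emit("=", "1", None, "x")                # quad 4: code for the true branch
backpatch(false_list, len(quads))        # false exit jumps past it
```

After the last call, quadruples 0 and 2 jump to 4, quadruple 1 jumps to 2, and quadruple 3 jumps to 5.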
5. What is the intermediate code representation for the expression a or b and not c?
The intermediate code representation for the expression a or b and not c is the three address sequence
t1 := not c
t2 := b and t1
t3 := a or t2
6. What are the various methods of implementing three address statements?
The three address statements can be implemented using the following methods.
Quadruple : a record structure with at most four fields: operator (OP), arg1, arg2 and result.
Triples : the use of temporary names is avoided by referring to a result by the position of the
triple that computes it; other operands are pointers into the symbol table.
Indirect triples : a separate list of pointers to triples is kept, rather than listing the
triples themselves.
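The difference between the three representations can be sketched on a small example. The code below is an illustration only: the tuple layouts, the function name to_triples and the sample statements t1 = b + c, t2 = a * t1 are assumptions, and an integer operand is used to mean "the result of the triple at that position".

```python
# Quadruples for  t1 = b + c ;  t2 = a * t1   as (op, arg1, arg2, result)
quadruples = [
    ("+", "b", "c", "t1"),
    ("*", "a", "t1", "t2"),
]

def to_triples(quads):
    """Drop the result field; refer to earlier results by triple position."""
    position = {}   # temporary name -> index of the triple that computes it
    triples = []
    for op, arg1, arg2, result in quads:
        a1 = position.get(arg1, arg1)
        a2 = position.get(arg2, arg2)
        triples.append((op, a1, a2))
        position[result] = len(triples) - 1
    return triples

triples = to_triples(quadruples)

# Indirect triples: a separate list of pointers (here indices) into the
# triple array gives the execution order, so statements can be reordered
# by changing this list without renumbering the triples.
statement_list = list(range(len(triples)))

print(triples)          # [('+', 'b', 'c'), ('*', 'a', 0)]
print(statement_list)   # [0, 1]
```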
7. Give the syntax-directed definition for if-else statement.
1. S → if E then S1
E.true := new_label()
E.false := S.next
S1.next := S.next
S.code := E.code || gen_code(E.true ':') || S1.code
2. S → if E then S1 else S2
E.true := new_label()
E.false := new_label()
S1.next := S.next
S2.next := S.next
S.code := E.code || gen_code(E.true ':') || S1.code || gen_code('go to', S.next) || gen_code(E.false ':') || S2.code
8. Distinguish between compile time and run time environments .
Compile time environment includes
a)Declaration of variables.
b)Scope of variables.
c)Definition of procedures.
Run time environment includes
a)Binding of variables.
b)Life time of variables.
c)Activation of procedures.
9. Write the procedure to generate TAC.
a)Convert to postfix form.
b)Use the procedure of evaluation of the expression to get the three address code.
Ex. a*(b+c)/(d+e)
Postfix: abc+*de+/
TAC:
t1=b+c
t2=a*t1
t3=d+e
t4=t2/t3
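Step (b) can be sketched with an operand stack, the same way a postfix expression is evaluated. This is a minimal sketch assuming single-character operands and only the binary operators + - * /; the function name postfix_to_tac is an assumption.

```python
def postfix_to_tac(postfix):
    """Turn a postfix string into a list of three address statements."""
    stack, code = [], []
    for symbol in postfix:
        if symbol in "+-*/":
            right = stack.pop()               # second operand is on top
            left = stack.pop()
            temp = f"t{len(code) + 1}"        # fresh temporary for the result
            code.append(f"{temp}={left}{symbol}{right}")
            stack.append(temp)
        else:
            stack.append(symbol)              # operand: push its name
    return code

for statement in postfix_to_tac("abc+*de+/"):
    print(statement)
```

For the postfix form abc+*de+/ this prints the four statements listed in the example.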
10. How will you evaluate the attributes in an L-attributed definition?
a) Traverse the parse tree depth first, left to right.
b) Evaluate the inherited attributes of a node when it is visited for the first time.
c) Evaluate the synthesized attributes of a node when it is visited for the last time.
The general evaluation order is: input string → parse tree → dependency graph → evaluation order.
1. Define parser.
Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with
collective meaning.
Also termed as Parsing.
2. Mention the basic issues in parsing.
There are two important issues in parsing.
Specification of syntax
Representation of input after parsing.
3. Why lexical and syntax analyzers are separated out?
Reasons for separating the analysis phase into lexical and syntax analyzers:
Simpler design.
Compiler efficiency is improved.
Compiler portability is enhanced.
1. The operator like minus has two different precedence (unary and binary).Hence it is hard to handle tokens
like minus sign.
2. This kind of parsing is applicable to only small class of grammars.
13. What is dangling else problem?
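The dangling-else problem is the ambiguity that arises in a grammar such as the one shown below, where an else can be matched with more than one preceding if:
stmt → if expr then stmt
| if expr then stmt else stmt
| other
The ambiguity is usually resolved by matching each else with the closest unmatched then.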
14. Write short notes on YACC.
YACC is an automatic tool for generating the parser program.
YACC stands for Yet Another Compiler Compiler which is basically the utility available from UNIX.
Basically YACC is LALR parser generator.
It can report conflict or ambiguities in the form of error messages.
15. What is meant by handle pruning?
A rightmost derivation in reverse can be obtained by handle pruning.
If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some
as yet unknown rightmost derivation
S = γ0 => γ1 => ... => γn-1 => γn = w
16. Define LR(0) items.
An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. Thus,
production A → XYZ yields the four items
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
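Enumerating the items of a production amounts to placing the dot at every position of the right side. A small sketch, where the function name lr0_items and the string format of an item are assumptions made for illustration:

```python
def lr0_items(head, body):
    """Return all LR(0) items of the production head → body."""
    items = []
    for dot in range(len(body) + 1):          # one item per dot position
        before = "".join(body[:dot])
        after = "".join(body[dot:])
        items.append(f"{head} → {before}.{after}")
    return items

for item in lr0_items("A", ["X", "Y", "Z"]):
    print(item)
```

A production with n symbols on its right side therefore yields n + 1 items.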
17. What is meant by viable prefixes?
The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called
viable prefixes. An equivalent definition of a viable prefix is that it is a prefix of a right sentential form that
does not continue past the right end of the rightmost handle of that sentential form.
18. Define handle.
A handle of a string is a substring that matches the right side of a production, and whose reduction to the
nonterminal on the left side of the production represents one step along the reverse of a rightmost
derivation.
A handle of a right sentential form γ is a production A → β and a position of γ where the string β may be
found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. That is,
if S =>* αAw => αβw, then A → β in the position following α is a handle of αβw.
19. What are kernel & non-kernel items?
Kernel items, which include the initial item, S' → .S, and all items whose dots are not at the left end.
Non-kernel items, which have their dots at the left end.
20. What is phrase level error recovery?
Phrase level error recovery is implemented by filling in the blank entries in the predictive parsing table
with pointers to error routines. These routines may change, insert, or delete symbols on the input and
issue appropriate error messages. They may also pop from the stack.
21. Differentiate between top down and bottom up parser.
Top down parser (TDP)      Bottom up parser (BUP)
(v) Less power.            (v) High power.
22. Under which conditions predictive parsing can be constructed for a grammar?
The grammar must be free from left recursion and should be left factored.
cost
MOV R0,R1
MOV R1,M
SUB 5(R0),*10(R1)
8. What is a basic block? Define leader used in basic block and give one example.
A basic block is a sequence of consecutive statements in which flow of control enters at the beginning
and leaves at the end without halting or branching except at the end.
Leader
a) The first statement is a leader.
b) The target of a conditional or unconditional jump is a leader.
c) The statement that immediately follows a conditional or unconditional jump is a leader.
The statements starting from a leader up to, but not including, the next leader form a basic
block.
Ex.
fact(x)
{int f=1;
for(i=2;i<=x;i++)
f=f*i;
return f;}
The TAC for the above code is
1. f=1
2. i=2
3. if(i>x) goto 8
4. f=f*i
5. t1=i+1
6. i=t1
7. goto 3
8. goto calling program
Here the leaders are statements 1, 3, 4 and 8. Block B1 consists of statements 1 and 2. Block B2
consists of only statement 3. Block B3 is from statement 4 to statement 7. Block B4 consists of
only statement 8.
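The leader rules can be applied mechanically. The sketch below is an illustration under assumptions: each statement is stored as a pair (text, jump target or None), statement numbers are 1-based, the function names are hypothetical, and the final return to the calling program is treated as having no target inside the listing.

```python
# (statement text, jump target inside the listing or None)
tac = [
    ("f=1", None),
    ("i=2", None),
    ("if(i>x) goto 8", 8),
    ("f=f*i", None),
    ("t1=i+1", None),
    ("i=t1", None),
    ("goto 3", 3),
    ("goto calling program", None),
]

def find_leaders(code):
    """Apply the three leader rules and return sorted statement numbers."""
    leaders = {1}                                  # rule (a): first statement
    for number, (_, target) in enumerate(code, start=1):
        if target is not None:
            leaders.add(target)                    # rule (b): jump target
            if number + 1 <= len(code):
                leaders.add(number + 1)            # rule (c): follows a jump
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from a leader up to, not including, the next leader."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code) + 1]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]

print(find_leaders(tac))   # [1, 3, 4, 8]
print(basic_blocks(tac))   # [[1, 2], [3], [4, 5, 6, 7], [8]]
```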
The machine dependent optimization is based on the characteristics of the target machine for the
instruction set used and addressing modes used and registers used for the instructions to
produce the efficient target code. This also includes peephole optimization.
The machine independent optimization is based on the characteristics of the programming
languages for appropriate programming structure and usage of efficient arithmetic
properties in order to reduce the execution time. This includes loop optimization(code
motion, loop jamming, loop unrolling), dead code elimination, common sub-expression
elimination, constant propagation, constant folding, strength reduction etc.
4. What are the different data flow properties?
Available expressions
Reaching definitions
Live variables
Busy variables
5. Eliminate left recursion and left factor the following grammar.
E → aba | abba | Eb | EbE
Ans---Elimination of left recursion
E → abaE1 | abbaE1
E1 → bE1 | bEE1 | ε
Left factor
E → abA
A → aE1 | baE1
E1 → bB | ε
B → E1 | EE1
6. Eliminate left recursion in more than one level.
S → Aa | b
A → Ac | Sd | ε
Ans-Substitute the productions of S in the second production of A. We get
S → Aa | b
A → Ac | Aad | bd | ε
Elimination of left recursion
S → Aa | b
A → bdA1 | A1
A1 → cA1 | adA1 | ε
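The last step, removing immediate left recursion from the A-productions, can be sketched as below. This is an illustration only: productions are lists of symbols, the empty list stands for ε, and the function name and the choice of A1 as the new nonterminal are assumptions. It handles immediate left recursion only; the substitution of S into A is done by hand, as in the answer above.

```python
def eliminate_immediate_left_recursion(head, bodies, new_head):
    """Rewrite A → Aα | β as A → βA1, A1 → αA1 | ε."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # tails of A → Aα
    betas = [b for b in bodies if not b or b[0] != head]     # non-recursive bodies
    if not alphas:
        return {head: bodies}                                # nothing to do
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],  # [] is ε
    }

# A → Ac | Aad | bd | ε
a_bodies = [["A", "c"], ["A", "a", "d"], ["b", "d"], []]
result = eliminate_immediate_left_recursion("A", a_bodies, "A1")
print(result["A"])    # [['b', 'd', 'A1'], ['A1']]
print(result["A1"])   # [['c', 'A1'], ['a', 'd', 'A1'], []]
```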
7. What is dynamic scoping?
In dynamic scoping a use of a non-local variable refers to the non-local data declared in the most recently called
and still active procedure. Therefore new bindings are set up for the local names each time a procedure is called. In
dynamic scoping, symbol tables can be required at run time.
9. What is code motion?
Code motion is an optimization technique in which amount of code in a loop is decreased. This
transformation is applicable to the expression that yields the same result independent of the number of
times the loop is executed. Such an expression is placed before the loop.
Ex.-
while(i<100)
{
x=i*sin(A)/sin(B);
}
Can be written as
t=sin(A)/sin(B);
while(i<100)
{
x=i*t;
}
The source code should be such that it should produce minimum amount of target code.
There should not be any unreachable code.
Dead code should be completely removed from source language.
The optimizing compilers should apply following code improving transformations on source language.
i) common sub-expression elimination
ii) dead code elimination
iii) code movement
iv) strength reduction
11. What are the various ways to pass a parameter in a function?
Call by value
Call by reference
Copy-restore
Call by name
12. Suggest a suitable approach for computing hash function.
(i)Using hash function we should obtain exact locations of name in symbol table.
(ii)The hash function should result in uniform distribution of names in symbol table.
(iii)The hash function should be such that there will be minimum number of collisions. Collision is such
a situation where hash function results in same location for storing the names.
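One common way to meet these requirements is a multiplicative string hash with chaining for collisions. The sketch below is only an illustration: the multiplier 31, the table size 211 and the function names are assumptions, not a prescribed method.

```python
TABLE_SIZE = 211   # a prime table size, chosen here only for illustration

def hash_name(name, table_size=TABLE_SIZE):
    """Map an identifier to a bucket index in the range 0..table_size-1."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) % table_size   # mix in every character
    return h

# Symbol table with chaining: each bucket holds (name, attributes) pairs,
# so names that collide on the same index can still be stored and found.
table = [[] for _ in range(TABLE_SIZE)]

def insert(name, attributes):
    table[hash_name(name)].append((name, attributes))

def lookup(name):
    for stored_name, attributes in table[hash_name(name)]:
        if stored_name == name:
            return attributes
    return None

insert("count", {"type": "int"})
print(hash_name("x"))     # 120
print(lookup("count"))    # {'type': 'int'}
```

Using every character of the name helps spread similar identifiers (for example a1 and a2) over different buckets.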
13. What is the difference between S-attributed and L-attributed definitions?
S-attributed
L-attributed