
FOURTH SEMESTER EXAMINATION-2010

COMPILER DESIGN
1. Answer the following questions:

2*10

a) What is a syntax directed translation scheme? What are the different forms of
intermediate code used in the compilation process?
Ans- A syntax directed translation scheme:
Describes the order and timing of the attribute computation.
Embeds semantic rules in the grammar.
Lets each semantic rule use only information computed by semantic rules that have
already executed.
Is a convenient way of describing an L-attributed definition.
The different forms of intermediate code used in the compilation process are:
Three address code, which can be represented as:
Quadruples
Triples
Indirect triples
b) What is dead code elimination?


Ans- A variable is said to be dead at a point in a program if the value it holds at that
point is not used anywhere later in the program. A statement that only computes such a
value is dead code. One advantage of copy propagation is that it often turns the copy
statement into dead code. Removing a dead assignment makes no difference to the result
or meaning of the program, so eliminating it makes the code smaller and faster.
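A minimal sketch of dead assignment elimination on a straight-line block. It assumes every statement is a side-effect-free assignment written as a hypothetical (target, operands) pair, and that the set of variables needed after the block is known; the function name is illustrative only.

```python
def eliminate_dead_assignments(block, live_out):
    """Drop assignments whose target is never used afterwards.

    block: list of (target, operands) pairs in program order.
    live_out: variables whose values are needed after the block.
    """
    live = set(live_out)
    kept = []
    # Walk backwards so that we know which variables are used later.
    for target, operands in reversed(block):
        if target in live:
            kept.append((target, operands))
            live.discard(target)      # this statement defines the target
            live.update(operands)     # and needs its operands
    kept.reverse()
    return kept


# t = a op b; x = t (a copy); y = t op c  -- only y is needed afterwards
block = [("t", ("a", "b")), ("x", ("t",)), ("y", ("t", "c"))]
optimized = eliminate_dead_assignments(block, {"y"})
```

Here the copy x = t is removed because x is never used again, while t is kept because y depends on it.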
c) What is reduce-reduce (R-R) conflict in LR parser?
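Ans: A reduce-reduce conflict occurs when the parser has two or more handles at the same
time on the top of the stack, that is, when a parser state allows reductions by two or more
different productions on the same look ahead symbol. The parsing table then has multiple
entries and the parser cannot decide which production to reduce by.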

d) Why LR parsing is preferred over other parsers?


Ans: A large class of grammars can be parsed using LR (K) parsers where L stands for
left to right scanning of the input, R stands for constructing a rightmost derivation in
reverse and K is the number of input symbols of the look ahead to make parsing
decisions. LR parsing has several advantages for which it is preferred over other parsers:
It can recognize virtually all programming language constructs for which context free
grammars can be written.
It is the most general non-backtracking shift-reduce parsing method.
It can be implemented as efficiently as other shift-reduce methods.
It can parse any grammar that a predictive parser can parse.
It can detect a syntax error as early as possible on a left to right scan of the input.
e) What do you mean by Run time storage allocation?
Ans:
When we define a variable:
int count;
the size of the memory storage is already known before the program is actually
run: count occupies one cell of type int.
It is also possible to defer the memory allocation until the program is run. This is
called run time storage allocation or dynamic memory allocation.

f) Eliminate Left recursion from the following grammar:


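E -> aa / abba / Eb / EE
Ans:
E -> aa
E -> abba
E -> Eb
E -> EE
Immediate left recursion of the form
A -> Aα / β
can be eliminated by introducing another symbol A' such that
A -> βA'
A' -> αA' / ε
Here α is b or E and β is aa or abba, so after elimination of left recursion:
E -> aaE' / abbaE'
E' -> bE' / EE' / ε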
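The transformation above can be sketched in a few lines. This is an illustrative sketch, not a full algorithm: it handles only immediate left recursion of a single non-terminal, represents each alternative as a list of symbols, and uses the empty list for ε. The function name and the primed symbol name are assumptions.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | epsilon."""
    new_nt = nonterminal + "'"   # fresh symbol, assumed not to clash
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    other = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}   # nothing to do
    return {
        nonterminal: [alt + [new_nt] for alt in other],
        new_nt: [tail + [new_nt] for tail in recursive] + [[]],  # [] is epsilon
    }


# E -> aa / abba / Eb / EE
grammar = eliminate_immediate_left_recursion(
    "E", [["a", "a"], ["a", "b", "b", "a"], ["E", "b"], ["E", "E"]]
)
```

For the grammar of this question the result has E with the alternatives aaE' and abbaE', and E' with the alternatives bE', EE' and ε.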

g) Briefly describe one use of flow graphs in the compiler writing?


Ans:
A graph representation of three address statement called flow graph is useful
for understanding code generation algorithms even if the graph is not
explicitly constructed by a code generation algorithm.
A flow graph of a program can be extensively used as a vehicle to collect
information about the intermediate program.
Some register assignment algorithms use flow graphs to find the inner loop
where a program is expected to spend most of the time.
The nodes of the flow graph are basic blocks.

h) Explain the concept of Boot strapping in compiler design process?


Ans: Compilation of a high level language (HLL) is a long and complex process. Thus writing a
compiler for an HLL from scratch is difficult and time consuming. Bootstrapping is a
self-reinforcing technique used to reduce the design effort and to improve the quality.
Bootstrapping gives the concept of creating a compiler with the help of an existing one.
For creating any kind of compiler we need three languages:
i. Source language S: the language for which we are creating the compiler.
ii. Target language T: the language in which the object code is generated.
iii. Implementation language I: the language in which we write the compiler.

i) What is back patching in the process of Intermediate code generation?


Ans: When we generate go to statements in three address code, we face the problem that we
may not know the labels that control must go to at the time the jump statements are generated.
We can overcome this problem by generating a series of branching statements with the targets
of the jumps temporarily left unspecified. Each such statement will be put on a list of go to
statements whose labels will be filled in when the proper label can be determined. We can call
this subsequent filling in the labels as Back Patching.
To manipulate lists of labels we use three functions:

I. MAKELIST (j): creates a new list containing only j, an index into the array of
quadruples being generated. MAKELIST returns a pointer to the list it has made.
II. MERGE (q1, q2): takes the lists pointed to by q1 and q2, concatenates them into one
list and returns a pointer to the concatenated list.
III. BACKPATCH (q, j): inserts j as the target label for each of the statements on the list
pointed to by q.
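The three functions can be sketched as follows. This is an illustration under simplifying assumptions: quadruples are mutable Python lists whose fourth field holds the jump target (or the result of an assignment), Python lists stand in for the pointers to lists, and the emit helper is hypothetical.

```python
quads = []  # array of quadruples: [op, arg1, arg2, target]


def emit(op, arg1=None, arg2=None, target=None):
    """Append a quadruple and return its index."""
    quads.append([op, arg1, arg2, target])
    return len(quads) - 1


def makelist(j):
    return [j]


def merge(q1, q2):
    return q1 + q2


def backpatch(q, j):
    for index in q:
        quads[index][3] = j   # fill in the jump target left unspecified


# if a < b then x = 1
true_list = makelist(emit("if<", "a", "b"))   # jump taken when a < b
false_list = makelist(emit("goto"))           # jump taken otherwise
then_start = len(quads)
emit("=", "1", None, "x")
backpatch(true_list, then_start)              # the true exit goes to the then-part
backpatch(false_list, len(quads))             # the false exit skips it
```

Both jumps are emitted with an empty target and receive their labels only once the positions of the then-part and of the following statement are known.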
j) Differentiate between Phase and a pass in compiler construction?
Ans: Conceptually a compiler operates in phases, each of which transforms the source program from
one representation to another. There are six phases in compiler construction: lexical analysis, syntax
analysis, semantic analysis, intermediate code generation, code optimization and target code generation.
A pass, in contrast, is a group of one or more phases that reads the whole program (the source or an
intermediate form) once. A compiler may be broken into several passes, and the passes communicate
with each other via temporary files. For example, when a source program is given to the compiler,
the first pass reads it and stores the values, variables, functions etc. in a temporary file. Later
passes read the data produced by the previous passes.
The number of passes is decided by the compiler designer.

2) a. What is the role of intermediate code generation in overall compiler design?


Ans) Intermediate code generation is the phase in which the source program is
converted into a compact form using one of the following three methods:
I. Three address code
II. Postfix notation
III. Quadruples
The front end translates a source program into an intermediate representation, from
which the back end generates target code. Although a source program can be translated
directly into the target language, some benefits of using a machine independent
intermediate form are:
Intermediate code is closer to the target machine than the source language, and
hence it is easier to generate target code from it.
It allows a variety of optimizations to be performed in a machine independent
way.
Typically intermediate code generation can be implemented via syntax directed
translation and thus can be folded into parsing by augmenting the code for the
parser.
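As an illustration of the intermediate form, the sketch below turns an expression tree into three address code stored as quadruples. The tree encoding (nested tuples of the form (op, left, right)) and the temporary names t1, t2, ... are assumptions made for this example.

```python
import itertools


def three_address_code(expr):
    """Return (quadruples, name holding the result) for an expression tree.

    expr is either a variable name (str) or a tuple (op, left, right).
    """
    counter = itertools.count(1)
    quads = []

    def walk(node):
        if isinstance(node, str):
            return node                      # a leaf needs no code
        op, left, right = node
        left_name = walk(left)               # code for the operands first
        right_name = walk(right)
        temp = f"t{next(counter)}"           # fresh temporary for this node
        quads.append((op, left_name, right_name, temp))
        return temp

    result = walk(expr)
    return quads, result


# a + b * c
quads, result = three_address_code(("+", "a", ("*", "b", "c")))
```

Each quadruple has at most one operator and three addresses, which is what makes this form convenient for machine independent optimization.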
b. Define operator precedence relation and operator precedence grammar. Construct
precedence function for the following precedence relation:

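(Precedence relation table. Only the label id and the relation entries survive; the entries are listed group by group.)
> > > > >
< > > > >
< < > > >
< < < < >
< < < <
> > > > >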

Ans) Operator precedence relation:
Operator precedence relations are defined among terminals only.
They are defined as follows for two terminals a and b:
a < b means that a has lower precedence than b.
a > b means that a has higher precedence than b.
a = b means that a and b have equal precedence.
Operator precedence grammar:
A grammar is said to be an operator precedence grammar if it satisfies the following conditions:
There should not be ε on the right side of any production.
There should not be two consecutive non-terminals on the right side of any production.
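Precedence functions f and g can be constructed from a relation table with the usual graph method: make a node f_a and g_a for every terminal a, merge f_a with g_b when a = b, add an edge f_a -> g_b when a > b and an edge g_b -> f_a when a < b, and take the length of the longest path from each node. The sketch below follows these steps. The relation table it uses is an assumed textbook-style table for id, +, * and $, not the table of this question, and the function name is hypothetical.

```python
def precedence_functions(relations):
    """relations maps (a, b) to '<', '>' or '='. Returns the dicts (f, g).

    Raises ValueError if the graph has a cycle, in which case no
    precedence functions exist.
    """
    parent = {}

    def find(x):                       # tiny union-find for the '=' groups
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    terminals = {t for pair in relations for t in pair}
    for (a, b), rel in relations.items():
        if rel == "=":
            parent[find(("f", a))] = find(("g", b))

    edges = {}
    for (a, b), rel in relations.items():
        if rel == ">":
            edges.setdefault(find(("f", a)), set()).add(find(("g", b)))
        elif rel == "<":
            edges.setdefault(find(("g", b)), set()).add(find(("f", a)))

    on_path, memo = set(), {}

    def longest(node):                 # longest path length starting at node
        if node in memo:
            return memo[node]
        if node in on_path:
            raise ValueError("cycle: no precedence functions exist")
        on_path.add(node)
        memo[node] = max((1 + longest(n) for n in edges.get(node, ())), default=0)
        on_path.discard(node)
        return memo[node]

    f = {a: longest(find(("f", a))) for a in terminals}
    g = {a: longest(find(("g", a))) for a in terminals}
    return f, g


# Assumed example table (row symbol, column symbol) -> relation
relations = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}
f, g = precedence_functions(relations)
```

For this assumed table the result is f(id) = 4, f(+) = 2, f(*) = 4, f($) = 0 and g(id) = 5, g(+) = 1, g(*) = 3, g($) = 0, so that a < b exactly when f(a) < g(b).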

3.
a) Discuss the construction of LR parser. What are the various data structures used in LR
parser design? Discuss the construction of ACTION [] and GOTO [] table?
Ans) construction of LR parser:
An LR parser is a general non-backtracking shift-reduce parser.
An LR parser is a parser for context free grammars that reads input from Left to right and produces a
Rightmost derivation in reverse. The term LR (k) parser is also used; here the k refers to the number of
unconsumed look ahead input symbols that are used in making parsing decisions. Usually k is 1 and
is often omitted. A context free grammar is called LR (k) if there exists an LR (k) parser for it.
An LR parser is said to perform bottom up parsing because it attempts to deduce the top level
grammar productions by building up from the leaves.
The LR parser consists of an input buffer, a stack, a parsing program and a parsing table. These are
the data structures used in LR parser design; the parsing table has two parts, ACTION and GOTO.
Construction of ACTION [] and GOTO [] table:
GOTO []:
The GOTO part has columns for non-terminals only.
An entry of the GOTO part has the form:
GOTO [state] [non-terminal]
and gives the state to enter after a reduction to that non-terminal.
ACTION []:
The ACTION part has columns for the terminals of the grammar and the end marker $.
Each entry ACTION [state] [terminal] is one of shift, reduce, accept or error.

b) Write the role of an error detector in compilation process? Discuss different errors in
lexical-phase?
Ans) When the error detector encounters an error in any phase of the compiler, it does not halt the
compilation; it reports the error and recovers so that the rest of the program can still be checked.
The role of the error detector in the compilation process is to:
Detect the errors
Handle and react
Notify the calling module
Notify the user
Remain easy to program and maintain
The different errors in the lexical phase are:
Character streams that do not match any token pattern.
Ill formed numeric literals and identifiers.
4.a) What is the necessity of optimization in compilation? Discuss the factors influencing
optimization?

Ans) The aim of code optimization is to rearrange the instructions given in a program so as to gain
execution speed without changing the basic meaning or semantics of the source program. There
are two types of code optimization.
Machine independent optimizations can be performed independently of the target machine for
which the compiler is generating code; that is, the optimizations are not tied to the target machines
specific language or platform. Examples of machine independent optimizations are: elimination of
loop invariant computation, induction variable elimination and elimination of common sub
expression.
Machine dependent optimization requires knowledge of the target machine. An
attempt to generate object code that will utilize the target machines registers more efficiently is an
example of machine dependent code optimization.
The factors influencing code optimization are:
The machine
Architecture of the CPU i.e. RISC or CISC
Number of functional unit(s)
Cache size
CPU register(s)
Sometimes, the time taken to undertake optimization in itself may be an issue.

Optimizing existing code usually does not add new features, and worse, it might add new bugs in
previously working code (as any change might). Because manually optimized code is sometimes
less readable than unoptimized code, optimization might also reduce its maintainability.
Optimization comes at a price and it is important to be sure that the investment is worthwhile.
An automatic optimizer (or optimizing compiler, a program that performs code optimization) may
itself have to be optimized, either to further improve the efficiency of its target programs or else speed
up its own operation. A compilation performed with optimization 'turned on' usually takes longer,
although this is usually only a problem when programs are quite large.
In particular, for just-in-time compilers the performance of the run time compile component, executing
together with its target code, is the key to improving overall execution speed.

b) Explain the symbol table construct for the block structure programming language?
Ans) Scoping is one of the applications of the symbol table. There are two types of scope:
Local
Global
The symbol table decides whether a particular symbol is local or global.
For example:
int X;
void add()
{
int Z;
}
int main()
{
int Y;
add();
}
Here X is global, Y is local to main and Z is local to add.
{ } indicates a block, i.e. the life span of a variable is limited to the block in which it is declared.
A separate symbol table is created for each individual block.
The global names are stored in the top block so that they can be accessed by all the blocks.
The tables are linked by addresses: the leaf nodes of every block store the address of the next block.
#include <iostream>
using namespace std;
int main()
{
int x;
cout << "enter x";
cin >> x;
{
int y;
cout << "enter y";
cin >> y;
{
int z;
cout << "enter z";
cin >> z;
}
}
}

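Symbol tables for the blocks of the program above:
MAIN BLOCK: x
BLOCK 1: y (x is visible from the main block)
BLOCK 2: z (x and y are visible from the enclosing blocks)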
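A minimal sketch of a symbol table for a block structured language, kept as a stack of per-block tables. The class and method names are assumptions made for this illustration; a lookup searches the innermost block first and then the enclosing blocks.

```python
class ScopedSymbolTable:
    def __init__(self):
        self.scopes = [{}]            # index 0 is the global (outermost) table

    def enter_block(self):
        self.scopes.append({})        # a new table for the new block

    def exit_block(self):
        self.scopes.pop()             # names of the block go out of scope

    def declare(self, name, type_name):
        self.scopes[-1][name] = type_name

    def lookup(self, name):
        """Return (block depth, type) of the innermost declaration, or None."""
        for depth in range(len(self.scopes) - 1, -1, -1):
            if name in self.scopes[depth]:
                return depth, self.scopes[depth][name]
        return None


# Mirrors the three nested blocks declaring x, y and z
table = ScopedSymbolTable()
table.enter_block(); table.declare("x", "int")    # main block
table.enter_block(); table.declare("y", "int")    # block 1
table.enter_block(); table.declare("z", "int")    # block 2
inner_view = [table.lookup(n) for n in ("x", "y", "z")]
table.exit_block()                                # leave block 2
z_after_exit = table.lookup("z")
```

Inside block 2 all three names are found, each at the depth of the block that declares it; after the block is left, z can no longer be found.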

5. Consider the following grammar:


E -> ( L ) / a
L -> L , E / E
a) Construct DFA of LR ( 0) items for this grammar
b) Construct SLR (1) parsing table
c) Show the parsing stack and actions of an SLR (1) parser for the input string ( ( a ) , a , ( a,
a ))
d) Is this grammar a LR (0) grammar? If not describe the LR (0) conflict.
Ans)
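a) Augmented grammar:
(0) E' -> E
(1) E -> ( L )
(2) E -> a
(3) L -> L , E
(4) L -> E
Item set I0 (closure of E' -> . E):
E' -> . E
E -> . ( L )
E -> . a
In item set I0 the symbols to be processed are E, ( and a.
PROCESS E
I1 = GOTO (I0, E)
I1:
E' -> E .
PROCESS (
I2 = GOTO (I0, ( )
I2:
E -> ( . L )
L -> . L , E
L -> . E
E -> . ( L )
E -> . a
PROCESS a
I3 = GOTO (I0, a)
I3:
E -> a .
In item set I1 no symbols are to be processed.
In item set I2 the symbols to be processed are L, E, ( and a.
PROCESS L
I4 = GOTO (I2, L)
I4:
E -> ( L . )
L -> L . , E
PROCESS E
I5 = GOTO (I2, E)
I5:
L -> E .
PROCESS ( and a
GOTO (I2, ( ) = I2 and GOTO (I2, a) = I3, already processed.
In item set I3 no symbols are to be processed.
In item set I4 the symbols to be processed are ) and ,
PROCESS )
I6 = GOTO (I4, ) )
I6:
E -> ( L ) .
PROCESS ,
I7 = GOTO (I4, , )
I7:
L -> L , . E
E -> . ( L )
E -> . a
In item sets I5 and I6 no symbols are to be processed.
In item set I7 the symbols to be processed are E, ( and a.
PROCESS E
I8 = GOTO (I7, E)
I8:
L -> L , E .
PROCESS ( and a
GOTO (I7, ( ) = I2 and GOTO (I7, a) = I3, already processed.
In item set I8 no symbols are to be processed.
DFA transitions:
I0 --E--> I1, I0 --(--> I2, I0 --a--> I3
I2 --L--> I4, I2 --E--> I5, I2 --(--> I2, I2 --a--> I3
I4 --)--> I6, I4 --,--> I7
I7 --E--> I8, I7 --(--> I2, I7 --a--> I3

b) FOLLOW (E) = { ) , $ together with the comma } and FOLLOW (L) = { ) and the comma }.
Sn means shift and go to state n; Rn means reduce by production (n).

         ACTION                           GOTO
State    (     )     ,     a     $        E     L
I0       S2                S3             1
I1                               Accept
I2       S2                S3             5     4
I3             R2    R2          R2
I4             S6    S7
I5             R4    R4
I6             R1    R1          R1
I7       S2                S3             8
I8             R3    R3

c) Parsing of ( ( a ) , a , ( a , a ) ) (the stack shows state numbers):

Step  Stack              Input             Action
1     0                  ((a),a,(a,a))$    shift 2
2     0 2                (a),a,(a,a))$     shift 2
3     0 2 2              a),a,(a,a))$      shift 3
4     0 2 2 3            ),a,(a,a))$       reduce E -> a, GOTO (2, E) = 5
5     0 2 2 5            ),a,(a,a))$       reduce L -> E, GOTO (2, L) = 4
6     0 2 2 4            ),a,(a,a))$       shift 6
7     0 2 2 4 6          ,a,(a,a))$        reduce E -> ( L ), GOTO (2, E) = 5
8     0 2 5              ,a,(a,a))$        reduce L -> E, GOTO (2, L) = 4
9     0 2 4              ,a,(a,a))$        shift 7
10    0 2 4 7            a,(a,a))$         shift 3
11    0 2 4 7 3          ,(a,a))$          reduce E -> a, GOTO (7, E) = 8
12    0 2 4 7 8          ,(a,a))$          reduce L -> L , E, GOTO (2, L) = 4
13    0 2 4              ,(a,a))$          shift 7
14    0 2 4 7            (a,a))$           shift 2
15    0 2 4 7 2          a,a))$            shift 3
16    0 2 4 7 2 3        ,a))$             reduce E -> a, GOTO (2, E) = 5
17    0 2 4 7 2 5        ,a))$             reduce L -> E, GOTO (2, L) = 4
18    0 2 4 7 2 4        ,a))$             shift 7
19    0 2 4 7 2 4 7      a))$              shift 3
20    0 2 4 7 2 4 7 3    ))$               reduce E -> a, GOTO (7, E) = 8
21    0 2 4 7 2 4 7 8    ))$               reduce L -> L , E, GOTO (2, L) = 4
22    0 2 4 7 2 4        ))$               shift 6
23    0 2 4 7 2 4 6      )$                reduce E -> ( L ), GOTO (7, E) = 8
24    0 2 4 7 8          )$                reduce L -> L , E, GOTO (2, L) = 4
25    0 2 4              )$                shift 6
26    0 2 4 6            $                 reduce E -> ( L ), GOTO (0, E) = 1
27    0 1                $                 Accepted

d) Yes, the grammar is an LR (0) grammar: no item set contains both a shift item and a complete
item, or two complete items, so there is no shift-reduce or reduce-reduce conflict.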
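The parse asked for in part (c) can also be reproduced mechanically with a table-driven driver. The sketch below is an illustration only: the ACTION and GOTO dictionaries are an SLR (1) table built for the grammar E -> ( L ) / a, L -> L , E / E, with productions numbered 1 to 4 in that order and states numbered 0 to 8 in the order in which the LR (0) item sets are discovered. That numbering and all the names are assumptions of this sketch.

```python
# production number -> (head, length of the right side)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("L", 3), 4: ("L", 1)}

ACTION = {
    0: {"(": ("s", 2), "a": ("s", 3)},
    1: {"$": ("acc", 0)},
    2: {"(": ("s", 2), "a": ("s", 3)},
    3: {t: ("r", 2) for t in "),$"},      # E -> a on FOLLOW(E)
    4: {")": ("s", 6), ",": ("s", 7)},
    5: {t: ("r", 4) for t in "),"},       # L -> E on FOLLOW(L)
    6: {t: ("r", 1) for t in "),$"},      # E -> ( L ) on FOLLOW(E)
    7: {"(": ("s", 2), "a": ("s", 3)},
    8: {t: ("r", 3) for t in "),"},       # L -> L , E on FOLLOW(L)
}
GOTO = {(0, "E"): 1, (2, "E"): 5, (2, "L"): 4, (7, "E"): 8}


def slr_parse(text):
    """Return (accepted, list of actions) for the input string."""
    tokens = [c for c in text if not c.isspace()] + ["$"]
    stack, pos, trace = [0], 0, []
    while True:
        entry = ACTION[stack[-1]].get(tokens[pos])
        if entry is None:                 # empty table entry: syntax error
            return False, trace
        kind, arg = entry
        if kind == "s":
            trace.append(f"shift {arg}")
            stack.append(arg)
            pos += 1
        elif kind == "r":
            head, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]          # pop the handle
            stack.append(GOTO[(stack[-1], head)])    # then follow GOTO
            trace.append(f"reduce {arg}")
        else:
            trace.append("accept")
            return True, trace


accepted, trace = slr_parse("( ( a ) , a , ( a , a ) )")
```

Printing trace lists the shift and reduce actions in order; an input such as "(a," stops at an empty ACTION entry and is rejected.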

6
a) What is an activation record? Explain clearly the components of an activation record?
Ans) The information needed by a single execution (activation) of a procedure is
managed using a contiguous block of storage called an Activation Record or Activation
Frame, consisting of a collection of fields.
The components of an activation record are:
The temporary values used during expression evaluation.
Local data of the procedure.
Saved machine state information (program counter, registers, return address).
Access link for access to non-local names.
Control link pointing to the activation record of the caller.
The actual parameters.
The returned value, used by the called procedure to return a value to the calling procedure.

b) Construct DAG for the following sequence of statements


X=Y/Z
W=P*Y
Y=Y*Z
P=W-X

Ans)

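Leaves: Y0, Z0 and P0 (the initial values of Y, Z and P).
Node n1: / with children Y0 and Z0, labelled X.
Node n2: * with children P0 and Y0, labelled W.
Node n3: * with children Y0 and Z0, labelled Y.
Node n4: - with children n2 (W) and n1 (X), labelled P.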
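A sketch of how the DAG for the four statements above can be built. It assumes every statement has the form target = left op right and encodes it as a hypothetical tuple (target, left, op, right). A node is reused when the same operator is applied to the same child nodes, and each variable name is attached to the node that currently holds its value.

```python
def build_dag(statements):
    nodes = []      # node id -> ("leaf", name, None) or (op, left id, right id)
    index = {}      # node description -> node id, so equal nodes are shared
    current = {}    # variable -> id of the node holding its current value
    labels = {}     # node id -> variables currently attached to the node

    def node_for(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(name):
        if name not in current:                       # first use: initial value
            current[name] = node_for(("leaf", name, None))
        return current[name]

    for target, left, op, right in statements:
        n = node_for((op, operand(left), operand(right)))
        for names in labels.values():                 # target moves to node n
            if target in names:
                names.remove(target)
        labels.setdefault(n, []).append(target)
        current[target] = n
    return nodes, labels


statements = [
    ("X", "Y", "/", "Z"),
    ("W", "P", "*", "Y"),
    ("Y", "Y", "*", "Z"),
    ("P", "W", "-", "X"),
]
nodes, labels = build_dag(statements)
```

For these statements the DAG has seven nodes: three leaves (the initial Y, Z and P) and four interior nodes labelled X, W, Y and P. Y * Z does not share a node with Y / Z because the operators differ.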

7.
a) Consider the following context free grammar where S is the start symbol and the terminals
are a, ( )
S -> ( )
S -> a
S -> ( A )
A -> S
A -> A , S

Show precisely why this grammar is not LL (1). Rewrite this grammar to make it
suitable for recursive descent parsing.
5
b) Discuss the importance of symbol table in compiler design. How is the symbol table
manipulated at various phase of compilation?
Ans) A symbol table is a data structure used by a compiler to keep track of the scope, life and
binding information about names. These names are used to identify the various program elements
like variables, constants, procedures and the labels of statements. The symbol table is searched
every time a name is encountered in the source text. When a new name or new information about
an existing name is discovered, the content of the symbol table changes.

Exactly what information is stored in the symbol table depends on many things. The programming
language will determine much of the information that is stored, but the target architecture will also
influence what data is stored. In fact some assumptions about how to produce code can affect what
values are stored in the table. Different information will need to be stored for constants, variables,
procedures, enumerations, type definitions and so on. What follows is a description of various
common declarative language constructs and typical classes of information symbol table would
record for those constructs.
CONSTANTS:
Constants are identifiers that represent a fixed value- one that can never be changed. Since
programmers will wish to access these values by name, the name must be stored. Finally, since the
values must be used properly in the type system, type information is also included. No run time
location needs to be stored for constants. These are typically stored right into the code stream by
the compiler at compilation time.

VARIABLES:

Variables are identifiers whose value may change between executions and during a single
execution of a program. They represent the contents of some memory location. The symbol table
needs to record the variable's name as well as its allocated storage space at runtime. Typically this
location is stored as an offset relative to some position.
TYPES (user defined):
A user defined type is typically a conglomeration of 1 or more existing types. Types are accessed
by name and reference a type definition structure. Each structure will record important information
about itself, like its size, the name of its members or its upper or lower bounds. What information
is stored will depend on what type is being defined.
SUBPROGRAMS:
Procedures, functions and methods are named segments of code. Naturally, the symbol table
should record a procedure's name. The type it returns, if any, should be noted. When subprograms
are accessed at run time it is typically by their location in the code stream, thus the location of the
code generated for a given procedure should also be recorded. The formal parameters and local
variables of a function are separate identifiers in their own right, and should be stored in separate
records. Thus, they are treated much like the fields of a user defined record. They are stored as a
list of variable records separate from, but accessible from, the main procedure record.
CLASSES:
Classes are abstract data types which restrict access to their members and provide convenient
language level polymorphism. They are really a special case of user defined types, and are
structurally no different. But it may be convenient to store information about classes above and
beyond that required for other user defined types. This includes the location of the default
constructor and destructor, and the address of the virtual function table.
INHERITANCE:
There may be different ways to perform inheritance, and a symbol table record is needed to keep
track of which classes are being inherited and exactly how inheritance is performed. A compiler
might consider whether shared or non shared inheritance is performed. It might also
consider whether the keywords public, private and protected modify the visibility of inherited items;
this can be recorded with the inheritance information. A reference to the participating classes
could also be recorded in an inheritance structure.

ARRAYS:
Arrays represent a collection of uniformly typed elements that may be randomly accessed by
index. For each dimension of an array, the compiler will need to know about such things as the
lower boundary of the array i.e. the lowest valid index, the upper boundary i.e. the largest valid
index, the index size, index type, the total size and the type of the elements contained. When many
different types can be used to index an array, the index type and the size will also be recorded.
Finally, the total amount of space to be allocated for each dimension of an array should be stored.
RECORDS:
Records represent a collection of possibly heterogeneous members which can be accessed by
name. The symbol table probably needs to record each of the record's members. The compiler
also needs to know the size of the record, i.e. how much space to allocate for all the members. Each of
the fields of the record will probably be a reference to another symbol table record like a variable
or a type, which may in turn reference another record or array.

CLASS:
Just like a record, the fields of a class can be conveniently stored in a separate record. Classes will
also store their methods, constructors, destructors and virtual function table in this complex
information structure.
MODULE:
Stores the module size, its name, parent, its members and a time stamp. The time stamp is used to
guarantee at load time that the modules have been compiled in the correct order or are all up to
date.

8.
a) Find the FIRST and FOLLOW sets for each of the non-terminals in the following
grammar (in the grammar below ε denotes epsilon, the empty string).
A -> aBa
B -> bCb / bcD
C -> cCc / ε
D -> Deb / ε

b) Differentiate between syntax directed definition and syntax directed translation scheme?
Ans)
A syntax directed definition generalizes a context free grammar by associating a set of attributes with
each node in a parse tree. Each attribute gives some information about the node, i.e. a syntax
directed definition is a generalization of a CFG in which each grammar symbol has an associated
set of attributes partitioned into two subsets called the synthesized and inherited attributes of that
grammar symbol.
The value of an attribute at a parse tree node is defined by the semantic rule associated
with the production used at that node. Semantic rules set up dependencies between attributes that
can be represented by a dependency graph.
A syntax directed translation scheme indicates the order in which semantic rules
are to be evaluated, so it allows some implementation details to be shown. Syntax directed
translation is a method of translating a string into a sequence of actions by attaching one such
action to each rule of a grammar. Thus, parsing a string of the grammar produces a sequence of
rule applications, and syntax directed translation provides a simple way to attach semantics to any
such syntax.
Syntax directed translation refers to a method of compiler implementation where the
source language translation is completely driven by the parser. The parsing process and parse trees
are used to direct semantic analysis and translation of the source program. This can be a separate
phase of the compiler or we can augment our conventional grammar with information to control
the semantic analysis and translations. Such grammars are called attributed grammars.

c) Explain, why it is possible to design an independent lexical analyzer?


Ans)

FOURTH SEMESTER EXAMINATION-2011

COMPILER DESIGN
1. Answer the following:

2* 10

a. Explain why is it possible to design an independent lexical analyzer?


b. Define and differentiate between compile time error and runtime error?
c. Explain the machine dependent and machine independent code optimization?
Ans: code optimization is of two types:
Machine dependent code optimization:

In this code optimization we require the knowledge of the target machine architecture i.e. the
register, addressing mode, clock speed etc.
Machine Independent code optimization:
This optimization can be performed independently of the target machine. These are the program
transformations that improve the target code without taking into consideration any properties of
the target machine.
d. Explain the difference between Bottom-up and Top-down parsing?
Ans: Bottom-up parsing is a process of reducing an input string say W to the start symbol of the
grammar by tracing out the right most derivation (RMD) of W in reverse order.
Bottom-up parsing involves the selection of a substring that matches the right side of the
production, whose reduction to the non-terminal on the left side of the production represents one
step along the reverse of a right most derivation.
Basically top-down parsing attempts to find the left most derivations for the input string
W, since string W can be scanned by the parser left to right, one symbol/token at a time and the
left most derivations generates the leaves of the parse tree in the left to right order, which matches
the input scan order.
e. What are the drawbacks of SLR (1) parser?
Ans: The SLR (1) parsing table decides reductions using FOLLOW sets only, so for many grammars
it contains multiple entries. The parser is then in a nondeterministic situation, which is not allowed.
So it becomes clear that SLR is the least powerful LR parser, since SLR (1) grammars constitute a
small subset of context free grammars.
f. What do you means by porting of a compiler?
Ans: Porting is the process of moving code from one platform to another while making sure that
it also works on the target platform.
High level languages are designed to be portable, that is, programs written in a high level
language can be run on any computer that has a compiler or interpreter for that particular
language.
Porting a compiler means moving the compiler itself to a new machine. This is easier when the
compiler is modular and supports separate compilation.
g. Describe the structure of LL parser?

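Ans:
Input buffer: X + Y $
Stack (top first): S X Y Z $
Parser program: reads the input buffer and the stack and consults the parse table.
Parse table
The main constituents of an LL parser are a stack which holds grammar symbols, an
input buffer that contains the input string, a parse table and the parser program that drives them.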

h. Describe the various data structures used to create a symbol table?


Ans: The various data structures used to create a symbol table are:
Unordered List: An unordered list enters each name sequentially as it is declared.
Ordered List: In an ordered list the names are kept in character (alphabetical) order.
Binary Tree: A binary tree combines the fast search time of an ordered array, O (log n)
time on the average, with the insertion ease of a linked list.
Hash Table: From the efficiency point of view hash tables are the best method. Hashing
consists of finding a numerical value for the identifier, perhaps some
combination of its ASCII codes taken as a number or even its bit code, and then applying
one of the techniques used for hashing numbers.
Stack: A stack can also be used for a symbol table, with a pointer kept to the top
of the stack for each block. In this data structure, names are pushed onto the stack as
they are encountered. When a block is completed, that portion of the stack and the pointer
to it are removed, so that the names of the containing block are current again.

i. Distinguish between syntax and semantics of a programming language? Explain which


parts of a compiler are primarily concerned with each?
Ans: The syntax of a programming language is the form of its expressions, statements and program
modules.
The semantics of a programming language is the meaning given to the various syntactic
structures.
The front end of the compiler is primarily concerned with both: the lexical and syntax analyzers
deal with the syntax, and the semantic analyzer deals with the semantics of the programming language.
j. What is the major functioning of the five main stages of a compiler?
Ans:

LEXICAL ANALYZER: This module has the task of separating the continuous string of
characters into distinctive groups that make sense. Such a group is called a token. A token may
be composed of a single character or a sequence of characters. This sequence of characters is
called a lexeme.
SYNTAX ANALYZER: This is the module in which the overall structure is identified. It
involves an understanding of the order in which the symbols in a program may appear. In the
process of analyzing each sentence, the parser builds a tree structure, i.e. the parser
generates a parse tree.
SEMANTIC ANALYZER: The semantic analyzer gathers the type information and checks
the tree produced by the syntax analyzer for semantic errors. This phase also generates a
tree called the annotated tree.
INTERMEDIATE CODE GENERATION: After passing through the above three phases the
source program passes through intermediate code generation, where it is converted into
a compact form using one of the following three methods:
Three Address Code
Quadruple
Postfix notation
CODE OPTIMIZATION: It is an optional phase which improves the intermediate code for
faster execution and effective memory utilization. If the code is already optimized then no
further optimization is required.
TARGET CODE GENERATION: The final phase of the compiler is the generation of the
target code, consisting normally of relocatable machine code or assembly code. Memory
locations are selected for each of the variables used in the program.

2.
a) For the following grammar, find the FIRST and FOLLOW sets of each of the non-terminals:
S -> aAB / bA / ε
A -> aAb / ε
B -> bB / c

Ans)
FIRST (S) = {a, b, ε}
FIRST (A) = {a, ε}
FIRST (B) = {b, c}
FOLLOW (S) = {$}
S -> aAB is in the form αXβ with X = A and β = B, so
FOLLOW (A) includes FIRST (B) = {b, c}
A -> aAb is in the form αXβ with X = A and β = b, so
FOLLOW (A) includes FIRST (b) = {b}
S -> bA has A at the right end, so
FOLLOW (A) includes FOLLOW (S) = {$}
S -> aAB has B at the right end, so
FOLLOW (B) = FOLLOW (S) = {$}
So finally, after combining, the FOLLOW sets of the non-terminals are:
FOLLOW (S) = {$}
FOLLOW (A) = {$, b, c}
FOLLOW (B) = {$}
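The FIRST and FOLLOW sets can be checked with a small fixed-point computation. This is a sketch under stated assumptions: the grammar is a dictionary from each non-terminal to a list of right sides, every right side is a list of symbols, the empty list stands for ε, and any symbol that is not a key of the dictionary is a terminal. The function names are illustrative.

```python
EPS = "ε"


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given the FIRST sets so far."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})    # FIRST of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                          # every symbol can derive epsilon
    return result


def first_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:                           # repeat until nothing new is added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new_first = first_of_sequence(body, first)
                if not new_first <= first[head]:
                    first[head] |= new_first
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in grammar:   # FOLLOW is for non-terminals only
                        continue
                    tail = first_of_sequence(body[i + 1:], first)
                    new_follow = tail - {EPS}
                    if EPS in tail:          # the rest can vanish
                        new_follow |= follow[head]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return first, follow


# S -> aAB / bA / ε,  A -> aAb / ε,  B -> bB / c
grammar = {
    "S": [["a", "A", "B"], ["b", "A"], []],
    "A": [["a", "A", "b"], []],
    "B": [["b", "B"], ["c"]],
}
first, follow = first_follow(grammar, "S")
```

For this grammar the computation gives FIRST (B) = {b, c} and FOLLOW (A) = {$, b, c}.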
b) Differentiate between syntax directed definition and syntax directed translation scheme?
Ans)
A syntax directed definition generalizes a context free grammar by associating a set of attributes with
each node in a parse tree. Each attribute gives some information about the node, i.e. a syntax
directed definition is a generalization of a CFG in which each grammar symbol has an associated
set of attributes partitioned into two subsets called the synthesized and inherited attributes of that
grammar symbol.
The value of an attribute at a parse tree node is defined by the semantic rule associated
with the production used at that node. Semantic rules set up dependencies between attributes that
can be represented by a dependency graph.
A syntax directed translation scheme indicates the order in which semantic rules are to be
evaluated, so it allows some implementation details to be shown. Syntax directed translation is a
method of translating a string into a sequence of actions by attaching one such action to each rule
of a grammar. Thus, parsing a string of the grammar produces a sequence of rule applications, and
syntax directed translation provides a simple way to attach semantics to any such syntax.
Syntax directed translation refers to a method of compiler implementation where the
source language translation is completely driven by the parser. The parsing process and parse trees
are used to direct semantic analysis and translation of the source program. This can be a separate
phase of the compiler or we can augment our conventional grammar with information to control
the semantic analysis and translations. Such grammars are called attributed grammars.

c) Test whether the following grammar is LL (1)?


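S -> aAb
A -> cd / ef
Ans)
S -> aAb
A -> cd / ef
FIRST (S) = {a}
FIRST (A) = {c, e}
FOLLOW (S) = {$}
S -> aAb is in the form αXβ with X = A and β = b, so
FOLLOW (A) = FIRST (b) = {b}

PREDICTIVE PARSER TABLE:

      a            c            e
S     S -> aAb
A                  A -> cd      A -> ef

The columns for b, d, f and $ have no entries.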

As there are no multiple entries in the parsing table, so this grammar is a LL (1) grammar.
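Because every table entry holds at most one production, the table can drive a predictive parser directly. The sketch below is an illustration: TABLE is the predictive parsing table for S -> aAb, A -> cd / ef written as a dictionary, each input character is treated as one token, and the function name is an assumption.

```python
TABLE = {
    ("S", "a"): ["a", "A", "b"],
    ("A", "c"): ["c", "d"],
    ("A", "e"): ["e", "f"],
}
NONTERMINALS = {"S", "A"}


def ll1_parse(text):
    """Return True if text is derived from S, using the table above."""
    tokens = list(text) + ["$"]
    stack = ["$", "S"]                 # the start symbol sits on top of $
    pos = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            body = TABLE.get((top, tokens[pos]))
            if body is None:           # empty table entry: syntax error
                return False
            stack.extend(reversed(body))   # leftmost symbol ends up on top
        elif top == tokens[pos]:
            pos += 1                   # match a terminal (or the final $)
        else:
            return False
    return pos == len(tokens)


results = {s: ll1_parse(s) for s in ("acdb", "aefb", "acb", "")}
```

The strings acdb and aefb are accepted; acb fails when d is expected, and the empty string fails because the entry for S on $ is empty.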

d) Explain the concept of boot strapping in compiler design process?


Ans) A compiler is a complex enough program that we would like to write it in a friendlier
language than assembly language. Even C compilers are written in C. Using the facilities offered
by a language to compile itself is the essence of bootstrapping. For bootstrapping purposes, a
compiler is characterized by three languages: the source language S that it compiles, the target
language T that it generates code for, and the implementation language I that it is written in. We
represent the three languages using a diagram called a T-diagram, because of its shape. The three
languages S, I, and T may all be quite different. For example, a compiler may run on one machine
and produce target code for another machine. Such a compiler is often called a cross-compiler.
Suppose we write a cross-compiler for a new language L in implementation language S to
generate code for machine N; that is, we create LSN (source L, implementation S, target N). If an
existing compiler for S runs on machine M and generates code for M, it is characterized by SMM.
If LSN is run through SMM, we get a compiler LMN, that is, a compiler from L to N that runs on M.
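The composition of T-diagrams described above can be modelled in a few lines. This is only a sketch: the Compiler tuple and the run_through function are hypothetical names, and a compiler is reduced to its three characterizing languages.

```python
from collections import namedtuple

# A compiler is characterized by source, implementation and target language.
Compiler = namedtuple("Compiler", "source implementation target")


def run_through(program, compiler):
    """Compile `program` (itself a compiler) with `compiler`.

    The program must be written in the language that `compiler` accepts;
    the result does the same translation but is implemented in the
    target language of `compiler`.
    """
    if program.implementation != compiler.source:
        raise ValueError("compiler cannot read the program's implementation language")
    return Compiler(program.source, compiler.target, program.target)


lsn = Compiler("L", "S", "N")     # cross-compiler for L, written in S, targets N
smm = Compiler("S", "M", "M")     # existing compiler for S that runs on M
lmn = run_through(lsn, smm)
```

The result is Compiler(source='L', implementation='M', target='N'), the compiler from L to N that runs on M.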
3.
a) Use T-diagrams to describe the steps you would take to create a powerful compiler using a
quick and dirty compiler?
b) Define and discuss the objectives of SDTS. What do you mean by underlying source
grammar? Explain with an example.

Ans) Syntax Directed Translation Schemes describe the order and timing of attribute
computation. A syntax directed translation scheme:
Embeds the semantic rules into the grammar
Allows each semantic rule to use only information computed by already executed semantic
rules
A translation scheme is a convenient way of describing an L-attributed definition.
It annotates each production of the CFG according to the following rules:
1. If there is a production of the form X → AB and X.i, A.i, B.i are the inherited
attributes of X, A, B respectively then:
A.i = f(X.i)
B.i = g(X.i, A.i)
2. If X.s, A.s, B.s are the synthesized attributes then: X.s = f(A.s, B.s)
3. If there is a production X → ε then
X.s = X.i
Such attributes are independent of any successors.
4. The definitions must be written on the right side of the production, enclosed in
braces, like:
X → {X.i Rule} A {B.i Rule} B {X.s Rule}
5. If there is no inherited attribute definition then the synthesized definition is
sufficient.

Two main issues of Syntax Directed Translation Schemes are:


Triggering execution of the semantic rules
Managing and accessing attributes value

The underlying source grammar is the context-free grammar of the source language to
which all the attributes and semantic rules are attached.
c) Construct the DAG for the following statement
Z=XY+X*Y*UV/W+X+V
Ans)
t1=X
t2=t1Y
t 3 = t 2 *Y
t4=t3*U
t5=V
t6=t5/W
t7=t4t6
t 8 = t1 + V
t9=t7+t8

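[Figure: DAG for the statement, with root t9 (+) over t7 and t8 and the nodes t1 through t6 below them, as listed in the assignments above.]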

4.
a) Describe the contents of a symbol table. How is the symbol table involved in the
interactions between the different components of the compiler and in the error detection?
Give a simple example in each case.

Ans) Exactly what information is stored in the symbol table depends on many things. The
programming language will determine much of the information that is stored, but the target
architecture will also influence what data is stored. In fact some assumptions about how to
produce code can affect what values are stored in the table. Different information will need
to be stored for constants, variables, procedures, enumerations, type definitions and so on.
What follows is a description of various common declarative language constructs and the typical
classes of information a symbol table would record for those constructs.
CONSTANTS:

Constants are identifiers that represent a fixed value, one that can never be changed. Since
programmers will wish to access these values by name, the name must be stored along with the
value. Finally, since the values must be used properly in the type system, type information is also
included. No run time location needs to be stored for constants. Their values are typically placed
directly into the code stream by the compiler at compilation time.
VARIABLES:
Variables are identifiers whose value may change between executions and during a single
execution of a program. They represent the contents of some memory location. The symbol
table needs to record the variable's name as well as its allocated storage space at runtime.
Typically this location is stored as an offset relative to some base position.
TYPES (user defined):
A user defined type is typically a conglomeration of 1 or more existing types. Types are
accessed by name and reference a type definition structure. Each structure will record
important information about itself, like its size, the name of its members or its upper or
lower bounds. What information is stored will depend on what type is being defined.
SUBPROGRAMS:
Procedures, functions and methods are named segments of code. Naturally, the symbol table
should record a procedure's name. The type it returns, if any, should be noted. When
subprograms are accessed at run time it is typically by their location in the code stream, thus
the location of the code generated for a given procedure should also be recorded. The formal
parameters and local variables of a function are separate identifiers in their own right, and
should be stored in separate records. Thus, they are treated much like the fields of a user
defined record. They are stored as a list of variable records separate from, but accessible from,
the main procedure record.
CLASSES:
Classes are abstract data types which restrict access to their members and provide convenient
language level polymorphism. They are really a special case of user defined types, and are
structurally no different. But it may be convenient to store information about classes above
and beyond that required for other user defined types. This includes the location of the
default constructor and destructor, and the address of the virtual function table.
INHERITANCE:
There may be different ways to perform inheritance, and a symbol table record is needed to
keep track of which classes are being inherited and exactly how inheritance is performed. A
compiler might consider whether shared or non-shared inheritance is performed. It
might also consider whether the keywords public, private and protected modify the
visibility of inherited items; this may be recorded with the inheritance information. A
reference to the participating classes could also be recorded in an inheritance structure.
ARRAYS:
Arrays represent a collection of uniformly typed elements that may be randomly accessed by
index. For each dimension of an array, the compiler will need to know about such things as
the lower boundary of the array, i.e., the lowest valid index, the upper boundary, i.e., the largest
valid index, the index size, index type, the total size and the type of the elements contained.
When many different types can be used to index an array, the index type and the size will
also be recorded. Finally, the total amount of space to be allocated for each dimension of an
array should be stored.
RECORDS:
Records represent a collection of possibly heterogeneous members which can be accessed by
name. The symbol table probably needs to record each of the record's members. The
compiler also needs to know the size of the record, that is, how much space to allocate for all the
members. Each of the fields of the record will probably be a reference to another symbol
table record like a variable or a type, which may in turn reference another record or array.
CLASS:
Just like a record, the fields of a class can be conveniently stored in a separate record.
Classes will also store their methods, constructors, destructors and virtual function table in
this complex information structure.
MODULE:
Stores the module size, its name, parent, its members and a time stamp. The time stamp is
used to guarantee at load time that the modules have been compiled in the correct order and
are all up to date.

b) Explain the machine dependent and machine independent code optimization. What are
their advantages?
Ans) Machine independent optimizations can be performed independently of the target
machine for which the compiler is generating code; that is, the optimizations are not tied to
the target machine's specific language or platform. Examples of machine independent
optimizations are: elimination of loop invariant computation, induction variable elimination
and elimination of common subexpressions.
Machine dependent optimization requires knowledge of the target machine.
An attempt to generate object code that will utilize the target machine's registers more
efficiently is an example of machine dependent code optimization.

Advantages are:

5.
a) Explain the working principle of operator precedence parsing algorithm. Explain the
parsing action for the input string id 1 id 2 / id 3 * id 4 id 5 id 1 with reference to the
operator precedence relation table given below:

*
/

id

>
>
>
>
>

*
<
>
>
>
>

/
<
<
>
>
>

<
<
<
<
>

id
<
<
<
<

$
>
>
>
>
>

b) What information is recorded in the symbol table of a compiler for a block structured
language? Give examples of how this information is created and/or used at each stage of
compilation.
Ans)
Symbol table is a scratch pad where the compiler stores the information about the objects
in the program such as variables, functions and procedures.
It enables the compiler to do type checking and determine the scope of a variable.
There is no type compatibility constraint or scoping rules at run time.
No type error will occur when the program runs
A type system is said to be strongly typed if it passes only type safe programs
A language is strongly typed if its compiler is strongly typed
Scope rules of a language are used for specifying which declaration of a variable is
associated with a specific occurrence of the variable
Scope rules apply to variables, constants, new type definitions and functions
A set of statements enclosed within blocking symbols (BEGIN and END, { and }, etc.)
is called a block (compound statement)
Blocks nest inside other blocks
Blocks are either disjoint or nested
A block-structured language allows procedures/functions to nest within other
procedures/functions

6.
a) Construct LL ( 1 ) parsing table for the following grammar:

S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε

Ans)

S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε

FIRST (F) = {f, ε}
FIRST (E) = {g, ε}
FIRST (D) = (FIRST (E) - {ε}) U FIRST (F)
= ({g, ε} - {ε}) U FIRST (F)
= {g} U {f, ε}
= {g, f, ε}
FIRST (C) = {b, ε}
FIRST (B) = {c}
FIRST (S) = {a}


FOLLOW

FOLLOW (S) = {$}
S → aBDh is of the form A → αBβ with β = Dh, so
FOLLOW (B) = FIRST (Dh)
= (FIRST (D) - {ε}) U FIRST (h)
= {g, f} U {h}
= {g, f, h}
S → aBDh is of the form A → αBβ with β = h, so
FOLLOW (D) = FIRST (h)
= {h}
B → cC is of the form A → αB, so
FOLLOW (C) = FOLLOW (B) = {g, f, h}
C → bC is of the form A → αB, so
FOLLOW (C) = {g, f, h}
D → EF is of the form A → αBβ with β = F, and FIRST (F) contains ε, so
FOLLOW (E) = (FIRST (F) - {ε}) U FOLLOW (D) = {f, h}
D → EF is of the form A → αB, so
FOLLOW (F) = FOLLOW (D) = {h}

LL (1) PARSING TABLE

      a           b          c          f          g          h          $
S     S → aBDh
B                            B → cC
C                 C → bC                C → ε      C → ε      C → ε
D                                       D → EF     D → EF     D → EF
E                                       E → ε      E → g      E → ε
F                                       F → f                 F → ε
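
The FIRST, FOLLOW and table construction for this grammar can be checked with a short Python sketch. It is a minimal illustration under stated assumptions: productions are written as lists of symbols, an empty list stands for an ε-production, and the function names (first_of_string, compute_first, compute_follow, build_table) are hypothetical.

```python
EPS = "ε"
GRAMMAR = {                      # bodies are lists of symbols; [] is an ε-production
    "S": [["a", "B", "D", "h"]],
    "B": [["c", "C"]],
    "C": [["b", "C"], []],
    "D": [["E", "F"]],
    "E": [["g"], []],
    "F": [["f"], []],
}
START = "S"

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)              # every symbol was nullable (or the string is empty)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:               # iterate until no FIRST set grows
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:               # iterate until no FOLLOW set grows
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def build_table(grammar, first, follow):
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            body_first = first_of_string(body, first)
            lookaheads = body_first - {EPS}
            if EPS in body_first:
                lookaheads |= follow[head]
            for terminal in lookaheads:
                table.setdefault((head, terminal), []).append(body)
    return table

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, first, START)
table = build_table(GRAMMAR, first, follow)
is_ll1 = all(len(entries) == 1 for entries in table.values())
print(first, follow, is_ll1, sep="\n")
```

A cell holding more than one production would make is_ll1 false; for this grammar every cell holds at most one production.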

b) Explain how the scope rules and the block structure of a programming language decide
the structure of the symbol table?
Ans) Scoping is one of the applications of the symbol table. There are two types of scopes:
Local
Global
Symbol table decides whether the particular symbol is local or global
For example:
int X;
main()
{
    int Y;
    add();
}
add()
{
    int Z;
}
{ } indicates the block, i.e., the life span of a variable is limited to that block.
A symbol table is created for each individual block.
The global value is stored in the top block so that it can be accessed by all the blocks.

[Figure: top block holding X together with the addresses of the nodes for the other blocks]

Leaf nodes of every block store the address of the next block.

void main()
{
    int x;
    cout << "enter x";
    cin >> x;
    {
        int y;
        cout << "enter y";
        cin >> y;
        {
            int z;
            cout << "enter z";
            cin >> z;
        }
    }
}

Block         Names visible from enclosing blocks   Names declared in the block
MAIN BLOCK                                          x
BLOCK 1       x                                     y
BLOCK 2       x, y                                  z
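
The per-block tables above can be sketched in Python. This is one possible organisation, not necessarily the one the answer has in mind: it assumes each block's table keeps a link to the table of its enclosing block, and the Scope class with its declare and lookup methods is hypothetical.

```python
class Scope:
    """One symbol table per block, linked to the table of the enclosing block."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = {}            # identifier -> type name

    def declare(self, identifier, type_name):
        if identifier in self.symbols:
            raise KeyError(f"{identifier} already declared in {self.name}")
        self.symbols[identifier] = type_name

    def lookup(self, identifier):
        """Search this block first, then the enclosing blocks (innermost wins)."""
        scope = self
        while scope is not None:
            if identifier in scope.symbols:
                return scope.name, scope.symbols[identifier]
            scope = scope.parent
        return None

main_block = Scope("MAIN BLOCK")
main_block.declare("x", "int")
block1 = Scope("BLOCK 1", parent=main_block)
block1.declare("y", "int")
block2 = Scope("BLOCK 2", parent=block1)
block2.declare("z", "int")

print(block2.lookup("x"))      # found by walking outwards to the main block
print(main_block.lookup("z"))  # z is not visible outside BLOCK 2
```

A lookup that reaches the outermost block without success corresponds to an undeclared identifier, which is one of the errors the symbol table helps detect.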

7.
a) Construct the SLR parsing table for the following grammar:
E → E + T
E → T
T → T * F
T → F
F → id
L → L , E | E
Ans) Augmented Grammar:
E' → E          (0)
E → E + T       (1)
E → T           (2)
T → T * F       (3)
T → F           (4)
F → id          (5)
L → L , E       (6)
L → E           (7)

Item Set I0:
E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .id
L → .L , E
L → .E

In item set (0) the symbols to be processed are E, T, F, id, L

PROCESS E
I1 = GOTO (I0, E)
I1:
E' → E.
E → E. + T
L → E.

PROCESS T
I2 = GOTO (I0, T)
I2:
E → T.
T → T. * F

PROCESS F
I3 = GOTO (I0, F)
I3:
T → F.

PROCESS id
I4 = GOTO (I0, id)
I4:
F → id.

PROCESS L
I5 = GOTO (I0, L)
I5:
L → L. , E

In item set (1) the symbol to be processed is +

PROCESS +
I6 = GOTO (I1, +)
I6:
E → E + .T
T → .T * F
T → .F
F → .id

In item set (2) the symbol to be processed is *

PROCESS *
I7 = GOTO (I2, *)
I7:
T → T * .F
F → .id

In item set (3) no symbols are to be processed
In item set (4) no symbols are to be processed
In item set (5) the symbol to be processed is ,

PROCESS ,
I8 = GOTO (I5, ,)
I8:
L → L , .E
E → .E + T
E → .T

In item set (6) the symbols to be processed are T, F, id

PROCESS T
I9 = GOTO (I6, T)
I9:
E → E + T.

PROCESS F
Already processed in I3

PROCESS id
Already processed in I4

In item set (7) the symbols to be processed are F, id

PROCESS F
I10 = GOTO (I7, F)
I10:
T → T * F.

PROCESS id
Already processed in I4

In item set (8) the symbols to be processed are E, T

PROCESS E
I11 = GOTO (I8, E)
I11:
L → L , E.

PROCESS T
Already processed in I2

In item set (9) no symbols are to be processed
In item set (10) no symbols are to be processed
In item set (11) no symbols are to be processed

        ACTION                                      GOTO
State   +       *       ,     id    $          E    T    F    L
I0                            S4               1    2    3    5
I1      S6/R7   R7      R7    R7    accepted
I2      R2      S7/R2   R2    R2    R2
I3      R4      R4      R4    R4    R4
I4      R5      R5      R5    R5    R5
I5                      S8
I6                            S4                    9    3
I7                            S4                         10
I8                                             11   2
I9      R1      R1      R1    R1    R1
I10     R3      R3      R3    R3    R3
I11     R6      R6      R6    R6    R6

b) What is the objective of intermediate code generation? What is the different form of
intermediate code generated by intermediate code generation phase?
Ans) The objectives of intermediate code generation are:

Ease of re-targeting to different machines.
Perform machine independent code optimization.
Create a linear representation of a program.
An intermediate representation spans the gap between the source and target
languages.
Implementable via syntax directed translation, so it can be folded into the parsing
process.
The different forms of intermediate code generated by the intermediate code generation
phase are:

Syntax trees
Postfix notation
Three address code
Quadruples
Triples
Indirect triples

8.
a) What is the objective of intermediate code generation? Generate the three address code
for the following code segment:

Main ( )
{
int a = 1;
int b[10];
while (a<= 10)
b[a] = 2 ** a;
}
Ans)

Three Address Code:

(1) a = 1
(2) t1 = 10 * 4
(3) t2 = addr(b) - 4
(4) t3 = t2[t1]
(5) if a <= 10 goto (7)
(6) goto (13)
(7) t4 = a * 4
(8) t5 = addr(b) - 4
(9) t6 = t5[t4]
(10) t7 = 2 ** a
(11) t6 = t7
(12) goto (5)
(13) exit

b) Find the canonical collection of sets of LR (1) items:

S → AaAb
S → BbBa
A → ε
B → ε
Ans) Augmented Grammar:
S' → S          (0)
S → AaAb        (1)
S → BbBa        (2)
A → ε           (3)
B → ε           (4)

I0:
S' → .S, $
S → .AaAb, $
S → .BbBa, $
A → ., a
B → ., b

For an item [A → α.Bβ, a], each closure item for B gets the lookahead FIRST (βa).
For S' → .S, $ this gives FIRST ($) = {$}; for S → .AaAb, $ it gives FIRST (aAb$) = {a};
for S → .BbBa, $ it gives FIRST (bBa$) = {b}.

In item set (0) the symbols to be processed are S, A, B

PROCESS S
I1 = GOTO (I0, S)
I1: S' → S., $

PROCESS A
I2 = GOTO (I0, A)
I2: S → A.aAb, $

PROCESS B
I3 = GOTO (I0, B)
I3: S → B.bBa, $

In item set (1) no symbols are to be processed
In item set (2) the symbol to be processed is a

PROCESS a
I4 = GOTO (I2, a)
I4: S → Aa.Ab, $
A → ., b
Here β = b and a = $, so the lookahead is FIRST (b$) = {b}.

In item set (3) the symbol to be processed is b

PROCESS b
I5 = GOTO (I3, b)
I5: S → Bb.Ba, $
B → ., a
Here β = a and a = $, so the lookahead is FIRST (a$) = {a}.

In item set (4) the symbol to be processed is A

PROCESS A
I6 = GOTO (I4, A)
I6: S → AaA.b, $

In item set (5) the symbol to be processed is B

PROCESS B
I7 = GOTO (I5, B)
I7: S → BbB.a, $

In item set (6) the symbol to be processed is b

PROCESS b
I8 = GOTO (I6, b)
I8: S → AaAb., $

In item set (7) the symbol to be processed is a

PROCESS a
I9 = GOTO (I7, a)
I9: S → BbBa., $

LR (1) PARSING TABLE

        ACTION                      GOTO
State   a      b      $             S    A    B
I0      R3     R4                   1    2    3
I1                    Accepted
I2      S4
I3             S5
I4             R3                        6
I5      R4                                    7
I6             S8
I7      S9
I8                    R1
I9                    R2

c) Write the quadruples, triples and indirect triples for the following expression:
X[i] := Y
X:= Y[i]
Ans)
QUADRUPLES
X[i] := Y
t1 = X[i]
t2 = Y
t1 = t2

       Operator   Operand 1   Operand 2   Result
(0)    []         X           i           t1
(1)    =          Y                       t2
(2)    =          t2                      t1

X := Y[i]
t1 = Y[i]
t2 = X
t2 = t1

       Operator   Operand 1   Operand 2   Result
(0)    []         Y           i           t1
(1)    =          X                       t2
(2)    =          t1                      t2

TRIPLES
X[i] := Y

       Operator   Operand 1   Operand 2
(0)    []         X           i
(1)    =          (0)         Y

X := Y[i]

       Operator   Operand 1   Operand 2
(0)    []         Y           i
(1)    =          X           (0)

INDIRECT TRIPLES
X[i] := Y

       POINTER
(0)    (100)
(1)    (200)

         Operator   Operand 1   Operand 2
(100)    []         X           i
(200)    =          (100)       Y

X := Y[i]

       POINTER
(0)    (100)
(1)    (200)

         Operator   Operand 1   Operand 2
(100)    []         Y           i
(200)    =          X           (100)

1. What is a compiler?
A compiler is a program that reads a program written in one language (the source language) and translates it into
an equivalent program in another language (the target language). The compiler reports to its user the presence of
errors in the source program.
2. What are the two parts of a compilation? Explain briefly.
Analysis and Synthesis are the two parts of compilation.
The analysis part breaks up the source program into constituent pieces and creates an intermediate
representation of the source program.
The synthesis part constructs the desired target program from the intermediate representation.

3. List the subparts or phases of analysis part.


Analysis consists of three phases:

Linear Analysis.
Hierarchical Analysis.
Semantic Analysis.

4. Depict diagrammatically how a language is processed.


5. What is linear analysis?
Linear analysis is one in which the stream of characters making up the source program is read from left to right
and grouped into tokens that are sequences of characters having a collective meaning.
Also called lexical analysis or scanning.
6. Find the no. of tokens in the following code segment.
float fun(char *s)
/*

Find a zero */

{
if(!strcmp(s,0))
return 0;
}
Ans-No of tokens is 22
7. Find the no. of tokens in the following code segments.
(a) printf("i=%d,&i=%d",i,&i);
(b) int max(i,j)
int i,j;
{return (i>j?i:j);}
Ans-(a)10
(b)25
8. What is a symbol table?

A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the
identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data
from that record quickly.
Whenever an identifier is detected by a lexical analyzer, it is entered into the symbol table. The attributes of an
identifier cannot be determined by the lexical analyzer.
9. Mention some of the cousins of a compiler.
Cousins of the compiler are:
Preprocessors
Assemblers
Loaders and Link-Editors
10. List the phases that constitute the front end of a compiler.
The front end consists of those phases or parts of phases that depend primarily on the source language and are
largely independent of the target machine. These include
Lexical and Syntactic analysis
The creation of symbol table
Semantic analysis
Generation of intermediate code
A certain amount of code optimization can be done by the front end as well. Also includes error handling that
goes along with each of these phases.
11. Mention the back-end phases of a compiler.
The back end of compiler includes those portions that depend on the target machine and generally those
portions do not depend on the source language, just the intermediate language. These include
(i)Code optimization
(ii)Code generation, along with error handling and symbol- table operations.
12. Define compiler-compiler.
Systems that help with the compiler-writing process are often referred to as compiler-compilers, compiler-generators or translator-writing systems.
They are largely oriented around a particular model of languages, and they are most suitable for generating
compilers for languages that fit a similar model.

13. List the various compiler construction tools.


The following is a list of some compiler construction tools:
Parser generators
Scanner generators
Syntax-directed translation engines
Automatic code generators
Data-flow engines
14. Differentiate tokens, patterns, lexeme.
Tokens- Sequence of characters that have a collective meaning.(group of characters with
logical meaning).
Patterns- There is a set of strings in the input for which the same token is produced as output.
This set of strings is described by a rule called a pattern associated with the token(rule for
group of characters to form tokens).
Lexeme- A sequence of characters in the source program that is matched by the pattern for a
token.(Actual character stream that represent the token).
15. List the operations on languages.
Union L U M ={s | s is in L or s is in M}
Concatenation LM ={st | s is in L and t is in M}
Kleene Closure L* (zero or more concatenations of L)
Positive Closure L+ ( one or more concatenations of L)
16. Write a regular expression for an identifier.
An identifier is defined as a letter followed by zero or more letters or digits.
The regular expression for an identifier is given as
letter (letter | digit)*
17. Mention the various notational shorthands for representing regular expressions.
(i)One or more instances (+)

(ii)Zero or one instance (?)


(iii)Character classes ([abc] where a,b,c are alphabet symbols denotes the regular expressions a | b
| c.)
(iv)Non regular sets
18. What is the function of a hierarchical analysis?
Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with
collective meaning.
Also termed as Parsing.
19. What does a semantic analysis do?
Semantic analysis is one in which certain checks are performed to ensure that components of a program
fit together meaningfully.
Mainly performs type checking.
20. List the various error recovery strategies for a lexical analysis.
Possible error recovery actions are:
(i)Panic mode recovery
(ii)Deleting an extraneous character
(iii)Inserting a missing character
(iv)Replacing an incorrect character by a correct character
(v)Transposing two adjacent characters
1. What are the benefits of intermediate code generation?
A compiler for a different machine can be created by attaching a different back end to the
existing front end.
A compiler for a different source language can be created by providing a different front end for
that source language and attaching it to the existing back end.
A machine independent code optimizer can be applied to the intermediate code in order to optimize
the code generation.
2. What are the various types of intermediate code representation?
There are mainly three types of intermediate code representations.
Syntax tree

Postfix
Three address code
3. Define backpatching.
Backpatching is the activity of filling in the unspecified label information using appropriate semantic
actions during the code generation process. The functions used in the semantic actions are
mklist(i), merge_list(p1,p2) and backpatch(p,i).
4. Mention the functions that are used in backpatching.
(i) mklist(i) creates a new list. The index i is passed as an argument to this function, where i is
an index into the array of quadruples.
(ii) merge_list(p1,p2) concatenates the two lists pointed to by p1 and p2. It returns a
pointer to the concatenated list.
(iii) backpatch(p,i) inserts i as the target label for each of the statements on the list pointed to by p.
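
The three functions can be sketched in Python as follows. This is a hedged illustration: it assumes the quadruples live in a global list named quads, that lists of quadruple indices are plain Python lists, and that the fourth field of a jump holds its target. The emit helper is hypothetical and is not one of the functions named above.

```python
# Each quadruple is a list [op, arg1, arg2, result]; for jumps the fourth
# field is the target, and None means "target not yet known".
quads = []

def emit(op, arg1=None, arg2=None, result=None):
    """Append a quadruple and return its index (hypothetical helper)."""
    quads.append([op, arg1, arg2, result])
    return len(quads) - 1

def mklist(i):
    """Create a new list containing only quadruple index i."""
    return [i]

def merge_list(p1, p2):
    """Concatenate two lists of quadruple indices."""
    return p1 + p2

def backpatch(p, i):
    """Fill in i as the jump target of every quadruple on list p."""
    for index in p:
        quads[index][3] = i

# Jumps are emitted before their targets are known ...
true_list = mklist(emit("if<", "a", "b"))
false_list = mklist(emit("goto"))
# ... and patched once the target statements have been generated.
then_start = emit(":=", "1", None, "x")
backpatch(true_list, then_start)
after = emit(":=", "0", None, "y")
backpatch(false_list, after)

for index, quad in enumerate(quads):
    print(index, quad)
```

Keeping the unfilled jumps on lists is what lets a single pass generate code for boolean expressions and control flow without knowing the labels in advance.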
5. What is the intermediate code representation for the expression a or b and not c?
The intermediate code representation for the expression a or b and not c is the three address sequence
t1 := not c
t2 := b and t1
t3 := a or t2
6. What are the various methods of implementing three address statements?
The three address statements can be implemented using the following methods.
Quadruples: a record structure with four fields: operator (OP), arg1, arg2 and result.
Triples: temporary names are avoided; an operand is either a pointer into the symbol
table or the number of the triple that computes the value.
Indirect triples: a listing of pointers to triples is kept, rather than a listing of the
triples themselves.
7. Give the syntax-directed definition for if-else statement.
1. S → if E then S1
E.true := new_label()
E.false := S.next
S1.next := S.next
S.code := E.code || gen_code(E.true ':') || S1.code
2. S → if E then S1 else S2
E.true := new_label()
E.false := new_label()
S1.next := S.next
S2.next := S.next
S.code := E.code || gen_code(E.true ':') || S1.code || gen_code('goto', S.next) ||
gen_code(E.false ':') || S2.code
8. Distinguish between compile time and run time environments .
Compile time environment includes
a)Declaration of variables.
b)Scope of variables.
c)Definition of procedures.
Run time environment includes
a)Binding of variables.
b)Life time of variables.
c)Activation of procedures.
9. Write the procedure to generate TAC.
a)Convert to postfix form.
b)Use the procedure of evaluation of the expression to get the three address code.
Ex. a*(b+c)/(d+e)
Postfix: abc+*de+/
TAC:
t1=b+c
t2=a*t1
t3=d+e
t4=t2/t3
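
Step (b) can be sketched with a stack, in the same way a postfix expression is evaluated. This is a minimal illustration assuming single-character operands and only the four binary operators; the function name postfix_to_tac is hypothetical.

```python
OPERATORS = {"+", "-", "*", "/"}

def postfix_to_tac(postfix):
    """Translate a postfix string of single-character operands into TAC lines."""
    stack, code = [], []
    for symbol in postfix:
        if symbol in OPERATORS:
            right = stack.pop()          # the right operand is on top of the stack
            left = stack.pop()
            temp = f"t{len(code) + 1}"   # a fresh temporary for each operator
            code.append(f"{temp}={left}{symbol}{right}")
            stack.append(temp)
        else:
            stack.append(symbol)
    return code

for line in postfix_to_tac("abc+*de+/"):
    print(line)
```

Instead of computing a value when an operator is met, the sketch emits one three address statement and pushes the name of the temporary that holds the result.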
10. How you will evaluate the attributes in L-attributed definition.
a)Traverse the parse tree in depth first left to right (in postorder).
b)Evaluate inherited attribute when a node is visited for the first time.
c)Evaluate synthesized attribute when a node is visited for last time.
General evaluation order is: input string → parse tree → dependency graph → evaluation order.
1. Define parser.
A parser performs hierarchical analysis, in which the tokens are grouped hierarchically into nested
collections with collective meaning.
This process is also termed parsing.
2. Mention the basic issues in parsing.
There are two important issues in parsing.
Specification of syntax
Representation of input after parsing.
3. Why lexical and syntax analyzers are separated out?
Reasons for separating the analysis phase into lexical and syntax analyzers:

Simpler design.
Compiler efficiency is improved.
Compiler portability is enhanced.

4. Define a context free grammar.


A context free grammar G is a collection of the following:
V is a set of non-terminals
T is a set of terminals
S is a start symbol
P is a set of production rules
G can be represented as G = (V, T, S, P)
Production rules are given in the following form:
Non-terminal → (V U T)*
5. Briefly explain the concept of derivation.
Derivation from S means generation of a string w from S. For constructing a derivation two things are
important.
i) Choice of the non-terminal to expand from among several others.
ii) Choice of the rule from the production rules for the corresponding non-terminal.
Instead of choosing an arbitrary non-terminal one can choose
i) either leftmost derivation: the leftmost non-terminal in a sentential form is expanded
ii) or rightmost derivation: the rightmost non-terminal in a sentential form is expanded
6. Define ambiguous grammar.
A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence of
language L(G),
i.e., some sentence has more than one leftmost derivation or more than one rightmost derivation.
7. What is a operator precedence parser?
An operator precedence parser is a shift-reduce parser built for an operator precedence grammar.
A grammar is said to be an operator precedence grammar if it possesses the following properties:
1. No production has ε on its right side.
2. No production has two adjacent non-terminals on its right side.
8. List the properties of LR parser.
1. LR parsers can be constructed to recognize most of the programming languages
for which the context free grammar can be written.
2. The class of grammar that can be parsed by LR parser is a superset of class of
grammars that can be parsed using predictive parsers.
3. LR parsers work using non backtracking shift reduce technique yet it is
efficient one.

9. Mention the types of LR parser.


(i)SLR parser- simple LR parser
(ii)LALR parser- lookahead LR parser
(iii)Canonical LR parser
10. What are the problems with top down parsing?
The following are the problems associated with top down parsing:
(i)Backtracking is costly and slow.
(ii)Left recursion may lead to an infinite loop.
(iii)Left factoring may lead to ambiguity(Dangling else problem).
(iv)Debugging is difficult.
11. Write the algorithm for FIRST and FOLLOW.
FIRST
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → Y1Y2...Yk is a production, then place a in FIRST(X) if for some i, a is
in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); if ε is in all of FIRST(Y1), ..., FIRST(Yk), then add ε to FIRST(X).
FOLLOW
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in
FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then
everything in FOLLOW(A) is in FOLLOW(B).
12. List the advantages and disadvantages of operator precedence parsing.
Advantages
This type of parsing is simple to implement.
Disadvantages

1. The operator like minus has two different precedence (unary and binary).Hence it is hard to handle tokens
like minus sign.
2. This kind of parsing is applicable to only small class of grammars.
13. What is dangling else problem?
The dangling-else problem is the ambiguity in the following grammar, in which an else can be
matched with more than one preceding unmatched then:
stmt → if expr then stmt
| if expr then stmt else stmt
| other
14. Write short notes on YACC.
YACC is an automatic tool for generating the parser program.
YACC stands for Yet Another Compiler Compiler which is basically the utility available from UNIX.
Basically YACC is LALR parser generator.
It can report conflict or ambiguities in the form of error messages.
15. What is meant by handle pruning?
A rightmost derivation in reverse can be obtained by handle pruning.
If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some
as yet unknown rightmost derivation
S = γ0 => γ1 => ... => γn-1 => γn = w
16. Define LR(0) items.
An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. Thus,
production A → XYZ yields the four items
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
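
The items of a production can be enumerated mechanically by placing the dot at every position of the right side. The sketch below is only an illustration; the function name lr0_items and the "head -> body" string format are hypothetical choices.

```python
def lr0_items(head, body):
    """All LR(0) items of the production head -> body (body is a list of symbols)."""
    items = []
    for dot in range(len(body) + 1):     # one item per dot position
        before = "".join(body[:dot])
        after = "".join(body[dot:])
        items.append(f"{head} -> {before}.{after}")
    return items

for item in lr0_items("A", ["X", "Y", "Z"]):
    print(item)
```

A production with n symbols on its right side yields n + 1 items, so an ε-production yields the single item with only a dot.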
17. What is meant by viable prefixes?

The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called
viable prefixes. An equivalent definition of a viable prefix is that it is a prefix of a right sentential form that
does not continue past the right end of the rightmost handle of that sentential form.
18. Define handle.
A handle of a string is a substring that matches the right side of a production, and whose reduction to the
nonterminal on the left side of the production represents one step along the reverse of a rightmost
derivation.
A handle of a right-sentential form γ is a production A → β and a position of γ where the string β may be
found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. That is,
if S =>* αAw => αβw, then A → β in the position following α is a handle of αβw.
19. What are kernel & non-kernel items?
Kernel items include the initial item, S' → .S, and all items whose dots are not at the left end.
Non-kernel items are the items that have their dots at the left end.
20. What is phrase level error recovery?
Phrase level error recovery is implemented by filling in the blank entries in the predictive parsing table
with pointers to error routines. These routines may change, insert, or delete symbols on the input and
issue appropriate error messages. They may also pop from the stack.
21. Differentiate between top down and bottom up parser.
(i) Top down parser (TDP): It creates the parse tree starting from the root and proceeds to the children.
Bottom up parser (BUP): It creates the parse tree starting from the children and proceeds to the root.
(ii) TDP: It uses leftmost derivation.
BUP: It uses the reverse of rightmost derivation.
(iii) TDP: Problem: when a non-terminal has more than one alternative, it should have criteria
to decide the right choice.
BUP: Problem: when a handle is detected, it is reduced.
(iv) TDP: Parsing table size is small.
BUP: Parsing table size is bigger than for TDP.
(v) TDP: Less power.
BUP: High power.
(vi) TDP: Error detection is easy.
BUP: Error detection is difficult.

22. Under which conditions predictive parsing can be constructed for a grammar?
The grammar must be free from left recursion and should be left factored.

1. Mention the properties that a code generator should possess.


The code generator should produce the correct and high quality code. In other words, the code
generated should be such that it should make effective use of the resources of the target
machine.
Code generator should run efficiently.
2. List the terminologies used in basic blocks.
Define and use: the three address statement a:=b+c is said to define a and to use b and c.
Live and dead: a name in a basic block is said to be live at a given point if its value is
used after that point in the program. A name in a basic block is said to be dead at
a given point if its value is never used after that point in the program.
3. What is a flow graph?
A flow graph is a directed graph in which flow of control information is added to the basic blocks.
The nodes of the flow graph are the basic blocks.
The block whose leader is the first statement is called the initial block.
There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some execution
sequence, that is, if the last statement of B1 jumps to the first statement of B2, or if B2 immediately
follows B1 in the program order and B1 does not end in an unconditional jump. We then say that B1 is a
predecessor of B2.
4. What is a DAG? Mention its applications.
A directed acyclic graph (DAG) is a useful data structure for implementing transformations on basic
blocks.
A DAG is used in
Determining the common sub-expressions.
Determining which names are used inside the block and computed outside the block.
Determining which statements of the block could have their computed value used outside the
block.
Simplifying the list of quadruples by eliminating the common sub-expressions and not
performing assignments of the form x := y unless they are necessary.
5. Define peephole optimization.
Peephole optimization is a simple and effective technique for locally improving target code. It
improves the performance of the target program by examining a short sequence of target
instructions (the peephole) and replacing these instructions by a shorter or faster sequence.
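One common peephole transformation is the removal of a redundant load that directly follows the matching store. The following is a minimal sketch, assuming instructions are given as hypothetical (opcode, source, destination) tuples and that the second instruction carries no label (otherwise a jump could reach it and the removal would be unsafe).

```python
def remove_redundant_loads(instructions):
    """Drop 'MOV x,R' when it directly follows 'MOV R,x'."""
    out = []
    for ins in instructions:
        if out and ins[0] == "MOV" and out[-1][0] == "MOV":
            prev_src, prev_dst = out[-1][1], out[-1][2]
            src, dst = ins[1], ins[2]
            # The value was just copied from dst to src, so copying
            # it back changes nothing.
            if src == prev_dst and dst == prev_src:
                continue
        out.append(ins)
    return out

code = [
    ("MOV", "R0", "a"),   # store R0 into a
    ("MOV", "a", "R0"),   # reload a into R0: redundant
    ("ADD", "b", "R0"),
]
optimized = remove_redundant_loads(code)
print(optimized)
```

The pass only looks at two adjacent instructions at a time, which is what makes it a peephole technique rather than a global optimization.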

6. List the characteristics of peephole optimization.


Redundant instruction elimination
Flow of control optimization
Algebraic simplification
Dead code elimination
Use of machine idioms
7. How do you calculate the cost of an instruction?
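The cost of an instruction is one plus the added costs associated with its source and destination
addressing modes. A register operand adds nothing, while an operand that needs a memory address or a
constant stored with the instruction adds 1.
Instruction              Cost
MOV R0,R1                1
MOV R1,M                 2
SUB 5(R0),*10(R1)        3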
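The cost rule can be sketched in a few lines. This is an illustration only, assuming the usual added costs for this kind of machine model: 0 for register and indirect register operands, and 1 for absolute, indexed, indirect indexed and literal operands. The helper names `added_cost` and `instruction_cost` are hypothetical.

```python
import re

def added_cost(operand):
    """Added cost of one operand under the assumed addressing-mode costs."""
    operand = operand.strip()
    # Register (R0) and indirect register (*R0) need no extra word.
    if re.fullmatch(r"\*?R\d+", operand):
        return 0
    # Absolute (M), indexed (5(R0)), indirect indexed (*10(R1)) and
    # literal (#1) operands each need one extra word.
    return 1

def instruction_cost(instruction):
    """Cost = 1 + added cost of source + added cost of destination."""
    _, operands = instruction.split(None, 1)
    return 1 + sum(added_cost(op) for op in operands.split(","))

for ins in ["MOV R0,R1", "MOV R1,M", "SUB 5(R0),*10(R1)"]:
    print(ins, instruction_cost(ins))
```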

8. What is a basic block? Define leader used in basic block and give one example.
A basic block is a sequence of consecutive statements in which flow of control enters at the beginning
and leaves at the end without halt or possibility of branching except at the end.
Leader
a) The first statement is a leader.
b) The target of a conditional or unconditional jump is a leader.
c) A statement that immediately follows a conditional or unconditional jump is a leader.
The statements from a leader up to, but not including, the next leader form a basic
block.
Ex.
int fact(int x)
{
int i, f=1;
for(i=2;i<=x;i++)
f=f*i;
return f;
}
The TAC for the above code is
1. f=1
2. i=2
3. if(i>x) goto 8
4. f=f*i
5. t1=i+1
6. i=t1
7. goto 3
8. goto calling program
Here the leaders are statements 1, 3, 4 and 8. Block B1 consists of statements 1 and 2. Block B2
consists of only statement 3. Block B3 runs from statement 4 to statement 7. Block B4 consists of
only statement 8.
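The leader rules (a) to (c) can be applied mechanically. The sketch below assumes each statement is given as a hypothetical (text, jump target) pair, with the target set to None when the statement does not jump to another statement of the same procedure; statements are numbered from 1 as in the example.

```python
def find_leaders(code):
    """Return the sorted statement numbers that are leaders."""
    leaders = {1}                             # rule (a): first statement
    for number, (_, target) in enumerate(code, start=1):
        if target is not None:
            leaders.add(target)               # rule (b): target of a jump
            if number < len(code):
                leaders.add(number + 1)       # rule (c): statement after a jump
    return sorted(leaders)

def basic_blocks(code):
    """Group statement numbers from each leader up to the next leader."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code) + 1]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]

tac = [
    ("f=1", None),
    ("i=2", None),
    ("if(i>x) goto 8", 8),
    ("f=f*i", None),
    ("t1=i+1", None),
    ("i=t1", None),
    ("goto 3", 3),
    ("goto calling program", None),   # return: no target inside this code
]
print(find_leaders(tac))
print(basic_blocks(tac))
```

For the factorial example this yields the leaders 1, 3, 4 and 8 and the four blocks B1 to B4 described above.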

9. How would you represent the following equation using DAG?


a:=b*-c+b*-c
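In a DAG for this assignment the sub-expression b*-c appears only once and both operands of + point to it. A minimal sketch of how such a DAG can be built, assuming a hypothetical `DagBuilder` class that reuses a node whenever the same operator is applied to the same children:

```python
class DagBuilder:
    def __init__(self):
        self.nodes = []      # node id -> (label, child ids)
        self.index = {}      # (label, child ids) -> node id

    def node(self, label, *children):
        key = (label, children)
        if key not in self.index:          # reuse an identical node if one exists
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

dag = DagBuilder()
b = dag.node("b")
c = dag.node("c")
left = dag.node("*", b, dag.node("uminus", c))    # first  b*-c
right = dag.node("*", b, dag.node("uminus", c))   # second b*-c, same node
root = dag.node("+", left, right)                 # a is attached to this node

for node_id, (label, children) in enumerate(dag.nodes):
    print(node_id, label, children)
```

The resulting graph has five nodes (b, c, unary minus, * and +), whereas the corresponding syntax tree would have eight.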
10. Give some examples of SDT.
(1) To store type information into the symbol table.
(2) To build a syntax tree.
(3) To issue error messages.
(4) To perform consistency checks like type checking, parameter checking etc.
(5) To generate intermediate code.
1. Mention the issues to be considered while applying the techniques for code optimization.
The semantic equivalence of the source program must not be changed.
The improvement over the program efficiency must be achieved without changing the
algorithm of the program.
2. What are the basic goals of code movement?
To reduce the size of the code, i.e. to improve the space complexity.
To reduce the frequency of execution of code, i.e. to improve the time complexity.
3. What do you mean by machine dependent and machine independent optimization?

Machine dependent optimization is based on the characteristics of the target machine, namely its
instruction set, addressing modes and registers, and aims to produce efficient target code.
This also includes peephole optimization.
Machine independent optimization is based on the characteristics of the programming
language, such as appropriate program structures and efficient arithmetic
properties, and aims to reduce the execution time. This includes loop optimization (code
motion, loop jamming, loop unrolling), dead code elimination, common sub-expression
elimination, constant propagation, constant folding, strength reduction etc.
4. What are the different data flow properties?
Available expressions
Reaching definitions
Live variables
Busy variables
5. Eliminate left recursion and left factor the following grammar.
E → aba | abba | Eb | EbE
Ans- Elimination of left recursion
E → abaE1 | abbaE1
E1 → bE1 | bEE1 | ε
Left factor
E → abA
A → aE1 | baE1
E1 → bB | ε
B → E1 | EE1
6. Eliminate left recursion in more than one level.
S → Aa | b
A → Ac | Sd | ε
Ans- Substitute the productions of S in the second production of A. We get
S → Aa | b
A → Ac | Aad | bd | ε
Elimination of left recursion
S → Aa | b
A → bdA1 | A1
A1 → cA1 | adA1 | ε
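The last step, removing immediate left recursion from the productions of A, follows a fixed pattern: A → Aα | β becomes A → βA1 and A1 → αA1 | ε. A minimal sketch, assuming each alternative is given as a list of symbols with the empty list standing for ε; the function name is hypothetical.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A1, A1 -> alpha A1 | epsilon."""
    new_name = nonterminal + "1"
    # Tails (alpha) of the left-recursive alternatives.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # Alternatives (beta) that do not start with the nonterminal.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: [alt + [new_name] for alt in others],
        new_name: [alt + [new_name] for alt in recursive] + [[]],
    }

# A -> Ac | Aad | bd | epsilon
result = eliminate_immediate_left_recursion(
    "A", [["A", "c"], ["A", "a", "d"], ["b", "d"], []]
)
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) or "epsilon" for alt in alts))
```

Applied to the productions of A after substitution, the sketch reproduces A → bdA1 | A1 and A1 → cA1 | adA1 | ε.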
7. What is dynamic scoping?
In dynamic scoping, a use of a non-local variable refers to the non-local data declared in the most recently called
and still active procedure. Therefore, each time a procedure is called, new bindings are set up for its local names. With
dynamic scoping, symbol tables may be required at run time.
9. What is code motion?
Code motion is an optimization technique in which amount of code in a loop is decreased. This
transformation is applicable to the expression that yields the same result independent of the number of
times the loop is executed. Such an expression is placed before the loop.
Ex.-
while(i<100)
{
x=i*sin(A)/sin(B);
i++;
}
Can be written as
t=sin(A)/sin(B);
while(i<100)
{
x=i*t;
i++;
}

10. What are the properties of optimizing compiler?
The compiler should produce the minimum amount of target code for the source program.
There should not be any unreachable code.
Dead code should be completely removed from the program.
An optimizing compiler should apply the following code improving transformations to the program.
i) common sub-expression elimination
ii) dead code elimination
iii) code movement
iv) strength reduction
11. What are the various ways to pass a parameter in a function?
Call by value
Call by reference
Copy-restore
Call by name
12. Suggest a suitable approach for computing hash function.
(i)Using hash function we should obtain exact locations of name in symbol table.
(ii)The hash function should result in uniform distribution of names in symbol table.
(iii)The hash function should be such that there will be minimum number of collisions. Collision is such
a situation where hash function results in same location for storing the names.
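A minimal sketch of a symbol table that follows these points, assuming a simple multiplicative string hash and chaining to resolve collisions. The class name, the multiplier 31 and the table size 211 are illustrative choices, not values taken from these notes.

```python
class SymbolTable:
    def __init__(self, size=211):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def hash_name(self, name):
        """Map a name to a bucket index in the range [0, size)."""
        h = 0
        for ch in name:
            # Mixing in every character spreads similar names apart.
            h = (h * 31 + ord(ch)) % self.size
        return h

    def insert(self, name, info):
        bucket = self.buckets[self.hash_name(name)]
        for entry in bucket:
            if entry[0] == name:
                entry[1] = info          # name already present: update it
                return
        bucket.append([name, info])      # colliding names share the bucket

    def lookup(self, name):
        for entry_name, info in self.buckets[self.hash_name(name)]:
            if entry_name == name:
                return info
        return None

table = SymbolTable()
table.insert("count", "int")
table.insert("total", "float")
print(table.lookup("count"), table.lookup("total"), table.lookup("missing"))
```

A prime table size is a common choice because it tends to spread the hash values more evenly over the buckets.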
13. What is the difference between S-attributed and L-attributed definitions?
S-attributed
1. Uses synthesized attributes only.
2. Semantic rules are placed at the end of the production.
3. Translations are carried out during bottom up parsing.

L-attributed
1. Allows both synthesized and inherited attributes. Each inherited attribute can inherit either from the parent or from a sibling only.
2. Semantic actions can be placed anywhere on the r.h.s. of productions.
3. The translation is carried out by traversing the parse tree depth first, left to right.


14. What is dead code elimination?
The process of detecting code that is useless and eliminating it during optimization is called dead
code elimination.
15. Draw the DAG for the following basic block
t1=a+b
t2=c+d
t3=e-t2
X=t1-t3

16. What is interprocedural analysis? Why is interprocedural analysis essential (what are the
applications of interprocedural analysis)?
A data-flow analysis that tracks information across procedure boundaries is said to be
interprocedural. Many analyses, such as points-to analysis, can only be done in a meaningful way
if they are interprocedural.
Applications of interprocedural analysis are
(i)Virtual method invocation.
(ii)Pointer alias analysis.
(iii)Parallelization.
(iv)Detection of software errors and vulnerabilities.
(v)SQL injection
(vi)Buffer overflow.
17. What is a call site? What is a call graph?
Programs call procedures at certain points referred to as call sites.
A call graph for a program is a bipartite graph with nodes for call sites and nodes for procedures. An
edge goes from a call site node to a procedure node if that procedure may be called at the site.
18. What do you mean by flow sensitivity and context sensitivity?
A data flow analysis that produces facts that depend on location in the program is said to be flow-sensitive.
If the analysis produces facts that depend on the history of procedure calls, it is said to be context-sensitive.
A data flow analysis can be either flow- or context-sensitive, both or neither.
19. What is datalog?What are datalog rules?What is a datalog program?
Datalog is a language that uses a Prolog-like notation, but whose semantics are far simpler than those of
Prolog. The elements of Datalog are atoms of the form p(X1,X2,...,Xn). Here
(i) p is a predicate: a symbol that represents a type of statement such as "a definition reaches the
beginning of a block".
(ii) X1,X2,...,Xn are terms such as variables and constants.
Rules are a way of expressing logical inferences. The form of a rule is
H :- B1 & B2 & ... & Bn
The components are as follows:
(i) H and B1, B2, ..., Bn are literals: either atoms or negated atoms.
(ii) H is the head and B1, B2, ..., Bn form the body of the rule.
(iii) Each of the Bi's is sometimes called a subgoal of the rule.
A Datalog program is a collection of rules.
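To illustrate how rules derive new facts, consider two hypothetical rules over predicates edge and path: path(X,Y) :- edge(X,Y) and path(X,Z) :- path(X,Y) & edge(Y,Z). The sketch below hard-codes these two rules and applies them repeatedly until no new fact appears (naive evaluation); it is not a general Datalog engine.

```python
def transitive_closure(edge):
    """Naive evaluation: apply the two rules until no new path fact appears."""
    path = set(edge)                       # path(X,Y) :- edge(X,Y)
    while True:
        # path(X,Z) :- path(X,Y) & edge(Y,Z)
        derived = {(x, z) for (x, y) in path for (y2, z) in edge if y == y2}
        if derived <= path:                # fixed point reached
            return path
        path |= derived

# edge facts for a small flow graph with a loop between B2 and B3
edge = {("B1", "B2"), ("B2", "B3"), ("B3", "B2")}
path = transitive_closure(edge)
print(sorted(path))
```

Here the facts in `edge` play the role of the given predicates and the facts collected in `path` are those inferred by the rules.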
20. What is BDD(Binary Decision Diagram)?
A BDD is a representation of Boolean functions by rooted DAGs. The interior nodes correspond to
Boolean variables and have two children, low (representing truth value 0) and high (representing 1). A
truth assignment makes the represented function true if and only if the path from the root, in which we
go to the low child if the variable at a node is 0 and to the high child otherwise, leads to the 1 leaf.
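The path-following rule can be sketched directly. The example assumes a hypothetical encoding in which each interior node is stored as (variable, low child, high child) in a dictionary and the leaves are the integers 0 and 1; the BDD shown represents x AND y.

```python
def evaluate(bdd, root, assignment):
    """Follow low/high children from the root until a 0 or 1 leaf is reached."""
    node = root
    while node not in (0, 1):
        variable, low, high = bdd[node]
        node = high if assignment[variable] else low
    return node

# Hypothetical BDD for x AND y with variable order x, y.
bdd = {
    "n_x": ("x", 0, "n_y"),   # x = 0 leads straight to the 0 leaf
    "n_y": ("y", 0, 1),
}
for x in (0, 1):
    for y in (0, 1):
        print(x, y, evaluate(bdd, "n_x", {"x": x, "y": y}))
```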
21. Explain the concept of bootstrapping in compiler design.
Bootstrapping is the process of writing a compiler in the source language that it compiles. For bootstrapping, a
compiler can be characterized by three languages: the source language (S), the target language (T), and
the implementation language (I). Implementation language means the language in which the compiler is
written.
The three languages S, I, and T can be quite different. A compiler that runs on one machine and produces target code for another machine is called a cross-compiler.

22. What do you mean by run time storage allocation?


Ans:
The runtime storage might be subdivided into
- Target code
- Data objects
- Stack to keep track of procedure activation
- Heap to keep all other information

23. What do you mean by postfix translations?
