Programming
By
Aditya Bhardwaj
Unit 2 Syllabus
• Compilers: 2.1. Introduction to various translators; 2.2. Various phases of compiler: 2.2.1 Lexical analysis and syntax analysis, 2.2.2 Intermediate code generation, 2.2.3 Code optimization techniques, 2.2.4 Code generation; 2.3. Introduction to grammars and finite automata; 2.4. Bootstrapping for compilers; Case study: LEX and YACC; Design of a compiler in C++ as prototype.
• Debuggers: Introduction to various debugging techniques; Case study: Debugging in Turbo C++ IDE.
Session 1:
Basic Terminologies
• Parsing: It is the process of analyzing a stream of input in order to
determine its grammatical structure with respect to a given formal
grammar.
Why Not Write Machine Code Directly?
• To instruct the hardware, code must be written in binary format, which is simply a series of 1s and 0s.
• But it would be a difficult task for programmers to write such code, which is why we have translators (compilers, assemblers, interpreters) to convert high-level language to machine code.
• A linker tool is used to link all the parts of the program together for
execution (executable machine code).
• A loader loads all of them into memory and then the program is
executed.
How is your program executed?

[Figure: program-execution pipeline]
Programmer (via an editor) → Source program
Source program → Compiler → Assembly code
Assembly code → Assembler → Machine code
Machine code → Linker (combines all object code using libraries) → Executable
Executable → Loader (loads the program into RAM) → Execution on the target machine
Debugger: the program executes under control of the debugger; from the debugging results, the programmer does manual correction of the code in the editor.
Session 3:
2.1. Introduction to various translators
• Source code: A program written in a high-level language is called source code.
• To convert the source code into machine code, translators are needed.
• Translator: A translator is used to translate a high-level language program into an equivalent machine language program.
2.1. Different types of translators (Contd..)
• Compiler
A compiler is a translator that converts high-level language to assembly language.
• Assembler
An assembler is a translator that converts the assembly language to machine-level
language.
2.1. Compiler vs Interpreter (Contd..)

S. No.  Compiler                                        Interpreter
1       Performs the translation of a program           Performs statement-by-statement
        as a whole.                                     translation.
2       Execution is faster.                            Execution is slower.

[Figure: Source Program → Compiler → Target Program; the compiler reports error messages, and running the target program produces the output.]
Basic Steps
• Compiler: The compiler passes the source code through various phases and generates the target assembly code.
• Assembler: It takes assembly language as input and converts it into machine code (object code).
• Linker: It combines all the object modules of a source code to generate an executable module.
4. Intermediate representation
• It is closer to the machine form, and is usually easy to produce. One such
representation: “three-address code.”
6. Code Generation
• Generate code in the assembly format.
Note: The naming of the compiler phases follows directly from their functionality.
Session 5
2.2.1 Lexical Analysis and syntax analysis,
2.2.2 Intermediate Code Generation,
2.2.3 Code optimization techniques,
2.2.4 Code generation
Phase-1: Lexical Analysis
The main functions of this phase are:
1. Identify the lexical units in the source statement and produce as output a sequence of tokens that the parser uses for syntax analysis.
2. Classify tokens into different lexical classes, e.g. constants, reserved words, variables, etc., and enter them in different tables.
3. Build the literal table and the uniform symbol table.
The lexical analyzer is also called a scanner: it breaks the high-level language program into a series of tokens, removing any whitespace or comments in the source code.
Phase-1: Lexical Analysis (Contd..)
• Tokens: In programming language, keywords, constants,
identifiers, strings, numbers, operators and punctuations
symbols can be considered as tokens.
Example 2. Suppose we pass the statement below through the lexical analyzer:
• a = b + c ;
It will generate a token sequence like this:
• id1 = id2 + id3 ;
where each id refers to its variable's entry in the symbol table.
Example of tokens:
[Figure: example program with its token listing]
[Fig.2.1 Parse tree via left-most derivation | Fig.2.2 Parse tree via right-most derivation]
Ambiguous Grammar
• An ambiguous grammar is a context-free grammar for which there
exists a string that can have more than one leftmost derivation or parse
tree.
• Top-down parsing: A parser starts with the start symbol and tries to transform it into the input string. Example: LL parsers.
Top Down Parser

Grammar rules:
E → E + E
E → E * E
E → id

Input string: id + id * id

Derivation (one rule applied per step):
E
⇒ E + E         (apply E → E + E)
⇒ id + E        (apply E → id)
⇒ id + E * E    (apply E → E * E)
⇒ id + id * E   (apply E → id)
⇒ id + id * id  (apply E → id)

[Figure: the parse tree grown from the root E down to the leaves id + id * id, one rule application per slide.]
Bottom-up Parser
• As the name suggests, bottom-up parsing starts with the input
symbols and tries to construct the parse tree up to the start
symbol.
• Two types of top-down parsing (for contrast) are:
2a. Recursive descent parsing
2b. Predictive parsing
Example of Bottom-up Parse

Grammar rules:
E → E + (E)
E → int

Input string: int + ( int ) + ( int )

Reduction sequence (reducing handles until only the start symbol remains):
int + (int) + (int)
⇐ E + (int) + (int)   (reduce int → E)
⇐ E + (E) + (int)     (reduce int → E)
⇐ E + (int)           (reduce E + (E) → E)
⇐ E + (E)             (reduce int → E)
⇐ E                   (reduce E + (E) → E)

[Figure: the parse tree built from the leaves int + ( int ) + ( int ) up to the root E, one reduction per slide.]
Which Parser is More Powerful?
• Bottom-up parsers are more powerful than top-down parsers because:
– top-down parsing may require exponential time to complete its job;
– top-down parsing suffers from the backtracking problem.
Phase-3: Semantic Analysis

MOV R1, Id3
MUL R1, #1
MOV R2, Id2
ADD R1, R2
MOV Id1, R1
Symbol-Table Management
• The symbol table is a data structure containing a record for each variable
name, with fields for the attributes of the name.
• The data structure should be designed to allow the compiler to find the
record for each name quickly and to store or retrieve data from that record
quickly.
• These attributes may provide information about the storage allocated for a
name, its type, its scope (where in the program its value may be used), and
in the case of procedure names, such things as the number and types of its
arguments, the method of passing each argument (for example, by value or
by reference), and the type returned.
Example entries:
newVal   id1   & attributes
oldVal   id2   & attributes
fact     id3   & attributes
Error Handling Routine:
• One of the most important functions of a compiler is the detection and
reporting of errors in the source program. The error message should allow
the programmer to determine exactly where the errors have occurred.
Errors may occur in any of the phases of a compiler.
• Whenever a phase of the compiler discovers an error, it must report the
error to the error handler, which issues an appropriate diagnostic message.
Both the table-management and error-handling routines interact with all phases of the compiler.
One-pass compiler
• A one-pass compiler passes through the source code of each compilation unit only once.
• Its efficiency is limited because it does not produce intermediate code that can be refined easily.
• One-pass compilers are very common because of their simplicity.
• They check for semantic errors and generate code.
• They are faster than multi-pass compilers.
• Also known as a narrow compiler.
• Pascal and C are both languages that allow one-pass compilation.
Multi-pass compilers
• The input is passed through certain phases in one pass; the output of the previous pass is then passed through further phases in the second pass, and so on until the desired output is generated.
• It requires less memory because each pass takes the output of the previous pass as input.
• It may create one or more intermediate codes.
• Also known as a wide compiler.
• Modula-2 is a language whose structure requires that a compiler have at least two passes.
Front End vs Back End of a Compiler
The phases of a compiler are collected into a front end and a back end.
The FRONT END consists of those phases that depend primarily on the source program. These normally include lexical and syntactic analysis, semantic analysis, and the generation of intermediate code. A certain amount of code optimization can be done by the front end as well.
The BACK END includes the code optimization phase and the final code generation phase, along with the necessary error handling and symbol table operations.
The front end analyzes the source program and produces intermediate code, while the back end synthesizes the target program from the intermediate code.
Cont….
The front end phase consists of those phases that primarily depend
on source program and are independent of the target machine.
Back end phase of compiler consists of those phases which depend
on target machine and are independent of the source program.
Intermediate representation may be considered as middle end, as
it depends upon source code and target machine.
Session 6:
2.4 Bootstrapping for Compilers
2.4. Bootstrapping for compilers
• Bootstrapping is an important concept for building a new compiler.
• To construct any new compiler, we require three languages:
1) Source language (S)
2) Target language (T)
3) Implementation language (I)
Bootstrapping for compilers (contd..)
• We represent the three languages using the following diagram, called a T-diagram:
[T-diagram: the source language S and target language T sit on the arms, the implementation language I at the base; written compactly as SIT.]
Bootstrapping for compilers (contd..)
Example:
• Suppose we have a Pascal compiler written in C that takes Pascal code and produces C as output.
• Now, we want to create a Pascal compiler that converts Pascal code to C but is implemented in C++.
• For this, we need a converter (i.e., a compiler) that converts C to C++.
Bootstrapping for compilers (contd..)
[T-diagrams: the Pascal-to-C compiler written in C (PcC) is fed through the C-to-C++ converter (CxC++).]
• When the compiler PcC is run through CxC++, we get the new compiler Pc++C.
• This process, illustrated by T-diagrams, is called bootstrapping.
• Compile: PcC + CxC++ = Pc++C
Cross-compiler Concept
Compilers are of two kinds:
– Native compilers generate code for the same platform as the one on which they run.
– Cross compilers generate code for a platform different from the one on which they run.
Cross- Compiler (contd..)
• A cross compiler is a compiler capable of creating
executable code for a platform other than the one on which the
compiler is running.
• For example, a compiler that runs on Windows but generates code that runs on Android (as in the Android Studio toolchain) is a cross compiler.
Session 7:
2.3 Case Study: LEX and YACC
2.3: LEX and YACC
• Before 1975, writing a compiler was a very time-consuming process. Then Lesk and Johnson published papers on LEX and YACC, which greatly simplified compiler design.
• LEX takes a pattern such as ab* and generates C code for a lexical analyzer.
bas.l file overview
Steps for how LEX and YACC work:
Step 1: You want to create a bas.exe (Turbo C) compiler.
Step 2: YACC reads the grammar description from bas.y and generates a parser function 'yyparse', which is stored in the y.tab.c file.
Step 3: The y.tab.h header file is given to LEX.
Step 4: LEX takes the pattern-matching rules from the bas.l file and generates the yylex function, which is stored in the lex.yy.c file.
Step 5: Finally, compile and link the lexer and parser together to create the executable compiler file bas.exe.
Commands to create compiler: bas.exe
Command 1 - To create y.tab.h and y.tab.c: yacc -d bas.y
Q3. Design a DFA for the regular expression (a|b)*abb.

[Fig1. NFA (without λ-transitions) for (a|b)*abb, with states 1–5; its moves are listed in the transition table below.]
Subset Construction Method

Step 1: Construct a transition table showing all reachable states, for every state and every input symbol.
Subset Construction Method

Fig2. Transition table
q    δ(q,a)         δ(q,b)
1    {1,2,3,4,5}    {4,5}
2    {3}            {5}
3    ∅              {2}
4    {5}            {4}
5    ∅              ∅
Step 2: Starting with the initial state {1}, add a row for every new subset of states that appears in the table, and compute its transitions from the transition table.

Step 3: Repeat step 2 until no more new states are reachable.

Fig3. Subset Construction table
q              δ(q,a)         δ(q,b)
{1}            {1,2,3,4,5}    {4,5}
{1,2,3,4,5}    {1,2,3,4,5}    {2,4,5}
{4,5}          {5}            {4}
{2,4,5}        {3,5}          {4,5}
{5}            ∅              ∅
{4}            {5}            {4}
{3,5}          ∅              {2}
{2}            {3}            {5}
{3}            ∅              {2}
∅              ∅              ∅

The construction stops here, as there are no more reachable states.

[Fig4. Resulting finite automaton after applying subset construction to Fig1: each subset above becomes a single DFA state, with {1} as the initial state and the transitions read off the table.]
NFA to DFA Conversion: End
Students' Exercise
Q1. Differentiate between an NFA and a DFA. Can an NFA be converted to a DFA? If yes, how?
Introduction to Automata
Grammars
Grammar in Compiler Overview
Introduction to Grammars
• According to Noam Chomsky, there are four types of grammars: Type 0, Type 1, Type 2, and Type 3.
• The table on the next slide shows how they differ from each other.
Grammars Description
Type - 3 Grammar
• Type-3 grammars generate regular languages. Type-3 grammars must have a single non-terminal on the left-hand side and a right-hand side consisting of a single terminal or a single terminal followed by a single non-terminal.
• The productions must be in the form X → a or X → aY
• where X, Y ∈ N (Non terminal)
• and a ∈ T (Terminal)
• The rule S → ε is allowed if S does not appear on the right side of any
rule.
• Example
• X → ε, X → a | aY, Y → b
Grammars Description
Type - 2 Grammar
• Type-2 grammars generate context-free languages.
• The productions must be in the form A → γ
• where A ∈ N (Non terminal)
• and γ ∈ (T ∪ N)* (String of terminals and non-terminals).
• The languages generated by these grammars can be recognized by a non-deterministic pushdown automaton.
• Example
• S → Xa, X → a, X → aX, X → abc, X → ε
Grammars Description
Type - 1 Grammar
• Type-1 grammars generate context-sensitive languages. The
productions must be in the form
• αAβ→αγβ
• where A ∈ N (Non-terminal)
• and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
• The strings α and β may be empty, but γ must be non-empty.
• The rule S → ε is allowed if S does not appear on the right side of any
rule. The languages generated by these grammars are recognized by a
linear bounded automaton.
• Example
• AB → AbBc, A → bcA, B → b
Grammars Description
Type - 0 Grammar
• Type-0 grammars generate recursively enumerable languages. The productions have no restrictions. They are any phrase structure grammar, including all formal grammars.
• They generate the languages that are recognized by a Turing machine.
• The productions can be in the form of α → β where α is a string of
terminals and nonterminals with at least one non-terminal and α cannot
be null. β is a string of terminals and non-terminals.
• Example
• S → ACaB, Bc → acB, CB → DB, aD → Db
Chomsky Hierarchy: Properties of Grammar
• This is a hierarchy, so every language of type 3 is also of types
2, 1 and 0; every language of type 2 is also of types 1 and 0
etc.
• The distinction between languages can be seen by examining
the structure of the production rules of their corresponding
grammar, or the nature of the automata which can be used to
identify them.
Session 9
Debuggers:
Introduction to various debugging techniques, Case
Study: - Debugging in Turbo C++ IDE.
How is your program executed? (recap)

[Figure repeated from Session 1: Editor → Source program → Compiler → Assembly code → Assembler → Machine code → Linker (combines all object code using libraries) → Loader (loads the program into RAM) → execution on the target machine; the program executes under control of the debugger, and the programmer does manual correction of the code from the debugging results.]
Debugging
• It is a systematic process of spotting and fixing bugs, or defects, in a piece of software so that the software behaves as expected.
Debugging Techniques
1) Print debugging: printf statements (in C) are inserted to trace the flow of execution of a process and identify where an error occurs.
2) Remote debugging: It is the process of debugging a program
running on a system different from the debugger. To start
remote debugging, a debugger connects to a remote system
over a network. The debugger can then control the execution
of the program on the remote system and retrieve information
about its state.
Debugging Techniques (contd..)
3) Post-mortem debugging: It is debugging of the program after it
has already crashed. For example, analysis of memory dump
(or core dump) of the crashed process.
4) Delta Debugging: Delta Debugging is a methodology to
automate the debugging of programs using a scientific
approach of hypothesis-trial-result loop.
Session 10: Students' Self-Assignment
1. Practice all the phases of a compiler with examples.
2. Practice examples of DFA construction.
3. Practice examples of NFA to DFA conversion.
4. Case study: Debugging in Turbo C++ IDE.
5. Case study: Design of a compiler in C++ as a prototype.
6. Practice determining whether a grammar is ambiguous or not using parse trees.
Thank You