
Unit 2: System Programming
By
Aditya Bhardwaj
Unit 2 Syllabus
• Compilers: 2.1. Introduction to various translators, 2.2. Various phases of compiler, 2.3. Introduction to Grammars and finite automata, 2.4. Bootstrapping for compilers, 2.2.1 Lexical Analysis and syntax analysis, 2.2.2 Intermediate Code Generation, 2.2.3 Code optimization techniques, 2.2.4 Code generation, 2.3. Case study: LEX and YACC, 2.4. Design of a compiler in C++ as a prototype.

• Debuggers: Introduction to various debugging techniques, Case Study: Debugging in Turbo C++ IDE.
Session 1:
Basic Terminologies
Basic Terminologies
• Parsing: It is the process of analyzing a stream of input in order to
determine its grammatical structure with respect to a given formal
grammar.

• Parse tree: A graphical representation of a derivation or deduction is called a parse tree. Each interior node of the parse tree is a non-terminal; the children of the node can be terminals or non-terminals.

• Ambiguous grammar: An ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree. For example, E → E+E | E*E | id is ambiguous: id+id*id has two parse trees.
Basic Terminologies
• Unambiguous grammar : An unambiguous grammar is a
context-free grammar for which every valid string has a
unique leftmost derivation or parse tree.
• Regular expression: a sequence of symbols and characters
expressing a string or pattern to be searched for within a
longer piece of text. Ex. acb, 0111, ab*
Basic Terminologies
• Token: A token is a sequence of characters that can be treated as a single logical entity. Typical tokens are:
1) Identifiers
2) Keywords
3) Operators
4) Special symbols
5) Constants

• Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
Token and Lexeme Examples
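For instance (an illustrative mapping, assuming the C-like source line int count = 10 ;):

Lexeme    Token class
int       keyword
count     identifier
=         operator
10        constant
;         special symbol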
Session 2:
Concept: how a program is executed
How is your program executed?
Why Don't We Write Machine Code Directly?
• To instruct the hardware, code must be written in binary format, which is simply a series of 1s and 0s.
• But it would be a difficult task for computer programmers to write such code, which is why we have translators (compilers, assemblers, interpreters) to convert high-level language to machine code.

• A linker tool is used to link all the parts of the program together for
execution (executable machine code).

• A loader loads all of them into memory and then the program is
executed.
How is your program executed?

Programmer → Source Program → Compiler → Assembly Code → Assembler → Machine Code → Linker (combines all object code using libraries) → Loader (loads the program into RAM) → Execution on the target machine

The debugger executes the program under its control and reports debugging results; based on them, the programmer does manual correction of the code in the editor.
Session 3:
2.1. Introduction to various translators
2.1. Introduction to various translators
• Source code: A program written in a high-level language is called source code.
• To convert the source code into machine code, translators are needed.
• Translator: A translator converts a high-level language program into an equivalent machine language program.
2.1. Different types of translators (Contd..)
• Compiler
A compiler is a translator that converts high-level language to assembly language.

• Interpreter
An interpreter is a translator that converts programs in a high-level language to a low-level language. An interpreter translates line by line and reports an error as soon as it is encountered during the translation process. The difference lies in the way they read the source code: if an error occurs, an interpreter stops execution and reports it, whereas a compiler reads the whole program even if it encounters several errors.

• Assembler
An assembler is a translator that converts assembly language to machine-level language.
2.1. Compiler Vs Interpreter (Contd..)
1. Compiler: Performs the translation of a program as a whole. Interpreter: Performs statement-by-statement translation.
2. Compiler: Execution is faster. Interpreter: Execution is slower.
3. Compiler: Requires more memory for the generated intermediate object code. Interpreter: Memory usage is efficient as no intermediate object code is generated.
4. Compiler: Debugging is hard, as the error messages are generated only after scanning the entire program. Interpreter: It stops translation when the first error is met; hence, debugging is easy.
5. Compiler: Programming languages like C and C++ use compilers. Interpreter: Programming languages like Python, BASIC, and Ruby use interpreters.
Session 4:
2.2. Various phases of compiler
Compiler
• A compiler is a large program that can read a program in one language (the source language) and translate it into an equivalent program in another language (the target language).
• An important role of the compiler is to report any errors in the source program that it detects during the translation process.

Source Program → Compiler → Target Program
(Error messages are produced as additional output.)
Basic Steps

• Compiler:
The compiler passes the source code through various phases and generates the target assembly code.

• Assembler:
It takes assembly language as input and converts it into machine code or object code.

• Linker:
It combines all the object modules of a source code to generate an executable module.

• Loader: In computer systems, a loader is the part of an operating system that is responsible for loading programs into RAM. It allocates the addresses to an executable module in main memory for execution.
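For example, with a typical C toolchain (a sketch using the common GCC commands): gcc -S prog.c produces the assembly file prog.s; gcc -c prog.s assembles it into the object file prog.o; gcc prog.o -o prog links it into an executable; and running ./prog makes the loader bring the program into memory for execution.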
2.2 Phases of Compiler Design

A compiler operates in phases. A phase is a logically interrelated operation that takes the source program in one representation and produces output in another representation. The phases of a compiler are shown below.

There are two parts of compilation:
 Analysis (Machine Independent / Language Dependent)
 Synthesis (Machine Dependent / Language Independent)

The compilation process is partitioned into a number of sub-processes called 'phases'.

• Q1. List the major stages in the process of compilation.
6 Phases of Compiler
1. Lexical Analyzer:
• Breaks the source program into meaningful units, called tokens.
• Converts all alphabets into one case (in languages that do not distinguish cases).
• Lexical classes (tokens/lexemes): Identifiers, Constants, Keywords, Operators.
2. Syntax Analyzer:
• Determines the structure of programs, individual expressions, and statements, that is, how a program unit is constructed from other units.
• It groups tokens of the source program into grammatical productions; in short, syntax analysis generates the parse tree.
• The language is represented by a formalism called context-free grammar.
3. Semantic Analyzer:
• Semantic analysis checks whether the parse tree constructed follows the rules of the language: for example, that assignments are between compatible data types, or that a string is not added to an integer. Type checking is done in this phase.

4. Intermediate Code Generation
• The intermediate representation is closer to the machine form and is usually easy to produce. One such representation is "three-address code."
• Three-address code consists of a sequence of instructions, each with at most 3 operands.
5. Code Optimization
• Optimize the Intermediate code obtained.

6. Code Generation
• Generate code in the assembly format.
Note: The name of each compiler phase follows directly from its functionality.
Session 5
2.2.1 Lexical Analysis and syntax analysis,
2.2.2 Intermediate Code Generation,
2.2.3 Code optimization techniques,
2.2.4 Code generation
Phase-1: Lexical Analysis
The main functions of this phase are:
1. Identify the lexical units in the source statements and produce output as a sequence of tokens that the parser uses for syntax analysis.
2. Classify tokens into different lexical classes, e.g. constants, reserved words, variables etc., and enter them in different tables.
3. Build the literal table and the uniform symbol table.
 The lexical analyzer is also called a scanner; it breaks the high-level language program into a series of tokens, removing any whitespace or comments in the source code.
Phase-1: Lexical Analysis (Contd..)
• Tokens: In a programming language, keywords, constants, identifiers, strings, numbers, operators and punctuation symbols can be considered as tokens.
Example: Suppose we pass a statement through the lexical analyzer:
• a = b + c ;
It will generate a token sequence like this:
• id1 = id2 + id3 ;
where each id refers to its variable's entry in the symbol table.
Example of tokens:
• For example, consider the statement
printf("System Programming");
There are 5 valid tokens in this printf statement: printf, (, "System Programming", ), and ;.


Lexical Error
Lexical error: An error while identifying lexemes, like inz a; instead of int a;.
A lexical error occurs when the input doesn't belong to any of these lists:
• Keywords: "if", "else", "main", ...
• Relational operators: < > != >= <= ==
• Other operators: = : + - * / %
• Delimiters: . ( ) , { } ; [ ]
• Variables: [a-zA-Z]+
• Numbers: [0-9]+
Example: 9var is an error (a number before characters is neither a variable nor a keyword); $ is an error.
Phase-2:Syntax Analysis
• It checks the syntactical structure of the given input, i.e.
whether the given input is in the correct syntax (of the language
in which the input has been written) or not.
• It does so by building a data structure, called a Parse tree or
Syntax tree.
• The parse tree is constructed by using the pre-defined
Grammar of the language and the input string.
• If the given input string can be produced with the help of the
syntax tree (in the derivation process), the input string is
found to be in the correct syntax.
Parse Tree Example 1
Parse Tree Example 2
Syntax Error
• Syntax error: An error in the syntax of some construct, like printf 30; instead of printf("30");
• = a + b * c (the left-hand operand of the assignment is missing)
How to Create Parse Tree
• A parse tree is constructed using left-most or right-most derivations.
Left-most Derivation:
• A left derivation replaces the left-most nonterminal at each
step of the derivation.
Right-most Derivation:
• A right derivation replaces the right-most nonterminal at each
step.
Example 2. Left-most, Right-most Derivation Tree
Parse Tree via Left and Right-most Derivation

Fig.2.1 Parse Tree via left-most derivation Fig.2.2 Parse Tree via right-most derivation
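As an illustration (using the grammar E → E+E | E*E | id that appears later in this unit), the string id + id * id can be derived in both orders:

Left-most:  E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
Right-most: E ⇒ E+E ⇒ E+E*E ⇒ E+E*id ⇒ E+id*id ⇒ id+id*id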
Ambiguous Grammar
• An ambiguous grammar is a context-free grammar for which there
exists a string that can have more than one leftmost derivation or parse
tree.

• An unambiguous grammar is a context-free grammar for which every valid string has a unique leftmost derivation or parse tree.
Types of Parsing
1. Top-down parsing
2. Bottom-up parsing

• Top-down parsing : A parser can start with the start symbol and
try to transform it to the input string. Example : LL Parsers.

•  Bottom-up parsing : A parser can start with input and attempt


to rewrite it into the start symbol. Example : LR Parsers.
Top Down Parser
• It can be viewed as an attempt to find a left-most derivation for an input string, or an attempt to construct a parse tree for the input starting from the root down to the leaves. Two types:
1a. Recursive descent parsing
1b. Predictive parsing
Grammar rules:
• E → E+E
• E → id
• E → E*E
Top Down Parser (example)
 Top-down parsing expands a parse tree from the start symbol to the leaves.
 Always expand the leftmost non-terminal.

Grammar rules: E → E+E, E → id, E → E*E
Input: id + id * id

The parse tree is expanded step by step, always at the leftmost non-terminal:
E
⇒ E + E        (apply E → E+E)
⇒ id + E       (apply E → id)
⇒ id + E * E   (apply E → E*E)
⇒ id + id * E  (apply E → id)
⇒ id + id * id (apply E → id)
Bottom-up Parser
• As the name suggests, bottom-up parsing starts with the input symbols and tries to construct the parse tree up to the start symbol.
• Two types:
2a. Operator-precedence parsing
2b. LR parsing
Example of Bottom-up Parse

Grammar rules: E → E+(E), E → int
Input: int + (int) + (int)

The parser reduces the input step by step (a rightmost derivation in reverse):
int + (int) + (int)
⇒ E + (int) + (int)   (reduce int → E)
⇒ E + (E) + (int)     (reduce int → E)
⇒ E + (int)           (reduce E+(E) → E)
⇒ E + (E)             (reduce int → E)
⇒ E                   (reduce E+(E) → E)
Which Parser is More Powerful?
• A bottom-up parser is more powerful than a top-down parser because:
 Top-down parsing with backtracking can require exponential time to complete its job.
 Top-down parsing suffers from the backtracking problem.
Phase-3: Semantic Analysis

• A semantic analyzer checks the source program for semantic errors and collects the type information for code generation.
• Semantic analysis has the following functions:
i. Check phrases for semantic errors, e.g. int x = "System Programming";
ii. Maintain the symbol table, which contains information regarding each identifier's type and scope.
iii. An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands.
Semantic Error
• Semantic error: An error in the meaning of a statement, e.g. adding a string to an integer, or using an undeclared variable.
Phase-4: Intermediate Code Generation
After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation (a program for an abstract machine). This intermediate representation should have two important properties:
• it should be easy to produce, and
• it should be easy to translate into the target machine.
The intermediate form considered here is called three-address code, which consists of a sequence of assembly-like instructions with at most three operands per instruction. Each operand can act like a register.
This phase bridges the analysis and synthesis phases of translation.
Example:

newval := oldval + fact * 1
id1 := id2 + id3 * 1

Intermediate code representation (three-address code, with the constant 1 converted to a real first):
temp1 = inttoreal(1)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
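As a small sketch of how such instructions can be stored inside a compiler (the struct and field names are illustrative assumptions, not a fixed format), three-address code is often represented as quadruples in C:

/* A quadruple: one operator, up to two source operands, one result. */
struct quad {
    char op[12];      /* e.g., "*", "+", "inttoreal" */
    char arg1[16];    /* first operand */
    char arg2[16];    /* second operand (empty if unused) */
    char result[16];  /* destination temporary or variable */
};

/* The example above as quadruples:
   { "inttoreal", "1",     "",      "temp1" }
   { "*",         "id3",   "temp1", "temp2" }
   { "+",         "id2",   "temp2", "temp3" }
   { "=",         "temp3", "",      "id1"   } */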
Phase-5: Code Optimization

• Code optimization is the process of eliminating non-essential program statements without changing the meaning of the actual source code.
Advantages of Code Optimization
• It improves the running time of the target code.
• Resource utilization is improved, as fewer CPU registers are needed.
• Space and time complexity are also reduced.
Two Types of Optimization Techniques
Machine Independent/Dependent Optimization
• Machine independent
– It means you don't need to worry about which machine the compiler is working on.
– In machine-independent code optimization, optimization is applied to the intermediate code.
• Machine dependent
– It means that, depending upon the features available on the machine, we can apply optimization.
– In machine-dependent code optimization, optimization is applied to the target code.
– It depends upon how many registers the machine architecture has and the types of addressing modes used. Therefore, it is purely dependent on the machine.
Which part of the code should be focused on for optimization?
The compiler designer must focus on loop optimization:
• Reducing the number of lines in a loop will actually reduce the overall execution time of the program.
• Therefore most optimization techniques focus on eliminating redundancy in loops.
1. Machine Independent Optimization
1. Loop optimization
1a. Code motion
1b. Loop unrolling
1c. Loop jamming
1d. Constant folding
1a. Code Motion:

• Reduce the evaluation frequency of expression.


• Bring loop invariant statements out of the loop.
Example:
for (j = 0; j < 10; j++) {
    b = x + 2;
    a[j] = 5 * j;
}
According to the above code, b = x + 2 is calculated again and again in each iteration. Once b is calculated, it does not change. So, this line can be placed outside the loop as follows:
b = x + 2;
for (j = 0; j < 10; j++) {
    a[j] = 5 * j;
}
1b. Loop Unrolling
• Loop overhead can be reduced by reducing the number of iterations and replicating the
body of the loop.
Example:
In the code fragment below, the body of the loop can be replicated once and the number
of iterations can be reduced from 100 to 50.
for (i = 0; i < 100; i++)
g ();
Below is the code fragment after loop unrolling.
for (i = 0; i < 100; i += 2)
{
g ();
g ();
}
1c. Loop Jamming

• Loop fusion (or loop jamming) is a compiler optimization and loop transformation


which replaces multiple loops with a single one.
• Example
int i, a[100], b[100];
for (i = 0; i < 100; i++)
a[i] = 1;
for (i = 0; i < 100; i++)
b[i] = 2;
is equivalent to:
int i, a[100], b[100];
for (i = 0; i < 100; i++)
{
a[i] = 1;
b[i] = 2;
}
1d. Constant Folding
– Constant folding is the process of recognizing and evaluating constant expressions at compile time rather than computing them at runtime.
Example:
– In the code fragment below, the expression (3 + 5) can be evaluated at compile time and replaced with the constant 8.
int a;
for (i = 0; i < 10; i++)
{
    a = 3 + 5;
    printf("%d", a);
}
Below is the code fragment after constant folding:
int a = 8;
for (i = 0; i < 10; i++)
{
    printf("%d", a);
}
2. Machine Dependent Optimization
2. Peephole optimization
2a. Redundant load/store elimination.
2b. Use of machine idioms.
2c. Strength reduction.
Peephole optimization
• In compiler theory, peephole optimization is a kind of 
optimization performed over a very small set of instructions in
a segment of generated code. The set is called a "peephole" or a
"window".

• It works by recognizing, within the window (peephole), sets of instructions that can be replaced by shorter or faster sets of instructions.
2a) Redundant load and store elimination
Example:
A = B + C
    MOV B, Ro    % load value of B in Ro %
    ADD C, Ro    % add B and C; result is in Ro %
    MOV Ro, A    % store value of Ro in memory variable A %
D = A + E
    MOV A, Ro    % redundant load: Ro already holds the value of A %
    ADD E, Ro
    MOV Ro, D
 We store the final value of Ro into A and then immediately load the value of A back into Ro.
 So the load operation (MOV A, Ro) is redundant and has to be eliminated.

Final Optimized Code
The final optimized code after eliminating the redundant load "MOV A, Ro" is as follows:
• MOV B, Ro
• ADD C, Ro
• MOV Ro, A
• ADD E, Ro
• MOV Ro, D
2b) Use of Machine Idioms
Example: i = i + 1
To increment variable i by 1, the assembly instructions carried out in the processor are:
• MOV i, Ro
• ADD 1, Ro
• MOV Ro, i
This can be optimized using the machine's increment instruction:
• INC i
2c. Strength Reduction
• Strength reduction means replacing a high-strength (expensive) operator with a low-strength (cheaper) one.
• Arithmetic operations like multiplication require more memory, time, and CPU cycles (e.g., 4 CPU cycles). An expensive expression like b = a * 2; can be replaced by an addition, b = a + a; (e.g., 2 CPU cycles).
int a = 5;
int b;
for (i = 0; i < 10; i++)
{
    b = a * 2;
    printf("%d", b);
}
Below is the code fragment after strength reduction:
int a = 5;
int b;
for (i = 0; i < 10; i++)
{
    b = a + a;
    printf("%d", b);
}
Phase-6: Code Generation
• The last phase of translation is code generation.
• It takes as input an intermediate representation of the source program and maps it into the target language.
• If the target language is machine code, registers or memory locations are selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task.
• A crucial aspect of code generation is the judicious assignment of registers to hold variables.
Example:

id1 := id2 + id3 * 1

MOV  R1, id3
MUL  R1, #1
MOV  R2, id2
ADD  R1, R2
MOV  id1, R1
Symbol-Table Management
• The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.
• The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.
• These attributes may provide information about the storage allocated for a name, its type, its scope (where in the program its value may be used), and, in the case of procedure names, such things as the number and types of its arguments, the method of passing each argument (for example, by value or by reference), and the type returned.
For the running example:
newval → id1 & attributes
oldval → id2 & attributes
fact   → id3 & attributes
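A minimal sketch of what one such record might look like in C (the field names and sizes are illustrative assumptions; real compilers typically hash these records for fast lookup):

/* One symbol-table record. */
struct symbol {
    char name[32];        /* identifier, e.g. "newval" */
    char type[16];        /* attribute: data type, e.g. "int" */
    int  scope_level;     /* attribute: where the name is visible */
    int  storage_offset;  /* attribute: storage allocated for the name */
    struct symbol *next;  /* chaining within a hash-table bucket */
};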
Error Handling Routine:
• One of the most important functions of a compiler is the detection and reporting of errors in the source program. The error messages should allow the programmer to determine exactly where the errors have occurred. Errors may occur in any of the phases of a compiler.
• Whenever a phase of the compiler discovers an error, it must report the error to the error handler, which issues an appropriate diagnostic message. Both the table-management and error-handling routines interact with all phases of the compiler.
One-pass compiler
• A one-pass compiler passes through the source code of each compilation unit only once.
• Its efficiency is limited because it doesn't produce intermediate code which can be refined easily.
• One-pass compilers are very common because of their simplicity.
• They check for semantic errors and generate code.
• They are faster than multi-pass compilers.
• Also known as a narrow compiler.
• Pascal and C are both languages that allow one-pass compilation.
Multi-pass compiler
• The input is passed through certain phases in one pass. Then the output of the previous pass is passed through other phases in a second pass, and so on until the desired output is generated.
• It requires less memory because each pass takes the output of the previous pass as input.
• It may create one or more intermediate codes.
• Also known as a wide compiler.
• Modula-2 is a language whose structure requires that a compiler has at least two passes.
Front End vs Back End of a Compiler
The phases of a compiler are collected into a front end and a back end.
The FRONT END consists of those phases that depend primarily on the source program. These normally include lexical and syntactic analysis, semantic analysis, and the generation of intermediate code. A certain amount of code optimization can be done by the front end as well.
The BACK END includes the code optimization phase and the final code generation phase, along with the necessary error handling and symbol table operations.
The front end analyzes the source program and produces intermediate code, while the back end synthesizes the target program from the intermediate code.
Cont….
The front end consists of those phases that primarily depend on the source program and are independent of the target machine.
The back end consists of those phases that depend on the target machine and are independent of the source program.
The intermediate representation may be considered a middle end, as it sits between the source code and the target machine.
Session 6:
2.4 Bootstrapping for Compilers
2.4. Bootstrapping for compilers
• Bootstrapping is a most important concept for building a new compiler.
• To construct any new compiler, we require three languages:
1) Source language (S)
2) Target language (T)
3) Implementation language (I)
Bootstrapping for compilers (contd..)
• We represent the three languages using a diagram called a T-diagram: the source language S on the left, the target language T on the right, and the implementation language I at the bottom.

S → T (implemented in I)

In textual form this can be represented as SIT.
Bootstrapping for compilers (contd..)
Example:
• Suppose we have a Pascal compiler written in the C language that takes Pascal code and produces C as output.
• Now we want to create a Pascal compiler that still converts Pascal code to C, but is implemented in C++.
• For this, we need a converter (i.e., a compiler) which can convert C to C++.
Bootstrapping for compilers (contd..)
(T-diagrams: a Pascal→C compiler written in C, combined with a C→C++ converter, yields a Pascal→C compiler written in C++.)

• When the compiler PcC is run through the converter CxC++, we get the new compiler Pc++C.
• This process, illustrated by T-diagrams, is called bootstrapping.
• Mathematically: PcC + CxC++ = Pc++C
Cross-compiler Concept
Compilers are of two kinds:
– A native compiler generates code for the same platform on which it runs.
– A cross compiler generates code for a platform different from the one on which it runs.
Cross-Compiler (contd..)
• A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running.
• For example, a compiler that runs on Windows but generates code that runs on Android (as in the Android Studio toolchain) is a cross compiler.
Session 7:
2.3 Case Study: LEX and YACC
2.3: LEX and YACC
• Before 1975, writing a compiler was a very time-consuming process. Then Lesk and Johnson published papers on LEX and YACC, which greatly simplified compiler design.
• LEX is a lexical analyzer generator and YACC is a parser generator.
How to design a
BASIC compiler ?
LEX and YACC (contd..)
2.3.1. LEX
• LEX will read your patterns and generate C code for a lexical analyzer or scanner. The lexical analyzer matches strings in the input, based on your patterns, and converts the strings to tokens.
• LEX takes a pattern such as ab* and generates C code for a lexical analyzer.
• Pattern: A combination of strings and operators.
Ex: 01, 0*1, 01*, (0+1), ab, a*b, ab*, (a+b), etc. (A minimal LEX specification is sketched below.)
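A minimal sketch of a LEX specification (assuming standard lex/flex with a C compiler; the token classes and the file name scan.l are illustrative, not the case study's actual rules):

%{
#include <stdio.h>
%}
%%
[0-9]+                  { printf("CONSTANT: %s\n", yytext); }
[a-zA-Z][a-zA-Z0-9]*    { printf("IDENTIFIER: %s\n", yytext); }
[ \t\n]+                ;   /* skip whitespace */
.                       { printf("OTHER: %s\n", yytext); }
%%
int yywrap(void) { return 1; }
int main(void)   { yylex(); return 0; }

Building it with lex scan.l followed by cc lex.yy.c -o scanner gives a standalone scanner that prints one token per lexeme it matches.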
2.3.2. YACC
• YACC: Yet Another Compiler-Compiler.
• YACC generates a parser as output; tokens are matched against the grammar by the parser to create the syntax tree.
• YACC will read your grammar and generate C code for a syntax analyzer or parser. The syntax analyzer uses grammar rules that allow it to analyze tokens from the lexical analyzer and create a syntax tree. The syntax tree imposes a hierarchical structure on the tokens. (A minimal YACC grammar is sketched below.)
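A minimal sketch of a YACC grammar for the expression grammar used earlier in this unit (E → E+E | E*E | id); the precedence declarations are one conventional way to resolve the ambiguity, and yylex would come from the LEX-generated lex.yy.c:

%{
#include <stdio.h>
int yylex(void);                               /* supplied by lex.yy.c */
void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
%}
%token ID
%left '+'        /* lower precedence  */
%left '*'        /* higher precedence */
%%
expr : expr '+' expr
     | expr '*' expr
     | ID
     ;
%%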
How LEX and YACC work
• bas.y: This file contains the grammar rules/description for YACC.
• y.tab.c: The generated parser function 'yyparse' is stored in this file.
• y.tab.h: Header file which is included in LEX.
• bas.l: This file contains all the pattern matching rules for LEX.
• lex.yy.c: The generated lexical analyzer function 'yylex' is stored in this file.
• cc: Used to compile and link the lexer and parser together to create the executable bas.exe.
bas.y File Grammar Rules
bas.l file overview
Steps for how LEX and YACC work
Step 1: You want to create the bas.exe (Turbo C) compiler.
Step 2: YACC reads the grammar description from bas.y and generates a parser function 'yyparse', which is stored in the y.tab.c file.
Step 3: The y.tab.h header file is given to LEX.
Step 4: LEX takes the pattern matching rules from the bas.l file and generates the yylex function, which is stored in the lex.yy.c file.
Step 5: Finally, compile and link the lexer and parser together to create the executable compiler file bas.exe.
 Commands to create the compiler bas.exe:
Command 1 – to create y.tab.h and y.tab.c: yacc -d bas.y
Command 2 – to create lex.yy.c: lex bas.l
Command 3 – to compile/link: cc lex.yy.c y.tab.c -o bas.exe


Session 8:
2.3. Introduction to Grammars and finite automata
These are concepts from the Theory of Computation (TOC/Automata) subject.
The hierarchy of automata/grammars was developed in 1956 by Noam Chomsky (a broad overview of TOC).
Why are you studying this?
• Because finite automata are used in the lexical analysis phase for pattern matching and identifying the types of tokens.
• Other applications of FA:
– Spell-checking in Microsoft Word.
– Pattern matching in mobile phones.
– Speech processing.
• Grammars are used in the syntax analysis phase.
What is TOC?
• It is the study of computation.
• Computation means any task that can be performed by a calculator or a computer.
• So, mathematically modeling a computer or machine (i.e., representing it through a state diagram) and studying the theory and capacity of each machine is called TOC.
What does the keyword 'FA' signify?
• In the state diagram representation, the number of states is finite; that's why it is called a finite automaton.
FA state diagram for pattern unlock in a mobile phone
• For state diagram 1, when the transition reaches the final state, 'IF' is a valid token; otherwise it is not. Similarly for alphabets and numbers.
Q1. Design a DFA in which the start and end symbols must be different. Given: input alphabet Σ = {a, b}.
Steps to Follow
• Step 1: Find the language L, starting with the smallest strings first.
L = {ab, ba, aab, abb, baa, bba, ..................}
• Step 2: Draw a DFA for the minimum strings of language L:
L = ab or ba
Step 2a:
• We know that if the first input is 'a' then the last input cannot be 'a', so we start two flows from the start state, one for strings starting with 'a' and another for strings starting with 'b'.
Iteratively repeat step 2 to obtain the DFA.
Q2. Design a minimum-state deterministic finite automaton (DFA) accepting those strings over the alphabet {a, b} which start with 'a' and end with 'b'.
Q3. Design a DFA for the following regular expression: (a/b)*abb
Step 1: Find the language L, starting with the smallest strings first.
L = {abb, aabb, babb, aaabb, ababb, baabb, ..................}
Step 2: Draw a DFA for the smallest string of language L:
L = abb
(A C sketch simulating the finished DFA follows.)
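A minimal C sketch simulating the well-known four-state DFA for (a/b)*abb with a transition table (the state numbering 0–3 is an assumption; state 3 is the accepting state):

#include <stdio.h>

/* Transition table for the DFA recognizing (a|b)*abb.
   Rows are states 0..3; columns are inputs: 0 = 'a', 1 = 'b'. */
static const int delta[4][2] = {
    {1, 0},  /* state 0: start           */
    {1, 2},  /* state 1: seen "a"        */
    {1, 3},  /* state 2: seen "ab"       */
    {1, 0}   /* state 3: seen "abb" (accepting) */
};

int accepts(const char *s) {
    int state = 0;
    for (; *s; s++) {
        if (*s != 'a' && *s != 'b') return 0;   /* reject other symbols */
        state = delta[state][*s == 'b'];
    }
    return state == 3;                           /* 3 is the accepting state */
}

int main(void) {
    const char *tests[] = {"abb", "aabb", "babb", "ab", "abba"};
    for (int i = 0; i < 5; i++)
        printf("%s -> %s\n", tests[i], accepts(tests[i]) ? "accepted" : "rejected");
    return 0;
}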
Q4. A language consists of all strings of a's and b's which end with b and do not contain aa.
The regular expression is:
Step 1: Write the R.E. for the smallest string, i.e. ending with b:
R.E. = b
Step 2: Strings that do not contain aa and end with b:
R.E. = (b + ab)(b + ab)*  (every 'a' is immediately followed by a 'b', so 'aa' can never occur, and every string ends with 'b')
• Hint: Avoid every possibility for the occurrence of 'aa'.
Finite automata can be classified into two types:
• Deterministic Finite Automaton (DFA)
• Non-deterministic Finite Automaton (NDFA / NFA)
• An NFA is often drawn first, since it is easier to construct; the NFA is then converted to a DFA.
Deterministic vs. Nondeterministic (Imp)
• Deterministic (DFA): In a DFA, for each input symbol, one can determine the state to which the machine will move. Hence, it is called a deterministic automaton. The practical implementation in a compiler is a DFA.
• Nondeterministic (NFA): In an NDFA, for a particular input symbol, the machine can move to any combination of states. In other words, the exact state to which the machine moves cannot be determined. There is ambiguity in an NFA.
DFA vs. NFA
Conversion from NFA to DFA
Subset Construction Method
Using the subset construction method to convert an NFA to a DFA involves the following steps:
• For every state in the NFA, determine all reachable states for every input
symbol.
• The set of reachable states constitute a single state in the converted DFA
(Each state in the DFA corresponds to a subset of states in the NFA).
• Find reachable states for each new DFA state, until no more new states
can be found.
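A minimal C sketch of these steps, using bitmasks as state sets and hard-coding the NFA transition table of the example below (encoding states 1–5 as bits 0–4 is an illustrative assumption):

#include <stdio.h>

/* NFA transition table from the example (states 1..5 -> bits 0..4).
   delta[state][symbol] is a bitmask of reachable NFA states.
   Symbol 0 = 'a', 1 = 'b'. */
static const unsigned delta[5][2] = {
    {0x1F, 0x18},  /* 1: a -> {1,2,3,4,5}, b -> {4,5} */
    {0x04, 0x10},  /* 2: a -> {3},         b -> {5}   */
    {0x00, 0x02},  /* 3: a -> {},          b -> {2}   */
    {0x10, 0x08},  /* 4: a -> {5},         b -> {4}   */
    {0x00, 0x00}   /* 5: a -> {},          b -> {}    */
};

/* move: union of the transitions of all NFA states in set S on symbol c. */
unsigned move(unsigned S, int c) {
    unsigned out = 0;
    for (int q = 0; q < 5; q++)
        if (S & (1u << q)) out |= delta[q][c];
    return out;
}

int main(void) {
    unsigned dstates[32];               /* each DFA state is a subset (bitmask) */
    int n = 0, i = 0;
    dstates[n++] = 0x01;                /* start state {1} */
    while (i < n) {                     /* process each unmarked DFA state */
        for (int c = 0; c < 2; c++) {
            unsigned T = move(dstates[i], c);
            int found = 0;
            for (int j = 0; j < n; j++) if (dstates[j] == T) found = 1;
            if (!found) dstates[n++] = T;   /* a new reachable DFA state */
            printf("delta(%#04x, %c) = %#04x\n", dstates[i], "ab"[c], T);
        }
        i++;
    }
    /* Any subset containing an NFA final state becomes a DFA final state. */
    printf("%d DFA states\n", n);       /* 10 for this example */
    return 0;
}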
Subset Construction Method

Fig 1. NFA: states 1–5 over the alphabet {a, b}, with state 1 as the initial state; its transitions are given in the transition table below.

Step 1: Construct a transition table showing all reachable states for every state and every input symbol.

Fig 2. Transition table:
q    δ(q,a)         δ(q,b)
1    {1,2,3,4,5}    {4,5}
2    {3}            {5}
3    ∅              {2}
4    {5}            {4}
5    ∅              ∅

Step 2: The set of states resulting from every transition function constitutes a new state. Starting with the initial state, calculate all reachable states for every such state for every input symbol.

Step 3: Repeat this process (step 2) until no more new states are reachable. States that have already been generated (such as 4 and 5) are not added again.

Fig 3. Subset construction table:
q              δ(q,a)         δ(q,b)
{1}            {1,2,3,4,5}    {4,5}
{1,2,3,4,5}    {1,2,3,4,5}    {2,4,5}
{4,5}          {5}            {4}
{2,4,5}        {3,5}          {4,5}
{5}            ∅              ∅
{4}            {5}            {4}
{3,5}          ∅              {2}
∅              ∅              ∅
{2}            {3}            {5}
{3}            ∅              {2}

The construction stops here, as there are no more reachable states.

Fig 4. The resulting FA after applying subset construction to Fig 1: each subset above becomes a single DFA state, with the transitions given in Fig 3.
NFA to DFA Conversion End
Students' Exercise
Q1. Differentiate between an NFA and a DFA. Can an NFA be converted to a DFA? If yes, how?
Introduction to Automata
Grammars
Grammar in Compiler Overview
Introduction to Grammars
• According to Noam Chomsky, there are four types of grammars: Type 0, Type 1, Type 2, and Type 3.
• The following slides show how they differ from each other.
Grammars Description
 Type-3 Grammar
• Type-3 grammars generate regular languages. Type-3 grammars must have a single non-terminal on the left-hand side and a right-hand side consisting of a single terminal, or a single terminal followed by a single non-terminal.
• The productions must be in the form X → a or X → aY
• where X, Y ∈ N (non-terminals)
• and a ∈ T (terminals)
• The rule S → ε is allowed if S does not appear on the right side of any rule.
• Example:
X → ε, X → a | aY, Y → b
Grammars Description
 Type-2 Grammar
• Type-2 grammars generate context-free languages.
• The productions must be in the form A → γ
• where A ∈ N (non-terminal)
• and γ ∈ (T ∪ N)* (a string of terminals and non-terminals).
• The languages generated by these grammars are recognized by a non-deterministic pushdown automaton.
• Example:
S → X a, X → a, X → aX, X → abc, X → ε
Grammars Description
 Type-1 Grammar
• Type-1 grammars generate context-sensitive languages. The productions must be in the form
• αAβ → αγβ
• where A ∈ N (non-terminal)
• and α, β, γ ∈ (T ∪ N)* (strings of terminals and non-terminals)
• The strings α and β may be empty, but γ must be non-empty.
• The rule S → ε is allowed if S does not appear on the right side of any rule. The languages generated by these grammars are recognized by a linear bounded automaton.
• Example:
AB → AbBc, A → bcA, B → b
Grammars Description
 Type-0 Grammar
• Type-0 grammars generate recursively enumerable languages. The productions have no restrictions; they are arbitrary phrase-structure grammars, including all formal grammars.
• They generate the languages that are recognized by a Turing machine.
• The productions can be in the form α → β, where α is a string of terminals and non-terminals with at least one non-terminal, and α cannot be null. β is a string of terminals and non-terminals.
• Example:
S → ACaB, Bc → acB, CB → DB, aD → Db
Chomsky Hierarchy: Properties of Grammar
• This is a hierarchy, so every language of type 3 is also of types
2, 1 and 0; every language of type 2 is also of types 1 and 0
etc. 
• The distinction between languages can be seen by examining
the structure of the production rules of their corresponding
grammar, or the nature of the automata which can be used to
identify them. 
Session 9
Debuggers:
Introduction to various debugging techniques, Case
Study: - Debugging in Turbo C++ IDE.
How is your program executed? (Recap)

Programmer → Source Program → Compiler → Assembly Code → Assembler → Machine Code → Linker (combines all object code using libraries) → Loader (loads the program into RAM) → Execution on the target machine

The debugger executes the program under its control and reports debugging results; based on them, the programmer does manual correction of the code in the editor.
Debugging
• Debugging is a systematic process of spotting and fixing bugs, or defects, in a piece of software so that the software behaves as expected.
Debugging Techniques
1) Print debugging: The printf C function is used to trace the flow of execution of a process and locate where an error occurs (see the sketch after this list).
2) Remote debugging: The process of debugging a program running on a system different from the debugger. To start remote debugging, a debugger connects to a remote system over a network. The debugger can then control the execution of the program on the remote system and retrieve information about its state.
Debugging Techniques (contd..)
3) Post-mortem debugging: Debugging the program after it has already crashed, for example by analyzing the memory dump (or core dump) of the crashed process.
4) Delta debugging: A methodology to automate the debugging of programs using a scientific approach of a hypothesis-trial-result loop.
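A minimal sketch of print debugging in C (the divide function and the messages are hypothetical, used only to show the technique):

#include <stdio.h>

int divide(int a, int b) {
    /* Print-debugging: trace the inputs before the suspect operation. */
    fprintf(stderr, "DEBUG: divide called with a=%d, b=%d\n", a, b);
    if (b == 0) {
        fprintf(stderr, "DEBUG: b is zero, returning 0 to avoid a crash\n");
        return 0;
    }
    return a / b;
}

int main(void) {
    printf("%d\n", divide(10, 2));   /* normal case */
    printf("%d\n", divide(7, 0));    /* the trace reveals the bad input */
    return 0;
}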
Session 10: Students' Self-Assignment
1. Practice all the phases of the compiler with examples.
2. Practice examples of DFA construction.
3. Practice examples of NFA to DFA conversion.
4. Case study: Debugging in Turbo C++ IDE.
5. Case study: Design of a compiler in C++ as a prototype.
6. Practice deciding whether a grammar is ambiguous or not using parse trees.
Thank You
