
INTRODUCTION TO COMPILER

<<professor>>

Computer Science Department


A compiler acts as a translator,
transforming human-oriented programming
languages into computer-oriented machine
languages.

Programming Language (Source) → Compiler → Machine Language (Target)

Computer Science Department Slide 2 of 41


• Compilers may generate three types of code:
– Pure Machine Code
• Machine instructions that do not assume the existence
of any operating system or library.
• Mostly used for operating systems or embedded applications.
– Augmented Machine Code
• Code that relies on OS routines and runtime support routines.
• The more common case.
– Virtual Machine Code
• Virtual instructions that can run on any architecture with a
virtual machine interpreter or a just-in-time compiler.
• Example: Java (bytecode for the Java Virtual Machine).



INTRODUCTION TO COMPILER
• Compilers
• Analysis of the source program
• The phases of a compiler
• Cousins of the compiler
• The grouping of phases
• Compiler construction tools



Let’s define:
A compiler is a program that converts a high-level language into
assembly language.
An assembler is a program that translates assembly language
programs into machine code. The output of an assembler is called
an object file, which contains machine instructions together with
the data required to place these instructions in memory.

A preprocessor, generally considered a part of the compiler, is
a tool that produces input for compilers. It deals with macro
processing, augmentation, file inclusion, language extension,
etc.

Let’s define:
An interpreter, like a compiler, translates high-level language into
low-level machine language.
The difference lies in the way they read the source code or input.
A compiler reads the whole source code at once, creates tokens,
checks semantics, generates intermediate code, translates the
whole program, and may involve many passes.
In contrast, an interpreter reads a statement from the input,
converts it to an intermediate code, executes it, then takes the next
statement in sequence. If an error occurs, an interpreter stops
execution and reports it, whereas a compiler reads the whole
program even if it encounters several errors.



A linker is a computer program that links and merges various
object files together in order to make an executable file.

All these files might have been compiled by separate assemblers.

The major task of a linker is to search for and locate the referenced
modules/routines in a program and to determine the memory
location where this code will be loaded, so that the program
instructions have absolute references.



Machine Language: a program for a
computer must be built by combining
very simple machine commands into a program.
Since this is a tedious and error-prone
process, most programming is instead
done using a high-level programming
language.



What is a
COMPILER?
Compilation is a process that translates a program
in one language (the source language) into an
equivalent program in another language (the object
or target language).

A compiler translates (or compiles) a program written
in a high-level programming language that is
suitable for human programmers into the low-level
machine language that is required by computers.

THE ANALYSIS-SYNTHESIS MODEL OF COMPILATION
There are two parts to compilation:
–Analysis determines the operations implied by the
source program which are recorded in a tree
structure
–Synthesis takes the tree structure and translates
the operations therein into the target program



PHASES OF A COMPILER:
• Analysis of Language 1
• Synthesis of Language 2

ANALYSIS: Lexical Analysis → Syntax Analysis → Semantic Analysis
SYNTHESIS: Intermediate Code Generation → Code Optimization → Target Code Generation



THE PHASES OF A COMPILER



THE PHASES OF A COMPILER

RTL Compiler is a powerful tool for logic synthesis and analysis for digital
designs.



THE PHASES OF A COMPILER
LEXICAL ANALYSIS
This is the initial part of reading and analysing
the program text:
• The text is read and divided into tokens, each
of which corresponds to a symbol in the
programming language, e.g., a variable name,
keyword or number.



THE PHASES OF A COMPILER
A lexical analyser or scanner is a program that groups
sequences of characters into lexemes, and outputs (to
the syntax analyser) a sequence of tokens. Here:
(a) Tokens are symbolic names for the entities that make
up the text of the program;
(b) A pattern is a rule that specifies when a sequence of
characters from the input constitutes a token;
(c) A lexeme is a sequence of characters in the source
program that matches the pattern for a token and is
identified by the lexical analyzer as an instance of that
token.

THE PHASES OF A COMPILER
For example, the following code might result in
the table given below.
program foo(input,output);var x:integer;begin
readln(x);writeln('value read =',x) end.

LEXEME     TOKEN      PATTERN
program    program    p, r, o, g, r, a, m
                      newlines, spaces, tabs
foo        id (foo)   letter followed by seq. of alphanumerics

LEXICAL ANALYZER:
The lexical analyzer (also called the linear analyzer) breaks the
source text into tokens. For example, the assignment statement
position = initial + rate * 60
would be grouped into the following tokens:
1. The identifier position.
2. The assignment symbol =.
3. The identifier initial.
4. The plus sign.
5. The identifier rate.
6. The multiplication sign.
7. The number 60
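
The grouping above can be sketched with a small regular-expression scanner. This is only an illustrative sketch, not the scanner of any particular compiler: the token names (ID, ASSIGN, PLUS, TIMES, NUMBER) and the function name tokenize are assumptions made for this example.

```python
import re

# Hypothetical token names and patterns; the slide lists the token kinds only informally.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),  # whitespace is discarded, not turned into a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token, lexeme in tokenize("position = initial + rate * 60"):
        print(token, lexeme)
```

Each pattern plays the role of the "pattern" defined earlier, the matched text is the lexeme, and the group name is the token.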
SYMBOL TABLE:

position    id1 & attributes
initial     id2 & attributes
rate        id3 & attributes

An expression of the form:
position = initial + 60 * rate
gets converted to:
id1 = id2 + 60 * id3
So the lexical analyzer converts the source symbols into an array of
easy-to-use symbolic constants (TOKENS). It also removes spaces and
other unnecessary things such as comments.
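
A minimal sketch of this substitution, assuming a symbol table that is just a dictionary from lexeme to symbolic name (a real symbol table would also store the attributes). The function name to_id_form is hypothetical.

```python
import re


def to_id_form(statement):
    """Replace each distinct identifier with id1, id2, ... in order of first appearance."""
    symbol_table = {}  # lexeme -> symbolic name; attributes are omitted in this sketch

    def replace(match):
        lexeme = match.group()
        if lexeme not in symbol_table:
            symbol_table[lexeme] = f"id{len(symbol_table) + 1}"
        return symbol_table[lexeme]

    rewritten = re.sub(r"[A-Za-z][A-Za-z0-9]*", replace, statement)
    return rewritten, symbol_table


if __name__ == "__main__":
    text, table = to_id_form("position = initial + 60 * rate")
    print(text)
    print(table)
```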



THE PHASES OF A COMPILER
SYNTAX ANALYSIS
A syntax analyser or parser is a program that
groups sequences of tokens from the lexical
analysis phase into phrases each with an
associated phrase type.
A phrase is a logical unit with respect to the
rules of the source language.
For example, consider:
a := x * y + z



SYNTAX ANALYSIS:
Syntax analysis is also called PARSING. It involves
grouping the tokens of the source program into
grammatical phrases that are used by the compiler to
synthesize output. It checks the code syntax using a
context-free grammar (CFG), i.e. a set of rules. For example,
if we have a grammar of the form:
• E → E = E
• E → E + E
• E → E * E
• E → const (here const covers both identifiers and numbers)
then the corresponding derivation is:
E ⇒ E = E ⇒ id = E + E ⇒ id = id + E * E ⇒ id = id + id * 60



Parse Tree



SEMANTIC ANALYSIS
The semantic analysis phase checks the source
program for semantic errors and gathers type
information for the subsequent code-generation
phase. In this phase, checks are performed to ensure
that the components of a program fit together
meaningfully.
For example, take this sample code:
int a; int b;
char c[ ];
a = b + c; (the type check is done here: b is an int but c is a character array)
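
A toy version of this check can be sketched as follows. It assumes a deliberately strict rule (both operands and the target must have the same declared type); real languages allow various conversions, so this is an illustration only. The names DECLARED and check_assignment, and the label "char[]", are hypothetical.

```python
# Declared types for the sample code; "char[]" is a hypothetical label for a character array.
DECLARED = {"a": "int", "b": "int", "c": "char[]"}


def check_assignment(target, left, right, declared=DECLARED):
    """Return a list of type errors for the statement:  target = left + right"""
    errors = []
    left_type = declared[left]
    right_type = declared[right]
    if left_type != right_type:
        # Strict rule assumed for this sketch: operands must have identical types.
        errors.append(f"cannot add {left_type} and {right_type}")
    elif declared[target] != left_type:
        errors.append(f"cannot assign {left_type} to {declared[target]}")
    return errors


if __name__ == "__main__":
    print(check_assignment("a", "b", "c"))
    print(check_assignment("a", "b", "b"))
```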



SYNTHESIS PHASE OF COMPILATION:

• INTERMEDIATE CODE GENERATION:
We can think of this intermediate representation
as a program for an abstract machine. For the
example used in lexical analysis the
intermediate representation will be:
temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
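
One way such three-address code could be produced is a post-order walk over the syntax tree. The sketch below assumes the tree is a nested tuple such as ("=", "id1", ("+", "id2", ("*", "id3", "60"))) and that every integer literal must be converted to real; both are assumptions for this example, as is the function name generate_tac.

```python
from itertools import count


def generate_tac(tree):
    """Translate ('=', target, expr) into a list of three-address instructions."""
    code = []
    temp_numbers = count(1)

    def new_temp():
        return f"temp{next(temp_numbers)}"

    def gen(node):
        # Leaf: an identifier is used directly; an integer literal is converted first.
        if isinstance(node, str):
            if node.isdigit():
                temp = new_temp()
                code.append(f"{temp} = inttoreal({node})")
                return temp
            return node
        # Interior node: generate code for both operands, then for the operator.
        op, left, right = node
        left_place = gen(left)
        right_place = gen(right)
        temp = new_temp()
        code.append(f"{temp} = {left_place} {op} {right_place}")
        return temp

    _, target, expr = tree
    result = gen(expr)
    code.append(f"{target} = {result}")
    return code


if __name__ == "__main__":
    for line in generate_tac(("=", "id1", ("+", "id2", ("*", "id3", "60")))):
        print(line)
```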



CODE OPTIMIZATION
The code optimization phase attempts to improve the
intermediate code, so that faster-running machine code
results. Some optimizations are trivial. The final code for
the example above will be:
temp1 = id3 * 60.0 // the conversion of 60 is done at compile time
id1 = id2 + temp1 // unnecessary variables removed
In “optimizing compilers”, a significant amount of time is
spent on this phase. However, there are simple
optimizations that significantly improve the running time
of the target program without slowing down the
compilation too much.
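
The two simplifications used in this example can be sketched as small passes over the intermediate code. The sketch assumes instructions are tuples (dest, op, left, right), that the code is a single straight-line sequence, and that a temporary is not used again after it has been copied; these are assumptions for illustration, and the function name optimize is hypothetical.

```python
def optimize(code):
    """Fold inttoreal on literals, merge a trailing copy, and renumber temporaries."""
    # Pass 1: evaluate inttoreal(<integer literal>) at compile time.
    constants = {}
    folded = []
    for dest, op, left, right in code:
        left = constants.get(left, left)
        right = constants.get(right, right)
        if op == "inttoreal" and left.isdigit():
            constants[dest] = f"{left}.0"
            continue  # the instruction disappears; later uses see the constant
        folded.append((dest, op, left, right))

    # Pass 2: turn "t = a op b; x = t" into "x = a op b".
    merged = []
    for dest, op, left, right in folded:
        if op == "copy" and merged and merged[-1][0] == left and left.startswith("temp"):
            _, prev_op, prev_left, prev_right = merged.pop()
            merged.append((dest, prev_op, prev_left, prev_right))
        else:
            merged.append((dest, op, left, right))

    # Pass 3: renumber the remaining temporaries from temp1 upward.
    names = {}

    def rename(name):
        if name is not None and name.startswith("temp"):
            return names.setdefault(name, f"temp{len(names) + 1}")
        return name

    return [(rename(d), op, rename(l), rename(r)) for d, op, l, r in merged]


if __name__ == "__main__":
    unoptimized = [
        ("temp1", "inttoreal", "60", None),
        ("temp2", "*", "id3", "temp1"),
        ("temp3", "+", "id2", "temp2"),
        ("id1", "copy", "temp3", None),
    ]
    for instruction in optimize(unoptimized):
        print(instruction)
```

For the running example this reduces four instructions to two, one multiplication into a temporary and one addition into id1.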



CODE GENERATION
The final phase of the compiler is the generation of the
target code, normally consisting of relocatable
machine code or assembly code. Some compilers can
generate target code for several different machines,
while others generate target code only for one
specific machine. The translation of the example
code might become:
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
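
A naive generator that emits this kind of register code from two-operand intermediate instructions might look as follows. It is a sketch under several assumptions: instructions are tuples (dest, op, left, right), only + and * occur, there are just two registers, and registers holding temporaries are never reused. The names generate_target and OPCODES are hypothetical.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}


def is_literal(operand):
    """True for numeric literals such as '60' or '60.0'."""
    return operand.replace(".", "", 1).isdigit()


def generate_target(code):
    """Emit MOVF/ADDF/MULF instructions for a list of (dest, op, left, right) tuples."""
    free_registers = ["R1", "R2"]  # pop() hands out R2 first, matching the slide
    location = {}                  # temporary name -> register that holds its value
    output = []

    def operand(name):
        if name in location:
            return location[name]
        if is_literal(name):
            return f"#{name}"      # immediate operand
        return name                # memory operand

    for dest, op, left, right in code:
        reg = free_registers.pop()
        output.append(f"MOVF {operand(left)}, {reg}")
        output.append(f"{OPCODES[op]} {operand(right)}, {reg}")
        if dest.startswith("temp"):
            location[dest] = reg   # keep the temporary in its register
        else:
            output.append(f"MOVF {reg}, {dest}")
            free_registers.append(reg)
    return output


if __name__ == "__main__":
    optimized = [("temp1", "*", "id3", "60.0"), ("id1", "+", "id2", "temp1")]
    for line in generate_target(optimized):
        print(line)
```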



Seatwork:

Input: result = a + b * c / d
• Tokens:

Input: result = a + b * c / d
Exp ::= Exp ‘+’ Exp
      | Exp ‘-’ Exp
      | Exp ‘*’ Exp
      | Exp ‘/’ Exp
      | ID
Assign ::= ID ‘=’ Exp



Answer

Input: result = a + b * c / d
• Tokens:
‘result’, ‘=’, ‘a’, ‘+’, ‘b’, ‘*’, ‘c’, ‘/’, ‘d’
• Identifiers: result, a, b, c, d
• Operators: =, +, *, /



Input: result = a + b * c / d
Exp ::= Exp ‘+’ Exp
      | Exp ‘-’ Exp
      | Exp ‘*’ Exp
      | Exp ‘/’ Exp
      | ID
Assign ::= ID ‘=’ Exp

Parse tree:
Assign
  ID (result)
  ‘=’
  Exp
    Exp
      ID (a)
    ‘+’
    Exp
      Exp
        ID (b)
      ‘*’
      Exp
        Exp
          ID (c)
        ‘/’
        Exp
          ID (d)

