
Compilers

Code Generation
MACHINE - DEPENDENT CODE OPTIMIZATION

Machine-Independent Compiler Features

MACHINE - INDEPENDENT CODE OPTIMIZATION


srinath_nk@yahoo.com 1 RVCE

N.K. Srinath

[Figure: phases of a compiler. Source program, lexical analysis, syntax analysis, intermediate code generation, code optimization, code generation; table management and error handling.]

In most compilers, the phases are grouped into a front end and a back end. The front end comprises the phases, or parts of phases, that depend on the source language and are independent of the target machine. These include lexical analysis, syntactic analysis, creation of the symbol table, semantic analysis, and generation of intermediate code.

The front end also includes the error handling and code optimization that go along with these phases. The back end comprises the phases of the compiler that depend on the target machine. They do not depend on the source language, only on the intermediate language. The back end includes code optimization and code generation, along with the associated error handling and symbol-table operations.

GENERATION OF OBJECT CODE


The code generation phase follows syntax analysis. When the parser recognizes a portion of the source program according to some rule of the grammar, a subprogram associated with that rule is executed. This subprogram is called a semantic routine. The semantic routines here generate object code directly, so they are also called code-generation routines. Some more complex compilers generate an intermediate form of the program instead.

The code-generation routines discussed here are designed for use with the simplified Pascal grammar listed below.

 1. <prog>      ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.
 2. <prog-name> ::= id
 3. <dec-list>  ::= <dec> | <dec-list> ; <dec>
 4. <dec>       ::= <id-list> : <type>
 5. <type>      ::= INTEGER
 6. <id-list>   ::= id | <id-list> , id
 7. <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
 8. <stmt>      ::= <assign> | <read> | <write> | <for>
 9. <assign>    ::= id := <exp>
10. <exp>       ::= <term> | <exp> + <term> | <exp> - <term>
11. <term>      ::= <factor> | <term> * <factor> | <term> DIV <factor>
12. <factor>    ::= id | int | ( <exp> )
13. <read>      ::= READ ( <id-list> )
14. <write>     ::= WRITE ( <id-list> )
15. <for>       ::= FOR <index-exp> DO <body>

Note: This grammar is used for code generation to emphasize the point that code-generation techniques need not be associated with any particular parsing method. Neither parsing method uses the grammar exactly as given:
(1) The operator-precedence method ignores certain nonterminals.
(2) The recursive-descent method must use a slightly modified grammar.

The code is generated for the SIC/XE machine. The code-generation routines use two data structures for working storage:
(1) A list
(2) A stack
Listcount: a variable that keeps count of the number of items currently in the list.

The code-generation routines make use of token specifiers, denoted by S(token). Example:
    S(id)  : name of the identifier
    S(int) : value of the integer, such as #100

The code-generation routines create segments of object code for the compiled program. These segments are given a symbolic representation in SIC assembler language.

LOCCTR: a location counter that is updated to reflect the next available address in the compiled program (exactly as it is in an assembler).

Code-generation Process for the READ Statement

[Figure: parse tree for READ ( VALUE ). <read> derives READ ( <id-list> ), and <id-list> derives id {VALUE}.]

The parse tree for the READ statement can be generated with many different parsing methods.

In an operator-precedence parse, recognition occurs when a substring of the input is reduced to some nonterminal <Ni>. In a recursive-descent parse, recognition occurs when a procedure returns to its caller, indicating success. In either case, the parser first recognizes the id VALUE as an <id-list>, and then recognizes the complete statement as a <read>.

The symbolic representation of the object code to be generated for the READ statement is as shown.

    +JSUB   XREAD
     WORD   1
     WORD   VALUE

This code consists of a call to a subroutine XREAD, which would be part of a standard library associated with the compiler. Any program that wants to perform a READ operation can call XREAD.

The parameter list for XREAD is defined immediately after the JSUB that calls it. The first word is the number of variables that will be assigned values by the READ. The following words give the addresses of these variables. Routines that might be used to accomplish this code generation are shown below.

<id-list> ::= id
    add S(id) to list
    add 1 to Listcount

<id-list> ::= <id-list> , id
    add S(id) to list
    add 1 to Listcount

These two routines correspond to the alternative structures for <id-list>, that is,

<id-list> ::= id | <id-list> , id

In either case, the token specifier S(id) for the new identifier being added to the <id-list> is inserted into the list used by the code-generation routines, and Listcount is incremented.

<read> ::= READ ( <id-list> )
    generate [+JSUB XREAD]
    record external reference to XREAD
    generate [WORD Listcount]
    for each item on list do
        begin
            remove S(ITEM) from list
            generate [WORD S(ITEM)]
        end
    Listcount := 0
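The <id-list> and <read> routines can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name ReadCodeGen, the method names, and the helper compile_read are hypothetical, and the parser is replaced by direct calls made in the order in which the rules would be recognized.

```python
class ReadCodeGen:
    """Toy model of the <id-list> and <read> semantic routines (names are hypothetical)."""

    def __init__(self):
        self.id_list = []        # working list of token specifiers S(id)
        self.listcount = 0       # Listcount: number of items currently on the list
        self.code = []           # symbolic object code generated so far
        self.external_refs = []  # external references recorded so far

    def on_id(self, name):
        """Covers both <id-list> ::= id and <id-list> ::= <id-list> , id."""
        self.id_list.append(name)
        self.listcount += 1

    def on_read(self):
        """<read> ::= READ ( <id-list> )"""
        self.code.append("+JSUB XREAD")
        self.external_refs.append("XREAD")
        self.code.append(f"WORD {self.listcount}")
        while self.id_list:
            item = self.id_list.pop(0)   # remove S(ITEM) from the list
            self.code.append(f"WORD {item}")
        self.listcount = 0


def compile_read(names):
    """Simulate the parser recognizing READ ( names... )."""
    gen = ReadCodeGen()
    for name in names:
        gen.on_id(name)
    gen.on_read()
    return gen.code


if __name__ == "__main__":
    for line in compile_read(["VALUE"]):
        print(line)
```

For READ (VALUE) the sketch emits the same three lines as the symbolic object code given earlier: the call to XREAD, the count word, and one address word.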

Code-generation Process for the Assignment Statement

Example:
VARIANCE:=SUMSQ DIV 100 - MEAN * MEAN

Solution
The parse tree for this statement is shown in the figure. Most of the work of parsing involves the analysis of the <exp> on the right-hand side of the := operator.

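Parse Tree

[Figure: parse tree for VARIANCE := SUMSQ DIV 100 - MEAN * MEAN. <assign> derives id {VARIANCE} := <exp>. This <exp> derives <exp> - <term>. The left <exp> derives a <term>, which derives <term> DIV <factor>, that is, id {SUMSQ} DIV int {100}. The right <term> derives <term> * <factor>, that is, id {MEAN} * id {MEAN}.]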

A code-generation routine is called as each portion of the statement is recognized. Example: for the rule <term>1 ::= <term>2 * <factor>, code for a multiplication is to be generated. The subscripts distinguish between the two occurrences of <term>. The code-generation routines perform all arithmetic operations using register A.

Before the multiplication, one of the operands (<term>2 or <factor>) must be located in register A.

The result of the multiplication, <term>2 * <factor>, is left in register A. We therefore need to keep track of the result left in register A by each segment of code that is generated. This is accomplished by extending the token-specifier idea to the nonterminal nodes of the parse tree.

The node specifier S(<term>1) is set to rA. This indicates that the result of this computation is in register A.

The variable REGA indicates the highest-level node of the parse tree whose value is left in register A by the code generated so far.

1. <assign> ::= id := <exp>
       GETA (<exp>)
       generate [STA S(id)]
       REGA := null

The code-generation routine for <assign> first brings the value to be assigned into register A (using GETA). It then generates a STA instruction to store the value in register A into the variable S(id).

Note that REGA is then set to null because the code for the statement has been completely generated, and any intermediate results are no longer needed.

The following rules do not require the generation of any machine instructions since no computation or data movement is involved.
The code generation routines for these rules simply set the node specifier of the higher-level node to reflect the location of the corresponding value.

2. <exp> ::= <term>
       S(<exp>) := S(<term>)
       if S(<exp>) = rA then
           REGA := <exp>

3. <exp>1 ::= <exp>2 + <term>
       if S(<exp>2) = rA then
           generate [ADD S(<term>)]
       else if S(<term>) = rA then
           generate [ADD S(<exp>2)]
       else
           begin
               GETA (<exp>2)
               generate [ADD S(<term>)]
           end
       S(<exp>1) := rA
       REGA := <exp>1

4. <exp>1 ::= <exp>2 - <term>
       if S(<exp>2) = rA then
           generate [SUB S(<term>)]
       else
           begin
               GETA (<exp>2)
               generate [SUB S(<term>)]
           end
       S(<exp>1) := rA
       REGA := <exp>1

5. <term> ::= <factor>
       S(<term>) := S(<factor>)
       if S(<term>) = rA then
           REGA := <term>

6. <term>1 ::= <term>2 * <factor>
       if S(<term>2) = rA then
           generate [MUL S(<factor>)]
       else if S(<factor>) = rA then
           generate [MUL S(<term>2)]
       else
           begin
               GETA (<term>2)
               generate [MUL S(<factor>)]
           end
       S(<term>1) := rA
       REGA := <term>1

7. <term>1 ::= <term>2 DIV <factor>
       if S(<term>2) = rA then
           generate [DIV S(<factor>)]
       else
           begin
               GETA (<term>2)
               generate [DIV S(<factor>)]
           end
       S(<term>1) := rA
       REGA := <term>1

8. <factor> ::= id
       S(<factor>) := S(id)

9. <factor> ::= int
       S(<factor>) := S(int)

10. <factor> ::= ( <exp> )
       S(<factor>) := S(<exp>)
       if S(<factor>) = rA then
           REGA := <factor>

The GETA procedure is shown below.

procedure GETA (NODE)
    begin
        if REGA = null then
            generate [LDA S(NODE)]
        else if S(NODE) ≠ rA then
            begin
                create a new working variable Tempi
                generate [STA Tempi]
                record forward reference to Tempi
                S(REGA) := Tempi
                generate [LDA S(NODE)]
            end {if ≠ rA}
        S(NODE) := rA
        REGA := NODE
    end {GETA}

The code generated for the above statement is as follows:

    LDA   SUMSQ
    DIV   #100
    STA   TMP1
    LDA   MEAN
    MUL   MEAN
    STA   TMP2
    LDA   TMP1
    SUB   TMP2
    STA   VARIANCE
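How GETA and the arithmetic routines interact can be sketched in Python. This is a simplified model, not the routines themselves: the names Node and AssignCodeGen are hypothetical, the four arithmetic rules are folded into one binary method, the pass-through rules (such as <exp> ::= <term>) are modeled by reusing the same node, and the calls are made by hand in the order in which a parser would recognize the rules.

```python
RA = "rA"  # specifier value meaning "this node's value is in register A"


class Node:
    """A parse-tree node; spec plays the role of the specifier S(node)."""

    def __init__(self, spec):
        self.spec = spec


class AssignCodeGen:
    def __init__(self):
        self.code = []
        self.rega = None       # REGA: node whose value is currently in register A
        self.temp_count = 0

    def geta(self, node):
        """GETA: make sure the value of node ends up in register A."""
        if self.rega is None:
            self.code.append(f"LDA {node.spec}")
        elif node.spec != RA:
            self.temp_count += 1
            temp = f"TMP{self.temp_count}"      # new working variable
            self.code.append(f"STA {temp}")     # save what is in register A
            self.rega.spec = temp               # S(REGA) := Tempi
            self.code.append(f"LDA {node.spec}")
        node.spec = RA
        self.rega = node

    def binary(self, op, left, right):
        """ADD and MUL may use either operand from rA; SUB and DIV need the left one."""
        if left.spec == RA:
            self.code.append(f"{op} {right.spec}")
        elif op in ("ADD", "MUL") and right.spec == RA:
            self.code.append(f"{op} {left.spec}")
        else:
            self.geta(left)
            self.code.append(f"{op} {right.spec}")
        result = Node(RA)
        self.rega = result
        return result

    def assign(self, target, exp):
        """<assign> ::= id := <exp>"""
        self.geta(exp)
        self.code.append(f"STA {target}")
        self.rega = None


def compile_variance():
    gen = AssignCodeGen()
    quotient = gen.binary("DIV", Node("SUMSQ"), Node("#100"))
    square = gen.binary("MUL", Node("MEAN"), Node("MEAN"))
    difference = gen.binary("SUB", quotient, square)
    gen.assign("VARIANCE", difference)
    return gen.code


if __name__ == "__main__":
    print("\n".join(compile_variance()))
```

Running the sketch on VARIANCE := SUMSQ DIV 100 - MEAN * MEAN yields nine instructions, ending in STA VARIANCE. The two STA instructions to temporaries come from GETA saving register A before it is overwritten.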

INTERMEDIATE CODE REPRESENTATION


Some compilers generate an explicit intermediate representation of the source program after syntax and semantic analysis. This intermediate representation can be thought of as a program for an abstract machine. It should have two main properties:
1. It should be easy to produce.
2. It should be easy to translate into the target program.

Suppose we have to write compilers for m languages targeted at n machines. The obvious approach would be to write m*n compilers, one for each pairing of language and machine.

[Figure: a high-level language (HLL) passes through two compilers, producing object code for machine M1 and object code for machine M2.]

This diagram shows two compilers converting a high-level language into two different object codes for two machines. It means that each language needs as many compilers as there are target machines.

Example: compiling the C language for an Intel processor and for a Motorola processor.

An intermediate language avoids most of these problems.

It allows a logical separation between the machine-independent and machine-dependent phases and facilitates optimization. All we have to do is choose an intermediate language rich enough to bridge the source programs and the target programs.
The first three phases are called the front end of the compiler because they are machine independent.

The code generation and related phases are called the back end. Intermediate code generation is considered to be part of neither the back end nor the front end. The next slide shows three languages producing a common intermediate code. From the intermediate code, the object code for the two machines is obtained.

Hence, if there are M languages and object code is to be produced for N machines, only M front ends and N back ends, M + N components in all, need to be written.

[Figure: high-level languages pass through the front ends of the compiler to a common intermediate code, and from there through the back ends of the compiler to the resultant object code for machines M1 and M2.]

The intermediate form discussed here represents the executable instructions of the program as a sequence of quadruples. Each quadruple has the form

    operation, OP1, OP2, result

where
    operation    is some function to be performed by the object code,
    OP1 and OP2  are the operands for the operation, and
    result       designates where the resulting value is to be placed.

Example 1: SUM := SUM + VALUE could be represented as

    + , SUM, VALUE, i1
    :=, i1 ,      , SUM

The entry i1 designates an intermediate result (SUM + VALUE); the second quadruple assigns the value of this intermediate result to SUM. Assignment is treated as a separate operation (:=).

Example 2: VARIANCE := SUMSQ DIV 100 - MEAN * MEAN

    DIV, SUMSQ, #100, i1
    *  , MEAN , MEAN, i2
    -  , i1   , i2  , i3
    := , i3   ,     , VARIANCE

Note: Quadruples appear in the order in which the corresponding object code instructions are to be executed. This greatly simplifies the task of analyzing the code for purposes of optimization. It also makes the quadruples easy to translate into machine instructions.
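The way quadruples fall out of an expression can be sketched in Python. This is a minimal sketch under assumptions not taken from the slides: the expression is already parsed into nested tuples (operation, left, right), operands are plain strings, and the function name to_quadruples is hypothetical. A post-order walk emits the quadruple for each operator after the quadruples for its operands, so the result is in execution order.

```python
from itertools import count


def to_quadruples(target, expr):
    """Return quadruples (operation, OP1, OP2, result) for `target := expr`.

    expr is either an operand string such as "SUM" or "#100",
    or a tuple (operation, left, right).
    """
    quads = []
    numbers = count(1)  # supplies i1, i2, ... for intermediate results

    def walk(node):
        if isinstance(node, str):
            return node                      # an operand names itself
        operation, left, right = node
        left_name = walk(left)               # operands first (post-order)
        right_name = walk(right)
        result = f"i{next(numbers)}"
        quads.append((operation, left_name, right_name, result))
        return result

    value = walk(expr)
    quads.append((":=", value, "", target))  # assignment is its own quadruple
    return quads


if __name__ == "__main__":
    tree = ("-", ("DIV", "SUMSQ", "#100"), ("*", "MEAN", "MEAN"))
    for quad in to_quadruples("VARIANCE", tree):
        print(quad)
```

For the two examples above, the sketch produces the same quadruples in the same order, with an empty string standing for the unused OP2 of the assignment.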

Example 3: Write the quadruples for the program shown below.

PROGRAM STATS
VAR
    SUM, SUMSQ, I, VALUE, MEAN, VARIANCE : INTEGER
BEGIN
    SUM := 0;
    SUMSQ := 0;
    FOR I := 1 TO 100 DO
        BEGIN
            READ (VALUE);
            SUM := SUM + VALUE;
            SUMSQ := SUMSQ + VALUE * VALUE
        END;
    MEAN := SUM DIV 100;
    VARIANCE := SUMSQ DIV 100 - MEAN * MEAN;
    WRITE (MEAN, VARIANCE)
END.

Solution

Line  Operation  OP1     OP2     Result     Pascal statement
 1.   :=         #0              SUM        {SUM := 0}
 2.   :=         #0              SUMSQ      {SUMSQ := 0}
 3.   :=         #1              I          {FOR I := 1 TO 100}
 4.   JGT        I       #100    (15)
 5.   CALL       XREAD                      {READ (VALUE)}
 6.   PARAM      VALUE
 7.   +          SUM     VALUE   i1         {SUM := SUM + VALUE}
 8.   :=         i1              SUM
 9.   *          VALUE   VALUE   i2         {SUMSQ := SUMSQ + VALUE * VALUE}
10.   +          SUMSQ   i2      i3
11.   :=         i3              SUMSQ
12.   +          I       #1      i4         {end of FOR loop}
13.   :=         i4              I
14.   J                          (4)
15.   DIV        SUM     #100    i5         {MEAN := SUM DIV 100}
16.   :=         i5              MEAN
17.   DIV        SUMSQ   #100    i6         {VARIANCE := SUMSQ DIV 100 - MEAN * MEAN}
18.   *          MEAN    MEAN    i7
19.   -          i6      i7      i8
20.   :=         i8              VARIANCE
21.   CALL       XWRITE                     {WRITE (MEAN, VARIANCE)}
22.   PARAM      MEAN
23.   PARAM      VARIANCE

MACHINE - DEPENDENT CODE OPTIMIZATION

There are several different possibilities for performing machine-dependent code optimization.

Assignment and use of registers: registers are used as instruction operands. The number of registers available is limited.

When a new value needs a register, it is necessary to find the least-used register and replace its contents. The existence of jump instructions usually makes it difficult to keep track of register contents. To tackle this, the quadruples are divided into basic blocks. A basic block is a sequence of quadruples with one entry point, which is at the beginning of the block, one exit point, which is at the end of the block, and no jumps within the block.

A CALL operation is usually considered to begin a new basic block.

When control passes from one block to another, all values currently held in registers are saved in temporary variables. For Example 3, the quadruples can be divided into five blocks:
    Block A: quadruples 1 - 3
    Block B: quadruple 4
    Block C: quadruples 5 - 14
    Block D: quadruples 15 - 20
    Block E: quadruples 21 - 23

[Figure: flow graph of the basic blocks A (1 - 3), B (4), C (5 - 14), D (15 - 20), and E (21 - 23).]

An arrow from one block to another indicates that control can pass directly from one block to the other. This kind of representation is called a flow graph.
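The division into basic blocks can be sketched in Python. This is a sketch under assumptions: the names STATS_QUADS and basic_blocks are hypothetical, jump targets are stored as 1-based quadruple numbers in the result field, and a block is taken to start at the first quadruple, at any jump target, at the quadruple following a jump, and at any CALL. The rule about the quadruple following a jump is the usual textbook convention rather than something stated on the slides.

```python
# Quadruples of Example 3 as (operation, OP1, OP2, result).
STATS_QUADS = [
    (":=", "#0", "", "SUM"),
    (":=", "#0", "", "SUMSQ"),
    (":=", "#1", "", "I"),
    ("JGT", "I", "#100", 15),
    ("CALL", "XREAD", "", ""),
    ("PARAM", "VALUE", "", ""),
    ("+", "SUM", "VALUE", "i1"),
    (":=", "i1", "", "SUM"),
    ("*", "VALUE", "VALUE", "i2"),
    ("+", "SUMSQ", "i2", "i3"),
    (":=", "i3", "", "SUMSQ"),
    ("+", "I", "#1", "i4"),
    (":=", "i4", "", "I"),
    ("J", "", "", 4),
    ("DIV", "SUM", "#100", "i5"),
    (":=", "i5", "", "MEAN"),
    ("DIV", "SUMSQ", "#100", "i6"),
    ("*", "MEAN", "MEAN", "i7"),
    ("-", "i6", "i7", "i8"),
    (":=", "i8", "", "VARIANCE"),
    ("CALL", "XWRITE", "", ""),
    ("PARAM", "MEAN", "", ""),
    ("PARAM", "VARIANCE", "", ""),
]

JUMPS = {"J", "JGT"}


def basic_blocks(quads):
    """Return (first, last) quadruple numbers, 1-based and inclusive, for each block."""
    leaders = {1}
    for number, (operation, _op1, _op2, result) in enumerate(quads, start=1):
        if operation in JUMPS:
            leaders.add(result)              # the jump target starts a block
            if number < len(quads):
                leaders.add(number + 1)      # so does the quadruple after a jump
        elif operation == "CALL":
            leaders.add(number)              # a CALL begins a new block
    starts = sorted(leaders)
    ends = [start - 1 for start in starts[1:]] + [len(quads)]
    return list(zip(starts, ends))


if __name__ == "__main__":
    for label, (first, last) in zip("ABCDE", basic_blocks(STATS_QUADS)):
        print(f"Block {label}: quadruples {first} - {last}")
```

Applied to the quadruples of Example 3, the sketch reproduces the five blocks A to E listed above.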

Rearranging quadruples before machine code generation:

Example:
    1) DIV   SUMSQ   #100   i1
    2) *     MEAN    MEAN   i2
    3) -     i1      i2     i3
    4) :=    i3             VARIANCE

    LDA   SUMSQ
    DIV   #100
    STA   i1
    LDA   MEAN
    MUL   MEAN
    STA   i2
    LDA   i1
    SUB   i2
    STA   VARIANCE

This listing shows a typical generation of machine code from the quadruples using only a single register, the accumulator.

An optimizing compiler could rearrange the quadruples so that the second operand of the subtraction is computed first. This removes two memory accesses, because the store and reload of one temporary are no longer needed.

    *     MEAN    MEAN   i2
    DIV   SUMSQ   #100   i1
    -     i1      i2     i3
    :=    i3             VARIANCE

    LDA   MEAN
    MUL   MEAN
    STA   i2
    LDA   SUMSQ
    DIV   #100
    SUB   i2
    STA   VARIANCE
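The effect of the rearrangement can be checked with a small Python sketch of a single-accumulator translator. It is an illustration under assumptions: the names translate, MNEMONIC, and COMMUTATIVE are hypothetical, each intermediate result is assumed to be read exactly once, and register A is saved to memory only when the value it holds is still needed by the current or a later quadruple.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "DIV": "DIV"}
COMMUTATIVE = {"+", "*"}


def translate(quads):
    """Translate quadruples (operation, OP1, OP2, result) into accumulator code."""
    code = []
    in_a = None  # name of the intermediate result currently held in register A

    def load(name, index):
        # Save A first if this or a later quadruple still reads the value it holds.
        if in_a is not None and any(in_a in (q[1], q[2]) for q in quads[index:]):
            code.append(f"STA {in_a}")
        code.append(f"LDA {name}")

    for index, (operation, op1, op2, result) in enumerate(quads):
        if operation == ":=":
            if in_a != op1:
                load(op1, index)
            code.append(f"STA {result}")
            in_a = None
        else:
            if in_a == op1:
                code.append(f"{MNEMONIC[operation]} {op2}")
            elif in_a == op2 and operation in COMMUTATIVE:
                code.append(f"{MNEMONIC[operation]} {op1}")
            else:
                load(op1, index)
                code.append(f"{MNEMONIC[operation]} {op2}")
            in_a = result
    return code


ORIGINAL = [
    ("DIV", "SUMSQ", "#100", "i1"),
    ("*", "MEAN", "MEAN", "i2"),
    ("-", "i1", "i2", "i3"),
    (":=", "i3", "", "VARIANCE"),
]
REARRANGED = [ORIGINAL[1], ORIGINAL[0], ORIGINAL[2], ORIGINAL[3]]

if __name__ == "__main__":
    print(len(translate(ORIGINAL)), "instructions before rearranging")
    print(len(translate(REARRANGED)), "instructions after rearranging")
```

With these inputs the sketch emits nine instructions for the original order and seven for the rearranged order. The saving comes from i1 already being in register A when the subtraction is reached, so it is neither stored nor reloaded.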

Characteristics and Instructions of the Target Machine: Special loop-control instructions or addressing modes can be used to create more efficient object code. High-level machine instructions can perform complicated functions, such as calling procedures and manipulating data structures, in a single operation. If the machine has multiple functional blocks, the code can be rearranged so that all or most of the blocks are used concurrently. This is possible when the result computed by one block does not depend on the result of another.

MACHINE - DEPENDENT CODE OPTIMIZATION


There are several different possibilities for performing machine-dependent code optimization:
(1) Assignment and use of registers
(2) Division of the problem into basic blocks
(3) Rearrangement of machine instructions to improve efficiency of execution

Register Allocation
Assign specific CPU registers to specific values. Code generation must maintain information on which registers:
(1) Are used for which purposes
(2) Are available for reuse
Main objectives:
(1) Maximize the utilization of the CPU registers
(2) Minimize references to memory locations

Possible uses for CPU registers:
(1) Values used many times in a program
(2) Values that are computationally expensive
Importance:
(1) Efficiency
(2) Speed

Register Allocation Algorithm


The register allocation algorithm determines how many registers will be needed to evaluate an expression. It also determines the sequence in which subexpressions should be evaluated to minimize register use.
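The slides do not name a particular algorithm. One classic technique that fits this description is Sethi-Ullman (Ershov) numbering, sketched below in Python. The sketch rests on assumptions: the machine is a load/store machine on which every operand must be in a register (unlike the accumulator code shown earlier), expressions are nested tuples (operation, left, right) with string operands, and the function names are hypothetical.

```python
def registers_needed(expr):
    """Ershov number: registers needed to evaluate expr without spilling to memory."""
    if isinstance(expr, str):
        return 1                          # a leaf is loaded into one register
    _operation, left, right = expr
    left_need = registers_needed(left)
    right_need = registers_needed(right)
    if left_need == right_need:
        return left_need + 1              # one register holds the first result
    return max(left_need, right_need)


def evaluation_order(expr):
    """List the operators in an order that evaluates the costlier subtree first."""
    if isinstance(expr, str):
        return []
    operation, left, right = expr
    if registers_needed(left) >= registers_needed(right):
        first, second = left, right
    else:
        first, second = right, left
    return evaluation_order(first) + evaluation_order(second) + [operation]


if __name__ == "__main__":
    tree = ("+", "A", ("*", "B", "C"))
    print(registers_needed(tree), evaluation_order(tree))
```

Evaluating the subtree with the larger requirement first means that only one register is tied up holding its result while the cheaper subtree is computed, which is what keeps the total register count down.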
