Good Assembler

GANDHINAGAR INSTITUTE OF TECHNOLOGY
Assemblers
Prepared By
Prof. Mihir N Shah
Assembler
An assembler translates source code written in assembly
language into object code.
[Figure: Source Program (mnemonic codes, symbols) -> Assembler -> Object Code]
Elements Of Assembly Language Programming
An assembly language is a machine-dependent, low-level
programming language specific to a certain computer system.
Three features when compared with machine language are:
1. Mnemonic operation codes
2. Symbolic operands
3. Data declarations
Mnemonic operation codes: eliminate the need to memorize
numeric operation codes.
Symbolic operands: symbolic names can be associated with data
or instructions and used as operands in assembly statements, so
the programmer need not know the details of memory bindings.
Data declarations: data can be declared in a variety of
notations, including decimal notation, which avoids manual
conversion of constants into their internal representation.
Assembly Language Structure
<Label> <Mnemonic> <Operand> Comments
Label
Symbolic label for an address (the address of the instruction
at machine level)
Mnemonic
Symbolic description of an operation
Operands
Contain variables or addresses, if necessary
Comments
Statement format
An Assembly language statement has following format:
[Label] <opcode> <operand spec>[,<operand spec>..]
If a label is specified in a statement, it is associated as a
symbolic name with the memory word generated for the
statement.
<operand spec> has the following syntax:
<symbolic name> [+<displacement>] [(<index register>)]
E.g. AREA, AREA+5, AREA(4), AREA+5(4)
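The operand syntax above can be illustrated with a small parser. This is only a sketch: the function name parse_operand_spec, the OperandSpec type, and the assumptions that symbolic names are alphanumeric and that the index register is written as a number are not part of the slides.

```python
import re
from typing import NamedTuple, Optional


class OperandSpec(NamedTuple):
    name: str                       # <symbolic name>
    displacement: int               # 0 when no +<displacement> is given
    index_register: Optional[int]   # None when no (<index register>) is given


# <symbolic name> [+<displacement>] [(<index register>)]
_OPERAND_RE = re.compile(r"([A-Za-z][A-Za-z0-9]*)(?:\+(\d+))?(?:\((\d+)\))?")


def parse_operand_spec(text: str) -> OperandSpec:
    """Split an operand specification into its three parts (hypothetical helper)."""
    match = _OPERAND_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid operand spec: {text!r}")
    name, disp, index = match.groups()
    return OperandSpec(name,
                       int(disp) if disp else 0,
                       int(index) if index else None)


for example in ("AREA", "AREA+5", "AREA(4)", "AREA+5(4)"):
    print(example, "->", parse_operand_spec(example))
```

For AREA+5(4) the sketch yields the name AREA, displacement 5 and index register 4.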
Mnemonic Operation Codes
Each statement has two operands: the first operand is always
a register and the second operand refers to a memory word
using a symbolic name and an optional displacement.
Operation Codes
MOVE instructions move a value between a memory word and a
register.
MOVER: the first operand is the target and the second operand is
the source (memory to register).
MOVEM: the first operand is the source and the second is the
target (register to memory).
All arithmetic is performed in a register (the result replaces
the contents of the register) and sets a condition code.
A comparison instruction sets a condition code in the same way
as an arithmetic instruction, but without affecting the values
of its operands.
The condition code can be tested by a Branch on Condition (BC)
instruction, whose format is:
BC <condition code spec>, <memory address>
Machine Instruction Format
The sign is not a part of the instruction.
Opcode: 2 digits, Register Operand: 1 digit, Memory
Operand: 3 digits
The condition code specified in a BC statement is encoded into
the first operand position using the codes 1-6 for the
specifications LT, LE, EQ, GT, GE and ANY respectively.
In a machine language program, all addresses and constants
are shown in decimal, as shown in the next slide.
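The 2-1-3 digit layout can be sketched as a small encoder. The function names are hypothetical, the register code 2 for BREG and the opcode 07 for BC are assumptions made only for illustration, and the spaces between fields are added for readability.

```python
# Condition codes 1-6 as listed in the text.
CONDITION_CODES = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}


def encode_instruction(opcode: int, first_operand: int, address: int) -> str:
    """Format an instruction as opcode (2 digits), register (1), memory (3)."""
    if not 0 <= opcode <= 99:
        raise ValueError("opcode must fit in 2 decimal digits")
    if not 0 <= first_operand <= 9:
        raise ValueError("first operand must fit in 1 decimal digit")
    if not 0 <= address <= 999:
        raise ValueError("memory operand must fit in 3 decimal digits")
    return f"{opcode:02d} {first_operand} {address:03d}"


def encode_branch(bc_opcode: int, condition: str, address: int) -> str:
    """A BC statement carries the condition code in the first operand field."""
    return encode_instruction(bc_opcode, CONDITION_CODES[condition], address)


# MOVER has opcode 04; register code 2 for BREG is an assumption.
print(encode_instruction(4, 2, 115))   # 04 2 115
# Opcode 7 for BC is an assumption, used only to show the condition field.
print(encode_branch(7, "LE", 101))     # 07 2 101
```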
Example: ALP and its equivalent Machine Language Program
Assembly Language Statements
An assembly program contains three kinds of statements:
1. Imperative Statements
2. Declaration Statements
3. Assembler Directives
Imperative Statements: They indicate an action to be
performed during the execution of the assembled program.
Each imperative statement is translated into one machine
instruction.
Declaration Statements: The syntax is as follows:

[Label] DS <constant>
[Label] DC '<value>'
The DS (declare storage) statement reserves areas of memory and
associates names with them.
For example:
A DS 1 ; reserves a memory area of 1 word and associates the name A
with it
G DS 200 ; reserves a block of 200 words; the name G is associated
with the first word of the block (the other words are accessed as
G+6, etc.)
The DC (declare constant) statement constructs memory words
containing constants.
For example:
ONE DC '1' ; associates the name ONE with a memory word containing
the value '1'
Use of Constants
The DC statement does not really implement constants; it just
initializes memory words to given values.
An assembly program can use constants, as in an HLL, in two ways:
as immediate operands and as literals.
1. Immediate operands can be used in an assembly statement only if
the architecture of the target machine includes the necessary
features.
Ex: ADD AREG,5
This is translated into an instruction with two operands: AREG
and the value '5' as an immediate operand.
2. A literal is an operand with the syntax ='<value>'.
It differs from a constant because its location cannot be
specified in the assembly program.
Its value does not change during the execution of the program.
It differs from an immediate operand because no architectural
provision is needed to support its use.
Use of literals vs. use of DC
Literal:
ADD AREG, ='5'
DC:
ADD AREG, FIVE
FIVE DC '5'
Assembler Directives
Assembler directives instruct the assembler to perform certain
actions during the assembly of a program.
Some assembler directives are described below:
1) START <constant>
This directive indicates that the first word of the target program
generated by the assembler should be placed in the memory word
having address <constant>.
2) END [<operand spec>]
This directive indicates the end of the source program. The
optional <operand spec> indicates the address of the instruction
where the execution of the program should begin.
Advantages of Assembly Language
In comparison to machine language programming: the primary
advantages of assembly language programming are due to the use of
symbolic operand specifications.
In comparison to high level language programming: assembly
language programming holds an edge in situations where it is
desirable to use architectural features of a computer.
Fundamentals of LP
Language processing =
analysis of source program + synthesis of target program
Analysis of the source program is based on the specification of
the source language:
Lexical rules: formation of valid lexical units (tokens) in the
source language
Syntax rules: formation of valid statements in the source
language
Semantic rules: associate meaning with valid statements of
the language
Synthesis of the target program is the construction of target
language statements:
Memory allocation: generation of data structures in the
target program
Code generation
A Simple Assembly Scheme

Design Specification of an assembler
There are four steps involved in designing the specification
of an assembler:
1. Identify the information necessary to perform a task.
2. Design a suitable data structure to record the information.
3. Determine the processing necessary to obtain and maintain
the information.
4. Determine the processing necessary to perform the task.
There are two phases in specifying an assembler:

1. Analysis Phase
2. Synthesis Phase (the fundamental information requirements
arise in this phase)
Synthesis Phase
Consider the following statement:
MOVER BREG, ONE
The following information is needed to synthesize the machine
instruction for this statement:
Address of the memory word with which the name ONE is associated
[depends on the source program, hence made available by the
analysis phase].
Machine operation code corresponding to MOVER [does not depend
on the source program but only on the assembly language, hence
the synthesis phase can determine this information for itself].
Note: Based on the above discussion, the two data structures required
during the synthesis phase are described next.
Data structures in synthesis phase
Symbol Table: built by the analysis phase
The two primary fields are the name and the address of the symbol.
Mnemonics Table: already present
The two primary fields are mnemonic and opcode, along with length.
The synthesis phase uses these tables to obtain:
The machine address with which a name is associated.
The machine opcode corresponding to a mnemonic.
The tables have to be searched with the symbol name and the
mnemonic as keys.
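Both tables can be modelled as dictionaries keyed by symbol name and by mnemonic. The sketch below is illustrative only: the names MNEMONICS and synthesize are hypothetical, the instruction lengths of 1 word are assumed, and so is the register code 2 for BREG.

```python
from typing import Dict, NamedTuple, Tuple


class MnemonicEntry(NamedTuple):
    opcode: int
    length: int   # in memory words; a length of 1 is assumed throughout


# Mnemonics table: fixed for the assembly language, keyed by mnemonic.
MNEMONICS: Dict[str, MnemonicEntry] = {
    "ADD": MnemonicEntry(1, 1),
    "SUB": MnemonicEntry(2, 1),
    "MULT": MnemonicEntry(3, 1),
    "MOVER": MnemonicEntry(4, 1),
}


def synthesize(mnemonic: str, register_code: int, symbol: str,
               symbol_table: Dict[str, int]) -> Tuple[int, int, int]:
    """Look up the opcode and the symbol address needed for one instruction."""
    opcode = MNEMONICS[mnemonic].opcode    # searched with the mnemonic as key
    address = symbol_table[symbol]         # searched with the symbol name as key
    return (opcode, register_code, address)


# Symbol table as the analysis phase might have built it (ONE at address 115).
symbol_table = {"ONE": 115}
print(synthesize("MOVER", 2, "ONE", symbol_table))   # (4, 2, 115)
```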
Analysis Phase
The primary function of the analysis phase is to build the symbol
table.
It must determine the addresses with which the symbolic
names used in a program are associated.
Some addresses can be determined directly, such as the address
of the first instruction in the program (given in the START
statement).
Other addresses must be inferred.
To determine the address of a symbolic name, we need to fix the
addresses of all program elements preceding it. This is done
through memory allocation.
To implement memory allocation, a data structure called the
location counter (LC) is introduced.
Analysis Phase: Implementing memory allocation
LC (location counter):
The location counter always contains the address of the next memory
word in the target program.
It is initialized to the constant specified in the START statement.
When a label is encountered, the analysis phase enters the label and
the contents of the LC in a new entry of the symbol table.
Examples of labels: N, AGAIN, SUM, etc.
It then finds the number of memory words required by the assembly
statement and updates the LC contents.
To update the contents of the LC, the analysis phase needs to know
the lengths of the different instructions.
This information is kept in the mnemonics table, which is extended
with a field called length.
The processing involved in maintaining the LC is referred to as
LC processing.
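LC processing as described above can be sketched in a few lines. The statements are given as pre-split (label, mnemonic, operand) tuples to keep the example short. The function name, the sample program and the assumed length of 1 word per instruction and per DC are illustrative, not taken from the slides.

```python
# Length field of the mnemonics table (1 word per instruction is assumed).
LENGTHS = {"MOVER": 1, "MULT": 1, "ADD": 1, "SUB": 1}


def build_symbol_table(statements):
    """Run LC processing over the program and return {label: address}."""
    symbol_table = {}
    lc = 0
    for label, mnemonic, operand in statements:
        if mnemonic == "START":
            lc = int(operand)              # LC is initialized from START
            continue
        if mnemonic == "END":
            break
        if label:
            symbol_table[label] = lc       # enter (label, LC) in the symbol table
        if mnemonic == "DS":
            lc += int(operand)             # DS n reserves n words
        elif mnemonic == "DC":
            lc += 1                        # DC builds one word (assumed)
        else:
            lc += LENGTHS[mnemonic]        # length comes from the mnemonics table
    return symbol_table


program = [
    (None, "START", "100"),
    (None, "MOVER", "BREG, ONE"),
    ("AGAIN", "MULT", "BREG, TERM"),
    (None, "ADD", "BREG, ONE"),
    ("TERM", "DS", "2"),
    ("ONE", "DC", "'1'"),
    (None, "END", None),
]
print(build_symbol_table(program))   # {'AGAIN': 101, 'TERM': 103, 'ONE': 105}
```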
For Example
Symbol Table
Symbol      Address
            103
Since instructions take different amounts of memory, the length of
each instruction is also stored in the mnemonic table, in the
length field.
Mnemonic Table
Mnemonic    Opcode    Length
MOVER       04
MULT        03
[Figure: Data structures of an assembler during analysis and synthesis
phases. Source Program -> Analysis Phase -> Synthesis Phase -> Target
Program. The phases use the Mnemonic Table (e.g. ADD 01, SUB 02) and
the Symbol Table (e.g. N 104, AGAIN 113). The legend distinguishes
data access arrows from control transfer arrows.]
Two Pass Translation

Two-pass translation consists of pass I and pass II.
LC processing is performed in the first pass, and symbols defined in
the program are entered into the symbol table; hence the first pass
performs analysis of the source program.
The second pass synthesizes the target form using the address
information found in the symbol table.
Because every symbol is already in the symbol table when the second
pass starts, two-pass translation of an assembly language program
handles forward references easily.
The first pass constructs an intermediate representation (IR) of the
source program, which is used by the second pass.
The IR consists of two main components: data structures + IC
(intermediate code)
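The division of work between the two passes can be sketched as follows. This is a simplified illustration, not the Pass I and Pass II algorithms of the slides: the register codes, the length of 1 word per instruction, the "00 0 nnn" form used for a DC word and all function names are assumptions.

```python
OPCODES = {"ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3}   # assumed register codes


def pass_one(statements):
    """Analysis: LC processing, symbol table, and a simple intermediate code."""
    symtab, ic, lc = {}, [], 0
    for label, mnemonic, operands in statements:
        if mnemonic == "START":
            lc = int(operands[0])
            continue
        if mnemonic == "END":
            break
        if label:
            symtab[label] = lc
        if mnemonic == "DS":
            lc += int(operands[0])
        else:                                  # DC or imperative: one word each
            ic.append((lc, mnemonic, operands))
            lc += 1
    return symtab, ic


def pass_two(symtab, ic):
    """Synthesis: every symbol is already defined, so lookups cannot fail
    because of a forward reference."""
    target = {}
    for address, mnemonic, operands in ic:
        if mnemonic == "DC":
            target[address] = f"00 0 {int(operands[0]):03d}"
        else:
            register, symbol = operands
            target[address] = (f"{OPCODES[mnemonic]:02d} "
                               f"{REGISTERS[register]} {symtab[symbol]:03d}")
    return target


program = [
    (None, "START", ("101",)),
    (None, "MOVER", ("BREG", "ONE")),    # forward reference to ONE
    (None, "ADD", ("BREG", "ONE")),
    ("RESULT", "DS", ("1",)),
    ("ONE", "DC", ("1",)),
    (None, "END", ()),
]
symtab, ic = pass_one(program)
print(symtab)                 # {'RESULT': 103, 'ONE': 104}
print(pass_two(symtab, ic))   # {101: '04 2 104', 102: '01 2 104', 104: '00 0 001'}
```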
Single pass translation

A one-pass assembler requires one scan of the source program to
generate machine code.
The problem of forward references is tackled using a process called
backpatching.
The operand field of an instruction containing a forward reference
is left blank initially.
Instructions containing forward references are recorded in a
separate table called the table of incomplete instructions (TII).
This table is used to fill in the addresses in the incomplete
instructions.
Once the address of a forward-referenced symbol is known, it is put
in the blank operand field with the help of this backpatching list.
How can forward references be solved using backpatching?
The assembler builds a table of incomplete instructions (TII) to
record information about instructions whose operand fields were left
blank.
Each entry in this table is a pair of the form (instruction address,
symbol), indicating that the address of symbol should be put in the
operand field of the instruction located at instruction address.
By the time the END statement is processed, the symbol table
contains the addresses of all symbols defined in the source program and
the TII contains information describing all forward references.
The assembler can now process each entry in the TII to complete the
concerned instruction.
Alternatively, entries in the TII can be processed on the fly during
normal processing.
In this approach, all forward references to a symbol symb_i are
processed when the statement that defines symb_i is encountered.
The instruction corresponding to the statement

MOVER BREG, ONE
contains a forward reference to ONE.
Hence the assembler leaves the second operand field blank in the
instruction that is assembled to reside in location 101 of memory,
and makes an entry (101, ONE) in the table of incomplete
instructions (TII).
While processing the statement
ONE DC '1'
address of ONE, which is 115, is entered in the symbol table.
After the END statement is processed, the entry (101, ONE)
would be processed by obtaining the address of ONE from the
symbol table and inserting it in the second operand field of the
instruction with assembled address 101.
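The example can be traced with a minimal one-pass sketch that backpatches at END. The program below is a shortened stand-in chosen so that ONE lands at address 115 (a 13-word DS block is used as filler). The register codes, the function name and the list form [opcode, register, address] are assumptions.

```python
OPCODES = {"ADD": 1, "MOVER": 4}
REGISTERS = {"AREG": 1, "BREG": 2}   # assumed register codes


def assemble_one_pass(statements):
    """Single scan; forward references are recorded in the TII and patched at END."""
    symtab, tii, code, lc = {}, [], {}, 0
    for label, mnemonic, operands in statements:
        if mnemonic == "START":
            lc = int(operands[0])
            continue
        if mnemonic == "END":
            break
        if label:
            symtab[label] = lc
        if mnemonic == "DS":
            lc += int(operands[0])
            continue
        if mnemonic == "DC":
            code[lc] = [0, 0, int(operands[0])]
        else:
            register, symbol = operands
            address = symtab.get(symbol)       # None means a forward reference
            if address is None:
                tii.append((lc, symbol))       # (instruction address, symbol)
            code[lc] = [OPCODES[mnemonic], REGISTERS[register], address]
        lc += 1
    for instruction_address, symbol in tii:    # backpatching
        code[instruction_address][2] = symtab[symbol]
    return code, symtab, tii


program = [
    (None, "START", ("101",)),
    (None, "MOVER", ("BREG", "ONE")),   # operand field left blank at first
    ("BUF", "DS", ("13",)),             # filler so that ONE lands at 115
    ("ONE", "DC", ("1",)),
    (None, "END", ()),
]
code, symtab, tii = assemble_one_pass(program)
print(tii)          # [(101, 'ONE')]
print(code[101])    # [4, 2, 115]
```

A symbol that is never defined would raise a KeyError during backpatching in this sketch; a real assembler would report an error instead.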
Advanced Assembler Directives
1. ORIGIN
The syntax of this directive is
ORIGIN <address specification>
where <address specification> is an <operand specification> or
<constant>.
This directive instructs the assembler to put the address given by
<address specification> in the location counter.
The ORIGIN statement is useful when the target program does not
consist of a single contiguous area of memory.
The ability to use an <operand specification> in the ORIGIN statement
provides the ability to change the address in the location counter in a
relative rather than absolute manner.
2. EQU
The EQU directive has the syntax
<symbol> EQU <address specification>
where <address specification> is either a <constant> or
<symbolic name> ± <displacement>.
The EQU statement simply associates the name <symbol> with
the address specified by <address specification>. However, the
address in the location counter is not affected.
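The difference between ORIGIN and EQU can be sketched as follows: ORIGIN changes the location counter, EQU only adds a symbol table entry. The function names are hypothetical. Accepting both + and - displacements and ignoring index registers in the address specification are assumptions of this sketch.

```python
import re


def evaluate_address_spec(spec, symtab):
    """Evaluate a <constant> or a <symbolic name> with an optional displacement."""
    spec = spec.replace(" ", "")
    if spec.isdigit():
        return int(spec)
    match = re.fullmatch(r"([A-Za-z][A-Za-z0-9]*)(?:([+-])(\d+))?", spec)
    if match is None:
        raise ValueError(f"bad address specification: {spec!r}")
    name, sign, displacement = match.groups()
    address = symtab[name]
    if displacement:
        address += int(displacement) if sign == "+" else -int(displacement)
    return address


def apply_directive(label, directive, spec, symtab, lc):
    """Return the LC value after the directive; EQU updates symtab in place."""
    value = evaluate_address_spec(spec, symtab)
    if directive == "ORIGIN":
        return value                 # the LC is set to the given address
    if directive == "EQU":
        symtab[label] = value        # the LC is not affected
        return lc
    raise ValueError(f"unsupported directive: {directive}")


symtab = {"LOOP": 202}
lc = 210
lc = apply_directive(None, "ORIGIN", "LOOP+2", symtab, lc)
print(lc)                            # 204
lc = apply_directive("BACK", "EQU", "LOOP", symtab, lc)
print(lc, symtab["BACK"])            # 204 202
```

Using a symbolic name in ORIGIN is what makes the change relative: if the program is relocated, LOOP+2 still refers to the same place.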
3. LTORG
The LTORG directive, which stands for 'origin for literals', allows a
programmer to specify where literals should be placed.
The assembler uses the following scheme for placement of literals:
When the use of a literal is seen in a statement, the assembler enters
it into a literal pool unless a matching literal already exists in the
pool.
At every LTORG statement, as also at the END statement, the
assembler allocates memory to the literals of the literal pool and
clears the literal pool.
This way, a literal pool would contain all literals used in the program
since the start of the program or since the previous LTORG
statement.
Thus, all references to literals are forward references by definition.
If a program does not use an LTORG statement, the assembler would
enter all literals used in the program into a single pool and allocate
memory to them when it encounters the END statement.
Data Structure Of Pass I Assembler

OPTAB
A table of mnemonic opcodes and related information.
OPTAB contains the fields mnemonic opcode, class and mnemonic
info.
The class field indicates whether the opcode belongs to an imperative
statement (IS), a declaration statement (DL), or an assembler
directive (AD).
For an imperative statement, the mnemonic info field contains the pair
(machine code, instruction length); otherwise it contains the id of a
routine that handles the declaration or directive statement.
SYMTAB
A SYMTAB entry contains the fields symbol name, address and
length.
Some addresses can be determined directly, e.g. the address of the
first instruction in the program; however, others must be inferred.
To find the address of a symbol we must fix the addresses of all
program elements preceding it. This function is called memory
allocation.
LITTAB
A table of literals used in the program.
A LITTAB entry contains the fields literal and address.
The first pass uses LITTAB to collect all literals used in a program.
POOLTAB
Awareness of different literal pools is maintained using the auxiliary
table POOLTAB.
This table contains the literal number of the starting literal of each
literal pool.
At any stage, the current literal pool is the last pool in the LITTAB.
On encountering an LTORG statement (or the END statement), literals
in the current pool are allocated addresses starting with the current
value in LC and LC is appropriately incremented.
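The interplay of LITTAB and POOLTAB can be sketched with a small class. The class and method names are hypothetical, literal numbers are 0-based here, and each literal is assumed to occupy one memory word.

```python
class LiteralTables:
    """LITTAB plus POOLTAB, as used by the first pass (illustrative sketch)."""

    def __init__(self):
        self.littab = []      # entries of the form [literal, address]
        self.pooltab = [0]    # literal number of the first literal of each pool

    def use_literal(self, literal):
        """Enter a literal in the current pool unless it is already there."""
        start = self.pooltab[-1]
        for number in range(start, len(self.littab)):
            if self.littab[number][0] == literal:
                return number
        self.littab.append([literal, None])   # address is filled in later
        return len(self.littab) - 1

    def allocate_pool(self, lc):
        """At LTORG or END: give addresses to the current pool, return new LC."""
        start = self.pooltab[-1]
        for entry in self.littab[start:]:
            entry[1] = lc
            lc += 1                            # one word per literal (assumed)
        if len(self.littab) > start:
            self.pooltab.append(len(self.littab))   # the next pool starts here
        return lc


tables = LiteralTables()
tables.use_literal("='5'")
tables.use_literal("='1'")
tables.use_literal("='5'")            # already in the current pool
lc = tables.allocate_pool(211)        # LTORG with LC = 211
tables.use_literal("='1'")            # entered again: it belongs to a new pool
lc = tables.allocate_pool(219)        # END with LC = 219
print(tables.littab)    # [["='5'", 211], ["='1'", 212], ["='1'", 219]]
print(tables.pooltab)   # [0, 2, 3]
```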
Intermediate code forms:

Intermediate code consists of a set of IC units, each unit
consisting of the following three fields:
1. Address
2. Representation of the mnemonic opcode
3. Representation of operands
Layout of an IC unit: Address | Opcode | Operands
Intermediate code for Imperative statement

Variant I
The first operand is represented by a single-digit number, which is
a code for a register or for the condition code.
The second operand, which is a memory operand, is represented by a
pair of the form (operand class, code), where operand class is one
of C, S and L, standing for constant, symbol and literal.
For a constant, the code field contains the internal
representation of the constant itself. Ex: the operand descriptor
for the statement START 200 is (C,200).
For a symbol or literal, the code field contains the ordinal
number of the operand's entry in SYMTAB or LITTAB.
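The operand descriptors can be produced by a few helper functions. This is a sketch under stated assumptions: the register codes, the 1-based ordinal numbers, the helper names and the choice to enter a symbol in SYMTAB on its first use are not taken from the slides.

```python
REGISTER_CODES = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}   # assumed
CONDITION_CODES = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}


def ordinal(table, item):
    """1-based entry number of item in table, adding it if it is absent."""
    if item not in table:
        table.append(item)
    return table.index(item) + 1


def first_operand_code(operand):
    """Single-digit code for a register or a condition code."""
    if operand in REGISTER_CODES:
        return REGISTER_CODES[operand]
    return CONDITION_CODES[operand]


def memory_operand_descriptor(operand, symtab, littab):
    """Return the (operand class, code) pair for a constant, literal or symbol."""
    if operand.isdigit():
        return ("C", int(operand))               # the constant itself
    if operand.startswith("="):
        return ("L", ordinal(littab, operand))   # entry number in LITTAB
    return ("S", ordinal(symtab, operand))       # entry number in SYMTAB


symtab, littab = [], []
print(memory_operand_descriptor("200", symtab, littab))    # ('C', 200)
print(memory_operand_descriptor("A", symtab, littab))      # ('S', 1)
print(memory_operand_descriptor("='5'", symtab, littab))   # ('L', 1)
print(memory_operand_descriptor("B", symtab, littab))      # ('S', 2)
print(first_operand_code("BREG"), first_operand_code("LE"))   # 2 2
```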
Variant II
This variant differs from variant I of the intermediate code
because in variant II symbols, condition codes and CPU
registers are not processed.
So no processed form is generated for these operands during pass I;
they appear in the intermediate code as written in the source.
Assembly Program to Compute N!
Intermediate Representation (Variant I & Variant II)
Example 2
SYMTAB
Pass I Algorithm
Pass II Algorithm
Thank You
