
CS 9303 SYSTEM SOFTWARE INTERNALS

V.P JAYA CHITRA

Computer Technology Dept

Course Objective

This course helps learners understand the basic functions of system software components, namely assemblers, loaders, linkers, macro processors, and compilers. It also discusses the design and implementation of assemblers and macro processors with examples. It then introduces the concept of a virtual machine with support for object-oriented features, and analyzes the performance of emulation techniques. As a prerequisite, the learner should have had some exposure to elementary data structures and assembly language.

SCOPE
At the end of the course, the learners will be able to:
- Design and implement assemblers.
- Understand and analyze the features of loaders and linkers.
- Design and implement a macro processor.
- Understand the design and operation of compilers.
- Analyze the implementation of a virtual machine that supports object-oriented programming features.
- Analyze the performance of emulation techniques.

UNIT PLAN

UNIT 1

Unit Plan

Session     Title
Session 1   Machine instructions and programs
Session 2   Assemblers: basic assembler functions
Session 3   Simple SIC assembler
Session 4   Assembler algorithm and data structures
Session 5   Machine-dependent assembler features: i) Instruction formats and addressing modes
Session 6   Machine-dependent assembler features: ii) Program relocation
Session 7   Machine-independent assembler features: i) Literals ii) Symbol-defining statements iii) Expressions
Session 8   Machine-independent assembler features: iv) Program blocks v) Control sections and program linking

Unit 1: Review of Computer Architecture


Objective:
In this unit, the basic concepts of program assembly are explained using the SIC machine. The unit begins with a discussion of the relationship between system software and machine architecture. The assembler's machine-dependent and machine-independent features are also discussed, and the essentials of one-pass and two-pass assemblers are presented. As a result, this unit aids in the design and implementation of an assembler.

Machine Instructions and programs

Session 1

Introduction to system software


Software

Application software
- Usually used by the end user.
- Concerned with the solution of some problem, using the computer as a tool.

System software
- Consists of a variety of programs that support the operation of a computer.
- Acts as an intermediary between users and hardware.
- Creates a virtual environment for the user that hides the actual computer architecture.

Virtual machine: the set of services and resources created by the system software and seen by the user.

The characteristic in which most system software differs from application software is machine dependency.

System software

Figure 1.1 The Role of System Software
(Layers shown, top to bottom: Interface A, the virtual machine interface; System Software; Interface B, the actual machine interface; Hardware.)

System software components

Language services
- Write programs in a high-level, user-oriented language, and then execute them.
- Translators: assembler, compiler, interpreter.

Memory managers
- Allocate and retrieve memory space.
- Loader, linker.

Other utilities
- Collections of library routines that provide services either to user or to system routines.

System software
Compiler: translates high-level language to assembly language.
Assembler: translates assembly language to machine language (object files).
Linker: builds an executable file from a collection of object files.
Loader: reads instructions from the object file and stores them into memory for execution.

Issues in System Software

Advanced architectures complicate system software
- Superscalar CPU
- Memory model
- Multiprocessor

New applications
- Embedded systems
- Mobile computing

Machine Instructions and programs

Instruction Set

Load and store registers: LDA, LDX, STA, STX, etc.
Integer arithmetic operations: ADD, SUB, MUL, DIV
- All arithmetic operations involve register A and a word in memory, with the result being left in A.
Comparison: COMP
Conditional jump instructions: JLT, JEQ, JGT
Subroutine linkage: JSUB, RSUB
I/O (transferring 1 byte at a time to/from the rightmost 8 bits of register A)
- Test Device instruction (TD)

Basic Assembler Functions

Session 2

Introduction to Assemblers
Assembler functions:
- Translating mnemonic operation codes to their machine language equivalents (mnemonic code to machine code).
- Assigning machine addresses to symbolic labels (symbols to addresses).
- Handling constants, literals, and addressing.

Assembly language: a symbolic representation of machine instructions.

Assemblers

Figure 1.2 Compilation pipeline
(Source Program -> Assembler -> Object Code -> Linker -> Executable Code -> Loader)

Assemblers
Basic assembler directives:

START : Specify the starting address of the program
END   : Indicate the end of the program
BYTE  : Generate a character or hexadecimal constant
WORD  : Generate a one-word integer constant
RESB  : Reserve the indicated number of bytes for a data area
RESW  : Reserve the indicated number of words for a data area

SIC Assembler
Assembler functions:

- Convert mnemonic operation codes to their machine language equivalents (mnemonic code, or instruction name -> opcode).
- Convert symbolic operands to their equivalent machine addresses (symbolic operands, e.g. variable names -> addresses). This requires two passes.
- Build the machine instructions in the proper format.
- Convert data constants specified in the source program into their internal machine representations (constants -> numbers).
- Write the object program and the assembly listing.

SIC Assembler

Session 3

SIC Assembler
Issues:
- Address translation: the program contains forward references, i.e. references to labels that are defined later in the program.
- This requires two passes:
  1. Process label definitions and assign addresses.
  2. Perform the actual translation (generate object code).

Example Program with Object Code


Line  Loc   Source statement                Object code

5     1000  COPY    START   1000
10    1000  FIRST   STL     RETADR          141033
15    1003  CLOOP   JSUB    RDREC           482039
20    1006          LDA     LENGTH          001036
25    1009          COMP    ZERO            281030
30    100C          JEQ     ENDFIL          301015
35    100F          JSUB    WRREC           482061
40    1012          J       CLOOP           3C1003
45    1015  ENDFIL  LDA     EOF             00102A
50    1018          STA     BUFFER          0C1039
55    101B          LDA     THREE           00102D
60    101E          STA     LENGTH          0C1036
65    1021          JSUB    WRREC           482061
70    1024          LDL     RETADR          081033
75    1027          RSUB                    4C0000
80    102A  EOF     BYTE    C'EOF'          454F46
85    102D  THREE   WORD    3               000003
90    1030  ZERO    WORD    0               000000
95    1033  RETADR  RESW    1
100   1036  LENGTH  RESW    1
105   1039  BUFFER  RESB    4096
110         .
115         .       SUBROUTINE TO READ RECORD INTO BUFFER

Fig. 1.3 Example Program

Contd..

Line  Loc   Source statement                Object code

120         .
125   2039  RDREC   LDX     ZERO            041030
130   203C          LDA     ZERO            001030
135   203F  RLOOP   TD      INPUT           E0205D
140   2042          JEQ     RLOOP           30203F
145   2045          RD      INPUT           D8205D
150   2048          COMP    ZERO            281030
155   204B          JEQ     EXIT            302057
160   204E          STCH    BUFFER,X        549039
165   2051          TIX     MAXLEN          2C205E
170   2054          JLT     RLOOP           38203F
175   2057  EXIT    STX     LENGTH          101036
180   205A          RSUB                    4C0000
185   205D  INPUT   BYTE    X'F1'           F1
190   205E  MAXLEN  WORD    4096            001000
195         .
200         .       SUBROUTINE TO WRITE RECORD FROM BUFFER
205         .
210   2061  WRREC   LDX     ZERO            041030
215   2064  WLOOP   TD      OUTPUT          E02079
220   2067          JEQ     WLOOP           302064
225   206A          LDCH    BUFFER,X        509039
230   206D          WD      OUTPUT          DC2079
235   2070          TIX     LENGTH          2C1036
240   2073          JLT     WLOOP           382064
245   2076          RSUB                    4C0000
250   2079  OUTPUT  BYTE    X'05'           05
255                 END     FIRST

Fig. 1.4 Example Program

Purpose of the example program

- Reads records from the input device (code F1) and copies them to the output device (code 05).
- At the end of the file, writes EOF on the output device, then executes RSUB to return to the operating system.

Data transfer (RD, WD)
- A buffer is used to store each record; buffering is necessary because the devices have different I/O rates.
- The end of each record is marked with a null character (hexadecimal 00).
- The end of the file is indicated by a zero-length record.

Subroutines (JSUB, RSUB)
- RDREC, WRREC
- Save the link register first before a nested jump.

Object Program

The object program is the output generated by an assembler. The object program format contains three types of records:
- Header: contains the program name, starting address, and length.
- Text: contains the translated code and data of the program, with the addresses where they are to be loaded.
- End: marks the end of the object program and specifies the address of the first executable instruction.

Object Program

Header record:
Col. 1      H
Col. 2-7    Program name
Col. 8-13   Starting address (hex)
Col. 14-19  Length of object program in bytes (hex)

Text record:
Col. 1      T
Col. 2-7    Starting address in this record (hex)
Col. 8-9    Length of object code in this record in bytes (hex)
Col. 10-69  Object code; (69-10+1)/6 = 10 instructions

End record:
Col. 1      E
Col. 2-7    Address of first executable instruction (hex)

Fig 1.5 Object Program
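
The record layout above can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's own implementation: the function names are hypothetical, and it assumes the object code is contiguous (no RESW/RESB gaps, which would force a new Text record).

```python
def header_record(name, start, length):
    # Col. 1 'H', Col. 2-7 program name (padded to 6), Col. 8-13 start, Col. 14-19 length
    return "H" + name.ljust(6)[:6] + f"{start:06X}" + f"{length:06X}"

def text_records(start, codes, max_bytes=30):
    """Pack hex object-code strings into Text records of at most 30 bytes each.

    Assumption: the codes occupy consecutive addresses beginning at `start`.
    """
    records, addr, chunk, size = [], start, [], 0
    for code in codes:
        nbytes = len(code) // 2          # two hex digits per byte
        if size + nbytes > max_bytes:    # current record is full: emit it
            records.append(f"T{addr:06X}{size:02X}" + "".join(chunk))
            addr += size
            chunk, size = [], 0
        chunk.append(code)
        size += nbytes
    if chunk:
        records.append(f"T{addr:06X}{size:02X}" + "".join(chunk))
    return records

def end_record(first_exec):
    # Col. 1 'E', Col. 2-7 address of the first executable instruction
    return "E" + f"{first_exec:06X}"
```

Ten 3-byte SIC instructions fill one Text record (length 1E), which is why the limit is 30 bytes.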

Contd

The symbol ^ is used to separate fields.

H^COPY  ^001000^00107A
T^001000^1E^141033^482039^001036^281030^301015^482061 ...
T^00101E^15^0C1036^482061^081033^4C0000^454F46^000003^000000
T^002039^1E^041030^001030^E0205D^30203F^D8205D^281030 ...
T^002057^1C^101036^4C0000^F1^001000^041030^E02079^302064 ...
T^002073^07^382064^4C0000^05
E^001000   (starting address)

Fig 1.6 Object program corresponding to Fig 1.3 and Fig 1.4

Contd

Pass 1 (define symbols)

1. Assign addresses to all statements in the program.
2. Save the values assigned to all labels for use in Pass 2.
3. Perform some processing of assembler directives.

Pass 2 (assemble instructions and generate object program)

1. Assemble instructions.
2. Generate data values defined by BYTE, WORD.
3. Perform processing of assembler directives not done in Pass 1.
4. Write the object program and the assembly listing.
Assembler Algorithm and Data Structures

SESSION 4

Assembler Algorithm and Data Structures

OPTAB (operation code table)
- Contents: mnemonic, machine code (instruction format, length), etc.
- Static table; also gives the instruction length.
- Implementation: array or hash table, easy to search.

SYMTAB (symbol table)
- Contents: label name, value, flag (type, length), etc.
- Dynamic table (insert, delete, search).
- Implementation: hash table; the keys are non-random, so the hashing function matters.

Location counter (LOCCTR)
- Counted in bytes.

Contd

[Figure: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object code. Tables shown: OPTAB, SYMTAB.]

Algorithm for pass1 assembler
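
As a rough illustration of Pass 1, the sketch below walks the source statements, advances a location counter, and records each label in SYMTAB. It is a simplified sketch under stated assumptions: the tuple input format, the function names, and the tiny OPTAB subset are hypothetical, and error handling is minimal.

```python
# Hypothetical subset of OPTAB; every standard SIC instruction is 3 bytes long.
OPTAB = {"STL": 0x14, "LDA": 0x00, "RSUB": 0x4C}

def constant_length(operand):
    """Length in bytes of a BYTE operand such as C'EOF' or X'F1'."""
    body = operand[2:-1]                      # text between the quotes
    return len(body) if operand[0] == "C" else len(body) // 2

def pass1(lines):
    """lines: (label, opcode, operand) tuples. Returns (symtab, program_length)."""
    symtab = {}
    _, first_op, first_operand = lines[0]
    start = int(first_operand, 16) if first_op == "START" else 0
    locctr = start
    body = lines[1:] if first_op == "START" else lines
    for label, opcode, operand in body:
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr            # label gets the current address
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += constant_length(operand)
        else:
            raise ValueError(f"invalid operation code: {opcode}")
    return symtab, locctr - start
```

A real Pass 1 would also write each statement, with its address, to the intermediate file for use in Pass 2.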


Algorithm for pass 2 Assembler
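
The core of Pass 2 for the standard SIC machine can be sketched as follows: look up the opcode in OPTAB, look up the operand in SYMTAB, set the index bit for `,X` operands, and convert BYTE/WORD constants. This is a minimal sketch; the function name and the OPTAB subset are assumptions, and the SYMTAB values in the usage note are taken from the Loc column of the example program.

```python
# Hypothetical subset of OPTAB (mnemonic -> opcode).
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "STCH": 0x54, "RSUB": 0x4C}

def assemble_line(opcode, operand, symtab):
    """Return the object code (hex string) for one statement, or '' if none."""
    if opcode in OPTAB:
        addr = 0
        if operand:
            indexed = operand.endswith(",X")
            symbol = operand[:-2] if indexed else operand
            if symbol not in symtab:
                raise KeyError(f"undefined symbol: {symbol}")
            # Bit 15 of the address field is the index flag x.
            addr = symtab[symbol] | (0x8000 if indexed else 0)
        return f"{OPTAB[opcode]:02X}{addr:04X}"
    if opcode == "WORD":
        return f"{int(operand) & 0xFFFFFF:06X}"
    if opcode == "BYTE":
        body = operand[2:-1]
        if operand.startswith("C'"):
            return body.encode("ascii").hex().upper()   # one byte per character
        if operand.startswith("X'"):
            return body.upper()                         # hex digits as given
    return ""   # START, END, RESW, RESB generate no object code
```

For example, with `symtab = {"RETADR": 0x1033, "BUFFER": 0x1039}`, `assemble_line("STL", "RETADR", symtab)` gives `141033` and `assemble_line("STCH", "BUFFER,X", symtab)` gives `549039`, matching the example program.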


Assembler Features

SESSION 5

Assembler Features

Machine-dependent assembler features
- Instruction formats and addressing modes
- Program relocation

Machine-independent assembler features
- Literals
- Symbol-defining statements
- Expressions
- Program blocks
- Control sections and program linking


Instruction Format and Addressing Mode


Addressing modes:

Extended format:       +op m
Indirect addressing:   op @m
Immediate addressing:  op #c
Index addressing:      op m,X
Relative addressing:   op m

Instruction Format and Addressing Mode


- A START directive that specifies a beginning program address of 0 indicates a relocatable program.
- Register-to-register instructions: simply convert the mnemonic names to their number equivalents.
  - OPTAB: for opcodes.
  - SYMTAB: preloaded with register names and their values.
- Fetching a value stored in a register is much faster than fetching it from memory, which improves execution speed.

Contd

PC-relative or base-relative addressing
- Calculate the displacement.
- The displacement must be small enough to fit in the 12-bit field (-2048..2047 for PC-relative mode, 0..4095 for base-relative mode).
- Using format 3 rather than format 4 saves one byte, which:
  - reduces program storage space,
  - reduces instruction fetch time,
  - makes relocation easier.

Extended instruction format (4 bytes)
- 20-bit field for direct addressing.

Contd

Immediate addressing mode is used whenever possible.
- The operand is already included in the fetched instruction.
- There is no need to fetch the operand from memory.

Indirect addressing mode is used whenever possible.
- Just one instruction rather than two is enough.

Examples:

Relocatable programs
  5    0000  COPY    START   0
  - Starting address is 0.

Register-to-register instructions (simple addressing)
  125  1036  RDREC   CLEAR   X          B410
  150  1049          COMPR   A,S        A004

Extended format instructions (bit e = 1)
  15   0006  CLOOP   +JSUB   RDREC      4B101036

Contd
PC-relative
  10   0000  FIRST   STL     RETADR     17202D
  - Displacement is RETADR (0030) - PC (0003) = 02D.
  - Bits n, i, and p are set to 1.

  40   0017          J       CLOOP      3F2FEC
  - Operand address is 0006, PC is 001A.
  - Displacement is 0006 - 001A = -14 (hex), which is FEC in 12-bit 2's complement.
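
The displacement arithmetic in these two examples can be checked with a short Python sketch. The function name is hypothetical; the only assumption is that the PC already points at the next instruction when the displacement is computed.

```python
def pc_relative_disp(target, pc):
    """Return the 12-bit two's-complement displacement field for PC-relative mode.

    `pc` is the address of the NEXT instruction (current Loc + instruction length).
    """
    disp = target - pc
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement does not fit in 12 bits; try base-relative or format 4")
    return disp & 0xFFF   # masking keeps the low 12 bits, giving two's complement
```

For STL RETADR, `pc_relative_disp(0x0030, 0x0003)` is `0x02D`; for J CLOOP, `pc_relative_disp(0x0006, 0x001A)` is `0xFEC`.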

Contd
Base relative:
  12                 LDB     #LENGTH
  13                 BASE    LENGTH
  - BASE declares the value of the base register: the address of the identifier LENGTH (0033).
  - The directives BASE and NOBASE do not generate code.

  160  104E          STCH    BUFFER,X   57C003
  - Address of BUFFER is 0036; contents of the base register are 0033.
  - Displacement is 0036 - 0033 = 0003.
  - Note: bits x and b are 1.

Contd
Immediate addressing
  55   0020          LDA     #3         010003
  - The operand (3) is part of the instruction.
  - Bit i = 1 indicates immediate addressing.

  133  103C          +LDT    #4096      75101000
  - The operand (4096) does not fit in 12 bits.
  - The + character indicates extended format (bit e = 1).

  12   0003          LDB     #LENGTH    69202D
  - Here # acts as an address-of operator.
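
To see how the flag bits n, i, x, b, p, e combine with the opcode and the displacement or address, here is a small Python sketch of format 3 and format 4 encoding. The function name and parameter order are illustrative assumptions; the opcodes in the usage note are the standard SIC/XE values for the instructions in the examples.

```python
def encode(opcode, n, i, x=0, b=0, p=0, e=0, value=0):
    """Encode a SIC/XE format 3 (e=0) or format 4 (e=1) instruction as hex.

    opcode: 8-bit opcode whose two low bits are zero; n and i occupy them.
    value:  12-bit displacement (format 3) or 20-bit address (format 4).
    """
    first = opcode | (n << 1) | i
    flags = (x << 3) | (b << 2) | (p << 1) | e
    if e:
        word = (first << 24) | (flags << 20) | (value & 0xFFFFF)
        return f"{word:08X}"
    word = (first << 16) | (flags << 12) | (value & 0xFFF)
    return f"{word:06X}"
```

For instance, `encode(0x14, 1, 1, p=1, value=0x02D)` yields `17202D` (STL RETADR), `encode(0x54, 1, 1, x=1, b=1, value=0x003)` yields `57C003` (STCH BUFFER,X), and `encode(0x74, 0, 1, e=1, value=0x1000)` yields `75101000` (+LDT #4096).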

Program Relocation

SESSION 6

Program Relocation

Absolute program: a program whose starting address is specified at assembly time. A program with absolute addresses must be loaded at that specific starting address; its addresses may be invalid if the program is loaded somewhere else.

Program relocation: adjusting a program's addresses so that it can be loaded and executed correctly at any place in memory.

To have relocatable programs:
- The assembler identifies the parts of the object program that must be modified.
- The loader modifies them.

Contd
Need for program relocation:

- To increase the productivity of the machine, we want to load and run several programs at the same time (multiprogramming).
- We must be able to load programs into memory wherever there is room.
- The actual starting address of the program is not known until load time.

Contd
Example:
Consider the instructions +JSUB RDREC and STL RETADR.
- For +JSUB RDREC, the assembler inserts the address of RDREC relative to the start of the program.
- The assembler instructs the loader to add the program's beginning address to the address field of the JSUB instruction at load time.
- STL RETADR uses PC-relative addressing, so its displacement needs no modification.

Contd
Modification record:

- When the assembler generates an address for a symbol, the address inserted into the instruction is relative to the start of the program.
- The assembler also produces a modification record, which stores the location and length of the address field that needs to be modified.
- The loader, on seeing the record, adds the beginning address of the loaded program to the address field identified by the record.

Contd

Instructions that need to be modified:
- The address portion of those instructions that use absolute (direct) addresses.

Instructions that need not be modified:
- Immediate addressing (no memory references)
- Register-to-register instructions (no memory references)
- PC-relative or base-relative addressing (the relative displacement remains the same regardless of the starting address)

Contd
Modification Record
Col. 1     M
Col. 2-7   Starting location of the address field to be modified, relative to the beginning of the program (hex)
Col. 8-9   Length of the address field to be modified, in half-bytes (hex)

Example: the +JSUB RDREC instruction
- The instruction +JSUB RDREC assembles into 4B101036 and starts at address 0006.
- Its modification record is M00000705.
- The load address is to be added to the field at relative address 000007.
- The field to be modified is 5 half-bytes long (20 bits).
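
What the loader does with such a record can be sketched in Python as below. This is an illustrative sketch, not a full loader: the function name is hypothetical, the program image is assumed to be a `bytearray` whose index 0 corresponds to the start of the program, and the load address 5000 (hex) in the usage note is an arbitrary example value.

```python
def apply_modification(memory, record, load_addr):
    """Add load_addr to the address field named by a record such as 'M00000705'."""
    if record[0] != "M":
        raise ValueError("not a modification record")
    start = int(record[1:7], 16)          # Col. 2-7: location of the field
    half_bytes = int(record[7:9], 16)     # Col. 8-9: field length in half-bytes
    nbytes = (half_bytes + 1) // 2        # bytes that contain the field
    field = int.from_bytes(memory[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1    # low-order bits that form the address
    # Keep the untouched high half-byte (flag bits), relocate only the address.
    new = (field & ~mask) | ((field + load_addr) & mask)
    memory[start:start + nbytes] = new.to_bytes(nbytes, "big")
```

With the instruction 4B101036 placed at relative address 0006 and a load address of 5000 (hex), applying `M00000705` changes the instruction to 4B106036: only the 20-bit address field is altered.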

Contd

Fig 1.6 Examples of Relocation Program

Machine Independent Feature

SESSION 7

Literals

Literal
- An operand whose value appears literally (as a constant) in the instruction.
- Identified by the prefix =, followed by C'chars' (1 character per byte) or X'hexadecimal digits' (2 digits per byte).
- The assembler defines the constant in memory; the operand becomes a reference to this location.

Literal pools
- Literals are assembled into literal pools.
- LTORG creates a literal pool and inserts the literals accumulated so far.
- This ensures that short addresses remain valid.

Duplicate literals
- The assembler must recognize duplicate literals and store only one copy of the specified data value.
- Special literals (e.g. =*) must be duplicated.

Literal - Implementation

LITTAB
Literal name, the operand value and length, the address assigned to the operand

Pass 1
- Build LITTAB with the literal name, operand value and length, leaving the address unassigned.
- When an LTORG statement is encountered, assign an address to each literal not yet assigned an address.

Pass 2
- Search LITTAB for each literal operand encountered.
- Generate data values as for BYTE or WORD statements.
- Generate a modification record for literals that represent an address in the program.
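
The Pass 1 handling of LITTAB might look like the following Python sketch. The class and method names are hypothetical, duplicates are recognized simply by comparing literal names, and special literals such as =* (which must not be merged) are deliberately not handled.

```python
class LitTab:
    """Minimal literal table: name -> value, length, and (later) address."""

    def __init__(self):
        self.entries = {}   # insertion order is preserved by dict

    def add(self, name):
        """Record a literal such as =C'EOF' or =X'05'; duplicates are stored once."""
        if name in self.entries:
            return
        body = name[3:-1]                       # text between the quotes
        if name[1] == "C":
            value = body.encode("ascii").hex().upper()
        else:                                   # X'...' literal
            value = body.upper()
        self.entries[name] = {"value": value, "length": len(value) // 2, "address": None}

    def ltorg(self, locctr):
        """Assign addresses to all literals still unassigned; return the new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr
```

In Pass 2 the assembler would look up each literal operand in this table and use the stored address as the operand address.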

Symbols

Labels on instructions or data areas.

EQU directive
  symbol EQU value
- Creates an entry in the symbol table (SYMTAB) and assigns the value to it.
- The value may be an expression involving constants and previously defined symbols.

ORG directive
  ORG value
- Resets LOCCTR to the value specified.

Contd
Examples

Simple constants
  MAXLEN  EQU   4096
          ...
          +LDT  #MAXLEN

Array of records
  STAB    RESB  1100
          ORG   STAB
  SYMBOL  RESB  6
  VALUE   RESW  1
  FLAGS   RESB  2
          ORG   STAB+1100
          ...
          LDA   VALUE,X

For an ordinary two-pass assembler, all symbols must be defined during Pass 1: all terms used to specify the value of a new symbol must have been defined previously in the program. Hence the sequences marked "Disallowed" below cannot be processed by an ordinary two-pass assembler.

Allowed:
  ALPHA   RESW  1
  BETA    EQU   ALPHA

Disallowed:
  BETA    EQU   ALPHA
  ALPHA   RESW  1

Disallowed:
          ORG   ALPHA
  BYTE1   RESB  1
  BYTE2   RESB  1
  BYTE3   RESB  1
          ORG
  ALPHA   RESB  1

Expressions

An expression may use constants, user-defined terms, and special terms.
- The current value of the location counter (*) is one such special term.

Expressions can be classified as absolute expressions or relative expressions.

Absolute vs. relative expressions
An absolute expression is independent of program location.
- An expression that contains only absolute terms is absolute.
- An absolute expression may contain relative terms, provided the relative terms occur in pairs and the terms in each pair have opposite signs. For example, the difference of two relative terms is absolute.
- No relative term may enter into a multiplication or division operation.
e.g. MAXLEN EQU BUFEND-BUFFER

Contd
A relative expression depends on program location.
- The value of a relative expression is relative to the beginning address of the object program.
- A relative expression is one in which all of the relative terms except one can be paired as described above; the remaining unpaired term must have a positive sign.
- No relative term may enter into a multiplication or division operation.

BUFEND+BUFFER, 100-BUFFER, and 3*BUFFER are neither relative expressions nor absolute expressions.

Expressions that are neither relative nor absolute should be flagged by the assembler as errors. Symbol table entries must be tagged as relative or absolute.
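
These classification rules reduce to counting relative terms with their signs: a net count of 0 means absolute, a net count of +1 means relative, and anything else is an error. The Python sketch below illustrates this under simplifying assumptions: the function name is hypothetical, expressions contain no parentheses or spaces, and the special term * (location counter) is not supported because it would clash with the multiplication sign.

```python
import re

def classify(expr, types):
    """Classify expr as 'absolute', 'relative', or 'error'.

    types maps symbol name -> 'R' (relative) or 'A' (absolute);
    anything not in the map (e.g. a numeric constant) counts as absolute.
    """
    net_relative = 0
    # Split into additive terms, each with its optional leading sign.
    for sign, term in re.findall(r"([+-]?)([^+-]+)", expr):
        factors = re.split(r"[*/]", term)
        if any(types.get(f) == "R" for f in factors):
            if len(factors) > 1:
                return "error"      # relative term in multiplication/division
            net_relative += -1 if sign == "-" else 1
    if net_relative == 0:
        return "absolute"           # relative terms cancel in pairs
    if net_relative == 1:
        return "relative"           # exactly one unpaired positive term
    return "error"
```

With BUFFER and BUFEND tagged as relative, BUFEND-BUFFER is classified as absolute, BUFFER+100 as relative, and BUFEND+BUFFER, 100-BUFFER, and 3*BUFFER as errors.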

Contd
Example: consider some of the symbols

RETADR  RESW  1
LENGTH  RESW  1
BUFFER  RESB  4096
BUFEND  EQU   *
MAXLEN  EQU   BUFEND-BUFFER

Symbol   Type   Value
RETADR   R      0030
LENGTH   R      0033
BUFFER   R      0036
BUFEND   R      1036
MAXLEN   A      1000

Program Blocks

Definition

Program blocks: code segments that are rearranged within a single object program unit.

Control sections: code segments that are translated into independent object program units.

USE directive
  USE [block_name]
- Indicates which portions of the program belong to the various blocks: the default (unnamed) block or a named block.
- Used to reduce addressing problems in a program.
- Blocks are rearranged at link time or load time.

If no USE statements are included, the entire program belongs to the single default block.

Program Blocks - Implementation

Pass 1
- Each program block has a separate location counter.
- Each label is assigned an address that is relative to the start of the block that contains it.
- At the end of Pass 1, the latest value of the location counter for each block indicates the length of that block.
- The assembler can then assign to each block a starting address in the object program.

Pass 2
- The address of each symbol can be computed by adding the assigned block starting address and the relative address of the symbol within that block.

Contd

Each source line is given a relative address and a block number.

Example block table

Block name   Block number   Address   Length
(default)    0              0000      0066
CDATA        1              0066      000B
CBLKS        2              0071      1000
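
The block table follows directly from the block lengths collected in Pass 1, as this short Python sketch shows. The function names are hypothetical, and the symbol offset used in the usage note (relative address 3 within CDATA) is an assumed example value.

```python
def assign_block_starts(blocks):
    """blocks: (name, length) pairs in block-number order. Returns name -> start address."""
    starts, addr = {}, 0
    for name, length in blocks:
        starts[name] = addr      # each block begins where the previous one ends
        addr += length
    return starts

def symbol_address(block, relative_addr, starts):
    """Pass 2: final address = block starting address + address relative to the block."""
    return starts[block] + relative_addr
```

Given lengths 0066, 000B, and 1000 (hex), the computed starting addresses are 0000, 0066, and 0071, matching the table. A symbol at relative address 3 in CDATA would then be at 0066 + 3 = 0069.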

Program Linking

Control Sections

- Code segments translated into independent object program units.
- Each section can be loaded and relocated independently.
- A section is made up of one or more related routines.
- Sections must be linked together to form a program.

CSECT Directive

label CSECT
Starts and names a new control section.

External Definition and References

External definition
  EXTDEF name [, name]
- EXTDEF names symbols that are defined in this control section and may be used by other sections.

External reference
  EXTREF name [, name]
- EXTREF names symbols that are used in this control section and are defined elsewhere.

Contd

EXTDEF directive:  EXTDEF symbol(,symbol)*
EXTREF directive:  EXTREF symbol(,symbol)*

Example
  15   0003  CLOOP   +JSUB   RDREC            4B100000
  160  0017          +STCH   BUFFER,X         57900000
  190  0028  MAXLEN  WORD    BUFEND-BUFFER    000000

Implementation
The assembler must include information in the object program that will cause the loader to insert proper values where they are required
Object File Records

Define record
Col. 1      D
Col. 2-7    Name of external symbol defined in this control section
Col. 8-13   Relative address within this control section (hexadecimal)
Col. 14-73  Repeat information in Col. 2-13 for other external symbols

Refer record
Col. 1      R
Col. 2-7    Name of external symbol referred to in this control section
Col. 8-73   Names of other external reference symbols

Contd

Modification record (revised)
Col. 1      M
Col. 2-7    Starting address of the field to be modified (hexadecimal)
Col. 8-9    Length of the field to be modified, in half-bytes (hexadecimal)
Col. 10     Modification flag (+ or -)
Col. 11-16  External symbol whose value is to be added to or subtracted from the indicated field

Note: the control section name is automatically an external symbol, i.e. it is available for use in Modification records.

Assembler Design

Assembler design can be done in:
- Single pass
- Two pass

One-pass assembler:
- Does everything in a single pass.
- Cannot directly resolve forward references.

Contd

Multi-pass (two-pass) assembler:
- Does the work in two passes.
- Resolves forward references.

First pass:
- Scans the code.
- Validates the tokens.
- Creates the symbol table.

Second pass:
- Resolves forward references.
- Converts the code to machine code.

One Pass Assembler

Problems in one-pass assemblers
- Forward references to data items
- Forward references to labels on instructions

Solution
- Data items: require all such areas to be defined before they are referenced.
- Labels on instructions: no good solution.

Two types of one-pass assembler
- Load-and-go: produces code for immediate execution.
- The other type: produces code for later execution.

Load-and-go Assembler
Characteristics

- Useful for program development and testing.
- Avoids the overhead of writing the object program out and reading it back.
- Both one-pass and two-pass assemblers can be designed as load-and-go; however, a one-pass assembler also avoids the overhead of an additional pass over the source program.
- For a load-and-go assembler, the actual address must be known at assembly time, so an absolute program can be used.

Multi-Pass Assemblers
Restriction on EQU and ORG
- No forward references are allowed, because the symbol's value cannot be defined during the first pass.

Example:
  ALPHA   EQU   BETA
  BETA    EQU   DELTA
  DELTA   RESW  1

An assembler with only two passes cannot resolve this sequence.

Contd

Resolve forward references with as many passes as needed

- Portions of the program that involve forward references in symbol definitions are saved during Pass 1.
- Additional passes are made through the stored definitions.
- Finally, a normal Pass 2 is performed.
- Linked lists are used to keep track of the symbols whose values depend on an undefined symbol.
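
The idea of making additional passes over the saved definitions can be sketched in Python as follows. Note the swapped-in technique: instead of the linked dependency lists mentioned above, this sketch simply rescans the pending definitions until no more can be resolved. The function name is hypothetical, each pending definition is assumed to be a plain `symbol EQU other_symbol`, and the address used for DELTA in the usage note is an arbitrary example.

```python
def resolve_forward_equs(defined, pending):
    """Resolve 'symbol EQU other' definitions that contained forward references.

    defined: symbol -> value known after Pass 1
    pending: symbol -> name of the symbol it is equated to
    """
    symtab = dict(defined)
    pending = dict(pending)
    while pending:
        progress = False
        for symbol, target in list(pending.items()):
            if target in symtab:              # target is now known: define symbol
                symtab[symbol] = symtab[target]
                del pending[symbol]
                progress = True
        if not progress:                      # circular or truly undefined symbols
            raise ValueError(f"unresolvable symbols: {sorted(pending)}")
    return symtab
```

For the ALPHA/BETA/DELTA example, the first extra pass defines BETA from DELTA and the second defines ALPHA from BETA, so all three end up with the address of DELTA.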

Implementation example: Microsoft MASM Assembler

SEGMENT
- A program is a collection of segments; each segment is defined as belonging to a particular class: CODE, DATA, CONST, STACK.
- Segment registers: CS (code), SS (stack), DS (data), ES, FS, GS.
- Similar to program blocks in SIC.

ASSUME
- e.g. ASSUME ES:DATASEG2
- e.g. MOV AX,DATASEG2
       MOV ES,AX
- Similar to BASE in SIC.

Contd

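JUMP with forward reference
- Near jump: 2 or 3 bytes; far jump: 5 bytes.
- e.g. JMP TARGET
  - The programmer can warn the assembler of a far jump: JMP FAR PTR TARGET
  - The programmer can warn the assembler of a short jump: JMP SHORT TARGET
- Pass 1 reserves 3 bytes for the jump instruction.
- A wrong size assumption leads to a phase error.

PUBLIC, EXTRN
- Similar to EXTDEF, EXTREF in SIC.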
