SIC is a hypothetical computer that includes the hardware features most often
found on real machines. SIC comes in two versions:
•The standard model
•An XE version ("extra equipment" or "extra expensive")
The two versions have been designed to be upward compatible.
SIC/XE Machine Architecture:
Memory:
•8-bit bytes
•Subroutine linkage
•JSUB, RSUB: return address in register L
•Input and output
•Performed by transferring 1 byte at a time to or from the rightmost 8
bits of register A
•Each device is assigned a unique 8-bit code, as an operand of I/O
instructions
•Test Device (TD): sets the condition code to < (ready) or = (not ready)
•Read Data (RD), Write Data (WD)
SIC Programming Example
Arithmetic operations: BETA = ALPHA+INCR-1
LDA ALPHA
ADD INCR
SUB ONE
STA BETA
LDA GAMMA
ADD INCR
SUB ONE
STA DELTA
...
ONE WORD 1 one-word constant
ALPHA RESW 1 one-word variables
BETA RESW 1
GAMMA RESW 1
DELTA RESW 1
INCR RESW 1
Looping and indexing: copy one string to another
Memory:
• Maximum memory available on a SIC/XE system is 1 megabyte (2^20 bytes)
• An address (20 bits) cannot fit into the 15-bit address field used in standard SIC
• Must change instruction formats and addressing modes
Registers:
Additional registers are provided by SIC/XE
Floating-point data format (48 bits):
Field:  s | exponent | fraction
Bits:   1 |    11    |    36
Value:  f * 2^(e-1024)
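The value formula can be checked with a short sketch. It assumes that the fraction f is read as a number between 0 and 1 (binary point just before its high-order bit) and that an all-zero word represents zero; neither assumption is stated on the slide. The helper names `pack_fields` and `decode_sicxe_float` are hypothetical.

```python
def pack_fields(sign: int, exponent: int, fraction_bits: int) -> int:
    """Pack the three fields into one 48-bit integer (1 + 11 + 36 bits)."""
    return (sign << 47) | (exponent << 36) | fraction_bits


def decode_sicxe_float(bits48: int) -> float:
    """Decode a 48-bit value using value = f * 2^(e-1024)."""
    if bits48 == 0:
        # Assumption: all bits zero stands for the value zero.
        return 0.0
    sign = (bits48 >> 47) & 0x1
    exponent = (bits48 >> 36) & 0x7FF
    fraction_bits = bits48 & ((1 << 36) - 1)
    # Assumption: the fraction lies in [0, 1), binary point before the top bit.
    fraction = fraction_bits / (1 << 36)
    value = fraction * 2.0 ** (exponent - 1024)
    return -value if sign else value


# f = 0.5 (only the top fraction bit set), e = 1025: 0.5 * 2^1
one = pack_fields(0, 1025, 1 << 35)
print(decode_sicxe_float(one))
```

With f = 0.5 and e = 1025 the formula gives 0.5 * 2^1, so the sketch prints 1.0.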
Instruction formats:
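Format 1 (1 byte):  op (8 bits)
Format 2 (2 bytes): op (8 bits), r1 (4 bits), r2 (4 bits)
Format 3 (3 bytes): op (6 bits), n, i, x, b, p, e (1 bit each), disp (12 bits)
Format 4 (4 bytes): op (6 bits), n, i, x, b, p, e (1 bit each), address (20 bits)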
[Figures: settings of the n, i, x, b, p, e flag bits (opcode, flags, disp) for the individual addressing modes]
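To make the Format 3 and Format 4 layouts concrete, here is a minimal sketch that unpacks the fields of an assembled instruction. The function name `decode_format34` is hypothetical, and the opcode is reported as the first byte with its two low-order bits cleared, which is an illustrative convention.

```python
def decode_format34(instr: bytes) -> dict:
    """Split a 3-byte (Format 3) or 4-byte (Format 4) instruction into fields."""
    if len(instr) not in (3, 4):
        raise ValueError("Format 3 is 3 bytes, Format 4 is 4 bytes")
    first, second = instr[0], instr[1]
    fields = {
        "opcode": first & 0xFC,   # upper 6 bits of the first byte
        "n": (first >> 1) & 1,
        "i": first & 1,
        "x": (second >> 7) & 1,
        "b": (second >> 6) & 1,
        "p": (second >> 5) & 1,
        "e": (second >> 4) & 1,
    }
    # Remaining 12 bits (Format 3) or 20 bits (Format 4).
    value = second & 0x0F
    for byte in instr[2:]:
        value = (value << 8) | byte
    fields["disp_or_addr"] = value
    return fields


print(decode_format34(bytes.fromhex("032010")))    # a Format 3 instruction
print(decode_format34(bytes.fromhex("4B100000")))  # a Format 4 instruction
```

For 032010 the sketch reports n = i = 1 and p = 1 with displacement 010 (PC-relative); for 4B100000 it reports e = 1 with a 20-bit address of 0.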
BETA=ALPHA+INCR-1
LDS INCR
LDA ALPHA
ADDR S,A
SUB #1
STA BETA
...
...
ALPHA RESW 1 one-word variables
DELTA RESW 1
INCR RESW 1
Looping and Indexing operation
LDT #11
LDX #0
MOVECH LDCH STR1, X
STCH STR2, X
TIXR T
JLT MOVECH
STR1 BYTE C'HELLO WORLD'
STR2 RESB 11
Assemblers
An assembler is a program that accepts an assembly language program as
input and produces its machine language equivalent, along with
information for the loader.
4. Specify algorithm
6. Repeat 1 to 5 on modules
Statement of problem
BETA = ALPHA+INCR-1
Sum Start 0
LDA ALPHA 00 0 ?
ADD INCR
SUB ONE
STA BETA
...
ONE WORD 1
ALPHA RESW 1
BETA RESW 1
INCR RESW 1
End
Data structure
1. A table, the Machine-operation table (MOT), that indicates for each instruction (a) symbolic
mnemonic, (b) length, (c) binary machine opcode, and (d) format.
2. A table, the Pseudo-operation table (POT), that indicates for each pseudo-op the symbolic
mnemonic and the action to be taken.
3. A table, the Symbol table (ST), that is used to store each label encountered and its
corresponding value.
4. A table, the Literal table (LT), that is used to store each literal encountered and its corresponding
assigned location.
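The way the first three tables cooperate in the first pass can be sketched as follows. This is a simplified illustration, not the slides' algorithm: the MOT holds only the four SIC instructions used in the sample program (opcode, length in bytes), the pseudo-op handling covers only START, END, WORD, RESW and RESB, and the names `MOT`, `pot_length` and `pass1` are hypothetical.

```python
# Machine-operation table: mnemonic -> (binary opcode, length in bytes).
MOT = {"LDA": (0x00, 3), "ADD": (0x18, 3), "SUB": (0x1C, 3), "STA": (0x0C, 3)}


def pot_length(opcode, operand):
    """Pseudo-operation table lookup: storage reserved by a pseudo-op, else None."""
    if opcode == "WORD":
        return 3
    if opcode == "RESW":
        return 3 * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode in ("START", "END"):
        return 0
    return None


def pass1(lines):
    """Assign a location to every statement and build the symbol table."""
    symtab = {}
    start = locctr = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            start = locctr = int(operand, 16)
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        length = pot_length(opcode, operand)
        if length is None:
            if opcode not in MOT:
                raise ValueError(f"unknown operation {opcode}")
            length = MOT[opcode][1]
        locctr += length
    return symtab, locctr - start


program = [
    ("SUM", "START", "0"),
    (None, "LDA", "ALPHA"), (None, "ADD", "INCR"),
    (None, "SUB", "ONE"), (None, "STA", "BETA"),
    ("ONE", "WORD", "1"), ("ALPHA", "RESW", "1"),
    ("BETA", "RESW", "1"), ("INCR", "RESW", "1"),
    (None, "END", None),
]
symtab, length = pass1(program)
print({name: f"{addr:04X}" for name, addr in symtab.items()}, f"{length:X}")
```

A second pass would then look up each operand in the symbol table to fill in the address fields.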
Format for data bases
1. Machine-Operation table
Mnemonic op-code | Binary op-code | Instruction length | Instruction format | ……..
"LDA"            | 00             | 10                 | 00                 | ……..
"ADDR"           | 90             | 10                 | 01                 | ……..
……..             | ……..           | ……..               | ……..               | ……..
3. Symbol Table
Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes
• Text
Col.1 T
Col.2~7 Starting address in this record (hex)
Col. 8~9 Length of object code in this record in bytes (hex)
Col. 10~69 Object code (69-10+1)/6=10 instructions
• End
Col.1 E
Col.2~7 Address of first executable instruction (hex)
(END program_name)
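The column layouts of the Text and End records can be illustrated with a small formatter. This is a sketch only; `text_record` and `end_record` are hypothetical helper names, and the object codes in the example are made-up values.

```python
def text_record(start: int, object_codes: list) -> str:
    """Build a Text record: 'T', 6-digit start address, 2-digit length, object code."""
    body = "".join(object_codes)
    if len(body) % 2 != 0:
        raise ValueError("object code must be a whole number of bytes")
    if len(body) > 60:
        raise ValueError("object code must fit in columns 10-69 (60 hex digits)")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_executable: int) -> str:
    """Build an End record: 'E' followed by the 6-digit address of the first instruction."""
    return f"E{first_executable:06X}"


print(text_record(0x1000, ["141033", "482039"]))
print(end_record(0x1000))
```

Two 3-byte instructions starting at address 1000 give a record length of 06, so the first line printed is T00100006141033482039.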
The Object Code for COPY
SIC/XE
• PC-relative or Base-relative addressing: op m
• Indirect addressing: op @m
• Immediate addressing: op #c
• Extended format: +op m
• Index addressing: op m,x
• register-to-register instructions
• larger memory -> multi-programming (program allocation)
Translation
Register translation
• register name (A, X, L, B, S, T, F, PC, SW) and their values
(0,1, 2, 3, 4, 5, 6, 8, 9)
• preloaded in SYMTAB
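A minimal sketch of register translation for a Format 2 instruction, using the register numbers listed above. The helper `encode_format2` is a hypothetical name; the opcode 90 for ADDR is taken from the machine-operation table earlier in these notes.

```python
# Register numbers as they would be preloaded into SYMTAB.
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6, "PC": 8, "SW": 9}


def encode_format2(opcode: int, r1: str, r2: str = "A") -> str:
    """Return the 2-byte object code: 8-bit opcode, then two 4-bit register numbers."""
    return f"{opcode:02X}{REGISTERS[r1]:X}{REGISTERS[r2]:X}"


print(encode_format2(0x90, "S", "A"))  # ADDR S,A
```

ADDR S,A assembles to 9040: opcode 90, register S = 4, register A = 0.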
Address translation
• Most register-memory instructions use program counter relative or base
relative addressing
• Format 3: 12-bit address field
• base-relative: 0~4095
• pc-relative: -2048~2047
• Format 4: 20-bit address field
Chap 2
PC-Relative Addressing Modes
• TA=RETADR=0030
• TA=(PC)+disp=002D+0003
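The displacement calculation can be sketched as below, assuming the assembler tries PC-relative addressing first and falls back to base-relative addressing when a base register value is known. The function name `choose_displacement` and this ordering are illustrative assumptions.

```python
def choose_displacement(target: int, pc: int, base=None):
    """Return (mode, 12-bit displacement field) for a Format 3 instruction."""
    disp = target - pc
    if -2048 <= disp <= 2047:
        # Negative displacements are stored in 12-bit two's complement.
        return "pc", disp & 0xFFF
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:
            return "base", disp
    raise ValueError("target out of range: Format 4 would be needed")


# TA = 0030 with (PC) = 002D, as in the example above.
mode, disp = choose_displacement(0x0030, 0x002D)
print(mode, f"{disp:03X}")
```

For TA = 0030 and (PC) = 002D this yields a PC-relative displacement of 003.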
Program Relocation
Except for instructions that contain an absolute (direct) address, the rest of the instructions need not be modified:
• not a memory address (immediate addressing)
• PC-relative, Base-relative
The only parts of the program that require modification at load time are those
that specify direct addresses
Modification record
Col 1 M
Col 2-7 Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9 Length of the address field to be modified, in half-bytes
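How a loader might apply such a record is sketched below. The sketch assumes the program image is held in a `bytearray` indexed from the beginning of the program, and that a field with an odd number of half-bytes begins in the low-order half of its first byte. The name `apply_modification` and the sample instruction bytes are hypothetical.

```python
def apply_modification(image: bytearray, start: int, half_bytes: int, load_addr: int) -> None:
    """Add the program load address to an address field of `half_bytes` half-bytes."""
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(image[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1
    relocated = ((field & mask) + load_addr) & mask
    # Bits outside the address field (for example the flag bits) stay untouched.
    field = (field & ~mask) | relocated
    image[start:start + nbytes] = field.to_bytes(nbytes, "big")


# A Format 4 instruction at relative address 0; its 20-bit address field
# starts at byte 1 and is 5 half-bytes long. The program is loaded at 5000.
image = bytearray.fromhex("4B101036")
apply_modification(image, start=1, half_bytes=5, load_addr=0x5000)
print(image.hex().upper())
```

The address field 01036 becomes 06036 while the flag bits in the upper half of byte 1 are preserved, giving 4B106036.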
Literals
• Literals let programmers write the value of a constant operand as part of
the instruction that uses it.
• This avoids having to define the constant elsewhere in the program and make up a
label for it.
• Immediate Operands
The operand value is assembled as part of the machine instruction
e.g. 55 0020 LDA #3 010003
• Literals
The assembler generates the specified value as a constant at some other
memory location
e.g. 45 001A ENDFIL LDA =C'EOF' 032010
Literal - Implementation
• LITTAB
literal name, the operand value and length, the address assigned to the
operand
• Pass 1
– build LITTAB with literal name, operand value and length, leaving the
address unassigned
– when LTORG statement is encountered, assign an address to each
literal not yet assigned an address
• Pass 2
– search LITTAB for each literal operand encountered
– generate data values using BYTE or WORD statements
– generate modification record for literals that represent an address in
the program
Symbol-Defining Statements
Program Readability
• Program readability
– No extended format instructions on lines 15, 35, 65
– No need for base relative addressing (line 13, 14)
– LTORG is used to make sure the literals are placed ahead of any large data
areas (line 253)
• Object code
– It is not necessary to physically rearrange the generated code in the object
program
Control Sections and Program Linking
• Control Sections
– are most often used for subroutines or other logical subdivisions of a
program
– the programmer can assemble, load, and manipulate each of these
control sections separately
– instruction in one control section may need to refer to instructions or data
located in another section
– because of this, there should be some means for linking control sections
together
External Definition and References
• External definition
– EXTDEF name [, name]
– EXTDEF names symbols that are defined in this control section and may be used
by other sections
• External reference
– EXTREF name [,name]
– EXTREF names symbols that are used in this control section and are defined
elsewhere
• Example
– 15 0003 CLOOP +JSUB RDREC 4B100000
– 160 0017 +STCH BUFFER,X 57900000
– 190 0028 MAXLEN WORD BUFEND-BUFFER 000000
Implementation
• The assembler must include information in the object program that will cause the
loader to insert proper values where they are required
• Define record
– Col. 1 D
– Col. 2-7 Name of external symbol defined in this control section
– Col. 8-13 Relative address within this control section (hexadecimal)
– Col.14-73 Repeat information in Col. 2-13 for other external symbols
• Refer record
– Col. 1 R
– Col. 2-7 Name of external symbol referred to in this control section
– Col. 8-73 Name of other external reference symbols
Modification Record
• Modification record
– Col. 1 M
– Col. 2-7 Starting address of the field to be modified (hexadecimal)
– Col. 8-9 Length of the field to be modified, in half-bytes (hexadecimal)
– Col. 10 Modification flag (+ or -)
– Col. 11-16 External symbol whose value is to be added to or subtracted
from the indicated field
– Note: control section name is automatically an external symbol, i.e. it is
available for use in Modification records.
• Example
– Figure 2.17
– M00000405+RDREC
– M00000705+COPY
External References in Expression
• Earlier definitions
– required all of the relative terms be paired in an expression (an
absolute expression), or that all except one be paired (a relative
expression)
• New restriction
– Both terms in each pair must be relative within the same control
section
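– Ex: BUFEND-BUFFER (legal: both terms are in the same control section)
– Ex: RDREC-COPY (illegal: the terms are in different control sections)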
• In general, the assembler cannot determine whether or not the
expression is legal at assembly time. This work will be handled by a
linking loader.
One-Pass Assemblers
• Main problem
– forward references
• data items
• labels on instructions
• Solution
– data items: require all such areas be defined before they are referenced
– labels on instructions: no good solution
• Two types of one-pass assembler
– load-and-go
• produces object code directly in memory for immediate execution
– the other
• produces usual kind of object code for later execution
Load-and-go Assembler
• Characteristics
– Useful for program development and testing
– Avoids the overhead of writing the object program out and reading it back
– Both one-pass and two-pass assemblers can be designed as load-and-go.
– However, a one-pass assembler also avoids the overhead of an additional pass over
the source program
– For a load-and-go assembler, the actual address must be known at
assembly time, so we can use an absolute program
• For any symbol that has not yet been defined
1. omit the address translation
2. insert the symbol into SYMTAB, and mark this symbol undefined
3. the address that refers to the undefined symbol is added to a list of forward
references associated with the symbol table entry
4. when the definition for a symbol is encountered, the proper address for the
symbol is then inserted into any instructions previously generated, according to
the forward reference list
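The four steps can be sketched as a small symbol table with forward-reference lists. This is an illustration under simplifying assumptions: object code is written straight into a `bytearray`, every operand address occupies the two bytes after the opcode, and indexed addressing is ignored. The class and method names are hypothetical.

```python
class OnePassSymtab:
    """Symbol table for a load-and-go assembler with forward-reference lists."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.defined = {}   # symbol -> address
        self.pending = {}   # undefined symbol -> addresses of fields to patch

    def _store(self, field_addr: int, value: int) -> None:
        self.memory[field_addr:field_addr + 2] = value.to_bytes(2, "big")

    def reference(self, symbol: str, field_addr: int) -> None:
        """Called when an instruction operand names `symbol`."""
        if symbol in self.defined:
            self._store(field_addr, self.defined[symbol])
        else:
            # Steps 1-3: leave the field empty and remember where it is.
            self._store(field_addr, 0)
            self.pending.setdefault(symbol, []).append(field_addr)

    def define(self, symbol: str, address: int) -> None:
        """Called when the label `symbol` is encountered (step 4)."""
        self.defined[symbol] = address
        for field_addr in self.pending.pop(symbol, []):
            self._store(field_addr, address)


memory = bytearray(9)
table = OnePassSymtab(memory)
table.reference("RDREC", 1)      # first forward reference
table.reference("RDREC", 4)      # second forward reference
table.define("RDREC", 0x203D)    # definition patches both fields
print(memory.hex().upper())
```

Any symbol still present in `pending` at the end of the program would be reported as undefined.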
• When external working-storage devices are not available or too slow (for the
intermediate file between the two passes)
• Solution:
– When the definition of a symbol is encountered, the assembler must generate
another Text record with the correct operand address
– The loader is used to complete forward references that could not be handled
by the assembler
– The object program records must be kept in their original order when they
are presented to the loader
Loader and Linker
A loader should perform the following three functions:
1. Loading: loading an object program into memory for execution.
2. Relocation: modifying the object program so that it can be loaded at an address different from the location
originally specified.
3. Linking: combines two or more separate object programs and supplies the information needed
to allow references between them.
A loader is a system program that performs the loading function. Many loaders also support
relocation and linking. Some systems have a linker to perform the linking and a separate loader to
handle relocation and loading.
Absolute Loader
1. An object program is loaded at the address specified on the START directive.
2. No relocation or linking is needed
3. Thus, it is very simple
No text record corresponds here.
XXX indicates that the previous
contents of these locations remain
unchanged.
Absolute Loader Implementation
“14” occupies two bytes if
it is represented in char form.
• Two methods to describe where in the object program to modify the address (add
the program starting address)
– Use modification records
• Suitable for a small number of changes
– Use relocation bit mask
• Suitable for a large number of changes
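The relocation bit mask method can be sketched as follows, assuming one mask bit per 3-byte word of object code and that a set bit means "add the program starting address to this word". The function name and the sample words are hypothetical.

```python
def relocate_with_mask(words: list, mask_bits: str, start_addr: int) -> list:
    """Add start_addr to every word whose relocation bit is '1'."""
    if len(mask_bits) < len(words):
        raise ValueError("need one relocation bit per word")
    relocated = []
    for word, bit in zip(words, mask_bits):
        if bit == "1":
            relocated.append((word + start_addr) & 0xFFFFFF)  # keep 24 bits
        else:
            relocated.append(word)
    return relocated


# Two instructions with address fields, followed by a constant.
words = [0x140033, 0x481039, 0x000003]
result = relocate_with_mask(words, "110", 0x1000)
print([f"{w:06X}" for w in result])
```

The first two words are relocated to 141033 and 482039, while the constant 000003 is left unchanged.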
Program Written in SIC/XE
PC-relative
Direct addressing
Direct addressing
Direct addressing
• Program A
– The address of REF4 is 4054 (4000 + 54)
because program A is loaded at 4000 and the
relative address of REF4 within program A is 54.
– The value of REF4 is 004126 because
• The address of LISTC is 004112: 0040E2 (the loaded address
of program C) + 000030 (the relative address of
LISTC in program C)
• 004112 + 000014 (constant already calculated) =
004126.
REF4 after Linking
• Program B
– The address of REF4 is 40D3 (4063 + 70)
because program B is loaded at 4063 and the
relative address of REF4 within program B is 70.
– The value of REF4 is 004126 because
• The address of LISTC is 004112
• The address of ENDA is 004054
• The address of LISTA is 004040
• 004054 + 004112 – 004040 = 004126
Instruction Operands
• For references that are instruction operands, the calculated
values after loading do not always appear to be equal.
• This is because there is an additional address calculation step
involved for program-counter (base) relative instructions.
• In such cases, it is the target addresses that are the same.
• For example, in program A, the reference REF1 is a program-
counter relative instruction with displacement 1D. When this
instruction is executed, the PC contains the value 4023.
Therefore the resulting address is 4040. In program B, because
direct addressing is used, 4040 (4000 + 40) is stored in the
loaded program for REF1.
The Implementation of a Linking Loader
• A linking loader makes two passes over its
input
– In pass 1: assign addresses to all external symbols
– In pass 2: perform the actual loading, relocation,
and linking
• Very similar to what a two-pass assembler
does.
Data Structures
• External symbol tables (ESTAB)
– Like SYMTAB, store the name and address of each external
symbol in the set of control sections being loaded.
– It needs to indicate in which control section the symbol is defined.
• PROGADDR
– The beginning address in memory where the linked program is to
be loaded. (given by the OS)
• CSADDR
– It contains the starting address assigned to the control section
currently being scanned by the loader.
– This value is added to all relative addresses within the control
sections.
Algorithm
• During pass 1, the loader is concerned only with HEADER and
DEFINE record types in the control sections to build ESTAB.
• PROGADDR is obtained from OS.
• This becomes the starting address (CSADDR) for the first control
section.
• The control section name from the header record is entered into
ESTAB, with value given by CSADDR.
• All external symbols appearing in the DEFINE records for the
current control section are also entered into ESTAB.
• Their addresses are obtained by adding the value (offset) specified
in the DEFINE to CSADDR.
• At the end, ESTAB contains all external symbols defined in the set
of control sections together with the addresses assigned to each.
• A Load Map can be generated to show these symbols and their
addresses.
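Pass 1 can be sketched as below. Each control section is reduced to a tuple of its name, its length from the header record, and the symbols from its DEFINE records. The section names PROGA, PROGB and PROGC are assumptions, the lengths 63 and 7F are inferred from the load addresses 4000, 4063 and 40E2 used in the REF4 example, and the length of the last section is a made-up value.

```python
def loader_pass1(sections, progaddr: int) -> dict:
    """Build ESTAB: every control section name and external symbol -> address."""
    estab = {}
    csaddr = progaddr
    for name, length, defines in sections:
        if name in estab:
            raise ValueError(f"duplicate external symbol {name}")
        estab[name] = csaddr
        for symbol, offset in defines.items():
            if symbol in estab:
                raise ValueError(f"duplicate external symbol {symbol}")
            estab[symbol] = csaddr + offset
        csaddr += length   # starting address of the next control section
    return estab


sections = [
    ("PROGA", 0x63, {"LISTA": 0x40, "ENDA": 0x54}),
    ("PROGB", 0x7F, {}),
    ("PROGC", 0x51, {"LISTC": 0x30}),   # length 0x51 is an assumed value
]
estab = loader_pass1(sections, progaddr=0x4000)
for symbol, address in estab.items():   # a simple load map
    print(f"{symbol:<6} {address:06X}")
```

The printed load map places LISTA at 004040, ENDA at 004054 and LISTC at 004112, matching the addresses used in the REF4 example.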
A Load Map
Algorithm (Cont’d)
• During pass 2, the loader performs the actual loading,
relocation, and linking.
• CSADDR is used in the same way as it was used in pass 1
– It always contains the actual starting address of the control
section being loaded.
• As each text record is read, the object code is moved to
the specified address (plus CSADDR)
• When a modification record is encountered, the symbol
whose value is to be used for modification is looked up in
ESTAB.
• This value is then added to or subtracted from the
indicated location in memory.
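The modification step of pass 2 can be sketched as follows, using the REF4 values from the earlier example. Memory is modelled as a `bytearray`, each modification record is reduced to a tuple (relative address, length in half-bytes, sign, symbol), and the ESTAB contents are written out by hand. The function names are hypothetical.

```python
def modify_field(memory: bytearray, address: int, half_bytes: int, sign: str, value: int) -> None:
    """Add or subtract `value` in a field of `half_bytes` half-bytes at `address`."""
    nbytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1
    field = int.from_bytes(memory[address:address + nbytes], "big")
    delta = value if sign == "+" else -value
    updated = ((field & mask) + delta) & mask
    field = (field & ~mask) | updated
    memory[address:address + nbytes] = field.to_bytes(nbytes, "big")


def loader_pass2_modify(memory, estab, csaddr, mod_records) -> None:
    """Apply the modification records of the control section loaded at csaddr."""
    for rel_addr, half_bytes, sign, symbol in mod_records:
        if symbol not in estab:
            raise KeyError(f"undefined external symbol {symbol}")
        modify_field(memory, csaddr + rel_addr, half_bytes, sign, estab[symbol])


estab = {"LISTA": 0x4040, "ENDA": 0x4054, "LISTC": 0x4112}
memory = bytearray(0x5000)

# Program A: REF4 at relative address 54 already holds the constant 000014.
memory[0x4054:0x4057] = bytes.fromhex("000014")
loader_pass2_modify(memory, estab, 0x4000, [(0x54, 6, "+", "LISTC")])

# Program B: REF4 at relative address 70 starts as 000000.
loader_pass2_modify(memory, estab, 0x4063,
                    [(0x70, 6, "+", "ENDA"), (0x70, 6, "+", "LISTC"), (0x70, 6, "-", "LISTA")])

print(memory[0x4054:0x4057].hex().upper(), memory[0x40D3:0x40D6].hex().upper())
```

Both fields end up holding 004126, as in the REF4 discussion.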
Reference Number
• The linking loader algorithm can be made more efficient if
we assign a reference number to each external symbol
referred to in a control section.
• This reference number is used (instead of the symbol
name) in modification record.
• This simple technique avoids multiple searches of ESTAB for
the same symbol during the loading of a control section.
– After the first search for a symbol (the REFER records), we put the
found entries into an array.
– Later in the same control section, we can just use the reference
number as an index into the array to quickly fetch a symbol’s
value.
Reference Number Example
Linkage Editor
• When the user is ready to run the linked program, a simple relocating
loader can be used to load the program into memory.
• The only object code modification necessary is the addition of an actual
address to relative values within the program.
• The linkage editor performs relocation of all control sections relative to
the start of the linked program.
• All items that need to be modified at load time have values that are relative
to the start of the linked program.
• This means that the loading can be accomplished in one pass with no external
symbol table required.
• Thus, if a program is to be executed many times without being reassembled,
the use of a linkage editor can substantially reduce the overhead required.
– Resolution of external references and library searching are only
performed once.
Dynamic Linking
• Linkage editors perform linking before the program is loaded for execution.
• Linking loaders perform these same operations at load time.
• Dynamic linking postpones the linking function until execution time.
– A subroutine is loaded and linked to the rest of the program when it is first
called.
• Dynamic linking is often used to allow several executing programs to share one
copy of a subroutine or library.
• For example, a single copy of the standard C library can be loaded into memory.
• All C programs currently in execution can be linked to this one copy, instead of
linking a separate copy into each object program.
• In an object-oriented system, dynamic linking is often used for references to
software objects.
• This allows the implementation of the object and its methods to be determined at
the time the program is run (e.g., C++).
• The implementation can be changed at any time, without affecting the program
that makes use of the object.
Dynamic Linking Advantage
• The subroutines that diagnose errors may never
need to be called at all.
• However, without using dynamic linking, these
subroutines must be loaded and linked every time
the program is run.
• Using dynamic linking can save both space for storing
the object program on disk and in memory, and time
for loading the bigger object program.
On PC Windows or UNIX operating
systems, you normally use a linkage
editor (e.g., ld) to generate an
executable program.
Dynamic Linking Implementation
• A subroutine that is to be dynamically loaded must be called via an operating
system service request.
– This method can also be thought of as a request to a part of the loader that is
kept in memory during execution of the program
• Instead of executing a JSUB instruction to an external symbol, the program makes
a load-and-call service request to the OS.
• The parameter of this request is the symbolic name of the routine to be called.
• The OS examines its internal tables to determine whether the subroutine is
already loaded.
• If needed, the subroutine is loaded from the library.
• Then control is passed from the OS to the subroutine being called.
• When the called subroutine completes its processing, it returns to its caller
(operating system).
• The OS then returns control to the program that issues the request.
• After the subroutine is completed, the memory that was allocated to it may be
released.
• However, often this is not done immediately. If the subroutine is retained in
memory, it can be used by later calls to the same subroutine without loading the
same subroutine multiple times.
• Control can simply pass from the dynamic loader to the called routine directly.
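The load-and-call sequence can be modelled with a toy sketch. Here a dictionary of Python callables stands in for the subroutine library on disk, and a second dictionary plays the role of the OS table of routines already loaded. The class `DynamicLoader`, its methods, and the routine name ERRHANDL are all hypothetical.

```python
class DynamicLoader:
    """Toy model of the load-and-call service request described above."""

    def __init__(self, library: dict):
        self.library = library   # symbolic name -> routine (stand-in for the library)
        self.loaded = {}         # routines currently kept in memory
        self.load_count = 0

    def load_and_call(self, name: str, *args):
        routine = self.loaded.get(name)
        if routine is None:
            # Not yet in memory: load it from the library on first use.
            routine = self.library[name]
            self.loaded[name] = routine
            self.load_count += 1
        # Control passes to the routine; its result returns to the caller.
        return routine(*args)

    def release(self, name: str) -> None:
        """Free the memory of a routine that is no longer needed."""
        self.loaded.pop(name, None)


loader = DynamicLoader({"ERRHANDL": lambda code: f"handled error {code}"})
print(loader.load_and_call("ERRHANDL", 7))
print(loader.load_and_call("ERRHANDL", 9))
print(loader.load_count)
```

The routine is loaded once and reused for the second call; a program that never calls ERRHANDL never loads it.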
Implementation Example
Compiler: Source Program -> Compiler -> Target Program
Interpreter: Source Program, Input -> Interpreter -> Output, Error messages
Compilers and Interpreters (cont'd)
• Compiler: a program that translates an
executable program in one language into an
executable program in another language
The Analysis-Synthesis Model of Compilation
• There are two parts to compilation:
– Analysis
• Breaks up source program into pieces and imposes a
grammatical structure
• Creates intermediate representation of source
program
• Determines the operations and records them in a tree
structure, syntax tree
• Known as front end of compiler
The Analysis-Synthesis Model of Compilation (cont'd)
– Synthesis
• Constructs target program from intermediate
representation
• Takes the tree structure and translates the
operations into the target program
• Known as back end of compiler
Other Tools that Use the Analysis-Synthesis Model
• Editors (syntax highlighting)
• Pretty printers (e.g. Doxygen)
• Static checkers (e.g. Lint and Splint)
• Interpreters
• Text formatters (e.g. TeX and LaTeX)
• Silicon compilers (e.g. VHDL)
• Query interpreters/compilers (Databases)
A language-processing system
Skeletal Source Program -> Preprocessor -> Source Program -> Compiler -> Target Assembly Program -> Assembler -> Relocatable Object Code -> Linker (together with Libraries and Relocatable Object Files)
Try for example: gcc -v myprog.c
Lexical analysis
• Characters grouped into tokens.
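A minimal scanner sketch shows how characters are grouped into tokens. The token classes (identifiers, integers, and a few single-character operators) and the name `tokenize` are illustrative assumptions, not a definition of any particular language.

```python
import re

# One token per match: identifier, integer, or single-character symbol.
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\d+|[=+\-*/;()])")


def tokenize(source: str) -> list:
    """Group the characters of `source` into a list of token strings."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


print(tokenize("A=B+C;"))
```

For the source string A=B+C; the scanner yields the six tokens A, =, B, +, C and ;.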
Syntax analysis (Parsing)
• Grouping tokens into grammatical phrases
• Character groups recorded in symbol table
• Represented by a parse tree
Syntax analysis (cont’d)
• Hierarchical structure usually expressed by
recursive rules
• Rules for definition of expression:
Semantic analysis
• Checks source program for semantic errors
• Gathers type information for subsequent code
generation (type checking)
• Identifies operator and operands of
expressions and statements
Phases of a compiler
Symbol-Table Management
• Symbol table – data structure with a record
for each identifier and its attributes
• Attributes include storage allocation, type,
scope, etc
• All the compiler phases insert and modify the
symbol table
Intermediate code generation
• Program representation for an abstract
machine
• Should have two properties
– Easy to produce
– Easy to translate into target program
• Three-address code is a commonly used form
– similar to assembly language
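To illustrate three-address code, the sketch below walks a small syntax tree for an assignment and emits one instruction per operator. The tree encoding (nested tuples), the temporary names t1, t2, ... and the function name `three_address` are assumptions made for this example.

```python
def three_address(assignment) -> list:
    """Translate (target, expression_tree) into a list of three-address statements."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):      # a leaf: an identifier or constant
            return node
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    target, expression = assignment
    code.append(f"{target} = {emit(expression)}")
    return code


# A = B + C * D
for line in three_address(("A", ("+", "B", ("*", "C", "D")))):
    print(line)
```

Each statement has at most one operator on its right-hand side, which is what makes the form resemble assembly language.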
Code optimization and generation
• Code Optimization
– Improve intermediate code by producing code
that runs faster
• Code Generation
– Generate target code, which is machine code or
assembly code
The Phases of a Compiler
Phase: Programmer (source code producer)
  Output: Source string
  Sample: A=B+C;
Phase: Scanner (performs lexical analysis)
  Output: Token string, and symbol table with names
  Sample: 'A', '=', 'B', '+', 'C', ';'
Phase: Parser (performs syntax analysis based on the grammar of the programming language)
  Output: Parse tree or abstract syntax tree
  Sample:
        ;
        |
        =
       / \
      A   +
         / \
        B   C
The Grouping of Phases
• Compiler front and back ends:
– Front end:
• Analysis steps + Intermediate code generation
• Depends primarily on the source language
• Machine independent
– Back end:
• Code optimization and generation
• Independent of source language
• Machine dependent
The Grouping of Phases (cont’d)
• Compiler passes:
– A collection of phases is done only once (single pass) or
multiple times (multi pass)
• Single pass: reading input, processing, and producing output by
one large compiler program; usually runs faster
• Multi pass: compiler split into smaller programs, each making a
pass over the source; performs better code optimization