Module 2
Execution of a complete instruction
To execute an instruction, the processor has to perform the following steps:
Fetch the contents of the memory location pointed to by the PC. The contents of this location are
loaded into the IR (fetch phase).
IR ← [[PC]]
Assuming that the memory is byte addressable and each instruction occupies 4 bytes, increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
Carry out the actions specified by the instruction in the IR (execution phase).
In general, executing an instruction involves one or more of the following operations:
Transfer a word of data from one processor register to another or to the ALU.
Perform an arithmetic or a logic operation and store the result in a processor register.
Fetch the contents of a given memory location and load them into a processor register.
Store a word of data from a processor register into a given memory location.
Consider the single-bus organization of the datapath shown in the figure below. Consider the instruction ADD
(R3),R1, which adds the contents of the memory location pointed to by R3 to register R1. Execution of this
instruction involves the following actions:
Fetch the instruction
Fetch the first operand (the contents of the memory location pointed to by R3)
Perform the addition
Load the result into R1
In step 1, the instruction fetch operation is initiated by loading the contents of the PC into MAR and sending a read
request to memory. The Select signal is set to Select4, which causes the MUX to select the constant 4. This value
is added to the operand at input B, which is the content of the PC, and the result is stored in register Z. The updated value
is moved from register Z back to the PC during step 2 while waiting for the memory operation to complete. In
step 3, the word fetched from memory is loaded into the IR.
Steps 1 to 3 constitute the instruction fetch phase. The instruction decoding circuit interprets the contents of the
IR at the beginning of step 4. This enables the control circuitry to activate the control signals for steps 4-7, which
constitute the execution phase. The contents of register R3 are transferred to MAR in step 4 and a memory
read operation is initiated. Then the contents of R1 are transferred to register Y in step 5 to prepare for the
addition operation. When the read operation is completed, the memory operand is available in register
MDR and the addition operation is performed in step 6. The contents of MDR are gated to the bus, and thus
also to the B input of the ALU, and register Y is selected as the second input of the ALU by choosing SelectY. The sum
is stored in register Z, then transferred to R1 in step 7. The End signal causes a new instruction fetch
cycle to begin by returning to step 1.
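The seven steps described above can be traced with a small sketch. It is only an illustration under stated assumptions: the registers are held in a Python dict, memory is a dict from address to word, the instruction word is stored as a plain string, and the function name is hypothetical.

```python
def execute_add_indirect(regs, memory, pc):
    """Trace the seven control steps of ADD (R3),R1 on a single-bus datapath.

    regs and memory are plain dicts (an assumption of this sketch).
    Returns the updated PC and the fetched instruction word.
    """
    # Step 1: PC -> MAR, start a read, and form PC + 4 in Z
    mar = pc
    z = pc + 4
    # Step 2: Z -> PC while the memory read completes
    pc = z
    mdr = memory[mar]
    # Step 3: fetched word -> IR
    ir = mdr
    # Step 4: R3 -> MAR, start reading the memory operand
    mar = regs["R3"]
    # Step 5: R1 -> Y while the memory read completes
    y = regs["R1"]
    mdr = memory[mar]
    # Step 6: MDR on the bus (ALU input B), Y selected as input A, add, result -> Z
    z = y + mdr
    # Step 7: Z -> R1, End
    regs["R1"] = z
    return pc, ir
```

For example, with `regs = {"R1": 5, "R3": 100}` and `memory = {0: "ADD (R3),R1", 100: 7}`, calling `execute_add_indirect(regs, memory, 0)` leaves 12 in `regs["R1"]` and returns a PC of 4.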
WMC stands for Wait for Memory operation to Complete. The addressed device on the
memory bus is generally slower than the processor, so the processor has to wait for the
addressed device to complete its operation. The WMC control signal makes the processor wait; the
memory indicates that the operation has been completed by asserting the MFC signal.
Execution of Branch Instructions
Unconditional branch
A branch instruction replaces the contents of the PC with the branch target address. This address is obtained
by adding an offset X, which is given in the instruction, to the updated value of the PC.
Processing of an unconditional branch instruction begins with the fetch phase. This phase ends when the
instruction is loaded into the IR in step 3. The offset value is extracted from the IR by the instruction decoding
circuit. Since the updated value of the PC is already available in register Y, the offset X is gated onto the bus in
step 4 and an addition operation is performed. The result, which is the branch target address, is loaded into the PC
in step 5. The offset X is the difference between the branch target address and the address immediately following the branch
instruction.
Conditional branch
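For a conditional branch, the status of the condition codes must be checked before a new
value is loaded into the PC. For a branch-on-negative instruction, for example, the N flag is tested in step 4:
if N = 0, the processor returns to step 1; otherwise step 5 is performed to load the new value into the PC.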
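The branch target calculation can be sketched as follows. This is a minimal illustration that assumes 4-byte instructions and a branch-on-negative condition; the function and parameter names are hypothetical.

```python
def branch_if_negative(branch_address, offset, n_flag, instruction_size=4):
    """Return the PC value after a branch-on-negative instruction at branch_address."""
    updated_pc = branch_address + instruction_size   # fetch phase: PC <- [PC] + 4
    if n_flag == 0:
        return updated_pc                            # condition false: back to step 1
    return updated_pc + offset                       # steps 4-5: PC <- updated PC + X
```

A branch at address 1000 with offset 40 goes to 1044 when N = 1, which matches the definition of X as the distance from the address following the branch instruction.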
Multi bus Organization
In a multibus organization, several data transfers can take place in parallel. A three-bus structure is used
to connect the registers and the ALU of the processor. All general-purpose registers are combined into a single
block called the register file. The register file has three ports. There are two output ports, allowing the contents of two
registers to be accessed simultaneously and placed on buses A and B. The third port
allows the data on bus C to be loaded into a third register during the same clock cycle.
Buses A and B are used to transfer the source operands to the A and B inputs of the ALU, where an arithmetic or logic
operation may be performed. The result is transferred to the destination over bus C. The three-bus organization
eliminates the need for the temporary registers Y and Z of the single-bus organization. Another feature of this
organization is the introduction of the incrementer unit,
which is used to increment the PC by 4. Using the incrementer eliminates the need to add 4 to the PC using
the ALU. The source for the constant 4 at the ALU input is still used to increment other addresses, such as the memory addresses
in load multiple and store multiple instructions.
Control sequence for the three-bus organisation for the instruction ADD R4,R5,R6
In step 1, the contents of the PC are passed through the ALU using the R=B control signal and loaded into MAR to start
a memory read operation. At the same time,
the PC is incremented by 4. The incremented value is loaded into the PC at the end of the clock cycle. In step 2, the
processor waits for MFC and loads the data received into MDR, then transfers them to the IR in step 3. Finally,
the execution phase of the instruction requires only one control step, step 4, in which R4 and R5 are placed on buses A and B, added, and the sum is loaded into R6 over bus C.
Control Unit
The control unit issues control signals external to the processor to cause data exchange with memory and
I/O modules. The control unit also issues control signals internal to the processor to move data between
registers, to cause the ALU to perform a specified function, and to regulate other internal operations. The
control unit performs two basic tasks:
Sequencing: The control unit causes the processor to step through
a series of micro-operations in the proper sequence, based on the program being executed.
Execution: The control unit causes each micro-operation to be performed.
The figure is a general model of the control unit, showing all of its inputs and outputs. The inputs are as follows:
Clock: This is how the control unit keeps time. The control unit causes one micro-operation (or a set
of simultaneous micro-operations) to be performed for each clock pulse. This is sometimes referred to as
the processor cycle time, or the clock cycle time.
Instruction register: The opcode and addressing mode of the current instruction are used to determine
which micro-operations to perform during the execute cycle.
Flags: These are needed by the control unit to determine the status of the processor and the outcome of
previous ALU operations. For example, for the increment-and-skip-if-zero (ISZ) instruction, the control
unit will increment the PC if the zero flag is set.
Control signals from control bus: The control bus portion of the system bus provides signals to the
control unit.
The outputs are as follows:
Control signals within the processor: These are of two types: those that cause data to be moved from one
register to another, and those that activate specific ALU functions.
Control signals to control bus: These are also of two types: control signals to memory, and control
signals to the I/O modules.
Techniques for control unit implementation
Hardwired implementation
Microprogrammed implementation
Hardwired control
In a hardwired implementation, the control unit is essentially a state machine circuit. Its input logic
signals are transformed into a set of output logic signals, which are the control signals. At its core is a
combinational circuit that generates the required control outputs depending on the state of all its inputs. The basic
block diagram of a hardwired control unit is shown below.
The control unit makes use of the opcode and will perform different actions (issue a different
combination of control signals) for different instructions. To simplify the control unit logic, there should
be a unique logic input for each opcode. This function can be performed by a decoder, which takes an
encoded input and produces a single output. In general, a decoder will have n binary inputs and 2^n binary
outputs. Each of the 2^n different input patterns will activate a single unique output. The clock portion of
the control unit issues a repetitive sequence of pulses. This is useful for measuring the duration of micro-operations. Essentially, the period of the clock pulses must be long enough to allow the propagation of
signals along data paths and through processor circuitry. A counter is used to keep track of the control
steps. Each count of this counter corresponds to one control step. The required control signals are determined
by the following information:
Contents of the control step counter
Contents of the instruction register
Contents of the condition code flags
External input signals and interrupt requests
By separating the decoding and encoding functions, a more detailed block diagram is obtained, as shown below.
RUN control signal: when set to 1, RUN causes the counter to be incremented by one at the end of every
clock cycle. When RUN is equal to 0, the counter stops counting.
Generation of Zin control signal
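As an illustration of how the encoder forms one control signal, the sketch below derives Zin from the control sequences described earlier in this module: register Z is loaded in step 1 of every instruction (PC + 4), in step 6 of ADD, and in step 4 of a branch. The names T1 to T7, `add` and `branch` are assumptions of this sketch, and only these two instructions are covered.

```python
def z_in(step, add=False, branch=False):
    """Zin = T1 + T6.ADD + T4.BR for the two instructions discussed in this module."""
    t = {i: step == i for i in range(1, 8)}   # one-hot timing signals T1..T7
    return t[1] or (t[6] and add) or (t[4] and branch)
```

Each product term corresponds to one AND gate fed by a timing signal and a decoded instruction line, and the OR of the terms drives Zin.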
Advantage
Hardwired system can operate at high speed
Disadvantage
Little flexibility
Application
Used in RISC processor
Micro programmed Control
In a microprogrammed control unit, the logic of the control unit is specified by a microprogram. A
microprogram consists of a sequence of instructions in a microprogramming language, similar to machine language. These
are very simple instructions that specify micro-operations. A microprogrammed control unit is a
relatively simple logic circuit that is capable of (1) sequencing through microinstructions and (2)
generating control signals to execute each microinstruction.
The microroutines for all instructions in the instruction set of a computer are stored in a special memory
called the control store or control memory. The control unit can generate the control signals for any
instruction by sequentially reading the control words of the corresponding microroutine from the control store. A control word (CW) is
a word whose individual bits represent the various control signals. The sequence of control words
corresponding to the control sequence of a machine instruction constitutes the microroutine for that instruction,
and the individual control words in a microroutine are referred to as microinstructions. To read the control
words sequentially from the control store, a microprogram counter (µPC) is used. Every time a new instruction
is loaded into the IR, the output of the starting address generator is loaded into the µPC. The µPC is then automatically
incremented by the clock, causing successive microinstructions to be read from the control store. Hence the
control signals are delivered to the various parts of the processor in the correct sequence. To support microprogram
branching, the organisation of the control unit is modified as follows.
The starting and branch address generator block loads a new address into the µPC when a microinstruction
instructs it to do so. The µPC is incremented every time a new microinstruction is fetched from the control store, except
in the following situations:
When a new instruction is loaded into the IR, the µPC is loaded with the starting address
of the microroutine for that instruction.
When a Branch microinstruction is encountered and the branch condition is satisfied, the µPC is loaded with the branch target
address.
When an End microinstruction is encountered, the µPC is loaded with the address of the
first CW in the microroutine for the instruction fetch cycle.
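The sequencing rules above can be sketched in a few lines. This is a simplified model, not an actual control store layout: each microinstruction is a dict with hypothetical keys (`signals`, `branch_if`, `target`, `end`), and the sketch stops at an End microinstruction instead of jumping back to the fetch microroutine.

```python
def run_microprogram(control_store, start_address, flags, max_steps=100):
    """Return the control words issued from start_address until an End microinstruction."""
    upc = start_address                      # microprogram counter
    issued = []
    for _ in range(max_steps):
        micro = control_store[upc]
        issued.append(micro["signals"])
        if micro.get("end"):
            break                            # End: a new instruction fetch would begin here
        condition = micro.get("branch_if")
        if condition is not None and flags.get(condition, False):
            upc = micro["target"]            # branch taken: load the branch target address
        else:
            upc += 1                         # default: increment the microprogram counter
    return issued
```

With a control store such as `[{"signals": "a"}, {"signals": "b", "branch_if": "N", "target": 4}, {"signals": "c", "end": True}, {"signals": "unused"}, {"signals": "d", "end": True}]`, the issued sequence is a, b, c when N is false and a, b, d when N is true.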
Advantages of micro programmed control unit
Simplifies design of control unit
Cheaper and less error prone to implement
Disadvantage
Slower than hardwired unit
Application
Used in CISC processor
Micro program sequencing
If all microprograms required only straightforward sequential execution of microinstructions, except for
branches, letting a µPC govern the sequencing would be efficient. However, this approach has two
disadvantages:
Having a separate microroutine for each machine instruction results in a large total number of
microinstructions and a large control store.
Execution time is longer because it takes more time to carry out the required branches.
A powerful alternative approach is to include an address field as a part of every microinstruction to
indicate the location of the next microinstruction to be fetched. Separate branch microinstructions are then
virtually eliminated. A microinstruction format with a next-address field is shown below.
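The simplest form of representation that employs a sign bit is the sign-magnitude representation. In an n-bit
word, the leftmost bit is the sign bit and the rightmost n-1 bits hold the magnitude of the integer. The general representation of a signed integer A with bits $a_{n-1} \dots a_0$ is
$$A = \begin{cases} \sum_{i=0}^{n-2} 2^i a_i & \text{if } a_{n-1} = 0 \\ -\sum_{i=0}^{n-2} 2^i a_i & \text{if } a_{n-1} = 1 \end{cases}$$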
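A short sketch of decoding a sign-magnitude word is given below. It assumes the word is supplied as a bit string with the sign bit first; the function name is hypothetical.

```python
def sign_magnitude_value(bits):
    """Decode a sign-magnitude bit string such as '10010010' (sign bit first)."""
    magnitude = int(bits[1:], 2)              # rightmost n-1 bits hold the magnitude
    return -magnitude if bits[0] == "1" else magnitude
```

For instance, `'00010010'` decodes to +18 and `'10010010'` to -18. Both `'00000000'` and `'10000000'` decode to zero, which is one of the drawbacks of this representation.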
n-bit adder
Cascade n full adder (FA) blocks to form an n-bit adder.
Carries propagate, or ripple, through this cascade; the circuit is therefore called an n-bit ripple-carry adder.
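The cascade can be sketched as follows. The bit lists are assumed to be given least significant bit first, and the function names are hypothetical.

```python
def full_adder(x, y, c):
    """One FA block: returns (sum bit, carry-out)."""
    s = x ^ y ^ c
    carry = (x & y) | (x & c) | (y & c)
    return s, carry

def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists (least significant bit first)."""
    carry, s_bits = c0, []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)    # the carry ripples into the next stage
        s_bits.append(s)
    return s_bits, carry
```

Adding 6 and 7 as `ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0])` gives the sum bits `[1, 0, 1, 1]` (13) with a carry-out of 0.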
kn-bit adder
Numbers of kn bits can be added by cascading k n-bit adders.
n-bit subtractor
X − Y is equivalent to adding the 2's complement of Y to X.
The 2's complement is equivalent to the 1's complement + 1.
X − Y = X + Y' + 1, where Y' is the 1's complement (bitwise complement) of Y.
The 2's complement of positive and negative numbers is computed in the same way.
n-bit adder/subtractor
The two inputs x and y represent the arguments to be added or subtracted. The control input ADD/SUB
determines which operation is performed: if the control input is 0, an add operation is performed; if it is 1,
a subtract operation is performed. In the subtract case the control input complements each bit of y (using XOR gates) and also serves as the carry-in c0, which supplies the +1.
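A minimal sketch of the adder/subtractor follows. It works on Python integers treated as n-bit patterns; the function name and the default width of 4 bits are assumptions.

```python
def add_sub(x, y, sub, n=4):
    """n-bit adder/subtractor: sub = 0 adds, sub = 1 subtracts (x - y).

    Returns (n-bit result pattern, carry-out).
    """
    mask = (1 << n) - 1
    y_in = (y ^ (mask if sub else 0)) & mask   # XOR gates complement y when sub = 1
    total = (x & mask) + y_in + sub            # the control input is also the carry-in c0
    return total & mask, total >> n
```

`add_sub(7, 5, 1)` returns `(2, 1)`, and `add_sub(5, 7, 1)` returns `(14, 0)`, where 14 is the 4-bit pattern 1110, that is -2 in 2's complement.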
Detecting overflows
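Overflow can only occur when the signs of the two operands are the same. Overflow occurs if the sign
of the result is different from the sign of the operands.
$x_{n-1}$, $y_{n-1}$ and $s_{n-1}$ represent the signs of operand x, operand y and the result s, respectively.
A circuit to detect overflow can be implemented by either of the following logic expressions:
$$\text{Overflow} = x_{n-1}\, y_{n-1}\, \overline{s}_{n-1} + \overline{x}_{n-1}\, \overline{y}_{n-1}\, s_{n-1}$$
$$\text{Overflow} = c_n \oplus c_{n-1}$$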
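The sign-based overflow rule stated above (operands with the same sign, result with a different sign) can be checked with a short sketch. Operands are n-bit 2's-complement patterns held in Python integers; the function name is hypothetical.

```python
def add_with_overflow(x, y, n=4):
    """Add two n-bit 2's-complement patterns; returns (sum pattern, overflow flag)."""
    mask = (1 << n) - 1
    x, y = x & mask, y & mask
    s = (x + y) & mask
    xs = (x >> (n - 1)) & 1                    # sign of operand x
    ys = (y >> (n - 1)) & 1                    # sign of operand y
    ss = (s >> (n - 1)) & 1                    # sign of the result
    overflow = (xs & ys & (ss ^ 1)) | ((xs ^ 1) & (ys ^ 1) & ss)
    return s, bool(overflow)
```

With 4-bit operands, 0111 + 0001 gives 1000 with overflow, while 0111 + 1111 (7 plus -1) gives 0110 without overflow.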
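In a carry-lookahead adder, the carry out of stage i is $c_{i+1} = G_i + P_i c_i$, where $G_i = x_i y_i$ is the generate function and $P_i = x_i + y_i$ is the propagate function of that stage.
Thus, using a cascade of two 2-input XOR gates to realize the sum, the basic cell shown below can be used in
each bit stage.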
All carries can be obtained 3 gate delays after X, Y and c0 are applied:
- One gate delay for Pi and Gi
- Two gate delays in the AND-OR circuit for ci+1
All sums can be obtained one gate delay after the carries are computed.
In a 4-bit carry-lookahead adder, c4 is available after 3 gate delays and s3 after 4 gate delays, whereas in a 4-bit ripple-carry adder c4 is
available after 8 gate delays and s3 after 7 gate delays. Performing n-bit addition in 4 gate delays
independent of n is good only in theory, because of fan-in constraints.
The last AND gate and the OR gate require a fan-in of (n+1) for an n-bit adder. For a 4-bit adder (n = 4), a fan-in
of 5 is required.
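The carry-lookahead idea can be sketched in software by computing every carry directly from the Pi and Gi signals and c0, without waiting for the previous carry. The sketch assumes the usual definitions Gi = xi·yi and Pi = xi + yi, takes bit lists least significant bit first, and uses a hypothetical function name.

```python
def carry_lookahead_add(x_bits, y_bits, c0=0):
    """Carry-lookahead addition of two bit lists (least significant bit first)."""
    g = [x & y for x, y in zip(x_bits, y_bits)]   # generate:  G_i = x_i y_i
    p = [x | y for x, y in zip(x_bits, y_bits)]   # propagate: P_i = x_i + y_i
    carries = [c0]
    for i in range(len(g)):
        # c_{i+1} = G_i + P_i G_{i-1} + ... + P_i P_{i-1} ... P_0 c0
        c_next, prefix = g[i], p[i]
        for j in range(i - 1, -1, -1):
            c_next |= prefix & g[j]
            prefix &= p[j]
        c_next |= prefix & c0
        carries.append(c_next)
    sums = [x ^ y ^ c for x, y, c in zip(x_bits, y_bits, carries)]
    return sums, carries[-1]
```

The number of terms in the expression for the last carry grows with n, which is the software counterpart of the fan-in problem noted above.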
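When four 4-bit carry-lookahead adders are cascaded to form a 16-bit adder, each 4-bit block k has its own block propagate and generate functions $P_k^I$ and $G_k^I$. For block 0,
$$P_0^I = P_3 P_2 P_1 P_0$$
and
$$G_0^I = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$$
c16 is obtained as
$$c_{16} = G_3^I + P_3^I G_2^I + P_3^I P_2^I G_1^I + P_3^I P_2^I P_1^I G_0^I + P_3^I P_2^I P_1^I P_0^I c_0$$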
Product of 2 n-bit numbers is at most a 2n-bit number. Unsigned multiplication can be viewed as
addition of shifted versions of the multiplicand.
Multiplication involves the generation of partial products, one for each digit in the multiplier. These
partial products are then summed to produce the final product.
The partial products are easily defined. When the multiplier bit is 0, the partial product is 0. When the
multiplier bit is 1, the partial product is the multiplicand.
The total product is produced by summing the partial products. For this operation, each successive
partial product is shifted one position to the left relative to the preceding partial product.
Array multiplier
Multiplicand is shifted by displacing it through an array of adders.
The multiplier and multiplicand are loaded into two registers (Q and M).
A third register, the A register, is also needed and is initially set to 0. There is
also a 1-bit C register, initialized to 0, which holds a potential carry bit resulting
from addition. The operation of the multiplier is as follows. Control logic reads
the bits of the multiplier one at a time. If Q0 is 1, then the multiplicand is added to
the A register and the result is stored in the A register, with the C bit used for
overflow. Then all of the bits of the C, A, and Q registers are shifted to the right
one bit, so that the C bit goes into An-1, A0 goes into Qn-1, and Q0 is lost. If Q0 is 0, then no addition
is performed, just the shift. This process is repeated for each bit of the original
multiplier. The resulting 2n-bit product is contained in the A and Q
registers. An example is shown below.
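The register-level procedure just described can be sketched as follows. The variables `a`, `q`, `m` and `c` mirror the A, Q, M and C registers; the function name and the default width of 4 bits are assumptions.

```python
def shift_add_multiply(multiplicand, multiplier, n=4):
    """Unsigned n-bit multiplication with the A, Q, M and C registers."""
    mask = (1 << n) - 1
    m, q = multiplicand & mask, multiplier & mask
    a, c = 0, 0
    for _ in range(n):
        if q & 1:                              # Q0 = 1: add M to A, carry-out goes to C
            total = a + m
            a, c = total & mask, total >> n
        # shift C, A, Q right one bit: C -> A(n-1), A0 -> Q(n-1), Q0 is lost
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                        # 2n-bit product held in A and Q
```

For the 4-bit operands 1011 (11) and 1101 (13), the function returns 143, held as A = 1000 and Q = 1111.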
Signed Multiplication
Considering 2's-complement signed operands, what will happen to (−13) × (+11)
if we follow the same method as for unsigned multiplication?
Fast Multiplication
Bit-Pair Recoding of Multipliers
Bit-pair recoding halves the maximum number of summands (versions of
the multiplicand).
Bit-pair recoding is derived from the Booth multiplier recoding scheme.
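A sketch of bit-pair recoding is shown below. It uses the standard rule that the pair of bits (i+1, i), together with the bit to its right, selects the multiple −2·b(i+1) + b(i) + b(i−1) of the multiplicand. The function name is hypothetical and n is assumed to be even.

```python
def bit_pair_recode(multiplier, n):
    """Recode an n-bit 2's-complement multiplier (n even) into n/2 digits.

    Each digit is in {-2, -1, 0, 1, 2}; digits are returned least
    significant first, and digit k has weight 4**k.
    """
    bits = [(multiplier >> i) & 1 for i in range(n)]
    digits, prev = [], 0                       # implied 0 to the right of bit 0
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits
```

For the 6-bit multiplier 111010 (−6), the recoded digits are −2, −1, 0 from the least significant pair upward, so only three summands are needed instead of six.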
The outputs S4 and C4 from the third CSA level are available 6 gate delays later, assuming two gate delays per
CSA level. The final two vectors can be added in a further 8 gate delays using a carry-lookahead adder. The
total gate delay is therefore 15 (6 + 8, plus one gate delay for the AND gates that form the summands). By comparison, the total gate delay for performing this multiplication using an
n × n array multiplier is 6(n−1)−1 = 29 (substituting n = 6). In general, approximately 1.7 log2 k − 1.7 levels of CSA steps are
needed to reduce k summands to 2 vectors which, when added, produce the desired sum.
Issues with the carry-save method
If negative summands are involved, it is necessary to accommodate sign extension.
A 2n-bit CLA is nominally needed to add the final S and C vectors, although fewer bits are actually needed.
n summands are used for n × n multiplication. If bit-pair recoding is used, this is reduced to
n/2. This reduces the number of CSA levels from 1.7 log2 n − 1.7 to 1.7 log2 n − 3.4.
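The number of CSA levels can also be counted directly, since each level replaces every group of three vectors by two (a sum vector and a carry vector). The sketch below does this counting; the function name is hypothetical, and the logarithmic formula in the text is an approximation of the count it produces.

```python
def csa_levels(k):
    """Count the CSA levels needed to reduce k summands to 2 vectors."""
    levels = 0
    while k > 2:
        k -= k // 3        # each group of 3 vectors is replaced by 2
        levels += 1
    return levels
```

Six summands need 3 levels, in agreement with the 6 × 6 example above, and halving the number of summands by bit-pair recoding removes roughly two levels for large n.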
Integer division
Manual Division
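In each step of longhand division, the divisor is positioned with respect to the dividend and a subtraction is performed. If the remainder is zero or positive, a quotient bit of 1 is determined. If the
remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the
divisor, and the divisor is repositioned for another subtraction.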
Restoring Division
Non-restoring Division
Non-restoring division avoids the need to restore A after an unsuccessful subtraction.
Step 1: Do the following n times:
If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and
Q left and add M to A.
Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
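The two steps above can be sketched as follows. As a simplification, A is kept as a signed Python integer, so "the sign of A is 1" becomes `a < 0`; the function name and the default width of 4 bits are assumptions.

```python
def non_restoring_divide(dividend, divisor, n=4):
    """Divide two unsigned n-bit integers; returns (quotient, remainder)."""
    a, q, m = 0, dividend, divisor             # A is kept as a signed Python int
    for _ in range(n):
        msb_q = (q >> (n - 1)) & 1             # bit shifted from Q into A
        q = (q << 1) & ((1 << n) - 1)
        if a >= 0:                             # sign of A is 0: shift, then subtract M
            a = 2 * a + msb_q - m
        else:                                  # sign of A is 1: shift, then add M
            a = 2 * a + msb_q + m
        if a >= 0:
            q |= 1                             # set q0 to 1, otherwise it stays 0
    if a < 0:                                  # Step 2: final correction of the remainder
        a += m
    return q, a
```

Dividing 1000 (8) by 0011 (3) with n = 4 yields quotient 2 and remainder 2.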
Example
IEEE notation assumes that all numbers are normalized so that the MSB of the mantissa is a 1, and it does
not store this bit.
So the first stored mantissa bit of a number in the IEEE notation can be either a 0 or a 1.
The values of the numbers represented in the IEEE single-precision notation are of the form
(+ or −) 1.M × 2^(E′ − 127), where M is the 23-bit stored mantissa and E′ is the 8-bit stored exponent.
In the single-precision case, underflow occurs when the normalized representation requires an exponent less than −126,
and overflow occurs when it requires an exponent greater than +127. This is because the
IEEE standard reserves the exponents −127 and +128 (−1023 and +1024 in double precision), that is, the stored values 0 and 255 (0 and 2047 in double precision), to
represent special conditions:
- Exact zero
- Infinity
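The field layout can be inspected with Python's standard `struct` module. The sketch below assumes the usual single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits) and uses hypothetical function names; stored exponents 0 and 255 with a non-zero mantissa are simply labelled as other special values.

```python
import struct

def decode_single(x):
    """Split a number into IEEE single-precision fields and classify it."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    e_stored = (word >> 23) & 0xFF             # stored exponent E' (excess-127)
    mantissa = word & 0x7FFFFF                 # 23 stored mantissa bits M
    if e_stored == 0 and mantissa == 0:
        kind = "zero"
    elif e_stored == 255 and mantissa == 0:
        kind = "infinity"
    elif 1 <= e_stored <= 254:
        kind = "normalized"
    else:
        kind = "other special value"
    return sign, e_stored, mantissa, kind

def normalized_value(sign, e_stored, mantissa):
    """Value of a normalized number: (+ or -) 1.M x 2**(E' - 127)."""
    return (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (e_stored - 127)
```

For example, `decode_single(-6.5)` gives a sign of 1 and a stored exponent of 129, and `normalized_value` applied to those fields reproduces -6.5.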