Computer Architecture Lecture Notes

Module 2
Execution of complete instruction
To execute an instruction, the processor has to perform the following steps:
Fetch the contents of the memory location pointed to by the PC. The contents of this location are
loaded into the IR (fetch phase).
IR ← [[PC]]
Assuming that the memory is byte addressable and each instruction occupies 4 bytes, increment the
contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
Carry out the actions specified by the instruction in the IR (execution phase).
In general, the execution of an instruction involves one or more of the following operations:
Transfer a word of data from one processor register to another or to the ALU.
Perform an arithmetic or a logic operation and store the result in a processor register.
Fetch the contents of a given memory location and load them into a processor register.
Store a word of data from a processor register into a given memory location.
Consider the single bus organization of the datapath shown in the figure below, and the instruction
ADD (R3),R1, which adds the contents of the memory location pointed to by R3 to register R1.
Execution of this instruction involves the following actions:
Fetch the instruction.
Fetch the first operand (the contents of the memory location pointed to by R3).
Perform the addition.
Load the result into R1.
Control sequence for execution of this instruction is shown below.
In step 1, the instruction fetch operation is initiated by loading the contents of the PC into the MAR
and sending a Read request to the memory. The Select signal is set to Select4, which causes the MUX to
select the constant 4. This value is added to the operand at input B, which is the contents of the PC,
and the result is stored in register Z. The updated value is moved from register Z back into the PC
during step 2, while waiting for the memory to respond. In step 3, the word fetched from the memory is
loaded into the IR.
Steps 1 to 3 constitute the instruction fetch phase. The instruction decoding circuit interprets the
contents of the IR at the beginning of step 4. This enables the control circuitry to activate the
control signals for steps 4-7, which constitute the execution phase. The contents of register R3 are
transferred to the MAR in step 4, and a memory read operation is initiated. Then the contents of R1 are
transferred to register Y in step 5, to prepare for the addition operation. When the read operation is
completed, the memory operand is available in register MDR, and the addition is performed in step 6.
The contents of MDR are gated to the bus, and thus also to the B input of the ALU, and register Y is
selected as the second input of the ALU by choosing SelectY. The sum is stored in register Z and then
transferred to R1 in step 7. The End signal causes a new instruction fetch cycle to begin by returning
to step 1.
WMC stands for Wait for Memory operation to Complete. Generally the addressed device on the
memory bus is slower than the processor, so the processor has to wait for the addressed device to
complete its operation. The control signal WMC makes the processor wait; the memory indicates that
the requested operation has been completed by asserting the MFC (Memory Function Completed) signal.
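The seven control steps described above can be traced with a short simulation. This is only a sketch: the control-signal names in the comments (PCout, MARin, Select4 and so on) follow common textbook notation and are assumptions, as are the function name, the dictionary layout and the sample addresses and values.

```python
# Illustrative model of the single-bus datapath. Register names follow the
# notes; the dict layout, function name and sample values are assumptions.

def run_add_indirect(regs, memory):
    """Execute ADD (R3),R1 as seven control steps and return the registers."""
    # Step 1: PCout, MARin, Read, Select4, Add, Zin
    regs["MAR"] = regs["PC"]
    regs["Z"] = regs["PC"] + 4
    # Step 2: Zout, PCin, Yin, WMC (the memory read completes during the wait)
    regs["PC"] = regs["Z"]
    regs["Y"] = regs["Z"]
    regs["MDR"] = memory[regs["MAR"]]
    # Step 3: MDRout, IRin
    regs["IR"] = regs["MDR"]
    # Step 4: R3out, MARin, Read
    regs["MAR"] = regs["R3"]
    # Step 5: R1out, Yin, WMC
    regs["Y"] = regs["R1"]
    regs["MDR"] = memory[regs["MAR"]]
    # Step 6: MDRout, SelectY, Add, Zin
    regs["Z"] = regs["Y"] + regs["MDR"]
    # Step 7: Zout, R1in, End
    regs["R1"] = regs["Z"]
    return regs


regs = {"PC": 100, "R1": 7, "R3": 200, "MAR": 0, "MDR": 0, "IR": 0, "Y": 0, "Z": 0}
memory = {100: "ADD (R3),R1", 200: 35}
run_add_indirect(regs, memory)
print(regs["R1"], regs["PC"])  # 42 104
```

Each step changes only the registers named by its "in" signals, which is why the PC update in step 2 can overlap with the memory read started in step 1.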
Execution of Branch Instructions
Unconditional branch
A branch instruction replaces the contents of the PC with the branch target address. This address is
obtained by adding an offset X, which is given in the instruction, to the updated value of the PC.
Processing of an unconditional branch instruction begins with the fetch phase, which ends when the
instruction is loaded into the IR in step 3. The offset value is extracted from the IR by the
instruction decoding circuit. Since the updated value of the PC is already available in register Y
(it is copied there during the fetch phase), the offset X is gated onto the bus in step 4 and an
addition operation is performed. The result, which is the branch target address, is loaded into the PC
in step 5. The offset X is the difference between the branch target address and the address
immediately following the branch instruction.
Conditional branch
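For a conditional branch, the status of the condition codes must be checked before a new value is
loaded into the PC. For example, for a Branch-on-negative instruction the N flag is tested in step 4:
if N = 0 the branch is not taken and the processor returns to step 1; otherwise step 5 is performed to
load the new value into the PC.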
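The effect of a conditional branch on the PC can be sketched as follows. The function name and the sample addresses are hypothetical; the sketch assumes 4-byte instructions and a Branch-on-negative condition, as in the description above.

```python
def branch_on_negative(pc, offset, n_flag):
    """Return the PC value after a Branch-on-negative fetched at address pc."""
    updated_pc = pc + 4            # fetch phase (steps 1-2); a copy is held in Y
    if n_flag == 0:                # step 4: condition false, so End
        return updated_pc
    return updated_pc + offset     # steps 4-5: Z <- Y + offset, then PC <- Z


print(branch_on_negative(100, 40, 1))  # 144
print(branch_on_negative(100, 40, 0))  # 104
```

The offset is added to the updated PC, not to the address of the branch instruction itself, which matches the definition of X given above.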
Single bus Organisation
The datapath often consists of the following functional blocks:
Instruction register (IR) - stores the current instruction to be executed.
Program counter (PC) - stores the address of the next instruction to be fetched.
Memory address register (MAR) - holds either the memory address from which data will be fetched to
the CPU or the address to which data will be sent and stored.
Memory data register (MDR) - holds the data to be stored in the computer storage (e.g. RAM), or the
data received after a fetch from the computer storage.
Registers R0 through R(n-1) are provided for general purpose use. Registers Y and Z are provided for
temporary storage during execution of some instructions.
This organisation is called single bus organization since the ALU and all registers are interconnected
via a single common bus. The data and address lines of the external memory bus are connected to the
internal processor bus via the memory data register MDR and the memory address register MAR.
Register MDR has two inputs and two outputs. Data may be loaded into MDR either from the memory bus
or from the internal processor bus, and the data stored in MDR may be placed on either bus. The input
of MAR is connected to the internal bus and its output to the external bus. The control lines of the
memory bus are connected to the instruction decoder and control logic block. This unit is responsible
for issuing the signals that control the operation of all units inside the processor and for
interfacing with the memory bus.
The multiplexer selects either the output of register Y or the constant value 4 to be provided as
input A of the ALU. The constant 4 is used to increment the contents of the PC. As instruction
execution proceeds, data are transferred from one register to another, often passing through the ALU
to perform an arithmetic or logic operation. The instruction decoder and control logic unit is
responsible for implementing the actions specified by the instruction loaded in the IR. The decoder
generates the control signals needed to select the registers involved and direct the transfer of data.
The registers, the ALU and the interconnecting bus are collectively referred to as the datapath.
The steps needed to execute an instruction on this datapath, and the seven-step control sequence for
ADD (R3),R1, are those described under Execution of complete instruction above.
Multi bus Organization
In a multibus organization several data transfers can take place in parallel. In the three-bus
structure, three buses connect the registers and the ALU of the processor. All general purpose
registers are combined into a single block called the register file. The register file has three
ports. There are two output ports, allowing the contents of two registers to be accessed
simultaneously and placed on buses A and B. The third port allows the data on bus C to be loaded into
a third register during the same clock cycle.
Buses A and B are used to transfer the source operands to the A and B inputs of the ALU, where an
arithmetic or logic operation may be performed. The result is transferred to the destination over
bus C. The three-bus organization eliminates the need for the temporary registers Y and Z of the
single bus organization. Another feature of this organization is the introduction of the incrementer
unit, which is used to increment the PC by 4. Using the incrementer eliminates the need to add 4 to
the PC using the ALU. The source for the constant 4 at the ALU input is still useful: it can be used
to increment other addresses, such as the memory addresses in LoadMultiple and StoreMultiple
instructions.
Control sequence for three bus organisation for the instruction ADD R4,R5,R6
In step 1, the contents of the PC are passed through the ALU using the R=B control signal and loaded
into the MAR to start a memory read operation. At the same time the PC is incremented by 4; the
incremented value is loaded into the PC at the end of the clock cycle. In step 2, the processor waits
for MFC and loads the data received into MDR, then transfers them to the IR in step 3. Finally, the
execution phase of the instruction requires only one control step, step 4.
Control Unit
The control unit issues control signals external to the processor to cause data exchange with memory
and I/O modules. The control unit also issues control signals internal to the processor to move data
between registers, to cause the ALU to perform a specified function, and to regulate other internal
operations. The control unit performs two basic tasks:
Sequencing: The control unit causes the processor to step through a series of micro-operations in the
proper sequence, based on the program being executed.
Execution: The control unit causes each micro-operation to be performed.
Figure is a general model of the control unit, showing all of its inputs and outputs. The inputs are
as follows:
Clock: This is how the control unit keeps time. The control unit causes one micro-operation (or a set
of simultaneous micro-operations) to be performed for each clock pulse. This is sometimes referred to
as the processor cycle time, or the clock cycle time.
Instruction register: The opcode and addressing mode of the current instruction are used to determine
which micro-operations to perform during the execute cycle.
Flags: These are needed by the control unit to determine the status of the processor and the outcome
of previous ALU operations. For example, for the increment-and-skip-if-zero (ISZ) instruction, the
control unit will increment the PC if the zero flag is set.
Control signals from control bus: The control bus portion of the system bus provides signals to the
control unit.
The outputs are as follows:
Control signals within the processor: These are of two types: those that cause data to be moved from
one register to another, and those that activate specific ALU functions.
Control signals to control bus: These are also of two types: control signals to memory, and control
signals to the I/O modules.
Techniques for control unit implementation
Hardwired implementation
Microprogrammed implementation
Hardwired control
In a hardwired implementation, the control unit is essentially a state machine circuit. Its input
logic signals are transformed into a set of output logic signals, which are the control signals. The
signal-generating part of the control unit is a combinational circuit that produces the required
control outputs depending on the state of all its inputs. Basic block diagram of hardwired control
unit is shown below.
The control unit makes use of the opcode and will perform different actions (issue a different
combination of control signals) for different instructions. To simplify the control unit logic, there
should be a unique logic input for each opcode. This function can be performed by a decoder, which
takes an encoded input and produces a single output. In general, a decoder will have n binary inputs
and 2^n binary outputs. Each of the 2^n different input patterns will activate a single unique output.
The clock portion of the control unit issues a repetitive sequence of pulses. This is useful for
measuring the duration of micro-operations. Essentially, the period of the clock pulses must be long
enough to allow the propagation of signals along data paths and through processor circuitry. A counter
is used to keep track of the control steps; each count of this counter corresponds to one control
step. The required control signals are determined by the following:
Contents of the control step counter
Contents of the instruction register
Contents of the condition code flags
External input signals, such as MFC and interrupt requests
By separating the decoding and encoding functions, a more detailed block diagram is obtained, as shown
below.
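RUN control signal: when set to 1, RUN causes the counter to be incremented by one at the end of every
clock cycle. When RUN is equal to 0, the counter stops counting. This is needed whenever the WMC
signal is issued, so that the processor waits for the reply from the memory.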
Generation of Zin control signal
Generation of END control signal
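The encoder logic for these two signals can be written out from the control sequences described earlier: Zin is asserted in step 1 of every instruction, in step 6 of ADD and in step 4 of an unconditional branch, while End is asserted in step 7 of ADD, in step 5 of a branch, and in step 4 or 5 of a Branch-on-negative depending on N. The sketch below covers only those instructions; the function and parameter names are assumptions, and a full design would add one product term for every other instruction that uses the signal.

```python
def z_in(t, add=False, br=False):
    """Zin = T1 + T6.ADD + T4.BR (terms for other instructions omitted)."""
    return bool(t == 1 or (t == 6 and add) or (t == 4 and br))


def end(t, n=0, add=False, br=False, brn=False):
    """End = T7.ADD + T5.BR + (T5.N + T4.N').BRN (other terms omitted)."""
    brn_term = ((t == 5 and n == 1) or (t == 4 and n == 0)) and brn
    return bool((t == 7 and add) or (t == 5 and br) or brn_term)


print(z_in(6, add=True))          # True: step 6 of ADD loads the sum into Z
print(end(4, n=0, brn=True))      # True: branch not taken, instruction ends early
```

Here `t` stands for the active output of the step decoder (T1, T2, ...), and the keyword flags stand for the outputs of the instruction decoder.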
Advantage
Hardwired system can operate at high speed
Disadvantage
Little flexibility
Application
Used in RISC processor
Micro programmed Control
In a micro programmed control unit, the logic of the control unit is specified by a microprogram. A
microprogram consists of a sequence of instructions in a microprogramming language, similar to
machine language. These are very simple instructions that specify micro-operations. A micro
programmed control unit is a relatively simple logic circuit that is capable of (1) sequencing through
microinstructions and (2) generating control signals to execute each microinstruction.
Basic organization of micro programmed control unit
The micro routines for all instructions in the instruction set of a computer are stored in a special
memory called the control store or control memory. The control unit can generate the control signals
for any instruction by sequentially reading the control words of the corresponding micro routine from
the control store. A control word (CW) is a word whose individual bits represent the various control
signals. A sequence of control words corresponding to the control sequence of a machine instruction
constitutes the micro routine for that instruction, and the individual control words in a micro
routine are referred to as microinstructions. To read the control words sequentially from the control
store, a microprogram counter (µPC) is used. Every time a new instruction is loaded into the IR, the
output of the starting address generator is loaded into the µPC. The µPC is then automatically
incremented by the clock, causing successive microinstructions to be read from the control store.
Hence the control signals are delivered to the various parts of the processor in the correct sequence.
To support microprogram branching, the organisation of the control unit is modified as follows.
The starting and branch address generator block loads a new address into the µPC when a
microinstruction instructs it to do so. The µPC is incremented every time a new microinstruction is
fetched from the control store, except in the following situations:
When a new instruction is loaded into the IR, the µPC is loaded with the starting address of the micro
routine for that instruction.
When a Branch microinstruction is encountered and the branch condition is satisfied, the µPC is loaded
with the branch target address.
When an End microinstruction is encountered, the µPC is loaded with the address of the first CW in the
micro routine for the instruction fetch cycle.
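The sequencing rules above can be illustrated with a toy control store. Everything concrete in this sketch is an assumption made for illustration: the control-store addresses (0-2 for the fetch routine, 25-28 for the ADD routine), the signal names, and the "next"/"dispatch"/"end" tags used in place of real sequencing bits.

```python
# Toy control store: address -> (control signals, sequencing action).
# Addresses, signal names and action tags are illustrative assumptions.
CONTROL_STORE = {
    0: ({"PCout", "MARin", "Read", "Select4", "Add", "Zin"}, "next"),
    1: ({"Zout", "PCin", "Yin", "WMC"}, "next"),
    2: ({"MDRout", "IRin"}, "dispatch"),   # new instruction now in the IR
    25: ({"R3out", "MARin", "Read"}, "next"),
    26: ({"R1out", "Yin", "WMC"}, "next"),
    27: ({"MDRout", "SelectY", "Add", "Zin"}, "next"),
    28: ({"Zout", "R1in", "End"}, "end"),
}
START_ADDRESS = {"ADD (R3),R1": 25}        # starting address generator


def run_instruction(instruction):
    """Return the sequence of control-store addresses visited by the uPC."""
    upc = 0                                # first CW of the fetch routine
    visited = []
    while True:
        _signals, action = CONTROL_STORE[upc]
        visited.append(upc)
        if action == "dispatch":
            upc = START_ADDRESS[instruction]
        elif action == "end":
            return visited                 # the uPC would now return to 0
        else:
            upc += 1                       # normal case: increment the uPC


print(run_instruction("ADD (R3),R1"))  # [0, 1, 2, 25, 26, 27, 28]
```

The trace shows the two exceptions to plain incrementing that apply to this instruction: the load of the starting address after the IR is filled, and the return to the fetch routine at End.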
Advantages of micro programmed control unit
Simplifies design of control unit
Cheaper and less error prone to implement
Disadvantage
Slower than hardwired unit
Application
Used in CISC processor
Micro program sequencing
If all microprograms required only straightforward sequential execution of microinstructions except
for branches, letting a µPC govern the sequencing would be efficient. However, this approach has two
disadvantages:
Having a separate micro routine for each machine instruction results in a large total number of
microinstructions and a large control store.
Execution time is longer because it takes more time to carry out the required branches.
A powerful alternative approach is to include an address field as a part of every microinstruction to
indicate the location of the next microinstruction to be fetched. Separate branch microinstructions
are virtually eliminated. A microinstruction format with a next-address field is shown below.
Arithmetic and logic design

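An n-bit sequence of binary digits $a_{n-1}, a_{n-2}, \ldots, a_1, a_0$ is interpreted as an unsigned
integer A as
$A = \sum_{i=0}^{n-1} 2^i a_i$
The simplest form of representation that employs a sign bit is the sign-magnitude representation. In
an n-bit word, the leftmost bit is the sign bit and the rightmost n-1 bits hold the magnitude of the
integer. The general representation of a signed integer is
$A = \sum_{i=0}^{n-2} 2^i a_i$ if $a_{n-1} = 0$, and $A = -\sum_{i=0}^{n-2} 2^i a_i$ if $a_{n-1} = 1$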
Addition/subtraction of signed numbers

Addition
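At the ith stage: Inputs: $x_i$ and $y_i$ are the operand bits and $c_i$ is the carry-in. Outputs:
$s_i$ is the sum and $c_{i+1}$ is the carry-out to the (i+1)st stage.
Addition logic for a single stage
$s_i = x_i \oplus y_i \oplus c_i$
$c_{i+1} = x_i y_i + x_i c_i + y_i c_i$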
n-bit adder
Cascade n full adder (FA) blocks to form an n-bit adder.
The carries propagate, or ripple, through this cascade, so it is called an n-bit ripple carry adder.
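A bit-level sketch of the ripple carry adder is given below. It assumes the usual full-adder equations (sum = x XOR y XOR c, carry-out = xy + xc + yc); the function names and the least-significant-bit-first list layout are choices made for this example.

```python
def full_adder(x, y, c):
    """One adder stage: returns (sum bit, carry-out)."""
    s = x ^ y ^ c
    c_out = (x & y) | (x & c) | (y & c)
    return s, c_out


def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists (least significant bit first)."""
    carry = c0
    sums = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)   # carry ripples to the next stage
        sums.append(s)
    return sums, carry


# 0111 (7) + 0110 (6), written LSB first
print(ripple_carry_add([1, 1, 1, 0], [0, 1, 1, 0]))  # ([1, 0, 1, 1], 0), i.e. 1101 = 13
```

The loop makes the dependence chain explicit: stage i cannot produce its outputs until stage i-1 has delivered its carry.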
kn-bit adder
kn-bit numbers can be added by cascading k n-bit adders.
n-bit subtractor
X - Y is equivalent to adding the 2's complement of Y to X.
The 2's complement is equivalent to the 1's complement + 1.
$X - Y = X + \overline{Y} + 1$
The 2's complement of positive and negative numbers is computed in the same way.
n-bit adder/subtractor
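The two inputs x and y represent the arguments to be added or subtracted. The control input ADD/SUB
determines whether an add or a subtract operation is performed: if the control input is 0, an add
operation is performed, while if the control input is 1, a subtract operation is performed. In the
subtract case each bit of y is complemented before it enters the adder (by XORing it with the control
input), and the control input also supplies the carry-in $c_0 = 1$.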
Detecting overflows
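Overflow can occur only when the signs of the two operands are the same. Overflow has occurred if the
sign of the result is different from the sign of the operands.
$x_{n-1}$, $y_{n-1}$ and $s_{n-1}$ represent the signs of operand x, operand y and the result s
respectively.
A circuit to detect overflow can be implemented by the following logic expressions:
$\text{Overflow} = x_{n-1} y_{n-1} \overline{s}_{n-1} + \overline{x}_{n-1} \overline{y}_{n-1} s_{n-1}$
$\text{Overflow} = c_n \oplus c_{n-1}$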
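The adder/subtractor and the sign-based overflow test can be sketched together. This example assumes the common construction in which each y bit is XORed with the ADD/SUB control and the control also drives the carry-in; the function name, the default width of 4 bits and the use of Python integers as bit patterns are assumptions. In the subtract case the sign tested for y is the sign of the value actually fed to the adder, that is, of the complemented y.

```python
def add_sub(x, y, sub, n=4):
    """n-bit adder/subtractor on bit patterns; sub=0 adds, sub=1 subtracts.

    Returns (n-bit result pattern, overflow flag).
    """
    mask = (1 << n) - 1
    y_in = (y ^ (mask if sub else 0)) & mask     # XOR each y bit with the control
    s = ((x & mask) + y_in + sub) & mask         # control also supplies c0
    xs = (x >> (n - 1)) & 1                      # sign of x
    ys = (y_in >> (n - 1)) & 1                   # sign of the adder's second input
    ss = (s >> (n - 1)) & 1                      # sign of the result
    overflow = (xs & ys & (ss ^ 1)) | ((xs ^ 1) & (ys ^ 1) & ss)
    return s, bool(overflow)


print(add_sub(0b0111, 0b0100, 0))  # (11, True): 7 + 4 does not fit in 4 bits
print(add_sub(0b0111, 0b0100, 1))  # (3, False): 7 - 4 = 3
```

With 4-bit operands the representable range is -8 to +7, so 7 + 4 overflows while 7 - 4 does not.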
Computing the add time
Consider the 0th stage:
$s_0$ is available after 1 gate delay.
$c_1$ is available after 2 gate delays.
Computing the add time of n bit ripple carry adder
Consider a 4-bit ripple carry adder:
$s_0$ is available after 1 gate delay, $c_1$ after 2 gate delays.
$s_1$ is available after 3 gate delays, $c_2$ after 4 gate delays.
$s_2$ is available after 5 gate delays, $c_3$ after 6 gate delays.
$s_3$ is available after 7 gate delays, $c_4$ after 8 gate delays.
For an n-bit adder, $s_{n-1}$ is available after 2n-1 gate delays and $c_n$ is available after 2n
gate delays.
Fast adders
One of the main drawbacks of the ripple carry adder is the long delay between the time the inputs are
presented to the circuit and the time the final output is obtained. This is because each stage depends
on the carry output produced by the previous stage, and this chain of dependence makes the adder slow.
In order to speed up the addition process, the chain of dependence among the adder stages must be
broken. One such fast adder circuit is the carry-lookahead (CLA) adder.
Carry-look-ahead (CLA) adder
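$c_{i+1}$ can be written as
$c_{i+1} = x_i y_i + x_i c_i + y_i c_i$
We can factor this as
$c_{i+1} = x_i y_i + (x_i + y_i) c_i = G_i + P_i c_i$
where
$G_i = x_i y_i$ and $P_i = x_i + y_i$
$G_i$ is called the generate function and $P_i$ is called the propagate function.
$G_i$ and $P_i$ are computed only from $x_i$ and $y_i$ and not from $c_i$; thus they can be computed
in one gate delay after X and Y are applied to the inputs of an n-bit adder.
A simpler circuit can be realized by using
$P_i = x_i \oplus y_i$
which differs from $P_i = x_i + y_i$ only when $x_i = y_i = 1$. In that case $G_i = 1$, so the value
of $P_i$ does not affect $c_{i+1}$.
Thus, using a cascade of two 2-input XOR gates to realize the sum, the basic cell shown below can be
used in each bit stage.
Expanding $c_i$ in terms of i-1 subscripted variables and substituting it into the expression for
$c_{i+1}$ gives
$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} c_{i-1}$
Continuing in this way, the final expression for any carry variable is
$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_1 G_0 + P_i P_{i-1} \cdots P_0 c_0$
All carries can be obtained 3 gate delays after X, Y and $c_0$ are applied:
- One gate delay for $P_i$ and $G_i$
- Two gate delays in the AND-OR circuit for $c_{i+1}$
All sums can be obtained 1 gate delay (one XOR gate) after the carries are computed. Independent of n,
n-bit addition requires only 4 gate delays.
This is called the carry-lookahead adder.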
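The lookahead idea can be checked with a small sketch that computes every carry directly from the generate and propagate bits and c0, using the expanded sum-of-products form instead of waiting for the previous carry. It assumes the usual definitions G_i = x_i y_i and P_i = x_i + y_i; the function names and the LSB-first list layout are choices made for this example, and software evaluation of course says nothing about gate delay.

```python
def to_bits(value, n):
    """Integer -> list of n bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def cla_add(x_bits, y_bits, c0=0):
    """Carry-lookahead addition of two LSB-first bit lists."""
    g = [x & y for x, y in zip(x_bits, y_bits)]   # generate functions G_i
    p = [x | y for x, y in zip(x_bits, y_bits)]   # propagate functions P_i
    carries = [c0]
    for i in range(len(g)):
        # c_{i+1} = G_i + P_i G_{i-1} + ... + P_i..P_1 G_0 + P_i..P_0 c_0
        c = g[i]
        prod = p[i]
        for j in range(i - 1, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & c0
        carries.append(c)
    sums = [x ^ y ^ c for x, y, c in zip(x_bits, y_bits, carries)]
    return sums, carries[-1]


print(cla_add(to_bits(7, 4), to_bits(6, 4)))  # ([1, 0, 1, 1], 0), i.e. 1101 = 13
```

Each carry depends only on the G and P bits and c0, so in hardware all carries can be produced by two-level AND-OR logic in parallel.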
In a 4-bit carry-lookahead adder, $c_4$ is available after 3 gate delays and $s_3$ after 4 gate
delays, whereas in a 4-bit ripple carry adder $c_4$ is available after 8 gate delays and $s_3$ after
7 gate delays. Performing n-bit addition in 4 gate delays independent of n is good only theoretically,
because of fan-in constraints.
The last AND gate and the OR gate require a fan-in of (n+1) for an n-bit adder.
For a 4-bit adder (n=4), a fan-in of 5 is required.
This is the practical limit for most gates.
In order to add operands longer than 4 bits, we can cascade 4-bit carry-lookahead adders. A cascade of
carry-lookahead adders is called a blocked carry-lookahead adder.
Figure shows a 16 bit adder built from 4 bit adders
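Blocked Carry-Look ahead adder
In the first block
$c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0$
which can be written as $c_4 = G_0^I + P_0^I c_0$, with
$G_0^I = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$
and
$P_0^I = P_3 P_2 P_1 P_0$
$c_{16}$ is obtained as
$c_{16} = G_3^I + P_3^I G_2^I + P_3^I P_2^I G_1^I + P_3^I P_2^I P_1^I G_0^I + P_3^I P_2^I P_1^I P_0^I c_0$
After $x_i$, $y_i$ and $c_0$ are applied as inputs:
$G_i$ and $P_i$ for each stage are available after 1 gate delay.
$P^I$ is available after 2 and $G^I$ after 3 gate delays. All block carries are available after 5 gate
delays.
$c_{16}$ is available after 5 gate delays.
$s_{15}$, which depends on $c_{12}$, is available after 8 (5+3) gate delays, since within a 4-bit
carry-lookahead block the last sum bit is available 3 gate delays after the block's carry-in arrives
(the block's $G_i$ and $P_i$ are already available by then).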
Multiplication
Multiplication of unsigned numbers
Consider n x n multiplication.
The product of two n-bit numbers is at most a 2n-bit number. Unsigned multiplication can be viewed as
addition of shifted versions of the multiplicand.
Multiplication involves the generation of partial products, one for each digit in the multiplier.
These partial products are then summed to produce the final product.
The partial products are easily defined. When the multiplier bit is 0, the partial product is 0. When
the multiplier bit is 1, the partial product is the multiplicand.
The total product is produced by summing the partial products. For this operation, each successive
partial product is shifted one position to the left relative to the preceding partial product.
Array multiplier
The multiplicand is shifted by displacing it through an array of adders.
Where each multiplier cell is given as
Array multipliers are:
Extremely inefficient.
Have a high gate count for multiplying numbers of practical size such as 32-bit or 64-bit numbers.
Perform only one function, namely, unsigned integer product.
Assuming that there are 2 gate delays from input to output of a full adder block, the worst case
signal propagation delay path (from the right end of the first row to the highest product bit output
at the left end of the bottom row, comprising all cells in the bottom row and two cells at the right
end of each of the other rows) has a total of 6(n-1)-1 gate delays, including the initial AND gate
delay in each cell. Since the incoming partial product of the first row is 0, only AND gates are
required in that row, which is taken into account in the delay expression.
Sequential multiplication
In this case, multiplication is performed as a series of n conditional addition and shift operations:
if the current bit of the multiplier is 0, only a shift operation is performed, while if the current
bit of the multiplier is 1, the multiplicand is added to the partial product and then a shift
operation is performed.
Register configuration in sequential multiplier
Flow chart for unsigned binary multiplication
The multiplier and multiplicand are loaded into two registers (Q and M).
A third register, the A register, is also needed and is initially set to 0. There is
also a 1-bit C register, initialized to 0, which holds a potential carry bit resulting
from addition. The operation of the multiplier is as follows. Control logic reads
the bits of the multiplier one at a time. If $Q_0$ is 1, then the multiplicand is added to
the A register and the result is stored in the A register, with the C bit used for
overflow. Then all of the bits of the C, A, and Q registers are shifted to the right
one bit, so that the C bit goes into $A_{n-1}$, $A_0$ goes into $Q_{n-1}$, and $Q_0$ is lost.
If $Q_0$ is 0, then no addition is performed, just the shift. This process is repeated for
each bit of the original multiplier. The resulting 2n-bit product is contained in the A and Q
registers. Example is shown below
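The register-level procedure can be sketched as below. Register names C, A, Q and M follow the text; the function name, the default width of 4 bits and the sample operands 13 and 11 are assumptions for illustration.

```python
def multiply_unsigned(multiplicand, multiplier, n=4):
    """Shift-and-add multiplication of two n-bit unsigned numbers."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    q = multiplier & mask
    a = 0
    c = 0
    for _ in range(n):
        if q & 1:                                  # Q0 = 1: A <- A + M, carry into C
            total = a + m
            c = total >> n
            a = total & mask
        # Shift C, A, Q right one bit: C -> A(n-1), A0 -> Q(n-1), Q0 is lost
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                            # 2n-bit product in A, Q


print(multiply_unsigned(13, 11))  # 143
```

After n iterations every multiplier bit has been shifted out of Q and replaced by the low half of the product.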
Signed Multiplication
Considering 2's-complement signed operands, what happens to (-13) x (+11) if we follow the same
method as for unsigned multiplication? For the result to be correct, the sign of the negative
multiplicand must be extended to the left in every summand, as far as the product will extend.
For a negative multiplier, a straightforward solution is to form the 2's complement
of both the multiplier and the multiplicand and proceed as in
the case of a positive multiplier.
This is possible because complementation of both operands does not
change the value or the sign of the product.
A technique that works equally well for both negative and positive
multipliers is the Booth algorithm.
Booth algorithm
The multiplier and multiplicand are placed in the Q and M registers,
respectively. There is also a 1-bit register, $Q_{-1}$, placed logically to the right of the
least significant bit ($Q_0$) of the Q register. The results of the multiplication will appear
in the A and Q registers. A and $Q_{-1}$ are initialized to 0. Control logic scans the
bits of the multiplier one at a time. Now, as each bit is examined, the bit to its
right is also examined. If the two bits are the same (11 or 00), then all of the
bits of the A, Q, and $Q_{-1}$ registers are shifted to the right 1 bit. If the two bits
differ, then the multiplicand is added to the A register if the two bits ($Q_0 Q_{-1}$) are 01,
or subtracted from it if they are 10. Following the addition or
subtraction, the right shift occurs. In either case, the right shift is such that the
leftmost bit of A, namely $A_{n-1}$, not only is shifted into $A_{n-2}$ but also remains in
$A_{n-1}$. This is required to preserve the sign of the number in A and Q. It is known
as an arithmetic shift, because it preserves the sign bit.
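A sketch of the Booth procedure with the A, Q and Q-1 registers follows. The function name and the default width are assumptions. The sketch also assumes the multiplicand is not the most negative n-bit value (-8 for n = 4), because its negation does not fit in the n-bit A register.

```python
def booth_multiply(multiplicand, multiplier, n=4):
    """Booth multiplication of two n-bit 2's-complement numbers.

    Assumes the multiplicand is not -2**(n-1), whose negation overflows A.
    """
    mask = (1 << n) - 1
    m = multiplicand & mask
    a = 0
    q = multiplier & mask
    q_1 = 0
    for _ in range(n):
        pair = (q & 1, q_1)                  # (Q0, Q-1)
        if pair == (1, 0):
            a = (a - m) & mask               # A <- A - M
        elif pair == (0, 1):
            a = (a + m) & mask               # A <- A + M
        # Arithmetic shift right of A, Q, Q-1 (the sign bit of A is kept)
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))
    product = (a << n) | q
    if product & (1 << (2 * n - 1)):         # interpret A,Q as a signed 2n-bit value
        product -= 1 << (2 * n)
    return product


print(booth_multiply(7, 3))    # 21
print(booth_multiply(-3, -2))  # 6
```

The same loop handles positive and negative multipliers without any pre-complementing of the operands.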
Booth multiplier recoding table
Multiplier bit i    Bit i-1    Version of multiplicand selected by bit i
0                   0          0 x M
0                   1          +1 x M
1                   0          -1 x M
1                   1          0 x M
In general, in the Booth scheme, -1 times the shifted multiplicand is
selected when moving from 0 to 1, and +1 times the shifted multiplicand is
selected when moving from 1 to 0, as the multiplier is scanned from right to
left.
Booth multiplication with a positive multiplier
Consider a multiplication in which the multiplier is the positive number 0011110.
Multiplier                 0  0  1  1  1  1  0
Booth recoded multiplier   0 +1  0  0  0 -1  0
Booth multiplication with negative number
Booth recoded multiplier
Best case: a long string of 1s (the 1s are skipped over)
Worst case: 0s and 1s alternating
Fast Multiplication
Bit-Pair Recoding of Multipliers
Bit-pair recoding halves the maximum number of summands (versions of
the multiplicand).
Bit-pair recoding is derived from the Booth recoding scheme.
Multiplier bit pair recoding
Example of bit-pair recoding of multiplier 11010
Example multiplication using bit-pair recoding
Using Booth multiplication
Hence, using bit-pair recoding, the number of summands is reduced to n/2, where n is the
number of bits in the multiplier.
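Bit-pair recoding can be sketched by taking the multiplier bits two at a time together with the bit to the right of the pair: each pair (b_{i+1}, b_i) with right neighbour b_{i-1} yields the digit -2*b_{i+1} + b_i + b_{i-1}, which is the combination of the two corresponding Booth digits. The function name is an assumption, and a multiplier with an odd number of bits is sign-extended by one bit first.

```python
def bit_pair_recode(multiplier, n):
    """Bit-pair recode an n-bit 2's-complement multiplier.

    Returns digits in {-2, -1, 0, +1, +2}, most significant first; each digit
    selects a multiple of the multiplicand weighted by a power of 4.
    """
    bits = [(multiplier >> i) & 1 for i in range(n)]   # LSB first
    if n % 2:
        bits.append(bits[-1])                          # sign-extend to an even length
    digits = []
    right = 0                                          # implied 0 right of the LSB
    for i in range(0, len(bits), 2):
        digits.append(-2 * bits[i + 1] + bits[i] + right)
        right = bits[i + 1]
    return digits[::-1]


print(bit_pair_recode(0b11010, 5))  # [0, -1, -2]
```

For the multiplier 11010 (-6) the three digits give 0*16 + (-1)*4 + (-2)*1 = -6, so three summands replace the five that plain Booth recoding could need.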
Carry-Save Addition of Summands
CSA speeds up the addition process.
Consider the addition of many summands. We can:
Group the summands in threes and perform carry-save addition on each of these groups in parallel, to
generate a set of S and C vectors in one full-adder delay.
Group all of the S and C vectors into threes, and perform carry-save addition on them, generating a
further set of S and C vectors in one more full-adder delay.
Continue with this process until there are only two vectors remaining. They can be added in an RCA or
CLA to produce the desired product.
A multiplication example used to illustrate carry-save addition
Multiplication using normal scheme
Multiplication using CSA of summands
Schematic representation of carry-save addition operations
Assuming two gate delays per CSA level, the outputs S4 and C4 of the third CSA level are available
6 gate delays after the summands are applied; one AND gate delay is needed beforehand to generate the
summands. The final two vectors can be added in a further 8 gate delays using a carry-lookahead adder.
The total gate delay is therefore 1 + 6 + 8 = 15. By comparison, the total gate delay in performing
this multiplication using an n x n array multiplier is 6(n-1)-1 = 29 (substituting n = 6). In general,
approximately 1.7 log2 k - 1.7 levels of CSA steps are needed to reduce k summands to 2 vectors,
which, when added, produce the desired sum.
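The reduction of summands by carry-save addition can be sketched as follows. The function names are assumptions, and the operands 45 and 63 are illustrative 6-bit values chosen so that all six summands are nonzero. Python integers stand in for the bit vectors, and the final ordinary addition stands in for the RCA or CLA.

```python
def carry_save_add(x, y, z):
    """Reduce three vectors to a sum vector S and a carry vector C (x+y+z = S+C)."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1   # each carry moves one position left
    return s, c


def csa_tree_sum(summands):
    """Add summands with levels of carry-save adders; returns (total, CSA levels)."""
    vectors = list(summands)
    levels = 0
    while len(vectors) > 2:
        full = len(vectors) - len(vectors) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*vectors[i:i + 3]))
        reduced.extend(vectors[full:])       # leftover vectors wait for the next level
        vectors = reduced
        levels += 1
    return sum(vectors), levels              # final carry-propagate addition


summands = [45 << i for i in range(6)]       # 45 x 63: every multiplier bit is 1
print(csa_tree_sum(summands))  # (2835, 3)
```

Six summands need three CSA levels (6 to 4 to 3 to 2 vectors), which agrees with the three levels assumed in the delay estimate above.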
Issues with carry save method
If negative summands are involved, it is necessary to accommodate sign extension.
A 2n-bit CLA is nominally needed to add the final S and C vectors, although fewer bits are actually
needed.
n summands are used for n x n multiplication. If bit-pair recoding is used, this is reduced to n/2
summands. This reduces the number of CSA levels from 1.7 log2 n - 1.7 to 1.7 log2 n - 3.4.
Integer division
Manual Division
Steps for Manual Division

Position the divisor appropriately with respect to the dividend and perform a subtraction.
If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by
another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding
back the divisor, and the divisor is repositioned for another subtraction.
Restoring Division
Circuit arrangement for restoring division

The divisor is placed in the M register and the dividend in the Q register; the A register is
initially 0. At each step, the A and Q registers together are shifted to the left 1 bit. M is
subtracted from A to determine whether M fits into the partial remainder. If it does, then $Q_0$ gets
a 1 bit. Otherwise, $Q_0$ gets a 0 bit and M must be added back to A to restore the previous value.
The count is then decremented, and the process continues for n steps. At the end, the quotient is in
the Q register and the remainder is in the A register.
Steps
Repeat these steps n times
Shift A and Q left one binary position
Subtract M from A, and place the answer back in A
If the sign of A is 1, set q0 to 0 and add M back to A (restore A);
otherwise, set q0 to 1
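The restoring steps can be sketched as below. Register names A, Q and M follow the text; the function name and the default width are assumptions, and a signed Python integer stands in for the (n+1)-bit A register, so "the sign of A is 1" becomes a test for a negative value.

```python
def restoring_divide(dividend, divisor, n=4):
    """Restoring division of n-bit unsigned numbers; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        # Shift A and Q left one binary position
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m                       # subtract M from A
        if a < 0:
            a += m                   # unsuccessful: restore A, q0 stays 0
        else:
            q |= 1                   # successful: set q0 to 1
    return q, a


print(restoring_divide(8, 3))  # (2, 2)
```

Each unsuccessful subtraction costs an extra addition, which is the overhead the non-restoring method removes.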
Flow chart summarizing the restoring method
Example for restoring method
Non-restoring Division
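Avoid the need for restoring A after an unsuccessful subtraction. In the restoring method, a negative
A is restored (A + M) and the next step shifts and subtracts, giving 2(A + M) - M = 2A + M. The same
result is obtained by leaving A negative and adding M after the next shift.
Step 1: Do the following n times
If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A
and Q left and add M to A.
Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.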
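The two steps can be sketched as follows. As in any such sketch the names are assumptions; a signed Python integer again stands in for the A register, so a sign bit of 1 corresponds to a negative value.

```python
def nonrestoring_divide(dividend, divisor, n=4):
    """Non-restoring division of n-bit unsigned numbers; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):                       # Step 1
        msb = (q >> (n - 1)) & 1
        q = (q << 1) & mask
        if a >= 0:
            a = 2 * a + msb - m              # shift left, then subtract M
        else:
            a = 2 * a + msb + m              # shift left, then add M
        if a >= 0:
            q |= 1                           # set q0 to 1
    if a < 0:                                # Step 2: final correction
        a += m
    return q, a


print(nonrestoring_divide(8, 3))  # (2, 2)
```

Only one addition or subtraction is performed per iteration, plus at most one correcting addition at the end.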
Example
Floating point numbers

Fixed point numbers: the binary point is fixed.
Representation of an n-bit binary fraction:
$B = b_0 . b_{-1} b_{-2} b_{-3} \ldots b_{-(n-1)}$
In the 2's complement system the signed value F is given by
$F(B) = -b_0 \times 2^0 + b_{-1} \times 2^{-1} + b_{-2} \times 2^{-2} + \cdots + b_{-(n-1)} \times 2^{-(n-1)}$
where the range of F is
$-1 \le F \le 1 - 2^{-(n-1)}$
Fixed point representation suffers from the drawback that it can represent only a finite, and quite
small, range of numbers.
Floating point numbers: the binary point is said to float, that is, its position is variable.
Floating point representation is one in which a number is represented by its sign, a string of
significant digits commonly called the mantissa, and an exponent to an implied base for the scale
factor. When the decimal point is placed to the right of the first nonzero significant digit, the
number is said to be normalized.
E.g.
$6.0247 \times 10^{23}$
Significant digits: 5
Scale factor: $10^{23}$
A sample representation of 32 bit number

IEEE notation
IEEE floating point notation is the standard representation in use. There are two representations:
- Single precision.
- Double precision.
Both have an implied base of 2.
Single precision:
- 32 bits (1 sign bit, 23-bit mantissa, 8-bit exponent in excess-127 representation)
Double precision:
- 64 bits (1 sign bit, 52-bit mantissa, 11-bit exponent in excess-1023 representation)
Both use a fractional mantissa, with an implied binary point at its immediate left.
IEEE notation assumes that all numbers are normalized so that the MSB of the mantissa is a 1, and does
not store this bit.
So the first stored bit of the mantissa in the IEEE notation can be either a 0 or a 1.
The values of the numbers represented in the IEEE single precision notation are of the form:
$(\pm)\ 1.M \times 2^{E' - 127}$
The hidden 1 forms the integer part of the mantissa.
Excess-127 and excess-1023 (not excess-128 or excess-1024) are used to represent the exponent.
In the IEEE representation, the exponent is in excess-127 (excess-1023) notation. The actual exponents
represented are
$-126 \le E \le 127$ and $-1022 \le E \le 1023$
not $-127 \le E \le 128$ and $-1023 \le E \le 1024$.
In the single precision case, if a normalized representation requires an exponent less than -126,
underflow has occurred; if it requires an exponent greater than 127, overflow has occurred. This is
because the IEEE notation reserves the exponents -127 and 128 (and -1023 and 1024), that is, the
stored values 0 and 255 (0 and 2047), to represent special conditions:
- Exact zero
- Infinity
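The single precision fields can be inspected with a short sketch that uses Python's standard struct module to obtain the 32-bit pattern of a value. The function names are assumptions, and the reconstruction function handles only normalized numbers (stored exponent 1 to 254), not the special stored values 0 and 255.

```python
import struct


def decode_single(value):
    """Split a number into its single precision fields (sign, E', mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    e_prime = (bits >> 23) & 0xFF        # exponent in excess-127 form
    mantissa = bits & 0x7FFFFF           # 23 stored fraction bits
    return sign, e_prime, mantissa


def single_value(sign, e_prime, mantissa):
    """Rebuild a normalized value: (+/-) 1.M x 2^(E' - 127)."""
    return (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (e_prime - 127)


print(decode_single(-6.5))               # (1, 129, 5242880)
print(single_value(1, 129, 0x500000))    # -6.5
```

Here -6.5 = -1.625 x 2^2, so the stored exponent is 2 + 127 = 129 and the stored fraction is 0.625, with the leading 1 left implicit.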
