Computer organization – the operational units and their interconnections that realize the
architectural specification. Examples: hardware details transparent to the programmer, such
as control signals, interfaces between the computer and peripherals, and the memory
technology used.
For example, it is an architectural issue whether a system will have a multiply
instruction. It is an organizational issue whether that instruction will be implemented by a
dedicated multiply unit or by a mechanism that makes repeated use of the system's add unit.
Computer structure – the way in which all the components are interrelated.
-data processing
-data storage
-data movement
-control
Central Processing Unit (CPU) – controls the operation of the computer and
performs its data processing functions; also referred to as the processor.
I/O – moves data between the computer and its external environment.
System interconnection – the mechanism that provides communication among the
CPU, main memory, and I/O devices. A system bus is a common example of system
interconnection.
4) List and briefly describe the structural components of a processor.
Arithmetic Logic Unit (ALU) – performs the data processing functions of the computer.
CPU interconnection – the mechanism that provides communication among the
control unit, the arithmetic logic unit (ALU) and the registers.
Chapter 2:
The Von Neumann architecture is a design model based on the stored-program concept: the
computer has a processor separate from its memory, with a central processing unit (CPU)
and a single storage structure (memory) that holds both instructions and data. The
computer gets its instructions by reading them from memory, and a program can be set or
altered by changing the values held in a portion of memory.
2) Discuss the processing techniques used to exploit the high raw speed of modern
processors.
Branch prediction – the processor looks ahead in the instruction code fetched from
memory and predicts which branch, or group of instructions, will be processed next. The
processor predicts not just the next branch but multiple branches ahead; branch
prediction thus increases the amount of work available for the processor to execute.
Data flow analysis – the processor analyzes which instructions are dependent on each
other, either through their results or their data, and uses this to optimize the schedule of
instructions. To prevent unnecessary delay, instructions are scheduled to be executed
when ready, independent of the original program order.
Speculative execution – using branch prediction and data flow analysis, some
processors speculatively execute instructions ahead of their actual appearance in the
program, holding the results in temporary locations. This enables the
processor to keep its execution engines as busy as possible by executing instructions that are
likely to be needed.
3) Discuss the solutions that can be taken to balance the performance of logic circuits
(processors) and memory circuits (memory devices).
-Increase the number of bits retrieved at one time by making DRAMs "wider"
rather than "deeper" and by using wide bus data paths.
-Change the DRAM interface to make it more efficient by including a cache on the
DRAM chip.
-Reduce the frequency of memory accesses by incorporating complex and efficient cache
structures between the processor and memory.
-Increase the interconnect bandwidth between the processor and memory by using
higher-speed buses and by using a hierarchy of buses to buffer and structure data flow.
-Increase the size and speed of the caches interposed between the processor
and main memory. If the cache is part of the processor chip, cache
access time also drops significantly, which increases processor
performance.
-Change the processor organization and architecture to increase the
effective speed of instruction execution. Typically this involves using parallelism
in one form or another.
5) Define the terms clock cycle, instruction cycle, cycles per instruction (CPI), millions of
instructions per second (MIPS) and millions of floating-point operations per second
(MFLOPS).
Benchmarking is the act of running the same set of programs or operations on different
machines and comparing the execution times in order to assess performance.
Amdahl's Law is used to find the maximum expected improvement to an overall system
when only a part of the system is improved.
For example, suppose a task makes extensive use of floating-point operations, with 40% of
the execution time consumed by floating-point operations. If a new hardware design speeds
up the floating-point module by a factor of K, the overall speedup is

Speedup = 1 / ((1 - 0.4) + 0.4/K) = 1 / (0.6 + 0.4/K)

Thus, independent of K, the maximum speedup is bounded by 1/0.6 ≈ 1.67.
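The calculation above can be sketched in a few lines of Python; the function name is mine, not from the text:

```python
def amdahl_speedup(f, k):
    """Overall speedup when a fraction f of execution time
    is sped up by a factor k (Amdahl's Law)."""
    return 1.0 / ((1.0 - f) + f / k)

# 40% of the time is floating point; speed that part up by a factor K
print(round(amdahl_speedup(0.4, 2), 3))     # 1.25
print(round(amdahl_speedup(0.4, 1e9), 3))   # approaches 1/0.6 ≈ 1.667
```

Even an enormous K cannot push the speedup past 1/0.6, because the 60% of serial time is untouched.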
Chapter 3:
The basic execution flow of an instruction can be divided into 2 steps:
instruction fetch and instruction execute. At the beginning of each instruction cycle, the
processor fetches an instruction from memory; a register called the program
counter (PC) holds the address of the next instruction to be fetched, and each time an
instruction is fetched the program counter is automatically incremented. The instruction
fetched from memory is loaded into a register called the instruction register (IR), which is
decoded to specify the action that the processor is required to execute.
3) List all commonly known system registers and briefly describe their purposes.
Interrupts are provided primarily as a way to improve processing efficiency. For
example, most external devices are much slower than the processor. If the processor is
transferring data to a printer without interrupts, it must pause and remain idle until the
printer catches up; the processor's original program is stopped until the printer
has received all the data, which is a very wasteful use of the
processor. With interrupts, the processor can continue its current program while the
printer receives the data for printing. When the printer is ready to receive more data,
the I/O module for the external device sends an interrupt request to the processor; the
processor suspends operation of the current program, branches off to a routine that serves
that particular I/O device, and resumes the original execution after the device has been served.
Bus type – divided into 2 types: dedicated and multiplexed. With a multiplexed bus,
data and addresses share the same lines, with a controller distinguishing whether the bus
currently carries data or addresses. This method saves cost because fewer lines are used,
but the circuitry becomes more complex. With a dedicated bus, data and addresses use
separate lines, so the circuitry is bigger and the cost higher, but the
dedicated bus can perform operations faster because the lines are not shared.
Method of arbitration – divided into 2 types: centralized and distributed. In the centralized
scheme, a single hardware device called a bus controller or arbiter decides which module
may use the bus. In the distributed scheme, each module contains access control
logic and all modules act together to share the bus. The purpose of both methods is to
designate one module as master and another as slave, with the
master initiating the data transfer (read or write) to the slave.
Timing – divided into 2 types: synchronous and asynchronous. With synchronous timing,
the occurrence of events is determined by a clock, and all events start at the beginning
of a clock cycle. With asynchronous timing, the occurrence of one event depends on the
occurrence of a previous event.
Bus width – the wider the data bus, the better the performance, because more bits can be
transferred at one time. The wider the address bus, the greater the capacity, i.e. the bigger
the range of locations that can be referenced.
Data transfer type – all buses support read and write operations. Some buses
also support combined operations, such as read-modify-write
and read-after-write; such a combined operation is typically indivisible, to prevent
any other module from accessing the same memory resource in between.
8) Give your own opinion on why PCI maintained a 32-bit bus as the default configuration.
9) Consider a generic bus controller that supports 4 masters and 32 clients. Discuss the
mechanism that you would use to handle arbitration for multiple bus requests from 3 bus
masters.
10) Describe the RS-232 and USB2.0 interfaces. Specify the relative implementation
cost, the supported data transfers, the maximum cable length, and the interface pins.
Chapter 4
1) List out the key characteristics of computer memory systems and give examples of
each characteristic.
Unit of transfer – equal to the number of electrical lines into and out of the memory
module. It is normally equal to the word length but is often larger, such as 64, 128 or 256
bits; such a unit is also referred to as a block of data.
Method of accessing – such as sequential access (example: tapes), direct access (example:
disk units), random access (example: cache or main memory) and associative access
(example: cache memories).
Performance – such as access time, memory cycle time and transfer rate. For non-random
access, the access time is the time it takes to position the read-write
mechanism at the desired location. The memory cycle time is the access time of a
random-access memory plus the time required before a second access can start.
The transfer rate is the rate at which data can be transferred into or out of the memory;
the transfer rate for random-access memory is equal to 1/(cycle time).
2) Discuss the access methods used for memory devices and relate them to the physical
design of the devices that use them.
There are 3 performance parameters in use: access time, memory cycle time and
transfer rate.
Access time – for random-access memory, this is the time taken to perform a read or
write operation, that is, the time from the instant an address is presented to the
memory to the instant the data have been stored or made available for use. For non-random
access, the access time is the time taken to position the read-write mechanism at the
desired location.
Memory cycle time – this concept is primarily applied to random-access memory and
consists of the access time plus any additional time required before a second access can
start. This additional time may be required for transients to die out on the signal lines or
to regenerate the data if they are read destructively. Note that the memory cycle time is
concerned with the system bus, not the processor.
Transfer rate – this is the rate at which data can be transferred into or out of a memory
unit. For random-access memory, it is equal to 1/(cycle time). For non-random-access
memory the following relationship holds:

T_N = T_A + N/R

where T_N = average time to read or write N bits
T_A = average access time
N = number of bits
R = transfer rate, in bits per second (bps)
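The relationship can be checked with a quick sketch; the numbers below are illustrative, not from the text:

```python
def transfer_time(t_a, n_bits, rate_bps):
    """T_N = T_A + N/R: average time to read or write N bits
    from a non-random-access memory."""
    return t_a + n_bits / rate_bps

# e.g. 0.1 s average access time, 1,000,000 bits at 10 Mbps
print(transfer_time(0.1, 1_000_000, 10_000_000))  # 0.2 seconds
```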
4) Describe the memory hierarchy in terms of access time, capacity, cost per bit.
In the memory hierarchy, the top level is inboard memory
(registers, cache and main memory), next is outboard storage (magnetic disk, CD-
ROM, CD-RW, DVD-RW and DVD-RAM) and last is off-line storage
(magnetic tape). Going down the hierarchy, the access time increases,
the cost per bit decreases, the capacity increases and the frequency with which the
processor accesses the memory decreases.
5) Describe the usage of direct mapping method in a cache implementation. Highlight the
advantages and the disadvantages of this method.
Direct mapping is the simplest technique; it maps each block of main memory into
only one possible cache line. The main memory address can be viewed as 3 fields (tag, line
and word). The tag is used to keep track of which main memory block is in the cache, the
line identifies the cache line, and the word identifies the byte inside the
block. Using direct mapping increases processor performance because it reduces the
frequency of main memory accesses: data that is in the cache can be obtained without
going to main memory. The advantage of direct mapping is that it is easy to implement;
the disadvantage is that if a program repeatedly references words from two
different blocks that map into the same line, the blocks will be continually swapped
in the cache and the hit ratio will be low, so performance is not optimal compared to
the other techniques.
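The tag/line/word split can be illustrated with a small sketch; the cache geometry here (64-byte blocks, 128 lines) is an assumption chosen for the example, not taken from the text:

```python
# Assumed geometry: 64-byte blocks, 128 cache lines.
BLOCK_BITS = 6   # word (byte-offset) field: 2^6 = 64 bytes per block
LINE_BITS  = 7   # line field: 2^7 = 128 cache lines

def split_address(addr):
    """Split a main-memory address into (tag, line, word) fields
    for a direct-mapped cache."""
    word = addr & ((1 << BLOCK_BITS) - 1)
    line = (addr >> BLOCK_BITS) & ((1 << LINE_BITS) - 1)
    tag  = addr >> (BLOCK_BITS + LINE_BITS)
    return tag, line, word

print(split_address(0x12345))  # (9, 13, 5)
```

Two addresses with the same line field but different tags contend for the same cache line, which is exactly the thrashing case described above.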
In an associative-mapping cache, any block of main memory can be mapped into any
slot of the cache. The mapping from main memory block to cache is performed by
partitioning an address into a tag field and a word field. The tag is an identifier used to
keep track of which block of main memory is in the cache, and the word identifies the
byte in the block. When a reference to main memory is made, the cache intercepts
the reference and searches the cache tag memory to see whether the requested block is in the
cache. All the tags are searched in parallel using an associative memory; if a tag in the cache
tag memory matches the tag field of the memory reference, the word is taken from the
position in the slot specified by the word field. If the referenced word is not in the cache,
the block containing the word is brought into the cache and the referenced word is then
taken from the cache. The advantage is flexibility: any block can be
replaced by a new block when it is read into the cache. The disadvantage is the complex
circuitry required to examine the tags of all cache slots in parallel.
7) Describe how set associative mapping merges the two methods mentioned above.
Least recently used (LRU) – replace the block in the set that has been in the cache longest
with no reference to it.
First-in-first-out (FIFO) – replace the block in the set that has been in the cache longest.
Least frequently used (LFU) – replace the block in the set that has experienced the fewest
references.
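LRU bookkeeping for a single cache set can be modelled in a few lines; this is a software sketch (class and method names are mine), not how the hardware is built:

```python
from collections import OrderedDict

class LRUCacheSet:
    """Minimal model of LRU replacement within one cache set."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # key = block tag, order = recency

    def access(self, tag):
        """Return True on a hit; on a miss, evict the LRU block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # hit: now most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)   # evict least recently used
        self.blocks[tag] = True
        return False

s = LRUCacheSet(2)
s.access('A'); s.access('B'); s.access('A')   # A is now most recent
s.access('C')                                 # evicts B, not A
print('A' in s.blocks, 'B' in s.blocks)       # True False
```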
Chapter 5:
1) Describe the properties of a semiconductor memory cell.
A semiconductor memory cell can be used to represent the binary values 1 and 0.
The cell's content can be altered by writing into the cell to set its state, and the
content can be read to sense the state. The cell normally has 3 functional
terminals capable of carrying an electrical signal: a select terminal that selects the
cell, a control terminal that specifies whether the operation is a read or a write, and a
third terminal that, for writing, provides an electrical signal to set the state of the cell
to 1 or 0, and for a read operation is used as the output of the cell's state.
2) Compare the features of SRAM and DRAM. Discuss their usage in a computer system.
DRAM – an analog device that stores data as charge on a
capacitor. DRAM requires periodic charge refreshing to maintain data storage, because
capacitors have a natural tendency to discharge.
SRAM – a digital device that stores data using
traditional flip-flop logic gates. SRAM will hold its data as long as power is
supplied to it.
Both DRAM and SRAM are volatile: power is required to preserve the bit
values. A DRAM cell is simpler, smaller and less expensive than an SRAM cell, but DRAM
requires refresh circuitry. For larger memories, the fixed cost of the refresh circuitry is more
than compensated for by the smaller variable cost of DRAM cells; thus DRAM tends to be
favored for large memory requirements. SRAM is generally faster than DRAM, so SRAM is
used for cache memory (both on and off chip) and DRAM is used for main memory.
Chapter 6:
1) Describe RAID technology. List out all levels and highlight how RAID 0 is different
from others.
2) Describe how data can be organized on a circular physical storage device such as the
magnetic disc and the optical disc.
For an optical disc, the data are organized as a sequence of blocks. A block consists of the
following fields:
-Sync – identifies the beginning of a block. It consists of a byte of all 0s, 10 bytes of
all 1s and a byte of all 0s.
-Header – contains the block address and the mode byte. Mode 0 means a blank data field,
mode 1 means use of an error-correcting code with 2048 bytes of data, and mode 2 means
2336 bytes of user data with no error-correcting code.
-Data – the user data.
-Auxiliary – additional user data in mode 2; in mode 1, this is an error-correcting code.
Chapter 7:
1) Discuss why computer peripherals are not connected directly to the system bus, thus
requiring a separate I/O interface.
-There is a wide variety of peripheral devices with various methods of operation. It would be
impractical to incorporate the necessary logic within the processor to control such a range of
devices.
-The data transfer rate of many peripherals is much slower than that of the memory or the
processor. It is impractical to use the high-speed system bus to communicate directly
with such a peripheral.
-Other peripherals may have a higher transfer rate than the memory or
processor. Again, the mismatch would lead to inefficiencies if not managed properly.
-Peripherals often use different data formats and word lengths than the computer to which
they are attached.
2) Describe the main I/O techniques that can be used to implement I/O interfaces.
Programmed I/O – when the processor executes a program that involves an
I/O operation, the processor directly controls the I/O operation by issuing an I/O
command to the I/O module, and the processor must then wait until the I/O operation is
complete. This wastes a lot of time if the processor is faster than the I/O
module, because the I/O module does not alert the processor when the operation is
complete: it is the responsibility of the processor to keep checking the status of the I/O
module until it finds the operation complete.
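The busy-wait pattern of programmed I/O can be sketched as follows; `PrinterPort`, `status()` and `write_byte()` are invented stand-ins for a device interface, not a real API:

```python
# Hypothetical device model: the names here are invented for illustration.
class PrinterPort:
    def __init__(self):
        self._busy = 0          # polls remaining until the device is ready
        self.printed = []

    def status(self):
        if self._busy:
            self._busy -= 1     # simulate the slow device catching up
            return 'busy'
        return 'ready'

    def write_byte(self, b):
        self.printed.append(b)
        self._busy = 3          # device is slow: busy for the next 3 polls

def programmed_io_write(port, data):
    """Programmed I/O: the processor polls the status and idles until ready."""
    for b in data:
        while port.status() != 'ready':
            pass                # the processor is stuck busy-waiting here
        port.write_byte(b)

port = PrinterPort()
programmed_io_write(port, b'hi')
print(bytes(port.printed))  # b'hi'
```

Every iteration of the inner `while` loop is a processor cycle doing no useful work, which is exactly the inefficiency interrupts remove.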
Interrupt-driven I/O – when the processor issues a command to an I/O module, it can
continue to execute other instructions; the processor is interrupted by the I/O
module when the I/O device is ready. This technique is more efficient than
programmed I/O because it eliminates the processor's unneeded waiting on the I/O device.
However, it still consumes a lot of processor time, because every word of data transferred
must pass through the processor.
Direct Memory Access (DMA) – transfers data between I/O devices and
memory without passing through the processor; the processor is involved only at the start
and the end of the I/O data transfer. When a process requires a data
transfer involving an I/O device, the processor issues a command to the DMA module,
supplying the request type (read or write), the address and the data size. The DMA module
then performs the data transfer between the I/O device and memory in place of the
processor, and when the transfer is finished, the DMA module sends an interrupt signal to
the processor and is ready for another I/O transfer.
Chapter 9:
1) Discuss all the possible 8-bit representations of a signed integer. Is it possible to have a
signed integer representation that includes infinity?
The possible range for an n-bit representation of a signed integer is -(2^(n-1) - 1)
to 2^(n-1) - 1, where n is the number of bits; for an 8-bit signed integer the range is -127
to +127. Because the first bit, the MSB, represents the sign of the number, the magnitude of
the integer is determined by the remaining 7 bits. (In two's-complement representation the
negative range extends one further, from -2^(n-1) to 2^(n-1) - 1, i.e. -128 to +127 for 8 bits.)
Whether a signed integer can represent infinity depends on the representation and the
number of bits available, but none of the standard signed-integer representations reserves a
bit pattern for infinity, so normally it is not possible to represent infinity as a signed integer.
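The ranges of the common n-bit signed representations can be computed directly; the dictionary labels are mine:

```python
def signed_ranges(n):
    """Value ranges of the usual n-bit signed-integer representations."""
    return {
        'sign-magnitude':  (-(2**(n-1) - 1), 2**(n-1) - 1),
        'ones-complement': (-(2**(n-1) - 1), 2**(n-1) - 1),
        'twos-complement': (-2**(n-1),       2**(n-1) - 1),
    }

for name, (lo, hi) in signed_ranges(8).items():
    print(f'{name}: {lo} to {hi}')
```

Sign-magnitude and one's complement both waste a pattern on a second zero, which is why two's complement reaches one value further on the negative side.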
2) Describe two logic circuits that can be used to implement integer multiplication. Show
them in block diagrams.
-It can be used to perform the multiplication of any 2 integers (positive or negative).
-The multiplicand and multiplier are placed in the M and Q registers.
-A 1-bit register is placed logically to the right of the least significant bit of the Q register
(Q0) and designated Q-1.
-Another n-bit register A is used together with register Q to store the result of the
multiplication of the two's-complement numbers.
-A and Q-1 are initialized to 0, and the multiplication operation starts.
-The control logic checks the bit values of Q0 and Q-1:
If the 2 bits are the same (1-1 or 0-0), all bits of the A, Q and Q-1 registers are
shifted right by 1 bit.
If the 2 bits are 1-0, M is subtracted from A and the result stored in A; then all
bits of the A, Q and Q-1 registers are shifted right by 1 bit.
If the 2 bits are 0-1, M is added to A and the result stored in A; then all
bits of the A, Q and Q-1 registers are shifted right by 1 bit.
-The right shift is an arithmetic shift: the MSB of A (An-1) is not only shifted into An-2 but
also remains in An-1. Because of this, there is no need for an overflow bit for the addition
operation or a borrow bit for the subtraction operation.
The Booth algorithm is a multiplication algorithm that multiplies two signed binary numbers
in two's-complement notation. It performs fewer additions and subtractions than a
straightforward algorithm. It can be used to multiply 2 positive numbers, 2 negative numbers,
or one of each. The result is in two's-complement form, positive or
negative depending on the signs of the multiplicand and the multiplier.
The Booth algorithm uses this method to perform the multiplication: a subtraction is
performed when the first 1 of a block of 1s is encountered (1-0) and an addition is performed
when the end of the block is encountered (0-1). For example,

M x (01111010) = M x (2^6 + 2^5 + 2^4 + 2^3 + 2^1)
             = M x (2^7 - 2^3 + 2^2 - 2^1)

Bits 7 and 6 are (0-1), meaning an addition (+2^7); bits 3 and 2 are (1-0), meaning a
subtraction (-2^3); bits 2 and 1 are (0-1), meaning an addition (+2^2); and bits 1 and 0 are
(1-0), meaning a subtraction (-2^1).
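The register behaviour described above can be turned into a small working model of Booth's algorithm; this is a software sketch, not hardware:

```python
def booth_multiply(m, q, n=8):
    """Booth's algorithm for n-bit two's-complement multiplication."""
    mask = (1 << n) - 1
    a, q, q_1 = 0, q & mask, 0
    m &= mask
    for _ in range(n):
        pair = (q & 1, q_1)           # examine Q0 and Q-1
        if pair == (1, 0):
            a = (a - m) & mask        # first 1 of a block: A <- A - M
        elif pair == (0, 1):
            a = (a + m) & mask        # end of a block: A <- A + M
        # arithmetic shift right of the combined A, Q, Q-1
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))   # sign bit replicated
    result = (a << n) | q
    # interpret the 2n-bit A,Q pair as a two's-complement number
    if result & (1 << (2 * n - 1)):
        result -= 1 << (2 * n)
    return result

print(booth_multiply(3, -4))   # -12
print(booth_multiply(-7, -5))  # 35
```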
There are two IEEE 754 standards for floating-point numbers: single precision (32 bits) and
double precision (64 bits).
-For a 32-bit floating-point number, the most significant bit (MSB) is the sign bit. The 8 bits
after the MSB represent the biased exponent, and the remaining 23 bits are
used to represent the significand of the floating-point number.
-For a 64-bit floating-point number, the most significant bit (MSB) is the sign bit. The 11 bits
after the MSB represent the biased exponent, and the remaining 52 bits are
used to represent the significand of the floating-point number.
-The sign bit means the same in both formats: 0 represents positive and 1 represents negative.
-The exponent of a floating-point number is a biased exponent, meaning that the stored
value can represent both positive and negative exponents. The bias that is added to the
original exponent is determined by the formula 2^(n-1) - 1, where n is the number of
exponent bits. If n = 8, the bias is 127 and the stored biased number ranges from 0 to 255;
therefore the exponents that can be shown range from 0 - 127 to 255 - 127, that is, from
-127 to +128.
-For a normalized floating-point number, there is an additional 1 bit that does not appear in
the stored format and must be taken into account when operations are performed. That is
the 1 in front of the radix point: for a normalized value the first bit before the radix
point is always 1, and for this reason the bit does not need to be included in the
floating-point format.
-For a denormalized number, the bit before the radix point is not 1, so the radix point must
be shifted to the right until the first 1 is found; the number of bit positions shifted is
balanced by adding that number to the exponent of the floating-point number.
-Not all bit patterns in the IEEE format are interpreted in the usual way; some bit patterns
represent special values. For example:
If the biased exponent and the significand are all 0s, the value represented is 0,
and the sign bit gives the sign of the zero.
If the biased exponent is all 1s and the significand is all 0s, the value is
infinity, and the sign bit gives the sign of the infinity.
If the biased exponent is all 1s and the significand is not all 0s, then whether the sign
bit is 1 or 0, the pattern represents not-a-number (NaN).
If the biased exponent is all 0s and the significand is not all 0s, the number is a
denormalized number and its sign depends on the sign bit.
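The single-precision field layout can be inspected directly with Python's `struct` module:

```python
import struct

def decode_float32(x):
    """Unpack a float into its IEEE 754 single-precision fields."""
    bits, = struct.unpack('>I', struct.pack('>f', x))
    sign     = bits >> 31
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent (bias = 127)
    fraction = bits & 0x7FFFFF        # 23-bit significand field
    return sign, exponent, fraction

print(decode_float32(1.0))           # (0, 127, 0): exponent 127 - 127 = 0
print(decode_float32(-2.5))          # (1, 128, 2097152)
print(decode_float32(float('inf')))  # (0, 255, 0): exponent all 1s, fraction 0
```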
For example, a 32-bit binary word can be used to represent either a fixed-point number
or a floating-point number. The total count of different numbers that the two
representations can encode is the same (2^32 distinct bit patterns), but the ranges of
values they cover are different.
-For a fixed-point representation such as two's complement, the range is from -2^31 to
2^31 - 1.
-For a floating-point representation, the range of representable values depends on the
size of the exponent. For an 8-bit biased exponent, the range is as follows:
Negative numbers between -(2 - 2^-23) x 2^128 and -2^-127
Positive numbers between 2^-127 and (2 - 2^-23) x 2^128
The range can be made larger by increasing the number of bits used to represent
the exponent, but the precision of the numbers is then reduced, because as the exponent
bits increase, the significand bits in the floating-point representation decrease.
Chapter 10:
A machine instruction is the set of information required by the processor to execute an
operation. Machine instructions differ from operation to operation.
Elements of a machine instruction:
-Operation code: specifies the operation to be performed (examples: ADD, MOVE). The
operation is specified by a binary code, known as the operation code or opcode.
-Source operand reference: the operation may involve one or more source operands; the
operands are the inputs for the operation.
-Result operand reference: the operation may produce a result.
-Next instruction reference: tells the processor where to fetch the next instruction
after the execution of this instruction is complete.
3) List and briefly explain the main issues of instruction set design.
-Operation repertoire: how many and which operations to provide, and how complex the
operations should be.
-Data types: the various types of data upon which operations are performed.
-Instruction format: instruction length (in bits), number of addresses, sizes of the various
fields, and so on.
-Registers: the number of processor registers that can be referenced by instructions, and
their use.
-Addressing: the mode or modes by which the address of an operand is specified.
-Address: a form of data; some calculation must be performed on the operand reference
in an instruction to determine the main or virtual memory address.
-Number: numeric data used by the computer to perform operations, for
example binary integer (binary fixed point), binary floating point and decimal. The
numbers that can be stored in a computer are of limited magnitude and precision, so the
programmer is faced with understanding the consequences of rounding, overflow and
underflow.
-Character: ASCII is one type of data used to represent characters; each
character is represented by a unique 7-bit pattern.
-Logical data: bit-oriented data; in this form it is easy to manipulate the individual bits of
a data item.
5) Describe the difference between logical shift operation and arithmetic shift operation.
A logical shift operation shifts the bits of a word left or right. On
one end, the bit shifted out is lost; on the other end, a 0 is shifted in.
An arithmetic shift operation treats the data as a signed integer and does not shift the sign
bit. On an arithmetic right shift, the sign bit is replicated into the bit position to its right. On
an arithmetic left shift, a logical left shift is performed on all bits but the sign bit, which is
retained.
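The difference is easy to see on an 8-bit word; the function names and the 8-bit width are my choices for the example:

```python
def logical_shift_right(x, k, n=8):
    """Shift in zeros from the left (x treated as an n-bit unsigned word)."""
    return (x & ((1 << n) - 1)) >> k

def arithmetic_shift_right(x, k, n=8):
    """Replicate the sign bit into the vacated positions."""
    x &= (1 << n) - 1
    sign = x >> (n - 1)
    x >>= k
    if sign:
        x |= ((1 << k) - 1) << (n - k)   # fill the top k bits with the sign
    return x

v = 0b10110100   # negative if interpreted as a signed 8-bit value
print(format(logical_shift_right(v, 2), '08b'))     # 00101101
print(format(arithmetic_shift_right(v, 2), '08b'))  # 11101101
```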
6) List and briefly describe the types of operations that a processor has to handle.
The number of different opcodes varies widely from machine to machine; however,
the same general types of operations are found on every machine, as listed below:
A stack is a reserved area of storage in the computer used to hold data
temporarily. Stack operations follow the last-in-first-out (LIFO) discipline: only one item
can be accessed at a time, and that item is always at the top of the stack.
-The stack is used by the computer to store some data temporarily.
For example, when the processor executes a push instruction, an item is
stored at the top of the stack, and when a pop instruction is executed, the item on
the top of the stack is removed. The operation of a computer depends on the
flow of the program, and the flow does not always follow the written
sequence; it may sometimes jump to another location to execute code and return to
the point of the jump to execute the remaining instructions. For this reason, when
the flow of the program jumps to another memory location, the
return address must be stored, so it is kept on the stack
temporarily during execution of the branch instruction.
-The stack can be used to perform operations (unary and binary) with zero-address
instructions.
A unary operation uses the element at the top of the stack as its one
operand. For the NOT operation, for example, the processor pops that operand,
performs the NOT operation on it, and pushes the result back onto the stack.
A binary operation uses the top 2 stack items as operands. For an add instruction, for
example, the processor pops the top 2 stack items, performs the addition, and
pushes the result back onto the stack.
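A software model of these zero-address stack operations; the class and method names are mine:

```python
class Stack:
    """LIFO stack model supporting unary and binary zero-address operations."""
    def __init__(self):
        self._items = []

    def push(self, v):
        self._items.append(v)

    def pop(self):
        return self._items.pop()

    def unary_not(self, width=8):
        """Pop one operand, apply bitwise NOT, push the result back."""
        self.push(~self.pop() & ((1 << width) - 1))

    def binary_add(self):
        """Pop the top two items, add them, push the result back."""
        self.push(self.pop() + self.pop())

s = Stack()
s.push(2); s.push(3)
s.binary_add()
print(s.pop())                  # 5
s.push(0b00001111)
s.unary_not()
print(format(s.pop(), '08b'))   # 11110000
```

Note that neither operation names an operand address: the operands are implicitly the top of the stack, which is exactly what "zero-address" means.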
8) Describe the byte-ordering concepts of little-endian and big-endian. Discuss the
difference between the two concepts and their effect on program storage.
The little-endian and big-endian methods come into play when a number larger than one
byte (multiple bytes) needs to be stored in memory; the two methods store the
bytes in different orders. As an example, take the hexadecimal value 123456h and store it
in memory starting at address 100.
With the little-endian convention, the least significant byte is stored at the lowest
numerical byte address. That is, 56h is stored at address 100, 34h at
address 101, and the most significant byte, 12h, at address 102.
With the big-endian convention, the most significant byte is stored at the lowest
numerical byte address. That is, 12h is stored at address 100, 34h at
address 101, and the least significant byte, 56h, at address 102.
-With the little-endian method it is easy to check whether the number is odd or even,
because the first byte read is always the least significant byte. Because this
method stores the number "backwards", it allows us to extend the size of the number up
to the limits of memory without actually changing its value; for example, "21 43", "21 43
00" and "21 43 00 00" are all the same number, because little-endian reads the
word with the least significant byte first.
-With the big-endian method it is easy to check whether the number is positive or
negative, because the first byte read is the most significant byte. The big-endian
method stores the number in the same order in which humans write it, which
makes low-level debugging easier.
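Python's `struct` module shows the two byte orders directly; note that the example packs the value into 4 bytes rather than the 3 used in the text:

```python
import struct

value = 0x123456

little = struct.pack('<I', value)   # least significant byte first
big    = struct.pack('>I', value)   # most significant byte first

print(little.hex(' '))  # 56 34 12 00
print(big.hex(' '))     # 00 12 34 56
```

The trailing `00` in the little-endian form illustrates the extension property described above: appending zero bytes on the high end does not change the value.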
9) Give an opinion on implementing the above concepts for bit ordering.
Chapter 11:
2) Discuss how these addressing techniques are related to the size of an instruction.
For a fixed-length instruction there is clearly a trade-off between the number of opcodes
and the power of the addressing capability. More opcodes mean more bits in the opcode
field, which reduces the number of bits available for addressing.
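The trade-off can be made concrete with a hypothetical 16-bit single-address format (the word size and field split are assumptions for illustration, not from the source):

```python
# In a fixed 16-bit single-address instruction, every extra opcode bit
# is one bit fewer for the address field.
WORD_BITS = 16

def address_bits(opcode_bits):
    return WORD_BITS - opcode_bits

# 4 opcode bits (16 opcodes) leave 12 address bits (4K addresses);
# 6 opcode bits (64 opcodes) leave only 10 (1K addresses).
assert address_bits(4) == 12
assert address_bits(6) == 10
```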
Chapter 12:
-User-visible registers are the set of registers that can be referenced by the user or by
machine-language programs to minimize main-memory references and optimize register use.
-Control/status registers are the set of registers that are not visible to the user and
are used by the control unit to control the operation of the processor.
With the pipelining strategy, instructions can be executed in parallel instead of in
sequence. Instruction execution can normally be divided into two stages: fetch
instruction and execute instruction. There are times during the execution of an
instruction when main memory is not being accessed; this time can be used to fetch the
next instruction in parallel with the execution of the current one, saving the waiting
time for the next instruction fetch. This two-stage pipelining does not maximize
processor performance, however, because the time used to fetch an instruction is less
than the time used to execute it, so the processor must still wait for the current
execution to finish after the parallel fetch of the next instruction is complete. To
maximize performance, the pipeline must be decomposed into more stages of more nearly
equal duration. For example, with a six-stage pipeline the execution time for 9
instructions can be reduced from 54 time units to 14 time units, a saving of 40 time
units.
A pipeline hazard occurs when the pipeline, or some portion of it, must stall because
conditions do not permit continued execution. There are 3 types of hazards: resource,
data, and control.
Resource hazards – occur when two instructions already in the pipeline need the same
resource, so that the instructions must be executed in series rather than in parallel
for a portion of the pipeline. This is sometimes also referred to as a structural
hazard. For example, assume main memory has a single port, so that instruction fetches
and data reads and writes must be performed one at a time. If an instruction must fetch
its operand from main memory rather than from a register, then the next instruction in
the pipeline that also needs memory must insert an idle stage, wait for the earlier
instruction to finish its memory access, and then continue its stages as normal. One
solution for resource hazards is to increase the available resources, such as having
multiple ports into main memory.
Data hazards – a data hazard occurs when there is a conflict in the access of an operand
location. We can state the hazard in this form: two instructions in a program are to be
executed in sequence and both access a particular memory or register operand. If the two
instructions are executed in strict sequence, no problem occurs; but if they are
executed in a pipeline, it is possible for the operand value to be updated in such a way
as to produce a different result than would occur with strict sequential execution. In
other words, the program produces an incorrect result because of the use of the
pipeline.
5) Describe branch prediction. Illustrate the flow of branch prediction using taken/not
taken switch.
Branch prediction is a technique used to predict whether a branch will be taken. Branch
prediction using a taken/not taken switch is dynamic branch prediction, because it
depends on the execution history. For example, one or more bits can be associated with
each conditional branch instruction to reflect the recent history of that instruction;
these bits are referred to as a taken/not taken switch that directs the processor to
make a particular decision the next time the instruction is encountered.
From the branch prediction flowchart, as long as each succeeding conditional branch
instruction that is encountered is taken, the decision process predicts that the next
branch will be taken. If a single prediction is wrong, the algorithm continues to
predict that the next branch is taken. Only if two successive branches are not taken
does the algorithm shift to the right-hand side of the flowchart. Subsequently, the
algorithm will predict that branches are not taken until two branches in a row are
taken. Thus, the algorithm requires two consecutive wrong predictions to change the
prediction decision.
Chapter 13:
Reduced instruction set computing (RISC) is a CPU design strategy based on simplified
instructions that can provide higher performance, because the simplicity enables much
faster execution of each instruction.
The main characteristics of a RISC processor are as follows:
One machine instruction per machine cycle – a machine cycle is defined as the time it
takes to fetch two operands from registers, perform an ALU operation, and store the
result in a register.
Register-to-register operation – all operations are performed between registers, and
only LOAD and STORE instructions access memory. This design feature simplifies the
instruction set and therefore the control unit, and it permits optimization of register
use so that frequently accessed operands remain in high-speed storage.
Simple addressing modes – almost all RISC instructions use simple register addressing.
A few additional modes, such as displacement and PC-relative, may be included; other,
more complex modes can be synthesized in software from the simple ones. Again, this
design feature simplifies the instruction set and the control unit.
Simple instruction formats – only a few formats are used and the instruction length is
fixed. Field locations, especially that of the opcode, are fixed, so opcode decoding and
register operand accessing can occur simultaneously. Instruction fetching is also
optimized because word-length units are fetched.
Chapter 14:
True data dependency – also called read-after-write (RAW) dependency.
For example, consider the two instructions below:
I1) ADD A, B
I2) MOVE C, A
The second instruction can be fetched and decoded but cannot execute until the first
instruction executes, because the second instruction requires the data produced by the
first.
Procedural dependency – the instructions following a branch (taken or not taken) have a
procedural dependency on the branch and cannot be executed until the branch is executed.
I1 and I2 form a true data dependency because I2 needs the result of I1. Between I1 and
I3 there is no data dependency, but if I3 executes to completion prior to I1, the wrong
value of R3 will be fetched by the next instruction after I3 that uses it.
Consequently, I3 must complete after I1 produces its output value; this is an output
dependency between I1 and I3, and a wrong result may be produced if the instructions
execute in reverse order. To ensure correctness, issuing the third instruction must be
stalled if its result might later be overwritten by an older instruction that takes
longer to complete.
A superscalar machine can issue several instructions per cycle, depending on the degree
n of the machine: a superscalar machine of degree 2 can execute two instructions per
cycle. A superpipelined machine still executes only one instruction per cycle, but its
cycle time depends on the degree m of the machine: a superpipelined machine of degree 2
executes an instruction with a cycle time half that of the base machine.
The simplest instruction issue policy is to issue instructions in the exact order that
would be achieved by sequential execution (in-order issue) and to write results in that
same order (in-order completion). To guarantee in-order completion, when there is a
conflict for a functional unit, or when a functional unit requires more than one cycle
to generate its result, the issuing of instructions temporarily stalls.
The in-order issue with out-of-order completion policy means that the execution stage of
each instruction must follow the sequence of the original program, but the write stage,
that is, the completion of an instruction, can differ from the original order.
Out-of-order completion is used in scalar RISC processors to improve the performance of
instructions that require multiple cycles. With out-of-order completion, any number of
instructions may be in the execution stage at any one time, up to the maximum degree of
machine parallelism across all the functional units. Out-of-order completion requires
more complex instruction-issue logic than in-order completion, and it is more difficult
to deal with instruction interrupts and exceptions.
With in-order issue, the processor will only decode instructions up to the point of a
dependency or conflict; no additional instructions are decoded until the conflict is
resolved. As a result, the processor cannot look ahead of the point of conflict to
subsequent instructions that may be independent of those already in the pipeline and
that might usefully be introduced into it. To allow out-of-order issue, it is necessary
to decouple the decode and execute stages of the pipeline. This is done with a buffer
referred to as an instruction window. With this organization, after the processor has
finished decoding an instruction, the instruction is placed in the window. As long as
the buffer is not full, the processor can continue to fetch and decode new instructions;
whenever a functional unit becomes available in the execution stage, an instruction from
the window may be issued to it. An instruction can be issued provided two conditions
hold: first, the particular functional unit it needs is available, and second, no
conflict or dependency blocks the instruction. The result of this organization is that
the processor has a look-ahead capability, allowing it to identify independent
instructions that can be brought into the execution stage.
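A toy version of the instruction-window issue check can be sketched as follows; the tuple encoding of instructions is an assumption for illustration, and unit availability is ignored for brevity:

```python
# Toy instruction window: decoded instructions wait in a buffer and may
# issue out of order as soon as their source registers are ready.
# (Functional-unit availability, the other issue condition, is omitted.)
def issue_ready(window, ready_regs):
    issued, remaining = [], []
    for dest, srcs in window:              # instr = (dest_reg, src_regs)
        if all(s in ready_regs for s in srcs):
            issued.append((dest, srcs))    # no dependency blocks it
        else:
            remaining.append((dest, srcs)) # stays in the window
    return issued, remaining

window = [('R3', ('R1', 'R2')),   # waits: R1, R2 not yet ready
          ('R5', ('R4',))]        # ready: R4 is available
issued, remaining = issue_ready(window, ready_regs={'R4'})
assert [d for d, _ in issued] == ['R5']      # issued ahead of R3's instr
assert [d for d, _ in remaining] == ['R3']
```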
Chapter 15:
Here Y and Z are new registers needed for the proper operation of the ALU. AC is the
accumulator, which always holds one operand for an ALU operation; the other operand is
held temporarily in register Y, and register Z holds the temporary result produced by
the ALU. MBR is the memory buffer register, used to hold data to be written to memory
or data just read from memory. MAR is the memory address register; it holds the address
for read and write operations. All of these components connect to the same internal
bus, and the control unit asserts the control signals that cause the micro-operations of
an instruction to be performed.
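As a hedged illustration of how the control unit sequences micro-operations over these registers, here is a hypothetical register-transfer sketch of an instruction fetch on such a single-bus datapath (the memory contents and time steps are made up for illustration):

```python
# Hypothetical micro-operation sequence for an instruction fetch,
# using the register names from the text (PC, MAR, MBR, IR).
regs = {'PC': 100, 'MAR': 0, 'MBR': 0, 'IR': 0}
memory = {100: 0x1A2B}            # one made-up instruction word

# t1: MAR <- PC          (place the address on the address register)
regs['MAR'] = regs['PC']
# t2: MBR <- memory[MAR]; PC <- PC + 1
regs['MBR'] = memory[regs['MAR']]
regs['PC'] += 1
# t3: IR <- MBR          (instruction now available for decoding)
regs['IR'] = regs['MBR']

assert regs['IR'] == 0x1A2B and regs['PC'] == 101
```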
A micro-programmed control unit is a control unit that generates the control signals
according to a set of microprograms stored in a control memory; that set of
microprograms describes the behavior of the control unit.
The differences between a hardwired control unit and a micro-programmed control unit:
-A micro-programmed control unit is easier to implement than a hardwired control unit.
-A micro-programmed control unit is easier to modify than a hardwired control unit. For
example, when an instruction is added to the processor, a simple change to the
microprogram can handle it, whereas a hardwired control unit is difficult to redesign.
-A micro-programmed control unit is slower than a hardwired control unit of comparable
technology. Despite this, micro-programming is the dominant technique for implementing
control units in pure CISC architectures, due to its ease of implementation, while for
RISC processors, with their simpler instruction formats, hardwired control units are
used.