3.3 RISC Processors
• Over the past 30 years the explosion in device technology (SSI, LSI, VLSI, ...) has enabled processors to become increasingly complex.
• Functions that once had to be coded - perhaps as a subroutine - may now
be supported directly in hardware and accessed through a single machine
instruction.
• For example, the VAX-11 series machines had:-
• 304 instructions
• 18 addressing modes
• 20 data types

(CISC: Complex Instruction Set Computer)

• The basic idea behind migrating low-level functions from software to hardware support has been to provide an instruction set that is as much as possible like high-level language instructions.
• The aim has been to reduce the 'Semantic Gap' between high-level
language instructions and low-level machine instructions.
• This approach gives rise to the following problems:-
– Large instruction sets need complex and potentially time-consuming hardware steps to decode and execute them.
– Complex machine instructions may not match high-level language instructions exactly, in which case they may be of little use.
– Complex instruction sets present difficulties for compilers:-
• which instructions to choose in a particular set of circumstances
• difficult to optimise
– An instruction set designed with a particular high-level language in
mind may not be particularly suited to a different language.
• The basic aim of a general-purpose processor designer is to provide a processor capable of high throughput over a range of applications.
• The traditional wisdom in achieving this goal is to follow the CISC
philosophy.
• The alternative RISC philosophy attempts to balance instruction set
complexity with compiler complexity.
• The common characteristics of a RISC processor are:-
– a load/store architecture
– an easily decoded instruction set
– a large register set
– an emphasis on pipelining efficiency
– the use of modern compiler optimisation technology
• RISC is a generic term used to refer to a complete processor design philosophy:-
– a small instruction set is not necessarily a requirement for a processor to be called a RISC.

3.3.1 Hardware Considerations


• The justification for a RISC approach comes from the observation that the vast bulk of a computer's time is spent:-
– loading data from memory into registers
– storing data from registers to memory
– performing program branches
– performing arithmetic operations
• Other instructions are used comparatively infrequently.

• Throughput may be improved if:-
– the least-used instructions are removed from the instruction set
– the remaining, frequently used instructions are made to execute as fast as possible - at most one clock cycle each.
• Instructions will execute faster if a register-to-register architecture is adopted:-
– compilers can keep operands that will be reused in registers - only load and store instructions are permitted to access memory (see the sketch below).
• The chip area freed is used to provide a large register set.
• Instructions executing in one clock cycle remove the need for microcode
and permit simpler and faster hardwired control.
• Multiple-cycle instructions such as floating-point arithmetic are either
executed in software or in a special-purpose coprocessor.
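• The following Python sketch is purely illustrative - the instruction names, the three-operand tuple format and the register-file size are assumptions, not any real RISC. It models a toy load/store machine in which only LOAD and STORE may touch memory and all arithmetic is register-to-register:

    # Toy load/store machine: only LOAD/STORE access memory,
    # arithmetic is register-to-register. Purely illustrative.
    memory = {"B": 7, "C": 5, "A": 0}     # named memory locations
    regs = [0] * 8                        # small register file

    def run(program):
        for op, *args in program:
            if op == "LOAD":              # rd <- memory[addr]
                rd, addr = args
                regs[rd] = memory[addr]
            elif op == "STORE":           # memory[addr] <- rs
                rs, addr = args
                memory[addr] = regs[rs]
            elif op == "ADD":             # rd <- rs1 + rs2 (registers only)
                rd, rs1, rs2 = args
                regs[rd] = regs[rs1] + regs[rs2]

    # A = B + C: two loads, one register ADD, one store.
    run([("LOAD", 1, "B"), ("LOAD", 2, "C"),
         ("ADD", 3, 1, 2), ("STORE", 3, "A")])
    print(memory["A"])                    # -> 12

• On a memory-to-memory CISC the same statement might be a single ADD A,B,C instruction; on the load/store machine the compiler can instead keep B and C in registers for later reuse.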
• Only relatively simple addressing modes are provided; more complicated
forms can be synthesised from the simple ones.
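• As a sketch of how a more complicated mode can be synthesised (the tuple format below is invented for illustration): a memory-indirect access rd = M[M[base] + disp], which a CISC might provide as a single addressing mode, becomes two simple register-plus-displacement loads:

    # Synthesising a memory-indirect access from two simple loads.
    # Invented (name, dest, base, displacement) tuples, illustrative only.
    def expand_mem_indirect(rd, base, disp, tmp):
        """Simple-instruction sequence for rd = M[M[base] + disp]."""
        return [("LOAD", tmp, base, 0),     # tmp = M[base + 0]
                ("LOAD", rd, tmp, disp)]    # rd  = M[tmp + disp]

    print(expand_mem_indirect(rd=3, base=1, disp=4, tmp=2))
    # [('LOAD', 2, 1, 0), ('LOAD', 3, 2, 4)]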

3.3.2 Software Considerations


• Improving performance in a RISC system is achieved by trading off complexity across system boundaries:-
– CISC processors tend to be very complex in comparison to their compilers
– RISC implementations attempt to balance complexity between the processor and the compiler.

• RISC processors aim to complete execution of one instruction every clock cycle. This is achieved (approached) by:-
– using an easily decoded instruction set
– an efficient pipelined execution unit.
• Assume execution of each instruction involves five steps:-
– instruction fetch (IF)
– instruction decode (ID)
– operand fetch (OF)
– operand execution (OE)
– operand store (OS)
• If a series of such instructions is executed strictly in sequence, one instruction completes every five cycles.
• In pipelined execution one instruction completes every cycle.
Sequential Execution

i    IF ID OF OE OS
i+1                 IF ID OF OE OS
i+2                                IF ID OF OE OS
Pipelined Execution

i    IF ID OF OE OS
i+1     IF ID OF OE OS
i+2        IF ID OF OE OS
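• The timing in the two diagrams can be checked with a short calculation. Assuming one cycle per stage and no hazards, instruction i (counting from 0) completes in cycle 5*(i+1) when executed sequentially but in cycle 5+i when pipelined:

    # Completion cycle of instruction i (0-based) for a 5-stage pipeline,
    # assuming one cycle per stage and no hazards.
    STAGES = ["IF", "ID", "OF", "OE", "OS"]

    def sequential_finish(i, depth=len(STAGES)):
        return depth * (i + 1)        # each instruction waits for the previous one

    def pipelined_finish(i, depth=len(STAGES)):
        return depth + i              # one completion per cycle once filled

    for i in range(3):
        print(i, sequential_finish(i), pipelined_finish(i))
    # 0 5 5 / 1 10 6 / 2 15 7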
• A conventional pipeline can break down for a number of reasons:-
– A branch instruction is encountered; if it is taken, the pipeline must be cleared and new instructions fetched.
– Data used by an instruction depends on the result of the previous instruction; the instruction cannot proceed until that data is valid.
– Instructions in the pipeline require access to the same resource - the memory bus, a register, or the ALU - simultaneously.
• In such situations a 'pipeline hazard' or 'bubble' is introduced into the pipeline, reducing the average instruction execution rate.
Bubbles in a Pipeline (figure): two cases are shown. Pipelined with Data Interlock - an INC that uses the result A of the preceding ADD (A = B + C) must wait until A is valid, so a bubble enters the pipeline. Pipelined with Branch Interlock - the instruction fetched after a JMP depends on the new PC, so a bubble enters the pipeline until the branch address is known.
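• A rough sketch of how a data interlock introduces a bubble (a simplified model - a real pipeline's stall count depends on stage spacing and forwarding, and the instruction tuples are invented): stall one cycle whenever an instruction reads a register written by the instruction immediately ahead of it:

    # Simplified data interlock: insert a one-cycle bubble when an
    # instruction reads the register written by its predecessor.
    # Instruction format (invented): (name, dest_reg, source_regs)
    def issue_cycles(program):
        schedule, cycle, prev_dest = [], 0, None
        for name, dest, srcs in program:
            if prev_dest is not None and prev_dest in srcs:
                cycle += 1                    # bubble: wait for the result
            schedule.append((name, cycle))
            cycle += 1
            prev_dest = dest
        return schedule

    prog = [("ADD", "A", ("B", "C")),         # A = B + C
            ("INC", "A", ("A",))]             # A = A + 1, needs the ADD's result
    print(issue_cycles(prog))                 # [('ADD', 0), ('INC', 2)] - one bubble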
• Conventional designs solve these pipeline problems with special hardware
- RISC designs attempt to avoid the introduction of the bubble in the first
place.
• RISC systems use an optimising compiler to rearrange programs and schedule instructions so that they do not interfere with one another.
• We now consider how an optimising compiler might reduce the size of the bubble caused by a branch instruction.
Delayed Branch Instruction
• Introduce a delayed branch instruction to the architecture:-
– the instruction following the branch is always executed
– control is then transferred to the branch destination
– this gives the CPU time to fetch the proper instruction from the
destination and start it through the pipeline.
Address   Normal Branch   Delayed Branch   Optimised Delayed Branch
100       LOAD X,A        LOAD X,A         LOAD X,A
101       ADD 1,A         ADD 1,A          JUMP 105
102       JUMP 105        JUMP 106         ADD 1,A
103       ADD A,B         NO-OP            ADD A,B
104       SUB C,B         ADD A,B          SUB C,B
105       STOR A,Z        SUB C,B          STOR A,Z
106                       STOR A,Z

Traditional Branch v Delayed Branch
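• A much-simplified sketch of the delay-slot filling shown in the optimised column above (the safe_to_move test is an assumed helper standing in for the compiler's dependence analysis): if the instruction before the branch does not affect the branch itself, swap it into the slot; otherwise insert a NO-OP:

    # Simplified delay-slot filler: move the instruction preceding a branch
    # into the slot after it when safe, otherwise pad with NO-OP.
    def fill_delay_slots(program, safe_to_move):
        out = []
        for instr in program:
            if instr.startswith("JUMP") and out and safe_to_move(out[-1], instr):
                out.insert(-1, instr)          # branch moves in front of that instr
            elif instr.startswith("JUMP"):
                out += [instr, "NO-OP"]        # nothing safe to move: pad the slot
            else:
                out.append(instr)
        return out

    prog = ["LOAD X,A", "ADD 1,A", "JUMP 105", "ADD A,B", "SUB C,B", "STOR A,Z"]
    print(fill_delay_slots(prog, lambda prev, branch: True))
    # ['LOAD X,A', 'JUMP 105', 'ADD 1,A', 'ADD A,B', 'SUB C,B', 'STOR A,Z']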


• Only an instruction that must be executed whether or not the branch is taken can safely be placed in the slot immediately following a branch:-
– about 70% of the time the slot can be filled with an instruction that will be required whether the branch is taken or not
– about 20% of the time it is filled by an instruction that does not advance the state of the computation
– about 10% of the slots are filled with no-operation instructions.
• Since about 20% of all instructions executed by a 'typical' program are branches, the delayed branch in conjunction with an optimising compiler yields a gain of about 15% in overall execution speed.
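• A back-of-envelope check of that figure, assuming a base CPI of 1, a one-cycle bubble per branch when the slot does no useful work, and that only the usefully filled 70% of slots recover that cycle (all assumptions, matching the percentages above):

    # Rough check of the ~15% figure under stated simplifying assumptions.
    branch_fraction = 0.20    # ~20% of executed instructions are branches
    useful_fill     = 0.70    # ~70% of delay slots do useful work

    cpi_without = 1 + branch_fraction * 1.0                # every branch: 1 bubble
    cpi_with    = 1 + branch_fraction * (1.0 - useful_fill)
    print(round(cpi_without / cpi_with - 1, 2))            # ~0.13, of the order of 15%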

Branch Destination Prediction
• Maintain a Branch Target Buffer which records the destination of a branch on the most recent occasions it was executed.
• Use this to predict the destination for the next execution of the branch, so that the probable next instruction can be fetched from the predicted destination address and started through the pipeline.
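• A minimal sketch of a branch target buffer (assumed here to hold one unbounded entry per branch and only the most recent target; real BTBs are small, set-associative structures):

    # Minimal branch target buffer: remember each branch's most recent
    # target and use it to predict the next fetch address.
    class BranchTargetBuffer:
        def __init__(self):
            self.table = {}                       # branch PC -> last target seen

        def predict(self, branch_pc, fall_through):
            return self.table.get(branch_pc, fall_through)

        def update(self, branch_pc, actual_target):
            self.table[branch_pc] = actual_target

    btb = BranchTargetBuffer()
    print(hex(btb.predict(0x102, 0x103)))         # no history yet -> 0x103
    btb.update(0x102, 0x105)                      # branch at 0x102 went to 0x105
    print(hex(btb.predict(0x102, 0x103)))         # now predicts 0x105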

3.3.3 Problems with RISC


• RISC processors have several potential weaknesses that must be addressed
in real, commercial applications.

• Memory bandwidth requirements for RISC processors are very high - an instruction bandwidth of several hundred Mbytes/s is typical.
• The fastest RISC machines also need a data word plus an instruction in the same clock, doubling the required memory bandwidth.
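• As a rough arithmetic check (the clock rate and word size are assumed figures, not measurements): issuing one 4-byte instruction per cycle at 100 MHz already needs 400 Mbytes/s of instruction bandwidth, and fetching a 4-byte data word in the same clock doubles it:

    # Rough bandwidth estimate (assumed 100 MHz clock, 4-byte words).
    clock_hz   = 100e6        # one instruction issued per cycle
    word_bytes = 4            # 32-bit instructions and data words

    instr_bw = clock_hz * word_bytes             # instruction-fetch bytes/s
    total_bw = instr_bw + clock_hz * word_bytes  # plus one data word per cycle
    print(instr_bw / 1e6, total_bw / 1e6)        # 400.0 800.0 Mbytes/s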

• Normal semiconductor main memories cannot provide such bandwidth, so RISC processors need cache memories for instructions and data.

• Some RISC implementations use separate instruction and data buses - a Harvard architecture.

• The separate instruction and data caches may be internal or external to the
processor - normally use on-chip caches for speed

• Floating-point support is often lacking in RISC processors - use either a separate memory-mapped coprocessor or an on-chip FPU.

• The debate continues as to whether RISC or CISC offers the better approach to processor design. Critics of the RISC philosophy point to the following objections:
– Because of the small instruction set, operations that might take only a few instructions on a conventional machine require complex subroutines.
– RISCs require larger programs than CISCs for equivalent problems, so more storage is needed and the likelihood of page faults increases.
– Although RISC computers have simpler hardware, their compilers must be correspondingly more complex to achieve high performance.

• RISC machines have significant advantages in implementing virtual
memory:-
– only instruction fetches, loads, and stores can cause page faults that require swapping in a new page
– in any of these cases it is relatively easy to back up execution and restart the instruction after the required page has been brought in, because the instructions already executed have no side-effects that must also be undone.
• While RISC offers significant design advantages it has not taken over entirely from CISC. Today many manufacturers combine both RISC and CISC design philosophies in their processor designs.
• E.g. Intel processors, from the 486 series onwards, contain a RISC core for the simplest and most common instructions; more complex instructions are still handled in CISC fashion. The reason is backward compatibility with older CISC processors.
