
CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

MIPS PROCESSOR IMPLEMENTATION



A graduate project submitted in partial fulfillment of the requirements

For the degree of Master of Science

In Electrical Engineering

By

Raghunandan Srinidhi








May 2012


The graduate project of Raghunandan Srinidhi is approved:








Ali Amini, Ph.D. Date











Shahnam Mirzaei, Ph.D. Date









Ramin Roosta, Ph.D., Chair Date




















California State University, Northridge


Acknowledgement


I would like to dedicate this project to my family for their unconditional love and
great care throughout my life. In addition, I would like to thank Prof. Ramin
Roosta for his unlimited support and advisement.



































Table of Contents

Signatures
Acknowledgement
List of Figures
List of Tables
Abstract

CHAPTER 1: INTRODUCTION

1.1 MIPS Multicycle Processor

1.2 Design Environment


CHAPTER 2: MULTICYCLE PROCESSOR DESIGN

2.1 MIPS subset for implementation

2.2 Single Cycle Datapath

2.3 Analyzing Performance of Single Cycle Datapath

2.4 Clock Period Comparisons

2.5 Multicycle Datapath

2.6 Multicycle CPU with control

2.7 Operation Flow


CHAPTER 3: DESIGN IMPLEMENTATION

3.1 ALU

3.2 Memory

3.3 Control

3.3.1 Micro operations and control groups

3.3.2 Control signal generation stages

3.3.3 Controller Implementation

3.4 Datapath

3.4.1 Instruction Fetch

3.4.2 Instruction Decode

3.4.3 Execution

3.4.4 Memory Writeback

3.4.5 Datapath

3.5 Datapath and control

3.6 Processor and Memory


CHAPTER 4: RESULTS OF SIMULATION AND SYNTHESIS

4.1 Simulation with a Program in memory

4.2 Synthesis Report


CHAPTER 5: CONCLUSION AND FUTURE ENHANCEMENTS

APPENDIX - VHDL Codes of Multi cycle CPU

REFERENCES

List of Figures


Fig 2.1: Single cycle design
Fig 2.2: Clock Period in single cycle design
Fig 2.3: Clock Period in Multicycle design
Fig 2.4: Unbalanced delay in Multicycle design
Fig 2.5: Multicycle Datapath
Fig 2.6: Multicycle Datapath with Control
Fig 2.7: Operation flow for the instructions
Fig 3.1: Memory Block Diagram
Fig 3.2: Internal view of the memory
Fig 3.3: Inferred 256 x 8 memory
Fig 3.4: Top Level Block diagram of Controller
Fig 3.5: Control signal generation stages
Fig 3.6: Sequence Controller Block Diagram
Fig 3.7: ALU Controller Block Diagram
Figure 3.8: Instruction Fetch Block
Figure 3.9: Instruction Decode Block
Figure 3.10: Instruction Execution Block
Figure 3.11: Memory Writeback Block
Figure 3.12: Data path Block
Figure 3.12: Data path and Control Unit
Figure 3.13: Processor Sub modules
Figure 3.14: Completed processor
Figure 4.1: Results in memory after simulation
Figure 4.2: Memory before and after simulation
Figure 4.3: Synthesis Report
List of Tables


Table 2.1: R-type Instruction Field
Table 2.2: Load and Store Instruction Field
Table 2.3: Branch Instruction Field
Table 2.4: Jump Instruction Field
Table 2.5: Component delays
Table 3.1: PC group
Table 3.2: Memory group
Table 3.3: Register file group
Table 3.4: ALU group
Table 3.5: State transition table for Sequence Controller
Table 3.6: Signals for ALU controller Implementation
Table 4.1: Instructions in memory before simulation
Table 4.2: Data in memory before simulation
Table 4.3: Expected values in memory

Abstract

MIPS PROCESSOR IMPLEMENTATION


By


Raghunandan Srinidhi


Master of Science in Electrical
Engineering


Computers and computer systems are a pervasive part of the modern world. Besides the
common desktop PC, there are many other types of specialized computer systems. The
central component of all of these systems is the microprocessor, or CPU (Central
Processing Unit), which is essentially the brain of the computer system.

The scope of the project was to implement a multicycle Central Processing Unit (CPU) in
Very-High-Speed Integrated Circuit (VHSIC) Hardware Description Language, commonly
known as VHDL. The implementation was carried out to understand the development of
processor hardware, because the design and customization of embedded processors has
become a mainstream task in the development of complex SoCs (Systems-on-Chip).



















CHAPTER 1: INTRODUCTION


1.1 MIPS Multicycle Processor


The multicycle approach breaks instructions down into multiple steps, each designed to
take one clock cycle. This allows each functional block to be used more than once per
instruction, provided the uses occur in different clock cycles.

This implementation has several key advantages over a single cycle implementation.
First, it can share modules, which allows fewer hardware components to be used: instead
of an ALU and two dedicated adders, the multicycle implementation uses a single ALU,
and a single memory holds both the data and the instructions. Second, breaking complex
instructions into steps allows a considerably shorter clock period (a higher clock
frequency), because the clock no longer has to be based on the instruction that takes the
longest to execute.

The multicycle implementation also uses several registers to temporarily hold the outputs
of the previous clock cycle. These include the Instruction Register, the Memory Data
Register and the ALU output register.

The multicycle machine breaks each instruction into a series of steps, typically:
1. Instruction fetch step
2. Instruction decode and Register fetch step
3. Execution, memory address computation, or branch completion step
4. Memory access or R-type instruction completion step
5. Memory read completion step

During the instruction fetch step, the multicycle processor fetches the instruction from
memory and computes the address of the next instruction by incrementing the program
counter (PC). During the second step, the instruction decode and register fetch step, the
instruction is decoded to determine its type (memory access, R-type, I-type or branch)
and the source registers are read. The third step, the execution, memory address
computation or branch completion step, works differently depending on the type of
instruction being executed. For a memory access instruction, the ALU computes the
memory address. An R-type instruction uses this third step to perform the actual
arithmetic. For branch and jump instructions, the third step is the last step: the next PC
address is computed and stored in the PC. The fourth step takes place only for load word,
store word and arithmetic-logical (R-type and I-type) instructions. In this step, a load or
store word instruction accesses memory, and an arithmetic-logical instruction writes its
result. Values are either read from memory and stored into the memory data register, or
read from a register and stored into memory. The fourth step is the last step for store word
and for arithmetic-logical instructions; for R-type and I-type instructions, it is the step
where the result of the ALU computation is stored into the destination register. Only load
instructions need the fifth step, the memory read completion step, in which the value of the
memory data register is stored back into the register file.
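As a minimal illustration (a Python software model, not the VHDL of this project), the step sequences described above can be tabulated per instruction class. The identifiers STEPS, STEP_COUNT and steps_for are hypothetical labels chosen for this sketch.

```python
# Hypothetical software model: which generic multicycle steps each class uses.
STEPS = (
    "fetch",              # 1. instruction fetch, PC = PC + 4
    "decode",             # 2. instruction decode and register fetch
    "execute",            # 3. execution, address computation, or branch/jump completion
    "memory_or_rtype",    # 4. memory access or R-type completion
    "mem_read_complete",  # 5. memory read completion (loads only)
)

# Number of steps per instruction class, as described in Section 1.1.
STEP_COUNT = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}


def steps_for(kind: str) -> list[str]:
    """Return the ordered steps executed by an instruction of the given class."""
    return list(STEPS[: STEP_COUNT[kind]])


for kind in STEP_COUNT:
    print(f"{kind:7s}{len(steps_for(kind))} cycles: {' -> '.join(steps_for(kind))}")
```

Because every class shares the first two steps and only lw uses all five, the number of clock cycles per instruction varies from three (beq, j) to five (lw).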

All of these steps are orchestrated by the controller, which acts as the brain of the
multicycle CPU. The controller is a finite state machine that uses the opcode to walk the
rest of the components through the different steps, or states. It controls when each
register is allowed to write and which operation the ALU performs.


1.2 Design Environment


Xilinx ISE Design Suite 13.2 is used as the environment for compilation and synthesis,
and the ISim simulator is used for simulation. The behavior of the design is described in
Very-High-Speed Integrated Circuit (VHSIC) Hardware Description Language, commonly
known as VHDL. The modules are designed using a bottom-up methodology: the lowest
level of the hierarchy is tested first, so that errors are eliminated before they can
propagate into the top-level design.


























CHAPTER 2: MULTICYCLE PROCESSOR DESIGN


2.1 MIPS subset for implementation

The designed multicycle CPU handles nine instructions. Five of these are R-type
instructions: add, subtract, and, or, and set on less than. Three are I-type instructions:
load word, store word, and branch on equal. The jump instruction (J-type) is also
supported. All instructions are 32 bits wide.

1. Arithmetic logic instructions [R-format]

The instructions of this class all read two registers, perform an ALU operation and write
back the result; these arithmetic-logical instructions are also called R-type instructions.
The class includes add, sub, and, or and slt.
The 32 registers of the processor are stored in a register file. To read two data words, the
register file needs two inputs and two outputs. The inputs are 5 bits wide and specify the
numbers of the registers to be read; the outputs are 32 bits wide and carry the values of
those registers. To write the result back, two inputs are needed: one specifies the register
number and the other supplies the data to be written. An ALU with two data inputs
processes the data read from the register file.

The instruction format of an R-type instruction is shown in Table 2.1.


Example: add $t0, $s1, $s2


op       rs      rt      rd      shamt   funct
000000   10001   10010   01000   00000   100000

Table 2.1: R-type Instruction Field


The meaning of the fields is:
op: basic operation
rs : first source register
rt : second source register
rd: destination register
shamt: shift amount
funct: function
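The following Python sketch packs the six fields of Table 2.1 into a 32-bit word, assuming the standard MIPS register numbering ($t0 = 8, $s1 = 17, $s2 = 18). The function name encode_rtype is hypothetical and only serves to check the example encoding.

```python
def encode_rtype(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Return the 32-bit encoding of an R-type instruction."""
    # Each field must fit its width: op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits.
    for value, width in ((op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $t0, $s1, $s2  ->  op=0, rs=$s1 (17), rt=$s2 (18), rd=$t0 (8), shamt=0, funct=100000
word = encode_rtype(0, 17, 18, 8, 0, 0b100000)
print(f"{word:032b}")   # 00000010001100100100000000100000
print(f"0x{word:08X}")  # 0x02324020
```

Each field is shifted to its bit position (op at bit 26, rs at 21, rt at 16, rd at 11, shamt at 6) and OR-ed in, so the binary string matches the row of Table 2.1.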



2. Load and store instructions [I-format]

The lw and sw instructions compute a memory address by adding a register value to the
16-bit signed offset field contained in the instruction. Because the ALU operates on 32-bit
values, the offset field must be sign-extended from 16 to 32 bits, which is done simply by
replicating the sign bit 16 times in front of the original value.
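A minimal Python sketch of the sign extension and address computation described above follows. The function names are hypothetical, and the base value 0x100 used in the example is arbitrary, not a value from the report.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Replicate bit 15 of a 16-bit field into bits 31..16 (unsigned 32-bit result)."""
    imm &= 0xFFFF
    if imm & 0x8000:              # sign bit set: fill the upper half with ones
        return imm | 0xFFFF0000
    return imm                    # sign bit clear: the upper half stays zero


def effective_address(base: int, imm: int) -> int:
    """Address used by lw/sw: register value + sign-extended offset, modulo 2**32."""
    return (base + sign_extend16(imm)) & MASK32


print(hex(sign_extend16(0x0020)))        # 0x20
print(hex(sign_extend16(0xFFFC)))        # 0xfffffffc, i.e. -4
print(hex(effective_address(0x100, 32)))  # 0x120
print(effective_address(0x100, 0xFFFC))   # 252, i.e. 0x100 - 4
```

Adding the sign-extended value modulo 2^32 is what makes a negative offset behave as a subtraction.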

The instruction format of a lw or sw instruction is shown in Table 2.2.

Example: lw $t0, 32($s2)


op       rs      rt      offset (16 bits)
100011   10010   01000   0000000000100000

Table 2.2: Load and Store Instruction Field

3. Branch instruction [I-format]

The beq instruction has three operands: two registers that are compared for equality, and
a 16-bit offset used to compute the branch target address relative to the branch instruction
address. The datapath for the branch instruction must perform two operations: compare the
register contents and compute the branch target.
The offset field of the branch instruction must be sign-extended from 16 to 32 bits and
shifted left by 2 bits, because it specifies a word offset. The branch target address is
computed by adding the address of the next instruction (PC + 4) to the resulting offset.

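The instruction format of a branch instruction is shown in Table 2.3.

Example: beq $t8, $t9, 16 (16 is the target address; for a beq located at address 8, as in the
program of Chapter 4, the encoded word offset is (16 - (8 + 4)) / 4 = 1)


op       rs      rt      offset (16 bits)
000100   11000   11001   0000000000000001

Table 2.3: Branch Instruction Field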
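The branch target computation can be checked with a short Python sketch. It assumes, as in the test program of Chapter 4, that the beq instruction sits at address 0x008; branch_target and branch_offset are hypothetical helper names.

```python
def sign_extend16(field: int) -> int:
    """Interpret a 16-bit field as a signed integer."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def branch_target(pc: int, offset16: int) -> int:
    """Target = (PC + 4) + (sign-extended offset << 2), kept to 32 bits."""
    return ((pc + 4) + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def branch_offset(pc: int, target: int) -> int:
    """16-bit word offset an assembler would encode for the given target."""
    delta = target - (pc + 4)
    if delta % 4:
        raise ValueError("branch target must be word aligned")
    return (delta // 4) & 0xFFFF


print(branch_offset(0x008, 16))  # 1
print(branch_target(0x008, 1))   # 16
```

The two functions are inverses of each other, which also holds for backward branches (negative offsets).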







4. Jump instruction [J-format]

The jump instruction is similar to the branch instruction, but it computes the target PC
differently and unconditionally. The destination address of a jump is formed by
concatenating the upper 4 bits of the current PC + 4 with the 26-bit address field of the
jump instruction and appending 00 as the two lowest bits.

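The instruction format of a jump instruction is shown in Table 2.4.

Example: j 24 (the 26-bit field holds the word address 6, which becomes the byte address 24
once 00 is appended)


op       address (26 bits)
000010   00000000000000000000000110

Table 2.4: Jump Instruction Field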
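A corresponding Python sketch for the jump target follows, assuming (as in Table 4.1) that j 24 is located at address 0x010; jump_target is a hypothetical helper name.

```python
def jump_target(pc: int, addr26: int) -> int:
    """Upper 4 bits of PC + 4, followed by the 26-bit field, followed by 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    return (pc_plus_4 & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)


print(jump_target(0x010, 0b110))            # 24
print(hex(jump_target(0x10000000, 0x10)))   # 0x10000040
```

Because only the upper 4 bits come from the PC, a jump can only reach targets within the same 256 MB region as PC + 4.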




























2.2 Single Cycle Datapath

The single cycle datapath executes every instruction in one clock cycle. This means that no
datapath element can be used more than once per instruction, so any element that is
needed more than once must be duplicated. Elements can, however, be shared between
different instruction flows; such an element must then accept inputs from several sources,
which is commonly done with a multiplexer.
Figure 2.1 shows the combined datapath, including one memory for instructions and one
for data, the ALU, the PC unit and the multiplexers mentioned above.








Fig 2.1: Single cycle design








After David A. Patterson and John L. Hennessy, Computer Organization and Design, 4th Edition

2.3 Analyzing Performance of Single Cycle Datapath

Assuming the component delays listed in Table 2.5, the performance of the single cycle
datapath can be analyzed as follows. The delays of registers and multiplexers are taken as
negligible (0).


Component         Delay
Register          0
Adder             $t_+$
ALU               $t_A$
Multiplexer       0
Register file     $t_R$
Program memory    $t_I$
Data memory       $t_M$

Table 2.5: Component delays


Datapath delay for R-type instructions {add, sub, and, or, slt}, whose ALU result is written
directly back to the register file, in the single cycle implementation:

$$T_{\text{R-type}} = \max(t_+,\; t_I + t_R + t_A + t_R)$$

Delay for sw:

$$T_{\text{sw}} = \max(t_+,\; t_I + t_R + t_A + t_M)$$

Delay for lw:

$$T_{\text{lw}} = \max(t_+,\; t_I + t_R + t_A + t_M + t_R)$$

Delay for beq:

$$T_{\text{beq}} = \max(t_+ + t_+,\; t_I + t_+,\; t_I + t_R + t_A)$$

Delay for j:

$$T_{\text{j}} = \max(t_I,\; t_+)$$

Hence the critical path delay of the single cycle design in Fig 2.1 is

$$T_{\text{crit}} = \max(t_I + t_R + t_A + t_M + t_R,\; t_I + t_+,\; t_+ + t_+).$$

The above expression shows that the clock period, and therefore the performance of every
instruction, is dictated by the slowest instruction.
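To make the expressions concrete, the Python sketch below evaluates them with hypothetical component delays in picoseconds; these numbers are illustrative assumptions, not measurements of this design. The R-type expression is omitted because its delay never exceeds that of lw.

```python
# Hypothetical component delays in picoseconds (illustrative assumptions only).
t_I, t_R, t_A, t_M, t_add = 200, 100, 200, 200, 100

# Per-instruction delays of the single cycle datapath (expressions of Section 2.3).
delays = {
    "sw":  max(t_add, t_I + t_R + t_A + t_M),
    "lw":  max(t_add, t_I + t_R + t_A + t_M + t_R),
    "beq": max(t_add + t_add, t_I + t_add, t_I + t_R + t_A),
    "j":   max(t_I, t_add),
}

# The single clock period must accommodate the slowest instruction.
clock_period = max(delays.values())
slowest = max(delays, key=delays.get)
print(delays)                       # {'sw': 700, 'lw': 800, 'beq': 500, 'j': 200}
print(slowest, clock_period, "ps")  # lw 800 ps
```

With these values even a jump, whose own path needs only 200 ps, must wait the full 800 ps clock period set by lw.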

Other problems with the single cycle design include poor resource utilization, and some
instructions cannot be implemented this way at all. For example, an instruction that
transfers data from or to multiple memory locations cannot be implemented with a single
cycle approach.










2.4 Clock Period Comparisons






Fig 2.2: Clock Period in single cycle design




Fig 2.3: Clock Period in Multicycle design


Fig 2.2 and Fig 2.3 illustrate the clock periods in the single cycle and multicycle designs. In
the single cycle design, the clock period depends on the longest instruction, whereas in the
multicycle design it depends on the longest delay of any major functional block. The
latency of the lw instruction increases slightly in the multicycle approach because time is
quantized into whole clock periods.
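The comparison can be sketched in Python with hypothetical delays (assumed values in picoseconds, not data from this report). The multicycle cycle counts follow the step sequences of Section 1.1: five for lw, four for sw and R-type, three for beq and j.

```python
# Hypothetical delays in picoseconds (assumptions, not measurements of this design).
T_I, T_R, T_A, T_M = 200, 100, 200, 200  # program memory, register file, ALU, data memory

# Single cycle: one clock covers the longest instruction path (lw).
single_period = T_I + T_R + T_A + T_M + T_R
# Multicycle: one clock covers the slowest functional unit used in a single step.
multi_period = max(T_I, T_R, T_A, T_M)

# Clock cycles per instruction class in the multicycle design.
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}


def latency(kind: str) -> tuple[int, int]:
    """Return (single cycle latency, multicycle latency) in picoseconds."""
    return single_period, CYCLES[kind] * multi_period


for kind in CYCLES:
    single, multi = latency(kind)
    print(f"{kind:6s} single = {single} ps, multicycle = {multi} ps")
```

With these numbers lw takes 1000 ps instead of 800 ps (the quantization effect), while beq and j finish in 600 ps. If a single unit were much slower, for example T_M = 500, multi_period would rise to 500 ps and every instruction would slow down.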


After David A. Patterson and John L. Hennessy, Computer Organization and Design, 4th Edition

As long as the delays of the functional units are similar, the multicycle design performs
better. If there is a wide disparity in the delays of the major functional units, the multicycle
implementation may perform poorly, as shown in Fig 2.4, because every clock period must
accommodate the slowest unit.







Fig 2.4: Unbalanced delay in Multicycle design




















After David A. Patterson and John L. Hennessy, Computer Organization and Design, 4th Edition


2.5 Multicycle Datapath

To avoid the disadvantages of the single cycle implementation described in the previous
sections, a multicycle implementation is used.
This technique divides each instruction into steps, and each step is executed in one clock
cycle. Fig 2.5 shows the design of a multicycle datapath.





Fig 2.5: Multicycle Datapath

Compared with the single-cycle datapath, the differences are that only one memory unit is
used for both instructions and data, a single ALU replaces the ALU and the two adders,
and several output registers are added to hold the output value of a unit until it is used in
a later clock cycle.

The instruction register (IR) and the memory data register (DR) are added to save the
output of the memory. Registers A and B hold the operands read from the register file,
and the Res register holds the output of the ALU.
With the exception of the IR, all these registers hold data only between a pair of adjacent
clock cycles. Because the IR must hold its value for the entire execution of an instruction,
it requires a write control signal.




After David A. Patterson and John L. Hennessy, Computer Organization and Design, 4th Edition


Reducing the former ALU and two adders to a single ALU causes the following changes in
the datapath:
• An additional multiplexer is added at the first ALU input to choose between the A
register and the PC.
• The multiplexer at the second ALU input is changed from a two-way to a four-way
multiplexer. The two new inputs are a constant 4, to increment the PC, and the
sign-extended and shifted offset field for the branch instruction.
Handling branches and jumps requires further additions to the datapath.

Three different values can be written into the PC, depending on whether the processor is
fetching the next sequential instruction, completing a branch, or completing a jump:
1. The output of the ALU, which is PC + 4, is stored directly into the PC.
2. The contents of register Res, after the branch target address has been computed.
3. The lower 26 bits of the IR, shifted left by two and concatenated with the upper 4 bits
of the incremented PC, when the instruction is a jump.

For a branch instruction, the PC write is conditional: the computed branch address is
written to the PC only if the two compared registers are equal.

Therefore the PC needs two write signals: PWu, for an unconditional write (the value is
PC + 4 or a jump target), and PWc, for a conditional write. The Zero output of the ALU is
ANDed with PWc, and the result is ORed with PWu to form the write control signal PW of
the program counter.
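The PC write-enable logic is a single Boolean expression; the Python sketch below (pc_write is a hypothetical name) prints its full truth table.

```python
def pc_write(pwu: bool, pwc: bool, zero: bool) -> bool:
    """PC write enable: PW = PWu OR (Zero AND PWc)."""
    return pwu or (zero and pwc)


print("PWu PWc Zero | PW")
for pwu in (False, True):
    for pwc in (False, True):
        for zero in (False, True):
            print(f" {int(pwu)}   {int(pwc)}   {int(zero)}   |  {int(pc_write(pwu, pwc, zero))}")
```

During a beq, PWu = 0 and PWc = 1, so the PC is updated only in the rows where Zero = 1, i.e. when A - B = 0.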






















2.6 Multicycle CPU with control


Figure 2.6 shows the complete datapath of the multicycle implementation, including the
control unit.







Fig 2.6: Multicycle Datapath with Control









After David A. Patterson and John L. Hennessy, Computer Organization and Design, 4th Edition

2.7 Operation Flow

Figure 2.7 illustrates the operation flow for instructions in the multicycle implementation.
The execution of an instruction is broken into clock cycles, which means that each
instruction is divided into a series of steps.





Fig 2.7: Operation flow for the instructions


CHAPTER 3: DESIGN IMPLEMENTATION

3.1 ALU

The arithmetic-logic unit (ALU) performs basic arithmetic and logic operations, selected by
its operation code. The result is written to the output, and an additional Zero output is
driven high when the result equals zero.
At present, the ALU supports the arithmetic operations add and sub and the operations
and, or and slt. The inputs are 32 bits wide, of type unsigned.
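A behavioral Python sketch of the ALU follows. It uses the 3-bit OP codes listed in Table 3.6 (000 and, 001 or, 010 add, 110 sub, 111 slt). For slt the sketch compares the operands as signed 32-bit values, following MIPS semantics; this is an assumption, since the VHDL inputs are declared unsigned and that comparison may differ for negative operands.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op: int, a: int, b: int) -> tuple[int, bool]:
    """Return (32-bit result, Zero flag) for a 3-bit ALU operation code."""
    a &= MASK32
    b &= MASK32
    if op == 0b000:      # and
        result = a & b
    elif op == 0b001:    # or
        result = a | b
    elif op == 0b010:    # add
        result = (a + b) & MASK32
    elif op == 0b110:    # sub (also used by beq to compare A and B)
        result = (a - b) & MASK32
    elif op == 0b111:    # slt: 1 if a < b (signed), else 0
        result = 1 if to_signed32(a) < to_signed32(b) else 0
    else:
        raise ValueError(f"unsupported ALU operation {op:03b}")
    return result, result == 0


print(alu(0b010, 0x10, 0x10))  # (32, False)
print(alu(0b110, 0x10, 0x10))  # (0, True), the case in which beq branches
```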

3.2 Memory

Data is written to and read from the memory synchronously over a 32-bit data bus. The
memory consists of four RAM blocks, each 8 bits wide.
A control signal enables writing; otherwise, data is only read. To store a data word, the
word is divided into four bytes, which are written separately to the four RAM blocks.
Conversely, when reading, the four bytes are concatenated to reassemble the data word.

Currently, only whole data words can be read and written; addressing half-words or single
bytes is not supported. Because every word access must select all four RAM blocks, the
lowest two address bits are not examined by the chip-select logic. The MIPS processor
generates 32-bit addresses, whereas each RAM block has an 8-bit address. All RAM
blocks are connected to the same address lines, namely mem_Adr(9 downto 2). Since the
full address width is not used for addressing and chip selects, each data word can be
reached through multiple addresses.
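The word-only addressing scheme can be modelled in Python as below. The class name WordMemory is hypothetical, and the byte order (RAM block 0 holding the most significant byte) is an assumption; the report does not state which block holds which byte.

```python
class WordMemory:
    """Four 256 x 8 RAM blocks that share one 8-bit address, mem_Adr(9 downto 2)."""

    def __init__(self) -> None:
        self.banks = [bytearray(256) for _ in range(4)]

    @staticmethod
    def index(addr: int) -> int:
        # Bits 1..0 are ignored (word access only); bits above 9 are not decoded.
        return (addr >> 2) & 0xFF

    def write(self, addr: int, word: int) -> None:
        i = self.index(addr)
        for k in range(4):
            # Assumption: bank 0 stores the most significant byte.
            self.banks[k][i] = (word >> (24 - 8 * k)) & 0xFF

    def read(self, addr: int) -> int:
        i = self.index(addr)
        word = 0
        for k in range(4):
            word = (word << 8) | self.banks[k][i]  # concatenate the four bytes
        return word


mem = WordMemory()
mem.write(960, 0x00000010)
print(hex(WordMemory.index(960)))  # 0xf0
print(hex(mem.read(960)))          # 0x10
print(hex(mem.read(960 + 1024)))   # 0x10, the same word reached through an alias
```

Since only address bits 9 to 2 are used, byte addresses 960 and 1984 map to the same RAM location F0h.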

The Block diagram of the memory with its I/O interface is shown in Fig 3.1



Fig 3.1: Memory Block Diagram


Fig 3.2 illustrates the internal view of the memory with four instantiated 256 x 8 RAM
modules.



Fig 3.2: Internal view of the memory

Fig 3.3 shows the inferred 256 x 8 memory.

Fig 3.3: Inferred 256 x 8 memory

3.3 Control

The control unit is divided into two parts:

• Sequence Controller
• ALU Controller

Fig 3.4 illustrates the top-level view of the controller, which takes specific bit fields of the
instruction as inputs and generates the required control signals. For branch instructions,
the controller must also take the Zero output of the ALU as an input.





Fig 3.4: Top Level Block diagram of Controller

3.3.1 Micro operations and control groups

A number of micro operations are identified, and control groups are defined, to simplify
the task of setting the control signals. For example, PC = PC + 4 is a micro operation that
belongs to the program counter group: the PC-related signals are given the values
required to accomplish it, and it is given the short name PCinc for ease of identification.
The same procedure is carried out for the other control groups, namely the memory
group, the register file group and the ALU group, as illustrated in Tables 3.1 to 3.4.


Operation                          Micro-op  PWu  PWc  Psrc
PC = PC + 4                        PCinc     0    X    X
if (A == B) PC = Res               branch    0    1    0
PC = PC[31-28] || s2(IR[25-0])     jump      1    X    2

Table 3.1: PC group

Expression for the PC write enable: $PW = PW_u + Z \cdot PW_c$


Operation          Micro-op  MW  MR  IorD  IW
IR = Mem[PC]       fetch     0   1   0     1
DR = Mem[Res]      memLoad   0   1   1     0
Mem[Res] = B       m_wr      1   0   1     0

Table 3.2: Memory group


Operation               Micro-op   RW  Rdst  M2R  AW  BW
A = RF[IR[25-21]]       rs2A       0   X     X    1   0
B = RF[IR[20-16]]       rt2B       0   X     X    0   1
RF[IR[15-11]] = Res     reg_state  1   1     0    0   0
RF[IR[20-16]] = DR      mem_done   1   0     1    0   0

Table 3.3: Register file group


Operation                        Micro-op  opc  Asrc1  Asrc2
PC = PC + 4                      PCinc     0    0      1
Res = A op B                     alu_exec  2    1      0
Res = A + sx(IR[15-0])           memAddr   0    1      2
Res = PC + s2(sx(IR[15-0]))      Paddr     0    0      3
if (A == B) PC = Res             branch    1    1      0


Table 3.4: ALU group

3.3.2 Control signal generation stages

Fig 3.5 shows the flow of each instruction through the different clock cycles. At each
stage, the micro operation name identifies the required control signal values, as listed in
Tables 3.1 to 3.4 in the previous section.





Fig 3.5: Control signal generation stages


3.3.3 Controller Implementation


The control of the processor is realized by a finite state machine. The input to the state
machine is the opcode field, i.e. the upper 6 bits of the instruction (IR[31-26]).
The outputs of the state machine are the control signals for the individual functional units
of the processor, in particular the select signals of the datapath multiplexers.



State         R-class      sw           lw           beq          j
fetch_inst    decode_inst  decode_inst  decode_inst  decode_inst  decode_inst
decode_inst   alu_exec     mem_Addr     mem_Addr     branch       jump
alu_exec      reg_state    X            X            X            X
reg_state     fetch_inst   X            X            X            X
mem_Addr      X            mem_store    mem_load     X            X
mem_store     X            fetch_inst   X            X            X
mem_load      X            X            mem_done     X            X
mem_done      X            X            fetch_inst   X            X
branch        X            X            X            fetch_inst   X
jump          X            X            X            X            fetch_inst

Table 3.5: State transition table for Sequence Controller













The sequence controller is implemented as an FSM using the state transition table
(Table 3.5), and the outputs in each state are set using the values in Tables 3.1 to 3.4.
The block diagram of the sequence controller is shown in Fig 3.6.
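The next-state function of Table 3.5 can be transcribed into a short Python sketch. The opcode values are the MIPS opcodes used in Chapter 2 and Table 4.1 (R-type 000000, lw 100011, sw 101011, beq 000100, j 000010); the X (don't-care) entries of the table are not checked, and the names next_state and trace are hypothetical.

```python
# Instruction class selected by the opcode field IR[31-26].
OPCODE_CLASS = {0b000000: "R", 0b100011: "lw", 0b101011: "sw", 0b000100: "beq", 0b000010: "j"}

# Row decode_inst of Table 3.5.
DECODE_NEXT = {"R": "alu_exec", "sw": "mem_Addr", "lw": "mem_Addr", "beq": "branch", "j": "jump"}


def next_state(state: str, opcode: int) -> str:
    """Next state of the sequence controller for the current state and opcode."""
    kind = OPCODE_CLASS[opcode]
    if state == "fetch_inst":
        return "decode_inst"
    if state == "decode_inst":
        return DECODE_NEXT[kind]
    if state == "alu_exec":
        return "reg_state"
    if state == "mem_Addr":
        return "mem_store" if kind == "sw" else "mem_load"
    if state == "mem_load":
        return "mem_done"
    if state in ("reg_state", "mem_store", "mem_done", "branch", "jump"):
        return "fetch_inst"
    raise ValueError(f"unknown state {state}")


def trace(opcode: int) -> list[str]:
    """States visited by one instruction, starting at fetch_inst."""
    states = ["fetch_inst"]
    while True:
        nxt = next_state(states[-1], opcode)
        if nxt == "fetch_inst":
            return states
        states.append(nxt)


print(trace(0b100011))  # ['fetch_inst', 'decode_inst', 'mem_Addr', 'mem_load', 'mem_done']
```

The length of each trace is the number of clock cycles the instruction takes: five for lw, four for sw and R-type, three for beq and j.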





Fig 3.6: Sequence Controller Block Diagram


The ALU operation codes are stored in a truth table. The ALU controller produces the
corresponding operation code from the opc signal of the state machine and the lower 6 bits
of the instruction (the funct field), which specify which arithmetic or logic operation an
R-type instruction uses.


Fig 3.7: ALU Controller Block Diagram








Table 3.6 gives the select outputs (OP) required to control the operation of the ALU.

The inputs to the ALU controller are the opc bits from the sequence controller block and
the function field of the instruction, as shown in Fig 3.7.



TYPE     opc   ins   FUNC     Action      OP
R-type   10    add   100000   add         010
R-type   10    sub   100010   sub         110
R-type   10    and   100100   and         000
R-type   10    or    100101   or          001
R-type   10    slt   101010   setOnLess   111
SW       00    sw    xxxxxx   add         010
LW       00    lw    xxxxxx   add         010
beq      01    beq   xxxxxx   sub         110
j        xx    j     xxxxxx   xxx         xxx

Table 3.6: Signals for ALU controller Implementation
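A Python sketch of the ALU controller truth table in Table 3.6 follows; alu_control is a hypothetical name, and opc/funct combinations not listed in the table are treated as errors rather than don't-cares.

```python
# R-type funct field -> 3-bit ALU operation code (Table 3.6).
FUNCT_TO_OP = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(opc: int, funct: int) -> int:
    """Return the ALU operation code for the opc from the sequencer and the funct field."""
    if opc == 0b00:   # lw/sw address computation and PC + 4: add
        return 0b010
    if opc == 0b01:   # beq comparison: subtract
        return 0b110
    if opc == 0b10:   # R-type: decode the lower 6 bits of the instruction
        try:
            return FUNCT_TO_OP[funct & 0x3F]
        except KeyError:
            raise ValueError(f"unsupported funct {funct:06b}") from None
    raise ValueError(f"unsupported opc {opc:02b}")


print(f"{alu_control(0b10, 0b101010):03b}")  # 111 (slt)
print(f"{alu_control(0b00, 0):03b}")         # 010 (add for lw/sw)
```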

















3.4 Data path

The datapath is divided into four sections that follow the stage structure of a pipelined
processor: Instruction Fetch, Instruction Decode, Execution and Memory Writeback.

3.4.1 Instruction Fetch

The Instruction Fetch block contains the PC, the Instruction Register and the Memory
Data Register.
This part provides the data and instructions from the memory.

Fig 3.8 shows the block view of the Instruction Fetch module.



Figure 3.8: Instruction Fetch Block











3.4.2 Instruction Decode

The Instruction Decode block decodes the instruction held in the Instruction Register,
addresses the register file, and computes the second ALU operand (the sign-extended
offset) for a branch instruction or an sw or lw instruction. Its subcomponents are the
register file, register A and register B.

Fig 3.9 shows the Block view of the Instruction decode module.



Figure 3.9: Instruction Decode Block













3.4.3 Instruction Execution

The Execution block contains the ALU as its main element and computes the desired
result of the instruction. It also computes the jump target address. The operands of the
ALU are chosen by two multiplexers with select signals Asrc1 and Asrc2. The ALU
provides the operations required by all nine supported instructions.

Fig 3.10 shows the Block view of the Instruction execute module.




Figure 3.10: Instruction Execution Block














3.4.4 Memory Writeback

The Memory Writeback block consists of the Res register and a multiplexer with select
signal Psrc. This block routes the result of the computation either back to memory or to
the register file. This stage also selects the value to be loaded into the PC.

Fig 3.11 shows the Block view of the Memory Writeback module.



Figure 3.11: Memory Writeback Block

3.4.5 Data Path

Fig 3.12 shows the Block view of the Data path module.



Figure 3.12: Data path Block




3.5 Datapath and control

The datapath and the control unit are combined to form the processing unit, as illustrated
in Fig 3.12.





Figure 3.12: Data path and Control Unit







3.6 Processor and Memory

The data plus control block, which represents the datapath and control, is combined with
the memory to complete the processor, as shown in Figure 3.13 and Figure 3.14.



Figure 3.13: Processor Sub modules



Figure 3.14: Completed processor


CHAPTER 4: RESULTS OF SIMULATION AND SYNTHESIS

4.1 Simulation with a Program in memory

The following instructions were written into the memory to verify the functionality of the
Processor.

In the Fields column, R-type instructions show op rs rt rd shamt funct, I-type instructions
show op rs rt immediate, and the jump shows op target.

Address  Instruction           Fields (binary)                          Hex
(hex)
000      lw $t8, 896($zero)    100011 00000 11000 0000001110000000      8C180380
004      lw $t9, 900($zero)    100011 00000 11001 0000001110000100      8C190384
008      beq $t8, $t9, 16      000100 11000 11001 0000000000000001      13190001
00C      UNDEFINED             UUUUUU UUUUU UUUUU UUUUU UUUUU UUUUUU
010      j 24                  000010 00000000000000000000000110        08000006
014      UNDEFINED             UUUUUU UUUUU UUUUU UUUUU UUUUU UUUUUU
018      and $t0, $t9, $t8     000000 11001 11000 01000 00000 100100    03384024
01C      sw $t0, 960($zero)    101011 00000 01000 0000001111000000      AC0803C0
020      lw $t1, 904($zero)    100011 00000 01001 0000001110001000      8C090388
024      or $t2, $t8, $t1      000000 11000 01001 01010 00000 100101    03095025
028      sw $t2, 964($zero)    101011 00000 01010 0000001111000100      AC0A03C4
02C      add $t3, $t8, $t9     000000 11000 11001 01011 00000 100000    03195820
030      sw $t3, 968($zero)    101011 00000 01011 0000001111001000      AC0B03C8
034      sub $t4, $t1, $t8     000000 01001 11000 01100 00000 100010    01386022
038      sw $t4, 972($zero)    101011 00000 01100 0000001111001100      AC0C03CC
03C      slt $t5, $t8, $t1     000000 11000 01001 01101 00000 101010    0309682A
040      sw $t5, 976($zero)    101011 00000 01101 0000001111010000      AC0D03D0

Table 4.1: Instructions in memory before simulation
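The hex column of Table 4.1 can be cross-checked with a small Python encoder. It assumes the standard MIPS register numbers ($t0 to $t5 = 8 to 13, $t8 = 24, $t9 = 25) and opcodes; the helper names are hypothetical, and the beq offset is derived from its target address 16.

```python
REG = {"$zero": 0, "$t0": 8, "$t1": 9, "$t2": 10, "$t3": 11, "$t4": 12, "$t5": 13,
       "$t8": 24, "$t9": 25}
FUNCT = {"add": 0x20, "sub": 0x22, "and": 0x24, "or": 0x25, "slt": 0x2A}
OPCODE = {"lw": 0x23, "sw": 0x2B, "beq": 0x04, "j": 0x02}


def r_type(name: str, rd: str, rs: str, rt: str) -> int:
    """op = 0 | rs | rt | rd | shamt = 0 | funct."""
    return (REG[rs] << 21) | (REG[rt] << 16) | (REG[rd] << 11) | FUNCT[name]


def i_type(name: str, rs: str, rt: str, imm: int) -> int:
    """op | rs | rt | 16-bit immediate."""
    return (OPCODE[name] << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (imm & 0xFFFF)


def j_type(target: int) -> int:
    """op | 26-bit word address (byte target shifted right by 2)."""
    return (OPCODE["j"] << 26) | ((target >> 2) & 0x03FFFFFF)


# (address, encoded word, hex value listed in Table 4.1)
program = [
    (0x000, i_type("lw", "$zero", "$t8", 896), 0x8C180380),
    (0x004, i_type("lw", "$zero", "$t9", 900), 0x8C190384),
    (0x008, i_type("beq", "$t8", "$t9", (16 - (0x008 + 4)) // 4), 0x13190001),
    (0x010, j_type(24), 0x08000006),
    (0x018, r_type("and", "$t0", "$t9", "$t8"), 0x03384024),
    (0x01C, i_type("sw", "$zero", "$t0", 960), 0xAC0803C0),
    (0x020, i_type("lw", "$zero", "$t1", 904), 0x8C090388),
    (0x024, r_type("or", "$t2", "$t8", "$t1"), 0x03095025),
    (0x028, i_type("sw", "$zero", "$t2", 964), 0xAC0A03C4),
    (0x02C, r_type("add", "$t3", "$t8", "$t9"), 0x03195820),
    (0x030, i_type("sw", "$zero", "$t3", 968), 0xAC0B03C8),
    (0x034, r_type("sub", "$t4", "$t1", "$t8"), 0x01386022),
    (0x038, i_type("sw", "$zero", "$t4", 972), 0xAC0C03CC),
    (0x03C, r_type("slt", "$t5", "$t8", "$t1"), 0x0309682A),
    (0x040, i_type("sw", "$zero", "$t5", 976), 0xAC0D03D0),
]

for addr, word, expected in program:
    status = "ok" if word == expected else "MISMATCH"
    print(f"{addr:03X}  {word:08X}  {status}")
```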

The following data was written to the memory, as listed in Table 4.2.

Byte address (hex)   Byte address (dec)   Contents (hex)
380                  896                  10
384                  900                  10
388                  904                  11

Table 4.2: Data in memory before simulation


The expected values as shown in Table 4.3 should be stored back into the memory.

Word address (hex)   Byte address (dec)   Contents (hex)   Description
F0                   960                  00000010         10 AND 10 = 10
F1                   964                  00000011         10 OR 11 = 11
F2                   968                  00000020         10 ADD 10 = 20
F3                   972                  00000001         11 SUB 10 = 01
F4                   976                  00000001         10 SLT 11 = 01 (true)

Table 4.3: Expected values in memory
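The expected results in Table 4.3 can be reproduced with a minimal instruction-level Python model of the test program. This is a hypothetical reference model, not the VHDL design: it works on byte addresses, treats the base register $zero as implicit in lw and sw, and stops when the PC reaches an address that holds no instruction.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


# Test program of Table 4.1 (branch and jump targets given as byte addresses).
PROGRAM = {
    0x000: ("lw", "$t8", 896),
    0x004: ("lw", "$t9", 900),
    0x008: ("beq", "$t8", "$t9", 0x010),
    0x010: ("j", 0x018),
    0x018: ("and", "$t0", "$t9", "$t8"),
    0x01C: ("sw", "$t0", 960),
    0x020: ("lw", "$t1", 904),
    0x024: ("or", "$t2", "$t8", "$t1"),
    0x028: ("sw", "$t2", 964),
    0x02C: ("add", "$t3", "$t8", "$t9"),
    0x030: ("sw", "$t3", 968),
    0x034: ("sub", "$t4", "$t1", "$t8"),
    0x038: ("sw", "$t4", 972),
    0x03C: ("slt", "$t5", "$t8", "$t1"),
    0x040: ("sw", "$t5", 976),
}

ALU_OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "slt": lambda a, b: int(to_signed(a) < to_signed(b)),
}


def run(memory: dict[int, int]) -> dict[int, int]:
    """Execute PROGRAM from address 0 and return the updated memory."""
    regs: dict[str, int] = {}
    pc = 0x000
    while pc in PROGRAM:
        ins = PROGRAM[pc]
        op = ins[0]
        next_pc = pc + 4
        if op == "lw":
            regs[ins[1]] = memory.get(ins[2], 0)
        elif op == "sw":
            memory[ins[2]] = regs[ins[1]]
        elif op == "beq":
            if regs[ins[1]] == regs[ins[2]]:
                next_pc = ins[3]
        elif op == "j":
            next_pc = ins[1]
        else:  # R-type: rd = rs op rt
            regs[ins[1]] = ALU_OPS[op](regs[ins[2]], regs[ins[3]])
        pc = next_pc
    return memory


mem = run({896: 0x10, 900: 0x10, 904: 0x11})
for addr in (960, 964, 968, 972, 976):
    print(addr, f"{mem[addr]:08X}")
```

If the words at 896 and 900 differed, beq would fall through to the undefined word at 0x00C and this model would stop there, which is why the test data uses equal values.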

The simulation starts at memory address 000 with a load word instruction, which writes the
value stored at memory address 896 into register $t8. The PC is incremented and the next
instruction, at memory address 004, is executed. It is also a load word instruction, which
loads the value stored at memory address 900 into register $t9.

A branch instruction follows, which compares the operands in registers $t8 and $t9.
Because the data stored at addresses 896 and 900 is the same, the two registers are equal,
so beq sets the program counter to address 16 (0x10), skipping the undefined word at
0x00C. A jump to address 24 (0x18) is then executed, skipping the undefined word at
0x014.

An and instruction follows, which ANDs the operands in registers $t9 and $t8 and puts the
result in $t0. A store word instruction then writes the content of register $t0 to memory
address 960 (0x3C0), which is word F0h of the RAM blocks.

A load word instruction then loads a new data value from address 904 (0x388) into
register $t1.
The following instructions are or, add, sub and slt. The result of each computation is
stored to memory by a store word instruction.













The memory window in Fig 4.1 shows the values in memory. The bytes of the first
instruction, distributed across the four separate memories, are highlighted in blue; the
other instructions are stored in the same way. The final result of the simulation is
highlighted in black.

Figure 4.1: Results in memory after simulation



Fig 4.2 shows the memory before and after simulation and illustrates the results as they
are stored back in memory.




Figure 4.2: Memory before and after simulation




4.2 Synthesis Report





Figure 4.3: Synthesis Report










CHAPTER 5: CONCLUSION AND FUTURE ENHANCEMENTS


5.1 Conclusion

The multicycle implementation of a CPU offers clear advantages over a single cycle
implementation. By breaking instructions into up to five steps, it uses fewer design
modules and allows a faster clock. These steps are controlled by a finite state machine
controller. The design implements a multicycle CPU capable of handling nine different
instructions from three categories, R-type, I-type and J-type, each with a different
instruction format. A step-by-step design methodology was used to break this large
project into many smaller steps. The project shows the wide variety of components and
design considerations involved in a multicycle processor implementation.


5.2 Future Enhancements

• Implement the processor and memory in hardware on a Xilinx Spartan-3E development
board to verify the behavior of the design.
• Introduce instruction pipelining to improve the performance of the processor.



















References


David A. Patterson and John L. Hennessy, Computer Organization and Design: The
Hardware/Software Interface, Fourth Edition (2006). Morgan Kaufmann Publishers, Inc.

Lectures on Computer Organization and Architecture.
http://www.iitg.ernet.in/asahu/cs222/ Retrieved: October 2011

Video tutorials on Computer Architecture Principles.
http://nptel.iitm.ac.in/video.php?subjectId=106102062 Retrieved: November 2011

VHDL Primer.
http://www.seas.upenn.edu/~ese171/vhdl/vhdl_primer.html Retrieved: October 2011

Xilinx Synthesis and Simulation Guide 10.1.
http://www.xilinx.com/itp/xilinx10/books/docs/sim/sim.pdf Retrieved: December 2011

Xilinx Synthesis Technology (XST) User Guide.
http://www.gstitt.ece.ufl.edu/courses/eel4712/lectures/vhdl/xst.pdf Retrieved: January 2012



























APPENDIX

VHDL Codes of Multi cycle CPU
