Implementation of a Verilog Multicycle CPU
Joey Nirschl, Benjamin Holland
Iowa State University

Department of Computer Engineering
Ames, Iowa 50011
(515) 294-4111
jnirsch@iastate.edu, bholland@iastate.edu
Keywords: Verilog, Simulation, Multicycle Processor, CPU, Datapath, Instruction Set
Abstract: This semester term project was intended to solidify our knowledge of CPU
datapath design. The project requirements were to support at least 15 different
instructions, including branch and jump instructions; to make each module separately
testable; and to implement the entire design so that it can run a small, easily changed
sample program that demonstrates its functionality.
Group Contributions:
Joey Nirschl: High level design, testing, implementation.
Benjamin Holland: Individual component design, testing,
implementation.
Time Contribution:
Joey Nirschl’s Hours: 25 (50% of work)
Ben Holland’s Hours: 25 (50% of work)
Project Work Breakdown
Design (10%)
Programming (20%)
Testing (50%)
Documentation (20%)
Table of Contents
Purpose of Machine
Instruction Set Definition
Instruction Format
Design Methodology
Design
Testing Methodology
Conclusion
Lessons Learned
Appendix A – Verilog Code & Testbench
Appendix B – Simulation Results
Appendix C – Commonly Made Verilog Mistakes
Appendix D – Figures and Diagrams
Appendix E – Sources
1. Purpose of Machine:
The multicycle CPU design is an improvement on the single-cycle design. The multicycle
design executes each instruction in multiple stages, one stage per clock cycle. This is a
great improvement over the single-cycle design because each instruction takes only as many
stages as it needs, between three and five.
These stages include:
Stage 1: Instruction fetch
Stage 2: Instruction decode/register fetch
Stage 3: Memory address computation, execution, branch or jump completion
Stage 4: Memory access load, memory access store, R-type instruction completion
Stage 5: Memory read completion
In our implementation every instruction shares the first two stages: instruction fetch and
instruction decode/register fetch. In the first stage the instruction word is fetched from
memory and stored in both the memory data register and the instruction register. The second
stage decodes the instruction as an R-type instruction, an I-type instruction, a branch
instruction, a jump instruction, or a memory access instruction (load word or store word).
At stage three the instruction takes one of several logical paths depending on the
instruction type that was decoded in stage two. A finite state machine of these logical
paths is shown in Figure 1 of Appendix D. Stage three is the last stage for branch and jump
instructions. After either of these has completed, the logic cycle restarts at stage one
with the fetch of the next instruction.
Stage four occurs for R-type and I-type instructions and for instructions that require
memory access (load word, store word). Both store word and R/I-type instructions end in
stage four: R/I-type instructions write the ALU result to the register file, and store word
writes its value to memory. The logic cycle then begins again at stage one with the
instruction fetch.
Stage five is used only by the load word instruction, which after reading the word from
memory in stage four still needs to write that word to the register file. Once the write to
the register file is complete, the cycle continues with the fetch of the next instruction
in stage one.
The control module of the datapath is responsible for sequencing the stages of each
instruction. The advantage of breaking instructions into stages is that fast instructions
complete in fewer stages than slow ones. In a single-cycle design, by contrast, every
instruction executes in one long clock cycle whose period must accommodate the slowest
instruction. Because the multicycle clock period only has to cover the slowest stage, and
some instructions finish one or two stages sooner than the longest instruction, the average
time required to execute a program is reduced.
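The stage counts described in this section can be summarized in a short Verilog sketch. The module name `StageCountSketch` and the function name `stagesFor` are hypothetical and are not part of the project code; the opcode values are the ones listed in the instruction table of Section 3. It is meant only as an illustration of how many stages each instruction class uses.

```verilog
// Hypothetical sketch: number of stages used by each instruction class,
// as described in Section 1. Not part of the project source.
module StageCountSketch;

  // Returns the number of stages needed by the instruction with this opcode.
  function [2:0] stagesFor;
    input [5:0] opcode;
    begin
      case (opcode)
        6'h04, 6'h02: stagesFor = 3'd3; // beq and j complete in stage 3
        6'h23:        stagesFor = 3'd5; // lw needs the extra register write stage
        default:      stagesFor = 3'd4; // R-type, I-type ALU operations, and sw
      endcase
    end
  endfunction

  initial begin
    $display("beq: %0d stages", stagesFor(6'h04));
    $display("j  : %0d stages", stagesFor(6'h02));
    $display("add: %0d stages", stagesFor(6'h00));
    $display("sw : %0d stages", stagesFor(6'h2B));
    $display("lw : %0d stages", stagesFor(6'h23));
  end
endmodule
```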
2. Instruction Set Definition
This implementation of a multicycle CPU supports R-type instructions, I-type instructions,
and branch and jump instructions. Special logic has been added to the control unit to
support I-type instructions because they were not implemented in the design by Patterson
and Hennessy. The instruction set is modeled on the MIPS (Microprocessor without
Interlocked Pipeline Stages) instruction set. Aside from a few minor differences in
operation codes, the implemented instruction set follows the MIPS convention. The
instruction format is discussed in more detail in the next section.
*Stages were added to the finite state machine to support additional functionality. The
FSM can be viewed in Appendix D. (The additional logic of each figure is indicated in red.)
The instructions included in this set are as listed below:
add – Add, stores the sum of register source (rs) and register target (rt) into register
destination (rd).
R[rd]=R[rs] + R[rt]
sub – Subtract, stores the difference of register source (rs) and register target (rt) into
register destination (rd).
R[rd]=R[rs] - R[rt]
and – And, stores the bitwise and of register source (rs) and register target (rt) into
register destination (rd).
R[rd]=R[rs] & R[rt]
or – Or, stores the bitwise or of register source (rs) and register target (rt) into
register destination (rd).
R[rd]=R[rs] | R[rt]
xor – Xor, stores the bitwise xor of register source (rs) and register target (rt) into
register destination (rd).
R[rd]=R[rs] ^ R[rt]
slt – Set Less Than, stores 1 in register destination (rd) if register source (rs) is less
than register target (rt), and 0 otherwise.
if(R[rs]<R[rt]){R[rd]=1}
else{R[rd]=0}
beq – Branch Equal, if register source (rs) equals register target (rt), branches to the
current PC value + 4 + branch address (the sign extended immediate shifted left by two bits).
if(R[rs]==R[rt]){PC=PC+4+BranchAddress}
lw – Load Word, loads the 32-bit word at memory address register source (rs) + sign
extended immediate into register target (rt).
R[rt]=M[R[rs]+SignExtendedImmediate]
sw – Store Word, stores the 32-bit word in register target (rt) to memory address register
source (rs) + sign extended immediate.
M[R[rs]+SignExtendedImmediate]=R[rt]
addi – Add Immediate, stores the sum of register source (rs) and the sign extended
immediate value into register target (rt).
R[rt]=R[rs] + SignExtendedImmediate
andi – And Immediate, stores the bitwise and of register source (rs) and the zero extended
immediate value into register target (rt).
R[rt]=R[rs] & ZeroExtendedImmediate
xori – Xor Immediate, stores the bitwise xor of register source (rs) and the zero extended
immediate value into register target (rt).
R[rt]=R[rs] ^ ZeroExtendedImmediate
ori – Or Immediate, stores the bitwise or of register source (rs) and the zero extended
immediate value into register target (rt).
R[rt]=R[rs] | ZeroExtendedImmediate
slti – Set Less Than Immediate, stores 1 in register target (rt) if register source (rs) is
less than the sign extended immediate value, and 0 otherwise.
if(R[rs]<SignExtendedImmediate){R[rt]=1}
else{R[rt]=0}
j – Jump, unconditionally jumps to the instruction at the specified address. The target is
the concatenation of the upper four PC bits with the address field shifted left by two bits.
PC = {PC[31:28], address, 2'b00}
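The immediate handling and jump-target arithmetic used in the definitions above can be sketched in a few lines of Verilog. The module name `ImmediateSketch` and its ports are hypothetical; this is a purely combinational illustration and is not the project's own `sign_extend`, `zero_extend`, or `JumpAddress` modules, whose sources may differ (for example, the project's extenders take a clock input).

```verilog
// Hypothetical sketch of the immediate and jump-address arithmetic from the
// instruction definitions. Names are illustrative, not from the project source.
module ImmediateSketch(pc, immediate, address, signExt, zeroExt, jumpTarget);
  input  [31:0] pc;
  input  [15:0] immediate;
  input  [25:0] address;
  output [31:0] signExt, zeroExt, jumpTarget;

  // Sign extension copies bit 15 into the upper 16 bits (addi, slti, lw, sw).
  assign signExt = {{16{immediate[15]}}, immediate};

  // Zero extension fills the upper 16 bits with zeros (andi, ori, xori).
  assign zeroExt = {16'b0, immediate};

  // Jump target: upper four PC bits, then the 26-bit address shifted left by 2.
  assign jumpTarget = {pc[31:28], address, 2'b00};
endmodule
```

Writing the jump target as a concatenation rather than an addition makes it explicit that the upper four PC bits are kept unchanged and the low two bits are always zero.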
3. Instruction Format
The instruction format is different for each of the three instruction types. An R-type
instruction has six fields: opcode, rs, rt, rd, shamt, and function. The opcode for an
R-type instruction is always zero. The function field selects the operation of the R-type
instruction (e.g., add, sub, and, or). The shamt field is used in shifting operations (not
implemented in this design). RS (register source) and RT (register target) reference the
registers whose values are used in the operation, and RD (register destination) references
the register where the result is stored after the instruction executes.
The I-type instruction has four fields: opcode, rs, rt, and immediate. The opcode of an
I-type instruction defines its operation. RS references the source register, and RT
references either a second operand (beq, sw) or the register that receives the result
(addi, lw, and the other immediate operations). The immediate field is used either as a
constant operand or, after sign extension, as an offset for computing a memory address.
The J-type instruction has an opcode, like the other two types, that defines the
instruction operation. The J-type instruction also has a 26-bit address field that
specifies the memory address to jump to.
The figures below show the individual fields of each instruction type.
Instruction   Instruction Type   Opcode   Function
add           R                  0x00     0x20
sub           R                  0x00     0x22
and           R                  0x00     0x24
or            R                  0x00     0x25
xor           R                  0x00     0x01
slt           R                  0x00     0x2A
xori          I                  0x0F     N/A
beq           I                  0x04     N/A
lw            I                  0x23     N/A
sw            I                  0x2B     N/A
addi          I                  0x08     N/A
andi          I                  0x0C     N/A
ori           I                  0x0D     N/A
slti          I                  0x0A     N/A
j             J                  0x02     N/A
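As an illustration of how the fields described in this section could be separated from a 32-bit instruction word, the sketch below slices the word into its fields. It assumes the standard MIPS bit positions, which this section does not state explicitly, and the module name `FieldSplitSketch` is hypothetical; the project's own `InstructionDecode` module may be organized differently.

```verilog
// Hypothetical sketch: splitting an instruction word into its fields,
// assuming the standard MIPS bit positions. Not the project's decoder.
module FieldSplitSketch(instruction, opcode, rs, rt, rd, shamt, funct,
                        immediate, address);
  input  [31:0] instruction;
  output [5:0]  opcode, funct;
  output [4:0]  rs, rt, rd, shamt;
  output [15:0] immediate;
  output [25:0] address;

  assign opcode    = instruction[31:26]; // all formats
  assign rs        = instruction[25:21]; // R-type and I-type
  assign rt        = instruction[20:16]; // R-type and I-type
  assign rd        = instruction[15:11]; // R-type only
  assign shamt     = instruction[10:6];  // R-type only (unused in this design)
  assign funct     = instruction[5:0];   // R-type only
  assign immediate = instruction[15:0];  // I-type only
  assign address   = instruction[25:0];  // J-type only
endmodule
```

All fields are extracted unconditionally; the opcode then determines which of them are meaningful for a given instruction.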
4. Design Methodology
The general approach to this design was to first map out a high-level design of the
system. Our design was based on the ideas presented in the Computer Organization and
Design textbook by David A. Patterson and John L. Hennessy. Figures 2 and 3 show a general
outline of how our implementation was planned on paper before implementation. The red
markings on the figures in Appendix D are the modifications that were made to the design.
After the high-level design we broke the datapath down into separate modules so that we
could divide work among team members and test the functionality of each module
individually. Testing each module individually was extremely important because it allowed
us to catch many errors in a controlled environment before they were obscured by the
traffic of the entire system. Modularizing the code also lets the team assign
responsibility and lets members specialize in specific areas of the code, which produces
better code than an unmodularized system would. A modular system also accommodates change
more easily: if new functionality is needed, either a new module is added or the logic
within an existing module is modified to meet the added requirement. Although modularity
is important, the original design must be good enough to allow the functionality to be
modularized in the first place.
Once the system has been designed and implemented in pieces, it is only a matter of
combining the pieces to make the entire CPU. This is easy in theory, but every project has
unforeseen mistakes and logic errors. Thankfully, because the system was designed well to
begin with, there was room for change and modifications to correct the mistakes of the
early implementation.
After an intensive debug period, the system was complete. At this point we were able to
fully document the project and consider features to add or remove as well as other design
changes.
5. Design
As mentioned earlier, the original design was roughly based on Figures 2 and 3 of
Appendix D. Also as mentioned above, each of the core datapath functions was implemented
in a separate module. The project code can be seen in Appendix A. Below is a top-level
view of the final datapath implementation. (The additional logic of each figure is
indicated in red.)
Top Level Design - Final Implementation
(Figure: RTL schematic of the top-level datapath. Instances shown: DataMemory:data,
InstructionDecode:IDStage, MulticycleControlFSM:mainControl, regFileRTL:RTL,
ALUControlMulti:alucontrol, ALUMulticycle:ALU, sign_extend:extendImmediate,
zero_extend:extender, twomux5:writereg, twomux32:writedata, twomux32:registerAmux,
FiveToOneMux32:registerBmux, the 32-bit pc register, and the cycle counter.)
*To examine details of design please use the zoom feature of your PDF viewer
Datapath Control - Final Implementation
(Figure: RTL schematic of the control FSM. It shows the current_state and next_state
logic for states A through M, opcode comparators for 0x23, 0x2B, 0x08, 0x0A, 0x0D, 0x0C,
0x0F, 0x02, 0x04, and 0x00, and the registered outputs RegWrite, PCWrite, MemRead, IorD,
forceAdd, ALUSrcB[2..0], ALUSrcA, RegDst, PCWriteCondition, PCSource[1..0], MemWrite,
MemtoReg, and IRWrite.)
ALU - Final Implementation
(Figure: RTL schematic of the ALU. The inputs aluctrl[3..0], valueA[31..0], and
valueB[31..0] feed adders, a less-than comparator, and an equality comparator. A bank of
multiplexers (Mux0 through Mux32) selected by aluctrl produces the result, and the
equality comparator drives the zero output.)
ALU Control - Final Implementation
(Figure: RTL schematic of the ALU control. The inputs are opcode[5..0], funct[5..0], and
forceadd. The schematic shows an opcode decoder, funct comparators for 0x22, 0x01, 0x20,
0x25, 0x24, and 0x2A, and selectors driving the latched output ALUIn[3..0].)
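To make the role of the ALU control concrete, the following is a minimal sketch of how the 4-bit ALU code could be derived from the opcode, the function field, and forceadd. It is not the project's `ALUControlMulti` module: the module name `AluControlSketch` and the `ALU_*` parameter names are hypothetical. The ALU codes are the ones used by the ALU in Appendix A, and the opcode and function values come from the table in Section 3. The choice of subtract for beq is an assumption; the ALU in Appendix A derives its zero output by comparing its two inputs directly, so other choices would also work.

```verilog
// Hypothetical sketch of an ALU control decoder. Not the project's
// ALUControlMulti module; names and structure are illustrative only.
module AluControlSketch(funct, opcode, forceadd, aluctrl);
  input  [5:0] funct, opcode;
  input        forceadd;
  output [3:0] aluctrl;
  reg    [3:0] aluctrl;

  // ALU operation codes as used by the ALU in Appendix A.
  localparam ALU_AND = 4'b0000, ALU_OR  = 4'b0001, ALU_ADD = 4'b0010,
             ALU_XOR = 4'b0101, ALU_SUB = 4'b0110, ALU_SLT = 4'b0111;

  always @(funct or opcode or forceadd) begin
    if (forceadd)
      aluctrl = ALU_ADD;              // fetch and decode states always add
    else
      case (opcode)
        6'h00:                        // R-type: the function field selects
          case (funct)
            6'h20:   aluctrl = ALU_ADD;
            6'h22:   aluctrl = ALU_SUB;
            6'h24:   aluctrl = ALU_AND;
            6'h25:   aluctrl = ALU_OR;
            6'h01:   aluctrl = ALU_XOR;
            6'h2A:   aluctrl = ALU_SLT;
            default: aluctrl = ALU_ADD;
          endcase
        6'h0C:   aluctrl = ALU_AND;   // andi
        6'h0D:   aluctrl = ALU_OR;    // ori
        6'h0F:   aluctrl = ALU_XOR;   // xori
        6'h0A:   aluctrl = ALU_SLT;   // slti
        6'h04:   aluctrl = ALU_SUB;   // beq (assumed; see note above)
        default: aluctrl = ALU_ADD;   // addi, lw, sw
      endcase
  end
endmodule
```

Every path through the always block assigns aluctrl, so this sketch describes purely combinational logic with no inferred latch.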
6. Testing Methodology
Our testing methodology stems directly from our design methodology. In the design we
split the important system functions into separate modules so that we could assign
responsibility and debug each one individually. Each module can then be tested in
isolation, eliminating possible interference from other modules. Once each module has
been individually tested and is working, the system is assembled from the smaller
modules. At that point it is a matter of working out system integration issues and
finding any bugs that were missed in the first stage. Once the system was completely
integrated, we decided that the best way to test it as a whole was to write a program
that exercises the functionality of the entire system. With our test program, the
completed datapath correctly calculates the nth number of the Fibonacci sequence.
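The module-level testing described here can be illustrated with a small self-checking testbench. The two-input multiplexer `Mux2Sketch` and its testbench are hypothetical stand-ins written for this illustration; they are not the project's `twomux32` module or its testbench, and the port order is an assumption.

```verilog
// Hypothetical 2-to-1 multiplexer, used only to illustrate a
// self-checking testbench for a single module.
module Mux2Sketch(sel, x1, x0, x);
  input         sel;
  input  [31:0] x1, x0;
  output [31:0] x;
  assign x = sel ? x1 : x0;
endmodule

module Mux2SketchTest;
  reg         sel;
  reg  [31:0] x1, x0;
  wire [31:0] x;
  integer     errors;

  Mux2Sketch dut(sel, x1, x0, x);

  // Compares the mux output with the expected value and counts mismatches.
  task check;
    input [31:0] expected;
    begin
      if (x !== expected) begin
        errors = errors + 1;
        $display("FAIL: sel=%b x=%h expected=%h", sel, x, expected);
      end
    end
  endtask

  initial begin
    errors = 0;
    x1 = 32'hAAAA5555;
    x0 = 32'h12345678;
    sel = 1'b0; #1 check(x0);   // sel = 0 selects x0
    sel = 1'b1; #1 check(x1);   // sel = 1 selects x1
    if (errors == 0)
      $display("All mux checks passed");
    $finish;
  end
endmodule
```

Having the testbench compare outputs itself, rather than relying on visual inspection of waveforms, makes it easy to rerun the same checks after every change to a module.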
7. Conclusion
Our Computer Engineering 305 project drew on material accumulated from CprE 305 and
previous courses. The knowledge we needed to complete this project
included an understanding of multicycle CPUs, datapaths, control units, finite state
machines, digital logic, and Verilog. With our knowledge, we were able to build
individual logic modules and integrate those modules to create our multicycle processor.
The processor was capable of supporting fifteen MIPS instructions. In the process of
building the CPU, we added logic to the design presented in the textbook by Patterson
and Hennessy to fully support our multicycle design.
8. Lessons Learned
• Save often. ModelSim has a bad habit of crashing in the lab. The more often you save,
the less work is lost when a program or computer crashes.
• Make backups. If all else fails, you have a backup.
• Use comments. When working with others, comments allow them to understand your code.
The fewer comments provided, the harder it may be for someone to understand your code in
the future.
• Create block schematics. Block schematics help in understanding the big picture. If the
block diagram generated from the Verilog code does not look correct, it can expose
mistakes in the code as well as clarify the high-level design.
Appendix A – Verilog Code & Testbench
//MultiCycle is our multicycle cpu

module MultiCycle(cycle, pc, clock, alu_out, mem_out, regdst,
memread, memwrite, regwrite, memtoreg, zero, pcwritecond,
pcwrite,iord,irwrite, pcsource,aluscra,alusrcb);
// input/output
input clock;
output[31:0] cycle,alu_out, mem_out, pc;
output regdst, memread, memwrite, regwrite, memtoreg;
output zero;
output pcwritecond, pcwrite,iord,irwrite;
output aluscra;
output [1:0] pcsource;
output [2:0] alusrcb;
// for debug
reg[31:0] cycle=0;
always @ (posedge clock)

begin
cycle = cycle + 1;
end
// control variables
wire regdst, memread, memwrite, regwrite, memtoreg;
wire pcwritecond, pcwrite,iord,irwrite, aluscra, zero;
wire [1:0] pcsource;
wire [2:0] alusrcb;
wire [31:0] jumpaddress,alu_out, mem_out;
wire [31:0] branchCondition;
wire[3:0] aluCtrl;
// other variables
reg [31:0] pc = 32'b0;
reg [31:0] ALUOut;
reg [31:0] register_A, register_B;
wire [31:0] memAddress;

// Decode control signals
wire [5:0]opCode;
wire [4:0] regToWrite;
//Instruction decode variables

wire[4:0] rs,rt,rd;
wire [15:0] immediatevalue;
wire [4:0] shamt;
wire [5:0] funct;
wire [25:0] address;
reg [31:0] memDataReg;

wire [31:0] regA,regB;
wire [31:0] regWriteData;
wire [31:0] imm_value;
wire [31:0] valueA, valueB;
wire forceadd;
assign memAddress = iord? ALUOut:pc;
//Data Memory module holds both data and instructions

DataMemory data(memwrite,memread,memAddress[5:0], register_B,mem_out);
//Instruction decode decodes instructions and puts values into appropriate wires
InstructionDecode IDStage(clock,
mem_out,opCode,rs,rt,rd,shamt,funct,immediatevalue,address);
//Control FSM generates the control signals of the multicycle cpu

MulticycleControlFSM
mainControl(opCode,clock,aluscra,iord,alusrcb,pcsource,regdst,memtoreg,
memread,pcwritecond, pcwrite, memwrite, irwrite, regwrite,forceadd);
//MemoryDataRegister holds data from memory that may be written into register
always@(posedge clock)
memDataReg = mem_out;
//Chooses appropriate write register depending on the control
twomux5 writereg(regdst, rd, rt,regToWrite);
//Chooses appropriate data to write depending on the control
twomux32 writedata(memtoreg,memDataReg,ALUOut,regWriteData);
regFileRTL RTL(clock,regwrite,regWriteData,regToWrite,rs,rt,regA,regB);
//Registers hold value until positive edge of clock, when they are updated
always@(posedge clock)
begin
register_A = regA;
register_B=regB;
end
//sign extend the immediate value

sign_extend extendImmediate(clock,immediatevalue,imm_value);
//zero extend the immediate value
wire [31:0] zeroextendvalue;
zero_extend extender(clock,immediatevalue, zeroextendvalue);
twomux32 registerAmux(aluscra,register_A,pc,valueA);
FiveToOneMux32 registerBmux(alusrcb,zeroextendvalue,imm_value<<2,
imm_value,4,register_B,valueB);
// ALU control selects the operation of the alu

ALUControlMulti alucontrol(funct,opCode,forceadd,aluCtrl);
//Main ALU
ALUMulticycle ALU(aluCtrl,valueA,valueB,alu_out,zero);
//temp ALU out register holds value from alu until updated on posedge clock
always@(posedge clock)
begin
ALUOut= alu_out;
end
JumpAddress jumpTo(pc,address,jumpaddress);
//Mux chooses next data to pc depending on control
ThreeToOneMux32
branchesAndJumps(pcsource,jumpaddress,ALUOut,alu_out,branchCondition);
wire brachwritecond, gotoNextPc;
assign brachwritecond = pcwritecond & zero;
assign gotoNextPc =pcwrite | brachwritecond;
//PC update
always @ (posedge clock)
begin
if(gotoNextPc)
pc=branchCondition;
end
endmodule// END: MultiCycle
//The Testbench for our multicycle cpu

module AMultiCycleTest;
reg clock;
wire[31:0] cycle,alu_out, mem_out, pc;
wire regdst, memread, memwrite, regwrite, memtoreg;
wire zero;
wire pcwritecond, pcwrite,iord,aluscra,irwrite;
wire [1:0] pcsource;
wire [2:0] alusrcb;
initial
begin
clock =1'b0;
end
always
begin
#15 clock = ~clock;
end
MultiCycle testcpu(cycle, pc, clock, alu_out, mem_out, regdst,

memread, memwrite, regwrite, memtoreg, zero, pcwritecond,
pcwrite,iord,irwrite, pcsource,aluscra,alusrcb);
endmodule // END: AMultiCycleTest
//Control for multicycle cpu

module
MulticycleControlFSM(opcode,clk,ALUSrcA,IorD,ALUSrcB,PCSource,RegDst,MemtoReg,
MemRead,PCWriteCondition, PCWrite, MemWrite, IRWrite, RegWrite,forceAdd);
input [5:0]opcode;
input clk;
output ALUSrcA,IorD,RegDst,MemtoReg;
output [2:0]ALUSrcB; //3 bits wide, matching the reg declaration below
output [1:0]PCSource;
output MemRead,PCWriteCondition, PCWrite, MemWrite, IRWrite, RegWrite,forceAdd;
reg MemRead,PCWriteCondition, PCWrite, MemWrite, IRWrite, RegWrite;
reg ALUSrcA,IorD,RegDst,MemtoReg, forceAdd;
reg [2:0]ALUSrcB;
reg [1:0]PCSource;
reg [3:0] current_state, next_state;
reg [3:0] debug;
parameter A=4'b0000, B=4'b0001, C=4'b0010, D=4'b0011, E=4'b0100, F=4'b0101, G=4'b0110,
H=4'b0111, I=4'b1000, J=4'b1001, K=4'b1010, L=4'b1011, M=4'b1100;
//parameter A=0, B=1, C=2, D=3, E=4, F=5, G=6, H=7, I=8, J=9; K=10;L=11,M=12;
//forceAdd 1=add (only in states A, B)

initial begin
current_state=4'b0000;
next_state=4'b0000;
end
always@(posedge clk)
begin
current_state=next_state;
end
always@(posedge clk or opcode)
begin
case(current_state)
A:begin
debug = 4'b0000; //added recently

MemRead = 1;
ALUSrcA=0;
IorD=1'b0;
IRWrite = 1;
ALUSrcB=3'b001;
PCWrite = 1;
PCSource=2'b00;
next_state=B;
RegDst=0;
MemtoReg=0;
PCWriteCondition=0;
MemWrite=0;
RegWrite=0;
forceAdd=1;
end
B:begin
debug = 4'b0001;
ALUSrcA=0;
ALUSrcB=3'b011;
IorD=0;
PCSource=0;
RegDst=0;
MemtoReg=0;
MemRead=0;
PCWriteCondition=0;
PCWrite=0;
MemWrite=0;
IRWrite=0;
RegWrite=0;
forceAdd=1;
//if lw or sw nextstate = C
//if(opcode==35 || opcode==43)
if(opcode==6'b100011 || opcode==6'b101011)
begin
next_state=C;
end
//if r type nextstate = G
//if(opcode==0)
else if(opcode==6'b000000)
begin
next_state=G;
end
//if beq nextstate = I
//if(opcode==4)
else if(opcode==6'b000100)
begin
next_state=I;
end
//if j nextstate = J
//if(opcode==2)
else if(opcode==6'b000010)
begin
next_state=J;
end
//IType instruction, treat as R-Type
//because ALU control will take care of proper execution
else if(opcode== 6'b001000 ||//addI
opcode==6'b001010//sltI
)
begin
next_state=K;//sign extended immediate state
end
else if(opcode== 6'b001101||//orI
opcode== 6'b001100||//andI
opcode== 6'b001111//xorI
)
begin
next_state=M;//zero extended immediate state
end
else
debug = 4'b1111;
end
C:begin
debug = 4'b0010;
ALUSrcA=1;
ALUSrcB=3'b010;
IorD=0;
PCSource=0;
RegDst=0;
MemtoReg=0;
MemRead=0;
PCWriteCondition=0;
PCWrite=0;
MemWrite=0;
IRWrite=0;
RegWrite=0;
forceAdd=0;
//if lw nextstate = D or sw nextstate = F
//if(opcode==35 || opcode==43)
if(opcode==6'b100011)
begin
next_state=D;
end
else if(opcode==6'b101011)
begin
next_state=F;
end
else
debug = 4'b1111;
end
D:begin
debug = 4'b0011;
MemRead = 1;
IorD=1;
ALUSrcA=0;
ALUSrcB=0;
PCSource=0;
RegDst=0;
MemtoReg=0;
PCWriteCondition=0;
PCWrite=0;
MemWrite=0;
IRWrite=0;
RegWrite=0;
next_state=E;
forceAdd=0;
end
E:begin
debug = 4'b0100;
RegDst=1'b0;
RegWrite = 1;
MemtoReg=1'b1;
next_state=A;
ALUSrcA = 0;
IorD = 0;
ALUSrcB = 0;
PCSource = 0;
MemRead = 0;
PCWriteCondition = 0;
PCWrite = 0;
MemWrite = 0;
IRWrite = 0;
forceAdd=0;
end
F:begin
debug = 4'b0101;
MemWrite = 1'b1;
IorD=1'b1;
next_state=A;
ALUSrcA = 0;
ALUSrcB = 0;
PCSource = 0;
RegDst = 0;
MemtoReg = 0;
MemRead = 0;
PCWrite = 0;
IRWrite = 0;
RegWrite = 0;
forceAdd=0;
end
G:begin
debug = 4'b0110;
ALUSrcA=1;
ALUSrcB=3'b000;
next_state=H;
IorD = 0;
PCSource = 0;
RegDst = 0;
MemtoReg = 0;
MemRead = 0;
PCWrite = 0;
MemWrite = 0;
IRWrite = 0;
RegWrite = 0;
forceAdd=0;
end
H:begin
//For RType or IType, if not RType, it is IType
//if IType regDst = 0
debug = 4'b0111;
RegDst=1'b1;
RegWrite = 1;
MemtoReg=1'b0;
next_state=A;
ALUSrcA = 0;
IorD = 0;
ALUSrcB = 0;
PCSource = 0;
MemRead = 0;
PCWrite = 0;
MemWrite = 0;
IRWrite = 0;
forceAdd=0;
end
I:begin
debug = 4'b1000;
ALUSrcA=1;
ALUSrcB=3'b000;
PCSource=2'b01;
next_state=A;
IorD = 0;
RegDst = 0;
MemtoReg = 0;
MemRead = 0;
PCWrite = 0;
PCWriteCondition = 1; //beq writes the PC only when the ALU zero flag is set
MemWrite = 0;
IRWrite = 0;
RegWrite = 0;
forceAdd=0;
end
J:begin
debug = 4'b1001;
PCWrite = 1;
PCSource=2'b10;
next_state=A;
ALUSrcA = 0;
IorD = 0;
ALUSrcB = 0;
RegDst = 0;
MemtoReg = 0;
MemRead = 0;
MemWrite = 0;
IRWrite = 0;
RegWrite = 0;
forceAdd=0;
end
K:begin
debug = 4'b1010;
ALUSrcA = 1;
ALUSrcB = 3'b010;
MemtoReg = 0;
IorD = 0;
RegDst = 0;
MemRead = 0;
PCWrite=0;
MemWrite = 0;
IRWrite = 0;
RegWrite = 0;
PCSource=2'b00;
next_state=L;
forceAdd=0;
end
L:
begin
debug = 4'b1011;
RegDst = 0;
RegWrite = 1;
MemtoReg = 0;
IorD = 0;
MemRead = 0;
PCWrite=0;
MemWrite = 0;
IRWrite = 0;
ALUSrcA = 0;
ALUSrcB = 3'b000;
PCSource=2'b00;
next_state=A;
forceAdd=0;
end
M:
begin
debug = 4'b1100;
ALUSrcA = 1;
ALUSrcB = 3'b100;
MemtoReg = 0;
IorD = 0;
RegDst = 0;
MemRead = 0;
PCWrite=0;
MemWrite = 0;
IRWrite = 0;
RegWrite = 0;
PCSource=2'b00;
next_state=L;
forceAdd=0;
end
endcase
end
endmodule//END: MulticycleControlFSM
//Testbench for control

module testbenchMulticycleControlFSM;
reg [5:0]op;
reg clock=0;
wire ALUSrcA,IorD,RegDst,MemtoReg;
wire [2:0]ALUSrcB;
wire [1:0]PCSource;
always
begin
#2 clock=~clock;
end
initial begin
op=6'b000000;//add 1
#10 op=6'b001000;//addi 9
#10 op=6'b000000;//Sub 2
#10 op=6'b000100;//branch 10
#10 op=6'b000000;//And 3
#10 op=6'b000010;//j 15
#10 op=6'b000000;//Or 4
#10 op=6'b100011;//LW 16
#10 op=6'b000000;//Xor 5
#30 op=6'b101011;//SW 17
#10 op=6'b000000;//Slt 6
#10 op=6'b001101;//OrI 11
#10 op=6'b000000;//Mult 7
#10 op=6'b001100;//AndI 12
#10 op=6'b000000;//Div 8
#10 op=6'b001111;//XorI 13
#10 op=6'b001010;//SltI 14
end
MulticycleControlFSM test(op,clock,ALUSrcA,IorD,ALUSrcB,PCSource,RegDst,MemtoReg,
MemRead,PCWriteCondition, PCWrite, MemWrite, IRWrite, RegWrite,forceAdd);
endmodule// END: testbenchMulticycleControlFSM

module ALUMulticycle(aluctrl, valueA, valueB,result,zero);
input [3:0] aluctrl;
input [31:0] valueA;
input [31:0] valueB;
output [31:0] result;
reg [31:0] result;
output zero;
reg zero;
always@(aluctrl or valueA or valueB)

begin
case(aluctrl)
4'b0000://Bitwise And
begin
result = valueA & valueB;
end
4'b0001://Bitwise Or
begin
result = valueA | valueB;
end
4'b0010://Add
begin
result = valueA + valueB;
end
4'b0101://Xor
begin
result = valueA ^ valueB;
end
4'b0110://Sub
begin
result = valueA - valueB;
end
4'b0111://Slt
begin
result = valueA < valueB ? 1:0;
end
endcase
if(valueA==valueB)
begin
zero=1'b1;
end
else
begin
zero=1'b0;
end
end
endmodule
module testALUMultiCycle;
reg [3:0] aluctrl;
reg [31:0] valueA;
reg [31:0] valueB;
wire [31:0] result;
wire zero;
initial
begin
//AND
aluctrl = 4'b0000;
valueA = 0;
valueB = 32'd4294967295;
$monitor("AND -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
#5
aluctrl = 4'b0000;
valueA = 32'd4294967295;
valueB = 32'd4294967295;
$monitor("AND -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
//OR
#5
aluctrl = 4'b0001;
valueA = 32'd4294967295;
valueB = 32'd4294967295;
$monitor("OR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
#5
aluctrl = 4'b0001;
valueA = 0;
valueB = 0;
$monitor("OR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
//Add
#5
aluctrl = 4'b0010;
valueA = 5;
valueB = 5;
$monitor("ADD -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
#5
aluctrl = 4'b0010;
valueA = 0;
valueB = 32'd4294967295;
$monitor("ADD -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
//XOR
#5
aluctrl = 4'b0101;
valueA = 0;
valueB = 1;
$monitor("XOR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
#5
aluctrl = 4'b0101;
valueA = 32'd4294967295;
valueB = 0;
$monitor("XOR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
//Subtract
#5
aluctrl = 4'b0110;
valueA = 5;
valueB = 4;
$monitor("SUBTRACT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
#5
aluctrl = 4'b0110;
valueA = 32'd4294967295;
valueB = 0;
$monitor("SUBTRACT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
//SLT
#5
aluctrl = 4'b0111;
valueA = 5;
valueB = 4;
$monitor("SLT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
#5
aluctrl = 4'b0111;
valueA = 0;
valueB = 32'd4294967295;
$monitor("SLT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
end
ALUMulticycle test(aluctrl, valueA, valueB,result,zero);
endmodule
//ALU control
module ALUControlMulti(funct, opcode,forceadd, ALUIn);
input [5:0]funct;
input [5:0] opcode;
input forceadd;
output [3:0]ALUIn;
reg [3:0]ALUIn;
always@(funct or opcode or forceadd)

begin
if(forceadd==1)
begin
ALUIn = 4'b0010;
end
else
begin
//begin case
case(opcode)
//R-Type
6'b000000:
begin
//And
if(funct==6'b100100)
begin
ALUIn = 4'b0000;
end
//Or
else if(funct==6'b100101)
begin
ALUIn = 4'b0001;
end
//Add
else if(funct==6'b100000)
begin
ALUIn = 4'b0010;
end
//Xor
else if(funct == 6'b000001)
begin
ALUIn = 4'b0101;
end
//Sub
else if(funct==6'b100010)
begin
ALUIn = 4'b0110;
end
//Slt
else if(funct==6'b101010)
begin
ALUIn = 4'b0111;
end
end//end R-type
//Begin I-Type
//AndI
6'b001100://C
begin
ALUIn = 4'b0000;
end
//OrI
6'b001101://D
begin
ALUIn = 4'b0001;
end
//XorI
6'b001111://F
begin
ALUIn = 4'b0101;
end
//SltI
6'b001010://A
begin
ALUIn = 4'b0111;
end
//AddI
6'b001000://8
begin
ALUIn = 4'b0010;
end
//Branch
6'b000100://4
begin
ALUIn = 4'b0010;
end
//LW
6'b100011:
begin
ALUIn = 4'b0010;
end
//SW
6'b101011:
begin
ALUIn = 4'b0010;
end
//End I-Type
//jump
6'b000010:
begin
ALUIn=4'b0010;
end
endcase //endcase
end//end else
end//end always
endmodule
//ALU control testbench

module testALUControlMulti;
reg clock;
reg [5:0]funct;
reg [5:0]op;
wire [3:0]ALUIn;
initial
begin
$monitor(" Time=%d,\top=%d,\t funct=%d,\t ALUIn=%d", $time, op,funct, ALUIn);
end
initial
begin
op=6'b000000;funct=6'b100000;//add 1
#20 op=6'b000000;funct=6'b100010;//Sub 2
#20 op=6'b000000;funct=6'b100100;//And 3
#20 op=6'b000000;funct=6'b100101;//Or 4
#20 op=6'b000000;funct=6'b000001;//Xor 5
#20 op=6'b000000;funct=6'b101010;//Slt 6
#20 op=6'b000000;funct=6'b011000;//Mult 7
#20 op=6'b000000;funct=6'b011010;//Div 8
#20 op=6'b001000;funct=6'b010100;//addi 9
#20 op=6'b000100;funct=6'b000110;//branch (I) 10 ///???
#20 op=6'b001101;funct=6'bx;//OrI 11
#20 op=6'b001100;funct=6'bx;//AndI 12
#20 op=6'b001111;funct=6'bx;//XorI 13
#20 op=6'b001010;funct=6'bx;//SltI 14
#20 op=6'b000010;funct=6'bx; //j 15
#20 op=6'b100011; funct=6'bx;//LW 16
#20 op=6'b101011; funct=6'bx;//SW 17
#20 $stop;
end
//forceadd is tied low so that opcode and funct select the ALU operation
ALUControlMulti aluctrltest(funct, op, 1'b0, ALUIn);
endmodule
//Data Memory module

module DataMemory( memWrite,memRead,Address, writeData,readData);
input memWrite, memRead;
input [5:0] Address;
input [31:0] writeData;
output [31:0] readData;
reg [31:0] readData;
reg [31:0]dataMemory[1024:0];
initial
begin
dataMemory[0] = 32'b00100000000101010000000000010100;//N=20
dataMemory[4] = 32'b00000000000000001011100000100000;
dataMemory[8] = 32'b00010010101000000000000000000110;
dataMemory[12] = 32'b00100000000101100000000000000001;
dataMemory[16] = 32'b00000010111101101011100000100000;
dataMemory[20] = 32'b00000010111101101011000000100010;
dataMemory[24] = 32'b00100010101101011111111111111111;
dataMemory[28] = 32'b00010010101000000000000000000001;
dataMemory[32] = 32'b00010000000000001111111111111011;
dataMemory[36] = 32'b10101100000101110000000000000001;
end
always@(memWrite or memRead or Address)
begin
if(memWrite == 1'b1)
begin
dataMemory[Address]=writeData;
end
if(memRead ==1'b1)
begin
readData=dataMemory[Address];
end
end
endmodule//END: DataMemory
//Data Memory testbench

module ADataMemTest;
reg memWrite,memRead;
reg [5:0] Address;
reg [31:0] writeData;
wire [31:0] readData;
initial
begin
$monitor(" Time=%d, memWrite=%d, memRead=%d, Address=%d, writeData=%d,readData=%d ",
$time,memWrite,memRead,Address, writeData,readData);
end
initial
begin
memRead=1;
#20 Address=4;
#20 memRead=0;
#20 Address=1;
#20 memWrite=1;
#20 writeData=32'b1;
#20 $stop;
end
DataMemory testMem(memWrite,memRead,Address, writeData,readData);

endmodule
//Register File
module regFileRTL(clock,regWrite,inData,wrReg,readA, readB,regA,regB);
input clock;
input regWrite;
input [31:0] inData;
input [4:0] wrReg;
input [4:0] readA;
input [4:0] readB;
output [31:0] regA;
output [31:0] regB;
reg [31:0] registerFiles[31:0];
initial
begin
registerFiles[5'b00000] = 32'b0;
end
//Write trigger assumed to be the rising clock edge
always@(posedge clock)
begin
if(regWrite && ( wrReg != 5'b00000))

begin
registerFiles[wrReg] = inData;
end
end
assign regA = registerFiles[readA];

assign regB = registerFiles[readB];
endmodule//END: regFileRTL
//InstructionDecode decode the instruction

module InstructionDecode(clock,instruction, opcode, rs,rt,rd,shamt,funct, immediate, address);
input clock;
input [31:0] instruction;
output [5:0] opcode, funct;
output [4:0] rs, rt,rd,shamt;
output [15:0] immediate;
output [25:0] address;
reg [5:0] opcode, funct;

reg [4:0] rs, rt,rd,shamt;
reg [15:0] immediate;
reg [25:0] address;
//Combinational decode; the sensitivity list is assumed
always@(instruction)
begin
opcode = instruction[31:26];
rs = instruction[25:21];
rt = instruction[20:16];
rd = instruction[15:11];
shamt = instruction[10:6];
funct = instruction[5:0];
immediate = instruction[15:0];
address = instruction[25:0];
end
endmodule// END: InstructionDecode
//Instruction decode testbench

module AInstrTest;
reg clock;
reg [31:0] instr;
wire [5:0] opcode, funct;
wire [4:0] rs, rt,rd,shamt;
wire [15:0] immediate;
wire [25:0] address;
initial
begin
$monitor("Time=%d, instOp=%d,%d,instRs=%d,%d,instRt=%d,%d,instRd=%d,%d,instShT=%d,%d,instFt=%d,%d,instImm=%d,%d,instAdd=%d;%d",
$time,instr[31:26],opcode,instr[25:21],rs,instr[20:16],rt,instr[15:11],rd,
instr[10:6],shamt,instr[5:0],funct,instr[15:0],immediate,instr[25:0],address);
clock=0;
end
always
#2 clock= ~clock;
initial
begin
instr = 32'b00100000000101010000000000010001;
#20 instr = 32'b00000000000000001011100000100000;
#20 instr = 32'b00010010101000000000000000000110;
#20 instr = 32'b00100000000101100000000000000001;
#20 instr = 32'b00000010111101101011100000100000;
#20 instr = 32'b00000010111101101011000000100010;
#20 instr = 32'b00100010101101011111111111111111;
#20 instr = 32'b00010010101000000000000000000001;
#20 instr = 32'b00010000000000001111111111111011;
#20 instr = 32'b10101100000101110000000000000001;
#20 $stop;
end
InstructionDecode testdecode(clock,instr, opcode, rs,rt,rd,shamt,funct, immediate, address);
endmodule//END:AInstrTest
//Sign extension module

module sign_extend(clock,value,signvalue);
input clock;
input [15:0] value;
output [31:0] signvalue;
reg [31:0] signvalue;
//Sensitivity list assumed: re-evaluate whenever value changes
always@(value)
begin
signvalue[31:16] = 16'b0000000000000000;
if(value[15] ==1'b1)
begin
signvalue[31:16] = 16'b1111111111111111;
end
signvalue[15:0] = value;
end
endmodule//END: sign_extend
//Sign Extend test bench

module testSignExtend;
reg clock;
reg [15:0] value;
wire [31:0] newvalue;
initial
begin
$monitor(" Time=%d, value=%b, signvalue=%b", $time, value, newvalue);
clock=0;
end
always
#2 clock= ~clock;
initial
begin
value = 0;#20 value = 1;#20 value = 2;#20 value = 3;#20 value = 4;
#20 value = 5;#20 value = 20;#20 value = 40;#20 value = 500;#20 value = 10000;
#20 value = 16'b1000000000000000;#20 value = 16'b1000000000000001;

#20 value = 16'b0111111111111111;#20 value = 16'b1010101010101010;
#20 value = 16'b1111111111111111;#20 value = 16'b1111111111111110;
#20 $stop;
end
sign_extend testsign(clock,value,newvalue);
endmodule
//Zero extension module

module zero_extend(clock,value, zerovalue);
input clock;
input [15:0] value;
output [31:0] zerovalue;
reg [31:0] zerovalue;
//Sensitivity list assumed: re-evaluate whenever value changes
always@(value)
begin
zerovalue[31:16] = 16'b0000000000000000;
zerovalue[15:0] = value;
end
endmodule
//Zero extension testbench

module testZeroExtend;
reg [15:0] value;
reg clock;
wire [31:0] newvalue;
initial
begin
$monitor(" Time=%d, value=%b, zerovalue=%b", $time, value, newvalue);
clock=0;
end
always
#2 clock= ~clock;
initial
begin
value = 0;#20 value = 1;#20 value = 2;#20 value = 3;#20 value = 4;
#20 value = 5;#20 value = 20;#20 value = 40;#20 value = 500;#20 value = 10000;
#20 value = 16'b1000000000000000;#20 value = 16'b1000000000000001;

#20 value = 16'b0111111111111111;#20 value = 16'b1010101010101010;
#20 value = 16'b1111111111111111;#20 value = 16'b1111111111111110;
#20 $stop;
end
zero_extend testzero(clock,value,newvalue);
endmodule
//JumpAddress
module JumpAddress(pc,address,newAddress);
input [31:0] pc;
input [25:0] address;
output [31:0] newAddress;
reg [31:0] newAddress;
always@(pc or address)
begin
newAddress[31:28] = pc[31:28];
newAddress[27:0] = (address <<2);
end
endmodule//END: JumpAddress
//JumpAddress testbench
module AJumpAddressTest;
reg [31:0] pc;
reg [25:0] addr;
wire [31:0] newAddr;
integer x,y;
initial
begin
$monitor(" Time=%d, pc=%d, addr=%d, newAddr=%d", $time,pc,addr,newAddr);
end
initial
begin
x=0;
y=0;
addr = 32'b0;
pc = 32'b00010000000000000000000000000000;
for(x = 0; x < 32; x=x+1)
begin
#10 addr=x;
end
pc = 32'b00110000000000000000000000000000;
addr= 32'b11110000000000000000000000000000;
for(y = 0; y < 32; y=y+1)
begin
#10 addr=y;
end
#20 $stop;
end
JumpAddress jumptest(pc,addr,newAddr);
endmodule
//Twomux5 has a datapath of 5 bits wide and a choice of two elements

module twomux5(a,x1,x0,x);
input a;
input [4:0] x1,x0;
output [4:0]x;
reg [4:0]x;
always@(a or x1 or x0)
begin
if(a == 1'b1)
begin
x = x1;
end
else if(a==1'b0)
begin
x = x0;
end
end
endmodule//END: twomux5
//Twomux32 has a datapath of 32 bits wide and a choice of two elements

module twomux32(a,x1,x0,x);
input a;
input [31:0] x1,x0;
output [31:0]x;
reg [31:0]x;
always@(a or x1 or x0)
begin
if(a == 1'b1)
begin
x = x1;
end
else if(a == 1'b0)
begin
x=x0;
end
end
endmodule//END: twomux32
//ThreeToOneMux has a datapath of 32 bits wide and a choice of three elements

module ThreeToOneMux32(select,x2,x1,x0,out);
input [1:0] select;
input [31:0] x2,x1,x0;
output [31:0] out;
reg [31:0] out;
always@(select or x0 or x1 or x2)
begin
if(select == 2'b00)
begin
out = x0;
end
if(select == 2'b01)
begin
out = x1;
end
if(select == 2'b10)
begin
out = x2;
end
end
endmodule// END:ThreeToOneMux32
//FiveToOneMux32 has a datapath of 32 bits and a choice of five elements
module FiveToOneMux32(select,x4,x3,x2,x1,x0,out);
input [2:0] select;
input [31:0] x4,x3,x2,x1,x0;
output [31:0] out;
reg [31:0] out;
always@(select or x0 or x1 or x2 or x3 or x4)
begin
if(select == 3'b000)
begin
out = x0;
end
if(select == 3'b001)
begin
out = x1;
end
if(select == 3'b010)
begin
out = x2;
end
if(select == 3'b011)
begin
out = x3;
end
if(select == 3'b100)
begin
out = x4;
end
end
endmodule
Appendix B – Simulation Results
The following simulation results come from a program we wrote that calculates the nth number of the
Fibonacci sequence. In this simulation n was set to 20. The simulation produced 6765 as the 20th
Fibonacci number, which is indeed correct.
The assembly code for our program is listed below:
addi $21,$0,20
add $23,$0,$0
beq $21,$0,end
addi $22, $0,1
loop:
add $23,$23,$22
sub $22,$23,$22
addi $21,$21,-1
beq $21,$0,end
beq $0,$0, loop
end:
sw $23,1($0)
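As a cross-check on the expected result, the register updates performed by the program can be modeled behaviorally. The module below is only a sketch and is not part of the CPU: fibReference and the integers r21, r22 and r23 are hypothetical names standing in for registers $21, $22 and $23.

```verilog
//Behavioral reference model of the Fibonacci program (illustrative only)
module fibReference;
integer r21, r22, r23; //hypothetical stand-ins for $21, $22 and $23
initial
begin
r21 = 20;           //addi $21,$0,20
r23 = 0;            //add $23,$0,$0
if(r21 != 0)        //beq $21,$0,end
begin
r22 = 1;            //addi $22,$0,1
while(r21 != 0)     //beq $21,$0,end and beq $0,$0,loop
begin
r23 = r23 + r22;    //add $23,$23,$22
r22 = r23 - r22;    //sub $22,$23,$22
r21 = r21 - 1;      //addi $21,$21,-1
end
end
$display("dataMemory[1] = %0d", r23); //sw $23,1($0)
end
endmodule
```

Run in a simulator, this model should print dataMemory[1] = 6765, the value the program is expected to store.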
To double-check our binary encoding, we assembled our assembly code in the MIPS simulator SPIM.
[0x00400000] 0x20150002 addi $21, $0, 20 ; 1: addi $21,$0,2

[0x00400004] 0x0000b820 add $23, $0, $0 ; 2: add $23,$0,$0
[0x00400008] 0x12a00007 beq $21, $0, 28 [end-0x00400008]; 3: beq $21,$0,end
[0x0040000c] 0x20160001 addi $22, $0, 1 ; 4: addi $22, $0,1
[0x00400010] 0x02f6b820 add $23, $23, $22 ; 6: add $23,$23,$22
[0x00400014] 0x02f6b022 sub $22, $23, $22 ; 7: sub $22,$23,$22
[0x00400018] 0x22b5ffff addi $21, $21, -1 ; 8: addi $21,$21,-1
[0x0040001c] 0x12a00002 beq $21, $0, 8 [end-0x0040001c] ; 9: beq $21,$0,end
[0x00400020] 0x1000fffc beq $0, $0, -16 [loop-0x00400020] ; 10: beq $0,$0, loop
[0x00400024] 0xac170001 sw $23, 1($0) ; 12: sw $23,1($0)
In binary representation our program code is as follows:
dataMemory[0] = 32'b00100000000101010000000000010100;
dataMemory[4] = 32'b00000000000000001011100000100000;
dataMemory[8] = 32'b00010010101000000000000000000110;
dataMemory[12] = 32'b00100000000101100000000000000001;
dataMemory[16] = 32'b00000010111101101011100000100000;
dataMemory[20] = 32'b00000010111101101011000000100010;
dataMemory[24] = 32'b00100010101101011111111111111111;
dataMemory[28] = 32'b00010010101000000000000000000001;
dataMemory[32] = 32'b00010000000000001111111111111011;
dataMemory[36] = 32'b10101100000101110000000000000001;
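To read these words back as instructions, each one can be split into its fields. The sketch below decodes the first word and computes the target of the backward branch stored at address 32. The module name encodingCheck is hypothetical, and the branch rule used (target = pc + 4 + sign-extended offset shifted left by 2) is the usual MIPS convention, assumed here rather than taken from the CPU source.

```verilog
//Field and branch-target check for two program words (illustrative only)
module encodingCheck;
reg [31:0] instr;
reg [31:0] pc;
reg [31:0] target;
initial
begin
//Word at address 0: addi $21,$0,20
instr = 32'b00100000000101010000000000010100;
$display("opcode=%b rs=%0d rt=%0d immediate=%0d",
instr[31:26], instr[25:21], instr[20:16], instr[15:0]);
//Word at address 32: beq $0,$0,loop
pc = 32;
instr = 32'b00010000000000001111111111111011;
//Assumed MIPS rule: target = pc + 4 + (sign-extended offset << 2)
target = pc + 4 + ({{16{instr[15]}}, instr[15:0]} << 2);
$display("branch target=%0d", target);
end
endmodule
```

The first line should report opcode 001000 with rs=0, rt=21 and immediate=20. The second should report a branch target of 16, which is the address of the first instruction after the loop label.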
On the following pages are the results of the simulation running the program described above.
[Simulation waveform, page 1 of 2. Entity: AMultiCycleTest. Time axis: 0 to 12 us. Captured Sun Dec 02 2007, 8:30:30 PM Central Standard Time. Signals traced under /AMultiCycleTest/testcpu: clock, cycle, alu_out, mem_out, pc, regdst, memread, memwrite, regwrite, memtoreg, zero, pcwritecond, pcwrite, iord, irwrite, aluscra, pcsource, alusrcb, jumpaddress, branchCondition, aluCtrl, ALUOut, register_A, register_B, memAddress, opCode, regToWrite, rs, rt, rd, immediatevalue, shamt, funct, address, memDataReg, regA, regB, regWriteData, imm_value, valueA, valueB, forceadd, zeroextendvalue, brachwritecond.]
[Simulation waveform, page 2 of 2. Signals traced: gotoNextPc, RTL/registerFiles[31] through [0], data/dataMemory[1]. Time axis: 0 to 12 us.]
Values recorded in the register file trace, in order of change:
registerFiles[23]: 0 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
registerFiles[22]: 1 0 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181
registerFiles[21]: 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
registerFiles[0]: 0
dataMemory[1]: 6765
No values are shown for the remaining registers.
Appendix C – Commonly Made Verilog Mistakes
• When a register or wire name is misspelled, ModelSim may compile the code without any
error or warning, typically because Verilog implicitly declares an unknown identifier in a
port connection as a new one-bit wire. The design then silently fails to function correctly.
• ModelSim only issues a warning when a register is assigned a value with more bits than
the register is wide. The value is truncated and only its low-order bits are stored, usually
to the inconvenience of the developer.
• Module names should match the name of the file that holds the module. Although this is
not a strict rule of Verilog, it is good practice because some programs, such as Quartus II,
depend on this naming scheme for some applications.
• Pay close attention to the sensitivity list of an always block. If a signal that is read
inside the block is missing from the list, the block does not re-run when that signal
changes, and it may never run at all if none of the listed signals change. This is a
confusing issue to find when debugging code.
• An output that is assigned inside an always or initial block must also be declared as a
reg.
• Begin and end statements must be paired properly. A begin statement without a matching
end statement will cause compilation problems.
• To connect the output of one module to another module, a wire must be used. Connecting a
register to a module output will cause a compilation error.
• Blocking and non-blocking assignment statements behave differently, and misunderstanding
the difference can cause problems in the inner workings of Verilog code. This is also a
very hard issue to debug.
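Two of these pitfalls, the incomplete sensitivity list and the difference between blocking and non-blocking assignment, can be reproduced in a few lines. The testbench below is a minimal sketch; commonMistakesDemo and all of its signal names are hypothetical and do not appear in the CPU.

```verilog
//Demonstrates two common mistakes (illustrative only)
module commonMistakesDemo;
reg clock;
reg [3:0] a, b; //swapped with non-blocking assignments
reg [3:0] c, d; //"swapped" with blocking assignments
reg [3:0] p, q, sumBad, sumGood;
//Non-blocking: both right-hand sides are sampled before either update
always@(posedge clock)
begin
a <= b;
b <= a;
end
//Blocking: c is overwritten before d reads it
always@(posedge clock)
begin
c = d;
d = c;
end
//q is read inside the block but missing from the sensitivity list
always@(p)
begin
sumBad = p + q;
end
//Complete sensitivity list
always@(p or q)
begin
sumGood = p + q;
end
initial
begin
clock = 0;
a = 4'd1; b = 4'd2; c = 4'd1; d = 4'd2;
#1 q = 4'd1;
#1 p = 4'd1;
clock = 1;
#1 q = 4'd5; //only sumGood is recomputed here
#1 $display("non-blocking: a=%0d b=%0d", a, b);
$display("blocking: c=%0d d=%0d", c, d);
$display("sumBad=%0d sumGood=%0d", sumBad, sumGood);
end
endmodule
```

The non-blocking pair should end up swapped (a=2, b=1), while the blocking pair collapses to the same value (c=2, d=2). After q changes to 5, sumGood should follow to 6 while sumBad keeps its stale value of 2.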
Appendix D – Figures and Diagrams
Figure 1 – Control FSM Logic Diagram (Patterson, Hennessy, page 338)
Figure 2 – High Level Datapath Design (Patterson, Hennessy, page 320)
Figure 3 – High Level Datapath Design with Control Logic (Patterson, Hennessy, page 323)
Appendix E – Sources
David A. Patterson, John L. Hennessy. Computer Organization and Design, Revised Printing 3rd Ed. New York: Elsevier, 2007.
