This document describes a graduate project to implement a multi-cycle central processing unit (CPU) using VHDL. The project involves designing a multi-cycle MIPS processor that breaks instructions down into multiple clock cycles. This allows functional blocks like the ALU and memory to be reused across cycles, reducing hardware needs. The design is simulated and synthesized to verify its operation and resource usage. The implementation provides experience with developing processor hardware, an important task in systems-on-chip design.
A graduate project submitted in partial fulfillment of the requirement
For the degree of Master of Science
In Electrical Engineering
By
Raghunandan Srinidhi
May 2012
The graduate project of Raghunandan Srinidhi is approved:
Ali Amini, Ph.D. Date
Shahnam Mirzaei, Ph.D. Date
Ramin Roosta, Ph.D., Chair Date
California State University, Northridge
Acknowledgement
I would like to dedicate this project to my family for their unconditional love and great care throughout my life. In addition, I would like to thank Prof. Ramin Roosta for his unlimited support and advisement.
Table of Contents
Signatures
Acknowledgement
List of Figures
List of Tables
Abstract
CHAPTER 1: INTRODUCTION
1.1 MIPS Multicycle Processor
1.2 Design Environment
CHAPTER 2: MULTICYCLE PROCESSOR DESIGN
2.1 MIPS subset for implementation
2.2 Single Cycle Datapath
2.3 Analyzing Performance of Single Cycle Datapath
2.4 Clock Period Comparisons
2.5 Multicycle Datapath
2.6 Multicycle CPU with control
2.7 Operation Flow
CHAPTER 3: DESIGN IMPLEMENTATION
3.1 ALU
3.2 Memory
3.3 Control
3.3.1 Micro operations and control groups
3.3.2 Control signal generation stages
3.3.3 Controller Implementation
3.4 Datapath
3.4.1 Instruction Fetch
3.4.2 Instruction Decode
3.4.3 Execution
3.4.4 Memory Write back
3.4.5 Datapath
3.5 Datapath and control
3.6 Processor and Memory
CHAPTER 4: RESULTS OF SIMULATION AND SYNTHESIS
4.1 Simulation with a Program in memory
4.2 Synthesis Report
CHAPTER 5: Conclusion and Future Enhancements
APPENDIX - VHDL Codes of Multi cycle CPU
REFERENCES
List of Figures
Fig 2.1: Single cycle design
Fig 2.2: Clock Period in single cycle design
Fig 2.3: Clock Period in Multicycle design
Fig 2.4: Unbalanced delay in Multicycle design
Fig 2.5: Multicycle Datapath
Fig 2.6: Multicycle Datapath with Control
Fig 2.7: Operation flow for the instructions
Fig 3.1: Memory Block Diagram
Fig 3.2: Internal view of the memory
Fig 3.3: Inferred 256 x 8 memory
Fig 3.4: Top Level Block diagram of Controller
Fig 3.5: Control signal generation stages
Fig 3.6: Sequence Controller Block Diagram
Fig 3.7: ALU Controller Block Diagram
Figure 3.8: Instruction Fetch Block
Figure 3.9: Instruction Decode Block
Figure 3.10: Instruction Execution Block
Figure 3.11: Memory Writeback Block
Figure 3.12: Data path Block
Figure 3.12: Data path and Control Unit
Figure 3.13: Processor Sub modules
Figure 3.14: Completed processor
Figure 4.1: Results in memory after simulation
Figure 4.2: Memory before and after simulation
Figure 4.3: Synthesis Report
List of Tables
Table 2.1: R-type Instruction Field
Table 2.2: Load and Store Instruction Field
Table 2.3: Branch Instruction Field
Table 2.4: Jump Instruction Field
Table 2.5: Component delays
Table 3.1: PC group
Table 3.2: Memory group
Table 3.3: Register file group
Table 3.4: ALU group
Table 3.5: State transition table for Sequence Controller
Table 3.6: Signals for ALU controller Implementation
Table 4.1: Instructions in memory before simulation
Table 4.2: Data in memory before simulation
Table 4.3: Expected values in memory
Abstract
MIPS PROCESSOR IMPLEMENTATION
By
Raghunandan Srinidhi
Master of Science in Electrical Engineering
Computers and computer systems are a pervasive part of the modern world. Aside from just the common desktop PC, there are a number of other types of specialized computer systems. The central component of these computers and computer systems is the microprocessor, or the CPU. The CPU (short for "Central Processing Unit") is essentially the brains behind the computer system.
The scope of the project was to implement the design of a Multi Cycle Central Processing Unit (CPU) in the Very-High-Speed Integrated Circuits (VHSIC) Hardware Description Language, commonly known as VHDL. The implementation was carried out to understand the development of processor hardware, as the design and customization of embedded processors has become a mainstream task in the development of complex SoCs (Systems-on-Chip).
CHAPTER 1: INTRODUCTION
1.1 MIPS Multi cycle Processor
The Multi cycle approach breaks instructions down into multiple steps. Each step is designed to take one clock cycle. This allows a functional block to be used more than once per instruction, as long as each use falls on a different clock cycle.
This implementation has several key advantages over a Single cycle implementation. First, it can share modules, allowing the use of fewer hardware components. Instead of multiple arithmetic logic units (ALUs), the Multi cycle implementation uses only one. Only one memory is used for the data and the instructions. Breaking complex instructions into steps also allows a significantly higher clock frequency, because the clock period no longer has to be based on the instruction that takes the longest to execute.
The Multi cycle implementation also uses several registers to temporarily hold the output of the previous clock cycle. These include an Instruction register, Memory data register, ALU Output register, etc.
The Multi cycle machine breaks each instruction down into a series of steps. These steps typically are:
1. Instruction fetch step
2. Instruction decode and register fetch step
3. Execution, memory address computation, or branch completion step
4. Memory access or R-type instruction completion step
5. Memory read completion step
During the instruction fetch step, the Multi cycle processor fetches the instruction from the memory and computes the address of the next instruction by incrementing the program counter (PC). During the second step, the instruction decode and register fetch step, the instruction is decoded to determine its type: memory access, R-type, I-type, branch or jump. The third step, the execution, memory address computation, or branch completion step, functions in different ways depending on the type of instruction the processor is executing. For a memory access instruction, the ALU computes the memory address. An R-type instruction uses this third step to perform the actual arithmetic. This third step is the last step for branch and jump instructions; it is the step where the next PC address is computed and stored. The fourth step only takes place in load word, store word, R-type, and I-type instructions. In this step, the load and store word instructions access the memory, and the arithmetic-logical instructions write their result. Values are either read from memory into the memory data register, or taken from a register and stored into the memory. This fourth step is the last step for store word, R-type and I-type instructions. For R and I type instructions this is the step where the result from the ALU computation is stored back into the destination register. Only load instructions need the fifth step to finish up. This is the memory read completion step. In a load instruction, the value of the memory data register is written into the register file.
These different steps are all controlled and orchestrated by the brain of the multi cycle CPU. This brain is the controller. The controller is a finite state machine that works with the Opcode to walk the rest of the components through all the different steps, or states. The controller controls when each register is allowed to write and controls which operation the ALU is performing.
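The step sequence described above can be summarized as a small lookup. The following Python sketch is illustrative only: the names STEPS and cycles_for are hypothetical and are not taken from the project's VHDL code. It assumes one clock cycle per step, as stated in this section.

```python
# Steps used by each instruction class, following the five-step
# description in Section 1.1 (one clock cycle per step).
# The step labels are illustrative names, not signal names from the design.
STEPS = {
    "R-type": ["fetch", "decode", "execute", "write_back"],
    "lw":     ["fetch", "decode", "address", "mem_read", "write_back"],
    "sw":     ["fetch", "decode", "address", "mem_write"],
    "beq":    ["fetch", "decode", "branch"],
    "j":      ["fetch", "decode", "jump"],
}


def cycles_for(instr_class):
    """Return the number of clock cycles an instruction class needs."""
    return len(STEPS[instr_class])


for name in STEPS:
    print(name, cycles_for(name))
```

Only the load word instruction uses all five steps; branch and jump finish after three.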
1.2 Design Environment
Xilinx ISE Design Suite 13.2 is used as the environment for compilation and synthesis. The Very-High-Speed Integrated Circuit (VHSIC) Hardware Description Language, commonly known as VHDL, is used to describe the behavior of the design. A bottom-up methodology is used to design the modules: the lowest level of the hierarchy is tested first to eliminate possible errors in the top-level design. The ISim simulator is used for simulation.
CHAPTER 2: MULTI CYCLE PROCESSOR DESIGN
2.1 MIPS subset for implementation
The designed Multi cycle CPU handles nine instructions. Of these, five are R-type instructions: add, subtract, and, or, and set less than. Three are I-type instructions: load word, store word, and branch on equal. The jump instruction is also supported. All the instructions are 32 bits in width.
1. Arithmetic logic instructions [R-format]
The instructions in this class all read two registers, perform an ALU operation and write back the result. These arithmetic-logical instructions are also called R-type instructions. This instruction class comprises add, sub, and, or, and slt. The 32 registers of the processor are stored in a Register File. To read two data words, two inputs and two outputs are needed. The inputs are 5 bits wide and specify the register numbers to be read; the outputs are 32 bits wide and carry the values of the registers. To write the result back, two inputs are needed: one to specify the register number and one to supply the data to be written. To process the data from the Register File, an ALU with two data inputs is used.
The instruction field for an R-Type instruction is shown in Table 2.1.
Example: add $t0, $s1, $s2
op       rs      rt      rd      shamt   funct
000000   10001   10010   01000   00000   100000
Table 2.1: R-type Instruction Field
The meaning of the fields is:
op: basic operation
rs: first source register
rt: second source register
rd: destination register
shamt: shift amount
funct: function
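As a quick check of the field layout in Table 2.1, the following Python sketch packs and unpacks the six R-type fields. The helper names encode_r_type and decode_r_type are hypothetical; the register numbers ($t0 = 8, $s1 = 17, $s2 = 18) follow the standard MIPS register numbering.

```python
def encode_r_type(rs, rt, rd, shamt, funct, op=0):
    """Pack the six R-type fields into one 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_r_type(word):
    """Split a 32-bit instruction word into its R-type fields."""
    return {
        "op":    (word >> 26) & 0x3F,
        "rs":    (word >> 21) & 0x1F,
        "rt":    (word >> 16) & 0x1F,
        "rd":    (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $t0, $s1, $s2
word = encode_r_type(rs=17, rt=18, rd=8, shamt=0, funct=0b100000)
print(f"{word:032b}")
print(decode_r_type(word))
```

The printed bit string can be compared field by field with the example row of Table 2.1.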
2. Load and store instructions [I-format]
The sw and lw instructions compute a memory address by adding a register value to the 16-bit signed offset field contained in the instruction. Because the ALU operates on 32-bit values, the instruction offset field must be sign extended from 16 to 32 bits, simply by replicating the sign bit 16 times in front of the original value.
The instruction field for a lw or sw instruction is shown in Table 2.2.
Example: lw $t0, 32($s2)
op       rs      rt      16 bit number
100011   10010   01000   0000000000100000
Table 2.2: Load and Store Instruction Field
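The sign extension and address computation for lw and sw can be sketched as follows. This is a minimal Python model, not the project's VHDL; the function names sign_extend16 and effective_address are hypothetical.

```python
def sign_extend16(value):
    """Extend a 16-bit two's complement field to a 32-bit value."""
    value &= 0xFFFF
    if value & 0x8000:          # sign bit set: fill the upper 16 bits with ones
        value |= 0xFFFF0000
    return value


def effective_address(base, offset16):
    """Memory address of lw/sw: register value plus sign-extended offset."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


print(hex(sign_extend16(0x0020)))          # positive offset 32 is unchanged
print(hex(sign_extend16(0xFFFC)))          # 16-bit encoding of -4
print(effective_address(100, 0xFFFC))      # base 100 with offset -4
```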
3. Branch instruction [I-format]
The beq instruction has three operands: two registers that are compared for equality, and a 16-bit offset used to compute the branch target address relative to the branch instruction address. The datapath for the branch instruction must do two operations: compare the register contents and compute the branch target. The offset field of the branch instruction is a word offset, so it must be sign extended from 16 bits to 32 bits and shifted left by 2 bits to obtain a byte offset. The branch target address is computed by adding the address of the next instruction (PC + 4) to this computed offset.
The instruction field for a branch instruction is shown in Table 2.3.
Example: beq $t8, $t9, 16
op       rs      rt      16 bit offset
000100   11000   11001   0000000000000001
Table 2.3: Branch Instruction Field
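The relation between the encoded offset of 1 and the branch target 16 can be checked with a short Python sketch. It assumes that the beq sits at address 8, as it does in the test program of Chapter 4 (third instruction, after two load word instructions); the function name branch_target is hypothetical.

```python
def branch_target(pc, offset16):
    """Branch target = (PC + 4) + (sign-extended 16-bit offset << 2)."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:                 # negative word offset
        offset -= 1 << 16
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


# beq at address 8 with an encoded word offset of 1
print(branch_target(8, 0b0000000000000001))
```

With these inputs the next instruction address is 12, and one word (4 bytes) is added to it.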
4. Jump instruction [J-format]
The jump instruction is similar to the branch instruction, but it computes the target PC differently and is unconditional. The destination address for a jump is formed by concatenating the upper 4 bits of the current PC + 4 with the 26-bit address field in the jump instruction and appending 00 as the last two bits.
The instruction field for a jump instruction is shown in Table 2.4.
Example: j 24
op       26 bit number
000010   00000000000000000000000110
Table 2.4: Jump Instruction Field
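The jump target computation can be sketched in Python as below. The function name jump_target is hypothetical; the example values correspond to the jump in Table 2.4, whose address field holds 6.

```python
def jump_target(pc, addr26):
    """Upper 4 bits of PC + 4, then the 26-bit address field, then 00."""
    upper = (pc + 4) & 0xF0000000
    return upper | ((addr26 & 0x03FFFFFF) << 2)


print(jump_target(16, 0b110))             # address field 6 gives byte address 24
print(hex(jump_target(0x10000010, 0b110)))
```

The second call shows that the upper 4 bits of the PC are carried over into the target unchanged.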
2.2 Single Cycle Datapath
The Single Cycle Datapath executes every instruction in one clock cycle. This means that any element can be used only once per instruction, so elements that are needed more than once have to be duplicated. Where possible, datapath elements are shared by different instruction flows. A shared element needs multiple connections to its input, which is commonly realised with a multiplexer. Figure 2.1 shows the combined datapath, including a memory for instructions and one for data, the ALU, the PC unit and the mentioned multiplexers.
Fig 2.1: Single cycle design
After David A. Patterson and John L. Hennessy, Computer Organization and Design, 4th Edition
2.3 Analyzing Performance of Single Cycle Datapath
Assuming the delay of each component in the data path is as shown in Table 2.5, the performance of the Single cycle data path can be analyzed.
Component        Delay
Register         0
Adder            t_+
ALU              t_A
Multiplexer      0
Register file    t_R
Program memory   t_I
Data memory      t_M
Table 2.5: Component delays
Datapath delays in the Single cycle implementation:
R-type {add, sub, and, or, slt}: max(t_+, t_I + t_R + t_A + t_R)
sw:  max(t_+, t_I + t_R + t_A + t_M)
lw:  max(t_+, t_I + t_R + t_A + t_M + t_R)
beq: max(t_+ + t_+, t_I + t_+, t_I + t_R + t_A)
j:   max(t_I, t_+)
Hence the critical path delay of the single cycle design in Figure 2.1 is max(t_I + t_R + t_A + t_M + t_R, t_+ + t_+, t_I + t_+).
The above expression shows that performance is limited by the slowest instruction, since the clock period must cover the longest path.
Other problems with the single cycle design include poor resource utilization and the fact that some instructions cannot be implemented in this manner. For example, an instruction that transfers data from or to multiple memory locations cannot be implemented in a single cycle approach.
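To make the critical path argument concrete, the per-instruction delays can be evaluated for a set of component delays. The Python sketch below is illustrative: the numeric delays are made-up values, not measurements from this design, and the function name single_cycle_delays is hypothetical. The R-type path is modelled as instruction fetch, register read, ALU operation and register write.

```python
def single_cycle_delays(t_add, t_alu, t_rf, t_imem, t_dmem):
    """Datapath delay of each instruction class in a single cycle design."""
    return {
        "R-type": max(t_add, t_imem + t_rf + t_alu + t_rf),
        "sw":     max(t_add, t_imem + t_rf + t_alu + t_dmem),
        "lw":     max(t_add, t_imem + t_rf + t_alu + t_dmem + t_rf),
        "beq":    max(t_add + t_add, t_imem + t_add, t_imem + t_rf + t_alu),
        "j":      max(t_imem, t_add),
    }


# Assumed component delays in arbitrary time units (hypothetical values).
delays = single_cycle_delays(t_add=1, t_alu=2, t_rf=1, t_imem=2, t_dmem=2)
clock_period = max(delays.values())   # the slowest instruction sets the clock
print(delays)
print("clock period:", clock_period)
```

With these assumed values the load word path is the longest, so every instruction, including the much shorter jump, has to run with that clock period.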
2.4 Clock Period Comparisons
Fig 2.2: Clock Period in single cycle design
Fig 2.3: Clock Period in Multicycle design
Fig 2.2 and Fig 2.3 illustrate the clock periods in the Single cycle and Multicycle designs. The clock period of the single cycle design depends on the longest instruction, whereas the clock period of the Multicycle design depends on the longest delay of any major functional block. The latency of the lw instruction increases slightly in the Multicycle approach because execution time is quantized into whole clock periods.
After David A. Patterson and John L. Hennessy, Computer Organization and Design, 4th Edition
As long as the time taken by each functional unit is similar, the performance of the Multicycle design is better. If there is a wide disparity in the delays of the major functional units, the Multicycle implementation may give poor performance, as shown in Fig 2.4.
Fig 2.4: Unbalanced delay in Multicycle design
After David A. Patterson and John L. Hennessy, Computer Organization and Design, 4th Edition
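The effect of balanced and unbalanced unit delays can be illustrated with a small Python calculation. All delay values below are assumed for illustration, and the names CYCLES, single_cycle_period and multicycle_latency are hypothetical. The cycle counts per instruction class follow the five-step description in Section 1.1.

```python
# Clock cycles per instruction class in the multicycle design.
CYCLES = {"R-type": 4, "lw": 5, "sw": 4, "beq": 3, "j": 3}


def single_cycle_period(d):
    """Clock period of a single cycle design, set by the lw path:
    fetch, register read, ALU, data access, register write."""
    return d["memory"] + d["register_file"] + d["alu"] + d["memory"] + d["register_file"]


def multicycle_latency(d, instr):
    """Latency of one instruction when the clock period equals the
    delay of the slowest functional unit."""
    period = max(d.values())
    return period * CYCLES[instr]


balanced = {"memory": 2, "register_file": 1, "alu": 2}     # assumed values
unbalanced = {"memory": 6, "register_file": 1, "alu": 2}   # assumed values

for name, d in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, "single cycle period:", single_cycle_period(d))
    for instr in CYCLES:
        print("  ", instr, multicycle_latency(d, instr))
```

In the balanced case lw becomes slightly slower than in the single cycle design while shorter instructions such as beq become faster. In the unbalanced case the slow memory stretches every cycle, and even an R-type instruction takes longer than one single cycle clock period.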
2.5 Multicycle Datapath
To avoid the disadvantages of the single cycle implementation described in the previous sections, a multicycle implementation is used. This technique divides each instruction into steps, and each step is executed in one clock cycle. Fig 2.5 shows the design of a multicycle datapath.
Fig 2.5: Multicycle Datapath
Compared with the single-cycle datapath, the differences are that only one memory unit is used for instructions and data, that there is only one ALU instead of an ALU and two adders, and that several output registers are added to hold the output value of a unit until it is used in a later clock cycle.
The instruction register (IR) and the memory data register (DR) are added to save the output of the memory. The registers A and B hold the register operands read from the register file and the Res register holds the output of the ALU. With exception of the IR all these registers hold data only between a pair of adjacent clock cycles. Because the IR holds the value during the whole time of the execution of an instruction, it requires a write control signal.
After David A. Patterson and John L. Hennessy, Computer Organization and Design, 4th Edition
The reduction from an ALU and two adders to a single ALU causes the following changes in the datapath:
• An additional multiplexer is added for the first ALU input to choose between the A register and the PC.
• The multiplexer at the second ALU input is changed from a two-way to a four-way multiplexer. The two new inputs are a constant 4 to increment the PC and the sign-extended and shifted offset field for the branch instruction.
In order to handle branches and jumps, more additions to the datapath are required.
Sequential execution, the branch instruction and the jump instruction cause three different values to be written into the PC:
1. The output of the ALU, which is PC + 4, is stored directly to the PC.
2. The content of the register Res, after the branch target address has been computed.
3. The lower 26 bits of the IR shifted left by two and concatenated with the upper 4 bits of the incremented PC, when the instruction is a jump.
If the instruction is branch, the write signal for the PC is conditional. Only if the two compared registers are equal, the computed branch address has to be written to the PC.
Therefore the PC needs two write signals: PWu if the write is unconditional (the value is PC + 4 or the jump target) and PWc if the write is conditional. The Zero output of the ALU is ANDed with PWc and the result is ORed with PWu to get the write control signal PW of the Program Counter.
2.6 Multicycle CPU with control
Figure 2.6 shows the completed datapath for a multicycle implementation including the whole control.
Fig 2.6: Multicycle Datapath with Control
After David A. Patterson and John L. Hennessy, Computer Organization and Design, 4th Edition
2.7 Operation flow
Figure 2.7 illustrates the operation flow for instructions in the multicycle implementation. The execution of an instruction is broken into clock cycles, which means that each instruction is divided into a series of steps.
Fig 2.7: Operation flow for the instructions
CHAPTER 3: DESIGN IMPLEMENTATION
3.1 ALU
The arithmetic-logic unit (ALU) performs basic arithmetic and logic operations, which are selected by its operation code. The result of the operation is written to the output. An additional Zero output is set high if the result equals zero. At present, the arithmetic operations add and sub and the logic operations and, or and slt can be applied to the inputs. The inputs are 32 bits wide and of type unsigned.
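A behavioural model of such an ALU can be sketched in Python as follows. This is not the project's VHDL code. The 3-bit operation codes are the OP values listed in Table 3.6; treating the operands as two's complement numbers for slt follows the usual MIPS definition and is an assumption here. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit value as a two's complement number."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Return (result, zero) for the five supported operations."""
    a &= MASK32
    b &= MASK32
    if op == 0b010:                       # add
        result = (a + b) & MASK32
    elif op == 0b110:                     # sub
        result = (a - b) & MASK32
    elif op == 0b000:                     # and
        result = a & b
    elif op == 0b001:                     # or
        result = a | b
    elif op == 0b111:                     # slt (signed comparison, assumed)
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError(f"unsupported ALU operation code: {op:03b}")
    zero = 1 if result == 0 else 0        # Zero output is high for a zero result
    return result, zero


print(alu(0b010, 0x10, 0x10))
print(alu(0b110, 5, 5))
```

The second call shows how beq can use a subtraction: equal operands give a zero result and therefore a high Zero output.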
3.2 Memory
Data is synchronously written to or read from the memory with a data bus width of 32 bits. The memory consists of four RAM blocks with a data width of 8 bits each. A control signal enables the memory to be written; otherwise data is only read. In order to store data to the memory, the data word is subdivided into four bytes, which are written separately to the RAM blocks. Conversely, the single bytes are concatenated to get the data word back again.
At the moment, it is only possible to read and write data words. Addressing of half-words or single bytes is not supported. In order to write or read data words, all RAM blocks have to be selected. Hence, the lowest two address bits are not examined for chip-select logic. Data is addressed by the MIPS processor with an address width of 32 bits, while the address width of each RAM block is 8 bits. All RAM blocks are connected to the same address, namely mem_Adr(9 downto 2). Since the full address width is not used for addressing and chip selects, each data word can be reached through multiple addresses.
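The byte slicing and the address aliasing described above can be modelled in a few lines of Python. The class name WordMemory and the assignment of the least significant byte to block 0 are assumptions made for this sketch; the actual byte order of the RAM blocks is defined by the VHDL code in the appendix.

```python
class WordMemory:
    """Four 256 x 8 RAM blocks that together store 32-bit words."""

    def __init__(self):
        self.blocks = [[0] * 256 for _ in range(4)]

    @staticmethod
    def block_address(byte_address):
        """Address bits 9 downto 2 select the location in every block."""
        return (byte_address >> 2) & 0xFF

    def write(self, byte_address, word):
        index = self.block_address(byte_address)
        for i in range(4):                # one byte per block (order assumed)
            self.blocks[i][index] = (word >> (8 * i)) & 0xFF

    def read(self, byte_address):
        index = self.block_address(byte_address)
        return sum(self.blocks[i][index] << (8 * i) for i in range(4))


mem = WordMemory()
mem.write(960, 0x00000010)
print(hex(WordMemory.block_address(960)))   # location inside each RAM block
print(hex(mem.read(960)))
print(hex(mem.read(960 + 1024)))            # same word, reached through an alias
```

Because only address bits 9 downto 2 are used, byte address 960 maps to block location F0 (hex), and any address that differs only in the ignored bits returns the same word.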
The Block diagram of the memory with its I/O interface is shown in Fig 3.1.
Fig 3.1: Memory Block Diagram
Fig 3.2 illustrates the internal view of the memory with four instantiated 256 x 8 RAM modules.
Fig 3.2: Internal view of the memory
Fig 3.3 shows the inferred 256 x 8 memory.
Fig 3.3: Inferred 256 x 8 memory
3.3 Control
The Control Unit is divided into two parts:
• Sequence Controller
• ALU Controller
Fig 3.4 illustrates the Top level view of the Controller which takes in specific bit fields of the instruction to generate the required control signals. In the case of Branch instruction, it should also consider Zero output of the ALU as an input.
Fig 3.4: Top Level Block diagram of Controller
3.3.1 Micro operations and control groups
A number of micro operations are identified and control groups are defined to ease the task of setting control signals. As an example, PC = PC + 4 belongs to the Program counter group and is identified as a micro operation. The PC related signals are given the values required to accomplish the micro operation. PC = PC + 4 is given the simpler name PCinc for ease of identification. A similar procedure is carried out for the other control groups, namely the Memory group, Register file group and ALU group, as illustrated in Tables 3.1 to 3.4.
Operation                     Name     PWu   PWc   Psrc
PC = PC + 4                   PCinc    0     X     X
if (A==B) PC = res            branch   0     1     0
PC=PC[31-28]||s2(IR[25:0])    jump     1     X     2
Table 3.1: PC group
Expression for PC write enable, PW = PWu + Z.PWc
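The write enable expression can be checked with a minimal Python model of the PC register. The function names pc_write_enable and update_pc are hypothetical, and all control signals are represented as the integers 0 and 1.

```python
def pc_write_enable(pwu, pwc, zero):
    """PW = PWu OR (Zero AND PWc); all arguments are 0 or 1."""
    return pwu | (zero & pwc)


def update_pc(pc, candidate, pwu, pwc, zero):
    """PC value after one clock edge: load `candidate` only if PW is 1."""
    return candidate if pc_write_enable(pwu, pwc, zero) else pc


print(update_pc(12, 16, pwu=0, pwc=1, zero=1))   # branch taken
print(update_pc(12, 16, pwu=0, pwc=1, zero=0))   # branch not taken
print(update_pc(16, 24, pwu=1, pwc=0, zero=0))   # unconditional write (jump)
```

A conditional write only changes the PC when the ALU reports equal operands through its Zero output.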
Operation          Name      MW   MR   IorD   IW
IR = Mem[PC]       fetch     0    1    0      1
DR = Mem[Res]      memLoad   0    1    1      0
Mem[Res] = B       m_wr      1    0    1      0
Table 3.2: Memory group
Operation            Name        RW   Rdst   M2R   AW   BW
A=RF[IR[25-21]]      rs2A        0    X      X     1    0
B=RF[IR[20-16]]      rt2B        0    X      X     0    1
RF[IR[15-11]]=Res    reg_state   1    1      0     0    0
RF[IR[20-16]]=DR     mem_done    1    0      1     0    0
Table 3.3: Register file group
3.3.2 Control signal generation stages
Fig 3.5 shows the flow for each instruction in different clock cycles. The micro operation name identifies the required control signal values at that particular stage, as listed in Tables 3.1 to 3.4 in the previous section.
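One way to picture the control groups is as lookup tables from micro operation names to signal values. The Python sketch below transcribes the Memory group and Register file group rows; the dictionary layout, the use of None for a don't-care entry and the helper name signals_for are assumptions made for illustration and do not reflect how the VHDL controller is written.

```python
X = None  # don't-care value

MEMORY_GROUP = {
    "fetch":   {"MW": 0, "MR": 1, "IorD": 0, "IW": 1},   # IR = Mem[PC]
    "memLoad": {"MW": 0, "MR": 1, "IorD": 1, "IW": 0},   # DR = Mem[Res]
    "m_wr":    {"MW": 1, "MR": 0, "IorD": 1, "IW": 0},   # Mem[Res] = B
}

REGISTER_FILE_GROUP = {
    "rs2A":      {"RW": 0, "Rdst": X, "M2R": X, "AW": 1, "BW": 0},
    "rt2B":      {"RW": 0, "Rdst": X, "M2R": X, "AW": 0, "BW": 1},
    "reg_state": {"RW": 1, "Rdst": 1, "M2R": 0, "AW": 0, "BW": 0},
    "mem_done":  {"RW": 1, "Rdst": 0, "M2R": 1, "AW": 0, "BW": 0},
}


def signals_for(group, name, dont_care=0):
    """Look up one micro operation and replace don't-cares by a fixed value."""
    return {sig: (dont_care if val is X else val) for sig, val in group[name].items()}


print(signals_for(MEMORY_GROUP, "fetch"))
print(signals_for(REGISTER_FILE_GROUP, "mem_done"))
```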
Fig 3.5: Control signal generation stages
3.3.3 Controller Implementation
The control of the processor is realized by a Finite State Machine. The input to the State Machine is the upper 6 bits of the instruction, which contain the opcode. The outputs of the state machine are the control signals for the individual functional units of the processor, in particular the multiplexers of the datapath.
Present state   R-type       sw           lw           beq          j
...             ...          ...          mem_Addr     branch       jump
alu_exec        reg_state    X            X            X            X
reg_state       fetch_inst   X            X            X            X
mem_Addr        X            mem_store    mem_load     X            X
mem_store       X            fetch_inst   X            X            X
mem_load        X            X            mem_done     X            X
mem_done        X            X            fetch_inst   X            X
branch          X            X            X            fetch_inst   X
jump            X            X            X            X            fetch_inst
Table 3.5: State transition table for Sequence Controller
The Sequence Controller is implemented as an FSM using the state transition Table 3.5, and the respective outputs in each state are set using the values in Tables 3.1 to 3.4. The Block Diagram of the Sequence Controller is shown in Fig 3.6.
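The state sequence of the Sequence Controller can be traced with a short Python model. The transitions below are the rows of Table 3.5 that are recoverable; the state name "decode" and the mapping from instruction class to the first state after decoding (FIRST_EXEC_STATE) are assumptions for this sketch, as are the names NEXT_STATE and trace.

```python
# Transitions taken from Table 3.5, keyed by present state and instruction class.
NEXT_STATE = {
    "alu_exec":  {"R-type": "reg_state"},
    "reg_state": {"R-type": "fetch_inst"},
    "mem_Addr":  {"sw": "mem_store", "lw": "mem_load"},
    "mem_store": {"sw": "fetch_inst"},
    "mem_load":  {"lw": "mem_done"},
    "mem_done":  {"lw": "fetch_inst"},
    "branch":    {"beq": "fetch_inst"},
    "jump":      {"j": "fetch_inst"},
}

# Assumed: state entered after the (hypothetically named) decode state.
FIRST_EXEC_STATE = {
    "R-type": "alu_exec",
    "sw": "mem_Addr",
    "lw": "mem_Addr",
    "beq": "branch",
    "j": "jump",
}


def trace(instr_class):
    """List the states visited by one instruction, one state per clock cycle."""
    state = FIRST_EXEC_STATE[instr_class]
    states = ["fetch_inst", "decode", state]
    while True:
        state = NEXT_STATE[state][instr_class]
        if state == "fetch_inst":         # the next instruction starts here
            return states
        states.append(state)


for cls in FIRST_EXEC_STATE:
    print(cls, trace(cls))
```

The length of each trace equals the number of clock cycles the instruction class needs, with load word as the longest sequence.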
Fig 3.6: Sequence Controller Block Diagram
The operation codes of the ALU are stored in a truth table. The corresponding code is produced depending on the opc signal of the state machine and on the lower 6 bits of the instruction (the function field), which specify the arithmetic or logic operation to be performed.
Fig 3.7: ALU Controller Block Diagram
Table 3.6 gives the required select outputs (OP) to control the operation of the ALU.
The inputs to the ALU controller are the opc bits from the Sequence controller block and the function field of the instruction, as shown in Fig 3.7.
TYPE     opc   ins   FUNC     Action      OP
R-type   10    Add   100000   Add         010
R-type   10    Sub   100010   Sub         110
R-type   10    And   100100   And         000
R-type   10    Or    100101   or          001
R-type   10    slt   101010   setOnLess   111
SW       00    Sw
Table 3.6: Signals for ALU controller Implementation
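The R-type rows of Table 3.6 translate directly into a lookup from the function field to the ALU operation code. The Python sketch below covers only those rows; the names FUNCT_TO_OP and alu_control are hypothetical, and the remaining rows of the table are deliberately not modelled.

```python
# Function field (lower 6 instruction bits) -> ALU operation code (OP),
# for opc = 10 (R-type), as listed in Table 3.6.
FUNCT_TO_OP = {
    0b100000: 0b010,   # add
    0b100010: 0b110,   # sub
    0b100100: 0b000,   # and
    0b100101: 0b001,   # or
    0b101010: 0b111,   # slt
}


def alu_control(opc, funct):
    """Return the 3-bit OP code for an R-type instruction."""
    if opc == 0b10:
        return FUNCT_TO_OP[funct]
    raise NotImplementedError("only the R-type rows of Table 3.6 are modelled")


print(f"{alu_control(0b10, 0b100010):03b}")   # sub
```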
3.4 Data path
The datapath is divided into four sections with respect to the pipelining structure of a processor. The four parts are the Instruction Fetch, Instruction Decode, Execution and Memory Writeback.
3.4.1 Instruction Fetch
The Instruction Fetch Block contains the PC, Instruction Register and the Memory Data Register. This part provides the data and instructions from the memory.
Fig 3.8 shows the Block view of the Instruction fetch module.
Figure 3.8: Instruction Fetch Block
3.4.2 Instruction Decode
The Instruction Decode Block decodes the instruction in Instruction Register and addresses the Register File and computes the second operand for a Branch Instruction or a sw- or lw-instruction. The sub components of this block are the register file, register A and register B.
Fig 3.9 shows the Block view of the Instruction decode module.
Figure 3.9: Instruction Decode Block
3.4.3 Instruction Execution
The Execution Block contains the ALU as its main element and computes the desired result of the instruction. It also computes the jump target address. The operands loaded to the ALU are chosen by two multiplexers, which have the select signals Asrc1 and Asrc2. The ALU serves all nine supported instructions.
Fig 3.10 shows the Block view of the Instruction execute module.
Figure 3.10: Instruction Execution Block
3.4.4 Memory Writeback
The Memory Writeback Block consists of the Res register and a multiplexer with the select signal Psrc. This block routes the result of the computation either back to memory or to the register file. This stage also selects the value to be loaded into the PC.
Fig 3.11 shows the Block view of the Memory Writeback module.
Figure 3.11: Memory Writeback Block
3.4.5 Data Path
Fig 3.12 shows the Block view of the Data path module.
Figure 3.12: Data path Block
3.5 Datapath and control
Both the Datapath and Control are combined to form the processing unit as illustrated in Fig 3.12
Figure 3.12: Data path and Control Unit
3.6 Processor and Memory
Data plus control block, which represents the datapath and control, is combined with Memory to complete the processor as shown in Figure 3.13 and Figure 3.14
Figure 3.13: Processor Sub modules
Figure 3.14: Completed processor
CHAPTER 4: RESULTS OF SIMULATION AND SYNTHESIS
4.1 Simulation with a Program in memory
The following instructions were written into the memory to verify the functionality of the Processor.
Table 4.1: Instructions in memory before simulation
The following data was written to the memory as in Table 4.2
Memory Address (hex)   Memory Address (Dec)   Contents (hex)
380                    896                    10
384                    900                    10
388                    904                    11
Table 4.2: Data in memory before simulation
The expected values as shown in Table 4.3 should be stored back into the memory.
Word Address (hex)   Byte Address (Dec)   Contents (hex)   Description
F0                   960                  00000010         10 AND 10 = 10
F1                   964                  00000011         10 OR 11 = 11
F2                   968                  00000020         10 ADD 10 = 20
F3                   972                  00000001         11 SUB 10 = 01
F4                   976                  00000001         10 SLT 11 = 01 (true)
Table 4.3: Expected values in memory
The simulation starts at memory address 000 with a load word instruction. The value of memory address 896 is written into register $t8. The PC is incremented and the next instruction at memory address 004 is executed. It is also a load word instruction which loads the value of memory address 900 to register $t9.
Then a branch instruction follows, which compares the two operands in registers $t8 and $t9. As the data stored at the two addresses 896 and 900 are the same, the values in the two registers are equal, and hence beq sets the program counter to address 16. Then a jump is executed to address 24 (decimal), or 18 (hex).
An and instruction follows, which ANDs the two operands in registers $t8 and $t9 and puts the result in $t0. Then a store word instruction writes the content of register $t0 to the memory at address 960 (decimal), which is location F0 (hex) in the RAM blocks.
A load word instruction loads a new data value from address 964 (decimal), or F1 (hex). The following instructions are or, add, sub and slt. The result of each computation is stored to the memory by a store word instruction.
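The expected memory contents of Table 4.3 can be reproduced with plain integer arithmetic. The Python sketch below uses the operand values from the Description column of the table; the dictionary name results is hypothetical, and the keys are the RAM block locations F0 to F4 (hex), which correspond to the decimal addresses 960 to 976 divided by 4.

```python
MASK32 = 0xFFFFFFFF

# Expected result for each RAM block location, using the operand
# values (hex) given in the Description column of Table 4.3.
results = {
    0xF0: 0x10 & 0x10,
    0xF1: 0x10 | 0x11,
    0xF2: (0x10 + 0x10) & MASK32,
    0xF3: (0x11 - 0x10) & MASK32,
    0xF4: int(0x10 < 0x11),
}

for location, value in results.items():
    print(f"{location:02X} (address {location * 4}): {value:08X}")
```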
The memory window in Fig 4.1 shows the values in memory. Highlighted in blue are the bytes of the first instruction, distributed across the 4 separate memories. Other instructions are stored in a similar way. The final result of the simulation is highlighted in black.
Figure 4.1: Results in memory after simulation
Fig 4.2 shows the memory before and after simulation and illustrates the results as they are stored back in memory.
Figure 4.2: Memory before and after simulation
4.2 Synthesis Report
Figure 4.3: Synthesis Report
Chapter 5: CONCLUSION AND FUTURE ENHANCEMENTS
5.1 Conclusion:
The multicycle implementation of a CPU is a great improvement over a single cycle implementation. It allows for the use of fewer design modules and a faster clock speed by breaking instructions into up to five different steps. These steps are controlled by a finite state machine controller. The design shows implementation of a multicycle CPU capable of handling nine different instructions. These instructions are in a variety of categories: R-type, I-type, and j-type. Each of these categories has a different instruction format. A multi-stepped design methodology was implemented to break this big project into many smaller steps. This project shows the wide variety of things to consider and components of a multicycle processor implementation.
5.2 Future Enhancements:
• Realize a hardware implementation of the processor and memory on a Xilinx Spartan 3E development board in order to verify the behavior of the design in hardware.
• Introduce pipelining of instructions to improve the performance of the processor.
References
David A. Patterson, John L. Hennessy: Computer Organization and Design - The Hardware/Software Interface. Fourth Edition (2006). Morgan Kaufmann Publisher, Inc.
http://www.iitg.ernet.in/asahu/cs222/ Lectures on Computer Organization and Architecture. Retrieved: October 2011
http://nptel.iitm.ac.in/video.php?subjectId=106102062 Video tutorials on Computer Architecture Principles. Retrieved: November 2011
http://www.seas.upenn.edu/~ese171/vhdl/vhdl_primer.html VHDL Primer. Retrieved: October 2011
http://www.xilinx.com/itp/xilinx10/books/docs/sim/sim.pdf Xilinx Synthesis and Simulation Guide 10.1. Retrieved: December 2011
http://www.gstitt.ece.ufl.edu/courses/eel4712/lectures/vhdl/xst.pdf Xilinx Synthesis Technology (XST) User Guide. Retrieved: January 2012