
EE/CE 6370

Spring 2015

Design Assignment: UTD PIPELINED MICROPROCESSOR (PUT)


Last year, when I almost taught this course, I assigned a design assignment that is described in the appendix
section of this document. I think it was fun. Please read it carefully. As an extension, I want you to create
exactly the same functionality but using a different architecture. This assignment will give you a taste of real
large-system design. It will expose you to the intricacies of a large, complex design and give you experience in
design, simulation, and testing. The modified assignment is described below:

PUT Pipelined Architecture


The original UT, described in the appendix, is a 16-bit custom-configurable microprocessor. Its architecture is
based on the traditional von Neumann model, with a single address space and one memory used to store both
programs and data. UT core instructions are executed as sequences of micro-operations: each instruction cycle
consists of four machine cycles, which perform the three major steps of instruction fetch, instruction decode and
instruction execution. Consequently, only one instruction is completed every four machine cycles, resulting in
relatively low instruction throughput and low utilization of hardware resources.
A processor's performance can be improved by shortening the machine cycle time with faster hardware elements
and/or by reducing the number of cycles per instruction (increasing instruction throughput) with a more efficient
processing algorithm. The basic way to reduce the number of cycles per instruction is to exploit instruction-level
parallelism. Instruction pipelining is an implementation technique that achieves instruction-level parallelism by
overlapping instruction fetching, decoding and execution. In this technique, the pipelined processor consists of a
sequence of m processing stages through which a stream of instructions passes. Every instruction is broken down
into m partial steps, one for each stage of the m-stage pipeline, and each stage performs part of the processing.
The final, fully processed result is obtained only after an instruction has passed through the entire pipeline.
Each partial step is executed within a single machine cycle; consequently, one instruction result becomes
available every machine cycle, except while the pipeline is filling at the start of a program and emptying at its end.
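As a rough illustration of the throughput argument above, the sketch below compares ideal cycle counts for the four-cycle UT and an m-stage pipeline. It assumes no stalls or flushes (no branches, interrupts, or hazards), so the pipelined figure is a best case: the first instruction needs m cycles to fill the pipeline, and each later instruction completes one cycle after its predecessor. The function names are illustrative only.

```python
def sequential_cycles(n_instructions: int, cycles_per_instruction: int = 4) -> int:
    """Cycles for the original UT: every instruction takes a full 4-cycle instruction cycle."""
    return n_instructions * cycles_per_instruction


def pipelined_cycles(n_instructions: int, stages: int = 3) -> int:
    """Ideal cycles for an m-stage pipeline with no stalls or flushes.

    The first instruction needs `stages` cycles to traverse the pipeline (fill);
    after that, one instruction completes every cycle.
    """
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        seq = sequential_cycles(n)
        pipe = pipelined_cycles(n)
        print(f"n={n:5d}  UT={seq:5d}  PUT={pipe:5d}  speedup={seq / pipe:.2f}")
```

For long instruction streams the ideal speedup of a three-stage PUT over UT approaches 4, the number of machine cycles per UT instruction; every branch, return or interrupt that forces the pipeline to refill pulls it back down.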
When designing a pipelined processor, the first task is finding a suitable multistage sequential algorithm for
computing the target function. The original UT (see the appendix) has a very simple architecture, so a three-stage
instruction pipeline can be implemented by partitioning the instruction cycle into three stages: a fetch stage, a
decode stage and an execution stage. Several fundamental requirements must be met to implement the PUT design
correctly, so the data path and the control mechanism should be designed carefully.
Memory conflicts
The von Neumann architecture used by the original UT requires both instructions and data to be stored in the
same memory. This makes instruction pipelining difficult, because all pipeline stages are active simultaneously
and two stages may request access to the memory in the same cycle (for example, the fetch stage reading the next
instruction while the execution stage reads or writes an operand). This problem can be resolved by adopting the
Harvard architecture [1], which uses two separate memories for instructions and data. A new program memory, with
its own data and address buses, should therefore be introduced to store instructions. Figure 1 illustrates a
mechanism for implementing the memory for PUT.

[1] It may be a good idea to read up on and refresh your memory of the design and implementation of instruction set
architectures (ISAs) as well as pipelined architectures. Most importantly, please feel free to approach the
instructor anytime for discussions, ideas, and clarifications. You have full freedom to implement the pipelining
however you wish.

Figure 1: Memory access difference in von Neumann and Harvard architectures
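The memory conflict can be made concrete with a small counting model. This is a sketch under stated assumptions: the pipeline is in steady state, so the fetch stage reads an instruction in every cycle, and the stack is assumed to live in data memory, so JSR, PUSHA, POPA and RET count as data accesses. `DATA_MEMORY_OPS` and `memory_conflicts` are hypothetical names.

```python
# Instructions whose execute stage touches data memory (direct or stack addressing).
# ASSUMPTION: the stack is kept in data memory, so stack instructions count too.
DATA_MEMORY_OPS = {"LDA", "LDB", "STA", "STB", "JSR", "PUSHA", "POPA", "RET"}


def memory_conflicts(program: list[str], harvard: bool) -> int:
    """Count cycles in which two pipeline stages need the same memory.

    Assumes a 3-stage pipeline in steady state: while an instruction executes,
    the fetch stage is always reading the next instruction from memory.
    """
    conflicts = 0
    for mnemonic in program:
        needs_data_memory = mnemonic in DATA_MEMORY_OPS
        # Von Neumann: fetch and data access share one memory, so they collide.
        # Harvard: fetch uses program memory, execute uses data memory, no collision.
        if needs_data_memory and not harvard:
            conflicts += 1
    return conflicts
```

For the sequence LDA, ADD, STA, CLA the von Neumann organization collides twice (LDA and STA), while the Harvard organization never does, which is why PUT adds a separate program memory.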

Stage Registers
Pipeline stages have to be separated by stage registers in order to store the intermediate result from one stage
and make it available to the next stage.
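A behavioral sketch of the stage registers may help. It assumes two registers, called `if_id` (fetch to decode) and `id_ex` (decode to execute) here for illustration; these names are not from the handout. The key point it models is that all stage registers load on the same clock edge, so each next value is computed only from the current contents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageRegisters:
    """Hypothetical PUT stage registers between fetch/decode and decode/execute."""
    if_id: Optional[str] = None  # instruction just fetched, waiting to be decoded
    id_ex: Optional[str] = None  # decoded instruction, being executed this cycle


def clock(regs: StageRegisters, fetched: Optional[str]) -> StageRegisters:
    """One machine cycle: every stage register loads on the same clock edge.

    Next values depend only on the *current* register contents, mimicking
    edge-triggered flip-flops that all update simultaneously.
    """
    return StageRegisters(if_id=fetched, id_ex=regs.if_id)


def run(program: list[str]) -> list[str]:
    """Feed a program through the stage registers and return execution order."""
    regs = StageRegisters()
    executed: list[str] = []
    # Two trailing bubbles (None) let the last instructions drain out of the pipeline.
    for fetched in list(program) + [None, None]:
        if regs.id_ex is not None:
            executed.append(regs.id_ex)  # execute stage works on ID/EX this cycle
        regs = clock(regs, fetched)
    return executed
```

In an HDL the same idea corresponds to updating all stage registers in one clocked process; a three-instruction program takes 3 + 3 - 1 = 5 cycles to run through this model.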
Branching Instructions
Although instruction pipelining can increase instruction throughput, some critical instruction sequences still
cannot be pipelined (overlapped or partitioned). These sequences usually involve data or control dependencies.
A branch instruction is an example. When an instruction I_i with address A_i is being executed by the execution
stage, the instruction I_{i+1} at the next consecutive address A_{i+1} is being decoded in the decode stage, while
the instruction I_{i+2} at address A_{i+2} is being fetched from program memory by the fetch stage. If I_i is a
branch instruction and the branch is taken, the next fetch must happen at the location defined by the branch
address instead, and I_{i+1} and I_{i+2}, which are already in the pipeline, must not be executed.
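One common way to handle a taken branch is to flush the two younger instructions and redirect fetch. The sketch below models that with a minimal two-bubble flush; it assumes only JMP changes control flow, that the branch is resolved in the execute stage, and it uses a hypothetical HALT mnemonic to stop the model. The handout's control unit may instead spend four machine cycles re-initializing the pipeline after a branch, as described under Control Unit.

```python
from typing import Optional

# A program is a list of (mnemonic, operand) tuples; the list index is the address.
Program = list[tuple[str, Optional[int]]]


def run_with_flush(program: Program, max_cycles: int = 50) -> list[int]:
    """Return the addresses of executed instructions, in order.

    Sketch assumptions: only JMP branches, it resolves in the execute stage,
    and the hypothetical HALT mnemonic ends the simulation.
    """
    pc = 0
    if_id: Optional[int] = None  # address of the instruction in the decode stage
    id_ex: Optional[int] = None  # address of the instruction in the execute stage
    executed: list[int] = []
    for _ in range(max_cycles):
        next_pc = pc + 1
        flush = False
        if id_ex is not None:
            mnemonic, operand = program[id_ex]
            executed.append(id_ex)
            if mnemonic == "HALT":
                break
            if mnemonic == "JMP":
                next_pc = operand  # redirect fetch to the branch target
                flush = True       # younger instructions are on the wrong path
        fetched = pc if pc < len(program) else None
        if flush:
            # Squash I(i+1) in decode and I(i+2) being fetched: insert two bubbles.
            if_id, id_ex, pc = None, None, next_pc
        else:
            if_id, id_ex, pc = fetched, if_id, next_pc
    return executed
```

With CLA at address 0, JMP 4 at address 1, two INCB instructions at 2 and 3, and HALT at 4, the model executes addresses 0, 1 and 4: the INCB instructions enter the pipeline but are squashed before they execute.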
Because of the adoption of the Harvard architecture, pipelined PUT uses an additional memory as program memory.
Since the primary function of this memory is to store programs (instructions), it is sufficient to provide a single
program memory address register (PMAR) to hold the effective address and an instruction data register to store the
instruction read from program memory.
The roles of the program counter (PC) and the program counter temporary register (TEMP) are defined as follows.
PC points to the next instruction to be fetched by the fetch stage, while TEMP holds the address of the instruction
being decoded in the decode stage (the next instruction to be executed), so that this address is ready if an
interrupt request occurs.
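The fetch-stage bookkeeping for PC, PMAR, TEMP and the instruction data register can be sketched as follows. The register names come from the text above; the exact timing (TEMP loaded on the same edge that hands the fetched word to the decode stage) and the Python names `FetchState` and `fetch_cycle` are assumptions for illustration.

```python
from dataclasses import dataclass

ADDR_MASK = 0xFFF  # PC and PMAR are 12 bits wide, so addresses wrap at 4096


@dataclass
class FetchState:
    pc: int = 0    # address of the next instruction to fetch
    pmar: int = 0  # program memory address register
    ir: int = 0    # instruction data register (word read from program memory)
    temp: int = 0  # address of the instruction now in the decode stage


def fetch_cycle(s: FetchState, program_memory: list[int]) -> None:
    """One fetch-stage machine cycle (sketch; timing assumed, not specified)."""
    s.pmar = s.pc
    s.ir = program_memory[s.pmar]
    # The fetched word enters the decode stage at the end of this cycle, so TEMP
    # records its address; this is the return address if an interrupt is taken.
    s.temp = s.pmar
    s.pc = (s.pc + 1) & ADDR_MASK
```

Saving TEMP rather than PC on an interrupt matters because PC has already moved one instruction ahead of the one waiting in decode.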
The stack pointer (SP) and the temporary stack pointer (ST) are used to implement the stack and to support related
operations and mechanisms such as subroutine calls, interrupts and returns, and instructions that push data onto and
pull data from the stack. SP always points to the next available (free) location on the stack, while in the original
UT, ST holds a copy of the SP value. UT updates these values after each instruction cycle (four machine cycles).
This is not possible in pipelined PUT, where every stack operation must be completed within one machine cycle.
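The single-cycle requirement means each stack operation must update memory and SP together, without a separate ST copy step. The sketch below follows the register transfers in the appendix (PUSHA: Stack ← A, SP ← SP-1; POPA and RET: SP ← SP+1, then read). The initial SP value of 0xFFF (top of data memory) and the class name `Stack` are hypothetical choices.

```python
ADDR_MASK = 0xFFF  # SP is 12 bits wide


class Stack:
    """Sketch of the UT stack where SP points at the next free location.

    Each method performs one complete stack operation, standing in for one
    machine cycle in which the memory write/read and the SP update both happen.
    """

    def __init__(self, data_memory: list[int], sp: int = 0xFFF) -> None:
        self.mem = data_memory
        self.sp = sp  # HYPOTHETICAL reset value: top of a 4096-word data memory

    def push(self, value: int) -> None:
        # PUSHA / JSR: Stack <- value, SP <- SP - 1
        self.mem[self.sp] = value & 0xFFFF
        self.sp = (self.sp - 1) & ADDR_MASK

    def pop(self) -> int:
        # POPA / RET: SP <- SP + 1, then value <- Stack
        self.sp = (self.sp + 1) & ADDR_MASK
        return self.mem[self.sp]
```

In hardware, SP - 1 and SP + 1 can be produced combinationally next to SP, so the stack memory address and the new SP value are both ready in the same cycle.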

Control Unit

PUT control mechanisms must support pipelining and must also provide proper operation in situations that can be
treated as exceptions (initialization at power-up and reset, interrupt handling, and program flow changes when a
branch is taken).
In the original von Neumann-style UT, four non-overlapping phases have to be generated from the system clock. The
pipelined PUT does not require this for normal pipelined operation, because all pipeline stages are simultaneously
active and instruction fetching, decoding and execution overlap in time. However, the processor still requires four
identifiable machine cycles to initialize the pipeline stages at power-up, at reset, or after instructions that
disable pipelined processing, such as branch and return instructions, which can be treated as a kind of exception.
It also requires the same number of cycles to jump to the address specified in the interrupt vector when an
interrupt cycle is carried out. These actions can be built into the control unit FSM (finite state machine).
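A minimal sketch of such a control unit FSM is shown below. The state names T0 to T3 and RUN, and the single `flush` input standing for "taken branch, RET, or accepted interrupt", are hypothetical; the only behavior taken from the text is that these events, plus power-up and reset, trigger a four-cycle initialization sequence before normal pipelined operation resumes.

```python
from enum import Enum, auto


class State(Enum):
    T0 = auto()
    T1 = auto()
    T2 = auto()
    T3 = auto()
    RUN = auto()  # normal overlapped fetch/decode/execute


# Sequential successor of each initialization state.
_NEXT = {State.T0: State.T1, State.T1: State.T2, State.T2: State.T3, State.T3: State.RUN}


def next_state(state: State, reset: bool = False, flush: bool = False) -> State:
    """Control unit FSM sketch.

    reset: power-up or reset signal.
    flush: HYPOTHETICAL combined signal for a taken branch, a RET, or an
           accepted interrupt, all of which disable normal pipelining.
    """
    if reset:
        return State.T0
    if state is State.RUN:
        return State.T0 if flush else State.RUN
    # While initializing, flush requests are ignored in this sketch.
    return _NEXT[state]
```

Each T state would then drive the register loads needed to refill the pipeline; which loads happen in which state is a design decision left to you.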
Your task is to implement PUT on an FPGA and demonstrate it using an example program of your choice.
The program should include branch and exception conditions.
What will you implement?
For this design assignment, you will work on the implementation of three components:
1. A fully worked-out design and implementation of PUT on the Nexys 4 DDR2 board [2].
2. An ability to interface your FPGA development board with a keyboard [3].
3. An ability to interface your FPGA development board with a VGA monitor [4].
Grading Rubric
We will post the grading rubric for this design assignment on the course website.

[2] If you are using the Nexys 3 board, you will be responsible for a proper demonstration.
[3] Design details for this interface will be provided.
[4] Design details for this interface will be provided. We will also set up one machine in our laboratory for a
quick demo. In the coming two weeks, the teaching assistant will reserve the first half of office hours for demos of
the interfaces and quick demos.

Appendix
Description of the original customizable microprocessor [5]
In this assignment we will design a simple 16-bit customizable microprocessor called UT. The UT can be
considered the core for various user-specific computing machines. It consists of a set of basic microprocessor
features that can be used without any changes for some simple applications, or can be extended by the user in
many application-specific directions. Extensions can be achieved by adding new instructions or other features to
the UT's core, or by attaching functional blocks to the core without actually changing the core.
Requirements
The basic features of the UT core are:

- a 16-bit data bus and a 12-bit address bus that enable direct access to up to 4096 16-bit memory locations
- two programmer-visible 16-bit working registers, called the A and B registers, which are used to store
  operands and results of data transformations
- memory-mapped input/output for communication with input and output devices
- a basically load/store microprocessor architecture with a simple instruction cycle of four machine cycles
  per instruction; all data transformations are performed in the working registers
- support for direct addressing and the most basic stack addressing mode, as well as implicit addressing mode
- definable custom instructions, with functional blocks that execute the custom instructions able to be added
- physical pin assignments that can be changed to suit the PCB layout.

Instruction Formats and Instruction Set


The UT instructions have very simple formats. All instructions are 16 bits long and require one memory word.
In direct addressing mode, the 12 lower instruction bits represent the address of the memory location. All
other instructions, for basic data processing, program flow control, and control of processor flags, use the implied
addressing mode. The core instruction set is:
Mnemonic   Function
LDA        A ← M[address]
LDB        B ← M[address]
STA        M[address] ← A
STB        M[address] ← B
JMP        PC ← address
JSR        Stack ← PC, PC ← address, SP ← SP-1
ADD        A ← A + B
AND        A ← A AND B
CLA        A ← 0
PUSHA      Stack ← A, SP ← SP-1
POPA       SP ← SP+1, A ← Stack
CLB        B ← 0
CMB        B ← NOT B (complement B)
INCB       B ← B+1
DECB       B ← B-1
CLflag     Flag ← 0 (flag can be carry or zero)
ION        IEN ← 1, enable interrupts
IOF        IEN ← 0, disable interrupts
SZ         If Z = 1, PC ← PC+1; skip if zero set
SC         If C = 1, PC ← PC+1; skip if carry set
RET        SP ← SP+1, PC ← Stack

[5] Please read the appendix carefully, as it establishes the overall functionality of the microprocessor architecture.

All memory reference instructions use either direct or stack addressing mode as shown below.

The four most significant bits form the operation code (opcode) field, which can therefore specify up to 16
different instructions. The twelve least significant bits hold the address for instructions using direct addressing
mode and have no meaning for instructions using stack (implicit) addressing mode.
Memory reference instructions with the direct and stack addressing modes are assigned the following opcodes:

Opcode [15:12]   Mnemonic
0000             LDA
0001             LDB
0010             STA
0011             STB
0100             JMP
1000             JSR
1010             PUSHA
1100             POPA
1110             RET

Instructions in direct addressing mode have the most significant bit equal to 0. Those that use the stack have the
most significant bit equal to 1. Register reference instructions, which do not use the stack, also have the most
significant bit equal to 0, with the four most significant bits equal to hex 7. Instructions that operate on
user-specified (configurable) functional blocks have the most significant bit equal to 1, with the four most
significant bits equal to hex F.
The remaining core instructions have the following instruction formats:

The opcodes are assigned as follows:

Opcode      Mnemonic
0111 0001   ADD
0111 0010   AND
0111 0011   CLA
0111 0100   CLB
0111 0101   CMB
0111 0110   INCB
0111 0111   DECB
0111 1000   CLC
0111 1001   CLZ
0111 1010   ION
0111 1011   IOF
0111 1100   SC
0111 1101   SZ
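The opcode tables translate directly into an instruction decoder. In the sketch below, the top-nibble rules (0xxx direct, 1xxx stack, 0111 register reference, 1111 custom) and the opcode values come from the text; the position of the register-reference sub-code is not stated, so the sketch ASSUMES the 8-bit codes form the upper byte, putting the sub-code in bits [11:8]. `decode` and the table names are hypothetical.

```python
from typing import Optional

# Opcode field, bits [15:12], of memory reference instructions.
MEMORY_REF = {
    0b0000: "LDA", 0b0001: "LDB", 0b0010: "STA", 0b0011: "STB",
    0b0100: "JMP", 0b1000: "JSR", 0b1010: "PUSHA", 0b1100: "POPA",
    0b1110: "RET",
}
# Register reference sub-codes following the 0111 prefix.
# ASSUMPTION: the sub-code sits in bits [11:8] (the 8-bit code is the upper byte).
REGISTER_REF = {
    0x1: "ADD", 0x2: "AND", 0x3: "CLA", 0x4: "CLB", 0x5: "CMB", 0x6: "INCB",
    0x7: "DECB", 0x8: "CLC", 0x9: "CLZ", 0xA: "ION", 0xB: "IOF", 0xC: "SC",
    0xD: "SZ",
}
# Instructions that carry a 12-bit address operand (JSR needs one as well).
HAS_ADDRESS = {"LDA", "LDB", "STA", "STB", "JMP", "JSR"}


def decode(word: int) -> tuple[str, Optional[int]]:
    """Decode a 16-bit UT instruction word into (mnemonic, operand)."""
    opcode = (word >> 12) & 0xF
    if opcode == 0x7:  # register reference instruction
        return REGISTER_REF.get((word >> 8) & 0xF, "UNDEFINED"), None
    if opcode == 0xF:  # custom functional-block instruction, selected by bits [7:0]
        return "CUSTOM", word & 0xFF
    mnemonic = MEMORY_REF.get(opcode, "UNDEFINED")
    operand = word & 0xFFF if mnemonic in HAS_ADDRESS else None
    return mnemonic, operand
```

For example, 0x0123 decodes to LDA with address 0x123, 0xA000 to PUSHA, and, under the stated assumption, 0x7100 to ADD. In PUT this logic would sit in the decode stage.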

Register reference instructions operate on the contents of the working registers (A and B), as well as on the
individual flag registers used to indicate different status information within the processor or to enable and
disable interrupts. Examples of these instructions are ADD and DECB. The program flow control instructions, which
change the program flow depending on the result of the current computation, are the simple "skip if zero set" and
"skip if carry set" instructions (SZ and SC). In combination with the unconditional JMP instruction, they can achieve
conditional branching to any memory address.
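The SZ/SC plus JMP idiom can be checked with a tiny next-PC model, viewed sequentially as in the original UT. Here `pc` is the address of the instruction itself; because UT has already incremented PC during fetch, the handout's "PC ← PC+1" for a taken skip lands two words past the SZ. The dictionary program and the names `next_pc` and `trace` are illustrative assumptions.

```python
from typing import Optional

Instr = tuple[str, Optional[int]]


def next_pc(pc: int, instr: Instr, z: int, c: int) -> int:
    """Address of the next instruction after executing `instr` at address `pc`."""
    mnemonic, operand = instr
    if mnemonic == "JMP":
        return operand & 0xFFF
    if (mnemonic == "SZ" and z) or (mnemonic == "SC" and c):
        return (pc + 2) & 0xFFF  # skip: the word after SZ/SC is not executed
    return (pc + 1) & 0xFFF


def trace(program: dict[int, Instr], start: int, z: int, c: int, max_steps: int = 10) -> list[int]:
    """Addresses visited when running `program` from `start` with fixed flag values."""
    pc, visited = start, []
    for _ in range(max_steps):
        if pc not in program:
            break  # ran off the end of this tiny example
        visited.append(pc)
        pc = next_pc(pc, program[pc], z, c)
    return visited
```

With SZ at 0x10, JMP 0x20 at 0x11 and CLA at 0x12, a set zero flag skips the JMP and falls through to CLA, while a clear zero flag takes the jump: a "branch if not zero" built from two core instructions. In PUT, a taken skip must also discard the instruction behind SZ that is already in the pipeline.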
Besides the instructions shown, the UT provides instructions that invoke different application-specific
functional blocks [6]. These instructions are designated by instruction bits [15:12] all set to 1. Individual
instructions are coded using the least significant bits [7:0].
Register Set

UT contains a number of registers that are used in performing micro-operations. These include the 16-bit registers
A and B. The 12-bit program counter (PC) and the 12-bit stack pointer (SP) are not accessible to users. Single-bit
carry (C), zero (Z), and interrupt enable (IEN) flags are also available.
Instruction Execution
The UT core instructions are executed as sequences of micro-operations expressed as register transfers. The
basic instruction cycle contains all operations from the start to the end of an instruction. It is divided into three
major steps that take place in four machine clock cycles, denoted T0, T1, T2, and T3.
1. Instruction fetch is when a new instruction is fetched from the external memory location pointed to by
the program counter. It is performed in two machine cycles. The first cycle, T0, is used to transfer the
address of the next instruction from the program counter to the address register. The second cycle, T1, is
used to actually read the instruction from the memory location into the instruction register, IR. At the same
time, the program counter is incremented by one, to the value that usually represents the next instruction
address.
2. Instruction decode is the recognition of the operation that has to be carried out and the preparation of the
effective memory address. This is done in the third machine cycle, T2, of the instruction cycle.
3. Instruction execution is when the actual operation specified by the operation code is carried out. This is
done in the fourth machine cycle, T3, of the instruction cycle.
Besides these three fundamental operations, various auxiliary operations are performed in each machine cycle
that enable each instruction to be executed in exactly four machine cycles. They also keep the contents of all
processor registers consistent at the beginning of each new instruction cycle.
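The T0 to T3 sequence can be sketched as register transfers on a single shared memory, which also shows why the original UT cannot overlap instructions. Only LDA, STA and ADD are modelled; the address register name `ar`, the class name `UT`, and the ADD encoding 0x71xx (the 0111 0001 code placed in the upper byte) are assumptions, and the carry flag is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class UT:
    mem: list[int] = field(default_factory=lambda: [0] * 4096)  # one shared von Neumann memory
    pc: int = 0
    ar: int = 0  # address register
    ir: int = 0  # instruction register
    a: int = 0
    b: int = 0

    def instruction_cycle(self) -> list[str]:
        """Run one instruction through T0..T3 and return a log of register transfers.

        Sketch only: LDA, STA and ADD are modelled; flags are not updated.
        """
        log: list[str] = []
        # T0: AR <- PC
        self.ar = self.pc
        log.append("T0: AR <- PC")
        # T1: IR <- M[AR], PC <- PC + 1
        self.ir = self.mem[self.ar]
        self.pc = (self.pc + 1) & 0xFFF
        log.append("T1: IR <- M[AR], PC <- PC + 1")
        # T2: decode and prepare the effective address for direct addressing
        opcode = self.ir >> 12
        self.ar = self.ir & 0xFFF
        log.append("T2: decode, AR <- IR[11:0]")
        # T3: execute
        if opcode == 0b0000:  # LDA
            self.a = self.mem[self.ar]
            log.append("T3: A <- M[AR]")
        elif opcode == 0b0010:  # STA
            self.mem[self.ar] = self.a
            log.append("T3: M[AR] <- A")
        elif self.ir >> 8 == 0x71:  # ADD (ASSUMED encoding 0x71xx)
            self.a = (self.a + self.b) & 0xFFFF
            log.append("T3: A <- A + B")
        return log
```

Both T1 (instruction read) and T3 (operand read or write) use `mem`, so in a pipelined version these accesses would collide in the same cycle unless instructions and data live in separate memories.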
[6] No one did this last year!!

Instructions are executed in the same sequence in which they are stored in memory, except for program flow change
instructions. Besides this, the UT provides a very basic single-level interrupt facility that enables the program
flow to change in response to external events represented by hardware interrupts. A hardware interrupt can occur at
any moment, since an external device controls it. However, the UT checks for a hardware interrupt at the end of each
instruction execution and, if an interrupt has been requested, sets an internal flip-flop called the interrupt
flip-flop (IFF). At the beginning of each instruction execution, the UT checks whether IFF is set. If it is not set,
normal instruction execution takes place.
If IFF is set, the UT enters an interrupt cycle in which the current contents of the program counter are saved on
the stack, and execution continues with the instruction specified by the contents of the memory location called
the interrupt vector (INTVEC).
The interrupt vector represents the address of the memory location that contains the first instruction of the
Interrupt Service Routine (ISR), which then executes like any other program sequence. At the end of the ISR, the
"RET" instruction returns execution to the interrupted sequence, using the memory address that was saved on the
stack when the interrupt was acknowledged.
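The interrupt sequence above can be sketched as follows. The IFF check points, the push of PC, and the RET transfers follow the text; several details are ASSUMPTIONS labeled in the code: the INTVEC address (0x001), the reading PC ← M[INTVEC] (if INTVEC is itself the ISR start address, that line becomes PC ← INTVEC), requests being ignored while IEN = 0, IFF being cleared on acknowledge, and IEN being cleared to keep the facility single-level.

```python
from dataclasses import dataclass, field

ADDR_MASK = 0xFFF
INTVEC = 0x001  # HYPOTHETICAL address of the interrupt vector location


@dataclass
class InterruptModel:
    mem: list[int] = field(default_factory=lambda: [0] * 4096)
    pc: int = 0
    sp: int = 0xFFF   # HYPOTHETICAL initial SP; points at the next free slot
    ien: bool = True  # interrupt enable flag (set by ION, cleared by IOF)
    iff: bool = False  # interrupt flip-flop

    def end_of_instruction(self, irq: bool) -> None:
        # Sample the external request at the end of each instruction.
        if irq and self.ien:  # ASSUMPTION: requests are ignored while IEN = 0
            self.iff = True

    def start_of_instruction(self) -> bool:
        """Return True if an interrupt cycle ran instead of a normal instruction."""
        if not self.iff:
            return False
        self.mem[self.sp] = self.pc             # Stack <- PC
        self.sp = (self.sp - 1) & ADDR_MASK     # SP <- SP - 1
        self.pc = self.mem[INTVEC] & ADDR_MASK  # ASSUMPTION: PC <- M[INTVEC]
        self.iff = False                        # ASSUMPTION: acknowledge clears IFF
        self.ien = False                        # ASSUMPTION: no nesting (single level)
        return True

    def ret(self) -> None:
        # RET: SP <- SP + 1, PC <- Stack
        self.sp = (self.sp + 1) & ADDR_MASK
        self.pc = self.mem[self.sp] & ADDR_MASK
```

In PUT, the value pushed would be TEMP (the address of the instruction waiting in the decode stage) rather than the already-advanced PC.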
