(Advanced) Computer Architechture Prof. Dr. Hasan Hüseyin BALIK (3 Week)

(Advanced) Computer Architechture
Prof. Dr. Hasan Hüseyin BALIK

(3rd Week)
Outline
2. The computer system
—A Top-Level View of Computer Function and
Interconnection
+
2.1 A Top-Level View of Computer
Function and Interconnection
2.1 Outline
• Computer Components
• Computer Function
• Interconnection Structures
• Bus Interconnection
• Point-to-Point Interconnect
• PCI Express
Computer Components
• Contemporary computer designs are based on concepts
developed by John von Neumann at the Institute for
Advanced Studies, Princeton
• Referred to as the von Neumann architecture and is
based on three key concepts:
– Data and instructions are stored in a single read-write memory
– The contents of this memory are addressable by location, without regard
to the type of data contained there
– Execution occurs in a sequential fashion (unless explicitly modified) from
one instruction to the next
• Hardwired program
– The result of the process of connecting the various components in the
desired configuration
Sequence of
Data arithmetic Results
and logic
functions
(a) Programming in hardware
Instruction Instruction
codes interpreter
Control
signals
General-purpose
Data arithmetic Results
and logic
functions
(b) Programming in software
Figure 3.1 Hardware and Software Approaches

Software and I/O Components
Software
• A sequence of codes or instructions
• Part of the hardware interprets each instruction and
generates control signals
• Provide a new sequence of codes for each new
program instead of rewiring the hardware
Major components:
• CPU
• Instruction interpreter
• Module of general-purpose arithmetic and logic
functions
• I/O Components
• Input module
• Contains basic components for accepting data
and instructions and converting them into an
internal form of signals usable by the system
• Output module
• Means of reporting results
Memory, MAR, and MBR
Memory Memory buffer
address register (MBR)
register (MAR) • Contains the data
• Specifies the to be written into
address in memory memory or
for the next read or receives the data
write read from memory
I/O address I/O buffer

register (I/OAR) register (I/OBR)
• Specifies a • Used for the
particular I/O exchange of data
device between an I/O
module and the
CPU
CPU Main Memory
0
System 1
2
PC MAR Bus
Instruction
Instruction
Instruction
IR MBR
I/O AR
Data
Execution
unit Data
I/O BR Data
Data
I/O Module n–2

n–1
PC = Program counter
Buffers IR = Instruction register
MAR = Memory address register
MBR = Memory buffer register
I/O AR = Input/output address register
I/O BR = Input/output buffer register
Figure 3.2 Computer Components: Top-Level View

Fetch Cycle Execute Cycle
Fetch Next Execute

START HALT
Instruction Instruction
Figure 3.3 Basic Instruction Cycle

Fetch Cycle
• At the beginning of each instruction cycle the processor

fetches an instruction from memory
• The program counter (PC) holds the address of the

instruction to be fetched next
• The processor increments the PC after each instruction

fetch so that it will fetch the next instruction in sequence
• The fetched instruction is loaded into the instruction
register (IR)
• The processor interprets the instruction and performs the

required action
Action Categories
• Data transferred from • Data transferred to or
processor to memory from a peripheral
or from memory to device by
processor transferring between
the processor and an
I/O module
Processor- Processor-
memory I/O
Data
Control
processing
• An instruction may • The processor may

specify that the perform some
sequence of arithmetic or logic
execution be altered operation on data
0 3 4 15
Opcode Address
(a) Instruction format
0 1 15
S Magnitude
(b) Integer format
Program Counter (PC) = Address of instruction

Instruction Register (IR) = Instruction being executed
Accumulator (AC) = Temporary storage
(c) Internal CPU registers
0001 = Load AC from Memory

0010 = Store AC to Memory
0101 = Add to AC from Memory
(d) Partial list of opcodes
Figure 3.4 Characteristics of a Hypothetical Machine

Memory CPU Registers Memory CPU Registers
300 1 9 4 0 3 0 0 PC 300 1 9 4 0 3 0 1 PC
301 5 9 4 1 AC 301 5 9 4 1 0 0 0 3 AC
302 2 9 4 1 1 9 4 0 IR 302 2 9 4 1 1 9 4 0 IR
• •
• •
940 0 0 0 3 940 0 0 0 3
941 0 0 0 2 941 0 0 0 2
Step 1 Step 2
300 1 9 4 0 3 0 1 PC 300 1 9 4 0 3 0 2 PC
301 5 9 4 1 0 0 0 3 AC 301 5 9 4 1 0 0 0 5 AC
302 2 9 4 1 5 9 4 1 IR 302 2 9 4 1 5 9 4 1 IR
• •
• •
940 0 0 0 3 940 0 0 0 3 3+2=5
941 0 0 0 2 941 0 0 0 2
Step 3 Step 4
300 1 9 4 0 3 0 2 PC 300 1 9 4 0 3 0 3 PC
301 5 9 4 1 0 0 0 5 AC 301 5 9 4 1 0 0 0 5 AC
302 2 9 4 1 2 9 4 1 IR 302 2 9 4 1 2 9 4 1 IR
• •
• •
940 0 0 0 3 940 0 0 0 3
941 0 0 0 2 941 0 0 0 5
Step 5 Step 6
Figure 3.5 Example of Program Execution

(contents of memory and registers in hexadecimal)
Instruction Operand Operand
fetch fetch store
Multiple Multiple
operands results
Instruction Instruction Operand Operand

Data
address operation address address
Operation
calculation decoding calculation calculation
Return for string

Instruction complete, or vector data
fetch next instruction
Figure 3.6 Instruction Cycle State Diagram

Classes of Interrupts
Program Generated by some condition that occurs as a result of an

instruction execution, such as arithmetic overflow, division by
zero, attempt to execute an illegal machine instruction, or
reference outside a user’s allowed memory space.
Timer Generated by a timer within the processor. This allows the

operating system to perform certain functions on a regular basis.
I/O Generated by an I/O controller, to signal normal completion of an
operation, request service from the processor, or to signal a
variety of error conditions.
Hardware Failure Generated by a failure such as power failure or memory parity
error.
User 1 2 3
WRITE WRITE WRITE
Program
(a) No
interrupts
I/O 4 I/O 5
Program Command END
User 1 2a 2b 3a 3b
WRITE WRITE WRITE
Program
(b) Interrupts,
short I/O wait
I/O 4 I/O Interrupt 5

Program Command Handler END
User 1 2 3
WRITE WRITE WRITE
Program
(c) Interrupts,
long I/O wait
I/O 4 I/O Interrupt 5

Program Command Handler END
= interrupt occurs during course of execution of user program
Figure 3.7 Program Flow of Control Without and With Interrupts

User Program Interrupt Handler
i
Interrupt
occurs here i+1
Figure 3.8 Transfer of Control via Interrupts

Fetch Cycle Execute Cycle Interrupt Cycle
Interrupts
Disabled
Check for
Fetch Next Execute
START Interrupt;
Instruction Instruction Interrupts Process Interrupt
Enabled
HALT
Figure 3.9 Instruction Cycle with Interrupts

Time
1 1
4 4
I/O operation
I/O operation;
processor waits 2a concurrent with
processor executing
5 5
2b
2
4
I/O operation
4 3a concurrent with
processor executing
I/O operation;
processor waits 5
5 3b
(b) With interrupts

3
(a) Without interrupts
Figure 3.10 Program Timing: Short I/O Wait

Time
1 1
4 4
I/O operation; 2 I/O operation

processor waits concurrent with
processor executing;
then processor
waits
5
5
2
4
4
3 I/O operation
concurrent with
I/O operation; processor executing;
processor waits then processor
waits
5
5
3 (b) With interrupts
(a) Without interrupts
Figure 3.11 Program Timing: Long I/O Wait

Instruction Operand Operand
fetch fetch store
Multiple Multiple
operands results
Instruction Instruction Operand Operand

Data
address operation address address
Operation
calculation decoding calculation calculation
Return for
string or
Instruction complete, vector data
No
fetch next instruction interrupt Interrupt
check
Interrupt
Interrupt
Figure 3.12 Instruction Cycle State Diagram, With Interrupts

Interrupt
User program handler X
Interrupt
handler Y
(a) Sequential interrupt processing
Interrupt
User program handler X
Interrupt
handler Y
(b) Nested interrupt processing
Figure 3.13 Transfer of Control with Multiple Interrupts

Printer Communication
User program
interrupt service routine interrupt service routine
t=0
15
0 t=
t =1
t = 25
t= t = 25 Disk
40 interrupt service routine
t=
35
Figure 3.14 Example Time Sequence of Multiple Interrupts

I/O Function
• I/O module can exchange data directly with the processor
• Processor can read data from or write data to an I/O module
– Processor identifies a specific device that is controlled by a particular I/O
module
– I/O instructions rather than memory referencing instructions
• In some cases it is desirable to allow I/O exchanges to occur

directly with memory
– The processor grants to an I/O module the authority to read from or write
to memory so that the I/O memory transfer can occur without tying up the
processor
– The I/O module issues read or write commands to memory relieving the
processor of responsibility for the exchange
– This operation is known as direct memory access (DMA)
Read Memory
Write
N Words
Address 0 Data
Data N–1
Read I/O Module Internal

Write Data
External
Address M Ports Data
Internal
Data Interrupt
Signals
External
Data
Instructions Address
Control
Data CPU Signals
Interrupt Data
Signals
Figure 3.15 Computer Modules

The interconnection structure must support the
following types of transfers:
Memory Processor I/O to or

I/O to Processor
to to from
processor to I/O
processor memory memory
An I/O
module is
allowed to
exchange
Processor Processor data directly
reads an Processor reads data Processor with
instruction writes a unit from an I/O sends data memory
or a unit of of data to device via to the I/O without
data from memory an I/O device going
memory module through the
processor
using direct
memory
access
A communication pathway Signals transmitted by any
connecting two or more one device are available for
devices reception by all other
• Key characteristic is that it is a devices attached to the bus
shared transmission medium • If two devices transmit during the
same time period their signals
will overlap and become garbled
Bus
Typically consists of
Interconnection
multiple communication Computer systems contain a
lines number of different buses
that provide pathways
• Each line is capable of between components at
transmitting signals representing
binary 1 and binary 0 various levels of the
computer system hierarchy
System bus
• A bus that connects major The most common computer
computer components (processor,
memory, I/O) interconnection structures
are based on the use of one
or more system buses
Data Bus
• Data lines that provide a path for moving data among system
modules
• May consist of 32, 64, 128, or more separate lines
• The number of lines is referred to as the width of the data bus
• The number of lines determines how many bits can be
transferred at a time
• The width of the data bus is a key factor in determining overall
system performance
Address Bus Control Bus
• Used to designate the source or • Used to control the access and the
destination of the data on the use of the data and address lines
data bus
– If the processor wishes to • Because the data and address lines
read a word of data from are shared by all components there
memory it puts the address must be a means of controlling their
of the desired word on the use
address lines • Control signals transmit both
• Width determines the maximum command and timing information
possible memory capacity of the among system modules
system
• Also used to address I/O ports • Timing signals indicate the validity
– The higher order bits are of data and address information
used to select a particular • Command signals specify
module on the bus and the operations to be performed
lower order bits select a
memory location or I/O port
within the module
CPU Memory Memory I/O I/O
Control lines
Address lines Bus
Data lines
Figure 3.16 Bus Interconnection Scheme

Point-to-Point Interconnect
Principal reason for change At higher and higher data
was the electrical rates it becomes
constraints encountered increasingly difficult to
with increasing the perform the synchronization
frequency of wide and arbitration functions in a
synchronous buses timely fashion
A conventional shared bus

on the same chip magnified
Has lower latency, higher
the difficulties of increasing
data rate, and better
bus data rate and reducing
scalability
bus latency to keep up with
the processors
Quick Path Interconnect
QPI
• Introduced in 2008
• Multiple direct connections
– Direct pairwise connections to other components eliminating
the need for arbitration found in shared transmission systems
• Layered protocol architecture

– These processor level interconnects use a layered protocol
architecture rather than the simple use of control signals
found in shared bus arrangements
• Packetized data transfer

– Data are sent as a sequence of packets each of which
includes control headers and error control codes
I/O device
I/O device
I/O Hub
DRAM
DRAM
Core Core
A B
DRAM
DRAM
Core Core
C D
I/O device
I/O device
I/O Hub
QPI PCI Express Memory bus
Figure 3.17 Multicore Configuration Using QPI

Packets
Protocol Protocol
Routing Routing
Flits
Link Link
Physical Phits Physical
Figure 3.18 QPI Layers

QPI Layers
• Physical: Consists of the actual wires carrying the
signals, as well as circuitry and logic to support ancillary
features required in the transmission and receipt of the 1s
and 0s. The unit of transfer at the Physical layer is 20
bits, which is called a Phit (physical unit).
• Link: Responsible for reliable transmission and flow
control. The Link layer’s unit of transfer is an 80-bit Flit
(flow control unit).
• Routing: Provides the framework for directing packets
through the fabric.
• Protocol: The high-level set of rules for exchanging
packets of data between devices. A packet is comprised
of an integral number of Flits.
COMPONENT A
Fwd Clk Intel QuickPath Interconnect Port
Rcv Clk
Transmission Lanes Reception Lanes
Fwd Clk
Rcv Clk
Reception Lanes Transmission Lanes
Intel QuickPath Interconnect Port

COMPONENT B
Figure 3.19 Physical Interface of the Intel QPI Interconnect

QPI Link Layer
• Flow control function
• Performs two key – Needed to ensure that a
functions: flow control sending QPI entity does not
and error control overwhelm a receiving QPI
– Operate on the level of entity by sending data faster
the flit (flow control unit) than the receiver can process
– Each flit consists of a 72- the data and clear buffers for
bit message payload and more incoming data
an 8-bit error control Error control function
code called a cyclic – Detects and recovers from bit
redundancy check (CRC) errors, and so isolates higher
layers from experiencing bit
errors
© 2018 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

QPI Routing and Protocol Layers
Routing Layer Protocol Layer

• Used to determine the course • Packet is defined as the unit
that a packet will traverse of transfer
across the available system
• One key function performed
interconnects at this level is a cache
coherency protocol which
• Defined by firmware and deals with making sure that
describe the possible paths main memory values held in
that a packet can follow multiple caches are
consistent
• A typical data packet payload

is a block of data being sent
to or from a cache
Peripheral Component Interconnect (PCI)
• A popular high bandwidth, processor independent bus that
can function as a mezzanine or peripheral bus
• Delivers better system performance for high speed I/O
subsystems
• PCI Special Interest Group (SIG)
– Created to develop further and maintain the compatibility of the PCI
specifications
• PCI Express (PCIe)

– Point-to-point interconnect scheme intended to replace bus-based
schemes such as PCI
– Key requirement is high capacity to support the needs of higher data rate
I/O devices, such as Gigabit Ethernet
– Another requirement deals with the need to support time dependent data
streams
Core Core
Gigabit PCIe
Memory
Ethernet
Chipset
PCIe–PCI PCIe
Memory
Bridge
PCIe
PCIe PCIe
Switch
PCIe PCIe
Legacy PCIe PCIe PCIe

endpoint endpoint endpoint endpoint
Figure 3.21 Typical Configuration Using PCIe

Transaction layer
packets (TLP)
Transaction Transaction
Data link layer

packets (DLLP)
Data Link Data Link
Physical Physical
Figure 3.22 PCIe Protocol Layers

PCIe Layers
• Physical: Consists of the actual wires carrying the
signals, as well as circuitry and logic to support ancillary
features required in the transmission and receipt of the 1s
and 0s.
• Data Link: Responsible for reliable transmission and flow
control. Data packets generated and consumed by the
DLL are called Data Link Layer Packets (DLLPs).
• Transaction: Generates and consumes data packets
used to implement load/store data transfer mechanisms
and also manages the flow control of those packets
between the two components on a link. Data packets
generated and consumed by the TL are called
Transaction Layer Packets (TLPs).
PCIe Transaction Layer (TL)
• Receives read and write requests from the software
above the TL and creates request packets for
transmission to a destination via the link layer
• Most transactions use a split transaction technique
– A request packet is sent out by a source PCIe
device which then waits for a response called a
completion packet
• TL messages and some write transactions are posted
transactions (meaning that no response is expected)
• TL packet format supports 32-bit memory addressing
and extended 64-bit memory addressing
The TL supports four address spaces:
• Memory • I/O
– The memory space includes – This address space is used
system main memory and for legacy PCI devices, with
PCIe I/O devices
– Certain ranges of memory
reserved address ranges
addresses map into I/O used to address legacy I/O
devices devices
• Configuration • Message
– This address space enables – This address space is for
the TL to read/write control signals related to
configuration registers interrupts, error handling,
associated with I/O devices and power management
PCIe TLP Transaction Types
Address Space TLP Type Purpose
Memory Read Request
Transfer data to or from a location in the system
Memory Memory Read Lock Request
memory map.
Memory Write Request
I/O Read Request Transfer data to or from a location in the system
I/O I/O Write Request memory map for legacy
I/O Write Request devices.
Config Type 0 Read Request
Config Type 0 Write Request Transfer data to or from a location in the
Configuration configuration
Config Type 1 Read Request space of a PCIe device.
Config Type 1 Write Request
Message Request
Message Provides in-band messaging and event reporting.
Message Request with Data
Completion
Memory, I/O, Completion with Data
Configuration
Returned for certain requests.
Completion Locked
Completion Locked with Data
Number
of octets
1 STP framing 1 Start
Appended by PL
2 Sequence number
DLLP
Created
by DLL
4
2 CRC
12 or 16 Header 1 End
Created by Transaction Layer
Appended by Data Link Layer
Appended by Physical Layer

0 to 4096 Data
0 or 4 ECRC
4 LCRC
1 STP framing
(a) Transaction Layer Packet (b) Data Link Layer Packet
Figure 3.25 PCIe Protocol Data Unit Format

(Advanced) Computer Architechture Prof. Dr. Hasan Hüseyin BALIK (3 Week)

Încărcat de

Informații document

Titlu original

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

(Advanced) Computer Architechture Prof. Dr. Hasan Hüseyin BALIK (3 Week)

Încărcat de

Drepturi de autor:

Formate disponibile

(Advanced) Computer Architechture

Prof. Dr. Hasan Hüseyin BALIK

(a) Programming in hardware

(b) Programming in software

Figure 3.1 Hardware and Software Approaches

I/O address I/O buffer

I/O Module n–2

Figure 3.2 Computer Components: Top-Level View

Fetch Next Execute

Figure 3.3 Basic Instruction Cycle

• At the beginning of each instruction cycle the processor

• The program counter (PC) holds the address of the

• The processor increments the PC after each instruction

• The processor interprets the instruction and performs the

• An instruction may • The processor may

(a) Instruction format

(b) Integer format

Program Counter (PC) = Address of instruction

(c) Internal CPU registers

0001 = Load AC from Memory

(d) Partial list of opcodes

Figure 3.4 Characteristics of a Hypothetical Machine

Figure 3.5 Example of Program Execution

Instruction Instruction Operand Operand

Return for string

Figure 3.6 Instruction Cycle State Diagram

Program Generated by some condition that occurs as a result of an

Timer Generated by a timer within the processor. This allows the

I/O 4 I/O Interrupt 5

I/O 4 I/O Interrupt 5

= interrupt occurs during course of execution of user program

Figure 3.7 Program Flow of Control Without and With Interrupts

Figure 3.8 Transfer of Control via Interrupts

Figure 3.9 Instruction Cycle with Interrupts

(b) With interrupts

(a) Without interrupts

Figure 3.10 Program Timing: Short I/O Wait

I/O operation; 2 I/O operation

3 (b) With interrupts

(a) Without interrupts

Figure 3.11 Program Timing: Long I/O Wait

Instruction Instruction Operand Operand

Figure 3.12 Instruction Cycle State Diagram, With Interrupts

(a) Sequential interrupt processing

(b) Nested interrupt processing

Figure 3.13 Transfer of Control with Multiple Interrupts

Figure 3.14 Example Time Sequence of Multiple Interrupts

• In some cases it is desirable to allow I/O exchanges to occur

Read I/O Module Internal

Figure 3.15 Computer Modules

Memory Processor I/O to or

Address lines Bus

Figure 3.16 Bus Interconnection Scheme

A conventional shared bus

• Layered protocol architecture

• Packetized data transfer

QPI PCI Express Memory bus

Figure 3.17 Multicore Configuration Using QPI

Physical Phits Physical

Figure 3.18 QPI Layers

Reception Lanes Transmission Lanes

Intel QuickPath Interconnect Port

Figure 3.19 Physical Interface of the Intel QPI Interconnect