
STATE-OF-THE-ART INTERCONNECT FABRICS AND COMMUNICATION PROTOCOLS

AHB: critical overview




Protocol
  Lacks parallelism
    In order completion
    Address of next transaction just anticipated on the bus
    No multiple outstanding transactions: cannot hide slave wait states effectively
  High arbitration overhead (min. 2 cycles on single transfers)
  Bus-centric vs. transaction-centric
    Initiators and targets are exposed to bus architecture (e.g. arbiter)
    No decoupling, instance-specific bus components

Topology
  Scalability limitation of shared bus solution!

Toward improved utilization of the topology (throughput, latency)

Bus evolution
Protocol

Toward enhanced parallelism


Topology

Topology evolution
Shared bus with unidirectional request and response lanes

Crossbar with unidirectional request and response lanes

Topology evolution
Partial crossbar with unidirectional request and response lanes

[Figure: multi-layer bus architecture. Several shared buses, each connecting its own blocks (labeled P, T, M and S), are joined through a crossbar (xbar).]

The communication bottleneck
Today: multi-layer topology

[Figure: IP blocks (LX, IP 1, IP 2, IP 3, IP 5), each fed by several IPTG blocks, all attached to the system interconnect together with the off-chip memory controller.]

Jeopardizing design predictability, feasibility and cost!


Topology evolution
[Table, built up over three slides: 4-ary 2-mesh, 2-ary 4-mesh and 2-ary 2-mesh compared by number of switches, bisection bandwidth, tiles per switch, switch arity and max. hops. Surviving values: 16 switches for the 4-ary 2-mesh and for the 2-ary 4-mesh; a switch arity of 10 for one topology.]
[Figures: tile and switch layout of the 4-ary 2-mesh, the 2-ary 4-mesh and the 2-ary 2-mesh; the 2-ary 2-mesh is annotated "Low latency".]
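The comparison criteria named on these slides (switches, bisection bandwidth, max. hops) can be approximated with the textbook formulas for a k-ary n-mesh. The following is a minimal sketch, assuming one switch per node, an even k, and hop counts measured in switch-to-switch links; the function name `mesh_metrics` and its keys are illustrative and do not come from the slides, whose own figures may count tile ports or concentration differently.

```python
def mesh_metrics(k: int, n: int) -> dict:
    """Textbook figures for a k-ary n-mesh with one switch per node (k even)."""
    if k < 2 or n < 1:
        raise ValueError("need k >= 2 and n >= 1")
    return {
        "switches": k ** n,                       # k nodes along each of n dimensions
        "max_hops": n * (k - 1),                  # corner-to-corner distance in links
        "bisection_links": k ** (n - 1),          # links cut when halving one dimension
        "max_neighbor_ports": n * min(2, k - 1),  # switch-to-switch ports, interior switch
    }


for name, (k, n) in {"4-ary 2-mesh": (4, 2), "2-ary 4-mesh": (2, 4)}.items():
    print(name, mesh_metrics(k, n))
```

Under these assumptions both topologies use 16 switches, but the 2-ary 4-mesh trades more wiring across the bisection for a shorter worst-case path.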

Split transactions
A split-transaction bus is a bus where the request and response phases are split and independent, to improve bus utilization
-Master must arbitrate for the request phase
-Slave must arbitrate for the response phase

[Figure: the master keeps the request bus busy only while issuing the request, then releases it; the slave keeps the response bus busy only while returning the response, then releases it.]

Multiple outstanding transactions


[Figure: the master sends requests into the slave's queue of pending requests; the slave returns responses through a queue of pending responses.]

The master needs to associate each response to one of its pending requests
The initiator should support multiple outstanding transactions too
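One common way for a master to associate each response with one of its pending requests is to tag every request and keep a small table of outstanding tags. The sketch below assumes such a tagging scheme; the class `OutstandingTracker` and its methods are hypothetical names introduced for illustration, not part of any bus specification.

```python
class OutstandingTracker:
    """Tracks a bounded number of in-flight requests by tag."""

    def __init__(self, max_outstanding: int):
        self.free_tags = list(range(max_outstanding))  # tags not currently in flight
        self.pending = {}                              # tag -> request address

    def issue(self, address: int):
        """Return a tag for the new request, or None if the master must stall."""
        if not self.free_tags:
            return None
        tag = self.free_tags.pop(0)
        self.pending[tag] = address
        return tag

    def complete(self, tag: int) -> int:
        """Match a response to its request and recycle the tag."""
        address = self.pending.pop(tag)
        self.free_tags.append(tag)
        return address


tracker = OutstandingTracker(max_outstanding=2)
first = tracker.issue(0x100)
second = tracker.issue(0x200)
print(tracker.issue(0x300))            # no free tag: the master stalls
print(hex(tracker.complete(second)))   # responses may arrive in any order
```

Because the lookup is by tag rather than by arrival order, the same table also works when responses complete out of order.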

Out-of-order completion
[Figure: the master issues a request to the slow slave S1 and then a request to the fast slave S2; over time, the response from S2 returns before the response from S1.]

- Association between requests and responses is more challenging
- The typical case for out-of-order completion is when a fast slave is addressed after a slow slave. The fast slave will return its response earlier.

Out-of-order completion
[Figure: the master issues requests S11 and S12 to the same slave S1; the response of S12 returns before the response of S11.]

- Out-of-order completion occurs even when multiple outstanding transactions are addressed to the same complex slave
- A complex slave may use local optimizations and change the processing order of incoming requests (e.g., serve accesses to an open row first in an SDRAM device)
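The open-row example can be sketched as a tiny scheduling policy: serve the oldest request that hits the currently open row, otherwise fall back to the oldest request. This is an illustrative model only; the function `serve_order`, the row numbers and the single-bank assumption are not taken from the slides.

```python
def serve_order(requests, open_row):
    """Return request tags in service order for an open-row-first slave.

    requests: list of (tag, row) in arrival order.
    open_row: the row currently open in the (single, assumed) SDRAM bank.
    """
    pending = list(requests)
    order = []
    while pending:
        # Prefer the oldest request that hits the open row; else take the oldest one.
        chosen = next((r for r in pending if r[1] == open_row), pending[0])
        pending.remove(chosen)
        order.append(chosen[0])
        open_row = chosen[1]  # serving a request leaves its row open
    return order


# S11 arrives first but targets a closed row; S12 hits the open row.
print(serve_order([("S11", 5), ("S12", 3)], open_row=3))
```

With these inputs S12 is served before S11, which mirrors the response order drawn in the figure.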

Bus-centric architecture
[Figure: master interface and slave interface attached directly to the bus architecture.]

- Internal bus components are directly exposed to the connected master and slave interfaces
- The bus architecture is instance-specific and lacks modularity

Transaction-centric architecture
[Figure: master and slave interfaces speak a point-to-point communication protocol; the components of the bus architecture are hidden behind those interfaces.]

- Internal bus components are hidden behind bus interfaces
- Modular architecture
- Orthogonalization of concerns
- Internal bus architecture can freely evolve without impacting the interfaces
- The only objective of interfaces: specifying communication transactions! (communication abstraction)

But what is there on the market?

AMBA Multi-layer AHB

Enables parallel access paths between multiple masters and slaves
Fully compatible with AHB wrappers
It is a topology (not protocol) evolution
Pure combinational matrix (scales poorly with the number of I/Os)

[Figure: Master1 and Master2, each on its own AHB layer, connected through the interconnect matrix to the slaves.]

Multi-layer AHB implementation

The matrix is completely flexible and can be adapted
MUXes are point arbitration stages
AHB layer can be AHB-lite: single master, no req/grant, no split/retry

Multi-layer AHB implementation

A layer that loses arbitration is held in wait states by means of HREADY
While a layer is waited, the input stage samples the pipelined address and control signals

Hierarchical systems

Slaves accessed only by masters on a given layer can be made local to the layer

Multiple slaves

Multiple slaves appear as a single slave to the matrix
  combine low-bandwidth slaves
  group slaves accessed only by one master (e.g. DMA controller)
Alternatively, a slave can be an AHB-to-APB bridge, thus allowing connection to multiple low-bandwidth slaves

Multiple masters per layer

Combine masters that have low bandwidth requirements

Putting it all together
Interconnect matrix and Slave4 are used for across-layer communication

HW semaphores

Dual port slaves

Common for off-chip SDRAM controllers


Layer1: bandwidth-limited, high-priority traffic with low latency requirements (e.g., processor cores)
Layer2: bandwidth-critical traffic (e.g., hardware accelerators)
The dual-port slave may even be connected to the matrix

AMBA 3.0 (AMBA AXI)


This is an evolution of the communication protocol
High bandwidth low latency designs
High frequency operation
Flexibility in the implementation
Backward compatible with AHB and APB
Novel features with respect to AHB
  Burst-based transactions with only the first address issued
  Address information can be issued before/after the actual write data transfer
  Multiple outstanding addresses
  Out-of-order transaction completion
  Easy addition of register stages for timing closure

Design paradigm change


[Figure: masters (initiators) and slaves (targets) each attach to the communication architecture through an AXI interface.]

- Point-to-point interface specification
- Independent of the implementation of the communication architecture
- Communication architecture can freely evolve (be customized)
- Transaction-based specification of the interface
- Open Core Protocol (OCP) is another example of this paradigm

Transaction-centric bus
AXI can be used to interconnect:
-an initiator to the bus
-a target to the bus
-an initiator with a target
The interface definition allows a variety of different interconnect implementations

[Figure: a master (initiator) connected to a slave (target) through an AXI interface.]

Interconnect approaches
[Figure: masters and slaves attached through AXI interfaces to either a crossbar or a shared bus.]

Most systems use one of three interconnect approaches:


-shared address and data buses
Most common
-Shared address buses and multiple data buses
-Multilayer, with multiple address and data buses

Channel-based Architecture

Five groups of signals
  Read Address: AR signal name prefix
  Read Data: R signal name prefix
  Write Address: AW signal name prefix
  Write Data: W signal name prefix
  Write Response: B signal name prefix

[Figure: the five channels: read address, write address, read data, write data, write response.]

Channels are independent and asynchronous wrt each other

Read transaction

Single address for burst transfers

Write transaction

Single response for an entire burst

Channels - One way flow


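Write address channel: AWVALID, AWADDR, AWLEN, AWSIZE, AWBURST, AWLOCK, AWCACHE, AWPROT, AWID, AWREADY
Write data channel: WVALID, WLAST, WDATA, WSTRB, WID, WREADY
Read data channel: RVALID, RLAST, RDATA, RRESP, RID, RREADY
Write response channel: BVALID, BRESP, BID, BREADY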


Channel: a set of unidirectional information signals
Valid/Ready handshake mechanism
  READY is the only return signal
  Valid: source IF has valid data/control signals
  Ready: destination IF is ready to accept data
  Last: indicates last word of a burst transaction
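The handshake rule (a beat moves only in a cycle where VALID and READY are both high) can be illustrated with a cycle-by-cycle model. This is a minimal sketch, assuming the source always has the next beat available and therefore keeps VALID high; the function `transfer` and its argument names are illustrative.

```python
def transfer(beats, ready_pattern):
    """Model one channel: return (cycle, data, last) for every accepted beat.

    beats: data words the source wants to send, in order.
    ready_pattern: READY value driven by the destination in each cycle.
    """
    accepted = []
    index = 0
    for cycle, ready in enumerate(ready_pattern):
        if index == len(beats):
            break                      # burst finished, source drops VALID
        valid = True                   # source holds VALID and data until accepted
        if valid and ready:
            last = index == len(beats) - 1
            accepted.append((cycle, beats[index], last))
            index += 1
    return accepted


# The destination stalls in cycles 0 and 2.
print(transfer(["D11", "D12"], [False, True, False, True]))
# [(1, 'D11', False), (3, 'D12', True)]
```

The source never needs to know why the destination stalls; READY is the only information flowing backward.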

Valid ready handshake

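AMBA 2.0 AHB Burst
[Timing diagram: the ADDRESS lane carries one address per beat (A11 A12 A13 A14, then A21 A22 A23, then the third transaction); the DATA lane follows one stage later with D11 D12 D13 D14, D21 D22 D23, D31.]

AHB Burst
  Address and Data are locked together
  Two pipeline stages
  HREADY controls pipeline operation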

AXI - One Address for Burst
[Timing diagram: the ADDRESS lane carries a single address per burst (e.g. A11, A21); the DATA lane carries D11 D12 D13 D14, D21 D22 D23, D31.]

AXI Burst
  One Address for entire burst

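AXI - Outstanding Transactions
[Timing diagram: addresses A11 and A21 are issued back to back on the ADDRESS lane before the data of the first burst has completed; the DATA lane carries D11 D12 D13 D14, D21 D22 D23, D31.]

AXI Burst
  One Address for entire burst
  Allows multiple outstanding addresses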

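Problem: Slow slave
[Timing diagram: addresses A11, A21 and A31 are issued back to back, but the DATA lane delivers only D11 and D12 while the slow slave stalls.]

If one slave is very slow, all data is held up.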

Out-of-Order Completion
[Timing diagram: on the DATA lane, D21 D22 D23 and D31 are returned before D11 D12 D13 D14.]

Out of order completion allowed
- Fast slaves may return data ahead of slow slaves
- Complex slaves may serve requests out-of-order
Each transaction has an ID attached (given by the master IF)
- Channels have ID signals - AID, RID, etc.
- Transactions with the same ID must be ordered
- The interconnect in a multi-master system must append another tag to the ID to make each master's ID unique
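One straightforward way for an interconnect to make IDs unique across masters is to place the master's port index in extra high-order bits above the master's own ID. The sketch below assumes that encoding; `extend_id`, `split_id` and the 4-bit ID width are illustrative choices, not something the slides prescribe.

```python
def extend_id(master_index: int, txn_id: int, id_bits: int) -> int:
    """Prefix a master's transaction ID with the master's port index."""
    if not 0 <= txn_id < (1 << id_bits):
        raise ValueError("transaction ID does not fit in id_bits")
    return (master_index << id_bits) | txn_id


def split_id(extended_id: int, id_bits: int):
    """Recover (master_index, txn_id) so a response can be routed back."""
    return extended_id >> id_bits, extended_id & ((1 << id_bits) - 1)


# Two masters both use ID 5; the slave sees two distinct IDs.
print(hex(extend_id(1, 5, id_bits=4)), hex(extend_id(2, 5, id_bits=4)))
print(split_id(0x25, id_bits=4))
```

On the response path the interconnect strips the prefix again, so each master only ever sees the IDs it issued.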



Ordering restrictions

Simple rules
  A simple master can issue all its transactions with the same ID (implicitly forcing in-order delivery)
  A simple slave can serve requests in the order they arrive, regardless of the ID tag

AXI - Data Interleaving
[Timing diagram: the DATA lane returns beats of different bursts interleaved, in the order D21 D22 D11 D23 D12 D31 D13 D14.]

Returned data can even be interleaved
  Gives maximum use of data bus
  Note - Data within a burst is always in order
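A receiver of interleaved data can separate the stream by ID and check that beats within each burst still arrive in order. The following sketch assumes each beat is represented as an (ID, beat number) pair with beat numbers starting at 1; the function `demux` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict


def demux(beats):
    """Split an interleaved data stream into per-ID bursts.

    beats: list of (rid, beat_number) in arrival order.
    Raises ValueError if beats of one burst arrive out of order.
    """
    bursts = defaultdict(list)
    for rid, beat in beats:
        expected = len(bursts[rid]) + 1
        if beat != expected:
            raise ValueError(f"ID {rid}: got beat {beat}, expected {expected}")
        bursts[rid].append(beat)
    return dict(bursts)


# Arrival order D21 D22 D11 D23 D12 D31 D13 D14, written as (burst ID, beat number).
stream = [(2, 1), (2, 2), (1, 1), (2, 3), (1, 2), (3, 1), (1, 3), (1, 4)]
print(demux(stream))
```

Interleaving across IDs is accepted, while a swap of two beats inside one burst is rejected.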

Burst read
Valid high until ready high

The valid-ready handshake regulates data transfer


This is clearly a split transaction bus!

Overlapping burst read


Address of second burst issued:
True outstanding transactions

Burst write

Register slices for max frequency

- Channels are asynchronous
- Register slices can be applied across any channel
- Allows maximum frequency of operation by changing delay into latency

[Figure: a register slice inserted on the write data channel signals WID, WDATA, WSTRB, WLAST, WVALID, WREADY.]


Other AXI features





No early burst termination, but fine granularity specification of burst beats (1-16)
Burst types:
- Fixed (FIFO-like)
- Incremental
- Wrapping
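The three burst types differ only in how the address of each beat is derived from the single address issued for the burst. The sketch below assumes a start address aligned to the beat size and, for wrapping bursts, a wrap boundary equal to the total burst size; `burst_addresses` and its parameter names are illustrative.

```python
def burst_addresses(start: int, beats: int, size_bytes: int, kind: str):
    """Return the address of every beat in a burst (start assumed aligned)."""
    if not 1 <= beats <= 16:
        raise ValueError("burst length must be 1 to 16 beats")
    if kind == "FIXED":
        # FIFO-like: every beat targets the same location.
        return [start] * beats
    if kind == "INCR":
        return [start + i * size_bytes for i in range(beats)]
    if kind == "WRAP":
        total = beats * size_bytes
        base = (start // total) * total  # lower wrap boundary
        return [base + (start - base + i * size_bytes) % total for i in range(beats)]
    raise ValueError(f"unknown burst type {kind!r}")


for kind in ("FIXED", "INCR", "WRAP"):
    print(kind, [hex(a) for a in burst_addresses(0x38, 4, 4, kind)])
```

For the wrapping case the addresses run 0x38, 0x3c and then wrap to 0x30, 0x34, which is the pattern a cache line fill starting at the critical word would use.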

Support for system caches
- Bufferable vs. cacheable transactions

Support for protection units
- Privileged transactions vs. normal ones
- Secure vs. non-secure transactions

Support for exclusive accesses
- Read exclusive, followed by write exclusive

Support for locked accesses
- Terminated by an unlocked access

Write data interleaving (of transactions with different IDs)

Comparison
Memories with 2 wait states

[Figure: initiators Init1, Init2 and Init3 connected through a bus to memories Mem1, Mem2 and Mem3.]

AHB: it is impossible to hide slave response latency
STBUS low buf: while the previous response phase is in progress, a new request can be processed by the next addressed slave
STBUS high buf: more data pre-accessed while previous response phase is in progress
AXI: interleaving support in interfaces and interconnect allows better interconnect exploitation

Scalability
Highly parallel benchmark (no slave bottlenecks)
1 memory wait state

[Figure: two bar charts of relative execution time for AHB, AXI, STBus and STBus (B) with 2, 4, 6 and 8 cores. One panel: 1 kB cache (low bus traffic). Other panel: 256 B cache (high bus traffic).]

Scalability
[Figure: two bar charts for AHB, AXI, STBus and STBus (B) with 2, 4, 6 and 8 cores. One panel: interconnect busy. Other panel: interconnect usage efficiency.]

Increasing contention: AXI, STBus show 80%+ efficiency, AHB < 50%
Saturation of shared bus architectures

Networks-on-Chip (NoCs)
Same paradigm as Wide Area Networks and large-scale multi-processors

[Figure: master and slave IP cores attach through network interfaces (NI) to a network of switches; a packet travels as a header flit, payload flits and a tail flit.]

Clean separation at session layer
  Core issues end-to-end transactions (through AXI, OCP,..)
  Network deals with lower level issues
Modularity at HW level
  Only 2 building blocks: network interface, switch
Physical design aware
  Path segmentation
  Regular routing
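The packet structure in the figure (header flit, payload flits, tail flit) can be sketched as the packetization step a network interface performs on an end-to-end transaction. This is an illustrative model only: the function `packetize`, the tuple layout, and the choice to let the tail flit carry the final chunk of data are assumptions, not details given on the slide.

```python
def packetize(dest: int, payload: bytes, flit_bytes: int):
    """Split a transaction payload into header, payload and tail flits."""
    if flit_bytes < 1:
        raise ValueError("flit_bytes must be positive")
    chunks = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
    flits = [("HEADER", dest)]                       # routing information travels first
    flits += [("PAYLOAD", chunk) for chunk in chunks[:-1]]
    # Assumption: the tail flit carries the last data chunk and closes the packet.
    flits.append(("TAIL", chunks[-1] if chunks else b""))
    return flits


for flit in packetize(dest=3, payload=b"abcdefghij", flit_bytes=4):
    print(flit)
```

Switches only need the header flit to make a routing decision; the remaining flits follow the same path, which keeps the switch a simple, reinstantiable block.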

Shared buses vs NoCs


NoCs Pros.
- Each integrated IP core adds bus load capacitance
+ Only point-to-point one-way links are used
- Bus timing problems in deep sub-micron designs
+ Better suited for GALS paradigm
- Arbiter delay grows with the number of masters. Instance-specific arbiter
+ Distributed routing decisions. Reinstantiable switches
- Bus bandwidth is shared among all masters
+ Aggregate bandwidth scales with network dimension

Shared buses vs NoCs


NoCs Cons.
+ After bus is granted, bus access latency is null
- Unpredictable latency due to network congestion problems
+ Very low silicon cost
- High area cost
+ Simple bus-IP core interface
- Network-IP core interface can be very complex (e.g. packetization,..)
+ Design guidelines are well known
- Design guidelines start to consolidate
