
Extending the Transaction Level Modeling Approach for Fast Communication Architecture Exploration

Sudeep Pasricha, Nikil Dutt
{sudeep,dutt}@cecs.uci.edu
Center for Embedded Computer Systems
University of California, Irvine

Mohamed Ben-Romdhane
m.benromdhane@conexant.com
Conexant Systems Inc
Newport Beach, CA
Outline
► Motivation

► Related Work
► CCATB Modeling Abstraction
► Exploration Studies
► Conclusion
SoC Communication

Communication between IPs in such complex systems significantly affects system performance!
Communication Architectures
► Several bus-based communication architectures are commonly
used in SoC designs
 AMBA (2.0, 3.0)
 IBM CoreConnect
 Wishbone
► Key Features
 High Performance System Bus
► processors, memory, DMA etc.
 Low Bandwidth Peripheral Bus
► timer, interrupt controller, UART etc.
AMBA 2.0

► AHB
 Pipelined
 Burst modes
 Split transactions
 Multiple masters
► APB
 Low Power
 Simple Interface
 Single Master
AMBA 3.0
► Introduces AXI high performance protocol
 Out of order completion
 Fixed mode bursts
 Advanced system cache support
► Specify if transaction is cacheable/bufferable
► Specify attributes such as write-back/write-through
 Enhanced protection support
► Secure/non-secure transaction specification
 Exclusive access (for semaphore operations)
Issues
► Selecting and configuring these architectures for optimal
performance is a critical activity in a SoC design
 bus architecture (e.g. AMBA 2.0, AMBA 3.0, CoreConnect)
 architecture parameters (e.g. bus width, burst size)
 bus topologies (e.g. shared, hierarchical)
 protocol choices (e.g. arbitration strategies)
[Figure labels: three PEs, each attached through an Interface to a communication architecture marked "?"]
SoC Simulation Speed
Cycle Rate    Technology
1             Silicon Reference Design
10^-2         HW Emulator
10^-3         Transaction Model
10^-4         Cycle Accurate Model
10^-6         RTL Model
10^-7         Gate Level Model

► Capturing a complete SoC design at RTL level and then simulating for exploration is
 too slow (~10–100 cycles/s)
 cumbersome to capture all the detail
 too late in the design flow for exploration!
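To put the cycle rates above in perspective, the following is a small arithmetic sketch in plain C++. The function name and the workload size used in the note below are hypothetical; only the rates (~10–100 cycles/s for RTL) come from the slide.

```cpp
#include <cassert>

// Wall-clock seconds needed to simulate `cycles` target cycles
// on a simulator that sustains `cycles_per_second`.
double simulation_seconds(double cycles, double cycles_per_second) {
    assert(cycles_per_second > 0.0);  // a rate of zero would never finish
    return cycles / cycles_per_second;
}
```

For a hypothetical run of 10 million cycles, `simulation_seconds(1e7, 100.0)` gives 100,000 seconds (more than a day) at the upper RTL rate, while a model running at 100K cycles/s would need about 100 seconds.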
Problem Definition
► To enable exploration of the System-on-Chip
communication design space
 early in the design flow
 good accuracy
 fast simulation speed (>> 100K cycles/s)
 rapid system prototyping
 IP reuse (plug and play IP library)
 Support early development of
► embedded software
► executable (golden) specification of SoC
► system testbenches
Communication Modeling Approaches
► Cycle Accurate (CA) Models
► Bus Cycle Accurate (BCA) Models
► Transaction Level Modeling (TLM)
► Hybrid Modeling Approaches
Cycle Accurate Models
master                     slave
var1 = a + b;              case CTR_WR:
wait();                    CTR_WR = in;
REG = d << var1;           wait();
wait();                    CTR_WR |= 0xf;
HREQ.set(1);               e = REG4 | 0xff
wait();                    ST_RG = in|0x1
wait();                    wait();

[Figure labels: master and slave attached to bus and arb through a pin interface; abstraction levels Algorithm, TLM, BCA, CA, Register Transfer Level]

• Detailed system debug and analysis
• Time consuming to model
- /1 to /3 RTL
• Too slow for exploring SoC designs
- 100x RTL
Bus Cycle Accurate Models
master                     slave
…                          …
var1 = a + b;              case CTR_WR:
REG = d << var1;           CTR_WR = in;
HREQ.set(1);               CTR_WR |= 0xf;
wait(3, SC_NS);            e = REG4 | 0xff
…                          ST_RG = in|0x1
                           wait(3, SC_NS);
                           …

[Figure labels: master and slave attached to bus and arb through a pin interface; abstraction levels Algorithm, TLM, BCA, CA, Register Transfer Level]

• High level system exploration
• Still time consuming to model
- /5 to /10 RTL
• Still slow for exploring SoC designs
- 100x to 500x RTL
Transaction Level Models
master                     slave
…                          …
var1 = a + b;              case CTR_WR:
d = d << var1;             CTR_WR = in;
request(port1);            CTR_WR |= 0xf;
wait();                    e = REG4 | 0xff
…                          ST_RG = in|0x1
                           wait();
                           …

[Figure labels: master and slave attached to a channel (bus, arb) through a generic channel interface; abstraction levels Algorithm, TLM, BCA, CA, Register Transfer Level]

• High level system validation and embedded software development
• Fast to model
- /10 to /50 RTL
• Fast simulation speed, but model not detailed enough for exploring SoC designs
- >>1000x RTL
Hybrid Approaches
master                     slave
…                          …
var1 = a + b;              case CTR_WR:
d = d << var1;             CTR_WR = in;
request(port1);            CTR_WR |= 0xf;
wait(3, SC_NS);            e = REG4 | 0xff
HSEL.set(1);               ST_RG = in|0x1
                           wait(3, SC_NS);
                           …

[Figure labels: master and slave attached to bus and arb through a pin, transaction interface; abstraction levels Algorithm, TLM, BCA, CA, Register Transfer Level]

• Use Transaction Level Modeling (TLM) techniques to speed up Bus Cycle Accurate (BCA) model simulation
• Time to model varies (sometimes more than BCA)
• Simulation speed generally slightly faster than BCA
Hybrid Approaches
► Xinping et al. (ICCAD 2002) use function calls
instead of slower signal semantics to describe
models of AMBA2 and CoreConnect
 resulting models are not detailed enough for accurate
communication exploration
► Caldari et al. (DATE 2003) similarly attempt to
model AMBA2 using function calls for reads/writes
 Bus signals are also modeled: slows simulation
 Clocked threads used extensively: slows simulation
Hybrid Approaches
► Ogawa et al. (DATE 2003) also model data
transfers in AMBA2 using read/write transactions
 use low level handshaking semantics
► In mid 2003, ARM released the AHB Cycle-Level
Interface Specification
 for modeling AMBA AHB at CA level in SystemC
 function calls emulate bus signals at interface
 Scope for improving speed by reducing
number of calls
CCATB Modeling Abstraction
► Variant of Hybrid Modeling Approach
 No pins at interface
 read(), write() transaction interface
► Cycle Count Accurate at Transaction Boundaries
 maintains overall cycle accuracy, essential for system
exploration
► Trades off intra-transaction visibility for
simulation speed
 more than 1.5x faster than fastest BCA models
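The idea of being cycle count accurate only at transaction boundaries can be sketched in plain C++ (deliberately without SystemC). The `BurstTiming` fields and both function names are assumptions made for illustration, and the latency formula is a simplified stand-in for a real bus protocol. One function steps the clock cycle by cycle, as a BCA-style model would; the other computes the total latency once and jumps to the end of the transaction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timing parameters for one bus transaction (illustrative only).
struct BurstTiming {
    std::uint32_t arbitration_cycles;  // request + arbitration + decode delay
    std::uint32_t slave_wait_cycles;   // wait states inserted before each beat
    std::uint32_t beats;               // burst length
};

// BCA-style: the clock advances one cycle at a time, so every
// intermediate cycle inside the transaction is visible.
std::uint64_t run_per_cycle(std::uint64_t now, const BurstTiming& t) {
    for (std::uint32_t c = 0; c < t.arbitration_cycles; ++c) ++now;
    for (std::uint32_t b = 0; b < t.beats; ++b) {
        for (std::uint32_t w = 0; w < t.slave_wait_cycles; ++w) ++now;  // wait states
        ++now;  // data transfer cycle for this beat
    }
    return now;
}

// CCATB-style: the whole latency is computed once and the clock jumps
// straight to the transaction boundary.
std::uint64_t run_at_boundary(std::uint64_t now, const BurstTiming& t) {
    const std::uint64_t latency =
        t.arbitration_cycles +
        static_cast<std::uint64_t>(t.beats) * (t.slave_wait_cycles + 1u);
    return now + latency;
}
```

Both functions return the same cycle count for the same inputs; the second simply gives up visibility of the cycles inside the transaction, which is the trade-off the slide describes.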
Timing Analysis
CCATB
► Model Abstraction
 IPs modeled at behavioral level
 Bus model extends generic TLM channel, adding
► Timing
► Bus protocol details
► Communication Interface
 extension of read(), write() transactions from TLM
 Protocol details (e.g. burst size, cache hints) need to be passed
► Modeling Language - SystemC
 fast (C/C++ native execution)
 provides constructs (concurrency, timing) for hardware modeling
 extensive commercial tool support (debugging, waveform
viewing)
Exploration with CCATB Models
► Bus Architecture
 e.g. AMBA 2.0 or 3.0 or CoreConnect
► Bus widths
 e.g. 16/32/64 bits
► Burst Sizes
 for DMA and other bus masters
► Bus Hierarchy/Topology
 e.g. Single or Multi layer
► Arbitration Strategy
 e.g. static priority, TDMA, RR
► Buffer Sizes
 e.g. for queued out of order request completion
► Advanced Modes
 e.g. OO completion, CACHE/BUFFER hints
► IP Cores
 processor/peripherals
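As an illustration of one of the parameters listed above, here is a minimal sketch of two arbitration strategies (static priority and round robin) in plain C++. The function names and the request-vector representation are assumptions for illustration and are not taken from the CCATB models themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// requests[i] is true when master i is requesting the bus.
// Both functions return the index of the granted master, or -1 if none requests.

// Static priority: the lowest-numbered requesting master always wins.
int arbitrate_static_priority(const std::vector<bool>& requests) {
    for (std::size_t i = 0; i < requests.size(); ++i) {
        if (requests[i]) return static_cast<int>(i);
    }
    return -1;
}

// Round robin: the search starts just after the previously granted master,
// so every requesting master is eventually served.
// last_granted must be in [-1, n-1]; use -1 before the first grant.
int arbitrate_round_robin(const std::vector<bool>& requests, int last_granted) {
    const int n = static_cast<int>(requests.size());
    assert(last_granted >= -1 && (n == 0 || last_granted < n));
    for (int k = 1; k <= n; ++k) {
        const int i = (last_granted + k) % n;
        if (requests[static_cast<std::size_t>(i)]) return i;
    }
    return -1;
}
```

With masters 0 and 2 requesting, static priority always grants master 0, whereas round robin alternates between the two depending on who was granted last.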
Master
msg.length = 1;
addr = TIMER_REG2;
write(bus->port1, addr, msg);
wait();
…

Bus
get_requests(r);
sl_req = arbitrate(r);
a = decode(sl_req);
if (a.read)
st = read(a, sl_req);
else
st = write(a, sl_req);

Slave
status read(a, msg)
{ switch (addr)
{
case TIMER_REG2:
msg.data = t_reg2;
x.stat = SLV_OK;
return x;

[Figure labels: read/write (addr, data_control_token); request + arbitration + decode cycle delay; slave delay; burst + pipeline + busy + interface + slave + add. arbitration delay; transaction status; slave response; simulation time]
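The master, bus and slave fragments above can be tied together in a compact, self-contained sketch. This is plain C++ rather than SystemC, so the master's `wait()` and the arbitration step are not modeled. The address value, the delay constants and the class names are hypothetical; only `TIMER_REG2`, `t_reg2`, `SLV_OK` and the read/write call style come from the slide.

```cpp
#include <cassert>
#include <cstdint>

enum class Status { SLV_OK, SLV_ERROR };

// Minimal stand-in for the data/control token passed with each transaction.
struct Msg {
    std::uint32_t data = 0;
    std::uint32_t length = 1;  // number of beats
};

constexpr std::uint32_t TIMER_REG2 = 0x08;  // hypothetical register address

// Slave: a timer exposing a single register.
struct TimerSlave {
    std::uint32_t t_reg2 = 0;

    Status read(std::uint32_t addr, Msg& msg) const {
        switch (addr) {
            case TIMER_REG2: msg.data = t_reg2; return Status::SLV_OK;
            default:         return Status::SLV_ERROR;
        }
    }
    Status write(std::uint32_t addr, const Msg& msg) {
        switch (addr) {
            case TIMER_REG2: t_reg2 = msg.data; return Status::SLV_OK;
            default:         return Status::SLV_ERROR;
        }
    }
};

// Bus: forwards each call to the slave and charges the transaction's
// cycle cost once, at the transaction boundary.
class Bus {
public:
    explicit Bus(TimerSlave& slave) : slave_(slave) {}

    Status read(std::uint32_t addr, Msg& msg) {
        charge(msg);
        return slave_.read(addr, msg);
    }
    Status write(std::uint32_t addr, const Msg& msg) {
        charge(msg);
        return slave_.write(addr, msg);
    }
    std::uint64_t cycles() const { return cycles_; }

private:
    void charge(const Msg& msg) {
        assert(msg.length > 0);
        cycles_ += kOverheadCycles + kSlaveCyclesPerBeat * msg.length;
    }
    static constexpr std::uint32_t kOverheadCycles = 2;      // hypothetical request + arbitration + decode delay
    static constexpr std::uint32_t kSlaveCyclesPerBeat = 1;  // hypothetical slave delay per beat
    TimerSlave& slave_;
    std::uint64_t cycles_ = 0;
};
```

A master in this sketch is just the calling code: it fills in a `Msg`, calls `bus.write(TIMER_REG2, msg)` and inspects the returned status, mirroring the function-call interface shown in the figure.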
CCATB Transaction Token Fields

Request field     Description
m_data            pointer to an array of data
m_burst_length    length of transaction burst
m_burst_type      type of burst (incr, fixed, wrapping etc.)
m_byte_enable     byte enable strobe for unaligned transfers
m_read            indicates whether transaction is read/write
m_lock            lock bus during transaction
m_cache           cache/buffer hints
m_prot            protection modes
m_transID         transaction ID (needed for OO access)
m_busy_idle       schedule of busy/idle cycles from master
m_ID              ID for identifying the master
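One possible way to express the token fields above as a C++ struct is sketched below. The field names come from the table; every type, default value, the `BurstType` enum and the `make_read_burst` helper are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Burst types named in the table (incr, fixed, wrapping).
enum class BurstType { Incr, Fixed, Wrap };

// Transaction token; field names follow the table, types are assumed.
struct TransactionToken {
    std::uint32_t*    m_data = nullptr;               // pointer to an array of data
    std::uint32_t     m_burst_length = 1;             // length of transaction burst
    BurstType         m_burst_type = BurstType::Incr; // type of burst
    std::uint8_t      m_byte_enable = 0xF;            // byte enable strobe
    bool              m_read = true;                  // true = read, false = write
    bool              m_lock = false;                 // lock bus during transaction
    std::uint8_t      m_cache = 0;                    // cache/buffer hints
    std::uint8_t      m_prot = 0;                     // protection modes
    std::uint32_t     m_transID = 0;                  // transaction ID (for OO access)
    std::vector<bool> m_busy_idle;                    // per-cycle schedule, true = busy
    std::uint32_t     m_ID = 0;                       // ID of the issuing master
};

// Hypothetical helper: build a token for an incrementing read burst.
TransactionToken make_read_burst(std::uint32_t* buffer,
                                 std::uint32_t length,
                                 std::uint32_t master_id) {
    assert(buffer != nullptr && length > 0);
    TransactionToken t;
    t.m_data = buffer;
    t.m_burst_length = length;
    t.m_read = true;
    t.m_ID = master_id;
    return t;
}
```

Packing all protocol details into a single token is what lets the interface stay at plain read()/write() calls while still carrying burst, cache and protection information to the bus model.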
COMMEX Design Framework
Exploration Study

Broadband Communication SoC Platform


Exploration Study
► Used platform for exploring AMBA 2.0 and AMBA
3.0 configurations
 Communication Protocol Comparison
 Arbitration strategies
 Topology configurations
 Optimal Buffer Size
 Simulation Speedup
Software Applications
► We use three application benchmarks running on
ARM926EJ-S ISS for simulation
 COMPLY
► Configures USB, SWITCH and DMA
 USBDRV
► Configures USB, DMA but restricts SWITCH activity
 SWITRN
► Configures SWITCH, DMA but restricts USB activity
Bus Protocol Comparison

[Chart: transactions (read/write) per second, axis 0 to 2000, for COMPLY, USBDRV and SWITRN under AMBA3 (AXI) and AMBA2 (AHB)]


Arbitration Strategies

[Chart: transactions (read/write) per second, axis 0 to 2000, for COMPLY, USBDRV and SWITRN]
Topology Configuration

Original Config = {ARM926, DMA, USB, SWITCH} on 1 bus
Config A = {ARM926, DMA} {USB, SWITCH} on 2 busses
Config B = {ARM926, DMA, SWITCH} {USB} on 2 busses

[Chart: conflicts (%), axis 0 to 45, for COMPLY, USBDRV and SWITRN under Original, config A and config B]
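How splitting masters across busses changes the number of conflicts can be shown with a toy counter. This is only a sketch: the definition of a conflict used here (two or more masters requesting the same bus in the same cycle) and the function name are assumptions, and any request trace fed to it is hypothetical rather than data from the study.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// requests[c][m] is true when master m requests its bus in cycle c.
// bus_of[m] is the index of the bus that master m is attached to.
// Returns the number of (cycle, bus) pairs in which two or more
// masters request the same bus at once.
std::size_t count_conflicts(const std::vector<std::vector<bool>>& requests,
                            const std::vector<int>& bus_of,
                            int num_buses) {
    std::size_t conflicts = 0;
    for (const auto& cycle : requests) {
        assert(cycle.size() == bus_of.size());
        std::vector<int> per_bus(static_cast<std::size_t>(num_buses), 0);
        for (std::size_t m = 0; m < cycle.size(); ++m) {
            if (cycle[m]) ++per_bus[static_cast<std::size_t>(bus_of[m])];
        }
        for (int n : per_bus) {
            if (n > 1) ++conflicts;
        }
    }
    return conflicts;
}
```

With masters ordered as {ARM926, DMA, USB, SWITCH}, the original topology corresponds to `bus_of = {0, 0, 0, 0}`, Config A to `{0, 0, 1, 1}` and Config B to `{0, 0, 1, 0}`, so the same trace can be replayed against each partition.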
Effect of Buffer Size on Performance
[Chart: transactions (read/write) per second, axis 1000 to 1800, against buffer size 1 to 7, for COMPLY, USBDRV and SWITRN]

Comparing performance with different SDRAM Out-of-Order Buffer sizes


Simulation Performance
[Chart: transactions (read/write) per second, axis 0 to 1800, CCATB versus BCA, for orig_c, orig_u, orig_s, A_c, A_u, A_s, B_c, B_u, B_s]

Comparing speed of transaction based BCA and CCATB platform models
Conclusion
► CCATB models
 1.55x to 2.20x faster than fastest BCA models
 Less modeling effort compared to BCA models
► Since intra-transaction visibility is not a concern
 Accurate exploration of communication space
► Performance figures comparable in accuracy to detailed pin accurate BCA models
 Conveniently fit into SoC Design Flow
► Easy to extend TLM level models to get CCATB models
► Easy to refine down to pin accurate BCA level
Thank You!

sudeep@cecs.uci.edu
CCATB
► Plug and play IP models from library
 Master (DMAs, processor ISS etc)
 Slave (Timers, Interrupt Controllers, Memory etc)
 Bus (AMBA 2.0 AHB, AMBA 3.0 AHB etc)
► Performance statistics include
 Arbitration Conflicts
 IP Throughput
 Bandwidth Utilization
 Cycles spent waiting for bus (for all master IPs)
 Instructions/transactions executed
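Some of the statistics listed above reduce to simple ratios over counters that a bus model could collect. The sketch below is an assumption about how such counters might look; the struct, its field names and the exact definitions of utilization and conflict percentage are hypothetical, not the COMMEX implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters gathered by a bus model during simulation.
struct BusStats {
    std::uint64_t total_cycles = 0;  // simulated cycles
    std::uint64_t busy_cycles = 0;   // cycles in which the bus transferred data
    std::uint64_t requests = 0;      // bus requests issued by all masters
    std::uint64_t conflicts = 0;     // requests that had to wait for another master
    std::uint64_t wait_cycles = 0;   // cycles masters spent waiting for the bus
};

// Share of simulated cycles in which the bus carried data, in percent.
double bandwidth_utilization_percent(const BusStats& s) {
    assert(s.total_cycles > 0);
    return 100.0 * static_cast<double>(s.busy_cycles) /
           static_cast<double>(s.total_cycles);
}

// Share of requests that ran into an arbitration conflict, in percent.
double conflict_percent(const BusStats& s) {
    if (s.requests == 0) return 0.0;  // no requests means no conflicts
    return 100.0 * static_cast<double>(s.conflicts) /
           static_cast<double>(s.requests);
}
```

Because the counters are only updated at transaction boundaries, collecting them does not require per-cycle visibility into the bus.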
Transaction Level Models (TLM)
► A transaction is defined as the exchange of data or an
event between two components
 data can be a single word, a series of words (burst)
or a complex data structure that is transferred
over a bus
► TLM captures reads/writes of register values and
interrupts between various system components
 not concerned with micro architecture (pin details,
cycle accuracy, clock, protocols like handshaking)
COMMEX Features
► Fast communication space exploration at CCATB level
► Seamless interface refinement
 from TLM level down to CCATB level
 from CCATB down to BCA level
► Plug-and-play different IPs effortlessly
 communication architectures (e.g. AMBA2, AMBA3,
CoreConnect)
 masters (e.g. ARM926EJ-S, ARM920, ARM940)
 slaves (e.g. simple ITC, vectored ITC)
► Integrate preexisting IPs using SystemC wrapper code
 e.g. ARM CCM models
IBM CoreConnect

► PLB
 Pipelined
► 4 deep read
► 2 deep write
 Burst modes
 Split transactions
 Multiple masters
► OPB
 Low bandwidth
 Burst mode
 Multiple Masters
► DCR
 Low throughput
► 1 r/w = 2 cycles
 Ring type data bus
