SystemC
Pao-Ann Hsiung,
Embedded Systems Laboratory,
Department of Computer Science and Information Engineering,
National Chung Cheng Univ., Taiwan.
1
Contents
Introduction to Transaction Level Modeling
Programmer’s View: PV
Architect’s View: AV
Verification View: VV
Conclusions
2
Introduction to Transaction Level Modeling (TLM)
Transaction:
A single object that encompasses a
sequence of signals and handshakes
required for system components to
exchange data
What is TLM?
Communication uses function calls
Ex: burst_read(char* buf, int addr, int len);
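As a rough illustration of "communication uses function calls", the sketch below models a memory in which a whole burst moves in one call. Only the burst_read signature comes from the slide; the class name SimpleMemory, the burst_write counterpart, and the bool return value are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical memory model: one call moves the whole burst, with no
// per-signal or per-cycle activity in between.
class SimpleMemory {
public:
    explicit SimpleMemory(std::size_t size) : data_(size, 0) {}

    // Signature as on the slide. The bool return (false on an invalid
    // range) is an assumption; the slide shows no return type.
    bool burst_read(char* buf, int addr, int len) const {
        if (!in_range(buf, addr, len)) return false;
        std::memcpy(buf, data_.data() + addr, static_cast<std::size_t>(len));
        return true;
    }

    // Assumed counterpart of burst_read, added only so the example has data.
    bool burst_write(const char* buf, int addr, int len) {
        if (!in_range(buf, addr, len)) return false;
        std::memcpy(data_.data() + addr, buf, static_cast<std::size_t>(len));
        return true;
    }

private:
    // Rejects null buffers, negative values, and ranges past the end.
    bool in_range(const char* buf, int addr, int len) const {
        if (buf == nullptr || addr < 0 || len < 0) return false;
        return static_cast<std::size_t>(addr) + static_cast<std::size_t>(len) <= data_.size();
    }

    std::vector<char> data_;
};
```

An initiator would call mem.burst_read(buf, 4, 4) once instead of driving address, strobe, and data signals for each beat of the burst.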
3
Introduction to TLM
What is the primary goal of TLM?
To achieve dramatically increased simulation
speeds,
While still offering enough accuracy for the design
task at hand.
Why is TLM interesting?
Fast and compact
Integrate HW and SW models
Early platform for SW development
Early system exploration and verification
Verification reuse
4
Introduction to TLM
How is TLM being adopted?
Widely used for verification
TLM for design is starting at major
electronics companies
Is it really worth the effort?
Yes, particularly for platform-based design
and verification
5
Introduction to TLM
How to achieve the goal?
TLM provides a way of minimizing the
number of events and amount of
information that has to be processed
during simulation.
No driving of individual signals
Exchange only “the data payload”
Separates communication from behavior
Supports different levels of abstraction
6
Introduction to TLM
Principles of TLM in SystemC
TLM is built as a set of interfaces that
define how models communicate.
3 Layers:
Application/User Layer
Protocol Layer
Transport Layer
7
Layered TLM APIs
8
Layered TLM APIs
9
TLM Examples (Router, Arbiter)
10
TLM Examples (Cross Bar)
11
TLM Basic Interfaces
Bidirectional Blocking Interface
Unidirectional Interface
Unidirectional Blocking Interface
Unidirectional Non-blocking Interface
In SystemC,
blocking: SC_THREAD, or SC_CTHREAD
Can contain “wait()”
non-blocking: SC_METHOD
No “wait()”
12
Bidirectional Blocking Interface
The simplest communication and
synchronization mechanism
REQ: Information data type provided by an
initiator in a communication
RSP: Information data type received from
a target in a communication
13
Unidirectional Interface
Communication interfaces
‘put’ and ‘get’ calls
Boolean return value
success or failure
Event interfaces
data available or will be available
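The put/get calls with a Boolean result can be sketched as a small bounded FIFO in plain C++. This is not the tlm_fifo implementation: the class name NbFifo is hypothetical, and the method names nb_put, nb_get, nb_can_put, and nb_can_get are borrowed from the calls named later in these slides. The event interfaces are left out.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical bounded FIFO exposing non-blocking put/get calls.
// Every call returns immediately; success or failure is the return value.
template <typename T>
class NbFifo {
public:
    explicit NbFifo(std::size_t capacity) : capacity_(capacity) {}

    bool nb_can_put() const { return items_.size() < capacity_; }
    bool nb_can_get() const { return !items_.empty(); }

    // Returns false (and stores nothing) when the FIFO is full.
    bool nb_put(const T& item) {
        if (!nb_can_put()) return false;
        items_.push_back(item);
        return true;
    }

    // Returns false (and leaves 'out' untouched) when the FIFO is empty.
    bool nb_get(T& out) {
        if (!nb_can_get()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<T> items_;
};
```

Because no call ever waits, a caller that receives false has to retry later, which is where the "data available" events of the real interfaces come in.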
14
Unidirectional Blocking
Interface
15
Unidirectional Non-blocking
Interface
16
TLM Channels
tlm_fifo<T>
Current implementation of tlm_fifo is almost
identical to sc_fifo
Extensions under consideration:
peek / pop / poke
Shrink / unshrink
Dynamic resizing
Any extensions will use request/update and clearly
distinguish between blocking and nonblocking.
17
TLM Channels
tlm_req_rsp_channel<req, rsp>
Two fifos, one for requests and one for
responses
It also implements
tlm_transport_if<req,rsp>
So it can act as converter between
bidirectional and unidirectional modules
e.g., for arbitration
18
TLM Channels
tlm_req_rsp_channel<REQ,RSP>
implements 7 interfaces
Initiator: put request, get response
Target: get request, put response
Master,
Slave,
tlm_transport_if<REQ,RSP>
19
Master/Slave Interfaces
20
Transport Interface
21
SoC Design Tasks and TLM
Consistent set of views on the SoC
named after the design tasks:
Programmers View (PV)
For the embedded software engineers
Architects View (AV)
For the embedded hardware engineers
AV = PV + T
Verification View (VV)
For the verification engineers
VV == CC (Cycle-Callable)
22
TLM Design Flow (3 views)
[Figure: TLM design flow across the three views. Annotations: "No need if SW development boards exist"; "No need if HW platform is predefined".]
23
Programmers View (PV)
Contents
Introduction
Protocol Layer
Data Structures
Implementation in SystemC
How to introduce an ISS in PV
Performance of PV
Example PV Blocks
24
Introduction to PV
A PV model needs
register accuracy
bit accuracy
interrupt handling
Notable exception
timing critical SW
E.g., real-time embedded SW
Requires cycle accuracy
25
Introduction to PV
PV should link efficiently to an
instruction set simulator (ISS)
Continue using all SW tools:
RTOS, Compiler, Debug Environment
Develop SW on a functionally correct
representation of target system
Requires
Bidirectional Blocking TLM Interface
26
Typical PV Model
27
[Figure: layer diagram with User Layer, Protocol Layer, and Transport Layer blocks]
29
Data Structures
2 structures
Request (REQ): initiator → target
Address,
Write data,
Transaction type: read/write,
Attributes: access size, byte enable
Response (RSP): target → initiator
Read data,
Response structure: OK, error, …
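A minimal sketch of the two structures and of a bidirectional blocking call that exchanges them is given below, in plain C++ without the SystemC headers. The field names, the 32-bit word size, and the types Request, Response, TransportIf, and WordMemory are assumptions for illustration; they are not the definitions used in the original slides.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class AccessType { Read, Write };
enum class Status { Ok, Error };

// Request: initiator -> target.
struct Request {
    std::uint32_t address = 0;
    std::uint32_t write_data = 0;
    AccessType type = AccessType::Read;
    std::uint8_t byte_enable = 0xF;  // one bit per byte lane of a 32-bit word
};

// Response: target -> initiator.
struct Response {
    std::uint32_t read_data = 0;
    Status status = Status::Ok;
};

// Shape of a bidirectional blocking interface: one call carries the
// request in and the response out.
template <typename REQ, typename RSP>
class TransportIf {
public:
    virtual ~TransportIf() = default;
    virtual RSP transport(const REQ& req) = 0;
};

// Hypothetical target: a word-addressed memory.
class WordMemory : public TransportIf<Request, Response> {
public:
    explicit WordMemory(std::size_t words) : words_(words, 0) {}

    Response transport(const Request& req) override {
        Response rsp;
        const std::size_t index = req.address / 4;
        if (req.address % 4 != 0 || index >= words_.size()) {
            rsp.status = Status::Error;  // unaligned or out of range
            return rsp;
        }
        if (req.type == AccessType::Write) {
            // Build a bit mask from the byte enables, then merge.
            std::uint32_t mask = 0;
            for (int lane = 0; lane < 4; ++lane) {
                if (req.byte_enable & (1u << lane)) mask |= 0xFFu << (8 * lane);
            }
            words_[index] = (words_[index] & ~mask) | (req.write_data & mask);
        } else {
            rsp.read_data = words_[index];
        }
        return rsp;
    }

private:
    std::vector<std::uint32_t> words_;
};
```

The response is returned by value, so the initiator owns it and the target never has to allocate or free it.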
30
Implementation in SystemC
TLM transport interface:
31
Implementation in SystemC
Target:
The target has PVTarget ports which
are built using sc_export instantiated
with the PV_if interface defined above.
sc_export is a SystemC 2.1 feature,
which maintains a pointer to an
interface.
32
How to introduce an ISS in PV
To link an ISS to a PV router, the
implementation of the PVReq and
PVResp data types is driven by the
typical requirements of an ISS.
An example definition for Request and
Response is as follows:
33
Request (PVReq)
[Code figure not preserved. Callouts:]
enum PVType {pvWrite=0, pvRead=1}
Allows burst transaction in one API call
Allows smaller data sizes than normal (byte)
Burst type, protection, etc.
34
Response (PVResp)
enum PVResponse {pvOk=0, pvError=1}
35
Integration of ISS
An ISS is integrated as an object having
A method to fetch, decode and execute a
single instruction,
A memory access API,
Methods to set interrupt flags, and
… (other important interfaces)
Integrated as sc_module in SystemC
36
Performance of PV
Each wait() is a CONTEXT SWITCH!!!
To increase simulation speed
Minimize the number of threads
Optimize wait()
Manage RSP within initiator
37
1. Minimize # threads
Rules to decide if a component needs a
thread:
Is the component an ISS model?
Does it lead independently to an interrupt
for an ISS?
Does it model synchronization between
different processors independent of the SW
running on them?
38
2. Optimize wait()
Check rate of synchronization!
Example:
Processor: 200 MIPS
Timer: sends an interrupt every 10 ms
Processor can execute 2 million instructions before
calling wait(), instead of calling wait() after every
instruction
Timer can wait for the total time immediately,
without calling wait() at every count-down
Accuracy reduced: reprogramming takes effect only on
next timer tick
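The arithmetic behind the 2 million figure can be written out as a small helper. The function name instructions_per_sync and the choice of microseconds as the time unit are assumptions for this sketch.

```cpp
#include <cstdint>

// Number of instructions an ISS may execute between two synchronization
// points. 'mips' is in millions of instructions per second and
// 'sync_period_us' is in microseconds, so the factors 1e6 and 1e-6 cancel:
//   (mips * 1e6 instr/s) * (sync_period_us * 1e-6 s) = mips * sync_period_us
constexpr std::uint64_t instructions_per_sync(std::uint64_t mips,
                                              std::uint64_t sync_period_us) {
    return mips * sync_period_us;
}

// The slide's example: 200 MIPS and a timer interrupt every 10 ms.
static_assert(instructions_per_sync(200, 10000) == 2000000,
              "200 MIPS for 10 ms is 2 million instructions");
```

With these numbers the processor model calls wait() once per 10 ms of simulated time instead of 2 million times, at the cost of the reduced accuracy noted above.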
39
3. Manage RSP in Initiator
Creating and deleting RSP in target may
incur large overhead
Manage RSP in initiator for
Efficiency
Safety
40
Example PV Blocks
Simple Peripheral
DMA Model
Simple Initiator
41
PV Example: Simple Peripheral
Simpler implementation ⇒
faster simulation
Preferred implementation
Pure functional call
To impact system timing
Can use wait() [note: slow!]
Use latency parameter in RSP
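One way to picture the latency-parameter option is a peripheral that stays a pure function call and only reports how long the access would have taken. The types TimedResponse, TimerPeripheral, and LocalClock and the field latency_ns are hypothetical names for this sketch, not part of the original model.

```cpp
#include <cstdint>

// Response carrying an annotated delay instead of consuming simulated time.
struct TimedResponse {
    std::uint32_t read_data = 0;
    std::uint64_t latency_ns = 0;
};

// Hypothetical peripheral: reading its counter is a pure function call.
class TimerPeripheral {
public:
    explicit TimerPeripheral(std::uint64_t read_latency_ns)
        : read_latency_ns_(read_latency_ns) {}

    void set_count(std::uint32_t value) { count_ = value; }

    TimedResponse read_count() const {
        TimedResponse rsp;
        rsp.read_data = count_;
        rsp.latency_ns = read_latency_ns_;  // reported, not waited for
        return rsp;
    }

private:
    std::uint64_t read_latency_ns_;
    std::uint32_t count_ = 0;
};

// Initiator-side bookkeeping: add up annotated latencies and hand the
// total over at the next synchronization point.
class LocalClock {
public:
    void add(const TimedResponse& rsp) { pending_ns_ += rsp.latency_ns; }

    // Returns the accumulated delay and resets it to zero.
    std::uint64_t flush() {
        const std::uint64_t total = pending_ns_;
        pending_ns_ = 0;
        return total;
    }

private:
    std::uint64_t pending_ns_ = 0;
};
```

In a SystemC initiator the value returned by flush() would typically be passed to a single wait() call, so the target never needs a thread of its own.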
42
PV Example: Simple Peripheral
43
PV Example: DMA Model
Initiator programs start/end addresses,
Initiator enables DMA through memory
mapped access
DMA function is called with the last
access
DMA moves data from source to dest
DMA function returns to initiator
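The sequence above can be sketched as a register-level model in which the copy happens inside the register write that enables the DMA. The register map (kSrcStart, kSrcEnd, kDst, kCtrl), the exclusive end address, and the class name DmaModel are all assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical PV-style DMA: no thread and no wait(); the transfer runs
// inside the memory-mapped write that enables it.
class DmaModel {
public:
    // Assumed register offsets.
    enum Register : std::uint32_t { kSrcStart = 0x0, kSrcEnd = 0x4, kDst = 0x8, kCtrl = 0xC };

    explicit DmaModel(std::vector<std::uint8_t>& memory) : memory_(memory) {}

    // Returns false for an unknown register or an invalid transfer.
    // Writing 1 to kCtrl moves the data before the call returns.
    bool write_register(std::uint32_t offset, std::uint32_t value) {
        switch (offset) {
            case kSrcStart: src_start_ = value; return true;
            case kSrcEnd:   src_end_ = value;   return true;  // exclusive end
            case kDst:      dst_ = value;       return true;
            case kCtrl:     return value == 1 ? run() : true;
            default:        return false;
        }
    }

private:
    bool run() {
        if (src_end_ < src_start_) return false;
        const std::size_t len = src_end_ - src_start_;
        const std::size_t size = memory_.size();
        if (src_end_ > size || dst_ > size || len > size - dst_) return false;
        // Copy through a temporary so overlapping ranges behave predictably.
        std::vector<std::uint8_t> tmp(len);
        for (std::size_t i = 0; i < len; ++i) tmp[i] = memory_[src_start_ + i];
        for (std::size_t i = 0; i < len; ++i) memory_[dst_ + i] = tmp[i];
        return true;
    }

    std::vector<std::uint8_t>& memory_;
    std::uint32_t src_start_ = 0;
    std::uint32_t src_end_ = 0;
    std::uint32_t dst_ = 0;
};
```

From the initiator's point of view the last register write simply takes effect immediately: when it returns, the destination already holds the data.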
44
PV Example: DMA Model
45
PV Example: Simple Initiator
PV Initiator
Implemented as SC_THREAD
wait(n, SC_NS):
n = (clock period or expected instruction rate)
* (clock ticks between different sync points)
2 Ports:
PVInitiator_port for Memory
PVTarget_port for Interrupt
46
PV Example: Simple Initiator
47
PV Example: Simple Initiator
48
Architects View (AV)
Contents
Introduction
Protocol Stack built on a Channel implementing 4
Unidirectional Non Blocking interfaces
Data Structures
Implementation in SystemC: Explicit Timing
Implementation in SystemC: Implicit Timing
Convenience functions in Protocol Layer
Mixing Modeling Styles
SW debugging and PV-AV run-time switching
A Simple Memory AV Model
49
Introduction
AV = PV + T
Key problem:
to enhance the early architecture trade-off
for system design,
to set right design constraints for HW and
SW engineers
50
Architectural Exploration
51
Introduction
Sufficiently accurate model of
Communication timing,
Resource sharing (e.g. memory),
Data size
Open enough to allow insertion of
more detailed implementation models:
ISS,
refined TLM models,
functionally complete models
52
Introduction
AV has sufficient timing to
quantify overall performance, and
identify the potential bottlenecks in the
system.
Timing can be either
Explicit, or
Implicit
53
Explicit timing
Timing
is modeled in the initiators and targets,
by using
events and
event synchronization or
SystemC timing modeling mechanisms
Advantages:
More closely related to the actual hardware, and
Internal timing of a block can be modeled more
accurately
54
Implicit timing
Timing accuracy is achieved through
timing annotation in the TLM API calls.
Advantages:
Timing annotation is orthogonal to the
functionality
a system can be refined for timing information
without breaking the previously validated
functionality.
Different block implementations can be
explored by annotating different delay times
55
Protocol Stack built on a Channel implementing
4 Unidirectional Non Blocking interfaces
1 AV Channel = 4 unidirectional
non-blocking interfaces
Initiator → Target: request
Target → Initiator: response
56
4 Interfaces in AV Channel
Initiator Interface:
Request : tlm_extended_put_if
Response : tlm_extended_get_if
Target Interface:
Request : tlm_extended_get_if
Response : tlm_extended_put_if
57
Get method in AV channel
The basic “get” method in the interface
performs three tasks:
read the data (this is what the “peek”
function does)
consume the data (free the buffer used to
store the data)
notify the event (this is what the “unshrink”
function does)
[Slide callouts: shrink, pop]
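The three tasks can be sketched as separate steps that a single get call composes. This is a plain C++ illustration and not the AV channel's code: the class name PeekPopFifo is hypothetical, and a std::function callback stands in for the event notification.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical FIFO whose get() is built from three smaller steps.
template <typename T>
class PeekPopFifo {
public:
    // 'on_space_freed' stands in for the event that tells the producer
    // that a slot has become available.
    explicit PeekPopFifo(std::function<void()> on_space_freed)
        : on_space_freed_(std::move(on_space_freed)) {}

    void put(const T& item) { items_.push_back(item); }
    bool empty() const { return items_.empty(); }

    // Step 1: read the oldest item without consuming it.
    // Precondition: !empty().
    const T& peek() const { return items_.front(); }

    // Step 2: consume the item, freeing its storage.
    // Precondition: !empty().
    void pop() { items_.pop_front(); }

    // Step 3: notify whoever is waiting for free space.
    void notify() {
        if (on_space_freed_) on_space_freed_();
    }

    // get = peek + pop + notify. Returns false on an empty FIFO.
    bool get(T& out) {
        if (empty()) return false;
        out = peek();
        pop();
        notify();
        return true;
    }

private:
    std::deque<T> items_;
    std::function<void()> on_space_freed_;
};
```

Exposing the steps separately lets a target look at a request with peek, work on it, and only later release the slot, which is what makes the extended interfaces useful for timing.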
58
Data Structures
DS in PV + timing
Packet Level
Sets of functionally associated data
combined into Abstract Data Types (ADTs)
REQ & RSP are application dependent
DS in PV + implicit timing
Structures with large amounts of data
E.g.: Complete IP packet
59
Implementation in SystemC:
Explicit Timing
60
Implementation in SystemC:
Implicit Timing
61
Convenience functions in
Protocol Layer
62
Mixing Modeling Styles
An adaptor connecting a PV initiator to
an AV channel
63
Mixing Modeling Styles
An adaptor connecting an AV channel to
a PV target
64
SW debugging
With a simple PV-AV adaptor
An instruction accurate ISS can be
connected to an AV
However, PV transactions should not affect
AV states, such as:
To view contents of peripheral registers
To load program into memory
Need specialized debugging interface!
65
PV-AV Runtime Switching
Add a PV interface to the AV interfaces
Invisible channel between I and T
Add a simple switch in all PV initiators to select
between running in
PV mode, OR
AV mode
Useful for evaluating architectures based on an
application running in an OS
Boot OS in PV mode
Run application in AV mode
66
A Simple Memory AV Model
67
A Simple Memory AV Model
68
Verification View (VV)
Contents
Introduction
Protocol Stack built on many Transfers
Data Structures
Implementation in SystemC
Relations between Transfers of same
Transaction
Thread Safety
VV Modeling Example
69
Introduction
As design refinement goes down in
abstraction level, there is a point where
all cycle timing details of a protocol are
important.
To reuse TLM models
Use transactors (TLM adaptors)
Use cycle accurate bus models
70
Protocol Stack built on many
Transfers
A transaction is a single object that
encompasses a sequence of signals and
handshakes required for system
components to exchange data.
A transfer is an atomic operation, such
as the transfer of an address or data.
71
Protocol Stack built on many
Transfers
With every transfer, a set of attributes
is associated that represents the
information of the transaction.
Examples of such information are
address, type, size, etc.
VV uses the unidirectional non-blocking
interface as the lowest layer in the
protocol stack.
72
Protocol Stack built on many
Transfers
Examples of protocol layer convenience
functions implemented in ports:
Send a transfer, which will call nb_put.
Test whether a transfer can be sent, by calling
nb_can_put
Receive a transfer, using the nb_get call.
Test whether a transfer can be received, by calling
nb_can_get
Get access to attributes of transfer or transaction
73
Protocol Stack built on many
Transfers
The convenience functions outlined
above are transfer based.
An initiator port will typically have the
obtain and the put interfaces
implemented for the address and write
data transfers and will have the get
interface implemented for the read data
and end of transaction transfers.
74
Data Structures
Typical examples of transfers sent from
the initiator to the target are:
Address Transfer containing address, type
of transaction, access size, burst
information, etc.
Write Data Transfer containing the actual
write data.
75
Data Structures
Typical examples of transfers sent from
the target to the initiator are:
Read Data Transfer containing the read
data (the actual data and not a pointer or
reference).
End of Transaction Transfer containing the
status response (typically OK, error,
etc.).
76
Data Structures
It is advantageous to store the data
structures containing these transfers in
a centrally allocated circular buffer.
Easier to ensure the required
thread safety and
lifetime properties for the transfers
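A centrally allocated circular buffer might look like the sketch below. The Transfer fields and the class name TransferRing are assumptions for illustration; the real transfer types carry protocol-specific attributes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed minimal transfer record.
struct Transfer {
    std::uint32_t address = 0;
    std::uint32_t data = 0;
};

// Fixed-size pool shared by all ports. Slots are handed out in ring
// order, so a transfer stays valid until N further allocations have
// been made, and nothing is created or deleted at run time.
template <std::size_t N>
class TransferRing {
public:
    // Returns the slot index of a freshly reset transfer.
    std::size_t allocate() {
        const std::size_t slot = next_;
        slots_[slot] = Transfer{};
        next_ = (next_ + 1) % N;
        return slot;
    }

    Transfer& at(std::size_t slot) { return slots_[slot % N]; }

private:
    std::array<Transfer, N> slots_{};
    std::size_t next_ = 0;
};
```

The ring size has to be chosen larger than the maximum number of transfers that can be in flight, otherwise a slot is reused while another thread still reads it.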
77
Implementation in SystemC
To access centralized circular buffer
78
Implementation in SystemC
Convenience Functions on ports
80
Relations between Transfers
of same Transaction
Write data transfer sensitive thread is
triggered when the slave has to sample data
from the WDATA pins
81
Cross-references
When back-to-back write and read
transactions (w, r, w, r, w, r, …) are
performed using a pipelined protocol,
the slave may sample the
write data (for the first transaction)
from the WDATA pins and drive read
data on the RDATA pins (for the second
transaction) at the same point in time.
82
Cross-references
Given a (read or write) data transfer,
cross-references allow access to the
address corresponding to this
transaction.
So for the w and r transfers, the
address is obtained as:
83
Cross-references
These 2 pieces of code are executed at the
same point in time and result in a different
address.
Hence we have a coding style which is
independent of relative timing of address and
corresponding data.
Cross-references are implemented more
efficiently using a centralized memory
allocation
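The idea of a cross-reference can be sketched as a data transfer that stores the index of its own address transfer. The types AddressTransfer, DataTransfer, and TransferStore are hypothetical, and a std::vector stands in for the centralized allocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct AddressTransfer {
    std::uint32_t address;
    bool is_write;
};

// A data transfer keeps a cross-reference (an index) to the address
// transfer of the same transaction.
struct DataTransfer {
    std::uint32_t data;
    std::size_t address_ref;
};

// Central store for address transfers.
class TransferStore {
public:
    // Records an address transfer and returns its index.
    std::size_t add_address(std::uint32_t address, bool is_write) {
        addresses_.push_back(AddressTransfer{address, is_write});
        return addresses_.size() - 1;
    }

    // Follows the cross-reference; throws std::out_of_range if it is stale.
    std::uint32_t address_of(const DataTransfer& d) const {
        return addresses_.at(d.address_ref).address;
    }

private:
    std::vector<AddressTransfer> addresses_;
};
```

A write data transfer and a read data transfer handled at the same simulation time each resolve to their own address, regardless of how far the address phase ran ahead of the data phase.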
84
Write Transaction
85
Read Transaction
86
Thread Safety
Special care is to be taken when
different threads access the same data
Data is deleted too soon, or not at all
Data is overwritten by another thread
before the first has finished
87
Thread Safety
3 rules for safety
Master (initiator) manages all data
PV and AV use this
Protect against editing data, use coding
style
VV uses this
Copy data whenever there is a wait() call
in a target module, or do not call wait() at all
Too slow
88
Thread Safety
Local data of modules
Data values may be changed by other
threads
Different: Before wait() vs. After wait()
Thread continues with wrong data!!!
Solution: some protocol needed to ensure
data coherency
89
Thread Safety
General Rule:
Avoid wait() calls in TLM interface modules
Use latency parameter in PV
Use non-blocking unidirectional interfaces
with implicit timing style in AV
90
VV Modeling Example
A Peripheral modeled by a Clocked
Thread
A lot of unnecessary simulation events
Models peripheral timing very accurately
91
VV Modeling Example
Implementation of clocked thread
92
VV Modeling Example
Alternative Model
2 threads: only triggered when actual
transfer available
Enormous speed improvement!
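The source of the speed-up can be made concrete by counting process activations under both styles. This is a back-of-the-envelope sketch and not a simulator: the function count_activations and its inputs are assumptions for illustration.

```cpp
#include <cstdint>
#include <vector>

struct ActivationCount {
    std::uint64_t clocked;       // thread woken on every clock edge
    std::uint64_t event_driven;  // thread woken only when a transfer arrives
};

// 'transfer_cycles' lists the cycles at which a transfer is available.
// Cycles at or beyond 'total_cycles' fall outside the simulated window.
ActivationCount count_activations(std::uint64_t total_cycles,
                                  const std::vector<std::uint64_t>& transfer_cycles) {
    ActivationCount count{total_cycles, 0};
    for (std::uint64_t cycle : transfer_cycles) {
        if (cycle < total_cycles) ++count.event_driven;
    }
    return count;
}
```

For a peripheral that sees three transfers in 1000 cycles, the clocked thread is activated 1000 times and the transfer-triggered model 3 times; the sparser the traffic, the larger the gap.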
93
VV Modeling Example
Implementation of Alternative Model
94
Conclusions
An overview was given of the different TLM
modeling styles and how they can be applied
to the different SoC design tasks.
The APIs and their usage were presented in
some detail, but this is by no means a
complete guide to TLM modeling.
The intention is to show how to apply TLM
modeling techniques and how to create a
TLM modeling style for a specific purpose.
95