
Using Portable Stimulus in the Arm World:
Creating bare-metal SW coherency scenarios

Nick September 18, 2017

In my last blog (Navigating SoC Verification with Perspec Portable
Stimulus) I introduced the Accellera Portable Stimulus Standard (PSS)
and how Cadence Perspec System Verifier supports the creation of
portable bare-metal Arm SoC integration tests using the Perspec PSLib
for multicore Armv8 and Armv8.2 architectures. In this blog we will dig a
little deeper into what PSLib supports and how it can be used out of the
box to create a rich variety of coherent and I/O coherent scenarios.

It is worth spending a few minutes revisiting cache and its place
across the hierarchy of Arm IP. With the advent of DynamIQ, Arm's new
cluster microarchitecture, there are a multitude of places where cache
lives: within each core (usually called L1, typically the smallest and
fastest cache in the system); shared between cores of like type
(usually called L2); shared across the cluster (called L3); and shared
across the clusters, which may be called Last Level Cache (LLC) or
System Cache and is typically the slowest but largest cache in the
system. There are any number of architectural options available when
constructing such systems, and therefore some or all of these caches may
be present in your target system. Interestingly, with the announcement
of the new CCIX protocol we will soon see Arm-based SoCs which also
share cache from chip to chip.

Given the number of options and the need to integrate these complex
compute subsystems into bigger SoCs, which may also use I/O coherency
to optimize system performance for high-speed I/O such as PCI Express,
it is essential that the caching is fully exercised before committing
to silicon, as a bug in the integration of the SoC could prove
disastrous.

To address this growing and complex challenge, Cadence developed a rich
set of portable actions which comprise the Perspec PSLib. They are
readily assembled into target scenarios, with code then being generated
at the push of a button. In fact, for two common cache testing
scenarios, the library provides a complete scenario ready-made.

False Sharing
I will now explain the “False Sharing” scenario in a little more
detail. Look for my next blog, coming soon, which will detail the
“True Sharing” scenario.

False Sharing is a situation where cache lines are being used by a number
of cores, and hence the system considers them shared data, but in fact
the cores are using exclusively different parts of the cache line and
therefore do not actually share data with each other.
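
To make the definition concrete, here is a minimal Python sketch that
classifies cache lines from a list of memory accesses. It is only a
model for illustration, not PSLib or Perspec code; the function name
`classify_lines` and the `(core, start, size)` tuple format are
assumptions of this example, and the 64-byte line size is taken from
the figure discussed below.

```python
from collections import defaultdict

CACHE_LINE_BYTES = 64  # line size assumed for this sketch


def classify_lines(accesses):
    """Classify each touched cache line as 'private', 'false' or 'true' sharing.

    accesses: iterable of (core_id, start_address, size_in_bytes) tuples.
    """
    # line index -> core id -> set of byte offsets that core touches in the line
    touched = defaultdict(lambda: defaultdict(set))
    for core, start, size in accesses:
        for addr in range(start, start + size):
            touched[addr // CACHE_LINE_BYTES][core].add(addr % CACHE_LINE_BYTES)

    result = {}
    for line, per_core in touched.items():
        if len(per_core) == 1:
            result[line] = "private"  # only one core uses this line
            continue
        offsets = list(per_core.values())
        total = sum(len(s) for s in offsets)
        distinct = len(set().union(*offsets))
        # If no byte is touched by two cores, the sharing is only apparent.
        result[line] = "false" if total == distinct else "true"
    return result
```

In this model, two cores writing bytes 0-15 and 16-23 of the same line
are reported as "false" sharing, while two cores whose byte ranges
overlap are reported as "true" sharing.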

The figure below shows by colour which core is using which bytes of the
64-byte cache line. We can immediately see that within each cache line,
regions of data are exclusively used by one core only (one colour). This is
what we mean by False Sharing.


Also notice that the regions are not of regular size, although each is
a whole number of bytes. The number of possible False Sharing
situations is enormous, especially when the permutations of the
hierarchical cache architecture are considered. Creating bare-metal SW
scenarios to cover a good number of these permutations using
hand-written code would be a significant challenge.
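
To give a feel for the size of that space, the following Python sketch
randomly splits one 64-byte line into irregular, non-overlapping byte
regions with one owner core each, and counts how many such layouts
exist. It is a simplified model, not how Perspec generates tests: the
function names are hypothetical, and it assumes each core owns exactly
one contiguous region of a single line.

```python
import math
import random

CACHE_LINE_BYTES = 64  # line size assumed for this sketch


def random_false_sharing_layout(num_cores, rng):
    """Split one cache line into contiguous byte regions, one owner core each.

    Returns a list of (core_id, byte_offset, size_in_bytes) tuples.
    """
    if not 1 <= num_cores <= CACHE_LINE_BYTES:
        raise ValueError("need between 1 and 64 cores per line")
    # Pick num_cores - 1 distinct cut points strictly inside the line.
    cuts = sorted(rng.sample(range(1, CACHE_LINE_BYTES), num_cores - 1))
    bounds = [0] + cuts + [CACHE_LINE_BYTES]
    owners = list(range(num_cores))
    rng.shuffle(owners)  # which core gets which region is also random
    return [(owner, bounds[i], bounds[i + 1] - bounds[i])
            for i, owner in enumerate(owners)]


def count_layouts(num_cores):
    """Number of distinct layouts the function above can produce for one line."""
    cut_choices = math.comb(CACHE_LINE_BYTES - 1, num_cores - 1)
    return cut_choices * math.factorial(num_cores)
```

Under these assumptions, `count_layouts(4)` gives 953,064 layouts for
four cores on a single line, before any choice of cache hierarchy,
number of lines, or access ordering is taken into account.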

The PSLib provides a ready-made scenario to create such tests with a
number of degrees of freedom. The Perspec generator produces multiple
tests from one single use case, greatly increasing test writer
productivity. The beauty of the Portable Stimulus model is that these
scenarios can be intermixed with your own scenarios, creating stress
tests that uniquely target your SoC. For example, if you want to mix
cache stress with power management, this is readily achieved with
Perspec.

Complex multithreaded use cases can very easily be created for any
number of cores with randomly selected regions of shared memory; see
the example below.

Through powerful constraint solver technology and a PSS model that
abstractly defines data dependency independent of action ordering,
Perspec is able to generate a huge number of specific test cases; the
diagram above is one specific solution. This brings huge productivity
to the test writer, as one test can create hundreds of possible
solutions. The user can pick one and then run it on the SoC they are
working on.
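
The idea that one dependency description yields many concrete tests can
be sketched in a few lines of Python. This is a brute-force
illustration only, not the Perspec solver or PSS syntax: the action
names are hypothetical, and it considers a single linear ordering
rather than actions running in parallel on several cores.

```python
from itertools import permutations


def legal_orderings(actions, depends_on):
    """Yield every ordering of actions in which producers precede consumers.

    depends_on maps an action name to the set of actions whose output it needs.
    """
    for order in permutations(actions):
        position = {name: i for i, name in enumerate(order)}
        if all(position[dep] < position[name]
               for name, deps in depends_on.items() for dep in deps):
            yield order


# Hypothetical example: each read needs the data written by the same core,
# but nothing constrains how the two cores interleave.
ACTIONS = ["write_core0", "read_core0", "write_core1", "read_core1"]
DEPENDS_ON = {"read_core0": {"write_core0"}, "read_core1": {"write_core1"}}
SOLUTIONS = list(legal_orderings(ACTIONS, DEPENDS_ON))
```

Even this tiny description has six legal orderings. A real generator
also randomizes addresses, region sizes and core assignments, which is
where the count grows into the hundreds.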

In the next blog I will dig a little deeper into how tests are created and
how users can use coverage to decide which test or tests they want to
run.


