
IEEE EMBEDDED SYSTEMS LETTERS, VOL. 6, NO. 3, SEPTEMBER 2014 53

S2CBench: Synthesizable SystemC Benchmark Suite for High-Level Synthesis
Benjamin Carrion Schafer, Senior Member, IEEE, and Anushree Mahapatra, Student Member, IEEE

Abstract: High-level synthesis (HLS) is being increasingly used for commercial VLSI designs. This has led to the proliferation of many HLS tools. In order to evaluate their performance and functionalities, a standard benchmark suite in a common language supported by all of them is required. This letter presents a benchmark suite, called S2CBench, which complies with the latest Synthesizable SystemC standard. The benchmarks have been carefully chosen not only to include applications of different sizes and from various domains typically used in HLS (e.g., encryption, image, and DSP applications), but also to test specific optimization techniques in each of them. This allows an easy comparison of the quality of results (QoR) of the different HLS tools under review, as well as a test of their completeness.

Index Terms: Benchmark testing, design automation, high level synthesis.

Manuscript received December 08, 2013; accepted April 20, 2014. Date of publication April 28, 2014; date of current version August 26, 2014. This manuscript was recommended for publication by M. Balakrishnan.
The authors are with the Department of Electronics and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong (e-mail: b.carrionschafer@polyu.edu.hk; anushree.mahapatra@connect.polyu.hk).
Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LES.2014.2320556
1943-0663 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

High-level synthesis has evolved significantly over the last decade, and the QoR of commercial HLS tools has improved to the level where HLS has begun to be used for commercial designs. The adoption of HLS as part of standard VLSI design flows has led to the proliferation of HLS tools. The main problem faced by many designers wanting to transition from traditional RTL to C-based design is the absence of validated standards to evaluate the different HLS tools available (a good comparative review of these can be found in [1]). The evaluation phase is crucial in order to find the best HLS tool for the type of applications being designed, but the lack of expertise of most RTL designers in HLS, combined with busy schedules, makes it hard to set up an efficient evaluation methodology. Moreover, HLS tools depend on a wide range of vendor-specific optimization features in order to get good QoR, making them hard to master. Finally, different tools support different input languages, complicating the HLS tool evaluation process further. This often results in designers sending some behavioral descriptions to the different HLS vendors, after which the vendors manually optimize the designs for their particular tool. The designers eventually determine the appropriate tool by evaluating the QoR of the synthesized circuits. Fortunately, there exists a common language supported by all of the main HLS tools: SystemC. There have been many discussions on which is the best input language for HLS [2], but eventually SystemC has gained wider acceptance over ANSI-C and C++, mainly due to the IEEE standardization efforts of OSCI (now Accellera). OSCI set up a working group to define a synthesizable SystemC subset currently supported by all of the HLS tools. Thus, in order to facilitate the evaluation of different commercial HLS tools, a synthesizable SystemC benchmark suite is required. This benchmark suite should not only cover applications of different domains and designs of different sizes, but should also include structures to test specific optimization techniques. The success of many HLS tool evaluations often depends on the expertise of the field application engineer (FAE), since the original behavioral descriptions often have to be manually modified to obtain the best QoR. Rewriting behavioral descriptions can sometimes be considerably time-consuming and should be taken into account when evaluating different HLS tools. By designing a benchmark suite that tests specific features, designers can easily understand the strengths and weaknesses of each tool. The main features to be tested may be classified in three main categories: 1) language support (e.g., templates, structures, and fixed point data types); 2) synthesis optimizations (multidimensional array expansion, polynomial decompositions, function synthesis, loop unrolling, pipelining, and array synthesis); and 3) tool performance (e.g., synthesis running time and accuracy of the area and timing reports).

Multiple efforts in the area of general-purpose computing benchmarks have been made since the 1980s, with the SPEC benchmark suite [3] as one of the earliest ones. The SPEC benchmark is mainly intended to analyze the performance of major system components. More recent benchmark suites specialize in specific domains. For example, MediaBench [4] focuses on multimedia applications and MiBench [5] on modern embedded applications. These benchmarks are often used as HLS benchmarks because they represent computationally intensive applications amenable to HLS. However, the benchmark programs must first be made synthesizable, and they allow only a very limited number of HLS features to be tested.

In terms of HLS benchmarks, some of the initial efforts were the HLS92 and the HLS95 benchmarks [12]. Although many of the optimizations targeted in the aforementioned benchmarks are still relevant, they were written in behavioral VHDL, which is currently not supported by any commercial HLS tool. A more recent effort is the CHStone [6] benchmark suite. Similar to our work, it targets HLS and includes a set of ANSI-C programs. The main drawback of using ANSI-C is, as mentioned before, that some of the main commercial HLS tools do not support
ANSI-C. For example, Forte's Cynthesizer [7] only supports SystemC, and Calypto's CatapultC [8] supports C++ and SystemC, but not ANSI-C. Other tools, like NEC's CyberWorkBench [9], support SystemC and a subset of C called BDL (Behavioral Description Language), which forces descriptions given in C to be manually converted into this subset. Another important limitation of ANSI-C benchmarks for HLS is that ANSI-C does not support fixed point data types, which are extremely important in DSP applications, often targeted in HLS.

In this letter, we present S2CBench, a freely available synthesizable SystemC benchmark suite [10], consisting of 13 programs targeting a variety of applications typically used in HLS. Twelve benchmarks comply with the latest SystemC synthesizable subset draft, while one design (FFT) is nonsynthesizable because it contains trigonometric and floating point operations. This design was added in order to help users understand how the different HLS tools support these operations, as most commercial tools have vendor-specific ways to support them. The benchmarks are classified into different categories of applications. One of the unique features of the benchmark suite is that every application is accompanied by its respective testbench. The testbench contains test vectors stored in a file and compares the simulation results with a golden output included for each design. The test vectors are not fixed and can be modified by the user. The option to create a waveform is also available. In this case, the simulation will produce a VCD file, which may be viewed with any waveform viewer. Finally, each benchmark is designed to test specific optimization options, allowing designers evaluating different HLS tools to understand the support of the tools for these options.

II. S2CBENCH OVERVIEW

The Synthesizable SystemC Benchmark (S2CBench) suite is a collection of 13 programs following the latest SystemC Synthesizable Subset draft 1.3. Some of the main objectives of S2CBench are:
• to enable the direct comparison of commercial HLS tools;
• to test specific tool features classified as language support, synthesis optimization techniques, and tool performance;
• to help researchers analyze and compare their own techniques.

A. Benchmark Programs

Every benchmark program in S2CBench is designed to test particular features. Described below is a brief overview of the programs included in the suite, each categorized according to its application domain. The benchmarks are also classified into data-dominant (dd) or control-dominant (cd) designs. HLS normally achieves very good results for the former category, while it sometimes creates suboptimal designs for the latter category.

A1 Automotive and Industrial: The category includes applications normally used in embedded control systems, which perform extensive basic math operations and bit manipulations.

qsort (dd): The quick sort design sorts data in ascending order using the well-known quick sort algorithm. Sorting data is important in many designs so that the data can be analyzed more easily and priorities can be established. This design helps in verifying the support for basic HLS optimization techniques, which include loop unrolling, array synthesis (register or memory), and function synthesis with pointer argument support. It should be noted that various disciplines require fast sorting. Thus, the qsort design could be placed in many other categories.

sobel (dd): The sobel filter is an edge-detection algorithm that takes a bitmap image directly as the input and returns a new bitmap image consisting solely of the edges of the original image. The program specifically checks for nested loop unrolling and pipelining optimizations, I/O port expansion (expanding inputs specified as arrays to individual ports), multidimensional array expansion, fixed arrays synthesized as logic or ROMs, and pointer arguments to functions.

A2 Security: The security category includes several algorithms for data encryption and hashing. HLS has proved to be a very good solution for designing data security applications due to their mathematical complexity. Moreover, these types of applications are very difficult to verify in RTL [11].

aes_cipher (dd): The advanced encryption standard (AES) cipher algorithm performs AES encryption/decryption. This program consists of many user-defined functions. It contains a large number of small for loops with interloop data dependencies. The main optimization techniques addressed are input port expansion, array synthesis (memory or registers), function synthesis (inline or goto operators), and large fixed arrays synthesized as logic or ROMs.

kasumi (dd): Kasumi is a block cipher algorithm used in mobile communication systems. The SystemC description includes two threads and multiple functions. Therefore, this design is useful for checking the synthesis, and especially the verification, of multiprocess systems. The design also contains multidimensional I/O ports and multiple arrays. Finally, the kasumi algorithm, similar to most encryption applications, contains a large number of logic operations (e.g., and, or, xor). HLS tools are notably not efficient at accurately estimating the critical path of these applications, because the discrete delays of all the operations are simply added, thus overestimating the critical path. This application can provide an indication of the accuracy of the HLS timing report compared to that of the logic synthesis result.

md5C (dd): The message digest algorithm is widely used in cryptography to generate hash values and check data integrity. MD5C is a single-process design consisting of multiple functions, arrays of different bit widths, and different levels of loop nesting. One of the unique language constructs to be tested with this design is the extensive use of #define macros.

snow3G (dd): Snow 3G is a stream cipher that produces a key stream consisting of 32-bit blocks using a 128-bit key. Apart from the main optimization options, this design tests the support of HLS tools for templates. A variable-length multiplication operation is performed in this algorithm, which may be easily simplified using templates.

A3 Telecommunication: With the explosion of portable electronic devices using wireless communication, constrained by limited power budgets, some of the telecommunication functions are frequently being implemented as custom HW blocks in SoCs (Systems on Chip). HLS is a natural choice for most of the complex applications in this domain, which have well-known legacy C descriptions.
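
The snow3G entry above notes that a variable-length multiplication may be simplified using templates. A minimal sketch of that idea in plain C++ (not SystemC, so that it compiles without the SystemC library) is given below. It assumes the multiplication has the recursive shift-and-XOR form (MULx/MULxPOW) used in the SNOW 3G specification; the function names `mulx` and `mulx_pow` are illustrative and are not taken from the S2CBench sources.

```cpp
#include <cassert>
#include <cstdint>

// One multiplication step on an 8-bit value: shift left by one and, if the
// top bit was set before the shift, XOR the result with the constant c.
inline std::uint8_t mulx(std::uint8_t v, std::uint8_t c) {
    const unsigned shifted = static_cast<unsigned>(v) << 1;
    return static_cast<std::uint8_t>((v & 0x80u) ? (shifted ^ c) : shifted);
}

// Apply mulx N times. N is a template parameter, so the number of steps is
// fixed at compile time and the recursion expands into a chain of N steps
// (hypothetical helper, for illustration only).
template <unsigned N>
inline std::uint8_t mulx_pow(std::uint8_t v, std::uint8_t c) {
    return mulx(mulx_pow<N - 1>(v, c), c);
}

// Recursion terminator: zero applications leave the value unchanged.
template <>
inline std::uint8_t mulx_pow<0>(std::uint8_t v, std::uint8_t /*c*/) {
    return v;
}
```

A call such as `mulx_pow<3>(0x01, 0x1B)` then expands to three chained `mulx` steps with no run-time loop counter, which is the kind of construct a tool with template support would be expected to handle.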

adpcm (cd): Adaptive differential pulse-code modulation (encoder part only) accepts 16-bit Pulse Code Modulation (PCM) samples as input and converts them into 4-bit samples. Some of the optimization techniques that can be tested with this design are loop unrolling, function synthesis, and fixed array synthesis, with the most important being the support for structures. Some HLS tools do not support the use of structures, forcing the designer to rewrite the original descriptions manually.

fft (dd): The fast Fourier transform (FFT) algorithm is the only design in the benchmark suite that is not synthesizable, since the design includes floating-point data and trigonometric operations, which are not synthesizable as per the latest synthesizable subset draft. However, the design has been included as part of the suite since most HLS vendors do support floating point and trigonometric operations, though the process of synthesizing them is vendor-specific. It is important to understand the level of support for these operations by the HLS tool.

A4 Consumer: Consumer benchmarks represent applications that are closely related to multimedia and digital signal processing (DSP) applications. The focus of this domain is primarily on filters, as HLS has been shown to produce very good results for applications involving filters and is widely used in this field.

fir (dd): The fir filter is a 10-tap FIR filter algorithm designed for 8-bit integer operations. The aim of this design is to check for loop unrolling, automatic array expansion of the I/O ports, and pointer arguments to functions. This program can be pipelined as well.

decimation (dd): The algorithm is a 5-stage decimation filter. It consists of 5 FIR filters cascaded together, where the output of one stage is the input to the next stage. The main purpose of the design is to evaluate the level of resource sharing that the HLS tool can extract by sharing the Multiply Accumulate (MAC) operations of the filtering function across multiple loops. The secondary purpose is to verify that the generated RTL is able to preserve the sum of product (SoP) construct provided in the SystemC code, so that the logic synthesis tool can optimize the construct further. Finally, this design also serves to identify whether the HLS tool supports fixed-point data types and their different rounding and saturation modes.

interpolation (dd): The algorithm is a 4-stage interpolation filter. Apart from the above-mentioned optimization techniques like loop unrolling and array synthesis, the main purpose of this design is to test whether the HLS tool can perform automatic polynomial decompositions. Significant area reduction can be obtained if polynomials can be decomposed into terms, so that the total number of arithmetic operations required is reduced. Similar to the decimation filter, this design also makes extensive use of fixed point data types.

idct (dd): The inverse discrete cosine transform expresses a finite sequence of data points in terms of a sum of cosine functions of different frequencies. It is normally used in many applications, e.g., image compression (jpeg) and the solution of partial differential equations. The synthesizable SystemC version of this algorithm included in the suite serves to test most of the techniques already described previously and, as a unique language support feature, tests the initialization of an array using an #include statement.

disparity (cd/dd): This program estimates the disparity in a stereoscopic image. It is the largest of all the designs and consists of 4 processes executed in parallel (sc_cthreads). The design can serve to compare the synthesis running times of the different tools, since the main thread contains a large number of loops, leading to extremely long synthesis run times when the loops are fully unrolled or pipelined. This design allows the testing of most of the optimization techniques of the designs described previously, as well as the ability of the HLS tools to deal with hierarchical designs and their respective synthesis running times. The design contains control-dominant as well as data-dominant parts.

Table I shows the summary of all the designs included in the S2CBench benchmark suite and the associated targeted optimization features. Most of the designs include loops, arrays, and functions; these common features are therefore omitted for designs that have unique features.

TABLE I
BENCHMARK DOMAIN AND OPTIMIZATION SUMMARY

B. Benchmark Characteristics

As discussed in the previous section, the designs range from smaller single-process designs (e.g., quick sort or FIR filter) to larger multiprocess designs (e.g., kasumi and disparity).

Table II shows the detailed composition of the designs. The table is divided into two categories: the first category describes the number of program lines (not including comments, blank lines, or the testbench) and the number of processes, functions, loops, and arrays, while the second category describes the variety of operations in the code.

From this table, it may be observed that the security applications contain a large number of logic operations and comparisons, while the DSP applications require many adders and multipliers, mainly to compute the MAC operations of the filtering stage. All designs included in this benchmark suite are directly synthesizable without any modifications, except for the FFT, which is nonsynthesizable.
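
The MAC operations that dominate the DSP designs can be illustrated with a small FIR filter. The following is a minimal plain C++ sketch (not SystemC and not the S2CBench source) of a 10-tap filter on 8-bit samples; the names `fir_filter` and `kTaps`, the 32-bit accumulator width, and the zero-initialized delay line are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kTaps = 10;  // illustrative tap count

// Filter a block of 8-bit samples and return one output per input sample.
std::vector<std::int32_t> fir_filter(
    const std::vector<std::int8_t>& samples,
    const std::array<std::int8_t, kTaps>& coeff) {
    std::array<std::int8_t, kTaps> delay{};  // delay line, starts at zero
    std::vector<std::int32_t> out;
    out.reserve(samples.size());
    for (std::int8_t s : samples) {
        // Shift the delay line by one position; a fixed-bound loop like this
        // is a typical candidate for unrolling.
        for (std::size_t i = kTaps - 1; i > 0; --i) {
            delay[i] = delay[i - 1];
        }
        delay[0] = s;
        // Multiply-accumulate over all taps: a sum of products.
        std::int32_t acc = 0;
        for (std::size_t i = 0; i < kTaps; ++i) {
            acc += static_cast<std::int32_t>(delay[i]) * coeff[i];
        }
        out.push_back(acc);
    }
    return out;
}
```

Each output costs `kTaps` multiplications and additions, which is why filter-heavy designs show up in the characteristics table with many adders and multipliers.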

TABLE II
BENCHMARK PROGRAM CHARACTERISTICS

III. BENCHMARK VALIDATION

A SystemC testbench is provided with all of the designs in order to verify their functionality. Fig. 1 shows the modular structure of the testbench interface with the synthesizable design. The testbench module contains a send and a receive process, which are modeled as clocked threads in SystemC. The send process transmits data to the unit under test (UUT) continuously until the test vectors stored in the input file are exhausted, and the receive process receives the data from the UUT and stores the output data in another text file. Finally, the simulation result is compared with the golden output before the simulation finishes, and any discrepancies are reported. Additionally, the testbench also contains the option to dump a VCD file in order to view the waveform of the main signals. The input stimuli are all stored in text files and can be modified by the user, with the exception of the sobel and disparity estimator designs, which take a bitmap file as input, as indicated in Table II.

Fig. 1. Structure for testbench validation.

IV. CONCLUSION

This letter presents a SystemC benchmark suite which complies with the latest Accellera synthesizable subset draft and which is freely available online. All designs were successfully synthesized using a commercial HLS tool [9] for validation. S2CBench is mainly targeted at designers wanting to evaluate different commercial HLS tools, as all of the main commercial HLS tools support SystemC's synthesizable subset. The test cases have been carefully chosen to represent different application domains amenable to HLS, and each of them serves to test the extent of the language support, specific synthesis optimizations, and tool performance.

REFERENCES

[1] W. Meeus, K. V. Beeck, T. Goedeme, J. Meel, and D. Stroobandt, "An overview of today's high-level synthesis tools," presented at the Design Automation for Embedded Systems, 2013.
[2] D. Gajski, T. Austin, and S. Svoboda, "What input-language is the best for high level synthesis (HLS)?," in Proc. DAC, 2010, pp. 857–858.
[3] K. M. Dixit, "SPEC benchmarks," J. Parallel Comput., vol. 17, pp. 1195–1209, Dec. 1991.
[4] L. Chunho, M. Potkonjak, and W. H. Mangione-Smith, "MediaBench: A tool for evaluating and synthesizing multimedia and communications systems," in Proc. 30th Annu. ACM/IEEE Int. Symp. Microarchitecture, 1997, IEEE Computer Society.
[5] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown, "MiBench: A free, commercially representative embedded benchmark suite," in Proc. IEEE Int. Workshop Workload Characterization (WWC-4), 2001, pp. 3–14.
[6] Y. Hara, H. Tomiyama, S. Honda, H. Takada, and K. Ishii, "CHStone: A benchmark program suite for practical C-based high-level synthesis," in Proc. Circuits Syst. (ISCAS), 2008, pp. 1192–1195, IEEE.
[7] Cynthesizer Tool, Forte Design Systems [Online]. Available: http://www.forteds.com/.
[8] Catapult HLS tool [Online]. Available: www.calypto.com.
[9] CyberWorkBench, NEC [Online]. Available: www.cyberworkbench.com.
[10] S2CBench [Online]. Available: http://www.s2cbench.org/.
[11] S. Morioka, T. Isshiki, S. Obana, Y. Nakamura, and K. Sato, "Flexible architecture optimization and ASIC implementation of group signature algorithm using a customized HLS methodology," in Proc. IEEE Int. Symp. Hardware-Oriented Security Trust (HOST), 2011, pp. 57–62.
[12] P. R. Panda and N. D. Dutt, "1995 high level synthesis design repository," in Proc. 8th Int. Symp. Syst. Synthesis, pp. 170–174, ACM.
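
The golden-output comparison described in Section III can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: outputs are taken to be whitespace-separated integers, strings stand in for the contents of the result files, and the helper names `parse_values` and `find_mismatches` are hypothetical rather than part of the S2CBench testbenches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parse whitespace-separated integers, as they might appear in a text file
// written by the receive process (assumed format).
std::vector<long> parse_values(const std::string& text) {
    std::istringstream in(text);
    std::vector<long> values;
    long v = 0;
    while (in >> v) {
        values.push_back(v);
    }
    return values;
}

// Return the positions at which the simulation output and the golden output
// differ. A position present in only one of the two sequences also counts
// as a discrepancy.
std::vector<std::size_t> find_mismatches(const std::vector<long>& sim,
                                         const std::vector<long>& golden) {
    std::vector<std::size_t> bad;
    const std::size_t n = std::max(sim.size(), golden.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (i >= sim.size() || i >= golden.size() || sim[i] != golden[i]) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

An empty result from `find_mismatches` corresponds to a passing run; a non-empty result lists the positions a testbench would report as discrepancies.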
