IEEE EMBEDDED SYSTEMS LETTERS, VOL. 6, NO. 3, SEPTEMBER 2014
Abstract: High-level synthesis (HLS) is increasingly used for commercial VLSI designs. This has led to the proliferation of many HLS tools. In order to evaluate their performance and functionalities, a standard benchmark suite in a common language supported by all of them is required. This letter presents a benchmark suite, called S2CBench, which complies with the latest Synthesizable SystemC standard. The benchmarks have been carefully chosen not only to include applications of different sizes and from various domains typically used in HLS (e.g., encryption, image and DSP applications), but also to test specific optimization techniques in each of them. This allows not only an easy comparison of the quality of results (QoR) of the different HLS tools under review, but also a test of their completeness.

Index Terms: Benchmark testing, design automation, high level synthesis.

I. INTRODUCTION

... common language supported by all of the main HLS tools: SystemC. There have been many discussions on which is the best input language for HLS [2], but eventually SystemC has gained wider acceptance over ANSI-C and C++, mainly due to the IEEE standardization efforts of OSCI (now Accellera). OSCI set up a working group to define a synthesizable SystemC subset currently supported by all of the HLS tools. Thus, in order to facilitate the evaluation of different commercial HLS tools, a synthesizable SystemC benchmark suite is required. This benchmark suite should not only cover applications of different domains and designs of different sizes, but should also include structures to test specific optimization techniques. The success of many HLS tool evaluations often depends on the expertise of the field application engineer (FAE), since the original behavioral descriptions often have to be manually modified to obtain the best QoR. Rewriting behavioral descriptions can sometimes be considerably time-consuming and should be taken into account ...
1943-0663 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
54 IEEE EMBEDDED SYSTEMS LETTERS, VOL. 6, NO. 3, SEPTEMBER 2014
... ANSI-C. For example, Forte's Cynthesizer [7] only supports SystemC, and Calypto's CatapultC [8] supports C++ and SystemC, but not ANSI-C. Other tools, like NEC's CyberWorkBench [9], support SystemC and a subset of C called BDL (Behavioral Description Language), which forces descriptions given in C to be manually converted into this subset. Another important limitation of ANSI-C benchmarks for HLS is that ANSI-C does not support fixed-point data types, which are extremely important in DSP applications, often targeted in HLS.

In this letter, we present S2CBench, a freely available synthesizable SystemC benchmark suite [10], consisting of programs targeting a variety of applications typically used in HLS. 12 benchmarks comply with the latest SystemC synthesizable subset draft, while one design (FFT) is nonsynthesizable because it contains trigonometric and floating-point operations. This design was added in order to help users understand how the different HLS tools support these operations, as most commercial tools have vendor-specific ways to support them. The benchmarks are classified into different categories of applications. One of the unique features of the benchmark suite is that every application is accompanied by its respective testbench. The testbench contains test vectors stored in a file and compares the simulation results with a golden output included for each design. The test vectors are not fixed and can be modified by the user. The option to create a waveform is also available. In this case, the simulation will produce a VCD file, which may be viewed with any waveform viewer. Finally, each benchmark is designed to test specific optimization options, allowing designers who are evaluating different HLS tools to understand the support of the tools for these options.

II. S2CBENCH OVERVIEW

The Synthesizable SystemC Benchmark (S2CBench) suite is a collection of programs following the latest SystemC Synthesizable Subset draft 1.3. Some of the main objectives of S2CBench are:
• to enable the direct comparison of commercial HLS tools;
• to test specific tool features classified as language support, synthesis optimization techniques, and tool performance;
• to help researchers analyze and compare their own techniques.

A. Benchmark Programs

Every benchmark program in S2CBench is designed to test particular features. Described below is a brief overview of the programs included in the suite, each categorized according to its application domain. The benchmarks are also classified into data-dominant (dd) or control-dominant (cd) designs. HLS normally achieves very good results for the former category, while it sometimes creates suboptimal designs for the latter category.

A1 Automotive and Industrial: The category includes applications normally used in embedded control systems, which perform extensive basic math operations and bit manipulations.

qsort (dd): The quick sort design sorts data in ascending order using the well-known quick sort algorithm. Sorting data is important in many designs so that the data can be analyzed more easily and priorities can be established. This design helps in verifying the support for basic HLS optimization techniques, which include loop unrolling, array synthesis (register or memory) and function synthesis with pointer argument support. It should be noted that various disciplines require fast sorting. Thus, the qsort design could also be placed in many other categories.

Sobel (dd): The Sobel filter is an edge-detection algorithm that takes a bitmap image directly as the input and returns a new bitmap image consisting solely of the edges of the original image. The program specifically checks for nested loop unrolling and pipelining optimizations, I/O port expansion (expanding inputs specified as arrays into individual ports), multidimensional array expansion, fixed arrays synthesized as logic or ROMs, and pointer arguments to functions.

A2 Security: The security category includes several algorithms for data encryption and hashing. HLS has proved to be a very good solution for designing data security applications due to their mathematical complexity. Moreover, these types of applications are very difficult to verify in RTL [11].

aes_cipher (dd): The Advanced Encryption Standard (AES) cipher design performs AES encryption/decryption. This program consists of many user-defined functions. It contains a large number of small for loops with interloop data dependencies. The main optimization techniques addressed are input port expansion, array synthesis (memory or registers), function synthesis (inline or goto operators) and large fixed arrays synthesized as logic or ROMs.

kasumi (dd): Kasumi is a block cipher algorithm used in mobile communication systems. The SystemC description includes two threads and multiple functions. Therefore, this design is useful for checking the synthesis, and especially the verification, of multiprocess systems. The design also contains multidimensional I/O ports and multiple arrays. Finally, the kasumi algorithm, similar to most encryption applications, contains a large number of logic operations (e.g., and, or, xor). HLS tools are notably inaccurate at estimating the critical path of these applications, because the discrete delays of all the operations are simply added, thus overestimating the critical path. This application can provide an indication of the accuracy of the HLS timing report compared to that of the logic synthesis result.

md5C (dd): The message digest algorithm is widely used in cryptography to generate hash values and check data integrity. MD5C is a single-process design consisting of multiple functions, arrays of different bit widths and different levels of loop nesting. One of the unique language constructs to be tested with this design is the extensive use of #define macros.

snow3G (dd): Snow 3G is a stream cipher that produces a key stream consisting of 32-bit blocks using a 128-bit key. Apart from the main optimization options, this design tests the support of HLS tools for templates. A variable-length multiplication operation is performed in this algorithm, which may be easily simplified using templates.

A3 Telecommunication: With the explosion of portable electronic devices using wireless communication, constrained by limited power budgets, some of the telecommunication functions are frequently being implemented as custom HW blocks in SoCs (Systems on Chip). HLS is a natural choice for most of the complex applications in this domain, which have well-known legacy C descriptions.
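To make the snow3G remark about templates concrete, the following is a minimal sketch in plain C++ of how a repeated multiply-by-x step can take its repetition count as a template parameter, so that the count is a compile-time constant. The function names `mulx` and `mulx_pow` are modeled on the MULx and MULxPOW helpers of the SNOW 3G specification; whether the S2CBench source is written this way is an assumption, and the constants used below are illustrative only.

```cpp
#include <cstdint>

// Multiply an 8-bit value by x: shift left by one and, if the top bit was
// set before the shift, reduce the result by XOR-ing with the constant c.
constexpr std::uint8_t mulx(std::uint8_t v, std::uint8_t c) {
    return (v & 0x80u) ? static_cast<std::uint8_t>((v << 1) ^ c)
                       : static_cast<std::uint8_t>(v << 1);
}

// Apply mulx N times. N is a template parameter, so the recursion depth is
// known at compile time and the chain can be fully unrolled.
template <unsigned N>
constexpr std::uint8_t mulx_pow(std::uint8_t v, std::uint8_t c) {
    return mulx(mulx_pow<N - 1>(v, c), c);
}

// Base case: zero applications leave the value unchanged.
template <>
constexpr std::uint8_t mulx_pow<0>(std::uint8_t v, std::uint8_t) {
    return v;
}
```

For instance, `mulx_pow<2>(0x40, 0x1B)` first doubles 0x40 to 0x80 and then, because the top bit is set, shifts again and reduces with 0x1B, giving 0x1B. A non-template version would pass the count as a run-time argument, which leaves the loop bound for the tool to infer.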
SCHAFER AND MAHAPATRA: S2CBENCH: SYNTHESIZABLE SYSTEMC BENCHMARK SUITE FOR HIGH-LEVEL SYNTHESIS 55
TABLE II
BENCHMARK PROGRAM CHARACTERISTICS

Fig. 1. Structure for testbench validation.

III. BENCHMARK VALIDATION

A SystemC testbench is provided with all of the designs in order to verify their functionality. Fig. 1 shows the modular structure of the testbench interface with the synthesizable design. The testbench module contains a send and a receive process, which are modeled as clocked threads in SystemC. The send process transmits data to the unit under test (UUT) continuously until the test vectors stored in the input file are exhausted, and the receive process receives the data from the UUT and stores the output data in another text file. Finally, the simulation result is compared with the golden output before the simulation finishes, and any discrepancies are reported. Additionally, the testbench contains the option to dump a VCD file in order to view the waveform of the main signals. The input stimuli are all stored in text files and can be modified by the user, with the exception of the sobel and disparity estimator designs, which take a bitmap file as input, as indicated in Table II.

IV. CONCLUSION

This letter presents a SystemC benchmark suite which complies with Accellera's latest synthesizable subset draft and ...

REFERENCES

[1] W. Meeus, K. V. Beeck, T. Goedeme, J. Meel, and D. Stroobandt, "An overview of today's high-level synthesis tools," presented at the Design Automation for Embedded Systems, 2013.
[2] D. Gajski, T. Austin, and S. Svoboda, "What input-language is the best for high level synthesis (HLS)?," in Proc. DAC, 2010, pp. 857–858.
[3] K. M. Dixit, "SPEC benchmarks," J. Parallel Comput., vol. 17, pp. 1195–1209, Dec. 1991.
[4] L. Chunho, M. Potkonjak, and W. H. Mangione-Smith, "MediaBench: A tool for evaluating and synthesizing multimedia and communications systems," in Proc. 30th Annu. ACM/IEEE Int. Symp. Microarchitecture, 1997, IEEE Computer Society.
[5] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown, "MiBench: A free, commercially representative embedded benchmark suite," in Proc. IEEE Int. Workshop Workload Characterization (WWC-4), 2001, pp. 3–14.
[6] Y. Hara, H. Tomiyama, S. Honda, H. Takada, and K. Ishii, "CHStone: A benchmark program suite for practical C-based high-level synthesis," in Proc. Circuits Syst. (ISCAS), 2008, pp. 1192–1195, IEEE.
[7] Cynthesizer Tool, Forte Design Systems [Online]. Available: http://www.forteds.com/.
[8] Catapult HLS tool [Online]. Available: www.calypto.com.
[9] CyberWorkBench, NEC [Online]. Available: www.cyberworkbench.com.
[10] S2CBench [Online]. Available: http://www.s2cbench.org/.
[11] S. Morioka, T. Isshiki, S. Obana, Y. Nakamura, and K. Sato, "Flexible architecture optimization and ASIC implementation of group signature algorithm using a customized HLS methodology," in Proc. IEEE Int. Symp. Hardware-Oriented Security Trust (HOST), 2011, pp. 57–62.
[12] P. R. Panda and N. D. Dutt, "1995 high level synthesis design repository," in Proc. 8th Int. Symp. Syst. Synthesis, pp. 170–174, ACM.