
2010 International Conference on Reconfigurable Computing

Communication Architectures for Run-Time Reconfigurable Modules in a 2-D Mesh on FPGAs

Jochen Strunk, Johannes Hiltscher, Wolfgang Rehm
Computer Architecture Group
Chemnitz University of Technology
Chemnitz, Germany
Email: {sjoc,jhil,rehm}@cs.tu-chemnitz.de

Heiko Schick
Research & Development
IBM Deutschland Research & Development GmbH
Boeblingen, Germany
Email: schickhj@de.ibm.com

978-0-7695-4314-7/10 $26.00 © 2010 IEEE
DOI 10.1109/ReConFig.2010.33

Abstract: This paper examines the feasibility of utilizing a 2-dimensional (2-D) mesh of run-time reconfigurable modules (RTRMs) on a dynamically and partially reconfigurable (DPR) FPGA for throughput- and real-time-driven tasks. To utilize a 2-D mesh of RTRMs, efficient communication architectures (CA) are required, which are presented in this work. Such a 2-D mesh of RTRMs on a DPR-capable FPGA can be utilized for throughput-driven tasks to dynamically offload compute functions on a host coupled system, providing multi-user and multi-context execution according to user demands. For embedded systems, it can be utilized as a highly dynamic platform by providing functional enhancement through module replacement during run-time. The exploration also includes a CA for real-time communication between RTRMs in a 2-D mesh. The presented CA design is based on a novel methodology that applies run-time reconfiguration to increase the performance. The design, the implementation, the performance and the resource utilization are shown for throughput- and real-time-driven CAs. As proof of concept, a case study is conducted for the presented CAs on state of the art Virtex-5 FPGAs.

Keywords: dynamic reconfiguration; run-time reconfiguration; FPGA; 2-D mesh; communication architecture

I. INTRODUCTION

Field programmable gate arrays (FPGAs), which are programmable logic devices, can be found in various fields of application, e.g. reconfigurable computing (RC) and embedded systems. The fundamental idea is based on the creation of application-defined processing engines with a programmable logic device, in contrast to a microprocessor program, which runs on a fixed instruction set. In a hybrid system, consisting of a processor and an FPGA, the programmable logic device is able to accelerate user applications by taking advantage of self-defined, highly parallel and energy efficient hardware processing engines. Woods et al. [1] gained a speedup of more than 50 compared with a CPU when accelerating a Quasi-Monte Carlo simulation. Zhang et al. [2] reached a 25 times speedup on another Monte Carlo simulation. In embedded systems FPGAs are capable of providing stand-alone replacement solutions for expensive silicon ASICs.

Because the configuration of FPGAs is based on SRAM, they can be reconfigured to their specific task within milliseconds over and over again. With the emergence of dynamically and partially reconfigurable (DPR) FPGAs another degree of freedom was added to the resource utilization of configurable logic devices. Distinct regions of the FPGA can be reconfigured during run-time without affecting the configuration and the functionality of other parts of the FPGA. This feature of run-time reconfiguration (RTR) allows parts of the design, i.e. the functionality, of an already configured FPGA to be changed during run-time according to the demands of the user or the environment. The adaptability is increased in the sense that functionality for tasks unknown at design time can be added later during the run-time of the system. Applying RTR can lead to higher energy efficiency and lower costs when parts of the design can be deployed on the FPGA by time division multiplexing (TDM), so that FPGAs with a lower amount of logic cells can be utilized. This paper is devoted to coarse grained RTR based on run-time reconfigurable modules (RTRMs) on DPR capable FPGAs.

An RTR system architecture is investigated which allows the arbitrary placement of RTRMs in a 2-D mesh at run-time without interfering with other RTRMs. As multiple RTRMs should be supported, an adequate communication architecture on the FPGA must be investigated. In this architecture all RTRMs should operate and communicate with other RTRMs or static modules independently, i.e. potentially all RTRMs can act as an initiator (master) of communication. Two different communication architectures (CA) for RTRMs based on FPGA logic and routing resources are examined. The first one targets throughput-driven RTR systems, like RC systems, where multiple RTRMs acting as offloading compute kernels can be run on a single FPGA. These could be systems with host coupled FPGAs, where a user can create multiple accelerator modules on a single FPGA or different users are allowed to utilize the same FPGA for computational tasks. The second field of application targets real-time-driven embedded systems. The CA for this type of system needs to be designed in such a way that the communication of RTRMs complies with real-time constraints. The exploration of efficient CAs for RTRM-based RTR systems is crucial for the implementation and performance of such systems.

The rest of the paper is organized as follows: Section II focuses on related work. In section III, the design, the communication, and the management of a system architecture for RTRMs in a 2-D mesh are described. The throughput- and real-time-driven CAs for the 2-D mesh are explained in sections IV and V. As proof of concept and to assess the performance and resource utilization, a case study is conducted in section VI. The findings are presented and compared in section VII. Section VIII concludes the results of this paper.
II. RELATED WORK

Various techniques have been proposed to take advantage of DPR-capable FPGAs for coarse grained RTR with RTRMs. The term communication architecture is used as a generic name for all classes of interconnects within an FPGA for the communication of RTRMs, i.e. bus, crossbar switch, and network on a chip (NoC). Koch et al. [3] proposed a tool, named ReCoBus-builder, for creating a bus which connects a static master (CPU) with RTRMs of different size as slaves. It is independent of the Xilinx module-based [4] partial reconfiguration flow, but only Virtex-II and Spartan-3 FPGAs are supported. A tool-flow for generating homogeneous communication infrastructures, built upon the Xilinx design flow, was presented by Hagemeyer et al. [5]. Switch-based communication architectures have been explored in [6] and [7]. For utilizing RTRMs on host coupled FPGAs, we have designed the interface and framework ACCFS (ACCelerator File System) [8] for host system integration, which is built on top of a virtual file system, supporting run-time reconfiguration on FPGAs. In [9], we have shown that ACCFS in combination with a single DPR FPGA can serve as a platform for allowing multiple offload-engines (RTRMs) to be run in a single- and multi-user environment. With increasing numbers of RTRMs and the permanently growing count of logic cells within modern FPGAs, sophisticated 2-D communication architectures for RTRMs need to be explored, which provide sufficient bandwidth for all RTRMs and different communication models, e.g. master-slave and inter-module.

III. 2-D MESH OF RTRMS

A. Architecture

As system architecture for RTR computing, targeting host-coupled and embedded systems with DPR capable FPGAs, we utilize a regular 2-D mesh of homogeneous run-time reconfigurable regions (RTRR), refer to Figure 1. Xilinx Virtex-5 FPGAs are utilized as DPR FPGAs. During the run-time of the FPGA the RTRMs can be arbitrarily assigned on demand to an RTRR within the grid, without affecting the functionality, the run-time behavior and the communication of non-targeted RTRMs. Hence the size and the available logic and routing resources of each RTRR, and thus of each RTRM, are equal. Especially for embedded systems, this allows the fast and uncomplicated assignment of RTRMs to RTRRs utilizing frame-based relocation techniques [10] for the configuration bitstreams of RTRMs.

Figure 1: Logical view of a 2-D mesh of RTRMs and its working principle (CTRL=controller, R=router)

B. Communication

To allow the communication of RTRMs, an interconnect based on FPGA logic and routing resources should be applied. Both master-slave and inter-module communication models should be supported. A regular 2-D mesh topology is utilized as interconnect to provide sufficient bandwidth for the communication and to allow higher operating frequencies. Two distinct communication architectures are investigated. The first one is optimized for throughput-driven tasks, e.g. for host coupled reconfigurable systems. The second one targets embedded systems with real-time requirements, i.e. inter-module communication and communication with I/Os have to obey real-time constraints.

C. Management

For the management of RTRMs, a controller is required which manages the run-time reconfiguration process of the RTRMs. This unit is usually a static module, which is able to fulfill other tasks like access to external memory, host and I/O interface. Therefore we call this unit controller (CTRL), refer to Figure 1. This controller utilizes an internal reconfiguration interface for the reconfiguration of RTRMs. For Xilinx Virtex FPGAs this is the internal configuration access port (ICAP).

IV. THROUGHPUT-DRIVEN COMMUNICATION ARCHITECTURE

The term throughput-driven in this context refers to an architecture trying to maximize the total throughput of data, respectively messages, within the mesh. Since FPGAs provide a limited amount of resources, a good performance to resource utilization ratio is targeted. At the same time the latency should be kept minimal. To achieve this, requests should be served as soon as possible when they appear.

A. Design

Packet switching is applied for the transmission from sender to receiver. Each payload is preceded by a short header, including source, destination and packet size occupying for simplicity half a word each. A word is defined here as the amount of bits representing the native data link width of a router. Hence, every packet has a constant overhead of two words for the header. As routing strategy, deterministic dimension-based X-Y routing is applied. This is mainly due to the efficient implementation of X-Y routing in contrast to more complex strategies. Data is forwarded in a way combining the advantages of wormhole routing and flit-based routing. However, in contrast to true wormhole routing, packet data is propagated through the network as the route is built, thus minimizing the startup time. When a route is established by a router, it is reserved for the packet until all data has passed. To improve the performance with minimal buffer requirements, the routers are equipped with small FIFOs that store some flits rather than whole packets. True store-and-forward routing, by contrast, would increase both latency and resource costs since a packet is stored as a whole in each router before being propagated. Two small FIFOs are utilized as a compromise between throughput performance and costs of resource utilization.

B. Implementation

The main part of the throughput-driven router is the route generation and arbitration logic as depicted in Figure 2. The arbitration logic is responsible for a fair distribution of the router’s bandwidth using round-robin to poll the input ports. In every clock cycle the logic checks the currently selected input for a new request. If one is discovered, a source-destination-tuple is generated describing the route the request will take.

Figure 2: Architecture of a throughput (TP) driven router (N=North, E=East, S=South, M=Module (RTRM), W=West)

Routing requests are managed via two request FIFOs keeping them in order. New request-tuples are inserted into the first request FIFO from which they can be assigned to one of the channels described later in this section. If a request is not feasible since its route is currently blocked by another packet being transmitted, it is moved from the first to the second request FIFO. This raises its priority, ensuring it will be served as soon as the blocking transmission is finished while allowing other feasible requests to be served in the meantime.

The data flow is managed by two independent channels, each consisting of a FIFO and control logic. These data FIFOs are not to be confused with the request FIFOs used by the route generation logic. FIFOs have been introduced to the design to allow a continuous flow of data, thus minimizing wait states. They are implemented using Virtex SliceM distributed RAM, allowing fast operation while keeping resource utilization low. Since buffers are limited and packet data is transmitted on-line as the route is established, our design requires flow control. It is implemented using the valid-stop-protocol, allowing every RTRM and router to control the data flow. These signals are generated and used by the channel state machines to control FIFO operation.

V. REAL-TIME-DRIVEN COMMUNICATION ARCHITECTURE

The term real-time-driven in this context describes a communication architecture providing predictable latencies for the communication between RTRMs or between an RTRM and the controller unit for memory, I/O or host access.

Fields of application are embedded systems with real-time demands or RTRMs, acting as processing elements, which have low buffer capabilities and therefore have to process and deliver data just in time. The goal is to provide a very resource-efficient implementation of a communication architecture for the real-time communication of RTRMs in a 2-D mesh. This is achieved by the application of the following two techniques. First, the design is driven bottom-up by the device primitives of the FPGA. Secondly, the logic complexity and dependencies are reduced by applying RTR not only for the RTRMs but also for updating routing paths during the run-time.

A. Design

Real-time tasks based on RTRMs rely on a CA with deterministic behavior and low worst case latencies for the communication, whereas the resource utilization should be kept minimal. As a basic approach, time division multiplexing (TDM) in combination with circuit switching (channel switching) is applied for each connection. A minimal routing logic is applied, where for each output (N_O, E_O, S_O, M_O, W_O) of a router, refer to Figures 3 and 4, an input is selected based on a slot within a communication cycle. Furthermore, no message format, i.e. header, has to be processed. To notify the receiver that new data is available, the additional signal 'valid' is forwarded beside the data signals of the links. This allows the protocol stack to be reduced to a minimum.
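The slot-based output selection described above can be illustrated with a small behavioral model. This is a minimal sketch, not the authors' implementation: the class `TdmRouter`, its methods, and the mapping of select codes to input ports are hypothetical names and assumptions made for illustration. It assumes that each output picks one of the four other ports with a 2-bit select code, which matches the 10 configuration bits per slot and router that section V-B states.

```python
PORTS = ("N", "E", "S", "M", "W")   # North, East, South, Module (RTRM), West
SELECT_BITS = 2                      # assumed: 2 bits choose 1 of 4 inputs
BITS_PER_SLOT = SELECT_BITS * len(PORTS)


def candidates(output):
    """Inputs selectable for an output: the four other ports (assumed, no loopback)."""
    return tuple(p for p in PORTS if p != output)


class TdmRouter:
    """Behavioral model of a header-less, slot-switched router (illustrative only)."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        # slot_table[slot][output] holds the 2-bit select code for that output.
        self.slot_table = [{p: 0 for p in PORTS} for _ in range(num_slots)]

    def connect(self, slot, output, source):
        """Reserve `output` for data arriving on `source` during `slot`."""
        self.slot_table[slot][output] = candidates(output).index(source)

    def forward(self, slot, inputs):
        """Map each output to the (valid, data) pair of its selected input."""
        select = self.slot_table[slot % self.num_slots]
        return {out: inputs[candidates(out)[select[out]]] for out in PORTS}


if __name__ == "__main__":
    router = TdmRouter(num_slots=4)
    router.connect(slot=1, output="E", source="M")   # the RTRM sends east in slot 1

    idle = (False, 0)                                # 'valid' low, no data
    inputs = {p: idle for p in PORTS}
    inputs["M"] = (True, 0xCAFE)

    print(router.forward(1, inputs)["E"])            # the module's word, marked valid
    print(router.forward(0, inputs)["E"])            # slot 0 selects another input
```

Because the table lookup is the only decision the router makes, no header parsing or arbitration is needed at run-time; an idle link is simply one whose 'valid' flag is low.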
A communication cycle is divided into a fixed number of slots. If the slot assignment is not changed, it is repeated in the next cycle. For each slot, one input is assigned to each output port of a router. Figure 3 depicts the division and an example of an assignment for a router.

Figure 3: Example of an assignment of outputs (N_O, E_O, S_O, M_O, W_O) of a router, here R-10, to inputs originating from sender RTRMs on a slot basis within a communication cycle for real-time communication

The route, respectively channel, has to be established before the start of the communication of the RTRM with the selected target. This can be done before the RTRM is run-time reconfigured in case of static communication relations or during the run-time for dynamic relations. The task of establishing channels is assigned to the controller unit, which also manages RTR. Due to the external route generation by the controller unit different routing strategies are possible, whereas dimension-based X-Y routing is chosen for simplicity. When a channel is set up, it cannot be interrupted or disturbed by the communication of other RTRMs during the duration of a slot, i.e. real-time constraints can be met with this design. The configurations for the switches within the router are stored in distributed RAM, i.e. SliceMs for Xilinx Virtex FPGAs.

Run-time reconfiguration is taken to the extreme on DPR-capable FPGAs, e.g. Xilinx Virtex FPGAs, in the sense that RTR is not only applied for the configuration of RTRMs, but also for changing routes of a channel for a slot during the run-time, which is done by the controller unit. Typically 49 clock cycles at 100 MHz, respectively 490 ns, including the CRC value, are required on Virtex-5 FPGAs to change all slots within one router, due to the frame-based DPR behavior of Virtex FPGAs. It should be noted that changing routes during run-time is not required in the standard case, because routes between communication partners are set up in a slot before or during the configuration of an RTRM and are kept alive until the termination of a communication partner. Removing the active routing algorithm, e.g. dimension-based, from the routers dramatically decreases their complexity. Furthermore, an external route generation allows the utilization of more sophisticated routing strategies since powerful devices such as embedded processors can be used for route generation.

Figure 4: Architecture and working principle of a real-time (RT) driven router

B. Implementation

Due to the novel methodology of utilizing RTR even for configuring the slots of a cycle, already mentioned in section V-A, the implementation relies on only a few components, shown in Figure 4. This simplifies the design and allows higher overall clock speeds. Two shift registers, one for counting the clock cycles within a slot and a second for the slots within a communication cycle, are required. The latter addresses the memory, where the configuration of the routers is stored on a slot basis. Two

bits are required for the configuration of each switch, i.e. it takes a total of 10 bits for each slot and each router. For a resource-efficient implementation, the shift registers are instantiated as SRLC32E Virtex primitives. For the RAM, two RAM32X8S (RAM32M) primitives are chosen, which are based on distributed RAM available in every second logic cell (SliceMs) in Virtex FPGAs. With the RAM32X8S, 32 slots can be realized within a cycle. If more slots per cycle are required, additional RAMs can be concatenated. Due to the instantiation as primitives, the location of the RAM32X8S can be constrained to a fixed position within the FPGA. This allows the controller unit to set and change the routes for RTRMs during run-time.

VI. CASE STUDY OF 2-D RTRM MESHES

A. Overview and Goals

As proof of concept for the creation, the management and the communication of a 2-D mesh of RTRMs, a case study is conducted. The throughput- and real-time-driven approaches of sections IV and V are applied as CAs with various amounts of RTRMs, different RTRM/RTRR sizes and link widths. Besides the verification of the design by behavioral and post route simulation for master-slave and arbitrary inter-module communication patterns, the designs were implemented on a Xilinx Virtex-5 XC5VLX330-2ff1760 FPGA with ISE 12 design tools. A pattern matcher [8], which finds patterns in a byte stream, was chosen as the RTRM. To assess the pure speed of the CAs, the pattern matcher is fully pipelined and yields up to 365 MHz.

B. RTRM Placement in a 2-D RTRM Mesh

Two different distributions of RTRRs are investigated, which differ in the height of an RTRR, respectively the amount of logic blocks (CLBs). The smallest height (20 CLBs) of an RTRR for an RTRM is directly related to the height of a configuration frame [11] for Virtex-5 FPGAs. This height corresponds to the height of a clock region (CR) in Virtex-5 FPGAs. For distribution-1 (distr-1) all heights of RTRMs are equal to the height of a clock region. The width in Virtex-Slices of an RTRR results from the division of the total width of the FPGA, minus the space for routers, by the amount of RTRMs placed in a row. Distribution-2 (distr-2) is applied for assessing the speed for larger sized RTRMs, i.e. 4 CRs for 2 and 4 RTRMs and 3 CRs for 8 RTRMs. Due to longer data path delays between the routers, slower design speeds are expected compared to distribution-1. The survey for the throughput-driven CA with 48 RTRMs is conducted for completeness and to investigate the borderline use case.

VII. PERFORMANCE AND COSTS

This section summarizes the results of the case study conducted to assess the performance and costs of the throughput and real-time driven communication architectures, hereafter named TP-CA and RT-CA respectively, for 2-D meshes of RTRMs. In Figures 5 and 6 the speed of the TP- and RT-CA designs is shown for different amounts of RTRMs, link widths and RTRM distributions.

[Plot: speed (MHz) versus #RTRMs for the series TP 32-bit distr-1, TP 32-bit distr-2, TP 64-bit distr-1 and TP 64-bit distr-2]
Figure 5: Speed of the throughput-driven communication architecture design for a 2-D mesh of RTRMs with 32 and 64-bit link width

It can be seen that for the implementation of the TP-CA as well as the RT-CA the maximum design speed for distribution-2 is slower than for distribution-1. This is due to higher data path delays between the routers and clock path skew for the larger height of the RTRMs, respectively RTRRs, in distribution-2, e.g. 1.6/3.3 ns data path delay and 0.011/0.126 ns clock path skew for 4 RTRMs. To yield maximal speed, the height of an RTRM should not be higher than one clock region. A higher design speed can be achieved for up to 4 RTRMs in the case of distribution-1 for the TP-CA and the RT-CA due to the reduction of output links of routers. In comparison to competitive research results [7] (DyNoC, 4 RTRMs) we achieve a 3.19 times (239/75 MHz) faster design speed for the TP-CA and 3.37 times (253/75 MHz) for the RT-CA in the worst case. In the best case (distr-1) the speedup is even higher, 3.89 times (292/75 MHz) for the TP-CA and 4.73 times (355/75 MHz) for the RT-CA. The resource utilization of the TP-CA implementations equals that in [7], whereas the RT-CA consumes 4.3 times fewer resources. As expected the RT-CA is faster than the TP-CA counterpart, i.e. 330 versus 260 MHz for 64-bit. Nevertheless, the TP-CA itself is not the limiting factor for standard RTRM designs up to 260 MHz for distribution-1. The higher performance of the RT-CA is directly related to the lower complexity of the design originating from a consistent bottom-up design based on FPGA primitives and from applying RTR for the configuration of the routes. The lower complexity also leads to a smaller resource utilization for RT-CAs in contrast to TP-CAs, refer to Table I. The maximal throughput of the routers for both CAs is listed in Table I. Due to the restriction on the utilization of

2 FIFOs, only 2 of the 5 output ports of a router are concurrently able to forward data in a TP-CA, resulting in a maximal theoretical throughput of 2/5 of the RT-CA. The maximal throughput of 4.2 GB/s for the TP-CA and 13.2 GB/s for the RT-CA was yielded in case of 64-bit link width. Especially for the RT-CA this results in an excellent performance to resource utilization ratio. The design of the RT-CA approach could be more suitable than the TP-CA for more than 16 RTRMs with infrequently changing communication relations and loads between RTRMs, due to the excellent performance to resource utilization ratio.

[Plot: speed (MHz) versus #RTRMs for the series RT 32-bit distr-1, RT 32-bit distr-2, RT 64-bit distr-1 and RT 64-bit distr-2]
Figure 6: Speed of the real-time-driven communication architecture design for a 2-D mesh of RTRMs with 32 and 64-bit link width

Table I: Resource utilization and throughput of a throughput and real-time driven communication architecture for a 2-D mesh (distribution-1)

type  RTRMs  link width  Slices/router  Slices total  % FPGA  throughput/router
TP    16     32          240            3840          7.4     2.1 GB/s
TP    16     64          338            5408          10.43   4.2 GB/s
TP    24     32          241            5784          11.16   2.1 GB/s
TP    24     64          338            8112          15.65   4.2 GB/s
TP    48     32          267            12816         24.72   2.1 GB/s
TP    48     64          345            16560         31.94   4.2 GB/s
RT    24     32          54             1296          2.5     7.0 GB/s
RT    24     64          88             2112          4.07    13.2 GB/s
RT    48     32          74             3552          6.85    7.0 GB/s
RT    48     64          115            5520          10.65   13.2 GB/s

VIII. CONCLUSION

The implementation of a 2-D mesh of RTRMs on an FPGA, which allows the arbitrary placement of RTRMs to RTRRs applying RTR without affecting the functionality and the communication of non-targeted RTRMs during run-time, is feasible. This was shown for a TP-CA and an RT-CA for different amounts of RTRMs and distributions. In comparison to competitive research results a speedup in throughput of up to 3.89 for the TP-CA and 4.73 for the RT-CA was achieved. Besides this, we have shown a novel methodology for an RT-CA by applying RTR, which leads to a further reduction of design complexity. The outstanding performance to resource utilization ratio for the RT-CA is achieved by a consistent bottom-up design based on FPGA primitives in combination with the novel methodology of applying RTR. Due to this high performance to resource utilization ratio the RT-CA is also suitable for throughput-driven communication with more than 16 RTRMs and slowly changing loads. These results should encourage the community to deploy a 2-D mesh of RTRMs within FPGAs for run-time reconfigurable computing.

IX. FUTURE WORK

The results encourage us to support 2-D meshes of RTRMs in our accelerator framework ACCFS for host coupled FPGAs to take advantage of the additional performance gain of a 2-D mesh.

X. ACKNOWLEDGMENT

The project is performed in collaboration with the Center of Advanced Study Boeblingen, IBM Deutschland Research & Development GmbH in Germany.

REFERENCES

[1] N. A. Woods and T. VanCourt, “FPGA Acceleration of Quasi-Monte Carlo in Finance,” in FPL. IEEE, 2008, pp. 335–340.
[2] G. L. Zhang, P. H. W. Leong, C. H. Ho, K. H. Tsoi, C. C. C. Cheung, D.-U. Lee, R. C. C. Cheung, and W. Luk, “Reconfigurable Acceleration for Monte Carlo Based Financial Simulation,” in FPT, G. J. Brebner, S. Chakraborty, and W.-F. Wong, Eds. IEEE, 2005, pp. 215–222.
[3] D. Koch, C. Beckhoff, and J. Teich, “ReCoBus-Builder a Novel Tool and Technique to Build Statically and Dynamically Reconfigurable Systems for FPGAs,” in Proceedings of International Conference on Field-Programmable Logic and Applications (FPL 08), Heidelberg, Germany, 2008.
[4] Xilinx, “Two Flows for Partial Reconfiguration: Module Based or Difference Based,” in Application Note: Virtex, Virtex-E, Virtex-II, Virtex-II Pro Families (XAPP290), 2004.
[5] J. Hagemeyer, B. Kettelhoit, M. Koester, and M. Porrmann, “Design of Homogeneous Communication Infrastructures for Partially Reconfigurable FPGAs,” in ERSA. CSREA Press, 2007.
[6] J. Surisi, C. Patterson, and P. Athanas, “An efficient run-time router for connecting modules in FPGAs,” in Proceedings of International Conference on Field-Programmable Logic and Applications (FPL 08), Heidelberg, Germany, 2008.
[7] T. Pionteck, C. Albrecht, K. Maehle, E., Hübner, M., and Becker, J., “Communication Architectures for Dynamically Reconfigurable FPGA Designs,” in Proceedings of IEEE International Parallel and Distributed Processing Symposium, IPDPS USA, 2007.
[8] J. Strunk, A. Heinig, T. Volkmer, W. Rehm, and H. Schick, “Run-Time Reconfiguration for HyperTransport coupled FPGAs using ACCFS,” in Proceedings of the Workshop on HyperTransport Research and Applications (WHTRA). HeiDOK, University of Heidelberg, 2009.
[9] J. Strunk, A. Heinig, T. Volkmer, and W. Rehm, “ACCFS - Virtual File System Support for Host Coupled Run-Time Reconfigurable FPGAs,” in Advances in Parallel Computing, Volume 19, Parallel Computing: From Multicores and GPU’s to Petascale, 2010.
[10] A. Flynn, A. Gordon-Ross, and A. D. George, “Bitstream Relocation with Local Clock Domains for Partially Reconfigurable FPGAs,” in DATE. IEEE, 2009, pp. 300–303.
[11] Xilinx, “Configuration Memory Frames,” in Virtex-5 FPGA Configuration User Guide (UG191), 2008.
