
Hazard-Free Implementation of the Self-Timed Cell Set in a Xilinx FPGA

Kapilan Maheswaran (maheswar@ece.ucdavis.edu)
Venkatesh Akella (akella@ece.ucdavis.edu)
Computer Engineering Research Laboratory, Department of Electrical & Computer Engineering, University of California, Davis, CA 95616

Abstract. When designing asynchronous systems, the problem of hazards becomes an important issue. This paper deals with the hazard-free implementation of asynchronous logic in a look-up table based FPGA. First, the definitions of hazards and the techniques for dealing with them in gate-level asynchronous circuits are surveyed. Then the look-up table (LUT) model and its associated timing properties are presented. Finally, a list of line delay constraints for a hazard-free implementation of the self-timed modules is presented, and the conditions under which the cell set is hazard-free are explained. Our technique is illustrated by implementing the basic asynchronous macromodules outlined in Sutherland[16], Ebergen[8] and Brunvand[6] using the Xilinx 4000 Series FPGA.

1. Introduction

As technology improves, the systems that can be built become larger, faster, and more complex. The speed of synchronous systems is limited by clock skew and the worst-case signal path. In an asynchronous system, the various subsystems can operate concurrently, and performance is restricted only by data and control dependencies. Hence, asynchronous systems tend to exhibit average-case performance. In addition, asynchronous systems can reduce power dissipation because only the components contributing to the current computational task are activated. Thus, asynchronous systems promise to be a viable alternative for the high-performance computing systems of the future. At present, however, there is a lack of commercially viable design methodologies and cell libraries to support this style of design.

Field Programmable Gate Arrays (FPGAs) offer rapid prototyping and experimentation with digital circuits at minimum engineering cost. Today, their use is limited to synchronous circuits. FPGAs would be ideal for investigating novel asynchronous architectures, provided suitable circuit structures can be built in a hazard-free manner. FPGA-based designs are prone to hazards for two reasons: (1) they do not ensure predictable routing delays, and (2) one cannot restrict the input changes at the various circuit inputs. This is not a major issue in the design of synchronous systems, but it is the main obstacle to the hazard-free implementation of asynchronous logic.

In this paper, we propose a scheme to build hazard-free asynchronous circuits in a commercial FPGA such as the Xilinx 4000 series part. First, we review the definitions of the various possible hazards in asynchronous circuits and then examine why the existing techniques cannot be used directly to realize hazard-free circuits in a look-up table based FPGA. This brings us to the characterization of the timing behavior of a look-up table. Then we present a set of timing constraints under which a look-up table based implementation of a macromodule is hazard-free. Finally, we present the list of delay constraints that have to be satisfied by each asynchronous macromodule for a hazard-free realization.

2. Background and Definitions

The following definitions are taken from [13], which considers single-output functions having binary input and output variables.

Define $B = \{0,1\}$. A Boolean function $f$ of $n$ variables $x_1, x_2, \ldots, x_n$ is defined as a mapping $f: B^n \to B$, where $x_i \in B$ for all $1 \le i \le n$. Each element in the domain $B^n$ of the function f is called a minterm of the function. A minterm is also called an input state of the function. The ON-set of a function is the set of minterms for which the function has value 1. The OFF-set is the set of minterms for which the function has value 0.

Each variable x has two corresponding literals: x and x'. Literal x = 1 if and only if variable x in a minterm evaluates to 1; literal x' = 1 if and only if x evaluates to 0. A product term is a Boolean product (AND) of literals. If a product term evaluates to 1 for a given minterm, the product term is said to contain the minterm. A cube is a set of minterms which can be described by a product term.

A transition cube is a cube that has a start point and an end point. Given input states A and B, the transition cube from A to B, denoted [A,B], has start point A, end point B, and contains all minterms that can be reached during the transition from A to B. Let the i-th literals of A and B be $A_i$ and $B_i$, respectively. The i-th literal of the product term of cube [A,B] is the Boolean function $(A_i + B_i)$.

A sum-of-products (SOP) represents a set of products; it is denoted by a Boolean sum (OR) of product terms. A SOP is said to contain a minterm if some product in the set contains the minterm. An implicant of a function is a product term which contains no minterm in the function's OFF-set. A prime implicant of a function is an implicant contained in no other implicant of the function. An essential prime implicant is a prime implicant containing a minterm contained in no other prime implicant. A cover of a Boolean function is a SOP which contains all the minterms of the ON-set of the function and none of the minterms of the OFF-set.
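The definitions above can be made concrete with a small sketch. It assumes a cube is stored as a tuple holding 0, 1, or None (None meaning the variable does not appear in the product term); the helper names are illustrative and not from the paper. The function used is f = w'xz + w'xy + xyz, the example the paper analyzes in its figures.

```python
from itertools import product

def minterms(n):
    """All 2**n input states (minterms) of an n-variable function."""
    return list(product((0, 1), repeat=n))

def transition_cube(a, b):
    """Cube [A,B]: keep a literal where A and B agree, None where they differ."""
    return tuple(x if x == y else None for x, y in zip(a, b))

def contains(cube, m):
    """True if the product term described by `cube` contains minterm m."""
    return all(c is None or c == v for c, v in zip(cube, m))

def cube_minterms(cube):
    """All minterms that lie inside the cube."""
    return [m for m in minterms(len(cube)) if contains(cube, m)]

def f(w, x, y, z):
    """f = w'xz + w'xy + xyz, with variable order (w, x, y, z)."""
    nw = 1 - w
    return (nw & x & z) | (nw & x & y) | (x & y & z)

ON_SET = [m for m in minterms(4) if f(*m) == 1]
OFF_SET = [m for m in minterms(4) if f(*m) == 0]

if __name__ == "__main__":
    print("ON-set:", ON_SET)
    print("[0001,1111] =", transition_cube((0, 0, 0, 1), (1, 1, 1, 1)))
```

Here the transition cube from 0001 to 1111 leaves w, x and y free and fixes z = 1, so it contains eight minterms.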

3. Hazard Analysis

3.1 Combinational Hazards

In a combinational circuit, a transient error caused by stray (unplanned) delays is called a combinational hazard[20]. Combinational hazards can be classified into function, logic and essential hazards.

3.1.1 Function Hazards

A function that does not change monotonically during a sequence of input changes is said to have a function hazard for that input change[13]. Function hazards are a property of the logic function. They can only be eliminated through proper placement of delay elements, and arbitrary input and gate delays cannot be assumed. If a network has a function hazard for a given transition, then it cannot also have a logic hazard for the same transition[15]. The following definitions are from Bredeson and Hulina[5] and are equivalent to Eichelberger's definition[9].

DEFINITION 1. A Boolean function f contains a static function hazard for the input change A to C if and only if
1. $f(A) = f(C)$.
2. There exists at least one input state $B \in [A,C]$ such that $f(B) \neq f(A)$.
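Definition 1 translates directly into an exhaustive check over the transition cube. The sketch below is only an illustration: the two-input XOR, OR and AND functions are hypothetical examples chosen for brevity, not circuits from the paper.

```python
from itertools import product

def in_cube(a, c, b):
    """True if state b lies in [A,C], i.e. b agrees with A wherever A and C agree."""
    return all(bi == ai for ai, ci, bi in zip(a, c, b) if ai == ci)

def has_static_function_hazard(f, a, c):
    """Definition 1: f(A) = f(C) and some B in [A,C] has f(B) != f(A)."""
    if f(*a) != f(*c):
        return False
    return any(
        in_cube(a, c, b) and f(*b) != f(*a)
        for b in product((0, 1), repeat=len(a))
    )

# Hypothetical two-input examples.
def xor2(x, y):
    return x ^ y

def or2(x, y):
    return x | y

def and2(x, y):
    return x & y

if __name__ == "__main__":
    print(has_static_function_hazard(xor2, (0, 0), (1, 1)))  # both inputs change
    print(has_static_function_hazard(or2, (0, 1), (1, 1)))   # single input change
```

A single-input change can never satisfy condition 2, because [A,C] then contains only A and C themselves.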

DEFINITION 2. A Boolean function f contains a dynamic function hazard for the input change A to D if and only if
1. $f(A) \neq f(D)$.
2. There exists at least one pair of input states B and C such that (a) $B \in [A,D]$ and $C \in [B,D]$, and (b) $f(B) = f(D)$ and $f(C) = f(A)$.

3.1.2 Logic Hazards

A logic hazard is purely a property of the implementation. It is caused by possible delays in the actual logic gate realization, in the absence of a function hazard for the input transition. The following definitions, also from Bredeson and Hulina[5], describe this type of hazard.

DEFINITION 3. A combinational network for a function f contains a static logic hazard for the input change A to B if and only if
1. $f(A) = f(B)$.
2. No static function hazard exists for the input change from A to B.
3. During the input change from A to B, a momentary pulse may be present on the output.

DEFINITION 4. A combinational network for a function f contains a dynamic logic hazard for the input change A to B if and only if
1. $f(A) \neq f(B)$.
2. No dynamic function hazard exists for the input change from A to B.
3. During the input change from A to B, a momentary 0 output and a momentary 1 output may appear.

3.2 Sequential Hazards

There are two types of hazards that may be present in asynchronous sequential circuits. If there exists some distribution of stray delays such that a circuit may reach an incorrect state for some input transition, then the circuit is said to contain a steady state hazard. A transient (or output) hazard is said to be present if a spurious output pulse may be produced during some transition for some distribution of stray delays while the final total state remains unchanged. Sequential hazards that occur regardless of the actual implementation and are apparent in the flow table (FT) specification are essential hazards. Sequential hazards also occur due to critical races and combinational hazards, and can be eliminated with a proper state encoding and the elimination of combinational hazards as discussed in the last section.

3.2.1 Essential Hazards

A FT of a sequential function has an essential hazard for some initial total state and input variable x when it does not conform to the d-trio condition, i.e., the condition that three consecutive changes in x take the system to exactly the same state as the state reached after a single x-change (Figure 3-A(i)). An essential hazard is caused by a change in the input reaching different parts of the circuit at different times, sometimes even after the new feedback value has been produced, so that the network goes to a wrong state[20].

Essential hazards can be eliminated by adding delays to the network. The inserted delays ensure that the combinational logic has completed its response to an input change before it sees any state variable changes produced by that input change[14][10]. Delay elements are necessary only for state variables that change during transitions involving essential hazards. For example, in Figure 3-A(ii), to eliminate the essential hazard starting in state 1, it is necessary to delay the change in Y2 until the change in X1 has had a chance to propagate through the network. FTs without essential hazards can be realized without inserted delays, but special state assignments may be necessary in some cases. Such realizations will not contain steady state hazards but may contain output hazards.

[Figure: (i) a d-trio over input columns I1 and I2; (ii) a flow table over inputs X1X2 and state variables Y1Y2. The flow table has essential hazards starting in state 1 and changing X1, in state 5 and changing X1, and in state 4 and changing X2.]

Figure 3-A. A d-trio (left) and a flow table with essential hazards (right)

Properties of sequential hazards[12]:
1. In single input change mode, any FT without essential hazards can be implemented free of steady-state hazards without the need for delay padding.
2. No FT with essential hazards has an implementation without steady-state hazards independent of feedback delays, i.e., there is no delay-insensitive hazard-free implementation for it.
3. Any FT has a steady-state hazard-free implementation using at most one delay element if it is operated under single input change, if the upper and lower bounds on the delays of gates and wires are known, and if the lower bound on that delay can be arbitrarily set by the designer.
4. The multiple input change scenario is more complicated, especially if different orderings of multiple transitions which occur simultaneously lead to different stable states. Then the feedback delays must be chosen to be larger than the maximum separation between transitions (δ1, as defined by Huffman mode in the next section).

3.3 Hazard-Free Transitions

A multi-level expression can be transformed into a SOP expression in a static hazard-preserving manner using the associative, distributive and DeMorgan laws[20] while keeping track of the different paths for each literal. The following necessary and sufficient conditions, detailed by Nowick[13] and previously proven by Unger[20], Bredeson and Hulina[5], Bredeson[4] and Beister[2], then ensure that the SOP implementation is logic hazard-free for a given input transition, provided that no product contains a pair of complemented literals. In addition, assume that the transition from input state A to B is function hazard-free for a combinational function f, and that C is any cover of f implemented in AND-OR logic.

LEMMA 1. If f has a 0-0 transition in cube [A,B], then the implementation is free of logic hazards for the input change from A to B.

LEMMA 2. If f has a 1-1 transition in cube [A,B], then the implementation is free of logic hazards for the input change from A to B if and only if [A,B] is contained in some cube of cover C.

The conditions for the 0-1 and 1-0 cases are symmetric.

LEMMA 3. If f has a 1-0 transition in cube [A,B], then the implementation is free of logic hazards for the input change from A to B if and only if no cube c in the cover C intersects [A,B] unless c also contains A.

Lemma 1 requires that the cube [A,B] is not a vacuous term (i.e. a term that contains a variable and its complement, such as xx'y). Lemma 2 requires that in a 1-1 transition, some single gate maintains the output value at 1 throughout the transition. Lemma 3 ensures that no single gate is turned on momentarily in the middle of a 1-0 transition: all products change value monotonically during the transition[13]. Sometimes Lemma 3 requires a function to be implemented with a single gate, for example the C-element function, unless it uses storage elements[5].
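Lemmas 2 and 3 reduce to simple containment and intersection tests on cubes. The following sketch assumes cubes are tuples of 0, 1, or None (None for an absent variable); the two small covers used to exercise the checks are textbook-style illustrations chosen here, not covers taken from the paper, and the checks presuppose that the transition is function hazard-free.

```python
def transition_cube(a, b):
    """Cube [A,B]: a literal where A and B agree, None where they differ."""
    return tuple(x if x == y else None for x, y in zip(a, b))

def contains_cube(outer, inner):
    """True if every minterm of `inner` is also in `outer`."""
    return all(o is None or o == i for o, i in zip(outer, inner))

def intersects(c1, c2):
    """True if the two cubes share at least one minterm."""
    return all(x is None or y is None or x == y for x, y in zip(c1, c2))

def lemma2_ok(cover, a, b):
    """1-1 transition: [A,B] must be contained in a single cube of the cover."""
    t = transition_cube(a, b)
    return any(contains_cube(c, t) for c in cover)

def lemma3_ok(cover, a, b):
    """1-0 transition: every cube intersecting [A,B] must contain the start point A."""
    t = transition_cube(a, b)
    return all(contains_cube(c, a) for c in cover if intersects(c, t))

# Illustrative cover of f = xy + x'z over (x, y, z); the consensus term yz is optional.
COVER_NO_CONSENSUS = [(1, 1, None), (0, None, 1)]
COVER_WITH_CONSENSUS = COVER_NO_CONSENSUS + [(None, 1, 1)]

# Illustrative redundant cover xy + xy' of f = x over (x, y), and the single-gate cover x.
COVER_SPLIT = [(1, 1), (1, 0)]
COVER_SINGLE = [(1, None)]

if __name__ == "__main__":
    print(lemma2_ok(COVER_NO_CONSENSUS, (1, 1, 1), (0, 1, 1)))
    print(lemma2_ok(COVER_WITH_CONSENSUS, (1, 1, 1), (0, 1, 1)))
    print(lemma3_ok(COVER_SPLIT, (1, 1), (0, 0)))
    print(lemma3_ok(COVER_SINGLE, (1, 1), (0, 0)))
```

In the split cover, the product xy' can turn on after xy has turned off during the 11 to 00 transition, which is exactly the momentary gate activation that Lemma 3 rules out.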

4. Previous Work

4.1 Hazards in Gate-Level Implementations

Many methods for detecting and eliminating combinational hazards have been published. Huffman, McCluskey and Unger[20] developed techniques for detecting and eliminating hazards in combinational networks. These techniques use Boolean algebra and are restricted to single-input variable changes. Detection of static function and logic hazards in combinational or sequential switching circuits for multiple-input changes using ternary algebra has been considered by Eichelberger[9]. While logic hazards can be eliminated in a sum-of-products (SOP) realization by including all prime implicants, function hazards cannot be eliminated by modifying the logic network[9][15]. Conditions to avoid dynamic combinational function and logic hazards for multiple input changes in two-level and multi-level circuits are presented in Unger[20], Beister[2], Bredeson and Hulina[5], and Bredeson[4], but they indicate that these conditions cannot always be satisfied.

The hazard-free cover proposed by Nowick[13], based on the notion of hazard-free transitions introduced in the previous section, ensures that an AND-OR implementation is hazard-free for a given set of input transitions. However, it is restricted to combinational logic. Siegel[15] looks at the problem of technology mapping for asynchronous designs. That paper states that an implementation of f has a dynamic logic hazard for a given 0-1 transition if there exists a cube which intersects this transition but does not contain the end point. Although Siegel[15] proposes techniques to find all possible dynamic logic hazards, the elimination of these hazards is not discussed.

The algorithm described by Bredeson and Hulina[5], which was first presented by Armstrong, Friedman and Menon[1], uses sequential storage elements to eliminate combinational hazards and works for all functions under the assumption that line delay is less than loop delay. A few years later, Bredeson[4] introduced an algorithm that does not use the expensive storage elements, but it can be applied only to combinational switching circuits without feedback.

All these methods have been proposed for gate-level implementations of asynchronous circuits. Hence they are not directly applicable (as we will see in the next section) to the problem of look-up table based implementation, which is the focus of this paper.

4.2 FPGA Implementations of Self-timed Circuits

A self-timed cell set has been designed by Brunvand[6] using the Actel FPGA. However, unlike the Xilinx FPGA, the Actel product is a non-reusable FPGA: once programmed, it cannot be changed or re-programmed. In addition, the hazard behavior of this cell set has not been characterized. Researchers at the University of Washington have proposed an asynchronous FPGA[11], but that work differs from this paper in that we develop techniques to implement self-timed circuits in existing FPGAs such as the Xilinx 4000 series, as opposed to designing a new architecture specifically for asynchronous circuits.

5. Look-Up Table Based FPGA

In this section we discuss the characteristics of a look-up table, using the Xilinx 4000 series part as an example. Xilinx FPGAs provide the benefits of custom CMOS VLSI while avoiding the initial cost, time delay, and inherent risk of a conventional masked gate array. The XC4000 family provides flexible, programmable Configurable Logic Blocks (CLBs), interconnected by abundant routing resources and surrounded by programmable Input/Output Blocks (IOBs). The XC4000 CLB consists of three function generators, two flip-flops and several multiplexers. The function generators are capable of implementing any arbitrarily defined Boolean function of their four inputs. They are implemented as memory LUTs; therefore, the propagation delay is independent of the function being implemented. An XC4000 CLB can be used to implement any two independent functions of up to four variables, or any single function of five variables, or any function of four variables together with some functions of five variables, or even some functions of up to nine variables. Implementing wide functions in a single block reduces both the number of blocks and the delay in the signal path, achieving increased density and speed[19].

Next we give an introduction to function generators (LUTs) and their properties. Then simple gate-level and LUT based implementations are compared. Finally, the implementation of combinational and sequential circuits using LUTs is discussed.

5.1 Function Generators

A function generator is a multiplexer (built of transfer gates), with the N inputs as select lines and the $2^N$ configuration bits as data lines, that implements any function with one output by indexing into the truth table in memory (Figure 3)[17]. The following is a list of properties of the XC4000 logic block:
1. All its inputs can change independently of each other and at any time.
2. It has a balanced design with similar (almost equal) propagation delays from the select inputs to the data output[19]. Therefore, all blocks have the same intrinsic delay independent of configuration, inputs and outputs[18].
3. The block delay is transport (or pure) and not inertial: an input pulse, however short, propagates through to the output after the block delay and is not swallowed by the logic block[21].
4. There can never be a decoding glitch when one input changes. Even a non-overlapping decoder (equivalent to a static logic hazard) cannot generate a glitch problem, since the node capacitance will retain the previous logic level until the new transfer gate is activated about a nanosecond later[19].
5. For more than one simultaneous input change, glitches are possible in the presence of an intermediate code which produces a different result (the same as a function hazard).

Figure 2 illustrates all the possible multiple input change sequences for the given function f implemented using simple gates. The first of the input sequences leads to a hazard-free transition, but the other sequences result in hazards. A dynamic hazard results in the sequence shown by the second Karnaugh map: the output goes through a 0 → 1 → 0 transition before settling at 1. The hazard results from the cubes w'xz and w'xy turning on and off before cube xyz turns on. Siegel[15] noted that this particular hazard can only be eliminated by implementing the function with a single gate. The function hazard in the third Karnaugh map is caused by the presence of an intermediate state wxy'z with a different output (in this case 0). According to Figure 3, a LUT implementation of the same function is logic hazard-free, but not function hazard-free, due to the properties listed above. This is proven in the next section.

5.1.1 Inferred Properties of a LUT Based Implementation

PROPOSITION 1. A function f is logic hazard-free for any transition [A,B] for multiple input changes when implemented using a Xilinx LUT.

Proof. The transition must not contain any intermediate code that produces a different result, i.e. a function hazard (Definitions 1 and 2), because logic hazards are defined only in the absence of function hazards (Definitions 3 and 4). When the function hazard-free transition [A,B] of the function f is implemented using simple gates, logic hazards could occur during the transition from A to B due to gate delays in the combinational logic (Figure 2). In contrast, a LUT produces the output and holds it stable during the transition, eliminating logic hazards. This is due to the balanced design of the LUT, with almost equal propagation delays from its inputs to its output[19]. For example, during a function hazard-free 1-1 transition (possible static hazard), a data line with the same value is chosen by the new select inputs to the function generator and the output is held steady; during a function hazard-free 0-1 transition (possible dynamic hazard), a data line with a different value is selected, and the output is changed and held steady (Figure 3).
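The multiplexer view of a function generator can be sketched as a 16-entry table indexed by the select inputs. The code below is an idealized model, not a timing model of the XC4000: it assumes the changing inputs arrive one at a time in a given order and that each lookup is instantaneous. The configuration bits are those of f = w'xz + w'xy + xyz, and the start state 0001 (with w, x and y rising) is the transition discussed for Figures 2 and 3.

```python
# Configuration bits D0..D15 for f = w'xz + w'xy + xyz; the index is WXYZ read as binary.
LUT = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

def lut_out(w, x, y, z):
    """Function generator as a 16:1 multiplexer: the inputs select one data line."""
    return LUT[(w << 3) | (x << 2) | (y << 1) | z]

def output_trace(start, order):
    """LUT output after each single-input toggle, applied in the given order."""
    state = dict(start)
    trace = [lut_out(**state)]
    for name in order:
        state[name] ^= 1
        trace.append(lut_out(**state))
    return trace

def output_changes(trace):
    """Number of output transitions along the trace."""
    return sum(p != q for p, q in zip(trace, trace[1:]))

START = {"w": 0, "x": 0, "y": 0, "z": 1}

if __name__ == "__main__":
    for order in (("w", "x", "y"), ("y", "x", "w"), ("x", "w", "y")):
        trace = output_trace(START, order)
        print(" => ".join(n + "+" for n in order), trace, output_changes(trace))
```

Under these assumptions only the ordering x+ => w+ => y+ changes the output more than once, because it passes through the intermediate state wxy'z; the other two orderings give a single 0 to 1 change at the LUT output.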

[Figure 2: Karnaugh maps of f (rows WX, columns YZ) for three multiple-input-change (M.I.C.) sequences, and the AND-OR gate implementation with product terms w'xz, w'xy and xyz.]

M.I.C. sequence      Output changes (f)     Hazard behavior
w+ => x+ => y+       0 → 1                  Hazard-free
y+ => x+ => w+       0 → 1 → 0 → 1          Dynamic hazard
x+ => w+ => y+       0 → 1 → 0 → 1          Function hazard

f = w'xz + w'xy + xyz

Figure 2. Simple-Gate Level Implementation

[Figure 3: a function generator drawn as a 16:1 multiplexer. The input variables W, X, Y, Z are the select inputs to the mux, the truth table in memory supplies the data lines D0 to D15, and OUT is the output from the function generator. A Karnaugh map (rows WX, columns YZ) marks the sequence x+ => w+ => y+ as a function hazard.]

Truth table in memory for f = w'xz + w'xy + xyz:

W X Y Z   F   Data line
0 0 0 0   0   D0
0 0 0 1   0   D1
0 0 1 0   0   D2
0 0 1 1   0   D3
0 1 0 0   0   D4
0 1 0 1   1   D5
0 1 1 0   1   D6
0 1 1 1   1   D7
1 0 0 0   0   D8
1 0 0 1   0   D9
1 0 1 0   0   D10
1 0 1 1   0   D11
1 1 0 0   0   D12
1 1 0 1   0   D13
1 1 1 0   0   D14
1 1 1 1   1   D15

Functions implemented using LUTs have no logic hazards, but function hazards are possible, as shown in the K-map above.

Figure 3. Xilinx Look-Up Table Based Implementation

PROPOSITION 2. If a function f has a function hazard during the transition [A,B] and a set of multiple input changes causes a transition from A to B, then a glitch may be produced at the LUT output.

Proof. When more than one input changes simultaneously, the presence of any intermediate code that produces a different result may cause a decoding glitch. The glitch might be only a few nanoseconds long, but that is long enough to upset an asynchronous design[19]. One way of eliminating this glitch is to change only one of the select inputs to the LUT at a time during a transition [A,B]. It can also be avoided by using appropriate delay elements as mentioned before, but as we have no control over the delays inside a function generator, these hazards cannot be eliminated. A possible occurrence of a function hazard and the resulting glitch at the output of a function generator is illustrated in Figure 3, where the m.i.c. transition [w'x'y'z, wxyz] has an intermediate state wxy'z which produces a different output.

PROPOSITION 3. All functions that are implemented using a look-up table are also essential hazard-free.

Proof. Essential hazards are caused by a change in the input reaching different parts of the circuit at different times. Such timing problems due to propagation delays are possible in gate-level implementations, but not in LUT based implementations. In the case of a LUT, a change in the input is detected at the same time by the function generator, which implements the entire function, and the corresponding output is then selected from the configuration bits. Therefore, the new output is not fed back until the entire circuit has detected the input change.

PROPOSITION 4. All functions that are implemented using a Xilinx LUT are hazard-free for single input changes.

Proof. Based on Propositions 1, 2 and 3.

5.2 Multiplexers

The multiplexers that route the outputs of the function generators to the outputs of the CLBs also have to be analyzed for hazards.

PROPOSITION 5. The multiplexers inside the CLB are hazard-free.

Proof. The select inputs of the multiplexers are hardwired when a function is mapped onto a CLB, which means that the output function of the multiplexer can be specified by a two-literal product term. Consequently, the multiplexer cannot make any transitions that may have function or logic hazards. These multiplexers are logic hazard-free, because their truth tables have no adjacent or intersecting product terms (Lemmas 2 and 3). Function hazard-freeness follows from the absence of intermediate states that produce different outputs (Definitions 1 and 2). This is illustrated below.
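[Figure 4: a CLB multiplexer with select S, data inputs A and B, and output OUT, together with its Karnaugh map over AB for a fixed value of S. The select is hardwired; therefore, the multiplexer is hazard-free.]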

Figure 4. Hazard behavior of a multiplexer inside a CLB
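The argument of Proposition 5 can be checked by brute force on an idealized 2:1 multiplexer. The sketch assumes, purely for illustration, that S = 0 selects A and S = 1 selects B (the polarity is not stated in the text), and it counts a transition as hazard-free when the output changes at most once for every ordering of the changing inputs.

```python
from itertools import permutations, product

def mux(s, a, b):
    """Idealized 2:1 multiplexer; the select polarity is an assumption."""
    return b if s else a

def all_transitions_monotone(f, n):
    """True if, for every start/end state and every ordering of the changing
    inputs, the output of f changes at most once."""
    states = list(product((0, 1), repeat=n))
    for start, end in product(states, repeat=2):
        changing = [i for i in range(n) if start[i] != end[i]]
        for order in permutations(changing):
            state = list(start)
            trace = [f(*state)]
            for i in order:
                state[i] ^= 1
                trace.append(f(*state))
            if sum(p != q for p, q in zip(trace, trace[1:])) > 1:
                return False
    return True

def mux_select_low(a, b):
    """Multiplexer with the select hardwired to 0."""
    return mux(0, a, b)

def mux_select_high(a, b):
    """Multiplexer with the select hardwired to 1."""
    return mux(1, a, b)

if __name__ == "__main__":
    print(all_transitions_monotone(mux_select_low, 2))
    print(all_transitions_monotone(mux_select_high, 2))
    print(all_transitions_monotone(mux, 3))  # select allowed to change as well
```

With the select hardwired, the output follows a single data input, so no ordering of input changes can make it change twice; if the select were allowed to switch together with the data inputs, the same check fails.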

5.3 Routing

Since the CLB implements any combinational logic in a static, dynamic and essential hazard-free manner, the next analysis concerns the effects of routing delays on the hazard behavior of the self-timed cell set. This analysis has to be done individually for each element of the cell set in order to arrive at the line delay constraints. Presently, one can only specify a line as critical and have it routed along the shortest path; we are unable to ensure that the delay along one path is greater than the delay along another.

6. Making the Cell-Set Hazard-Free

By implementing a hazard-free cell set, one can ensure that asynchronous circuits designed using it will be hazard-free, provided that the set is used according to the constraints it imposes on the environment[6][8][16]. We have already analyzed the hazard behavior of the LUT. Individual elements can still be vulnerable to hazards due to bursts of input changes that are not separated by sufficient time for the circuit to become stable. This time can be large due to unfavorable routing. If the input-output mode of operation (i/o mode) is assumed, then new inputs can arrive as soon as an output is produced and need not wait until the entire circuit has stabilized. Assuming unfavorable routing, a new feedback value in a sequential circuit might be delayed longer than the new inputs, leading to a potential hazard. The following proposition constrains the delay on the feedback in order to eliminate hazard possibilities due to worst-case routing of the feedback line. Note that a self-timed element could occupy more than one CLB, as shown in Figure 5. If the fundamental mode of operation had been considered, then the new set of inputs could not arrive until the circuit is stable, i.e. after the new feedback value is available.

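[Figure 5: block diagram of a self-timed element occupying CLB 1 through CLB N (combinational logic), with its inputs, outputs and feedback line (FB), and CLB N+1 containing the OCIC logic.]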

Figure 5. A model to illustrate the feedback delay constraint

PROPOSITION 6. The delay on the feedback line of an element has to be less than or equal to the sum of the minimal delay in detecting the output change and producing a new input, and the minimal delay on the input line (Figure 5):

$$0 < \max(FB) \le \min(O) + \min(OCIC) + \min(I) \qquad (1)$$

where min and max are the minimum and maximum delays respectively, O is the output line, OCIC is the combinational logic that detects the output change and produces a new input, I is the input line and FB is the feedback line.

Proof. All the elements in the cell set are sequential elements and thus have feedback. The feedback, denoting the present state, has to be available and stable when the new inputs arrive in order to preserve hazard-free behavior. Because of the i/o mode of behavior, the inputs are not changed until the output change is detected, but they can be changed immediately after the detection. Therefore, the delay in the feedback line, max(FB), of an element (one or more CLBs) has to be less than or equal to the sum of the minimal delay in detecting an output change and producing new inputs, min(OCIC), the delay on the output line, min(O), and the smallest of the input line delays through which the new input arrives, min(I). On the other hand, the lower bound on the feedback delay is only constrained to be positive, because the feedback input to the element is changed only when a new output is produced and not before. Therefore, the i/o mode behavior is maintained and there is no need to delay the feedback input.

6.1 Hazard-free Implementation of a Muller C-Element

The Muller C-element is the quintessential asynchronous circuit element. In this section we discuss the implementation of the C-element based on the discussion above. The C-element is designed as shown in Figure 6. The inputs to the C-element, like those of all the other elements, have to be monotonic, i.e. two consecutive transitions are not allowed on one input line, and a transition on one input line has to be followed by a transition on the other input, unless the output changes following the first input transition. It is the only element in the cell set that allows multiple-input changes, and hence it could have function hazards. However, the C-element is function hazard-free, because it does not have transitions where an intermediate code produces a different result. Nonetheless, a large delay in the feedback line (as discussed before) or a hurried change in an input (while following i/o mode behavior) could result in hazardous behavior. This happens when the C-element detects an input and produces the corresponding output before it has seen the feedback value, as shown in Figure 6. In this example, the input B is changed before the new feedback value reaches the CLB. Therefore, the feedback line of the C-element should be routed so that Equation 1 is not violated, i.e. the feedback delay should be less than the sum of the delays to detect the output and change the input B (Figure 6).
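The stale-feedback scenario can be reproduced with a small discrete-time sketch. It is a toy model under stated assumptions, not a timing model of the CLB: the lookup itself takes no time, the feedback line delays the output by `fb_delay` time steps, and the environment lowers B exactly `env_delay` steps after A rises (standing in for min(O) + min(OCIC) + min(I)). Both parameter names are hypothetical.

```python
def majority(a, b, c):
    """C-element next-state function, C = AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)

def simulate_c_element(fb_delay, env_delay, steps=12):
    """Output at each time step; A rises at t = 1 while B is high, and B falls
    env_delay steps later. The feedback input sees the output fb_delay steps late."""
    assert fb_delay >= 1, "the feedback delay must be positive"
    out = []
    for t in range(steps):
        a = 1 if t >= 1 else 0
        b = 0 if t >= 1 + env_delay else 1
        fb = out[t - fb_delay] if t - fb_delay >= 0 else 0  # initial state is 0
        out.append(majority(a, b, fb))
    return out

def output_changes(trace):
    """Number of output transitions along the trace."""
    return sum(p != q for p, q in zip(trace, trace[1:]))

if __name__ == "__main__":
    print(simulate_c_element(fb_delay=2, env_delay=3))  # feedback arrives in time
    print(simulate_c_element(fb_delay=4, env_delay=2))  # B falls before feedback arrives
```

When the feedback delay does not exceed the environment delay, the output rises once and holds. When it does, the element still sees the old state as B falls, drops its output, and then keeps toggling as the delayed output values return on the feedback line.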
[Figure 6: a C-element mapped onto one XC4000 CLB with inputs A and B and the output fed back to the function generator, Y = G1G2 + G1G3 + G2G3, i.e. C = AB + AC + BC, together with its Karnaugh map over AB and the feedback value. The hazardous behavior is noticed when the input to the C-element is changed before the circuit has become stable to the new feedback value: A+ => Output+ => B- => Output- => Feedback+ => Output+ => Feedback- => Output- (toggles).]

max(FB) < min(min(A), min(B)) + min(C) + min(OCIC).

Figure 6. Hazard Behavior of a C-Element

6.2 Self-timed Cell Set

Using the criteria developed in the last section, we implemented the basic asynchronous macromodule set described in Sutherland[16] and Brunvand[6] in the Xilinx FPGA. The cells implemented include the Muller C-element, transition LATCHes, TOGGLE, generalized C-element, CALL element, SELECT, Q-SELECT, ring-style ARBITER, and SEQUENCER. Some performance results for each element are given in Table 1. This table also compares the Xilinx FPGA implementation with a CMOS 2 µm implementation[3]. The cell set is designed assuming a monotonic input discipline. The implementation assumes a two-phase transition signaling based handshake protocol, with the data being transmitted under the data-bundling assumption (Table 2). The transition latches also have an ordering of their input events. The feedback delay constraints of all the elements according to Equation 1 are listed in Table 3. If all these constraints are met when the macromodules are mapped onto the Xilinx FPGA, then the entire cell set will be hazard-free.


Asynchronous Macromodule    Xilinx FPGA # of CLBs    Xilinx FPGA Xdelay (ns)
C-Element                   1/2                      7.7
Latch                       1/2                      7.8
Toggle                      1                        7.6
Generalized C-Element       2                        10.9
Call                        2                        7.7
Select                      2                        12.1
Q-Select                    4                        46.0
Q-Select with Init          6                        46.8
Arbiter                     8                        40.8
Sequencer                   12                       40.8

2 µm Library (eight entries for the ten macromodules, in source order):
Area (sq. µm): 169x52, 176x99, 331x274, 299x113, 300x165, 308x208, 675x378, 365x328
Delay (ns): 2.146, 2.064, 3.38, 3.442, 1.406, 2.501, 3.711, 6.109

Table 1. Self-Timed Asynchronous Cell Set

Elements Transition Latches (transparent) (opaque) Select

Behavior Constraints max(D) < min(min(C), min (P)) max(C) < min(P) max(P) < min(C) max(SEL) < min(IN) Table 2. Behavior Constraints

Elements              Feedback Delay Constraints
C-Element             max(FB) < min(min(A), min(B)) + min(C) + min(OCIC)
Transition Latches    max(FB) < min(min(D), min(C), min(P)) + min(Q) + min(OCIC)
Toggle                max(FB) < min(IN) + min(min(OUT0), min(OUT1)) + min(OCIC)
Call                  max(FB) < min(min(R1), min(R2), min(As)) + min(A1) + min(OCIC)
Select                max(FB) < min(min(SEL), min(IN)) + min(min(OUTT), min(OUTF)) + min(OCIC)

Table 3. Feedback Delay Constraints for a Hazard-Free Cell-Set
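Every row of Table 3 has the same shape: the maximum feedback delay must be less than the sum of the smallest input minimum delay, the smallest output minimum delay, and the minimum OCIC delay. The following Python sketch checks that inequality for one element. The names `Delay` and `feedback_constraint_met` are hypothetical, and the delay figures are made up for illustration; they are not measurements from the paper or from any Xilinx timing report.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Delay:
    """Minimum and maximum delay of one line, in nanoseconds."""
    min_ns: float
    max_ns: float


def feedback_constraint_met(fb: Delay, inputs: Sequence[Delay],
                            outputs: Sequence[Delay], ocic: Delay) -> bool:
    """Check max(FB) < min(inputs) + min(outputs) + min(OCIC)."""
    bound = (min(d.min_ns for d in inputs)
             + min(d.min_ns for d in outputs)
             + ocic.min_ns)
    return fb.max_ns < bound


# Hypothetical delay figures for a C-element (inputs A, B; output C).
a = Delay(2.0, 4.0)
b = Delay(1.5, 3.5)
c = Delay(1.0, 2.0)
ocic = Delay(1.0, 1.5)

fast_fb = Delay(1.0, 3.0)  # max(FB) = 3.0 is below 1.5 + 1.0 + 1.0 = 3.5
slow_fb = Delay(1.0, 4.0)  # max(FB) = 4.0 is not below 3.5

print(feedback_constraint_met(fast_fb, [a, b], [c], ocic))
print(feedback_constraint_met(slow_fb, [a, b], [c], ocic))
```

For the Toggle and Select rows, which have two outputs, both output delays would be passed in `outputs`. A check of this kind would be run on the delays reported after placement and routing.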

7. Examples

The following sections give examples of using this cell set to build self-timed circuits.

7.1 Modulo Counters

The first example consists of three modulo counters: modulo-3, modulo-5, and modulo-7 (refer to Figure EX.1 in the Appendix). They are constructed using Toggle elements, which behave like counters, and XOR gates for initialization purposes. A Toggle element directs its input transitions to its outputs alternately, starting with OUT0 (the circled output). By connecting Toggle elements in a ring, counters of different moduli can be built. To ensure correct operation of the modulo counters, the circuit has to be initialized (when the maximum value is reached) in such a way that the next input transition to the circuit will be steered to the OUT0 output of every Toggle element in the ring. This is guaranteed by the XOR element, which causes a transition on OUT1 of chosen Toggle elements during initialization. For example, in the modulo-3 counter, the first Toggle element has to be initialized after counting up to three, because its final output transition is on OUT0 (when the count reaches three) and not on OUT1 as in the case of the second Toggle. Consequently, the second Toggle element does not have to be initialized after each counting cycle.

7.2 An Arbitration Circuit

This circuit was designed to test as many of the macromodules as possible and may have no significant practical use. It consists of a Call element, an Arbiter, and two Qselect elements, all communicating with each other (Figure EX.2 in the Appendix). The Arbiter and the Call element together choose between two requesters that wish to access a shared resource. One requester is chosen while the other awaits its turn. The shared resource in this case is a router circuit, which checks in round-robin fashion for data on either of its two input channels[6]. When data is available on any of its input streams, an acknowledgment is sent back to the user of the shared resource. The Qselinit prevents any new data from reaching itself or the Qselect once it has received its data.

Examples                         Xilinx FPGA
                                 # of CLBs    Xdelay (ns)
Modulo-3 Counter                 3            34.5
Modulo-5 Counter                 4            46.8
Modulo-7 Counter                 4            46.9
Arbiter Circuit                  21           171.6
4x4 Micropipelined Multiplier    75           184.5

Table 4. Performance of the example circuits

7.3 Protocol Converters

Different modules in a self-timed system may use different signaling conventions. To connect such modules together, protocol converters must be used as interfaces between two-phase and four-phase signaling modules. The four-phase protocol can use either the narrow or the broad convention; the examples we use are based on the narrow convention (Figures EX.3 and EX.4).

7.4 FIFO Buffer

This circuit is a 2-bit First-In First-Out (FIFO) buffer (Figure EX.5). There is a request-acknowledge interface at both the input and the output. A FIFO buffer accepts new data upon initialization. After that, a buffer can accept new data only after its successor buffer has accepted the data that was present at its output.

7.5 Distributed Mutual Exclusion

The arbiter module can be used to give a process exclusive access to a shared resource when more than one independent process is competing for that resource. Alternatively, the contending processes can be set up to arbitrate amongst themselves. This scheme is known as Distributed Mutual Exclusion[3] (Figure EX.6).
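The acceptance rule stated for the FIFO buffer in Section 7.4 can be illustrated with a small behavioural model. This is only a sketch and not the circuit of Figure EX.5: the class `FifoModel`, its methods, and the chosen depth are hypothetical, and the request-acknowledge signals and all delays are abstracted away. Each buffer holds at most one item and can take a new one only after its successor has taken the item it was holding.

```python
from typing import List, Optional


class FifoModel:
    """Chain of one-place buffers; stage 0 is the input, the last is the output."""

    def __init__(self, depth: int) -> None:
        # None marks an empty buffer, which is the state after initialization.
        self.stages: List[Optional[int]] = [None] * depth

    def push(self, item: int) -> bool:
        """Offer an item at the input; accepted only if the first buffer is empty."""
        if self.stages[0] is not None:
            return False
        self.stages[0] = item
        self._advance()
        return True

    def pop(self) -> Optional[int]:
        """Take the item at the output, if any, and let the others move up."""
        item = self.stages[-1]
        self.stages[-1] = None
        self._advance()
        return item

    def _advance(self) -> None:
        """Move each item forward while its successor buffer is empty."""
        moved = True
        while moved:
            moved = False
            for i in range(len(self.stages) - 2, -1, -1):
                if self.stages[i] is not None and self.stages[i + 1] is None:
                    self.stages[i + 1] = self.stages[i]
                    self.stages[i] = None
                    moved = True


fifo = FifoModel(depth=2)
print(fifo.push(1), fifo.push(2), fifo.push(3))  # the third push finds the input full
print(fifo.pop(), fifo.push(3), fifo.pop(), fifo.pop())
```

Items leave in the order they entered, and a push is refused exactly when the first buffer is still waiting for its successor to take its data.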


Examples                                    Xilinx FPGA                  2 µm Library
                                            # of CLBs    Xdelay (ns)*    Area (sq. µm)    Delay (ns)
Two-Phase to Four-Phase Narrow Converter    2            46.9            344x344          5.257
Four-Phase Narrow to Two-Phase Converter    2            34.1            300x165          5.513
FIFO Buffer                                 2            46.0            528x378          8.915
Distributed Mutual Exclusion                14           130.2           745x600          23.696

* The Xdelay values include input and output pad delays.

Table 5. Comparison with CMOS equivalent circuits

8. Key Contributions

We showed that circuits designed using LUTs are logic hazard-free, but could produce function hazards for multiple input changes.

We formulated a set of feedback delay constraints, one for each of the self-timed elements, that are necessary to achieve hazard-free behavior. These constraints have to be met when the macromodules are placed and routed.

9. Conclusion and Future Work

Implementing asynchronous circuits in a given technology demands a careful characterization of the hazard behavior in that technology. This paper examines the issues of mapping asynchronous cell libraries onto a Xilinx FPGA. A Xilinx FPGA is made up of look-up tables, which differ in their timing characteristics from simple gates such as AND, OR, and NAND. Hence, we first provide a survey of existing work on hazard analysis that is applicable to simple gate-level implementations. Then we characterize the behavior of a LUT, using the Xilinx 4000 family FPGA as an example, and present delay constraints that have to be satisfied to achieve a hazard-free implementation of an asynchronous cell library for control circuits. Of course, the proposed scheme is valid only if the circuit elements are used according to their environmental constraints. In addition, the scheme does not eliminate function hazards. The implementation of self-timed datapaths in an FPGA is currently being investigated.


REFERENCES

[1] D.B. Armstrong, A.D. Friedman and P.R. Menon, "Realization of Asynchronous Sequential Circuits Without Inserted Delay Elements," IEEE Transactions on Computers, C-17(2):129-134, Feb. 1968.
[2] J. Beister, "A Unified Approach to Combinational Hazards," IEEE Transactions on Computers, C-23(6):566-575, Jun. 1974.
[3] N. Birak, Implementation of Self-Timed CMOS Circuits, MS Thesis, Dept. of Electrical and Computer Engineering, Univ. of California, Davis, 1995.
[4] J.G. Bredeson, "Synthesis of multiple input-change hazard-free combinational switching circuits without feedback," International Journal of Electronics, 39(6):615-624, 1975.
[5] J.G. Bredeson and P.T. Hulina, "Elimination of Static and Dynamic Hazards for Multiple Input Changes in Combinational Switching Circuits," Information and Control, 20(2):114-124, Mar. 1972.
[6] E. Brunvand, "A Cell Set for Self-Timed Design using Actel FPGAs," Research Report UUCS-91-013, Dept. of Computer Science, Univ. of Utah, Aug. 1991.
[7] J.A. Brzozowski and J.C. Ebergen, "On the Delay-Sensitivity of Gate Networks," IEEE Transactions on Computers, 41(11):1349-1359, Nov. 1992.
[8] J.C. Ebergen, A Formal Approach to Designing Delay-Insensitive Circuits, Report, Dept. of Computing Science and Mathematics, Eindhoven University of Technology, May 1988.
[9] E.B. Eichelberger, "Hazard Detection in Combinational and Sequential Switching Circuits," IBM Journal of Research and Development, p.90-99, Mar. 1965.
[10] A.D. Friedman, Fundamentals of Logic Design and Switching Theory, Madison: Computer Science Press, 1986, p.161-219.
[11] S. Hauck, S. Burns, G. Borriello and C. Ebeling, An FPGA for Implementing Asynchronous Circuits, Report, Dept. of Computer Science and Engineering, Univ. of Washington.
[12] L. Lavagno and A. Sangiovanni-Vincentelli, Algorithms for Synthesis and Testing of Asynchronous Circuits, Massachusetts: Kluwer Academic Publishers, 1993.
[13] S.M. Nowick and D.L. Dill, "Exact Two-Level Minimization of Hazard-Free Logic with Multiple-Input Changes," IEEE International Conference on Computer-Aided Design, p.626-30, 1992.
[14] C.H. Roth, Jr., Fundamentals of Logic Design, Third Edition, Minnesota: West Publishing Company, 1985.
[15] P. Siegel, G.D. Micheli and D. Dill, "Automatic Technology Mapping for Generalized Fundamental-Mode Asynchronous Designs," Technical Report CSL-TR-93-580, Computer Systems Laboratory, Stanford University, Jun. 1993.
[16] I.E. Sutherland, "Micropipelines," Communications of the ACM, 32(6):720-58, June 1989.
[17] S. Trimberger, "Beyond Logic - FPGAs for Digital Systems," FPGAs: Edited from the Oxford 1991 International Workshop on Field Programmable Logic and Applications, Abingdon, England: Abingdon EE&CS Books, 1991.
[18] J-M. Vuillamy, Z.G. Vranesic and J. Rose, "Performance Evaluation and Enhancement of FPGAs," FPGAs, Abingdon, England: Abingdon EE&CS, 1991.
[19] Xilinx Inc., The XC4000 Data Book, p.2-9,10; 2-70,71; 9-5, 1993.
[20] S.H. Unger, Asynchronous Sequential Switching Circuits, New York: Wiley-Interscience, 1969.
[21] Private Communication from Xilinx.

