High Performance Single Clock Cycle CMOS Comparator

Hing-mo Lam, Chi-ying Tsui
Department of Electronic and Electrical Engineering
The Hong Kong University of Science and Technology
Hong Kong, SAR, PRC
eejeff@ee.ust.hk, eetsui@ee.ust.hk

Abstract: In this paper, a novel comparison algorithm is introduced, which uses a parallel MSB checking method instead of the traditional priority-encoding based comparison algorithm. By doing so, fast dynamic NOR gates are used instead of high-fan-in NAND gates, and this results in a significant improvement in performance. A test chip has been built in AMS 0.35 µm technology, and both post-layout simulation and test chip measurement results show that the proposed design is 22% faster than the existing fastest single-cycle comparator, which is based on a priority encoder.

I. INTRODUCTION

A high-speed comparator is a fundamental computation element for most digital systems, such as state-of-the-art microprocessor and DSP designs. The simplest way to implement a comparator is to use an adder. In that case the adder performance becomes the major limitation on the comparator operating speed, and a lot of transistors are needed if a very high speed adder is used. Wang et al. [2] proposed using a tree structure with all-n-transistor (ANT) dynamic CMOS logic to build a fast comparator. Heavy pipelining is used in this design and it can achieve a very fast clock speed. However, due to the pipelining, each comparison needs three clock cycles to finish. For applications that need single-cycle comparison, this design may not be suitable. Recently Huang et al. proposed a single-cycle comparator based on the priority-encoding algorithm and a dynamic circuit design technique [1]. It was shown to be 16% faster than the design in [2] when measured by the total execution delay. The core element of the computation is the priority encoder, which is based on a dynamic NAND-gate design [3].
For an 8-bit comparator the longest discharge path can be up to seven MOS transistors. The long discharge path of the large-fan-in dynamic NAND gate becomes the performance bottleneck of the comparator. In this work, we propose a parallel MSB comparison algorithm which does not need a priority encoder. The algorithm facilitates the use of NOR-gate logic in the implementation and hence results in high performance when dynamic logic is used.

This research is in part supported by the Hong Kong Research Grants Council under grant CERG HKUST296 048.
0-7803-9390-2/06/$20.00 ©2006 IEEE

II. EXISTING DESIGN BASED ON PRIORITY ENCODER [1]

The comparison algorithm used in [1] is very straightforward. Figure 1 shows the algorithm. Here we describe only the steps that check which number is greater. In the first step, XOR gates are used to determine whether each pair of corresponding bits of the two numbers is equal or not. In the second step, a priority encoder [3] is used to set the most significant unequal bit of the result from step 1 to '1' and reset all other bits to '0'. In the third step, the result of step 2 is ANDed with the two input numbers A and B. In the final step, all the bits of each result of step 3 are ORed together to determine which number is greater. Figure 2 shows the implementation of the 8-bit comparator (this figure is reproduced from [1]), which is constructed by cascading two 4-bit comparators because of the long discharge path in the priority encoder. It can be seen that a 7-input dynamic NAND gate is needed for the 8-bit comparator, and this affects the performance.

III. THE PROPOSED ALGORITHM

The proposed algorithm uses a parallel MSB checking method instead of priority encoding to determine the location of the most significant bit at which the two inputs differ. This method facilitates the use of NOR-type logic gates and results in faster speed for a dynamic logic implementation.
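For contrast with the proposed method, the four steps of the priority-encoding algorithm of [1] described above can be modelled at the bit level. The following is only a behavioural sketch in Python, not the circuit of [1]; the function names `priority_encode` and `compare_priority_encoder` and the `width` parameter are hypothetical and introduced here for illustration.

```python
from typing import Tuple


def priority_encode(x: int, width: int) -> int:
    """Step 2: keep only the most significant 1-bit of x (0 if x is 0)."""
    x &= (1 << width) - 1
    if x == 0:
        return 0
    return 1 << (x.bit_length() - 1)


def compare_priority_encoder(a: int, b: int, width: int = 8) -> Tuple[bool, bool]:
    """Return (a_greater, b_greater) for two unsigned width-bit inputs."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    unequal = a ^ b                         # step 1: XOR marks unequal bits
    msb = priority_encode(unequal, width)   # step 2: one-hot most significant unequal bit
    a_sel = msb & a                         # step 3: AND with each input
    b_sel = msb & b
    return a_sel != 0, b_sel != 0           # step 4: OR all bits together
```

In this model the sequential search for the most significant unequal bit is hidden inside `priority_encode`; in hardware that search is what requires the high-fan-in dynamic NAND gate.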
Figure 3 shows a 4-bit example of the proposed algorithm (the structure of the proposed design is very easy to expand from 4 bits to more bits without any cascading of comparators). Let A and B be two 4-bit binary input numbers with values 4'b1010 and 4'b1001, respectively. The proposed comparison algorithm is divided into 4 steps. In the first step, we use AND logic gates to compute both $A\cdot\overline{B}$ and $\overline{A}\cdot B$. Unlike the algorithm in [1], which uses XOR logic gates to find the bit locations where A and B are different, we also keep the information of which number is larger at each particular bit location. For example, $A\cdot\overline{B}$ = 4'b0010 indicates that at bit location 1 A is larger than B, and that A is smaller than or equal to B at the other bit locations.

Figure 1. The diagram of the algorithm in [1].
Figure 2. The schematic of the 8-bit comparator in [1].
Figure 3. The diagram of the proposed algorithm.

ISCAS 2006

In the second step, we do a data conversion (from $A\cdot\overline{B}$ to $A^{*}$ and from $\overline{A}\cdot B$ to $B^{*}$) to determine the most significant bit that is a '1' in the result of step 1. Unlike the priority encoder, which sets the most significant 1-bit to '1' and resets all the other bits to '0', we set all the bits preceding the most significant 1-bit (not including the most significant 1-bit itself) to '1' and reset all the other bits to '0'. For example, in Figure 3, $A^{*}$ and $B^{*}$ are 4'b1100 and 4'b1110, respectively. By doing so the implementation can be done using NOR-type dynamic logic. We will discuss this in the next section. The preceding 1's indicate, starting from the MSB, the bit locations at which the corresponding number is smaller than or equal to the other number. Note that $A^{*}$ and $B^{*}$ will not have the same running length of 0's (when A and B differ). In step 3, we calculate $A^{*}\cdot\overline{B^{*}}$ and $\overline{A^{*}}\cdot B^{*}$.
If $A^{*}$ has a longer running length of zeros, $A^{*}\cdot\overline{B^{*}}$ will be all zero and $\overline{A^{*}}\cdot B^{*}$ will have some bits equal to 1, and vice versa. For example, in Figure 3, the outputs of $A^{*}\cdot\overline{B^{*}}$ and $\overline{A^{*}}\cdot B^{*}$ are 4'b0000 and 4'b0010, respectively. In step 4, we check whether each result of step 3 is an all-zero vector or not by ORing all of its bits together. An all-zero vector for one input means that the other input is the greater one. In Figure 3, after step 4, we know that A is larger than B, since the step-3 vector corresponding to B is all zero whereas the one corresponding to A is not.

IV. IMPLEMENTATION OF THE PROPOSED ALGORITHM

Figure 4 shows the block diagram of an 8-bit comparator based on the proposed algorithm introduced in the previous section.

Figure 4. The block diagram of the proposed 8-bit comparator.

We use static pass-gate logic to implement the AND logic in step 1. Figure 5 (b) shows the schematic of the AND logic gate. Figure 6 shows the design of the data converter used in step 2. Basically it is just a set of dynamic NOR gates with the number of inputs ranging from 1 to 8. This is also the main difference between our design and the design in [1], where a dynamic NAND gate is used to implement the priority encoder. Our approach can be scaled up easily since NOR gates are used, whereas for the design in [1] cascaded 4-bit comparators need to be used to implement the 8-bit comparator in order to limit the fan-in of the NAND gate. In Figure 6, the outputs ($A^{*}_{7}$ down to $A^{*}_{0}$) of the dynamic NOR gates are precharged to '1' at the beginning. Depending on the location of the most significant 1-bit where A is larger than B, the preceding bits will remain at '1' and all the remaining bits, including the most significant 1-bit itself, will be discharged to '0'. The same holds for the outputs $B^{*}_{7}$ down to $B^{*}_{0}$. For step 3, we use a dynamic 2-input AND gate (as shown in Figure 7).
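The four steps of the proposed algorithm, including the NOR-based data conversion of step 2, can be modelled at the bit level as follows. This is a behavioural sketch in Python under stated assumptions, not the authors' circuit: the names `data_convert` and `compare_parallel_msb` and the `width` parameter are hypothetical, and each output bit of the converter is assumed to be the NOR of the step-1 bits at that position and above, as the description of Figure 6 suggests.

```python
from typing import Tuple


def data_convert(x: int, width: int) -> int:
    """Step 2: output bit i is NOR(x[width-1], ..., x[i]).

    Bits above the most significant 1 of x become 1; that bit and all
    lower bits become 0. An all-zero x yields all ones.
    """
    x &= (1 << width) - 1
    out = 0
    for i in range(width):
        if (x >> i) == 0:           # no 1 at position i or above
            out |= 1 << i
    return out


def compare_parallel_msb(a: int, b: int, width: int = 8) -> Tuple[bool, bool]:
    """Return (a_greater, b_greater) for two unsigned width-bit inputs."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    a_and_not_b = a & ~b & mask                 # step 1: bits where A is larger
    not_a_and_b = ~a & b & mask                 # step 1: bits where B is larger
    a_star = data_convert(a_and_not_b, width)   # step 2
    b_star = data_convert(not_a_and_b, width)
    b_side = a_star & ~b_star & mask            # step 3: nonzero only if B is greater
    a_side = ~a_star & b_star & mask            # step 3: nonzero only if A is greater
    return a_side != 0, b_side != 0             # step 4: OR all bits together
```

For the Figure 3 example, `compare_parallel_msb(0b1010, 0b1001, width=4)` returns `(True, False)`, with intermediate values `a_star = 0b1100` and `b_star = 0b1110`. Equal inputs give `(False, False)`, because both converted vectors are then all ones. Every bit of `data_convert` depends only on the step-1 result, so all bits can be evaluated in parallel.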
Cascading the n-type NOR gate in step 2 with the n-type AND gate in step 3 may pose a race issue because of the non-inverting outputs from the dynamic n-type NOR gate. However, since the path through the inverter (i.e. the $A^{*}$ input for $\overline{A^{*}}\cdot B^{*}$) always arrives later than the $B^{*}$ input, the race problem will not occur. Figure 9 (b) shows an example of the $A^{*}$ and $B^{*}$ inputs for $\overline{A^{*}}\cdot B^{*}$. We verified this through HSPICE simulation and the results are shown in Figure 9 (a). The worst-case timing scenario, i.e. the $A^{*}$ output is the fastest (with seven pull-down paths on) and the $B^{*}$ output is the latest (with only one pull-down path on), was used for the simulation. It can be seen that the $B^{*}$ input to the AND gate goes to 0 much earlier than $\overline{A^{*}}$ goes to 1, and hence there is no race issue. Finally, step 4 is implemented by a dynamic OR gate, which is shown in Figure 8.

To implement a high-fan-in comparator, say a 64-bit comparator, we use a hierarchical approach similar to that used in [1]. The 64-bit comparator is broken down into 2 stages, and each stage is implemented using the 8-bit comparator as the basic component. Figure 10 shows the block diagram of the implementation of the 64-bit comparator. To decrease the clock cycle time, a phase pipelining approach similar to [1] is used. The first stage of the comparator is precharged in the negative clock phase and evaluated in the positive clock phase. The second stage is precharged in the positive clock phase and evaluated in the negative clock phase. To make sure the inputs to the dynamic gates are stable when the evaluation clock phase is active, the output of the static AND gate of the 8-bit comparator should be stable at the clock edge, and the delay of the AND gate can be treated as the setup time of the previous stage of the pipeline. For a clock with a 50-50 duty cycle, the delay of the dynamic gates in the 8-bit comparator and the delay of the static AND gate of the comparator at the next stage (treated as the setup time of the latch) dictate the cycle time of the clock.

Figure 5. The schematic of the AND logic in step 1.
Figure 6. The schematic of the data converter in step 2.
Figure 7. The schematic of step 3 in the proposed algorithm.
Figure 8. The schematic of step 4 in [1] and the proposed algorithm.

V. SIMULATION AND MEASUREMENT RESULTS

Two 64-bit comparators were implemented in AMS 0.35 µm process technology, one based on the proposed design and the other based on the design in [1]. The layouts are shown in Figure 11. Post-layout simulation was carried out using HSPICE with the typical simulation models of PMOS and NMOS. Table 1 summarizes the area, number of transistors and performance of the two comparators. A supply voltage of 3.3 V is used. To optimize the delay for both designs, the transistor sizing of each design is different. The transistor size along the critical path (M0 ~ M11) for the design in [1] (Figure 2) is 10 µm/0.35 µm, and the transistor size for the NOR gate (Figure 6) in the proposed design is 2.65 µm/0.35 µm. For a fair comparison of the area, the unused area due to the different placement and routing of the two designs should not be counted. Hence, in Table 1 we use the layout area of an 8-bit comparator instead of the 64-bit comparator for the area comparison. From Table 1, we can see that the total post-layout simulation delay of the design in [1] is 2.20 ns and that of the proposed design is 1.71 ns, which means a 22% improvement in performance.

Figure 9. SPICE simulation result for the race issue.

The test chip containing the proposed design and the design in [1] has been fabricated, and the die photo is shown in Figure 12. In order to save I/O pads, we connect all input signals that change together to a single input pad.
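As a quick arithmetic check of the post-layout figure quoted above, assuming the improvement is measured as the relative reduction in total delay with the design in [1] as the baseline:

```latex
\frac{2.20\,\text{ns} - 1.71\,\text{ns}}{2.20\,\text{ns}} = \frac{0.49}{2.20} \approx 0.223
```

This is consistent with the reported 22% improvement.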
Figure 13 shows the measured waveforms of the design in [1] and the proposed design. The measured clock signals are the outputs of the clock buffers. Table 1 summarizes the measurement results, which show that the total delay of the design in [1] is 2.793 ns and that of the proposed design is 2.165 ns, which means a 22% improvement in performance.

VI. CONCLUSION

A novel comparison algorithm has been presented in this paper, which uses a parallel MSB checking method instead of the priority-encoder based design. This facilitates the use of NOR-type gates and results in higher performance. Both HSPICE simulation and measurement results show that the proposed design is around 22% faster than the current fastest single-cycle design in [1].

Figure 10. The block diagram of the 64-bit comparator.
Figure 11. Layouts of the proposed design and the design in [1].
Figure 12. Photograph of the fabricated chip.
Figure 13. (a) Measured waveforms of the design in [1]. (b) Measured waveforms of the proposed design.

REFERENCES

[1] Chung-Hsun Huang and Jinn-Shyan Wang, "High-performance and power-efficient CMOS comparator", IEEE J. Solid-State Circuits, vol. 38, pp. 254-262, Feb. 2003.
[2] C.-C. Wang, C.-F. Wu, and K.-C. Tsai, "1 GHz 64-bit high-speed comparator using ANT dynamic logic with two-phase clocking", IEE Proceedings - Computers and Digital Techniques, vol. 145, issue 6, pp. 433-436, Nov. 1998.
[3] J.-S. Wang and C.-H. Huang, "High-speed and low-power CMOS priority encoder", IEEE J. Solid-State Circuits, vol. 35, pp. 1511-1514, Oct. 2000.
