
Contents

Preface xi

About the Author xiii

Part I Original Contributions

Devices and Circuits for Phase-Locked Systems 3


B. Razavi

Delay-Locked Loops—An Overview 13


C-K. Ken Yang

Delta-Sigma Fractional-N Phase-Locked Loops 23


I. Galton

Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems 34
R. C. Walker

Predicting the Phase Noise and Jitter of PLL-Based Frequency Synthesizers 46

K. S. Kundert

Part II Devices

Physics-Based Closed-Form Inductance Expression for Compact Modeling of Integrated Spiral Inductors 73
S. Jenei, B. K. J. C. Nauwelaers, and S. Decoutere (IEEE Journal of Solid-State Circuits, January 2002)
The Modeling, Characterization, and Design of Monolithic Inductors for Silicon RF IC's 77
J. R. Long and M. A. Copeland (IEEE Journal of Solid-State Circuits, March 1997)

Analysis, Design, and Optimization of Spiral Inductors and Transformers for Si RF IC's 89
A. M. Niknejad and R. G. Meyer (IEEE Journal of Solid-State Circuits, October 1998)

Stacked Inductors and Transformers in CMOS Technology 101


A. Zolfaghari, A. Chan, and B. Razavi (IEEE Journal of Solid-State Circuits, April 2001)

Estimation Methods for Quality Factors of Inductors Fabricated in Silicon Integrated Circuit
Process Technologies 110
K. O (IEEE Journal of Solid-State Circuits, August 1998)

A Q-Factor Enhancement Technique for MMIC Inductors 114


M. Danesh, J. R. Long, R. A. Hadaway, and D. L. Harame (Dig. IEEE Radio Frequency Integrated Circuits
Symposium, April 1998)

On-Chip Spiral Inductors with Patterned Ground Shields for Si-Based RF IC's 118
C. Patrick Yue and S. S. Wong (IEEE Journal of Solid-State Circuits, May 1998)
The Effects of a Ground Shield on the Characteristics and Performance of Spiral Inductors 127
S.-M. Yim, T. Chen, and K. O (IEEE Journal of Solid-State Circuits, February 2002)

Temperature Dependence of Q and Inductance in Spiral Inductors Fabricated in
a Silicon-Germanium/BiCMOS Technology 135
R. Groves, D. L. Harame, and D. Jadus (IEEE Journal of Solid-State Circuits, September 1997)

Substrate Noise Coupling Through Planar Spiral Inductor 140


A. L. Pun, T. Yeung, J. Lau, F. J. R. Clement, and D. K. Su (IEEE Journal of Solid-State Circuits, June 1998)

Design of High-Q Varactors for Low-Power Wireless Applications Using a Standard CMOS Process 148
A.-S. Porret, T. Melly, C. C. Enz, and E. A. Vittoz (IEEE Journal of Solid-State Circuits, March 2000)

On the Use of MOS Varactors in RF VCO's 157

P. Andreani and S. Mattisson (IEEE Journal of Solid-State Circuits, June 2000)

Part III Phase Noise and Jitter

Low-Noise Voltage-Controlled Oscillators Using Enhanced LC-Tanks 165


J. Craninckx and M. Steyaert (IEEE Transactions on Circuits and Systems-II, December 1995)
A Study of Phase Noise in CMOS Oscillators 176
B. Razavi (IEEE Journal of Solid-State Circuits, March 1996)

A General Theory of Phase Noise in Electrical Oscillators 189


A. Hajimiri and T. H. Lee (IEEE Journal of Solid-State Circuits, February 1998)

Physical Processes of Phase Noise in Differential LC Oscillators 205


J. J. Rael and A. A. Abidi (IEEE Custom Integrated Circuits Conference, May 2000)

Phase Noise in LC Oscillators 209


K. A. Kouznetsov and R. G. Meyer (IEEE Journal of Solid-State Circuits, August 2000)

The Effect of Varactor Nonlinearity on the Phase Noise of Completely Integrated VCOs 214
J. W. M. Rogers, J. A. Macedo, and C. Plett (IEEE Journal of Solid-State Circuits, September 2000)

Jitter in Ring Oscillators 221


J. A. McNeill (IEEE Journal of Solid-State Circuits, June 1997)

Jitter and Phase Noise in Ring Oscillators 231


A. Hajimiri, S. Limotyrakis, and T. H. Lee (IEEE Journal of Solid-State Circuits, June 1999)

A Study of Oscillator Jitter Due to Supply and Substrate Noise 246


F. Herzel and B. Razavi (IEEE Transactions on Circuits and Systems-II, January 1999)

Measurements and Analysis of PLL Jitter Caused by Digital Switching Noise 253
P. Larsson (IEEE Journal of Solid-State Circuits, July 2001)

On-Chip Measurement of the Jitter Transfer Function of Charge-Pump Phase-Locked Loops 260

B. R. Veillette and G. W. Roberts (IEEE Journal of Solid-State Circuits, March 1998)

Part IV Building Blocks

A Low-Noise, Low-Power VCO with Automatic Amplitude Control for Wireless Applications 271
M. A. Margarit, J. L. Tham, R. G. Meyer, and M. J. Deen (IEEE Journal of Solid-State Circuits, June 1999)
A Fully Integrated VCO at 2 GHz 282
M. Zannoth, B. Kolb, J. Fenk, and R. Weigel (IEEE Journal of Solid-State Circuits, December 1998)

Tail Current Noise Suppression in RF CMOS VCOs 287
P. Andreani and H. Sjöland (IEEE Journal of Solid-State Circuits, March 2002)

Low-Power Low-Phase-Noise Differentially Tuned Quadrature VCO Design in Standard CMOS 294
M. Tiebout (IEEE Journal of Solid-State Circuits, July 2001)

Analysis and Design of an Optimally Coupled 5-GHz Quadrature LC Oscillator 301


J. van der Tang, P. van de Ven, D. Kasperkovitz, and A. van Roermund (IEEE Journal of Solid-State Circuits,
May 2002)

A 1.57-GHz Fully Integrated Very Low-Phase-Noise Quadrature VCO 306


P. Vancorenland and M. S. J. Steyaert (IEEE Journal of Solid-State Circuits, May 2002)

A Low-Phase-Noise 5GHz Quadrature CMOS VCO Using Common-Mode Inductive Coupling 310
S. L. J. Gierkink, S. Levantino, R. C. Frye, and V. Boccuzzi (European Solid-State Circuits Conference,
September 2002)

An Integrated 10/5GHz Injection-Locked Quadrature LC VCO in a 0.18-µm Digital CMOS Process 314
A. Ravi, K. Soumyanath, L. R. Carley, and R. Bishop (European Solid-State Circuits Conference, September 2002)

Rotary Traveling-Wave Oscillator Arrays: A New Clock Technology 318


J. Wood and S. Lipa (IEEE Journal of Solid-State Circuits, November 2001)

35-GHz Static and 48-GHz Dynamic Frequency Divider IC's Using 0.2-µm AlGaAs/GaAs-HEMT's 330
Z. Lao, W. Bronner, A. Thiede, M. Schlechtweg, A. Hulsmann, M. Rieger-Motzer, G. Kaufel, B. Raynor, and M. Sedler
(IEEE Journal of Solid-State Circuits, October 1997)

Superharmonic Injection-Locked Frequency Dividers 337


H. R. Rategh and T. H. Lee (IEEE Journal of Solid-State Circuits, June 1999)

A Family of Low-Power Truly Modular Programmable Dividers in Standard 0.35-µm CMOS Technology 346
C. S. Vaucher, I. Ferencic, M. Locher, S. Sedvallson, U. Voegeli, and Z. Wang (IEEE Journal of Solid-State Circuits,
July 2000)

A 1.75-GHz/3-V Dual-Modulus Divide-by-128/129 Prescaler in 0.7-µm CMOS 353


J. Craninckx and M. S. J. Steyaert (IEEE Journal of Solid-State Circuits, July 1996)

A 1.2 GHz CMOS Dual-Modulus Prescaler Using New Dynamic D-Type Flip-Flops 361
B. Chang, J. Park, and W. Kim (IEEE Journal of Solid-State Circuits, May 1996)

High-Speed Architecture for a Programmable Frequency Divider and a Dual-Modulus Prescaler 365
P. Larsson (IEEE Journal of Solid-State Circuits, May 1996)

A 1.6-GHz Dual Modulus Prescaler Using the Extended True-Single-Phase-Clock CMOS Circuit
Technique (E-TSPC) 370
J. N. Soares, Jr. and W. A. M. Van Noije (IEEE Journal of Solid-State Circuits, January 1999)

A Simple Precharged CMOS Phase Frequency Detector 376

H. O. Johansson (IEEE Journal of Solid-State Circuits, February 1998)

Part V Clock Generation by PLLs and DLLs

A 320 MHz, 1.5 mW @ 1.35 V CMOS PLL for Microprocessor Clock Generation 383
V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra (IEEE Journal of Solid-State Circuits, Nov. 1996)
A Low Jitter 0.3-165 MHz CMOS PLL Frequency Synthesizer for 3 V/5 V Operation 391
H. C. Yang, L. K. Lee, and R. S. Co (IEEE Journal of Solid-State Circuits, April 1997)

Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques 396
J. G. Maneatis (IEEE Journal of Solid-State Circuits, Nov. 1996)

A Low-Jitter PLL Clock Generator for Microprocessors with Lock Range of 340-612 MHz 406
D. W. Boerstler (IEEE Journal of Solid-State Circuits, April 1999)

A 960-Mb/s/pin Interface for Skew-Tolerant Bus Using Low Jitter PLL 413
S. Kim, K. Lee, Y. Moon, D.-K. Jeong, Y. Choi, and H. K. Lim (IEEE Journal of Solid-State Circuits, May 1997)

Active GHz Clock Network Using Distributed PLLs 422


V. Gutnik and A. P. Chandrakasan (IEEE Journal of Solid-State Circuits, Nov. 2000)

A Low-Noise Fast-Lock Phase-Locked Loop with Adaptive Bandwidth Control 430


J. Lee and B. Kim (IEEE Journal of Solid-State Circuits, August 2000)

A Low-Jitter 125-1250-MHz Process-Independent and Ripple-Poleless 0.18-µm CMOS PLL Based on a
Sample-Reset Loop Filter 439
A. Maxim, B. Scott, E. M. Schneider, M. L. Hagge, S. Chacko, and D. Stiurca (IEEE Journal of Solid-State Circuits,
Nov. 2001)

A Dual-Loop Delay-Locked Loop Using Multiple Voltage-Controlled Delay Lines 449


Y.-J. Jung, S.-W. Lee, D. Shim, W. Kim, and C. Kim (IEEE Journal of Solid-State Circuits, May 2001)

An All-Analog Multiphase Delay-Locked Loop Using a Replica Delay Line for Wide-Range Operation
and Low-Jitter Performance 456
Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim (IEEE Journal of Solid-State Circuits, March 2000)

A Semidigital Dual Delay-Locked Loop 464


S. Sidiropoulos and M. A. Horowitz (IEEE Journal of Solid-State Circuits, Nov. 1997)

A Wide-Range Delay-Locked Loop with a Fixed Latency of One Clock Cycle 474
H.-H. Chang, J.-W. Lin, C.-Y. Yang, and S.-I. Liu (IEEE Journal of Solid-State Circuits, August 2002)

A Portable Digital DLL for High-Speed CMOS Interface Circuits 481


B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chan, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark,
Y.-F. Chan, T. H. Lee, and M. A. Horowitz (IEEE Journal of Solid-State Circuits, May 1999)

CMOS DLL-Based 2-V 3.2-ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated
Tunable Oscillator 493
D. J. Foley and M. P. Flynn (IEEE Journal of Solid-State Circuits, March 2001)

A 1.5 V 86 mW/ch 8-Channel 622-3125-Mb/s/ch CMOS SerDes Macrocell with Selectable Mux/Demux Ratio 499
F. Yang, J. O'Neill, P. Larsson, D. Inglis, and J. Othmer (Dig. International Solid-State Circuits Conference, Feb. 2002)

A Register-Controlled Symmetrical DLL for Double-Data-Rate DRAM 502


F. Lin, J. Miller, A. Schoenfeld, M. Ma, and R. J. Baker (IEEE Journal of Solid-State Circuits, April 1999)

A Low-Jitter Wide-Range Skew-Calibrated Dual-Loop DLL Using Antifuse Circuitry for High-Speed DRAM 506
S. J. Kim, S. H. Hong, J.-K. Wee, J. H. Cho, P. S. Lee, J. H. Ahn, and J. Y. Chung (IEEE Journal of Solid-State Circuits,
June 2002)

Part VI RF Synthesis

An Adaptive PLL Tuning System Architecture Combining High Spectral Purity and Fast Settling Time 517
C. S. Vaucher (IEEE Journal of Solid-State Circuits, April 2000)

A 2-V 900-MHz Monolithic CMOS Dual-Loop Frequency Synthesizer for GSM Receivers 530
W. S. T. Yan and H. C. Luong (IEEE Journal of Solid-State Circuits, Feb. 2001)

A CMOS Frequency Synthesizer with an Injection-Locked Frequency Divider for a 5-GHz Wireless
LAN Receiver 543
H. R. Rategh, H. Samavati, and T. H. Lee (IEEE Journal of Solid-State Circuits, May 2000)

A 2.6-GHz/5.2-GHz Frequency Synthesizer in 0.4-µm CMOS Technology 551


C. Lam and B. Razavi (IEEE Journal of Solid-State Circuits, May 2000)

Fast Switching Frequency Synthesizer with a Discriminator-Aided Phase Detector 558


C.-Y. Yang and S.-I. Liu (IEEE Journal of Solid-State Circuits, Oct. 2000)

Low-Power Dividerless Frequency Synthesis Using Aperture Phase Detection 566


A. R. Shahani, D. K. Shaeffer, S. S. Mohan, H. Samavati, H. R. Rategh, M. del M. Hershenson, M. Xu, C. P. Yue,
D. J. Eddleman, M. A. Horowitz, and T. H. Lee (IEEE Journal of Solid-State Circuits, Dec. 1998)

A Stabilization Technique for Phase-Locked Frequency Synthesizers 574


T.-C. Lee and B. Razavi (Dig. Symposium on VLSI Circuits, June 2001)

A Modeling Approach for Σ-Δ Fractional-N Frequency Synthesizers Allowing Straightforward Noise Analysis 578
M. H. Perrott, M. D. Trott, and C. G. Sodini (IEEE Journal of Solid-State Circuits, Aug. 2002)

A Fully Integrated CMOS Frequency Synthesizer with Charge-Averaging Charge Pump and Dual-Path
Loop Filter for PCS- and Cellular-CDMA Wireless Systems 589
Y. Koo, H. Huh, Y. Cho, J. Lee, J. Park, K. Lee, D.-K. Jeong, and W. Kim (IEEE Journal of Solid-State Circuits,
May 2002)

A 1.1-GHz CMOS Fractional-N Frequency Synthesizer With a 3-b Third-Order Σ-Δ Modulator 596
W. Rhee, B.-S. Song, and A. Ali (IEEE Journal of Solid-State Circuits, Oct. 2000)

A 1.8-GHz Self-Calibrated Phase-Locked Loop with Precise I/Q Matching 603


C.-H. Park, O. Kim, and B. Kim (IEEE Journal of Solid-State Circuits, May 2001)

A 27-mW CMOS Fractional-N Synthesizer Using Digital Compensation for 2.5-Mb/s GFSK Modulation 610
M. H. Perrott, T. L. Tewksbury III, and C. G. Sodini (IEEE Journal of Solid-State Circuits, Dec. 1997)

A CMOS Monolithic ΣΔ-Controlled Fractional-N Frequency Synthesizer for DCS-1800 622

B. De Muer and M. S. J. Steyaert (IEEE Journal of Solid-State Circuits, July 2002)

Part VII Clock and Data Recovery

A 2.5-Gb/s Clock and Data Recovery IC with Tunable Jitter Characteristics for Use in LAN's and WAN's 635
K. Kishine, N. Ishihara, K. Takiguchi, and H. Ichino (IEEE Journal of Solid-State Circuits, June 1999)
Clock/Data Recovery PLL Using Half-Frequency Clock 643
M. Rau, T. Oberst, R. Lares, A. Rothermel, R. Schweer, and N. Menoux (IEEE Journal of Solid-State Circuits,
July 1997)

A 0.5-µm CMOS 4.0-Gbit/s Serial Link Transceiver with Data Recovery Using Oversampling 647
C.-K. K. Yang, R. Farjad-Rad, and M. A. Horowitz (IEEE Journal of Solid-State Circuits, May 1998)

A 2-1600-MHz CMOS Clock Recovery PLL with Low-Vdd Capability 656
P. Larsson (IEEE Journal of Solid-State Circuits, Dec. 1999)

SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application 666
Y. M. Greshishchev and P. Schvan (IEEE Journal of Solid-State Circuits, Sept. 2000)

A Fully Integrated SiGe Receiver IC for 10-Gb/s Data Rate 673


Y. M. Greshishchev, P. Schvan, J. L. Showell, M.-L. Xu, J. J. Ojha, and J. E. Rogers (IEEE Journal of Solid-State
Circuits, Dec. 2000)

A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector 681
J. Savoj and B. Razavi (IEEE Journal of Solid-State Circuits, May 2001)

A 10-Gb/s CMOS Clock and Data Recovery Circuit with Frequency Detection 688
J. Savoj and B. Razavi (Dig. International Solid-State Circuits Conference, Feb. 2001)

A 10-Gb/s CDR/DEMUX with LC Delay Line VCO in 0.18-µm CMOS 691


J. E. Rogers and J. R. Long (Dig. International Solid-State Circuits Conference, Feb. 2002)

A 40-Gb/s Integrated Clock and Data Recovery Circuit in a 50-GHz fT Silicon Bipolar Technology 694
M. Wurzer, J. Bock, H. Knapp, W. Zirwas, F. Schumann, and A. Felder (IEEE Journal of Solid-State Circuits,
Sept. 1999)

A Fully Integrated 40-Gb/s Clock and Data Recovery IC With 1:4 DEMUX in SiGe Technology 699
M. Reinhold, C. Dorschky, E. Rose, R. Pullela, P. Mayer, F. Kunz, Y. Baeyens, T. Link, and J.-P. Mattia
(IEEE Journal of Solid-State Circuits, Dec. 2001)

Clock and Data Recovery IC for 40-Gb/s Fiber-Optic Receiver 707


G. Georgiou, Y. Baeyens, Y.-K. Chen, A. H. Gnauck, C. Gropper, P. Paschke, R. Pullela, M. Reinhold, C. Dorschky,
J.-P. Mattia, T. Winkler von Mohrenfels, and C. Schulien (IEEE Journal of Solid-State Circuits, Sept. 2002)

Index 713
Devices and Circuits for Phase-Locked Systems
Behzad Razavi

Abstract—This tutorial deals with the design of devices such as varactors and inductors and circuits such as ring and LC oscillators. First, MOS varactors are introduced as a means of frequency control for low-voltage circuits and their modeling issues are discussed. Next, spiral inductors are studied and various geometries targeting improved Q or higher self-resonance frequencies are presented. Noise-tolerant ring oscillator topologies are then described. Finally, a procedure for the design of LC oscillators is outlined.

The design of phase-locked systems requires a thorough understanding of devices, circuits, and architectures. Intended as a continuation of [1], this tutorial provides an overview of concepts in device and circuit design for phase-locking in digital, broadband, and RF systems.

I. PASSIVE DEVICES

The demand for low-noise PLLs has encouraged extensive research on active and passive devices. In this section, we study varactors and inductors as essential components of LC oscillators.

A. Varactors

As supply voltages scale down, pn junctions become a less attractive choice for varactors. Specifically, two factors limit the dynamic range of pn-junction capacitances: (1) the weak dependence of the capacitance upon the reverse bias voltage, e.g., $C_j = C_{j0}/(1 + V_R/\phi_B)^m$, where $m \approx 0.3$; and (2) the narrow control voltage range if forward-biasing the varactor must be avoided.

As an example, consider the LC oscillator shown in Fig. 1. It is desirable to maximize the voltage swings at nodes X and Y so as to both minimize the relative phase noise and ease the design of the stage(s) driven by the VCO. On the other hand, to avoid forward-biasing the varactors significantly, $V_X$ and $V_Y$ must remain above approximately $V_{cont} - 0.4$ V. Thus, the peak-to-peak swing at each node is limited to about 0.8 V. Note that the cathode terminals of the varactors also introduce substantial n-well capacitance at X and Y, further constraining the tuning range.

Fig. 1. LC oscillator using pn-junction varactors.

In contrast to pn junctions, MOS varactors are immune to forward biasing while exhibiting a sharper C-V characteristic and a wider dynamic range. If configured as a capacitor [Fig. 2(a)], a MOSFET suffers from both a nonmonotonic C-V behavior and a high channel resistance in the region between accumulation and strong inversion. To avoid these issues, an "accumulation-mode" MOS varactor is formed by placing an NMOS device inside an n-well [Fig. 2(b)]. Providing an ohmic connection between the source and drain for all gate voltages, the n-well experiences depletion of mobile charges under the oxide as the gate voltage becomes more negative. Thus, the varactor capacitance, $C_{var}$ (equal to the series combination of the oxide capacitance and the depletion region capacitance), varies as shown in Fig. 2(b). Note that for a sufficiently positive gate voltage, $C_{var}$ approaches the oxide capacitance.

Fig. 2. (a) Simple MOSFET operating as capacitor, (b) MOS varactor.

The design of MOS varactors must deal with two important issues: (1) the trade-off between the dynamic range and the channel resistance, and (2) proper modeling for circuit simulations. We now study each issue.
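
As a rough numerical illustration of the weak pn-junction C-V dependence noted at the start of this subsection, the sketch below evaluates $C_j = C_{j0}/(1 + V_R/\phi_B)^m$ at two reverse-bias points. The built-in potential of 0.7 V, the 0-to-1-V bias sweep, and the normalized $C_{j0}$ are assumptions chosen only for illustration; they are not values from the text.

```python
def junction_capacitance(c_j0, v_r, phi_b=0.7, m=0.3):
    """Return Cj = Cj0 / (1 + VR/phi_B)**m for a reverse bias v_r >= 0."""
    return c_j0 / (1.0 + v_r / phi_b) ** m


# Assumed illustration values (not from the text): normalized Cj0,
# a built-in potential of 0.7 V, and a 0-to-1-V reverse-bias sweep.
C_J0 = 1.0
c_at_zero_bias = junction_capacitance(C_J0, 0.0)
c_at_one_volt = junction_capacitance(C_J0, 1.0)
tuning_ratio = c_at_zero_bias / c_at_one_volt

print(f"Cj(0 V) / Cj(1 V) = {tuning_ratio:.2f}")
```

Under these assumptions a full volt of reverse bias changes the capacitance by only about 30%, which is the sense in which the dependence is weak.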
Dynamic Range. Deep-submicron MOSFETs exhibit substantial overlap capacitance between the gate and source/drain terminals. For example, in a typical 0.13-µm technology, a transistor having minimum channel length, $L_{min}$, displays an overlap capacitance of 0.4 fF/µm and a gate-channel capacitance of 12 fF/µm². In other words, for an effective channel length of 0.12 µm and a given width, the overlap capacitance between the gate and source/drain terminals of a varactor constitutes 2 × 0.4 fF/(0.12 × 12 fF + 2 × 0.4 fF) ≈ 36% of the total capacitance. Thus, even if the gate-channel component varies by a factor of two across the allowable voltage range, the overall dynamic range of the capacitance is given by (0.12 × 12 fF + 2 × 0.4 fF)/(0.12 × 6 fF + 2 × 0.4 fF) = 1.47.

In order to widen the varactor dynamic range, the transistor length can be increased, thereby raising the voltage-dependent component while maintaining the overlap capacitance relatively constant. This remedy, however, leads to a greater resistance between the source and drain, lowering the Q. The resistance reaches a maximum for the most negative gate-source voltage, at which the depletion region's width is maximum and the path through the n-well the longest (Fig. 3).¹ Note that the total equivalent resistance that appears in series with the varactor is equal to 1/12 of the drain-source resistance. This is because shorting the drain and source lowers the resistance by a factor of 4 and the distributed nature of the capacitance and resistance reduces it by another factor of 3 [2]. Depending on both the phase noise requirements and the Q limitations imposed by inductors, the varactor length is typically chosen between $L_{min}$ and $3L_{min}$.

¹ Fortunately, the capacitance reaches a minimum at this point, and the Q degrades only gradually.

Fig. 3. Effect of n-well resistance in MOS varactor.

Modeling. The C-V characteristics of MOS varactors can be approximated by a hyperbolic tangent function with reasonable accuracy. Using the characteristic shown in Fig. 4 and noting that $\tanh(\pm\infty) = \pm 1$, we can write

$$C_{var}(V_{GS}) = \frac{C_{max} - C_{min}}{2}\tanh\left(a + \frac{V_{GS}}{V_0}\right) + \frac{C_{max} + C_{min}}{2}. \tag{1}$$

Here, $a$ and $V_0$ allow fitting for the intercept and the slope, respectively, and $C_{min}$ includes the overlap capacitance.

Fig. 4. Typical MOS varactor characteristic.

The above model yields different characteristics in different circuit simulation programs! Simulation tools that analyze circuits in terms of voltages and currents (e.g., SPICE) interpret the nonlinear capacitance equation correctly. On the other hand, programs that represent the behavior of capacitors by charge equations (e.g., Cadence's Spectre) require that the model be transformed to a Q-V relationship [3]:

$$Q_{var} = \int C_{var}\, dV_{GS} \tag{2}$$

$$= \frac{C_{max} - C_{min}}{2} V_0 \ln\cosh\left(a + \frac{V_{GS}}{V_0}\right) + \frac{C_{max} + C_{min}}{2} V_{GS}, \tag{3}$$

which is then used to compute

$$I_{var} = \frac{dQ_{var}}{dt}. \tag{4}$$

If used in charge-based analyses, Eq. (1) typically overestimates the tuning range of oscillators.

B. Inductors

The design of monolithic inductors has been studied extensively. The parameters of interest include the inductance, the Q, the parasitic capacitance (i.e., the self-resonance frequency, $f_{SR}$), and the area, all of which trade with each other to some extent. For a spiral structure such as that in Fig. 5, the line width, the line spacing, the number of turns, and the outer dimension are under the designer's control, chosen so as to obtain the required performance.

Fig. 5. Spiral inductor.

Quality Factor. The quality factor of monolithic inductors has been the subject of many studies. Before considering the phenomena that limit the Q, it is important to select a useful and clear definition for this quantity. For a simple inductor operating at low frequencies, the Q is defined as

$$Q = \frac{L\omega}{R_S}, \tag{5}$$
where $R_S$ denotes the metal series resistance. In analogy with this expression, a more general definition is sometimes given as

$$Q = \frac{\mathrm{Im}(Z_L)}{\mathrm{Re}(Z_L)}, \tag{6}$$

where $Z_L$ represents the overall impedance of the inductor at the frequency of interest. While reducing to Eq. (5) at low frequencies, this definition yields Q = 0 if the inductor resonates with its own capacitance and/or any other capacitance. This is because at resonance, the impedance is purely resistive. Since nearly all circuits employ inductors in a resonance mode,² this expression fails to provide a meaningful measure of inductor performance in circuit design. A more versatile definition assumes that a resonant tank can be represented by a parallel combination [Fig. 6(a)], yielding

$$Q = \frac{R_p}{L_p \omega_R}, \tag{7}$$

where $\omega_R$ is the resonance frequency. Note that the tank reduces to $R_p$ at $\omega = \omega_R$, exhibiting a finite (rather than zero) Q. Hereafter, we consider the behavior of inductors at or near resonance.

² One exception is inductive degeneration in low-noise amplifiers.

Fig. 6. (a) Parallel tank for definition of Q, (b) common-source stage using a tank.

The utility of Eq. (7) can be seen in the example illustrated in Fig. 6(b). Here, the knowledge of $L_p$ and $Q = R_p/(L_p\omega_R)$ directly provides the voltage gain and the output swing, whereas the Q given by Eq. (6) serves no purpose.

The Q of inductors is limited by resistive losses: parasitic resistances dissipate a fraction of the energy that is reciprocated between the inductor and the capacitor in a tank. Note that the finite Q is also accompanied by generation of noise. For example, in the circuit of Fig. 6(b), $R_p$ produces an output noise voltage of $\overline{V_n^2} = 4kTR_p = 4kTQL_p\omega_R$ per unit bandwidth if $L_p$ resonates with $C_p$.

The losses in inductors arise from three mechanisms (Fig. 7): (1) the series resistance of the spiral, including both low-frequency resistance and current crowding due to skin effect; (2) the flow of displacement current through the series combination of the inductor's parasitic capacitance and the substrate resistance; (3) the flow of magnetically-induced ("eddy") currents in the substrate resistance. At low frequencies, the dc resistance is dominant, and as the frequency rises, the other components begin to manifest themselves.

Fig. 7. Inductor loss mechanisms: (a) metal resistance, (b) substrate loss due to electric coupling, (c) substrate loss due to magnetic coupling.

With the above observations in mind, let us construct a circuit model for inductors. Depicted in Fig. 8(a) is a simple model where $R_S$ denotes the series resistance at the frequency of interest, $R_{S1}$ and $R_{S2}$ represent the substrate resistance through which the displacement current flows, the transformer models magnetic coupling to the substrate, and $R_p$ is the substrate resistance through which the eddy currents flow. This model reveals how the Q drops at high frequencies. As the impedance of $C_1$ and $C_2$ falls, $R_{S1}$ and $R_{S2}$ appear as a constant resistance in parallel with the inductor, lowering the Q as $\omega$ rises. Similarly, at high frequencies, the effect of $R_p$ becomes relatively constant, shunting $L_p$ and further reducing the Q.

Fig. 8. (a) Inductor model including magnetic coupling to substrate, (b) simplified model.

In practice, the model of Fig. 8(a) is modified as shown in Fig. 8(b) to both allow an easier fit to measured data and account for the substrate capacitance. The model is usually assumed to be symmetric, i.e., $C_1 = C_2$, $C_3 = C_4$, and $R_{S1} = R_{S2}$, implying that the equivalent parasitic capacitance, $C_{eq}$, is one-half of the total capacitance, $C_{tot}$, if one end of the inductor is grounded. This result, however, is not correct because the distributed nature of the structure yields $C_{eq} = C_{tot}/3$ in this case [5]. To avoid this inaccuracy, the inductor must be modeled as a distributed network [5].
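
To see how much the choice between $C_{eq} = C_{tot}/2$ (symmetric lumped model) and $C_{eq} = C_{tot}/3$ (distributed result) matters, the following sketch compares the self-resonance frequency predicted in each case using $f_{SR} = 1/(2\pi\sqrt{L C_{eq}})$. The inductance and total capacitance are hypothetical values picked only for illustration, and the simple LC resonance formula is itself an assumption of this sketch.

```python
import math


def self_resonance_hz(inductance_h, c_eq_f):
    """Self-resonance frequency of an inductor with equivalent shunt capacitance."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * c_eq_f))


# Hypothetical example values (not from the text).
L_SPIRAL = 2e-9    # 2 nH
C_TOTAL = 150e-15  # 150 fF total parasitic capacitance

f_sr_lumped = self_resonance_hz(L_SPIRAL, C_TOTAL / 2)       # Ceq = Ctot/2
f_sr_distributed = self_resonance_hz(L_SPIRAL, C_TOTAL / 3)  # Ceq = Ctot/3

# The ratio depends only on the two Ceq values: sqrt((1/2)/(1/3)) = sqrt(1.5).
ratio = f_sr_distributed / f_sr_lumped

print(f"f_SR with Ctot/2: {f_sr_lumped / 1e9:.1f} GHz")
print(f"f_SR with Ctot/3: {f_sr_distributed / 1e9:.1f} GHz")
print(f"ratio: {ratio:.3f}")
```

Whatever values are chosen, the lumped symmetric model underestimates the self-resonance frequency by a factor of $\sqrt{1.5}$, roughly 22%, relative to the distributed estimate.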
Characterization. Most inductor modeling programs provide limited capabilities in terms of the type of structure that they can analyze or the maximum frequency at which their results are valid. For this reason, it is often necessary to fabricate and characterize monolithic inductors and use the results to revise the simulated models, thereby obtaining a better fit.

Owing to the need for precise measurements at high frequencies, inductors are typically characterized by direct on-wafer probing. High-speed coaxial probes having a tightly-controlled 50-Ω characteristic impedance and a low loss are positioned on pads connected to the inductor. Figure 9(a) shows an example where one end of the spiral is tied to the "signal" (S) pad and the other to the "ground" (G) pads. The signal pad is sensed by the center conductor and the ground pads by the outer shield of the coaxial probe.

Fig. 9. (a) On-wafer measurement of inductor using coaxial probe, (b) calibration structure.

Since the capacitance of the pads and the wires connecting to the spiral is typically significant, the test device is accompanied by a calibration structure [Fig. 9(b)], where the spiral itself is omitted. The scattering (S) parameters of both structures are measured by means of a network analyzer across the band of interest and subsequently converted to Y parameters. Subtraction of the Y parameters of the calibration geometry from those of the device under test yields the actual characteristics of the spiral.

An alternative method of measuring the Q of inductors is illustrated in Fig. 10. Here, inductors are incorporated in an oscillator and the tail current can be controlled externally. In the laboratory measurement, the output is monitored on a spectrum analyzer while $I_{SS}$ is reduced so as to place the circuit at the edge of oscillation. Next, the value of $I_{SS}$ thus obtained is used in the simulation of the oscillator and the equivalent parallel resistance of each tank, $R_p$, is lowered until the circuit fails to oscillate. For such a value of $R_p$, we have $Q = R_p/(L\omega)$. Of course, this technique assumes that the value of the inductor and the oscillation frequency are known.

Fig. 10. Setup for "in-situ" measurement of Q.

The above method proves useful if (a) the frequency of interest is so high and/or the inductance so low that direct measurements are difficult, or (b) an oscillator has been fabricated but the inductors are not available individually, requiring "in-situ" measurement of the Q. Note that other oscillator parameters such as phase noise and output swing are also functions of Q, but it is much more straightforward to place the circuit at the edge of oscillation than to calculate the Q from phase noise or output swing measurements.

Choice of Geometry. The design of inductors begins with the choice of the geometry. Shown in Fig. 11 are two commonly-used structures. The asymmetric spiral of Fig. 11(a) exhibits a moderate Q, about 5 to 6 at 5 GHz, and its interwinding capacitance does not limit the self-resonance frequency because adjacent turns sustain a small potential difference. The line spacing is therefore set to the minimum allowed by the technology.

Fig. 11. (a) Asymmetric and (b) symmetric inductors.

The symmetric geometry of Fig. 11(b) provides a greater Q if stimulated differentially [4], about 7 to 10 at 5 GHz, but its interwinding capacitance is typically quite significant because of the large voltage difference between adjacent turns. For this reason, the line spacing is chosen to be twice or three times the minimum allowable value, lowering the fringe capacitance considerably but degrading the Q slightly.

In differential circuits, the use of symmetric inductors appears to save area as well. For example, two asymmetric 1-nH inductors can be replaced by a symmetric 2-nH structure, which occupies less area. However, a cascade of differential stages employing multiple symmetric inductors [Fig. 12(a)] faces routing difficulties. As illustrated in Fig. 12(b), the signal lines must travel across the spirals, impacting the
performance of the inductors. Furthermore, the power and ground lines must either cross the spirals or go around with adequate spacing. With asymmetric inductors, on the other hand, the lines can be routed as shown in Fig. 12(c), leaving the inductors undisturbed. Note that B1 is considerably larger than B2 because the symmetric structure must provide an inductance twice that of each asymmetric spiral. Thus, the signal lines in Fig. 12(b) are longer.

Fig. 12. (a) Cascade of inductively-loaded differential pairs, (b) layout of first stage using a symmetric inductor, (c) layout of first stage using asymmetric inductors.

The two geometries of Fig. 11 can also be converted to stacked structures, wherein spirals in different metal layers are placed in series so as to achieve a greater inductance per unit area. Figure 13(a) depicts an example using metal 8 and metal 7 spirals. The total inductance is equal to L1 + L2 + 2M, where M denotes the mutual coupling between L1 and L2. Owing to the strong magnetic coupling, the value of M is close to L1 and L2, suggesting a fourfold increase in the overall inductance as a result of stacking. In the general case, n stacked identical spirals raise the inductance by a factor of approximately $n^2$.

Fig. 13. Stack of (a) metal 8 and metal 7 spirals, (b) metal 8 and metal 3 spirals.

Stacking reduces the area occupied by inductors significantly. However, the capacitance between the spirals may limit the self-resonance frequency. For the two-layer structure of Fig. 13(a), the overall equivalent capacitance is given by [5]

$$C_{eq} = \frac{4C_1 + C_2}{12}. \qquad (8)$$

Thus, if the bottom layer is moved down [Fig. 13(b)], then Ceq falls considerably. For example, in a typical 0.13-μm CMOS technology having eight metal layers, the geometry of Fig. 13(b) exhibits one-fifth as much capacitance as the structure in Fig. 13(a) does.

Stacked structures use lower metal layers, which typically suffer from a greater sheet resistance than the topmost layer. As explained below, the resistance can be reduced by placing spirals in parallel.

Figure 14 illustrates three other configurations aiming to improve the quality factor. In Fig. 14(a), multiple spirals are placed in parallel so as to reduce the series resistance, but at the cost of larger capacitance to the substrate. Nonetheless, in a typical process having eight metal layers, the metal 6 capacitance is about 30% greater than that of metal 8. Since metal 8 is typically twice as thick as metal 6 or metal 7, this topology lowers the series resistance twofold while raising the parasitic capacitance by 30%. By the same token, in the stacked structure of Fig. 13(b), addition of a metal 2 spiral in parallel with the metal 3 layer decreases the overall resistance by 30% while increasing the equivalent capacitance by about 15%.

Fig. 14. (a) Parallel combination of spirals to reduce metal resistance, (b) tapered metal width, (c) patterned shield.

At frequencies above 5 GHz, the skin depth of aluminum falls below 2 μm, making the parallel combination of spirals less effective. Electromagnetic field simulations may therefore be necessary to determine the optimum configuration.

The structure in Fig. 14(b) employs tapering of the line width to reduce the resistance of the outer turns. The idea is to maintain a relatively constant inductance-resistance product per turn, achieving a slightly higher Q for a given inductance and capacitance. Unfortunately, most inductor simulation programs cannot analyze such a geometry.

Shown in Fig. 14(c) is a method of lowering the loss due to the electric coupling to the substrate. A heavily-conductive
shield is placed under the spiral and connected to ground so that the displacement current flowing through the inductor's bottom-plate capacitance does not experience resistive loss. To stop the flow of magnetically-induced currents, the shield is broken regularly. Note that eddy currents still flow through the substrate, dissipating energy.

The conductive shield in Fig. 14(c) may be realized in n-well, n+, or p+ diffusion, polysilicon, or metal, thus bearing a trade-off between the parasitic capacitance and the Q enhancement. The resulting increase in the Q depends on the frequency of operation and the type of shield material, falling in the range of 5 to 10%.

Thus far, we have studied square spirals. However, for a given inductance value, a circular structure exhibits less series resistance. Since mask generation for circles is more difficult, some inductors are designed as octagonal geometries to benefit from a slightly higher Q.

C. MOS Transistors

The modeling of MOSFETs for analog and high-frequency design continues to pose challenging problems as sub-0.1-μm generations emerge. BSIM models provide reasonable accuracy for phase-locked system design, with the exception that their representation of thermal and flicker noise may err considerably. This issue becomes critical in the prediction of oscillator phase noise.

The thermal noise arising from the channel resistance is usually represented by a current source tied between the source and drain and having a spectral density $\overline{I_n^2} = 4kT\gamma g_m$, where γ is the excess noise coefficient. For long-channel devices, γ = 2/3, but for submicron transistors, γ may reach 2.5 to 3. Since some MOS models lack an explicit γ parameter that the user can set, it is often necessary to artificially raise the effective value of γ in circuit simulations. For linear, time-invariant circuits, this can be accomplished using a noise copying technique [6]. However, the time variance of currents and voltages in oscillators makes it difficult to apply this method. As a first-order approximation, the contribution of the transistors to the overall phase noise can be increased by a factor equal to 2.5/(2/3) before all of the noise components are summed.3

3 In reality, the effective value of γ also depends on the drain-source voltage to some extent, further complicating the matter.

The flicker noise parameters are usually obtained by measurements. It is therefore important to check the validity of the device models by comparing measured and simulated results. Owing to their buried channel, PMOS transistors exhibit substantially less flicker noise than NMOS devices even in deep submicron technologies.

II. RING OSCILLATORS

Despite their relatively high noise and poor drive capability, ring oscillators are used in many high-speed applications. Several reasons justify this popularity: (1) in some cases, the oscillator must be tuned over a wide frequency range (e.g., one decade) because the system must support different data rates or retain a low-frequency clock in the "sleep" mode; (2) ring oscillators occupy substantially less area than LC topologies do, an important issue if many oscillators are used; (3) the behavior of ring oscillators across process, supply, and temperature corners is predicted with reasonable accuracy by standard MOS models, whereas the design of LC oscillators heavily relies on inductor and varactor models.

In mostly-digital systems such as microprocessors, ring oscillators experience considerable supply and substrate noise, making differential topologies desirable. Figure 15(a) shows an example of a differential gain stage that allows several decades of frequency tuning with relatively constant voltage swings. Here, M5 and M6 define the output common-mode (CM) level while M3 and M4 pull nodes X and Y to VDD, maintaining a constant voltage swing even at low current levels.

Fig. 15. (a) Differential stage for use in a ring oscillator, (b) effect of supply noise.

Unlike a simple differential pair, the stage of Fig. 15(a) does respond to input CM noise even with an ideal ISS. This is because the gate voltages of M3 and M4 are referenced to VDD, introducing a change in the drain currents if the input CM level varies. In the presence of asymmetries, such a change results in a differential component at the output. Nevertheless, since the input CM level of each stage in the ring is referenced to VDD by the diode-connected PMOS devices in the preceding stage [Fig. 15(b)], the oscillator exhibits low sensitivity to supply voltage.

Fig. 16. (a) Constant-current ring oscillator, (b) transistor-level implementation of (a).

Figure 16(a) depicts another ring oscillator topology that has become popular in low-voltage digital systems. Here, the inverters in the ring are supplied by a current source, IDD,
rather than a voltage source, and frequency tuning is also accomplished through IDD. If IDD is designed for low sensitivity to VDD, then the oscillator remains relatively immune to supply noise, the principal advantage of this configuration over standard inverter-based rings that are directly connected to the supply voltage.

In practice, the nonidealities associated with IDD limit the supply rejection. Shown in Fig. 16(b) is a transistor implementation where M1 operates as a controlled current source. If I1 is constant, VX tracks VDD variations whereas VY does not, yielding a change in IDD through channel-length modulation in M1. Choosing long channels for M1 and M2 alleviates this issue while necessitating wide channels as well to allow a relatively small drain-source voltage for M1. However, the resulting high drain junction capacitance of M1 at Y creates a low-impedance path from VDD to this node at high frequencies. To suppress both resistive and capacitive feedthrough of VDD noise, a bypass capacitor, CB, is tied from Y to ground. However, the pole associated with this node now enters the VCO transfer function, complicating the design of the PLL.

Let us now study the response of the circuit of Figs. 15(a) and 16 to substrate noise, Vsub. In the former, Vsub manifests itself through two mechanisms (Fig. 17): (1) by modulating the drain junction capacitance of M1 and M2 and hence the delay of the stage (a static effect); and (2) by injecting a common-mode displacement current through Cp (a dynamic effect). If injected slightly before or after the zero crossings of the oscillation waveform, such a current gives rise to a differential component at the drains of M1 and M2 because these transistors display unequal transconductances as they depart from equilibrium.

Fig. 17. Effect of substrate noise on a differential stage.

In the circuit of Fig. 16(b), Vsub modulates both the drain junction capacitance of the NMOS devices and their threshold voltage (and hence the transition points of the waveform). Both effects are static, making the circuit susceptible even to low-frequency noise.

It is instructive to determine the minimum supply voltage for the above two circuits. At the midpoint of switching, where the input and output differential voltages are around zero, the stage of Fig. 15(a) requires that VDD > |VGSP| + VGSN + VISS, where VGSP and VGSN denote the gate-source voltages of M3-M4 and M1-M2, respectively, and VISS is the minimum voltage necessary for ISS. Interestingly, the circuit of Fig. 16(b) imposes the same minimum supply voltage.

Another critical issue in the circuits of Figs. 15(a) and 16 relates to frequency tuning by means of current sources. The voltage-to-current (V/I) conversion required here presents difficulties at low supply voltages. In the example of Fig. 16(b), as Vcont rises and VX falls, transistor M3 eventually enters the triode region, thus making I1 supply-dependent. The useful range of Vcont is therefore given by VTHN < Vcont < VDD - |VGSP| - VTHN, suggesting the use of a wide device for M2 to minimize |VGSP|.

III. LC OSCILLATORS

LC oscillators have found wide usage in high-speed and/or low-noise systems. Extensive research on inductors, varactors, and oscillator topologies has provided the grounds for systematic design, helping to demystify the "black magic."

LC oscillators offer a number of advantages over ring structures: (a) lower phase noise for a given frequency and power dissipation; (b) greater output voltage swings, with peak levels that can exceed the supply voltage; and (c) ability to operate at higher frequencies.

However, LC VCO design requires precise device and circuit modeling because (a) the narrow tuning range calls for accurate prediction of the center frequency; (b) the phase noise is greatly affected by the quality of inductors and varactors and the noise of transistors. Also, occupying a large area, spiral inductors pick up noise from the substrate and make it difficult to incorporate many such oscillators on one chip.

The design of LC VCOs targets the following parameters: center frequency, phase noise, tuning range, power dissipation, voltage headroom, startup condition, output voltage swing, and drive capability. The last two have often received less attention, but they directly determine the design difficulty and power consumption of the stages following the oscillator. That is, a buffer placed after the VCO may consume more power than the VCO itself!

A. Design Example

As an example of VCO design, let us consider the topology shown in Fig. 18. Here, M1 and M2 present a small-signal negative resistance of -2/gm1,2 between nodes X and Y, compensating for the resistive loss in the tanks and sustaining oscillation. Each tank is modeled by a parallel RLC network, with all loss mechanisms lumped in Rp.4

Fig. 18. LC oscillator.

4 For a narrow frequency range, series resistances in the tank elements can be transformed to parallel components.
The design process begins with a power budget and hence a maximum value for ISS. This is justified by the following observation. Once completed and optimized for a given power budget, the design can readily be scaled for different power levels, bearing a linear trade-off with phase noise while maintaining all other parameters constant. For example, if ISS, the width of M1 and M2, and the total tank capacitance are doubled and the inductance value is halved, the phase noise power falls by a factor of two but the frequency of oscillation and the output voltage swings remain unchanged.5

5 We assume that, at a given frequency, the Q is relatively independent of the inductance value.

Since subsequent stages typically require the VCO core to provide a minimum voltage swing, Vmin, we assume M1 and M2 steer nearly all of ISS to their corresponding tanks and write ISS Rp = Vmin. Thus, the minimum inductance value is given by

$$L_p = \frac{R_p}{Q\omega} \qquad (9)$$

$$\phantom{L_p} = \frac{V_{min}}{I_{SS} Q \omega}, \qquad (10)$$

where it is assumed the tank Q is limited by that of the inductor. Note that this calculation demands knowledge of the Q before the inductance is computed, a minor issue because for a given geometry and frequency of operation, the Q is relatively independent of the inductance.

We now determine the dimensions of M1 and M2. Increasing the channel length beyond the minimum value allowed by the technology does not significantly lower γ unless the length exceeds approximately 0.5 μm. For this reason, the minimum length is usually chosen to minimize the capacitance contributed by the transistors. The transistors must be wide enough to steer most of ISS while experiencing a voltage swing of Vmin at nodes X and Y. Viewing M1 and M2 as a differential pair, we note that M1 must turn off as VX - VY reaches Vmin. For square-law devices,

$$V_{min} = \sqrt{\frac{2 I_{SS}}{\mu_n C_{ox} W/L}}, \qquad (11)$$

and hence

$$W = \frac{2 I_{SS} L}{\mu_n C_{ox} V_{min}^2}, \qquad (12)$$

but for short-channel devices, W must be obtained by simulations using proper device models. This choice of W typically guarantees a small-signal loop gain greater than unity, enabling the circuit to start at power-up.

With Lp computed from Eq. (10), the total capacitance at nodes X and Y is calculated as $C_{tot} = (L_p\omega^2)^{-1}$. This capacitance includes the following fixed components: (1) the parasitic capacitance of Lp, CLP; (2) the drain junction, gate-source, and gate-drain capacitances of M1 and M2, CDB + CGS + 4CGD;6 and (3) the input capacitance of the next stage (typically a buffer), CL. Thus, the allowable varactor capacitance is given by the difference between Ctot and the sum of these components:

$$C_{var} = (L_p\omega^2)^{-1} - C_{LP} - C_{DB} - C_{GS} - 4C_{GD} - C_L. \qquad (13)$$

6 Since CGD experiences a total voltage swing of 2Vmin, its Miller effect translates to a factor of two for each transistor.

This expression gives the center value of the tolerable varactor capacitance. Of course, a negative Cvar means the inductance is excessively large, calling for a lower Lp, a smaller Rp, and hence a larger ISS. However, to steer a greater tail current, the circuit must employ wider MOS transistors, thus incurring a larger capacitance at nodes X and Y and approaching diminishing returns. This ultimately limits the frequency of oscillation in a given technology.

For a given supply voltage and oscillator topology, the varactor capacitance exhibits a known dynamic range Cvar,min < Cvar < Cvar,max, yielding a tuning range of ωmin < ωosc < ωmax, where

$$\omega_{min} = \frac{1}{\sqrt{L_p (C_{var,max} + C_{fixed})}} \qquad (14)$$

$$\omega_{max} = \frac{1}{\sqrt{L_p (C_{var,min} + C_{fixed})}}, \qquad (15)$$

and Cfixed = CLP + CDB + CGS + 4CGD + CL.

Fig. 19. LC oscillator with (a) direct coupling and (b) capacitive coupling of varactors to tanks.

Figure 19(a) depicts the oscillator with MOS varactors directly tied to X and Y. Since the output common-mode level is near VDD, M3 and M4 sustain only a positive gate-source voltage (if 0 < Vcont < VDD). As seen from the C-V characteristic of Fig. 2(b), this limitation reduces the dynamic range of the capacitance by about a factor of two. As a remedy, the varactors can be capacitively coupled to X and Y, allowing independent choice of dc levels. Illustrated in Fig. 19(b), such an arrangement defines the gate voltage of Mv1 and Mv2 by Vb ≈ VDD/2 through large resistors R1 and R2.

The coupling capacitors, Cc1 and Cc2, must be chosen much greater than the maximum value of Cvar so as not to limit the tuning range. For example, if Cc1 = Cc2 = 5Cvar,max, then the equivalent series capacitance reaches only $5C_{var,max}^2/(6C_{var,max}) = 0.83\,C_{var,max}$, suffering from a 17% reduction in dynamic range. On the other hand, large coupling capacitors display significant bottom-plate capacitance, thereby loading the oscillator and limiting the tuning range.7
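The design flow of Eqs. (9)-(15) and the coupling-capacitor estimate can be traced numerically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, all fixed capacitances are lumped into a single C_fixed argument, and the bias current, swing, Q, and capacitance values are illustrative numbers that do not come from the text.

```python
import math

def design_lc_vco(I_ss, V_min, Q, f_osc, C_fixed):
    """First-pass tank sizing following Eqs. (9)-(13)."""
    w = 2 * math.pi * f_osc
    R_p = V_min / I_ss            # from I_SS * R_p = V_min
    L_p = R_p / (Q * w)           # Eqs. (9)-(10)
    C_tot = 1.0 / (L_p * w ** 2)  # total capacitance at nodes X and Y
    C_var = C_tot - C_fixed       # Eq. (13), fixed components lumped together
    return R_p, L_p, C_tot, C_var

def tuning_range(L_p, C_var_min, C_var_max, C_fixed):
    """Angular-frequency limits from Eqs. (14)-(15)."""
    w_min = 1.0 / math.sqrt(L_p * (C_var_max + C_fixed))
    w_max = 1.0 / math.sqrt(L_p * (C_var_min + C_fixed))
    return w_min, w_max

def series_cap(C_c, C_var):
    """Series combination of a coupling capacitor and a varactor."""
    return C_c * C_var / (C_c + C_var)

# Hypothetical design point: 2 mA tail current, 0.5 V swing, Q = 8 at 5 GHz.
R_p, L_p, C_tot, C_var = design_lc_vco(2e-3, 0.5, 8.0, 5e9, C_fixed=0.6e-12)
if C_var <= 0:
    print("Negative C_var: the inductance is too large for this frequency.")

# Hypothetical varactor range around the computed center value.
w_min, w_max = tuning_range(L_p, 0.25e-12, 0.6e-12, 0.6e-12)
print(f"L_p = {L_p * 1e9:.3f} nH, C_var = {C_var * 1e12:.3f} pF")
print(f"f_min = {w_min / (2 * math.pi) / 1e9:.2f} GHz, "
      f"f_max = {w_max / (2 * math.pi) / 1e9:.2f} GHz")

# Coupling capacitor equal to 5x the maximum varactor value: the series
# combination is 5/6 of C_var,max, the 17% reduction quoted in the text.
ratio = series_cap(5.0, 1.0)
```

The sketch ignores the bottom-plate parasitic of the coupling capacitors, which would add to C_fixed and narrow the range further.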
It is possible to realize Cc1 and Cc2 as "fringe" capacitors (Fig. 20) [7] to exploit the lateral field between adjacent metal lines. This structure exhibits a bottom-plate parasitic of a few percent, but its value must usually be calculated by means of field simulators.

7 This is relatively independent of whether the bottom plates are connected to nodes X and Y or to R1 and R2.

Fig. 20. Fringe capacitor.

The tuning range of LC VCOs must be wide enough to encompass (a) process and temperature variations, (b) uncertainties due to model inaccuracies; and (c) the frequency band of interest. In wireless communications, the last component makes the design particularly difficult, especially if a single VCO must cover more than one band. For example, in the Global System for Mobile Communication (GSM) standard, the transmit and receive bands span 890-915 MHz and 935-960 MHz, respectively. For one VCO to operate from 890 MHz to 960 MHz, the tuning range must exceed 7.8%. With another 7 to 10% required for variations and model inaccuracies, the overall tuning range reaches 15 to 18%, a value difficult to achieve. In such cases, two or more oscillators may prove necessary, but at the cost of area and signal routing issues.

The phase noise of each oscillator topology must be quantified carefully. The reader is referred to the extensive literature on the subject.

B. Digital Tuning

Our study thus far implies that it is desirable to maximize the tuning range. However, for a given supply voltage, a wider tuning range inevitably translates to a greater VCO gain, KVCO, thereby making the circuit more sensitive to disturbance ("ripple") on the control line. This effect leads to larger reference sidebands in RF synthesizers and higher jitter in timing applications. With the scaling of supply voltages, the problem of high KVCO has become more serious, calling for alternative solutions.

A number of circuit and architecture techniques have been devised to lower the sensitivity of the VCO to ripple on the control line. For example, a digital tuning mechanism can be added to perform coarse adjustment of the frequency, allowing the analog (fine) control to cover a much narrower range. Illustrated in Fig. 21(a), the idea is to switch constant capacitors into or out of the tanks, thereby introducing discrete frequency steps. The varactors then tune the frequency within each step, leading to the characteristic shown in Fig. 21(b). Note that the switches are placed between the capacitors and ground rather than between the tank and the capacitors. This permits the use of NMOS devices with a gate-source voltage equal to VDD, minimizing their on-resistance.

Fig. 21. (a) VCO with fine and coarse digital control, (b) resulting characteristics.

The above technique entails three critical issues. First, the trade-off between the on-resistance and junction capacitance of the MOS switches translates to another between the Q and the tuning range. When on, each switch limits the Q of its corresponding capacitor to $(R_{on} C_u \omega)^{-1}$. When off, each switch presents its drain junction and gate-drain capacitances, CDB + CGD, in series with Cu, constraining the lower bound of the capacitance to Cu(CDB + CGD)/(Cu + CDB + CGD) rather than zero. In other words, wider switches degrade the overall Q to a lesser extent but at the cost of narrowing the discrete frequency steps.

The second issue relates to potential "blind" zones in the characteristic of Fig. 21(b). As exemplified by Fig. 22, if the discrete step resulting from switching out one unit capacitor is greater than the range spanned continuously by the varactors, then the oscillator fails to assume the frequency values between f1 and f2 for any combination of the digital and analog controls. For this reason, the discrete steps must be sufficiently small to ensure overlap between consecutive bands.8

Fig. 22. Blind zone resulting from insufficient fine tuning range.

8 With a finite overlap, however, more than one combination of digital and analog controls may yield a given frequency. To avoid this ambiguity, the loop must begin with a minimum (or maximum) value of the digital control and adjust it monotonically.

The third issue stems from the loop settling speed. As described below, the PLL takes a long time to determine how
many capacitors must be switched into the tanks. Thus, if a change in temperature or channel frequency requires a discrete frequency step, then the system using the PLL must remain idle while the loop settles.

When employed in a phase-locked loop, the oscillator of Fig. 21(a) requires additional mechanisms for setting the digital control. Figure 23 depicts an example for frequency synthesis. Here, the oscillator control voltage is monitored and compared with a low and a high voltage, VL and VH, respectively. If Vcont falls below VL, the oscillation frequency is excessively low,9 and one unit capacitor is switched out. Conversely, if Vcont exceeds VH, one unit capacitor is switched in. After each switching, the loop settles and, if still unlocked, continues to undergo discrete frequency steps.

Fig. 23. Synthesizer using fine and coarse frequency control.

9 We assume the frequency increases with Vcont.

REFERENCES

[1] B. Razavi, "Design of Monolithic Phase-Locked Loops and Clock Recovery Circuits - A Tutorial," in Monolithic Phase-Locked Loops and Clock Recovery Circuits, B. Razavi, Ed., Piscataway, NJ: IEEE Press, 1996.
[2] P. Larsson, "Parasitic Resistance in an MOS Transistor Used as On-Chip Decoupling Capacitor," IEEE J. Solid-State Circuits, vol. 32, pp. 574-576, April 1997.
[3] K. Kundert, Private Communication.
[4] M. Danesh et al., "A Q-Factor Enhancement Technique for MMIC Inductors," Proc. IEEE Radio Frequency Integrated Circuits Symp., pp. 217-220, April 1998.
[5] A. Zolfaghari, A. Y. Chan, and B. Razavi, "Stacked Inductors and Transformers in CMOS Technology," IEEE Journal of Solid-State Circuits, vol. 36, pp. 620-628, April 2001.
[6] F. Behbahani et al., "A 2.4-GHz Low-IF Receiver for Wideband WLAN in 0.6-μm CMOS," IEEE Journal of Solid-State Circuits, vol. 35, pp. 1908-1916, December 2000.
[7] O. E. Akcasu, "High-Capacity Structures in a Semiconductor Device," US Patent 5,208,725, May 1993.

Delay-Locked Loops - An Overview
Chih-Kong Ken Yang

Abstract - Phase-locked loops have been used for a wide range of applications from synthesizing a desired phase or frequency to recovering the phase and frequency of an input signal. Delay-locked loops (DLLs) have emerged as a viable alternative to the traditional oscillator-based phase-locked loops. With its first-order loop characteristic, a DLL both is easier to stabilize and has no jitter accumulation. The paper describes design considerations and techniques to achieve high performance in a wide range of applications. Issues such as avoiding false lock, maintaining 50% clock duty cycle, building unlimited phase range for frequency synthesis, and multiplying the reference frequency are discussed.

C.K. Ken Yang is with University of California at Los Angeles, yang@ee.ucla.edu.

I. INTRODUCTION

Many applications require accurate placement of the phase of a clock or data signal. Although simply delaying the signal could shift the phase, the phase shift is not robust to variations in processing, voltage, or temperature. For more precise control, designers incorporate the phase shift into a feedback loop that locks the output phase with an input reference signal that indicates the desired phase shift. In essence, the loop is identical to a phase-locked loop (PLL) except that phase is the only state variable and that a variable-delay line replaces the oscillator. Such a loop is commonly referred to as a delay-line phase-locked loop or delay-locked loop (DLL). As with a PLL, the goals are (1) accurate phase position or low static-phase offset, and (2) low phase noise or jitter.

Because a DLL does not contain an element of variable frequency, it historically has fewer applications than PLLs. Bazes in [1] demonstrated an example of precisely delaying a signal in generating the timing of the row and column access strobe signals for a DRAM. Another common application uses a DLL to generate a buffered clock that has the same phase as a weakly-driven input clock. Johnson in [2] synchronizes the timing of the buffered clock of a floating-point unit with the clock of a microprocessor. A similar application recovers the data of a parallel bus by generating a properly positioned sampling clock. Typically, these systems provide a sampling clock with the same sampling rate but with an arbitrary phase as compared to the data (i.e. a "mesochronous" system [4]). A clocked DRAM data bus is an example of such a system. A clock propagates with the data as one of the signals in the bus and therefore has a nominally known phase relationship with the data. However, in order to receive and buffer the clock to sample the data bus, the actual sampling clock is no longer properly aligned with the data. A DLL is commonly used to lock the phase of the buffered clock to that of the input data. The phase locking significantly reduces timing uncertainty in sampling the data, which then enables higher data rates as in [3].

Although aperiodic signals can also be delayed by the delay line in a DLL, the inputs to delay lines are typically clock signals. By using a periodic signal, the delay lines do not need arbitrarily long delays and typically only need to span the period of the clock to generate all possible phases. A data signal can be delayed by sampling the data with the appropriately delayed clock.

The motivation for using DLLs is that the design of the control loop is simplified by having only phase as the state variable. Section II reviews how such a loop is unconditionally stable and has better jitter characteristics. However, a DLL is not without its own limitations. The variable delay line has a finite delay range and finite bandwidth. Section II also discusses these design considerations. Section III describes different implementations of the variable delay line. Within the past ten years, modifications to the basic DLL architecture have enabled clock and data recovery applications in "plesiochronous" systems [4] where the sampling rates for clock and data differ by a few hundred parts-per-million in frequency. Delay lines with effectively infinite delay are also addressed in Section III.

More recently, several researchers such as [5] and [6] have introduced architectures that permit frequency multiplication based on delay lines which further extends their use in clock generation and frequency synthesis. Section IV describes these architectures.

II. DLL CHARACTERISTICS

The basic loop building blocks are similar to those of a PLL: a phase detector, a filter, and a variable-delay line. Figure 1 illustrates the three main functional blocks. Since phase is the only state variable, a control loop higher than first-order is not needed to compensate a fixed phase error. The resulting transient impulse response is a simple exponential. Although the simple loop characteristics are an advantage that DLLs have over PLLs, the design is complicated by the additional circuitry that is needed to overcome having a limited delay range and not producing its own frequency.

A. First-order Loop

A phase detector compares the phase of the reference input and the delay-line output. The comparison yields a signal proportional to the phase error. The error is low-pass
1.2 PLL
DLL
in PD Filter 1
KpD Gp(s)
0.8
VC
0.6
Delay Line
dly__in *^DL dly_put 0.4

Figure 1: DLL architecture. 0.2

&
20 60
*HLOOP open loop
° ° ^eO*) °
H(s) Figure 3: Step response of PLL and DLL (with same loop charac-
teristics).
(dB) ,20dB/dec

the tracking of the phase of the input clock changes at


closed loop different frequencies. Based on the transfer function, the
1 loop bandwidth is (Obw = KPDKDLGF. For frequencies
within the loop bandwidth the phase of the output clock will
track that of the reference input and reject noise within the
log CO loop. The phase characteristics of the output clock above the
Figure 2: Open- and closed-loop transfer characteristics. bandwidth of the loop depend on the phase behavior of the
delay-line input and the noise from the delay line. The noise
transfer function from a noise source lumped at the
filtered to produce a control voltage or current that adjusts delay-line output is a high-pass response.
the delay of the delay line. The delay-line input can be either
the reference input or a clean clock signal.
The s-domain representation of each loop element is 1 -I- (s/KpDKDLGF)
depicted within each block in Fig. 1. The open-loop transfer In some degenerate cases, the delay-line input is also the
function can be written as $T(s) = K_{PD} K_{DL} G_F(s)$, where $K_{PD}$ is the phase-detector gain, $G_F(s)$ is the filter transfer function, and $K_{DL}$ is the delay-line gain. If the loop has finite gain at dc, the resulting output signal will exhibit a static phase error, as shown in the following equation.

$$\left.\frac{\phi_{out}}{\phi_{in}}\right|_{s=0} = \left.\frac{1}{1 + 1/\bigl(K_{PD} K_{DL} G_F(s)\bigr)}\right|_{s=0} \qquad (1)$$

To eliminate the static phase error, the filter is often an integrator that stores the phase variable. This results in a first-order closed-loop transfer function.

$$H(s) = \frac{1}{1 + s/(K_{PD} K_{DL} G_F)} \qquad (2)$$

The equation assumes that the delay-line input is a clean reference as opposed to the reference input. Higher-order loop filters have not commonly been used but can enable better tracking of a phase ramp (i.e., a frequency difference).

Figure 2 shows the open-loop and closed-loop transfer functions. With only a single integrator, the open-loop phase margin is 90°. The loop is unconditionally stable as long as the delay in the loop does not degrade the phase margin excessively. The closed-loop transfer function illustrates that [...] reference input. The feedback loop would guarantee a fixed phase relationship between the delay-line output and the reference, so any phase variations in the reference would appear directly at the delay-line output in an all-pass response. However, noise due to the delay line is still high-pass filtered.

B. Advantages over a PLL

The loop characteristics are considerably simpler than those of a PLL. A PLL contains at least two states to store both the frequency and the phase information. In order to maintain loop stability, an additional zero is needed. A DLL is less constrained, with only a single pole. The loop gain directly determines the desired bandwidth. The only stability concern arises when the loop bandwidth is very near the reference frequency. The periodic sampling nature of the phase detection and the delay in the feedback loop degrade the phase margin. For instance, if the feedback delay is one reference cycle, the loop bandwidth should not exceed 1/4 of the reference frequency.

Figure 3 illustrates the response to a noise step applied to the control voltage for both a PLL and a DLL. A PLL accumulates phase error due to its higher-order loop characteristic. In response to a phase error, the control

voltage alters the frequency of an oscillator. The output phase is an integration of the frequency change. In response to a noise perturbation, the loop accumulates a phase error before correcting. In contrast, a DLL attenuates the phase error by the time constant of the loop. In the figure, both loops are designed with the same 3-dB bandwidth and the same delay elements, and the PLL is a 2nd-order loop with a damping factor of unity. Clearly, the PLL suffers from larger phase errors due to the phase accumulation.

A second advantage relates to clock and data recovery applications. An effective way to recover the timing for sampling a data input is to use the data receiver as a phase detector. The architecture, depicted in Fig. 4, uses the 180°-shifted clock to sample the data transitions in addition to sampling the data values [7]. Whenever the data changes value, the sampled transition and the data values can be combined to indicate whether the sampling clock edge is earlier or later than the data transition. Phase information is only present with data transitions. The feedback loop locks when the transition sampling clock samples a metastable value. This commonly used design is known as an early-late or bang-bang architecture. The timing diagram in Fig. 4 illustrates examples of the data being early and late. Due to the inherent setup time of the data receiver, the transition sampling clock may not occur at the same time as the data transition. The phase shift compensates for the receiver setup time and maximizes the margin of error for the data sampling.

Figure 4: Early-late receiver architecture using the receiver as the phase detector. Timing diagram showing early and late data.

However, the data receiver is ultimately a binary comparator, and the phase detector does not indicate an error that is proportional to the phase difference. Hence, the timing-recovery loop is nonlinear. Although a higher-order PLL using early-late control can be made conditionally stable [8], the resulting phase dithers with a limit cycle that translates into jitter. The oscillation depends on the loop parameters and can be considerable for high-bandwidth loops. With an early-late DLL, the phase of the clock output also dithers. But because the stability depends only on the delay within the loop, the dithering would only be a few cycles and can be significantly less than the dithering of a PLL.

C. Design Considerations in a DLL

A typical DLL involves several design considerations. First, the delay line usually has a finite delay range. If the desired phase of the output signal is beyond the delay range, the loop will not lock properly. Second, the output of the DLL also depends greatly on the input to the delay line. Since the delay-line input propagates to the DLL output, tracking jitter and the output's duty cycle depend not only on the delay-line design but also on the delay-line input. Third, the basic DLL cannot generate new frequencies different from that of the delay-line input.

A variable-delay line adjusts the delay by varying the RC time constant of a buffer and often has limited adjustment range. Section III will describe several techniques in greater detail. Even though the delay range is limited, DLLs for a periodic clock signal only need the range to exceed 2π in phase across process and systematic variations to cover all possible phases. For systems with a range of operating frequencies, the delay line must span 2π for the lowest input frequency.

An issue known as false-locking occurs when the delay range exceeds 2π. There can be several secondary lock points repeating every 2π. Figure 5 depicts an example of the characteristic of a delay line with two lock points. Since phase detectors must be periodic, if the delay line initializes within π of the second lock point, the phase detector will push the delay line toward lock with a longer than necessary delay. Long delays require large RC time constants for a given variable-delay buffer element. The bandlimiting by the filter would significantly attenuate a high-frequency input clock. The attenuation increases the jitter and may even prohibit the input from reaching the output.

Figure 5: Delay line phase/delay characteristic.
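To make the false-locking condition concrete, the following is a minimal sketch rather than anything taken from the original text. It assumes only what is stated above: lock points repeat every clock period (2π of phase), and a periodic phase detector pulls the delay toward whichever lock point is nearest. The function names `lock_points` and `settles_to`, and all the numbers, are illustrative assumptions.

```python
import math

def lock_points(period, target_fraction, d_min, d_max):
    """List every delay in [d_min, d_max] that realizes the target phase.

    target_fraction is the desired phase as a fraction of a cycle
    (0.5 means 180 degrees). Delays that differ by a whole period are
    indistinguishable to a periodic phase detector.
    """
    base = target_fraction * period
    k = math.ceil((d_min - base) / period)  # first candidate at or above d_min
    points = []
    delay = base + k * period
    while delay <= d_max:
        points.append(delay)
        delay += period
    return points

def settles_to(initial_delay, points):
    """Assumed behavior: the loop converges to the nearest lock point."""
    return min(points, key=lambda p: abs(p - initial_delay))

# Hypothetical delay line: 200 ps to 2800 ps range, 1000 ps clock period.
points = lock_points(1000.0, 0.5, 200.0, 2800.0)
print(points)                      # three candidate lock points
print(settles_to(1200.0, points))  # starts nearer the second lock point
```

With a range wider than one period the list has several entries, and a start-up delay closer to a later entry settles there, which is the longer-than-necessary delay described above. Restricting the range to less than one period leaves a single entry.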
Even if the delay line is constrained to span only one lock point but more than 2π, a second, similar issue exists. It is difficult to design a delay line such that the adjustable range is exactly +π to -π across different operating and processing conditions. If initialized at the minimum or maximum delay, the phase detector may push the loop toward either the maximum or minimum delay limit and "false-lock" to an incorrect phase.

To address false-locking, designers employ several techniques depending on the application. For systems that require a delay line with a known fixed delay, operating condition variations may be small enough that the delay line only needs a small variable range that is less than +π and -π. For systems that lock to a fixed phase over a wide range of frequencies, one design [9] uses an auxiliary frequency-sensing loop that generates a voltage to coarsely set the delay for the given input frequency. The DLL then only fine-tunes the delay for the desired phase. For data recovery applications where the clock phase can be arbitrary with respect to the data, a common design uses a startup circuit for the DLL that initializes the delay line at its minimum delay to avoid any secondary lock points. However, as mentioned earlier, the phase detector may keep the delay line at the minimum delay. A sensing circuit or a state machine detects when the delay line is at its limit and optionally inverts the feedback clock. The phase would flip by 180° and the loop would lock properly. As will be discussed in Section III, a more robust alternative reconfigures the delay line such that the delay only spans 2π and wraps back to 0° when the delay exceeds 360°.

The jitter and duty cycle of the delay-line output clock depend on the input, the coupling of the input to the delay line, and the delay line itself. Often the input is from off-chip and, therefore, it must be carefully received to prevent supply and substrate noise from coupling onto the signal as jitter. In contrast, the high-frequency phase noise of the clock output of an oscillator-based PLL depends primarily on the oscillator design. An improperly received input clock can often result in worse jitter performance in a DLL than in a PLL. Similarly, while the duty cycle from an oscillator is only modestly distorted (by the difference between the rising-edge and falling-edge delays), the duty cycle of the DLL's input clock can be significantly distorted as it propagates to the output. Since duty cycle is a systematic error, a good design corrects duty cycle using an explicit block instead of compounding the difficulty of the delay-line design.

A duty-cycle corrector (DCC) is commonly added to either the DLL input or output. Figure 6 illustrates the basic components of the feedback loop: an input with finite slew rate, a buffer element with adjustable threshold, a comparator, and an integrator. The comparator determines the threshold crossing of the clock waveform. The result is integrated and used to skew the threshold of the buffer stage. Because the buffer input has finite slew rate, changing the threshold effectively adjusts the output high and low half-periods. The loop settles when the high and low half-periods are equal. Figure 6 illustrates the reduction in duty cycle as the threshold shifts from Vref1 to Vref2. Since random variation of the duty cycle effectively appears as jitter, single-ended implementations such as that shown in the figure can be very sensitive to common-mode noise. For this reason, differential architectures are preferred [3].

Figure 6: Duty-cycle corrector block diagram. Timing diagram shows change in duty cycle with changing offset.

For low jitter on the output clock, the loop components must be carefully designed. Many of the loop components are very similar to those of a PLL and are well described in [10]. For a charge-pump based loop filter, since the filter is only first-order, a simple capacitor replaces the RC filter. As in a PLL, noise on the control voltage directly translates into jitter. Designers may use additional filtering to suppress the noise. The loop element that has deviated the most from PLL design, and that is critical for functionality and performance, is the delay line.

III. DELAY-LINE ARCHITECTURES

The primary characteristics of a delay line are (1) gain (i.e., change in delay for a given change in voltage), and (2) delay range. For most applications using periodic inputs, the absolute delay is not critical as long as the range spans 2π. Because delay lines are relatively short, they do not contribute significant thermal or 1/f phase noise. However, for large digital systems, low supply/substrate sensitivity is needed to reject the on-chip switching noise.
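The delay-line gain feeds directly into the first-order loop dynamics of Section II. As a rough numerical sketch, assume an integrating filter so that the product K_PD * K_DL * G_F is the closed-loop pole in rad/s. That unit convention, the helper names, and the example values are assumptions made for illustration, not part of the original text.

```python
import math

def loop_bandwidth_hz(k_pd, k_dl, g_f):
    """3-dB bandwidth of H(s) = 1 / (1 + s / (k_pd * k_dl * g_f))."""
    return k_pd * k_dl * g_f / (2.0 * math.pi)

def closed_loop_mag(freq_hz, k_pd, k_dl, g_f):
    """|H(j*2*pi*f)| for the single-pole closed-loop response."""
    w = 2.0 * math.pi * freq_hz
    w_pole = k_pd * k_dl * g_f
    return 1.0 / math.sqrt(1.0 + (w / w_pole) ** 2)

def dc_tracking_ratio(dc_loop_gain):
    """Output/input phase ratio at dc when the loop gain T(0) is finite."""
    return 1.0 / (1.0 + 1.0 / dc_loop_gain)

def bandwidth_within_guideline(bandwidth_hz, f_ref_hz):
    """Guideline quoted in Section II for a one-cycle feedback delay:
    keep the loop bandwidth at or below 1/4 of the reference frequency."""
    return bandwidth_hz <= f_ref_hz / 4.0

# Hypothetical gains chosen so that the pole sits at 2*pi * 1 MHz.
k_pd, k_dl, g_f = 1.0, 2.0 * math.pi * 1e6, 1.0
print(loop_bandwidth_hz(k_pd, k_dl, g_f))
print(closed_loop_mag(1e6, k_pd, k_dl, g_f))
print(dc_tracking_ratio(99.0))
```

Because the response has a single pole, doubling the delay-line gain doubles the bandwidth, which is why a delay line whose gain varies with delay also varies the loop bandwidth.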

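The duty-cycle corrector loop of Fig. 6 can be illustrated with a small behavioral model. This is a sketch under loud assumptions: the clock is idealized as a trapezoid with equal rise and fall times, the threshold is normalized to the clock swing, and the comparator is replaced by a linear error so that the loop settles smoothly instead of dithering. The names `output_duty` and `run_dcc` and the parameter values are hypothetical.

```python
def output_duty(duty_in, v_th, edge_frac):
    """Duty cycle seen after a buffer with threshold v_th (0..1 of swing).

    duty_in is measured between the start of the rising edge and the
    start of the falling edge; edge_frac is the edge time as a fraction
    of the period. Raising the threshold delays the rising crossing and
    advances the falling crossing, so the high time shrinks.
    """
    return duty_in + (1.0 - 2.0 * v_th) * edge_frac

def run_dcc(duty_in, edge_frac=0.1, gain=0.2, steps=300):
    """Integrate the duty-cycle error into the threshold (assumed linear)."""
    v_th = 0.5
    for _ in range(steps):
        error = output_duty(duty_in, v_th, edge_frac) - 0.5
        v_th = min(1.0, max(0.0, v_th + gain * error))  # keep within swing
    return v_th, output_duty(duty_in, v_th, edge_frac)

v_th, duty = run_dcc(0.55)
print(v_th, duty)
```

The model also shows the limit of the technique: the correctable range is bounded by the edge time, since the threshold cannot move beyond the clock swing.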
A. Basic Delay Line

A delay line comprises a chain of variable-delay elements. Each element is controllable by either a voltage or a current. The delay of each element is proportional to its RC time constant, and changing the effective resistance or capacitance adjusts the delay.

Figure 7 depicts several examples of buffer elements. For a differential buffer, the load resistance can be an MOS transistor in the triode region [Fig. 7-(a)], where the resistance is inversely proportional to V_GS - V_th. Varying the gate voltage adjusts the delay of the element. A nonlinear device such as a diode can also serve as a load resistance [Fig. 7-(b)]. Since the resistance varies with the current, varying the bias current of the buffer would adjust the delay. Similarly, a negative transconductance that changes with the bias current can be placed in parallel with a fixed load resistance [Fig. 7-(c)]. The varying negative transconductance changes the effective load resistance and hence varies the delay. Because nonlinear elements have resistances that depend on both voltage and current, they can be more sensitive to supply noise.

Figure 7: Six different delay elements, (a) through (f).

For push-pull type elements such as inverters, the delay can be changed by changing the rate at which the output capacitance is charged [Fig. 7-(d)]. An adjustable current source limits the peak current of an inverter and varies the delay. An alternative method regulates the supply voltage of the inverters and uses the control voltage to set the supply voltage [Fig. 7-(e)]. The effective switching resistance varies with the supply voltage. Instead of changing the resistance, the effective capacitance can also be made adjustable [Fig. 7-(f)]. A transistor that behaves as an adjustable resistance can be used to decouple an explicit output capacitance. The larger the resistance, the less capacitance is seen at the output.

Figure 8 illustrates the delay versus control voltage for a resistively-controlled delay element. For the element of Fig. 7-(d), either V_GS - V_th or the bias current can be zero and, therefore, a single element's delay can span from the minimum buffer delay to infinity. However, since the time constant is proportional to the delay, a long delay setting would significantly attenuate a high-frequency clock. Delay lines with a wide range for high clock frequencies require a large number of broadband delay elements.

Unlike resistive control, the maximum delay in a capacitively-controlled element [Fig. 7-(f)] is proportional to R(C_int + C_exp) and the minimum delay is proportional to R·C_int, where C_int is the intrinsic capacitance of the buffer and the load of the subsequent stage, and C_exp is the explicit capacitance added to the circuit. Because of the limited range per buffer, obtaining a wide delay range involves a large number of buffers. The maximum delay of each buffer is chosen to avoid attenuating the signal. In designs where the clock has a large voltage swing, the transistor in series with the explicit capacitance no longer appears as a variable resistor because the device enters saturation and cut-off. For these buffers, the control voltage determines the fraction of current and the period of time in which the buffer's current charges the explicit capacitance.

Figure 8: Delay versus voltage for two different delay buffer elements: types (d) (resistive control) and (f) (capacitive control) of Fig. 7.

An example of the delay versus control voltage for a capacitively-controlled element is overlaid in Fig. 8. Most

delay elements exhibit some nonlinearity. As a result, the delay-line gain, $K_{DL}$, is a function of the delay. Because a DLL is unconditionally stable, the loop still functions with the varying loop parameter. However, more linear elements are better for designs that require a constant loop bandwidth. To compensate for the variable $K_{DL}$, designers add programmability to the loop-filter capacitor.

The control signal for either type of delay element can be digital. In a digital implementation [11], the current source is binary weighted and switched by a digital word. For capacitively-controlled elements, the capacitance can be binary weighted and switched. A nearly all-digital DLL is then possible by using a simple counter to replace the analog integrating filter.

B. Phase Interpolation

Instead of only using the clock phase at the end of a delay line, an earlier clock phase can be tapped from the middle of a delay line. Some applications require the delay line to produce a delay that is a fixed fraction of the input-clock period. Figure 9 shows one implementation that uses a DLL to lock the input clock to the output. A 180° phase detector would guarantee the absolute delay of a delay line to be a half-cycle. Tapping from different points on the delay line provides different phases. As shown in Fig. 9, for a 45° phase shift, the clock can be tapped from the first delay stage of a 4-stage differential delay line. If an arbitrary phase is needed, each delay stage can be tapped and multiplexers can select the nearest desired phase. The number of delay elements quantizes the phase step and limits the resolution [12]. Fine phase resolution requires longer delay lines. Yet, the resolution is limited at high clock frequencies because the maximum number of delay elements needed to span 180° is limited.

Figure 9: 180°-locked DLL to generate intermediate phases that are a fraction of a cycle.

An arbitrary intermediate phase can be obtained by "interpolating" between two clock phases that are tapped from a delay line. Depending on the weighting, an interpolator produces a clock that has a programmable output phase in between the input clock phases. As long as discrete clock phases that span the entire cycle are available as inputs, any phase for the interpolator's output is possible. Multiplexers are needed to select the phases to interpolate between. For example, with phases tapped from a 4-stage delay line, if the desired output clock phase is 120°, the interpolator inputs would be from the second and third delay elements.

Interpolators essentially perform a weighted average of the input phases. As shown in Fig. 10, ideally, the two input phases drive two integrators which charge a single output. The weighting of the average is set by the relative currents of the two integrators. When α = 1, the output clock phase depends only on ck_in0. When α = 0.5, i.e., the current is split equally between the two integrators, the output phase is additionally delayed by half the phase difference. As illustrated in Fig. 10, the phase of the interpolated output (ck_out01) falls between the phases of the non-interpolated outputs (ck_out0 and ck_out1).

Figure 10: Phase interpolator design by shorting of the output of two integrators/buffers.

With ideal integrators, the interpolation is linear, resulting in a constant $K_{DL}$. Alternatively, an interpolator can effectively be formed with buffer elements instead of integrators. By weighting the drive strength or current of two buffer elements whose outputs are shorted together, one can adjust the output phase. Because the output is not integrated, the resulting interpolation is slightly nonlinear and depends on (1) the phase difference between the inputs and (2) the slew rate (or time constant) of the input and output signals [13]. Figure 11 depicts the linearity of the interpolation for two different input phase separations, s = τ and s = 2τ, where τ is the buffer's time constant. The larger phase spacing results in greater nonlinearity. Similar to RC delay elements, the interpolation can be digitally controlled. Since the weighting of the interpolation depends on the proportional current, the current sources of the integrators or buffers can be digitally weighted and programmed.

Figure 11: Buffer-based phase interpolator linearity (output delay in units of τ versus current partition α).

In the clock and data recovery design of [3], quadrature clocks are interpolated to generate an intermediate clock phase within a quadrant. Figure 12 illustrates the mostly analog architecture. An analog control voltage produced by the phase detector and filter determines the interpolator currents. Comparators indicate when the current is fully steered to one integrator. A finite state machine driven by the comparators selects the appropriate quadrant by switching the interpolator inputs such that all 360° phases are possible. The quadrature input clocks are generated from an external reference clock through the use of a divide-by-two circuit.

Figure 12: Infinite-range delay line based on phase rotation.

Figure 13: Oscillator with tapped outputs for multiple phases.
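The tap selection and weighting described for phase interpolation can be sketched as follows. This is an idealized, assumed model: taps are uniformly spaced, the interpolation is perfectly linear (the integrator-based case), and α is the share of current given to the earlier phase so that α = 1 reproduces the earlier tap. The function names are hypothetical.

```python
def tap_phases(num_stages=4, locked_span_deg=180.0):
    """Phases available from a differential delay line locked to 180 degrees.

    Each stage contributes one phase; the complementary output of each
    stage supplies the same phase shifted by 180 degrees.
    """
    step = locked_span_deg / num_stages
    taps = [i * step for i in range(num_stages)]
    return taps + [p + 180.0 for p in taps]

def interpolate(target_deg, phases):
    """Pick the two adjacent taps around target_deg and the weight alpha.

    Returns (earlier_tap, later_tap, alpha) such that
    target = earlier_tap + (1 - alpha) * tap_spacing.
    """
    target = target_deg % 360.0
    ordered = sorted(phases)
    step = ordered[1] - ordered[0]      # assumes uniform spacing
    index = int(target // step)
    earlier = ordered[index]
    later = (earlier + step) % 360.0    # wraps past the last tap
    alpha = (earlier + step - target) / step
    return earlier, later, alpha

print(interpolate(120.0, tap_phases()))
```

For the 120° example in the text, the selected taps are the 90° and 135° outputs, that is, the second and third delay elements of the 4-stage line. A buffer-based interpolator would deviate from this linear weighting, as Fig. 11 indicates.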
Interestingly, because the phase rotates from one quadrant to the next, the architecture effectively has an unlimited delay range. If the input data rate and the reference clock frequency are slightly different, a DLL would continually increase or decrease the delay in order to track the accumulating input phase. A typical DLL with a finite delay range would run out of delay or lose lock. On the other hand, plesiochronous operation is possible with an interpolator-based delay line since the phase smoothly rotates between quadrants.

Interpolating between clocks with large phase spacings, such as quadrature clocks, results in an output clock with a slow slew rate. Such waveforms are more susceptible to noise and result in higher jitter. An enhancement uses more closely-spaced phases that span the cycle. The finer phase spacing is possible using a multi-stage ring oscillator. As shown in Fig. 13, a 4-stage differential oscillator would generate 8 phases 45° apart. To guarantee a correct period for each clock phase, the ring oscillator is locked to the external reference clock using a PLL. The role of the PLL is solely to generate the phases. A purely DLL-based architecture is also possible by instead using the DLL in Fig. 9 that locks the delay-line output with a 180° phase shift [13].

The architecture is commonly known as a dual-loop design because the first loop, a PLL or DLL, generates the phases and the second loop, the interpolation-based DLL, recovers the data and phase. Since the first loop is not in the feedback of the second loop (or vice versa), the overall system is stable as long as each loop is individually stable. A dual-loop design is possible with the second loop within the feedback of the first loop [15] as long as the stability of the loop is carefully considered.

The data recovery portion of a dual-loop design is conducive to a digital implementation. The binary output of the receiver-replica phase detector can be accumulated using a digital counter. The counter output selects the appropriate phase from the oscillator and controls the digitally programmable interpolators [13], [14]. As long as the quantized phase step is small, the small error only minimally impacts the data recovery.

C. Oversampled Implementation

An alternative, purely digital approach to clock and data recovery can be implemented by oversampling the data. Figure 14 illustrates an example of a digital architecture. Multiple finely-spaced clock phases oversample the data input. The sampled results are digitally processed to determine both the correct data value and the optimal phase of the data sample. The digital processing can vary in complexity. Simple implementations use the optimal data sample as the received data [18] or take a majority vote from the samples of a single bit [17]. The bit boundaries determine the samples associated with a bit. Transitions that are detected in the samples from the prior or current bits indicate the bit boundaries.

The sampling rate limits the timing error margin. A greater amount of oversampling reduces the data-recovery timing error, but increases the number of clock phases. Low data rate UARTs [16] typically use 8 to 16 times oversampling. For high data rates, generating accurate clock phases separated by sub-100 ps is very challenging. More

aggressive designs with the least amount of clocking overhead and high data rates use a minimum of 3x oversampling [17], [18].

Figure 14: Oversampled data recovery architecture.

Even though phase spacing scales with the gate delay of a technology, so does the bit time in each generation of applications. For oversampling of the data bits, finely-spaced clock phases are needed. Tapping from a delay line produces phases separated by a buffer delay. For even finer phases, several techniques are commonly used. For example, several interpolators can be used where each interpolator has slightly different weighting to generate intermediate phases with spacing less than a buffer delay [19]. An alternative method uses a chain or array of coupled oscillators [20]. By taking a chain of oscillators and coupling them such that the output and input of the chain are separated by only one gate delay, sub-gate-delay phase spacings result from the outputs of each oscillator. Lastly, if the data can be delayed with a chain of delay buffers along with the clock, the clock at each delay stage can be used to sample the data of the corresponding stage. As long as the data and the clock delay lines have slightly different delays, the sub-sampled outputs are effectively an oversampling of the data. The effective phase spacing depends only on the difference between the data delay and clock delay [21]. The architecture has a drawback in that it requires delaying the data and clock by long delays of several cycles, which can significantly increase jitter.

IV. CLOCK MULTIPLICATION

With a dual-loop architecture, a DLL can produce a frequency plesiochronous to the delay-line input. However, the rate at which the interpolator weight changes limits the frequency difference. Generating a significantly different or multiplied frequency from a low-frequency input reference is not possible with the architecture.

Recently, designers have explored several methods of using DLLs for frequency multiplication. One method uses a delay line that is locked to 180°. With the phases that span an entire cycle, the tapped clock edges are combined to form a clock with multiplied frequency. The most direct method uses logical AND-ORs to combine the multiple phased clocks into a single high-frequency clock [23]. Alternatively, the method in [24] converts each phase into a small pulse and ORs the pulses together to form the output clock. In cases where the output capacitance of the logic gates limits the output frequency, one design [6] uses phases to excite a tuned LC tank to combine the clock phases.

Instead of edge combining, the multiplied clock can be the direct output of a delay line. The architecture is similar to a technique for clock and data recovery that uses a startable oscillator [22]. As shown in Fig. 15, the architecture uses data transitions to trigger startable oscillators: high-value data triggers one oscillator and low-value data triggers another. Each startable oscillator comprises a delay line and an AND gate. The data value enables the AND gate, and the triggered oscillator propagates an edge through the delay elements and produces a clock edge delayed by a half-cycle. The edge is used to sample the data. In the absence of input transitions, the delay line is configured as an oscillator and generates a sampling edge every cycle. Whenever a new data transition occurs, the oscillator resynchronizes its phase to that of the input. In the implementation of [22], the natural oscillation frequency of the oscillator is determined by an external plesiochronous clock reference. The architecture has not been widely applied to higher data rate designs because the sampling phase is directly derived from the input data without any filtering. The deterministic and random jitter inherent in the data are effectively doubled and can be considerable.

Figure 15: Clock/data recovery using startable oscillator.

If the input is a low-jitter reference clock, a similar architecture can be used for clock multiplication [5]. As illustrated in Fig. 16, a lower-frequency but clean reference clock is one input to a multiplexer that feeds into a delay line. The output of the delay line is fed back to the multiplexer as the second input. When a reference clock edge is available, the multiplexer selects the reference input. Otherwise, the multiplexer configures the delay line as an oscillator with the output frequency controlled by the delay. The multiplexer inputs are selected by a counter circuit that determines the number of cycles to oscillate before accepting the next reference clock edge. A phase detector compares the

reference input with the oscillator output and tunes the delay of the delay elements. Once locked, the resulting output clock frequency is a multiple of the input reference frequency. Recent designs [26] extend the frequency range and use an interpolator instead of a multiplexer to blend the delay-line feedback and the low-frequency reference clocks.

Figure 16: DLL-based clock multiplication.

Both edge-combining multiplication and delay-line multiplication reduce the phase noise of the output clock because the core DLL does not have an oscillator that accumulates phase error. After N cycles, where N is the divide ratio, a new clean reference clock edge arrives and resets any accumulated phase error to zero. The architecture potentially lowers jitter by eliminating the peaking in the transfer function and allows a high tracking bandwidth.

However, matching is critical in these designs. Mismatches in the phase detector or charge pump result in a static phase error that modulates the output frequency at the input reference frequency. Similarly, in the edge-combining implementations, if the delay line is mismatched, the output clock would contain significant reference tones. Designers either choose the reference frequency carefully so that the tones do not impact the system performance or employ additional circuitry to compensate for the mismatches.

V. CONCLUSION

DLLs have been commonly used for generating precise phase delays of a signal and have been increasingly popular in clock generation and data recovery applications. Most importantly, because of the first-order loop characteristic that controls the phase directly, DLLs can be designed with high tracking bandwidths and do not exhibit the phase accumulation of an oscillator-based PLL.

The simpler loop characteristics belie many subtleties in DLL design. The delay-line input clock must have low jitter and a good duty cycle. Furthermore, it must be carefully received and coupled to the input of the delay line to maintain good jitter performance. This source of jitter counter-balances the jitter accumulation of PLLs and results in less jitter improvement. Additional circuitry is often needed to prevent false-locking. Since a delay line does not restore a clock's duty cycle, the output clock requires correction circuitry. To use DLLs in plesiochronous systems, the delay line must have even more circuitry to achieve an unlimited delay range. In clock multiplication applications, very careful matching in the DLL components is critical to eliminate reference tones. In the many designs that have addressed these subtleties, DLLs have demonstrated low-jitter clock outputs for a variety of clock generation and data recovery applications.

REFERENCES

[1] Bazes, M., "A Novel Precision MOS Synchronous Delay Line," IEEE Journal of Solid-State Circuits, vol SC-20, no 6, Dec. 1985, pp. 1265-71
[2] Johnson, M.G., E.L. Hudson, "A Variable Delay Line PLL for CPU-Coprocessor Synchronization," IEEE Journal of Solid-State Circuits, vol 23, no 5, Oct. 1988, pp. 1218-23
[3] Lee, T.H., et al., "A 2.5V CMOS Delay-Locked Loop for an 18 Mbit, 500 Megabytes/s DRAM," IEEE Journal of Solid-State Circuits, vol 29, no 12, Dec. 1994, pp. 1491-6
[4] Messerschmitt, D.G., "Synchronization in Digital System Design," IEEE Journal on Selected Areas in Communications, Oct. 1990, pp. 1404-1420
[5] Waizman, A., "A Delay Line Loop for Frequency Synthesis of De-Skewed Clock," IEEE ISSCC Dig. of Tech. Papers, Feb. 1994, San Francisco, Session 18.5
[6] Chien, G., P.R. Gray, "A 900-MHz Local Oscillator Using a DLL-Based Frequency Multiplier Technique for PCS Applications," IEEE Journal of Solid-State Circuits, vol 35, no 12, Dec. 2000, pp. 1996-9
[7] Alexander, J.D., "Clock Recovery from Random Binary Data," Electronics Letters, vol 11, Oct. 1975, pp. 541-2
[8] D'Andrea, N.A., F. Russo, "A binary quantized digital phase locked loop: a graphical analysis," IEEE Transactions on Communications, vol COM-26, no 9, Sept. 1978, pp. 1355-64
[9] Moon, Y., "An All-Analog Multiphase Delay-Locked Loop Using A Replica Delay Line for Wide-Range Operation and Low-Jitter Performance," IEEE Journal of Solid-State Circuits, vol 35, no 3, Mar. 2000, pp. 377-84
[10] Razavi, B., "Design of Monolithic Phase-Locked Loops and Clock Recovery Circuits - A Tutorial," Monolithic Phase-Locked Loops and Clock Recovery Circuits, IEEE Press, New Jersey, 1996, pp. 1-28
[11] Dunning, J., et al., "An All-Digital Phase-Locked Loop with 50-Cycle Lock Time Suitable for High-Performance Microprocessors," IEEE Journal of Solid-State Circuits, vol 30, no 4, Apr. 1995, pp. 412-22
[12] Efendovich, A., et al., "Multifrequency Zero-Jitter Delay-Locked Loop," IEEE Journal of Solid-State Circuits, vol 29, no 1, Jan. 1994, pp. 67-70
[13] Sidiropoulos, S., M.A. Horowitz, "A Semidigital Dual Delay-Locked Loop," IEEE Journal of Solid-State Circuits, vol 32, no 11, Nov. 1997, pp. 1683-92
[14] Garlepp, B., et al., "A Portable Digital DLL for High-Speed CMOS Interface Circuits," IEEE Journal of Solid-State Circuits, vol 34, no 5, May 1999, pp. 632-44
[15] Larsson, P., "A 2-1600-MHz CMOS Clock Recovery PLL with Low-Vdd Capability," IEEE Journal of

21
Solid-State Circuits, vol 34, no 12, Dec. 1999, pp. [22] Ota, Y. et. al., "High-Speed, Burst-Mode, Packet Capable
1951-60 Optical Receiver and Instantaneous Clock Recovery for
[16] Cordell, R., "A 45-Mbit/s CMOS VLSI Digital Phase Optical Bus Operation," IEEE Journal of Lightwave
Aligner," IEEE Journal of Solid-State Circuits, vol 23, no Technology, vol 12, no 2, Feb. 1994, pp. 325-330
2, Apr. 1988, pp. 323-28 [23] Foley, D., M.P. Flynn, "CMOS DLL-Based 2-V 3.2ps
[ 17] Lee, K., et. al., "A CMOS Serial Link For Fully Duplexed Jitter 1-GHz Clock Synthesizer and Temperature-Com-
Data Communication," IEEE Journal of Solid-State Cir- pensated Tunable Oscillaor," IEEE Journal of Solid-State
cuits, vol 30, no 4, Apr. 1995, pp. 353-64 Circuits, vol 36, no 3, Mar. 2001, pp. 417-23
[18] Yang, C.K., et al., "A 0.5-|im CMOS 4.0-Gb/s Serial [24] Kim, C , I. Hwang, S.M. Kang, "Low-Power Small-Area
Link Transceiver with Data Recovery Using Oversam- +/-7.28ps Jitter lGHz DLL-Based Clock Generator,"
pling," IEEE Journal of Solid-State Circuits, vol 33, no 5, IEEE ISSCC Dig. of Tech. Papers, Feb. 2002, San Fran-
May 1998, pp. 713-22 cisco, Session 8.3
[19] Weinlader, D., et al., "An Eight Channel 36-GS/s CMOS [25] Farjad-rad, R., et. al., "A 0.2-2GHz 12mW Multiplying
Timing Analyzer," IEEE ISSCC Dig. of Tech. Papers, DLL for Low-Jitter Clock Synthesis in Highly-Integrated
Feb. 2000, San Francisco, pp. 170-1 Data Communication Chips," IEEE ISSCC Dig. of Tech.
[20] Maneatis, J., M. Horowitz, "Precise Delay Generation Papers, Feb. 2002, San Francisco, Session 4.5
Using Coupled Oscillators," IEEE Journal of Solid-State [26] Ye, S., L. Jansson, I. Galton, "A Multiple-Crystal Inter-
Circuits, vol 28, no 12, Dec. 1993, pp. 1273-82 face PLL with VCO Realignment to Reduce Phase
[21] Gray, C , et. al., "A Sampling Technique and Its CMOS Noise," IEEE ISSCC Dig. of Tech. Papers, Feb. 2002,
Implementation with lGb/s Bandwidth and 25ps Resolu- San Francisco, Session 4.6
tion", IEEE Journal of Solid-State Circuits, vol 29, no 3, [27] Kim, J., et. al., "A Low-Jitter Mixed-Mode DLL for
Mar. 1994, pp. 340 High-Speed DRAM Applications," IEEE Journal of
Solid-State Circuits, vol 35, no 10, Oct. 2000, pp. 1430-3

22
Delta-Sigma Fractional-N Phase-Locked Loops
Ian Galton

The author is with the Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA, USA.

Abstract: This paper presents a tutorial on delta-sigma fractional-N PLLs for frequency synthesis. The presentation assumes the reader has a working knowledge of integer-N PLLs. It builds on this knowledge by introducing the additional concepts required to understand ΔΣ fractional-N PLLs. After explaining the limitations of integer-N PLLs with respect to tuning resolution, the paper introduces the delta-sigma fractional-N PLL as a means of avoiding these limitations. It then presents a self-contained explanation of the relevant aspects of delta-sigma modulation, an extension of the well known integer-N PLL linearized model to delta-sigma fractional-N PLLs, a design example, and techniques for wideband digital modulation of the VCO within a delta-sigma fractional-N PLL.

Figure 1: A typical integer-N PLL.

I. INTRODUCTION

Over the last decade, delta-sigma (ΔΣ) fractional-N phase locked loops (PLLs) have become widely used for frequency synthesis in consumer-oriented electronic communications products such as cellular phones and wireless LANs. Unlike an integer-N PLL, the output frequency of a ΔΣ fractional-N PLL is not limited to integer multiples of a reference frequency. The core of a ΔΣ fractional-N PLL is similar to an integer-N PLL, but it incorporates additional digital circuitry that allows it to accurately interpolate between integer multiples of the reference frequency. The tuning resolution depends only on the complexity of the digital circuitry, so considerable flexibility and programmability is achieved. A single ΔΣ fractional-N PLL often can be used for local oscillator generation in applications that would otherwise require a cascade of two or more integer-N PLLs. Moreover, the fine tuning resolution makes it possible to perform digitally-controlled frequency modulation for generation of continuous-phase (e.g., FSK and MSK) transmit signals, thereby simplifying wireless transmitters. These benefits come at the expense of increased digital complexity and somewhat increased phase noise relative to integer-N PLLs. However, with the relentless progress in silicon VLSI technology optimized for digital circuitry, this tradeoff is increasingly attractive, especially in consumer products which tend to favor cost reduction over performance.

This paper presents a tutorial on ΔΣ fractional-N PLLs. It is assumed that the reader has a working knowledge of integer-N PLLs. The paper builds on this knowledge by presenting the additional concepts required to understand ΔΣ fractional-N PLLs. The limitations of integer-N PLLs with respect to tuning resolution are described in Section II. The key ideas underlying fractional-N PLLs in general and ΔΣ fractional-N PLLs in particular are presented in Section III. The primary innovation in ΔΣ fractional-N PLLs relative to other types of fractional-N PLLs is the use of ΔΣ modulation. Therefore, a self-contained introduction to ΔΣ modulation as it relates to ΔΣ fractional-N PLLs is presented in Section IV. A ΔΣ fractional-N PLL linearized model is derived in Section V and compared to the corresponding model for integer-N PLLs. A design example is presented to demonstrate how the model is used in practice. Design issues that arise in ΔΣ fractional-N PLLs but not integer-N PLLs are presented in Section VI, and recently developed enhancements to ΔΣ fractional-N PLLs that allow wideband digital modulation of the VCO are presented in Section VII.

II. INTEGER-N PLL LIMITATIONS

An example of a typical integer-N PLL for frequency synthesis is shown in Figure 1 [1], [2]. Its purpose is to generate a spectrally pure periodic output signal with a frequency of N·f_ref, where N is an integer, and f_ref is the frequency of the reference signal. The example PLL consists of a phase-frequency detector (PFD), a charge pump, a lowpass loop filter, a voltage controlled oscillator (VCO), and an N-fold digital divider. The PFD compares the positive-going edges of the reference signal to those from the divider and causes the charge pump to drive the loop filter with current pulses whose widths are proportional to the phase difference between the two signals. The pulses are lowpass filtered by the loop filter and the resulting waveform drives the VCO. Within the loop bandwidth phase noise from the VCO is suppressed and outside the loop bandwidth most of the other noise sources are suppressed, so the
PLL can be designed to generate a spectrally pure output signal at any integer multiple of the reference frequency, f_ref.

As indicated by the timing diagram in Figure 1, the loop filter is updated by the charge pump once every reference period. This discrete-time behavior places an upper limit on the loop bandwidth of approximately f_ref/10 above which the PLL tends to be unstable [1]. In integrated circuit PLLs, it is common to further limit the bandwidth to approximately f_ref/20 to allow for process and temperature variations.

The output frequency can be changed by changing N, but N must be an integer, so the output frequency can be changed only by integer multiples of the reference frequency. If finer tuning resolution is required the only option is to reduce the reference frequency. Unfortunately, this tends to reduce the maximum practical loop bandwidth, thereby increasing the settling time of the PLL, the noise contributed by the VCO, and the in-band portions of the noise contributed by the reference source, the PFD, the charge pump, and the divider.

This fundamental tradeoff between bandwidth and tuning resolution in integer-N PLLs creates problems in many applications. For example, a PLL that can be tuned from 2.402 GHz to 2.480 GHz in steps of 1 MHz is required to generate the local oscillator signal in a direct conversion Bluetooth transceiver [3]. An integer-N PLL capable of generating the local oscillator signal from a commonly used crystal oscillator frequency, 19.68 MHz, is shown in Figure 2. A reference frequency of f_ref = 40 kHz (the greatest common divisor of the crystal frequency and the set of desired output frequencies) is obtained by dividing the crystal oscillator signal by 492. The resulting PLL output frequency is 60050 + 25k times the reference frequency, where k is an integer used to select the desired frequency step.

Figure 2: An example integer-N PLL for generation of the Bluetooth wireless LAN RF channel frequencies.
[Block diagram labels: 19.68 MHz crystal, ÷492, f_ref = 40 kHz, PFD, charge pump, loop filter, VCO, ÷(60050 + 25k), f_VCO = 2.402 GHz + k MHz]

The PLL achieves the desired output frequencies, but its bandwidth is limited to approximately 2 kHz, i.e., f_ref/20. Unfortunately, with such a low bandwidth the settling time exceeds the 200 µs limit specified in the Bluetooth standard, and the phase noise contributed by the VCO would be unacceptably high if it were implemented in present-day CMOS technology. One solution is to use a 1 MHz reference signal, but this requires the crystal frequency to be an integer multiple of 1 MHz, or another PLL to generate a 1 MHz reference frequency. Unfortunately, in low cost consumer electronics applications such as Bluetooth, it is often desirable to be compatible with all of the popular crystal frequencies, so restricting the crystal frequencies to multiples of 1 MHz is not always an option. In such cases, an additional PLL capable of generating the 1 MHz reference signal with very little phase noise from any of the crystal frequencies is required, or, as described in the next section, a single fractional-N PLL can be used.

III. THE IDEA BEHIND ΔΣ FRACTIONAL-N PLLs

In this section, the example problem of generating the second Bluetooth channel frequency, 2.403 GHz, with a reference frequency of 19.68 MHz is used as a vehicle with which to explain the idea behind ΔΣ fractional-N PLLs. First, a pair of "bad" fractional-N PLLs are presented that achieve the desired frequency but have poor phase noise performance. Then the ΔΣ fractional-N PLL technique is presented as a means of improving the phase noise performance.

The output frequency of an integer-N PLL with a reference frequency of 19.68 MHz is 2.40096 GHz when the divider modulus, N, is set to 122 and 2.42064 GHz when N is set to 123. The problem is that to achieve the desired frequency of 2.403 GHz, N would have to be set to the non-integer value of 122 + 51/492. This cannot be implemented directly because the divider modulus must be an integer value. However, the divider modulus can be updated each reference period, so one option is to switch between N = 122 and N = 123 such that the average modulus over many reference periods converges to 122 + 51/492. In this case, the resulting average PLL output frequency is 2.403 GHz as desired. This is the fundamental idea behind most fractional-N PLLs [4].

While dynamically switching the divider modulus solves the problem of achieving non-integer multiples of the reference frequency, a price is paid in the form of increased phase noise. During each reference period the difference between the actual divider modulus and the average, i.e., ideal, divider modulus represents error that gets injected into the PLL and results in increased phase noise. As described below, the amount by which the phase noise is increased depends upon the characteristics of the sequence of divider moduli.

Figure 3: A fractional-N PLL that generates non-integer multiples of the reference frequency, but has phase noise consisting of large spurious tones.
[Block diagram labels: 19.68 MHz reference, PFD, charge pump, loop filter, VCO, ÷(122 + y[n]), shift register with 51 ones and 441 zeros, f_VCO = 2.403 GHz (on average)]

For example, in the fractional-N PLL shown in Figure 3, the divider modulus is set each reference period to 122 or 123 such that over each set of 492 consecutive reference periods it is set to 122 a total of 441 times and 123 a total of 51 times. Thus, the average modulus is 122 + 51/492 as required. The sequence of moduli is periodic with a period of 492, so it repeats at a rate of 40 kHz. Consequently, the difference between the actual divider moduli and their average is a periodic sequence with a repeat rate of 40 kHz, so the resulting phase noise is periodic and is comprised of spurious tones at integer multiples of 40 kHz. Many of the spurious tones occur at low frequencies, and they can be very large. Unfortunately, the
only way to suppress the tones is to have a very small PLL bandwidth, which negates the potential benefit of the fractional-N technique.

One way to eliminate spurious tones is to introduce randomness to break up the periodicity in the sequence of moduli while still achieving the desired average modulus. For example, as shown in Figure 4, a digital block can be used to generate a sequence, y[n], that approximates a sampled sequence of independent random variables that take on values of 0 and 1 with probabilities 441/492 and 51/492, respectively. During the nth reference period the divider modulus is set to 122 + y[n], so the sequence of moduli has the desired average yet its power spectral density (PSD) is that of white noise. Thus, instead of contributing spurious tones, the modified technique introduces white noise. Unfortunately, the portion of the white noise within the PLL's bandwidth is integrated by the PLL transfer function, so the overall phase noise contribution again can be significant unless the PLL bandwidth is small.

Figure 4: A fractional-N PLL that generates non-integer multiples of the reference frequency, but has a large amount of in-band phase noise.
[Block diagram labels: 19.68 MHz reference, PFD, charge pump, loop filter, VCO, ÷(122 + y[n]), randomized pulse density modulator with input 51/492 (y[n] = 1 with probability 51/492, 0 with probability 441/492), f_VCO = 2.403 GHz (on average)]

In each fractional-N PLL example presented above, the sequence, y[n], can be written as y[n] = x + e_m[n], where x is the desired fractional part of the modulus, i.e., x = 51/492, and e_m[n] is undesired zero-mean quantization noise caused by using integer moduli in place of the ideal fractional value. In the first example, e_m[n] is periodic and therefore consists of spurious tones at multiples of 40 kHz. In the second example, e_m[n] is white noise. Each PLL attenuates the portion of e_m[n] outside its bandwidth, but the portion within its bandwidth is not significantly attenuated. Unfortunately, in each example e_m[n] contains significant power at low frequencies, so it contributes substantial phase noise unless the PLL bandwidth is very low.

A ΔΣ fractional-N PLL avoids this problem by generating the sequence of moduli such that the quantization noise has most of its power in a frequency band well above the desired bandwidth of the PLL [5], [6], [7]. An example ΔΣ fractional-N PLL is shown in Figure 5. The PLL core is similar to those of the previous fractional-N PLL examples, but in this case y[n] is generated by a digital ΔΣ modulator. The details of how the ΔΣ modulator works are presented in the next section, but its purpose is to coarsely quantize its input sequence, x[n], such that y[n] is integer-valued and has the form: y[n] = x[n - 2] + e_m[n], where e_m[n] is dc-free quantization noise with most of its power outside the PLL bandwidth. In this example, x[n] consists of the desired fractional modulus value, 51/492, plus a small, pseudo-random, 1-bit sequence. As described in the next section, the pseudo-random sequence is necessary to avoid spurious tones in the ΔΣ modulator's quantization noise, but its amplitude is very small so it does not appreciably increase the phase noise of the PLL.

Also shown in Figure 5 are PSD plots of the output phase noise arising from ΔΣ modulator quantization noise, e_m[n], in two computer simulated versions of the example ΔΣ fractional-N PLL, one with a 50 kHz loop bandwidth and the other with a 500 kHz loop bandwidth. As shown in the next section, the PSD of e_m[n] increases with frequency, so the phase noise PSD corresponding to the 50 kHz bandwidth PLL is significantly smaller than that corresponding to the 500 kHz bandwidth PLL. For example, the former easily meets the requirements for a local oscillator in a direct conversion Bluetooth transceiver, but the latter falls short of the requirements by at least 23 dB.

Figure 5: A ΔΣ fractional-N PLL example.
[Block diagram with a ÷(122 + y[n]) divider driven by a digital ΔΣ modulator whose input is 51/492 plus a pseudo-random bit sequence; inset plot "Simulated PLL Phase Noise" with curves for 500 kHz and 50 kHz loop bandwidths]

IV. DELTA-SIGMA MODULATION OVERVIEW

As mentioned above, a digital ΔΣ modulator performs coarse quantization in such a way that the inevitable error introduced by the quantization process, i.e., the quantization noise, is attenuated in a specific frequency band of interest. There are many different ΔΣ modulator architectures. Most use coarse uniform quantizers to perform the quantization with feedback around the quantizers to suppress the quantization noise in particular frequency bands. Therefore, to illustrate the ΔΣ modulator concept, first a specific uniform quantizer example is considered in isolation, and then a specific ΔΣ modulator architecture that incorporates the uniform quantizer is presented.

A. An Example Uniform Quantizer

The input-output characteristic of the example uniform quantizer is shown in Figure 6. It is a 9-level quantizer with integer valued output levels. For each input value with a magnitude less than 4.5, the quantizer generates the corresponding output sample by rounding the input value to the nearest integer. For each input value greater than 4.5 or less than -4.5, the quantizer sets its output to 4 or -4, respectively; such values are said to overload the quantizer. By defining the quantization noise as e_q[n] = y[n] - r[n], the quantizer can be viewed without approximation as an additive noise source as illustrated in the figure.
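
The quantizer rule just described can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function name `quantize_9_level` and the test inputs are hypothetical, and ties at half-integers are rounded upward here, a detail the text does not specify.

```python
import numpy as np

def quantize_9_level(r):
    """9-level uniform quantizer: step size 1, integer output levels -4..4."""
    r = np.asarray(r, dtype=float)
    # Round to the nearest integer (floor(r + 0.5) avoids NumPy's
    # round-half-to-even behaviour), then saturate at +/-4 on overload.
    return np.clip(np.floor(r + 0.5), -4, 4).astype(int)

# Hypothetical inputs: the first and last values overload the quantizer.
r = np.array([-5.2, -4.4, -0.3, 0.4, 1.7, 3.6, 4.4, 6.0])
y = quantize_9_level(r)

# Quantization noise as defined in the text: e_q[n] = y[n] - r[n].
e_q = y - r

no_overload = np.abs(r) < 4.5
print("y   =", y)
print("e_q =", e_q)
print("max |e_q| without overload:", np.max(np.abs(e_q[no_overload])))
```

Inside the no-overload range the error magnitude never exceeds half a step, whereas the two overloading inputs produce larger errors. Because e_q[n] is defined as the exact difference y[n] - r[n], the identity y[n] = r[n] + e_q[n] holds without approximation, which is the additive-noise view taken in the text.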

Figure 6: A 9-level quantizer example.
[Staircase input-output characteristic with thresholds at ±0.5, ±1.5, ±2.5, ±3.5 and a no-overload range from -4.5 to 4.5; quantization error e = y - r bounded by ±0.5 within that range]

To illustrate some properties of the example quantizer, consider a 48 Msample/s input sequence, x[n], consisting of a 48 kHz sinusoid with an amplitude of 1.7 plus a small amount of white noise such that the input signal-to-noise ratio (SNR) is 100 dB. Figure 7(a) shows the PSD plot of the resulting quantizer output sequence, and Figure 7(b) shows a time domain plot of the quantizer output sequence over two periods of the sinusoid. Given the coarseness of the quantization, it is not surprising that the quantizer output sequence is not a precise representation of the quantizer input sequence. As evident in Figure 7(a), the quantization noise for this input sequence consists primarily of harmonic distortion as represented by the numerous spurious tones distributed over the entire discrete-time frequency band. Even in the relatively narrow frequency band below 500 kHz, significant harmonic distortion corrupts the desired signal. To illustrate this in the time domain, Figure 7(c) shows the sequence obtained by passing the quantizer output sequence through a sharp lowpass discrete-time filter with a cutoff frequency of 500 kHz. The significant quantization noise power in the zero to 500 kHz frequency band causes the sequence shown in Figure 7(c) to deviate significantly from the sinusoidal quantizer input sequence.

Figure 7: (a) A power spectral density plot of the quantizer output in dB, relative to the quantization step-size of Δ = 1, per Hz, (b) a time domain plot of the quantizer output, and (c) a time domain plot of the quantizer output filtered by a sharp lowpass filter with a cutoff frequency of 500 kHz.
[Test setup: 48 kHz sinusoid plus white noise (SNR = 100 dB) sampled at 48 MHz, 9-level quantizer, lowpass filter (BW = 500 kHz); frequency axis in Hz, time axis in units of 1/(48 MHz)]

B. An Example ΔΣ Modulator

The example ΔΣ modulator architecture shown in Figure 8 can be used to circumvent this problem. The structure incorporates the same 9-level quantizer presented above, but in this case the quantizer is preceded by two delaying discrete-time integrators (i.e., accumulators), and surrounded by two feedback loops [8], [9]. Each discrete-time integrator has a transfer function of z^-1/(1 - z^-1) which implies that its nth output sample is the sum of all its input samples for times k < n. With the quantizer represented as an additive noise source as depicted in Figure 6, the ΔΣ modulator can be viewed as a two-input, single-output, linear time-invariant, discrete-time system. It is straightforward to verify that

$$y[n] = x[n-2] + e_m[n], \tag{1}$$

where e_m[n] is the overall quantization noise of the ΔΣ modulator and is given by

$$e_m[n] = e_q[n] - 2e_q[n-1] + e_q[n-2]. \tag{2}$$

Figure 8: A ΔΣ modulator example.

To illustrate the behavior of the ΔΣ modulator, suppose that the same 48 Msample/s input sequence considered above is applied to the input of the ΔΣ modulator, and that the discrete-time integrators in the ΔΣ modulator are clocked at 48 MHz. Figure 9(a) shows the PSD plot of the resulting ΔΣ modulator output sequence, y[n], and Figure 9(b) shows a time domain plot of y[n] over two periods of the sinusoid. Two important differences with respect to the uniform quantization example shown in Figure 7 are apparent: the quantization noise PSD is significantly attenuated at low frequencies, and no spurious tones are visible anywhere in the discrete-time spectrum. For instance, the SNR in the zero to 500 kHz frequency band is approximately 84 dB for this example as opposed to 14 dB for the uniform quantization example of Figure 7. Consequently, subjecting the ΔΣ modulator output sequence to a lowpass filter with a cutoff frequency of 500 kHz results in a sequence that is very nearly equal to the ΔΣ modulator input sequence as demonstrated in Figure 9(c).

Below about 120 kHz, the PSD shown in Figure 9(a) is dominated by the two components of the ΔΣ modulator input sequence: the 48 kHz sinusoid component, and the input noise component. Above 120 kHz, the PSD is dominated by the ΔΣ modulator quantization noise, e_m[n], and rises with a slope of 40 dB per decade. It follows from (2) that e_m[n] can be viewed as the result of passing the additive noise from the quantizer, e_q[n], through a discrete-time filter with transfer function (1 - z^-1)^2. Since this filter has two zeros at dc, the smooth 40 dB per decade increase of the PSD of e_m[n] indicates that e_q[n] is very nearly white noise, at least for the example shown in Figure 9.

It can be proven that e_q[n] is indeed white noise; it has a variance of 1/12 and is uncorrelated with the ΔΣ modulator input sequence [10]. Moreover, this situation holds in general for the example ΔΣ modulator architecture provided that the
input sequence satisfies two conditions: 1) its magnitude is sufficiently small that the quantizer within the ΔΣ modulator never overloads, and 2) it consists of a signal component plus a small amount of independent white noise. It can be shown that the first condition is satisfied if the input signal is bounded in magnitude by 3Δ where Δ is the step-size of the quantizer (for this example, Δ = 1) [11]. Input sequences with values even slightly exceeding 3Δ in magnitude generally cause the quantizer to overload with the result that e_q[n] contains spurious tones and the SNR in the frequency band of interest is degraded. For this reason, the range between -3Δ and 3Δ is said to be the input no-overload range of the ΔΣ modulator. For the second condition to be satisfied, the power of the ΔΣ modulator input sequence's white noise component may be arbitrarily small, but if it is absent altogether, e_q[n] is not guaranteed to be white. For instance, in the example shown in Figure 9 the input sequence contains a white noise component with 100 dB less power than the signal component. If this tiny noise component were not present, the resulting ΔΣ modulator output PSD would contain numerous spurious tones. Since the ΔΣ modulators used in ΔΣ fractional-N PLLs are all-digital devices, the noise must be added digitally. As shown in [12], it is sufficient to add a 1-bit, sub-LSB, independent, white noise dither sequence with zero mean at the input node. In practice, a 1-bit pseudo-random dither sequence is typically used in place of a truly random dither sequence. Such a sequence can be generated easily using a linear feedback shift register, and has the desired result with respect to the quantization noise despite not being truly random [13], [14].

Figure 9: (a) A power spectral density plot of the ΔΣ modulator output in dB, relative to the quantization step-size of Δ = 1, per Hz, (b) a time domain plot of the ΔΣ modulator output, and (c) a time domain plot of the ΔΣ modulator output filtered by a sharp lowpass filter with a cutoff frequency of 500 kHz.
[Test setup: 48 kHz sinusoid plus white noise (SNR = 100 dB) sampled at 48 MHz, second-order ΔΣ modulator, lowpass filter (BW = 500 kHz); frequency axis in Hz, time axis in units of 1/(48 MHz)]

C. Other ΔΣ Modulator Options

To this point, the ΔΣ modulation concept has been illustrated via the particular example ΔΣ modulator architecture shown in Figure 8, namely a second-order multi-bit ΔΣ modulator. While this type of ΔΣ modulator is widely used in ΔΣ fractional-N PLLs, there exist other types of ΔΣ modulators that can be applied to ΔΣ fractional-N PLLs. Most of the other architectures are higher-order ΔΣ modulators that perform higher than second-order quantization noise shaping, thereby more aggressively suppressing quantization noise in particular frequency bands relative to the example second-order ΔΣ modulator. Some of these higher-order ΔΣ modulators incorporate a higher than second-order loop filter (e.g., more than two discrete-time integrators) and a single quantizer surrounded by one or more feedback loops [15], [16]. In many cases, these ΔΣ modulators are designed specifically to allow one-bit quantization [7], [17], [18]. This simplifies the design of the divider in that only two moduli are required, but such ΔΣ modulators tend to have spurious tones in their quantization noise that cannot be completely suppressed even with elaborate dithering techniques. Others of these higher-order ΔΣ modulators, often referred to as MASH, cascaded, or multistage ΔΣ modulators, are comprised of multiple lower-order ΔΣ modulators, such as the second-order ΔΣ modulator presented above, cascaded to obtain the equivalent of a single higher-order ΔΣ modulator [5], [19], [20].

V. ΔΣ FRACTIONAL-N PLL DYNAMICS

A ΔΣ fractional-N PLL linearized model is derived in this section in the form of a block diagram that describes the output phase noise in terms of the component parameters and noise sources in the PLL. As in the case of an integer-N PLL the model provides an accurate tool with which to predict the total phase noise, bandwidth, and stability of the PLL.

A. Derivation of a ΔΣ Fractional-N PLL Linearized Model

In PLL analyses it is common to assume that each periodic signal within the PLL has the form v(t) = A(t) sin(ωt + θ(t)), where A(t) is a positive amplitude function, ω is a constant center frequency in radians/sec, and θ(t) is zero-mean phase noise in radians. In most cases of interest for PLL analysis, the amplitude is well modeled as a constant value, and the phase noise is very small relative to π with a bandwidth that is much lower than the center frequency. Solving for the time of the nth positive-going zero crossing, γ_n, of v(t) gives γ_n = [n - θ(γ_n)/(2π)]·T, where T = 2π/ω is the period of the signal. Therefore, the sequence, γ_n, is a sampled version of the phase noise with very little aliasing, so knowing the sequence and T is approximately equivalent to knowing the phase noise. This approximation is made throughout the following analysis.

Figure 10: The ΔΣ fractional-N PLL with the details of a commonly used loop filter and a timing diagram relating to the charge pump output.

The relationship between the charge pump output current and the PFD input signals is shown in Figure 10. Ideally, during the nth reference period the charge pump output is a current pulse of amplitude I or -I and duration |t_n - τ_n|, where t_n and τ_n are the times of the charge pump output transitions triggered by the positive-going edges of the divider output and reference signal, respectively. Therefore, the average current sourced or sunk by the charge pump during the nth reference period is I·(t_n - τ_n)/T_ref. In practice, the PFD is usually designed such that, except for a possible constant offset, this result holds even though the current sources have finite rise and fall times [2].

Figure 11: The ΔΣ fractional-N PLL linearized model. Except for the shaded region the model is identical to the corresponding integer-N PLL model.
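
The zero-crossing relation used at the start of this derivation is implicit, because the phase noise is evaluated at the crossing time itself. The sketch below, a numerical illustration under assumed values rather than anything taken from the text, solves it by fixed-point iteration for a hypothetical slow sinusoidal phase disturbance and then confirms that the crossing times form a sampled record of the phase noise. The 1 MHz carrier, the 10 kHz disturbance, its 0.05 rad amplitude, and the names `theta` and `zero_crossing` are all illustrative assumptions.

```python
import math

f0 = 1.0e6                   # center frequency in Hz (assumed)
T = 1.0 / f0                 # signal period
omega = 2.0 * math.pi * f0   # center frequency in rad/s
fm = 10.0e3                  # frequency of the phase disturbance in Hz (assumed)
amp = 0.05                   # peak phase deviation in rad, small relative to pi

def theta(t):
    """Hypothetical slowly varying phase noise in radians."""
    return amp * math.sin(2.0 * math.pi * fm * t)

def zero_crossing(n, iterations=20):
    """Solve gamma = (n - theta(gamma) / (2*pi)) * T by fixed-point iteration."""
    g = n * T                # start from the ideal crossing time
    for _ in range(iterations):
        g = (n - theta(g) / (2.0 * math.pi)) * T
    return g

indices = range(1, 201)
crossings = [zero_crossing(n) for n in indices]

# Each solution should be a zero of v(t) = sin(omega*t + theta(t)).
residual = max(abs(math.sin(omega * g + theta(g))) for g in crossings)

# The crossing times carry the phase noise: theta(gamma_n) = 2*pi*(n - gamma_n/T).
sampled_phase = [2.0 * math.pi * (n - g / T) for n, g in zip(indices, crossings)]
max_error = max(abs(p - theta(g)) for p, g in zip(sampled_phase, crossings))

print("largest |v| at the computed crossings:", residual)
print("largest sampled-phase error in rad:", max_error)
```

The iteration converges quickly because the phase noise varies slowly compared with the carrier, which is the same condition the text relies on when it treats the crossing times as a nearly alias-free sampling of the phase noise.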
The first step in deriving the model is to develop an ex-
pression for /„ — xn. Ideally, rn = nTKf, but phase noise intro- tions in (5) can be neglected, and the charge pump output can
duced by the reference source and PFD cause it to have the be modeled as a smoothly varying function of time with an
form average value over each reference period equal to that of (5).
With these approximations, (5) implies that
T = nT
" * "fj[MO+*m>(O]» (3) ' "m(j)-Kcoum{t)-kycoUVc^t)-vml)dt~^l
where 6reJ(j) and OPFDO) a r e the reference source and PFD LO) = I - lE

phase noise functions, respectively. If the VCO output were cp
N+a
ideal its positive-going edges would be spaced at uniform in- (6)
1
tervals of Tnf I (N + a), where a is the fractional part of the 0
^,v(') , ~AO , &pFD(t)
modulus (e.g., a = 51/492 in Figure 5). Therefore, ideally,
2n + 2n + 2n

^v^i{N+m'
but in practice it deviates because of VCO phase noise,
where ujj) is the result of discrete-time integrating and con-
verting to continuous-time the quantity, y[n] — a.
OyccAi), divider phase noise, #</,v(*X and instantaneous devia- The AL fractional-N PLL linearized model follows directly
tions of the VCO control voltage from its ideal average value from (6) and Figure 10. It is shown in Figure 11, where in(t)
of
v^, =(N+a)l(TrefKyco), where Kvco is the VCO gain in represents the noise contributed by the charge pump current
units of Hz/Volt. As a result, sources and the loop filter, and z//s) is the transfer function of
the loop filter. The model specifies the phase noise transfer
<.-iyg<"+**j)-^-(v-<<>-v,)*-^] functions and loop dynamics of the PLL. For example, the
model implies that
T
2n e*.(O, 4 ^ ^
#«/(•*)
+ aJ ) ^ - ,
l + T(s)
and ^ l £ l
eYCO{s)
= _L_
l + T(s)
(7)
which reduces to
where
tn=nTnf +^—\y\(yW-<*) vc
T(s)= °y Y (8)
* N + alh '
-Kco f ( v ^ C ) - ^ ) * - ^ ] (4)is the loop gain of the PLL. For the loop filter shown in Fig-
ure 10, the transfer function is
T
1 z (J, 1 l + sRCt
ref
2n
0*(O- (9)
" C^+C2 s[\ +sRClC2 /(C, + C 2 )]'
Subtracting (3) from (4) yields an expression for the average
current sourced or sunk by the charge pump during the «* B. Differences Between the AS Fractional-N and Integer-N
reference period: PLL Models
I(tn-Tn)/Tre/ =
The shaded region in Figure 11 indicates the part of the
model that is specific to AZ fractional-TV PLLs; except for the
SUtj-^-^fMo-v,,)*-^ shaded region the model is identical to the corresponding
Il _ _ (5)
(5) model for integer-iV PLLs. Therefore, each phase noise trans-
N+a
fer function in an integer-iVTLL is identical to the correspond-
ing phase noise transfer function in a AZfractional-.AfPLL,
g«,C.) , M O , <WO except every occurrence of TV in the former is replaced by N+a
2n In ' In in the latter. In most cases, N» \ and a<\, so N+a~N
and the corresponding transfer functions in integer-N and AE
As mentioned above, the phase noise terms are assumed to fractional-AT PLLs are nearly identical in practice. Similarly,
have bandwidths that are much smaller than the reference fre- the loop dynamics and stability issues are nearly the same in
quency. Consequently, the sampling of the phase noise func- ASfractional-TVPLLs and integer-TV PLLs.
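The claim that $N+\alpha \approx N$ leaves the loop dynamics essentially unchanged can be checked numerically. The sketch below is illustrative only: it assumes the loop gain in (8) has the form $T(s) = I K_{VCO} Z(s)/((N+\alpha)s)$ with $K_{VCO}$ in Hz/V, assumes the loop filter impedance in (9) has its zero set by $R C_2$, and borrows the component values quoted in the design example of this section ($I$ = 200 µA, $K_{VCO}$ = 200 MHz/V, $R$ = 960 Ω, $C_2$ = 23 nF, $C_1$ = 480 pF, $N$ = 122). The function names are hypothetical.

```python
import cmath
import math

# Values quoted in the design example (assumed here for illustration).
I_CP = 200e-6      # charge pump current I, in amperes
K_VCO = 200e6      # VCO gain, in Hz/V
R = 960.0          # loop filter resistor, in ohms
C1 = 480e-12       # small shunt capacitor, in farads
C2 = 23e-9         # large capacitor in series with R, in farads
N = 122            # integer part of the divider modulus
ALPHA = 51 / 492   # fractional part of the modulus (the Figure 5 example)


def loop_filter_z(s):
    """Assumed form of (9): zero at 1/(R*C2), pole at (C1+C2)/(R*C1*C2)."""
    c_sum = C1 + C2
    return (1 + s * R * C2) / (c_sum * s * (1 + s * R * C1 * C2 / c_sum))


def loop_gain(f_hz, modulus):
    """Assumed form of (8), evaluated at s = j*2*pi*f."""
    s = 2j * math.pi * f_hz
    return I_CP * K_VCO * loop_filter_z(s) / (modulus * s)


def unity_gain_frequency(modulus, f_lo=1e3, f_hi=1e7):
    """Bisect on a log scale for |T| = 1; |T| falls monotonically with f."""
    for _ in range(200):
        f_mid = math.sqrt(f_lo * f_hi)
        if abs(loop_gain(f_mid, modulus)) > 1.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)


def phase_margin_deg(modulus):
    """Phase margin: 180 degrees plus the loop gain phase at unity gain."""
    f_c = unity_gain_frequency(modulus)
    return 180.0 + math.degrees(cmath.phase(loop_gain(f_c, modulus)))


if __name__ == "__main__":
    for label, modulus in (("integer-N", N), ("fractional-N", N + ALPHA)):
        print(label, unity_gain_frequency(modulus), phase_margin_deg(modulus))
```

Under these assumptions the two cases should differ by well under one percent in crossover frequency, and both should land near the 50 kHz bandwidth and roughly 70° phase margin targeted in the design example.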

[Figure 12 block diagram: a 19.68 MHz reference, PFD, charge pump, loop filter, and VCO, with a divider of modulus N + y[n] controlled by a 2nd-order digital ΔΣ modulator. The modulator input is m/492 plus a {0, 2^-17} pseudo-random bit sequence, and y[n] takes values in {-1, 0, 1, 2}.]

Frequency Plan:
• To get k = 0, 1, ..., or 18: set N = 122, m = k·25 + 26
• To get k = 19, 20, ..., or 38: set N = 123, m = (k - 19)·25 + 9
• To get k = 39, 40, ..., or 57: set N = 124, m = (k - 39)·25 + 17
• To get k = 58, 59, ..., or 78: set N = 125, m = (k - 58)·25

Figure 12: The example ΔΣ fractional-N PLL and frequency plan for generation of the Bluetooth wireless LAN RF channel frequencies.

The primary difference between the ΔΣ fractional-N and integer-N PLL models is the signal path corresponding to the ΔΣ modulator shown in the shaded region of Figure 11. The sequence, $y[n] - \alpha$, consists of ΔΣ modulator quantization noise, $e_m[n]$, which, as described previously, gives rise to phase error in the PLL output. For the example second-order ΔΣ modulator it follows from the results presented in Section IV and the ΔΣ fractional-N PLL model equations presented above that the PLL phase noise component resulting from $e_m[n]$ has a PSD given by

l0.J-*J2.J*£tf _j_Au(W] dBc/Hz (10)

The argument of the log function has the form of a highpass function times a lowpass function, which is consistent with the claim in Section III that the PLL lowpass filters the primarily high frequency quantization noise from the ΔΣ modulator. It follows from (10) that the phase noise resulting from $e_m[n]$ can be decreased by reducing the PLL bandwidth or increasing the reference frequency. If a higher-order ΔΣ modulator is used, an equation similar to (10) results except that the exponent of the sinusoid is greater than two. This reduces the in-band portion of the quantization noise, but increases the out-of-band portion, which, depending upon the loop parameters of the PLL, can result in a somewhat lower overall phase noise. However, the PLL loop filter is highly constrained to maintain PLL stability, so the phase noise reduction that can be achieved by increasing the order of the ΔΣ modulator is limited in most applications [16].

C. A System Design Example

The PLL bandwidth and the phase margin both depend upon the loop gain, $T(s)$, which, for the loop filter shown in Figure 10, depends upon the parameters $f_{ref}$, $N$, $I$, $K_{VCO}$, $R$, $C_1$, and $C_2$. Usually, $f_{ref}$ and $N$ are dictated by the application, and $I$ and $K_{VCO}$ are, at least partially, dictated by circuit design choices. This leaves the loop filter components as the main variables with which to set the desired PLL bandwidth, phase margin, and ΔΣ modulator quantization noise suppression. The process is demonstrated below for the ΔΣ fractional-N PLL presented in Section III to generate the local oscillator frequencies in a direct conversion Bluetooth wireless LAN transceiver. The PLL is shown in Figure 12 with additional detail regarding the frequency plan. As described previously, the desired output frequencies are $f_{VCO}$ = 2.402 GHz + k MHz for k = 0, ..., 78, and the crystal reference frequency is 19.68 MHz. Each of the 79 possible output frequencies is chosen by selecting m and N as indicated in the figure. In each case, the divider modulus is restricted to the set of four integers {N - 1, N, N + 1, N + 2}. The combinations of m and N were chosen to achieve the desired output frequencies yet keep the signals at the input of the ΔΣ modulator sufficiently small so as not to overload the ΔΣ modulator [11].

Typical requirements for such a PLL are that the loop bandwidth must be greater than 40 kHz, the phase margin must be greater than 60°, and the PLL phase noise be less than -120 dBc/Hz at offsets from the carrier of 3 MHz and above. Assume that the VCO, divider, PFD, and charge pump circuits have been designed such that the overall PLL phase noise specification can be met provided the phase noise contributed by the ΔΣ modulator and loop filter are each less than -130 dBc/Hz at offsets from the carrier of 3 MHz and above. Furthermore, assume that the VCO and charge pump circuits are such that $K_{VCO}$ and $I$ are 200 MHz/V and 200 µA, respectively, and that the loop filter has the form shown in Figure 10. Thus, the remaining design task is to choose the loop filter components such that the bandwidth, phase margin, and phase noise specifications are met.

The PLL phase margin, bandwidth, and phase noise arising from ΔΣ modulator quantization noise can be derived from the linearized model equations, (7) through (10). While this can be done directly, it involves the solution of third order equations which can be messy. Alternatively, approximate solutions of the equations can be derived that provide better intuition [21]. A particularly convenient set of approximate solutions are

$$PM = \tan^{-1}\!\left(\frac{b-1}{2\sqrt{b}}\right), \tag{11}$$

$$f_{BW} \approx \frac{I\,K_{VCO}\,R}{2\pi N}\cdot\frac{b-1}{b}, \tag{12}$$

$$C_2 \approx \frac{\sqrt{b}}{2\pi R f_{BW}}, \tag{13}$$

and

S0 (/)| «l 0 .logf^-sin 2 Mf^T] dBc/Hz, (14)

where $PM$ is the phase margin of the PLL, $f_{BW}$ is the 3 dB bandwidth of the PLL, and $b = 1 + C_2/C_1$ is a measure of the separation between the two loop filter capacitors [22]. The derivations assume that $b$ is greater than about 10, and (14) is valid for frequencies greater than $(C_2 + C_1)/(2\pi R C_2 C_1)$.

These equations are sufficient to determine appropriate loop filter component values. For example, suppose b is set to
49, so, as indicated by (11), the phase margin is approximately 70°. Solving (14) with the phase noise set to -130 dBc/Hz at f = 3 MHz indicates that $f_{BW} \approx 50$ kHz. Therefore, the phase noise resulting from ΔΣ modulator quantization noise is sufficiently suppressed with a 50 kHz bandwidth and a phase margin of 70°. With this information (12) can be solved to find R = 960 Ω, with which (13) and the definition of b can be used to calculate $C_2$ = 23 nF and $C_1$ = 480 pF. It is straightforward to verify that the phase noise introduced by the loop filter resistor (the only noise source in the loop filter) is well below -130 dBc/Hz at offsets from the carrier of 3 MHz and above as required.

Figure 13 shows PSD plots of the phase noise arising from ΔΣ modulator quantization noise for the example PLL with the loop filter component values derived above. The heavy curve was calculated directly from the linearized model equations (7) through (10). The light curve was obtained through a behavioral computer simulation of the PLL. As is evident from the figure, the two curves agree very well, which suggests that the approximations made in obtaining the linearized model are reasonable.

[Figure 13 plot: phase noise PSD in dBc/Hz (-220 to -60) versus frequency in Hz (10^5 to 10^8), with one curve for the "exact" simulation and one for the linearized model.]

Figure 13: Simulated and calculated PSD plots of the phase noise arising from ΔΣ modulator quantization noise for the example ΔΣ fractional-N PLL.

An effect that does not have a counterpart in integer-N PLLs is the presence of zeros in the PSD of the phase noise arising from ΔΣ modulator quantization noise at multiples of the reference frequency. These zeros are a result of the discrete-to-continuous-time conversion of the ΔΣ modulator quantization noise; each zero is a sampling image of the dc zero imposed on the quantization noise by the ΔΣ modulator.

VI. ΔΣ FRACTIONAL-N PLL SPECIFIC PROBLEMS

One of the most significant problems specific to ΔΣ fractional-N PLLs is that they can be sensitive to modulus-dependent divider delays. In practice, each positive-going divider edge is separated from the VCO edge that triggered it by a propagation delay. Ideally, this propagation delay is independent of the corresponding divider modulus, in which case it introduces a constant phase offset but does not otherwise contribute to the phase noise. However, if the propagation delay depends upon the divider modulus and the number of ΔΣ modulator output levels is greater than two, the effect is that of a hard non-linearity applied to the ΔΣ modulator quantization noise. This tends to fold out-of-band ΔΣ modulator quantization noise to low frequencies and introduce spurious tones, which can significantly increase the PLL phase noise. The problem is analogous to that of multi-bit digital-to-analog converter step-size mismatches in analog ΔΣ data converters [23]. Unfortunately, circuit simulations are required to evaluate the severity of the problem on a case by case basis as both the extent of any modulus-dependent delays and their effect on the PLL phase noise are difficult to predict using hand analysis.

There are two well-known solutions to this problem. One solution is to resynchronize the divider output to the nearest VCO edge or at least a higher-frequency edge obtained from within the divider circuitry [22], [24]. The resynchronization erases memory of modulus-dependent delays and noise introduced within the divider circuitry, but care must be taken to ensure that the signal used for resynchronization is itself free of modulus-dependent delays. The primary drawback of the approach is that it increases power consumption.

The other solution is to use a ΔΣ modulator with single-bit (i.e., two-level) quantization. In this case, modulus-dependent delays give rise to phase error at the output of the divider that consists of a constant offset plus a scaled version of the ΔΣ modulator quantization noise. Since, by design, the ΔΣ modulator quantization noise has most of its power outside the PLL bandwidth, the modulus-dependent delays increase the phase noise only slightly. Unfortunately, ΔΣ modulators with single-bit quantization tend not to perform as well as ΔΣ modulators with multi-bit (i.e., more than two-level) quantization. For example, if the 9-level quantizer in the 48 Msample/s ΔΣ modulator example presented in Section IV were replaced by a one-bit quantizer, the dynamic range of the ΔΣ modulator in the zero to 500 kHz band would be reduced from 88.5 dB to approximately 65 dB. Moreover, unlike the 9-level quantizer case, the additive noise from the single-bit quantizer would not be white and would be correlated with the input sequence. Its variance would be input dependent and it would contain spurious tones.

These problems can be mitigated by using a higher-order ΔΣ modulator architecture to more aggressively suppress the in-band portion of the additive noise from the two-level quantizer. However, to maintain stability in a higher-order ΔΣ modulator with single-bit quantization, the useful input range of the ΔΣ modulator input signal must be reduced and more poles and zeros must be introduced within the feedback loop as compared to a multi-bit design with a comparable dynamic range. Even then, the problem of spurious tones persists, and it is difficult to predict where they will appear except through extensive simulation. Furthermore, to compensate for the restricted input range of the ΔΣ modulator the reference frequency must be large enough that all of the desired PLL output frequencies can be achieved. This can severely limit design flexibility. For example, if the magnitude of the ΔΣ modulator input signal were limited to less than 0.5 in the case
of the Bluetooth local oscillator application considered above, the reference frequency would have to be greater than 79 MHz. Otherwise, it would not be possible to generate all the Bluetooth channel frequencies.

Another issue specific to ΔΣ fractional-N PLLs is that modulus switching increases the average duration over which the charge pump current sources are turned on each period relative to integer-N PLLs. For comparison, consider a ΔΣ fractional-N PLL and an integer-N PLL with the same $N$ (where $N \gg \alpha$), the same $f_{ref}$, and identical loop components. It follows from (5) that

(15)

The last term in (15), which is caused by having the ΔΣ modulator switch the divider modulus, represents a significant increase in the time during which the charge pump current sources are turned on each reference period. Consequently, the phase noise arising just from charge pump current source noise is larger in the ΔΣ fractional-N PLL by

$$A\,\log\left[\frac{\text{Average fractional-}N\text{ PLL charge pump "on time"}}{\text{Average integer-}N\text{ PLL charge pump "on time"}}\right]\ \text{dB},$$

where $A$ is a constant between 10 and 20. The value of $A$ depends upon the autocorrelation of the charge pump current source noise. For example, if the current source noise in successive charge pump pulses is completely uncorrelated, then $A$ is 10. Near the other extreme, $A$ is close to 20.

VII. TECHNIQUES TO WIDEN ΔΣ FRACTIONAL-N PLL LOOP BANDWIDTHS

A transmitter with virtually any modulation format can be implemented using D/A conversion to generate analog baseband or IF signals and upconversion to generate the final RF signal. However, many of the commonly used modulation formats in wireless communication systems such as MSK and FSK involve only frequency or phase modulation of a single carrier [25]. In such cases, the transmitted signal can be generated by modulating a radio frequency (RF) VCO, thereby eliminating the need for conventional upconversion stages and much of the attendant analog filtering. At least two approaches have been successfully implemented in commercial wireless transmitters to date. One is based on open-loop VCO modulation, and the other is based on ΔΣ fractional-N synthesis.

An example of a commercial transmitter that uses the open-loop VCO modulation technique is presented in [26] and [27], in this case for a DECT cordless telephone. Between transmit bursts, the desired center frequency is set relative to a reference frequency by enclosing the VCO within a conventional PLL. During each transmit burst the VCO is switched out of the PLL and the desired frequency modulation is applied directly to its input. The primary limitation of the approach is that it tends to be highly sensitive to noise and interference from other circuits. For example, in [27], the required level of isolation precluded the implementation of a single-chip transmitter. Furthermore, the modulation index of the transmitted signal depends upon the absolute tolerances of the VCO components, which are often difficult to control in low-cost VLSI technologies and can also drift rapidly over time.

In principle, ΔΣ fractional-N PLLs can avoid these problems by modulating the VCO within the PLL. This can be done by driving the input of the digital ΔΣ modulator with the desired frequency modulation of the transmitted signal. The primary limitation is that the bandwidth of the PLL must be narrow enough that the quantization noise from the ΔΣ modulator is sufficiently attenuated, but sufficiently high to allow for the modulation. For instance, the phase noise PSD of the example ΔΣ fractional-N PLL shown in Figure 5 with a 50 kHz loop bandwidth meets the necessary phase noise specifications when used as a local oscillator in a conventional upconversion stage within a Bluetooth wireless LAN transmitter. However, if the Bluetooth transmitter is to be implemented by modulating the VCO through the digital ΔΣ modulator, then the loop bandwidth of the PLL must be approximately 500 kHz. Unfortunately, when the loop bandwidth of the fractional-N PLL shown in Figure 5 is widened to 500 kHz, the resulting phase noise becomes too large to meet the Bluetooth transmit requirements.

Nevertheless, commercial transmitters with VCO modulation through ΔΣ fractional-N synthesizers are beginning to be deployed, especially in low-performance, low-cost wireless systems such as Bluetooth wireless LANs [28]. Facilitating this trend are various solutions that have been devised in recent years to allow for wideband VCO modulation in ΔΣ fractional-N PLLs without incurring the phase noise penalty mentioned above. One of the solutions is to keep the loop bandwidth relatively low, but pre-emphasize (i.e., highpass filter) the digital phase modulation signal prior to the digital ΔΣ modulator [29]. Unfortunately, this approach requires the highpass response of the digital pre-emphasis filter to be a reasonably close match to the inverse of the closed-loop filtering imposed by the largely analog PLL. Another of the solutions is to use a high-order loop filter in the PLL with a sharp lowpass response [30]. Increasing the order of the loop filter increases the attenuation of out-of-band quantization noise, which allows for higher-order ΔΣ modulation to reduce in-band quantization noise, thereby allowing the loop bandwidth to be increased without increasing the total phase noise. However, as described in [30], this necessitates the use of a Type 1 PLL, which significantly complicates the design of the phase detector. Yet another solution is to use a narrow loop bandwidth but modulate the VCO both through the digital ΔΣ modulator and through an auxiliary modulation port at the VCO input [28]. The idea is to apply the low-frequency modulation components at the ΔΣ modulator input and the high-frequency modulation components directly to the VCO. Again, matching is an issue, but it has proven to be manageable at least for low-end applications such as Bluetooth transceivers.

VIII. CONCLUSION

The additional concepts and issues associated with ΔΣ
fractional-N PLLs for frequency synthesis relative to integer-N PLLs have been presented. It has been shown that ΔΣ fractional-N PLLs provide tuning resolution limited only by digital logic complexity, and, in contrast to integer-N PLLs, increased tuning resolution does not come at the expense of reduced bandwidth. Since one of the main innovations in a ΔΣ fractional-N PLL is the use of a ΔΣ modulator to control the divider modulus, the relevant concepts underlying ΔΣ modulation have been described in detail. A linearized model has been derived from first principles and a design example has been presented to illustrate how the model is used in practice. Techniques for wideband digital modulation of the VCO within a delta-sigma fractional-N PLL have also been presented.

ACKNOWLEDGEMENTS

The author is grateful to Sudhakar Pamarti, Eric Siragusa, and Ashok Swaminathan for their helpful discussions and advice regarding this paper.

REFERENCES

1. F. M. Gardner, "Charge-pump phase-lock loops," IEEE Transactions on Communications, vol. COM-28, pp. 1849-1858, November 1980.
2. B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw Hill, 2001.
3. Bluetooth Wireless LAN Specification, Version 1.0, 2000.
4. U. L. Rohde, Microwave and Wireless Synthesizers Theory and Design, John Wiley & Sons, 1997.
5. B. Miller, B. Conley, "A multiple modulator fractional divider," Annual IEEE Symposium on Frequency Control, vol. 44, pp. 559-568, March 1990.
6. B. Miller, B. Conley, "A multiple modulator fractional divider," IEEE Transactions on Instrumentation and Measurement, vol. 40, no. 3, pp. 578-583, June 1991.
7. T. A. Riley, M. A. Copeland, T. A. Kwasniewski, "Delta-sigma modulation in fractional-N frequency synthesis," IEEE Journal of Solid-State Circuits, vol. 28, no. 5, pp. 553-559, May 1993.
8. S. K. Tewksbury, R. W. Hallock, "Oversampled, linear predictive and noise-shaping coders of order N > 1," IEEE Transactions on Circuits and Systems, vol. CAS-25, pp. 436-447, July 1978.
9. G. Lainey, R. Saintlaurens, P. Serin, "Switched-capacitor second-order noise-shaping coder," IEE Electronics Letters, vol. 19, pp. 149-150, February 1983.
10. I. Galton, "Granular quantization noise in a class of delta-sigma modulators," IEEE Transactions on Information Theory, vol. 40, no. 3, pp. 848-859, May 1994.
11. N. He, F. Kuhlmann, A. Buzo, "Multiloop sigma-delta quantization," IEEE Transactions on Information Theory, vol. 38, no. 3, pp. 1015-1028, May 1992.
12. I. Galton, "One-bit dithering in delta-sigma modulator-based D/A conversion," Proc. of the IEEE International Symposium on Circuits and Systems, 1993.
13. S. W. Golomb, Shift Register Sequences. Laguna Hills, CA: Aegean Park Press, 1982.
14. E. J. McCluskey, Logic Design Principles. Englewood Cliffs, NJ: Prentice-Hall, 1986.
15. S. K. Tewksbury, R. W. Hallock, "Oversampled, linear predictive and noise-shaping coders of order N > 1," IEEE Transactions on Circuits and Systems, vol. CAS-25, pp. 436-447, July 1978.
16. W. Rhee, B. S. Song, A. Ali, "A 1.1-GHz CMOS fractional-N frequency synthesizer with a 3-b third-order ΔΣ modulator," IEEE Journal of Solid-State Circuits, vol. 35, no. 10, pp. 1453-1460, October 2000.
17. W. L. Lee, C. G. Sodini, "A topology for higher order interpolative coders," Proceedings of the 1987 IEEE International Symposium on Circuits and Systems, vol. 2, pp. 459-462, May 1987.
18. K. C.-H. Chao, S. Nadeem, W. L. Lee, C. G. Sodini, "A higher order topology for interpolative modulators for oversampling A/D converters," IEEE Transactions on Circuits and Systems, vol. 37, no. 3, pp. 309-318, March 1990.
19. Y. Matsuya, K. Uchimura, A. Iwata, T. Kobayashi, M. Ishikawa, T. Yoshitome, "A 16-bit oversampling A-to-D conversion technology using triple integration noise shaping," IEEE Journal of Solid-State Circuits, vol. SC-22, pp. 921-929, December 1987.
20. K. Uchimura, T. Hayashi, T. Kimura, A. Iwata, "Oversampling A-to-D and D-to-A converters with multistage noise shaping modulators," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. AASP-36, pp. 1899-1905, December 1988.
21. J. Craninckx, M. S. J. Steyaert, "A fully integrated CMOS DCS-1800 frequency synthesizer," IEEE Journal of Solid-State Circuits, vol. 33, pp. 2054-2065, December 1998.
22. S. Pamarti, "Techniques for Wideband Fractional-N Phase-Locked Loops," PhD Dissertation, University of California, San Diego, 2003.
23. S. R. Norsworthy, R. Schreier, G. C. Temes, Eds., Delta-Sigma Data Converters, Theory, Design, and Simulation, New York: IEEE Press, 1997.
24. L. Lin, L. Tee, P. R. Gray, "A 1.4 GHz differential low-noise CMOS frequency synthesizer using a wideband PLL architecture," IEEE ISSCC Digest of Technical Papers, pp. 204-205, Feb. 2000.
25. J. G. Proakis, Digital Communications, fourth ed., McGraw Hill, 2000.
26. S. Heinen, S. Beyer, J. Fenk, "A 3.0 V 2 GHz transmitter IC for digital radio communication with integrated VCO's," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 38, pp. 150-151,
Feb. 1995.
27. S. Heinen, K. Hadjizada, U. Matter, W. Geppert, V. Thomas, S. Weber, S. Beyer, J. Fenk, E. Matshke, "A 2.7 V 2.5 GHz bipolar chipset for digital wireless communication," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 40, pp. 306-307, Feb. 1997.
28. N. Filiol, et al., "A 22 mW Bluetooth RF transceiver with direct RF modulation and on-chip IF filtering," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 43, pp. 202-203, Feb. 2001.
29. M. H. Perrott, T. L. Tewksbury III, C. G. Sodini, "A 27-mW CMOS fractional-N synthesizer using digital compensation for 2.5-Mb/s GFSK modulation," IEEE Journal of Solid-State Circuits, vol. 32, no. 12, pp. 2048-2059, Dec. 1997.
30. S. Willingham, M. Perrott, B. Setterberg, A. Grzegorek, B. McFarland, "An integrated 2.5GHz ΣΔ frequency synthesizer with 5 µs settling and 2Mb/s closed loop modulation," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 43, pp. 200-201, Feb. 2000.

Ian Galton received the Sc.B. degree from Brown
University in 1984, and the M.S. and Ph.D. degrees
from the California Institute of Technology in 1989
and 1992, respectively, all in electrical engineering.
Since 1996 he has been a professor of electrical
engineering at the University of California, San
Diego where he teaches and conducts research in the
field of mixed-signal integrated circuits and systems
for communications. Prior to 1996 he was with UC
Irvine, the NASA Jet Propulsion Laboratory, Acu-
son, and Mead Data Central. His research involves the invention, analysis,
and integrated circuit implementation of key communication system blocks
such as data converters, frequency synthesizers, and clock recovery systems.
The emphasis of his research is on the development of digital signal process-
ing techniques to mitigate the effects of non-ideal analog circuit behavior with
the objective of generating enabling technology for highly integrated, low-
cost, communication systems. In addition to his academic research, he regu-
larly consults at several communications and semiconductor companies and
teaches portions of various industry-oriented short courses on the design of
data converters, PLLs, and wireless transceivers. He has served on a corpo-
rate Board of Directors and several corporate Technical Advisory Boards, and
he is the Editor-in-Chief of the IEEE Transactions on Circuits and Systems II:
Analog and Digital Signal Processing.

Designing Bang-Bang PLLs for Clock and Data
Recovery in Serial Data Transmission Systems
Richard C. Walker

R. Walker is with Agilent Laboratories, 3500 Deer Creek Road, MS 26-U4, Palo Alto CA 94304. (e-mail: rick_walker@labs.agilent.com).

Abstract - Clock recovery using phase-locked loops (PLL) with binary (bang-bang) or ternary-quantized phase detectors has become increasingly common starting with the advent of fully monolithic clock and data recovery (CDR) circuits in the late 1980's. Bang-bang CDR circuits have the unique advantages of inherent sampling phase alignment, adaptability to multi-phase sampling structures, and operation at the highest speed at which a process can make a working flip-flop. This paper gives insight into the behavior of the nonlinear bang-bang PLL loop dynamics, giving approximate equations for loop jitter, recovered clock spectrum, and jitter tracking performance as a function of various design parameters. A novel analysis shows that the bang-bang loop output jitter grows as the square-root of the input jitter as contrasted with the linear dependence of the linear PLL.

I. INTRODUCTION

Prior to the advent of fully monolithic designs, clock recovery was traditionally performed with some variant of the circuit in Fig. 1. The clock frequency component was typically extracted from the data stream using some combination of differentiation, rectification and filtering. The bandpass frequency filtering was provided by LC tank, surface acoustic wave (SAW) filter, dielectric resonator or PLL. Because the clock recovery path was separate from the data retiming path, it was difficult to maintain optimum sampling phase alignment over process, temperature, data-rate, and voltage variations. Even the PLL techniques had the drawback of using phase detectors with different set-up times than the retiming flip-flop so that the recovered clock was not intrinsically aligned to the optimum sampling point in the data eye. Circuits utilizing SAW resonator filtering typically required hand matching of SAW and circuit temperature coefficients along with custom cut coaxial delay lines for setting the timing of the recovered sampling clock with respect to the data eye [1].

[Fig. 1 block diagram: input NRZ data passes through a variable delay block to a retiming latch that outputs retimed data; a pulse conditioning stage (squaring or d/dt) followed by a frequency extraction stage (BPF or PLL) produces the recovered clock.]

Fig. 1. Traditional non-monolithic clock and data recovery architecture.

Early monolithic CDR designs imitated these discrete block diagrams. The propagation delay differences between data and clock paths could be ignored as long as the gate delay skew was a negligible fraction of the total bit time, or unit interval. The need for higher link speeds grew faster than Moore's law, and as clock frequencies approached the effective fT of the active devices, it became increasingly difficult to maintain an optimum sampling phase alignment between the recovered clock and the data over process, temperature, data-rate, and voltage variations.

A second problem was that most linear phase detectors produced narrow pulses with widths proportional to the phase error between the timing of the data and the clock [2], [3]. These narrow pulses required a process speed in excess of that required to simply sample data at a given rate. The timing skew and speed of linear phase detector circuits then became the limiting factor for aggressive designs.

Both these difficulties are eliminated by a family of circuits which simultaneously retime data and measure phase error by using matched flip-flops to sample both the middle of each data bit and the transitions between the data bits. Fig. 2 shows an early gigabit-rate monolithic example of such a circuit [4] which samples data with two matched flip-flops. Flip-flop "Y" samples the middle of each data bit on the rising edge of the VCO clock to produce retimed data, while flip-flop "X" samples the transition of each bit using the falling edge of the VCO clock.

[Fig. 2 block diagram: input data drives flip-flops "X" and "Y" clocked by the VCO; samples of all transitions are gated by a divide by 20 circuit into samples of master transitions, which drive the loop filter and VCO; flip-flop "Y" outputs the retimed data.]

Fig. 2. A simple bang-bang loop using a flip-flop for a phase detector to lock onto a data stream with a guaranteed "0" to "1" transition every 20 bits.

The loop is designed to use the 16B/20B line code of Fig. 3 which guarantees a "01" "master transition" every 20 bits. The divide by 20 circuit and associated flip-flop in Fig. 2 discard every
transition sample except for this master transition sample. During link start-up a training sequence is sent that has only one rising transition at the location of the master transition. Once the loop is locked, arbitrary data is allowed to be sent at the other 18 bits of the frame, while the transition sampler pays attention only to the data stream in the vicinity of the master transition. If the VCO frequency is too high, the transition flip-flop starts sampling prior to the master transition and outputs a "0" to the loop filter. A slightly lower VCO frequency, on the other hand, will cause the loop to be driven by 1's.

[Fig. 3 diagram: frame formats of the line code, with legend entries for 16 data bits, training sequence, and control word, and the master transition marked.]

Fig. 3. Format of 16B/20B line code used with bang-bang CDR of Fig. 2.

The loop drives the falling edge of the VCO into alignment with the data transitions based on the binary-quantized phase error. Because the clock-to-Q delay of the retiming flip-flop is monolithically matched with the phase detector flip-flop, the PLL aligns the recovered clock precisely in the middle of the data eye with no first-order timing skew over process and temperature variations. Because the narrowest pulse is the output of a flip-flop, such detectors operate at the full speed at which a process is capable of building a functioning flip-flop. This ensures that the phase detector will not be the limiting factor in building the fastest possible retiming circuit.

An additional advantage of flip-flop-based phase detectors is that since they only require simple processing of digital values, they easily generalize to multi-phase sampling structures allowing CDR operation at frequencies in which it would be impossible to

[Fig. 4 plot: ratio of link speed to effective process transit frequency (0.01 to 0.8, logarithmic scale) versus year of publication (1988 to 2002), with linear PLL and BB PLL designs marked by reference number.]

Fig. 4. CDR PLL designs over time. The ratio of link speed to effective process transit frequency is plotted vs year of publication. Multi-phase BB PLLs predominate as data rate approaches the process transit frequency limit. (The number of retiming phases used in each design is given in parentheses.)

II. FIRST-ORDER LOOP DYNAMICS

Unfortunately, transition-sampling flip-flop-based phase detectors can provide only binary (early/late) or ternary (early/late + hold) phase information. This amounts to a hard non-linearity in the loop structure, leading to an oscillatory steady-state and rendering the circuit unanalyzable with standard linear PLL theory. Precise loop behavior can be simulated efficiently with time-step simulators, but this is cumbersome to use for routine design. Fortunately, simple approximate closed-form expressions can be derived for performance parameters of interest, such as loop jitter generation, recovered clock spectrum, and jitter tracking performance as a function of various design parameters.
Fig. 5. A simple bang-bang loop using aflip-flopfor a phase detectoi
build a working full-speed flip-flop. In contrast, most linear phase
to lock onto square-wave input.
detectors require at least some analog processing at the full bit
rate, limiting process speed and poorly generalizing to multi-phase
sampling architectures. A simple BB PLL is shown in Fig. 5. A flip-flop is used as a
phase detector to lock onto a square wave input signal. Depending
Because of these compelling advantages, the bang-bang loop on whether the VCO phase samples slightly before or after the ris-
has become a common design choice for state-of-the CDR designs ing edge of the input square wave, the flip-flop output is either low
which are pushing the capability of available IC processes. Fig. 4 or high, adjusting the VCO period in such a way as to move the
surveys CDR designs presented from 1988 to 2001 at the Interna- sampling phase error back towards zero. The dynamics of such a
tional Solid State Circuits Conference. Designs are plotted by year binary-quantized loop are equivalent to a data-driven phase detec-
of presentation against each design's ratio of link speed to effec- tor operating on alternating 0,1 data with 100% transition density,
tive fp The majority of current designs utilize a combination of or a master-transition based loop similar to that shown in Fig. 2.
multiphase sampling structures and bang-bang PLLs. In addition, For simplicity, we assume that a valid binary phase determination
all CDRs operating at data rates greater than 0.4 fT are bang-bang can be made at every timestep. The consequence of random data
designs.

and the introduction of a ternary hold mode are considered in a later section.

The first-order BB PLL of Fig. 5 can be rendered into a block diagram for analysis as shown in Fig. 6. The loop phase error $\theta_e(t_n)$ is defined as the difference between the data phase $\theta_d(t_n)$ and the VCO phase $\theta_v(t_n)$ at the nth sampling time $t_n$. For convenience, phase is measured with respect to an ideal clock source running at $f_{nom}$.

[Figure 6: block diagram with the signals $f_{in}$, $f_{nom}$, $f_{vco} - f_{nom}$, $f_{bb}$, $\theta_d$, $\theta_v$, $\theta_e$, $\varepsilon_n$ and the blocks $\beta$ and $K_v/s$.]
Fig. 6. Block diagram of first order loop showing definition of signal names.

The frequency of the incoming data signal differs from the VCO center frequency by $\delta f$, and has a zero mean phase jitter of $\phi(t)$. In other words, the data can be considered to have been generated by a pattern generator clocked on the rising edges of the jittered clock signal $\sin[2\pi(f_{nom} + \delta f)t + \phi(t)]$. The data phase $\theta_d(t_n)$ is then $2\pi\,\delta f\,t_n + \phi(t_n)$.

The phase detector binary-quantizes the loop phase error at each sampling time to give $\varepsilon_n = \mathrm{sign}[\theta_e(t_n)]$. (Note: In the case of a ternary data-driven phase detector, $\varepsilon_n$ may be set to 0 when it is not possible to make a determination of phase error due to consecutive identical bits in the data stream. The consequence of this "hold" state is treated in a later section.) The error signal drives the VCO through an attenuator $\beta$, to produce a change in frequency of $f_{bb} = \beta K_{vco}$. From time $t_n$ until time $t_{n+1}$, the VCO operates at one of the two frequencies given by $f_{nom} + \varepsilon_n f_{bb}$.

Because the VCO frequency changes on each cycle, the system has non-uniform sampling times. The time of phase sample $\theta_e(t_{n+1})$ is $t_{n+1} = t_n + 1/(f_{nom} + \varepsilon_n f_{bb})$. In a typical CDR, $f_{bb}$ is on the order of 0.1% of $f_{nom}$, so that an analysis assuming uniform time steps of $t_{update} = 1/f_{nom}$ is sufficiently accurate for most purposes. However, for loop analyses requiring exact charge pump balance, such as wide-range loop pull-in without a frequency detector, these non-uniform sampling times must be accounted for.

With the uniform time step approximation, the VCO phase changes up or down (or "walks off") by $\theta_{bb} = 2\pi(f_{bb}/f_{nom})$ radians during each update period.

In summary, the first order loop obeys a simple set of discrete time difference equations:

$$\theta_d(t_n) = \theta_d(0) + 2\pi\,\delta f\,t_n + \phi(t_n) \qquad (1)$$

$$\theta_v(t_{n+1}) = \theta_v(t_n) + \varepsilon_n\,\theta_{bb} \qquad (2)$$

$$\varepsilon_n = \mathrm{sign}[\theta_d(t_n) - \theta_v(t_n)] \qquad (3)$$

As long as the VCO frequency step brackets the input signal frequency error, the loop will remain phase locked. Assuming $\phi(t)$ small, the lock range is $-f_{bb} < \delta f < f_{bb}$. The loop generates an excess hunting jitter with a peak-to-peak value of two bang-bang phase steps, $J_{pp} = 4\pi(f_{bb}/f_{nom})$.

For the loop to be locked, the average VCO frequency must equal the average data frequency. The phase detector duty cycle $C$ must satisfy the relation

$$\delta f = C(f_{bb}) + (1 - C)(-f_{bb}).$$

The value of $C$ is then given by

$$C = \frac{1}{2}\left(1 + \frac{\delta f}{f_{bb}}\right).$$

The phase detector duty cycle, and therefore its average output voltage, are proportional to the loop frequency error. Fig. 7 shows a simulated loop with a range of input frequencies. The loop is "locked" whenever the input frequency is bracketed by the two VCO frequencies. The rapid alternation between frequencies slightly too high and slightly too low creates a bounded hunting jitter ($J_{pp}$).

[Figure 7: traces of $f_{vco}$ and $f_{in}$ and of the phase error $\theta_e$ versus time (microseconds), with regions marked out of lock, in lock, out of lock.]
Fig. 7. Simulated response of first-order PLL to a range of input frequencies.
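The first-order relations above can be checked numerically. The following is a minimal sketch, assuming uniform time steps $t_{update} = 1/f_{nom}$ and no input jitter ($\phi(t) = 0$); the function name `simulate_first_order` and its parameters are illustrative and do not come from the paper. It iterates Eqs. (1) to (3) and reports the measured phase detector duty cycle and the peak-to-peak phase error.

```python
import math


def simulate_first_order(delta_f, f_bb, f_nom, n_steps=10000):
    """Iterate Eqs. (1)-(3) with uniform steps and zero input jitter.

    Returns (duty_cycle, peak_to_peak_phase_error_in_radians).
    """
    theta_bb = 2 * math.pi * f_bb / f_nom   # VCO phase step per update
    drift = 2 * math.pi * delta_f / f_nom   # data phase advance per update
    theta_d = 0.0
    theta_v = 0.0
    ups = 0
    errors = []
    for _ in range(n_steps):
        theta_e = theta_d - theta_v
        eps = 1 if theta_e >= 0 else -1     # Eq. (3), binary decision
        errors.append(theta_e)
        if eps == 1:
            ups += 1
        theta_v += eps * theta_bb           # Eq. (2)
        theta_d += drift                    # Eq. (1) with phi(t) = 0
    return ups / n_steps, max(errors) - min(errors)


if __name__ == "__main__":
    f_nom, f_bb = 2.488e9, 2.488e6          # f_bb is 0.1% of f_nom (illustrative)
    duty, pp = simulate_first_order(0.5 * f_bb, f_bb, f_nom)
    print("duty cycle:", duty)
    print("peak-to-peak error / theta_bb:", pp / (2 * math.pi * f_bb / f_nom))
```

With $\delta f = 0.5 f_{bb}$ the measured duty cycle should settle near the value $C = (1 + \delta f/f_{bb})/2 = 0.75$ given in the text, and the phase error should stay within $J_{pp}$. Setting $|\delta f| > f_{bb}$ drives the duty cycle to 0 or 1, which is the out-of-lock condition.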

The derivative of the input data phase deviation, $d[\phi(t)]/dt$, adds to the frequency error that must be tolerated by the loop. Assuming $\delta f = 0$, then for $\phi(t) = A\sin(2\pi f_{mod} t)$, the maximum amplitude $A$ of phase modulation at frequency $f_{mod}$ before onset of slew-rate limiting is $f_{bb}/f_{mod}$. Fig. 8 demonstrates the loop at the onset of jitter-induced slew-rate limiting. Although the average input frequency lies within the lock range of the loop, the added sinusoidal jitter causes the instantaneous input frequency deviation to exceed $\pm f_{bb}$. The loop stops toggling and goes into slew rate limiting, leading to a transient phase error.

[Figure 8: traces of $f_{vco}$ and $f_{in}$, the input phase $\theta_d$, and the phase error $\theta_e$ versus time (microseconds).]
Fig. 8. Simulated response of first-order PLL to sinusoidal input jitter just slightly beyond the tracking capability of the loop.

A. Summary of First-Order Loop

The first-order bang-bang loop has only one degree of freedom. Jitter generation, lock range, and jitter tolerance are all inconveniently controlled by one parameter, $f_{bb}$. This situation can be improved by using a second control loop to dynamically adjust the nominal VCO frequency $f_{nom}$ to be equal to the incoming data frequency. Because the phase detector duty cycle is proportional to the loop frequency error, this dynamic centering of VCO frequency can be accomplished by adjusting the VCO center frequency in a feedback loop to drive the phase detector duty cycle $C$ to 50%. This decouples the lock range from jitter tolerance and jitter generation, giving more design freedom.

III. SECOND-ORDER LOOP DYNAMICS

To extend the loop tracking range independent of the jitter generation, an extra integrator is added between the phase detector and the VCO as in Fig. 9. Since the first-order loop dynamic produces a phase detector duty cycle proportional to the loop frequency error, this added integrator can be viewed as an automatic means for keeping the first-order portion of the loop properly centered on the average incoming data frequency. If certain assumptions are met, as described later, we can consider the system to be composed of two non-interacting loops. These are the loops labeled "bang-bang branch" and "integral branch". If the center frequency control loop is slow enough, the resulting loop behavior will be very similar to a simple first-order loop, but with an extended frequency lock range.

[Figure 9: flip-flop phase detector with output $V_\phi$ driving the VCO through a proportional (BB) branch with gain $\beta$ and through an integral branch.]
Fig. 9. Second-order bang-bang loop schematic.

A. Stability Factor

To preserve the desirable qualities of the first order loop, it is critical that the phase change due to the proportional branch dominate over the phase change from the integral branch.

The loop phase change in one update time due to the proportional connection is $\Delta\theta_{bb} = \beta V_\phi K_v t_{update}$. The phase change due to the integral branch is $\Delta\theta_{int} = V_\phi K_v t_{update}^2/(2\tau)$. The ratio of these two is the stability factor of the loop

$$\xi \equiv \frac{\Delta\theta_{proportional}}{\Delta\theta_{integral}} = \frac{2\beta\tau}{t_{update}}.$$

The reader should be careful not to confuse the bang-bang loop stability factor $\xi$ with the linear loop damping factor $\zeta$ [5].

The discrete time difference equations for the second-order loop can be written as

$$\theta_d(t_n) = \theta_d(0) + 2\pi\,\delta f\,t_n + \phi(t_n) \qquad (1)$$

$$\theta_v(t_{n+1}) = \theta_v(t_n) + \theta_{bb}\left(\varepsilon_n + \frac{\varepsilon_n}{\xi} + \frac{2}{\xi}\sum_{i=0}^{n-1}\varepsilon_i\right) \qquad (4)$$

$$\varepsilon_n = \mathrm{sign}[\theta_d(t_n) - \theta_v(t_n)] \qquad (3)$$

From this, it can be seen that the second-order loop has two degrees of freedom, the loop phase step $\theta_{bb}$ (or equivalently, the loop frequency step $f_{bb}$) and the stability factor $\xi$. The added loop integrator extends the frequency tracking range, leaving $\theta_{bb}$ free to control jitter tolerance and jitter generation.
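The second-order difference equations can be exercised with a few lines of code. This is a minimal sketch under stated assumptions: phase is expressed in units of $\theta_{bb}$ and time in units of $t_{update}$, each decision contributes $\varepsilon_n(1 + 1/\xi)$ in its own interval, and earlier decisions contribute $(2/\xi)\sum\varepsilon_i$ through the integrator, which is the reading of Eq. (4) implied by the definitions of $\Delta\theta_{bb}$ and $\Delta\theta_{int}$. The names `simulate_second_order` and `phase_in` are illustrative only.

```python
def simulate_second_order(xi, phase_in, n_steps):
    """Iterate Eqs. (1), (3), (4) in normalized units.

    xi        stability factor
    phase_in  function n -> data phase at step n, in units of theta_bb
    Returns (list of phase errors, final integrator sum of decisions).
    """
    theta_v = 0.0
    acc = 0                                  # running sum of past decisions
    errors = []
    for n in range(n_steps):
        theta_e = phase_in(n) - theta_v
        eps = 1 if theta_e >= 0 else -1      # Eq. (3)
        errors.append(theta_e)
        # Eq. (4): proportional step, integral ramp within this interval,
        # and the frequency offset already stored in the integrator.
        theta_v += eps * (1.0 + 1.0 / xi) + (2.0 / xi) * acc
        acc += eps
    return errors, acc


if __name__ == "__main__":
    # Frequency offset of 0.5 f_bb: the data phase advances 0.5 theta_bb per update.
    errs, acc = simulate_second_order(100.0, lambda n: 0.5 * n, 2000)
    print("largest late-time error (theta_bb):", max(abs(e) for e in errs[-500:]))
    print("integrator frequency (f_bb):", 2.0 * acc / 100.0)
```

In this run the integrator should absorb the frequency offset, so that `2 * acc / xi` settles near 0.5 and the phase error returns to a small bounded hunting pattern. As $\xi \to \infty$ the update reduces to Eq. (2) of the first-order loop.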

B. Simulations of Second-Order Loop

Fig. 10 shows two block diagrams for the second-order loop. The upper diagram is a straightforward translation of the schematic in Fig. 9. The lower diagram is a topological re-arrangement which places an inner first-order phase tracking loop inside an outer frequency tracking loop. If one writes the transfer function from the output of the non-linear quantizer block back to the input of the quantizer, it can be shown that both diagrams are exactly equivalent. Some of the signals in the second diagram do not correspond to actual physical variables in the circuit, but they are helpful in understanding the operation of the loop.

[Figure 10: two block diagrams, I and II, with the signals $f_{in}$, $f_{int}$, $\Delta f$, $\Delta\theta_1$, $\Delta\theta_2$, $\theta_v$, $\theta_e$ and $V_\phi$.]
Fig. 10. Two equivalent second-order bang-bang loop block diagrams. The proportional phase-control signal flow is highlighted with a dashed line, and the integral frequency-control loop with a solid line.

[Figure 11: simulated traces versus time (microseconds).]
Fig. 11. Second-order loop response to instantaneous frequency step smaller than $f_{bb}$.

Fig. 11 shows the second-order loop responding to a step change in input frequency $f_{in}$, producing a slow response $f_{int}$ in the outer integral loop. The resulting phase error $\Delta\theta_1$ is tracked by the inner bang-bang loop $\theta_v$ to produce the final sampler phase error $\theta_e$. Notice that, unlike linear PLLs, if the power-supply noise-induced VCO frequency modulation is limited to $\pm f_{bb}$, then there is no jitter accumulation or phase transient at the sampling flip-flop.

[Figure 12: simulated traces versus time (microseconds).]
Fig. 12. Second-order loop response to instantaneous frequency step larger than $f_{bb}$.

Fig. 12 is a simulation in which the input frequency step is bigger than $f_{bb}$, so the loop goes into slew rate limiting, leading to a transient phase error $\theta_e$ at the sampler.

C. Response to Phase Step

For a normalized transient phase step of $A = \theta_{step}/\theta_{bb}$, a first-order loop relocks in $A$ update times. The total time for relocking is then $\theta_{step}/(2\pi f_{bb})$.

During the relocking transient of the second-order loop, the loop integrator overshoots the correct steady-state VCO tune voltage. This causes a quadratic overshoot in the phase trajectory. Fig. 13 shows the second-order phase step response with $\xi$ as a parameter. Up to the first zero crossing, the phase trajectory is given by

$$\theta_e(n) = \theta_{bb}\left(A - n - \frac{n^2}{\xi}\right)$$

with $n = t/t_{update}$. The time of the first zero crossing approaches $A$ as $\xi \to \infty$, consistent with a first-order loop. In general, the second-order loop is quicker to reach zero phase error than the first-order loop, but pays for this with an oscillatory overshoot. As a conservative rule of thumb, the magnitude of the oscillatory transient of a second-order step response can be considered bounded by the simple linear transient of the first-order loop. The time required to reach steady state, given a step of $A$, is always less than or equal to $A$ timesteps, independent of $\xi$.
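As a worked check of the phase-step behaviour, the first zero crossing can be computed in closed form. This sketch assumes the trajectory up to the first zero crossing is $\theta_e(n) = \theta_{bb}(A - n - n^2/\xi)$, which is what summing the second-order difference equation with a constant decision gives; the function name `first_zero_crossing` is illustrative.

```python
import math


def first_zero_crossing(A, xi):
    """Positive root n of A - n - n**2/xi = 0, in update times.

    A   normalized phase step, theta_step / theta_bb
    xi  stability factor
    """
    return 0.5 * xi * (math.sqrt(1.0 + 4.0 * A / xi) - 1.0)


if __name__ == "__main__":
    for xi in (20.0, 200.0, 2000.0, 1e9):
        print(xi, first_zero_crossing(400.0, xi))
```

For a step of $A = 400$ the crossing time grows with $\xi$ and approaches 400 update times for very large $\xi$, matching the first-order result quoted in the text. It never exceeds $A$, in line with the rule of thumb that the second-order loop reaches zero phase error no later than the first-order loop.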

[Figure 13: phase error (in units of $\theta_{bb}$, from -500 to 500) versus time / $t_{update}$ (0 to 700), with curves for several values of $\xi$ including $\xi$ infinite.]
Fig. 13. Noise-free loop response to a phase step with stability factor $\xi$ as a parameter.

IV. SLOPE OVERLOAD

Many systems, such as SONET, specify jitter tolerance in the form of a sinusoidal jitter at various frequencies.

[Figure 14: traces of $\Delta\theta_1$, $\Delta\theta_2$, $\theta_v$, $\theta_e$ and $V_\phi$ versus time (microseconds).]
Fig. 14. Second-order loop response to sinusoidal input jitter.

Fig. 14 shows the loop response with a sinusoidal input phase jitter $\phi(t)$. The outer integral loop tracks the input jitter at $\Delta\theta_1$ with a slight phase lag. The resulting phase error $\Delta\theta_2$ is tracked by the inner bang-bang loop $\theta_v$ to produce the final sampler phase error $\theta_e$. The duty-cycle of the PD output $V_\phi$ varies with the slope of $\Delta\theta_2$, which is proportional to the instantaneous frequency error of the outer loop.

[Figure 15: traces of $\Delta\theta_1$, $\Delta\theta_2$, $\theta_v$, $\theta_e$ and $V_\phi$ versus time (microseconds).]
Fig. 15. Second-order loop response to large sinusoidal input jitter.

In Fig. 15, the phase modulation is increased until the instantaneous frequency error exceeds the inner loop's ability to track. Slew-rate limiting produces a tracking error at the sampler $\theta_e$. A CDR would normally be designed such that slewing would never occur for any valid signal allowed by a particular standard. The next two sections develop an analytic expression for slope overload so that a loop can be easily designed to never slew for signals meeting a typical frequency-domain jitter tolerance specification.

A. Delta-Sigma Analogy

Before developing an analytic equation for slope overload, it is helpful to introduce a further rearrangement of block diagram II from Fig. 10. Fig. 16 transforms the loop by pulling two integrators through the last summing node prior to the quantizer. The update time interval is set to 1. The definitions for bang-bang frequency step $f_{bb} = \beta K_v V_\phi$ and stability factor $\xi = 2\beta\tau/t_{update}$ are also substituted in.

[Figure 16: redrawn loop with input $f_{in}$, frequency error $\Delta F$, an outer path with gain $2f_{bb}/(s\xi)$, and a shaded first-order $\Delta\Sigma$ acting on $\Delta F$.]
Fig. 16. Redrawing of the loop to show inner $\Delta\Sigma$ modulator operating on the loop frequency error.

The shaded area in Fig. 16 shows how the proportional feedback loop can be thought of as an inner $\Delta\Sigma$ modulator producing a phase detector duty cycle proportional to the VCO frequency error [6],[7].

Fig. 17 summarizes an analysis of the first order delta-sigma (after [8]). When the loop is not in slew rate limiting, or in a periodic limit-cycle, the quantizer (e.g., PD) can be replaced with a unity gain element and a noise source $Q(z)$ with the same $A\sin(2\pi f t_{update})/(2\pi f t_{update})$ noise characteristics as a random binary bitstream. Both these constraints are met in practice as the VCO phase noise is sufficient to eliminate any deterministic limit cycles, and the loop is designed to never slew rate limit on any conforming input signal. This insight is critical as

it allows linear analysis to be applied whenever the bang-bang loop is not in slew rate limiting.

[Figure 17: first-order delta-sigma with input $X(z)$, integrator $H(z)$, additive quantizer noise $Q(z)$ and output $Y(z) = \frac{H}{1+H}X(z) + \frac{1}{1+H}Q(z)$, with sketches of the signal and noise gains versus frequency.]
Fig. 17. Simplified analysis of delta-sigma circuit.

With the $\Delta\Sigma$ substitution, the inner loop becomes a wide-band unity-gain block as seen from the viewpoint of the outer integral frequency control loop. The noise in the delta-sigma core is first-order frequency shaped towards high frequencies. However, when the frequency noise is converted to phase noise, the shaping is lost and the noise becomes flat.

B. Expression for Slope Overload

A closed-form analysis of slope overload can now be derived. Referring to Fig. 16, the system slews when $|\Delta F| > f_{bb}$. Assuming no slew rate limiting, we can use the results from the $\Delta\Sigma$ analysis to justify replacing the loop quantizer with a unity gain element. The maximum input phase jitter in UI as a function of frequency, $\Theta_{max}(s)$, normalized to $\theta_{bb}$, can then be calculated using Laplace transforms.

We want to find an input excitation $F(s)$ for which $|\Delta F| = f_{bb}$ at all frequencies. The inner $\Delta\Sigma$ of Fig. 16 has a linearized transfer function of $1/(s + f_{bb})$. Using standard feedback loop theory, the expression for $\Delta F$ can then be written as

$$\Delta F = \frac{F}{1 + \left(\dfrac{2 f_{bb}}{s\,\xi}\right)\left(\dfrac{1}{s + f_{bb}}\right)}.$$

Setting $\Delta F = f_{bb}$, and normalizing the equation by letting $f_{bb} = 1$ and $t_{update} = 1$, we can solve for $F(s)/s$ to get the maximum normalized input phase as a function of normalized frequency

$$\Theta_{max}(s) = \frac{s^2 + s + 2/\xi}{s^3 + s^2}.$$

This is a curious bootstrapped analysis, in that it assumes a lack of slewing to justify the linearization which permits the computation of the onset of slew rate limiting.

Fig. 18 shows a good agreement between this expression and simulated loop performance in which slewing is defined as a contiguous sequence of ten or more identical phase-error indications. This expression can be used to design a loop for a given jitter tolerance. The tolerance plots are single-pole slope for high $\xi$ and high jitter frequency, becoming double-pole at lower frequencies and small $\xi$. At high frequencies, all of the curves become asymptotic to the single-pole tolerance of a first-order bang-bang PLL. The operating region below each of these curves is where the $\Delta\Sigma$ approximation is valid, and where a linear loop analysis is justified.

[Figure 18: log-log plot of normalized jitter amplitude versus jitter frequency times $t_{update}$, with curves of $(s^2 + s + 2/\xi)/(s^3 + s^2)$ for several values of $\xi$. Points shown are from numerical simulation.]
Fig. 18. Normalized amplitude of sinusoidal jitter just sufficient to cause slope overload as a function of normalized jitter frequency and with $\xi$ as a parameter.

[Figure 19: linearized loop with source phase noise input, BB phase noise of form $A\sin(x)/x$ added at the phase detector, gain $\beta$ plus integral path, $K_v/s$, and VCO open loop phase noise added at the output.]
Fig. 19. Loop redrawn replacing phase detector with unity gain element and additive quantization noise.
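The slope-overload expression is easy to evaluate numerically. The sketch below assumes the normalized form $(s^2 + s + 2/\xi)/(s^3 + s^2)$ annotated on Fig. 18, evaluated on the imaginary axis with $s = j2\pi f$ where $f$ is the jitter frequency times $t_{update}$; the mapping from $f$ to $s$ and the function name `max_jitter` are assumptions made for illustration.

```python
import math


def max_jitter(f_norm, xi):
    """Magnitude of (s^2 + s + 2/xi) / (s^3 + s^2) at s = j*2*pi*f_norm.

    f_norm  jitter frequency multiplied by t_update
    xi      stability factor
    The result is the jitter amplitude at the onset of slewing,
    normalized as in the text.
    """
    s = 2j * math.pi * f_norm
    return abs((s * s + s + 2.0 / xi) / (s ** 3 + s * s))


if __name__ == "__main__":
    for xi in (0.1, 1.0, 10.0, 1000.0):
        row = [max_jitter(f, xi) for f in (1e-4, 1e-2, 1.0)]
        print(xi, row)
```

At high frequency the result falls as $1/|s|$ regardless of $\xi$, which is the single-pole tolerance of the first-order loop. At low frequency and small $\xi$ it rises as $(2/\xi)/|s|^2$, the double-pole region described in the text.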

V. JITTER GENERATION

With these insights, it is possible to accurately predict the loop jitter generation in the frequency domain. Fig. 19 is a redrawing of the loop replacing the phase detector by a unity gain element and an additive noise source. The forward loop gain is

[equation for $H(s)$ lost in extraction]

From this can be calculated two transfer functions: the lowpass seen by both the source phase noise and the PD noise to the output, $A(s) = 1/[1 + H(s)]$, and the high-pass transfer function from VCO phase noise to the output, $B(s) = H(s)/[1 + H(s)]$. As shown in Fig. 20, with a source phase noise $P(s)$, a PD phase noise $Q(s)$, and a VCO phase noise $R(s)$, the total loop jitter generation spectrum becomes the RMS combination of each of the three weighted terms

$$J(s) = \sqrt{(PA)^2 + (QA)^2 + (RB)^2}.$$

The source phase noise is generally taken to be the spectrum of the clock driving the data source or BERT, or in the case of a clock multiplying circuit, the spectrum of the reference clock corrected by 20 times the log of the loop frequency multiplication ratio.

[Figure 20: two phase noise plots versus offset frequency (1k to 1G). The upper plot shows the VCO phase noise, PD phase noise, source phase noise, and total; the lower plot compares computed and measured phase noise.]
Fig. 20. Example computation of loop jitter generation spectrum with parameters from [11].

The phase noise power is given by

$$S_{RMS} = \int_0^{\omega_{max}} J(\omega)\,d\omega.$$

The RMS jitter in unit intervals is then

$$J_{RMS} = \mathrm{atan}(\,[\text{argument illegible}]\,)/\pi.$$

It should be noted that the linearized loop model is only suitable for computation of the jitter spectrum but not for computing the actual sampling point phase error or other time-domain transient response. The linearized response only covers the dynamics of the outer frequency tracking loop, but does not capture the extra tracking of the internal nonlinear $\Delta\Sigma$ core.

VI. GAUSSIAN INPUT NOISE

Fig. 21 is a plot of output jitter vs input jitter with $\xi$ as a parameter. For convenience, all jitter sigmas are normalized to $\theta_{bb}$, the loop phase step size. The total loop output jitter can be approximated by three regions of operation:

$$J_{total} \approx J_{idle} + J_{lin} + J_{walk}.$$

[Figure 21: log-log plot of normalized output jitter (0.1 to 10M) versus normalized input jitter sigma (0.1 to 1M), with curves for several values of $\xi$.]
Fig. 21. Normalized output jitter vs input jitter sigma with $\xi$ as a parameter. Simulation is for a non-tristated loop, with square wave data input, 10 timesteps per point, and ignoring phase wrapping.

In Region I, the output jitter is independent of input jitter $\sigma_j$. This occurs when the self-generated hunting jitter exceeds the input jitter. The RMS jitter in this region is empirically determined to be well approximated by $J_{idle} \approx 0.6 + (1.65/\xi)$.

In Region II, the output jitter is proportional to the input jitter. This occurs when the input jitter is so high that, for a given $\xi$, the bang-bang dynamic is unable to control the second-order portion of the loop. This leads to large quadratic trajectories in the phase domain, causing the loop phase to "hunt" towards the limits of the input jitter distribution. As the loop phase nears the limits of the input jitter distribution, the bang-bang hunting has more effect on stabilizing the second-order loop. In this region, the output jitter is proportional to the input jitter: $J_{lin} \approx 2\sigma_j/(1 + \xi)$.

In Region III, the output RMS jitter $J_{walk}$ is approximately equal to $0.7\sqrt{\sigma_j}$. This surprising result says that loops with large $\xi$ have output jitter which grows as the square root of the input jitter. Contrast this with a linear PLL, which simply low-pass filters the input jitter and thus has an output jitter which grows linearly with the input jitter.

An approximate analysis of loop jitter can shed light on this curious square-root dependence of output jitter on input jitter. Assume a zero-mean input jitter distribution with a sigma $\sigma_j$. Using a linearized approximation to the standard probability distribution function, the probability of getting an "early" phase error indication for small loop phase deviation $\Delta\theta$ is approximately

$$P_{early} \approx \frac{1}{2} + \frac{\Delta\theta}{\sigma_j\sqrt{2\pi}}.$$

The expected phase change in the loop after one update time is

$$\theta_{bb}\left((1 - P_{early}) - P_{early}\right) = -\frac{2\theta_{bb}}{\sigma_j\sqrt{2\pi}}\,\Delta\theta.$$

The discrete time equation for the average evolution of loop phase under the condition of a small input phase error can then be expressed as

$$\Delta\theta(t_{n+1}) = \Delta\theta(t_n)\left(1 - \frac{2\theta_{bb}}{\sigma_j\sqrt{2\pi}}\right).$$

This equation has the same form as a discrete time approximation to the capacitor voltage in an RC lowpass filter. By analogy, when time is expressed in units of loop update times, any transient phase error in the bang-bang loop can then be said to decay to zero with a time constant of $\tau = \sigma_j\sqrt{2\pi}/(2\theta_{bb})$.

This "lowpass" loop characteristic is being driven by random energy from the early/late phase detector output. A related problem is the computation of the baseline wander voltage generated by passing a random NRZ data stream through a coupling capacitor. It can be shown that the sigma on the capacitor voltage is given by $\sigma_{BLW} = V_{pp}\sqrt{t_{bit}/(8\tau)}$. Extending this analogy to the loop, we can consider the output of the phase detector as a 50% duty-cycle random NRZ data stream. Given that the output from each "bit" must cause a loop phase change of $\theta_{bb}$, we can compute that the effective $V_{pp}$ to satisfy our loop difference equation must be $\sqrt{2\pi}\,\sigma_j$. We can then compute the loop jitter by using the analogous baseline wander expression with the effective loop $V_{pp}$ and $\tau$. The result is

$$J_{walk} = \frac{1}{2}\sqrt{\theta_{bb}\,\sigma_j\sqrt{2\pi}}$$

which is consistent with empirical analysis of simulation results.

One further insight into this behavior is offered. The second-order loop drives the phase detector output to a steady-state 50% duty-cycle. In this condition, the loop phase splits the input jitter distribution into equal early and late halves. This means that the bang-bang loop phase is servoed to the median of the input jitter distribution rather than to the mean as would be the case with a linear loop. Because of this, the bang-bang loop makes a constant modest correction in response to large jitter outliers, rather than the proportionally large overcompensation of a linear loop. This insight supports the idea that the bang-bang loop jitter should only be sub-linearly affected by the magnitude of the input jitter.

VII. DATA-DRIVEN PHASE DETECTORS

Unless the data contains a guaranteed periodic transition, the CDR will be required to lock onto random transitions embedded in the data stream. The effects of runlength and transition density on loop performance must then be considered. The effect of these two data attributes is dependent on the type of phase detector used. Most modern codes use some variation of Alexander's phase detector [9] shown in Fig. 22. Two matched flip-flops form the front-end of Alexander's phase detector, with the first flip-flop driven on the rising edge of the 50% duty-cycle clock, and the second flip-flop driven on the falling edge of the same clock. (Using a fully-differential monolithic ring-oscillator, it is possible to achieve a very precise 50% duty-cycle clock source.) When the loop is locked, the rising-edge retiming flip-flop samples the center of each data bit and produces a retimed data bit at (A) and the following retimed bit at (B). The falling-edge flip-flop functions as a phase detector by sampling the transition (T) between the data bits (A,B). To improve the circuit's operating speed, the (T) sample is delayed an extra half bit time by a latch so that the logic on (A,T,B) has a full bit time for resolution.

[Figure 22: input data sampled by flip-flops clocked from a 50% duty-cycle VCO clock to give retimed data (A, B) and transition samples (T); logic produces UP and DOWN, which switch charge pump currents $I_{pump}$ onto $V_{tune}$.]
Fig. 22. Modified form of Alexander's ternary-quantized phase detector for NRZ data along with a typical charge pump for driving the VCO tuning input.

The transition sample is then compared to the surrounding data bits to determine whether the clock sampling phase is early or late to derive a binary-quantized (bang-bang) or ternary phase error indication. A truth-table for the logic in Fig. 22 is given in Table 1.

TABLE 1. Truth table for logic in Fig. 22.

State   A   T   B   UP   DOWN   Meaning
0       0   0   0   0    0      hold
1       0   0   1   0    1      early
2       0   1   0   1    1      hold
3       0   1   1   1    0      late
4       1   0   0   1    0      late
5       1   0   1   1    1      hold
6       1   1   0   0    1      early
7       1   1   1   0    0      hold
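Table 1 can be captured compactly in code. The sketch below is illustrative only: the observation that the UP column equals A XOR T and the DOWN column equals T XOR B is read off the table rows rather than stated in the text, and the function name `alexander_pd` is hypothetical.

```python
def alexander_pd(a, t, b):
    """Decode one (A, T, B) sample triple according to Table 1.

    Returns (up, down, meaning), where meaning is 'early', 'late' or 'hold'.
    """
    up = a ^ t        # A and T differ: the transition sample already shows the next bit
    down = t ^ b      # T and B differ: the transition sample still shows the previous bit
    if up and not down:
        meaning = "late"
    elif down and not up:
        meaning = "early"
    else:
        meaning = "hold"   # no transition (states 0, 7) or invalid sample (states 2, 5)
    return up, down, meaning


if __name__ == "__main__":
    for state in range(8):
        a, t, b = (state >> 2) & 1, (state >> 1) & 1, state & 1
        print(state, a, t, b, *alexander_pd(a, t, b))
```

Running the loop reproduces the eight rows of Table 1. A binary (non-ternary) detector would replace the "hold" outcome with the last valid decision, which is the case examined under run-length and latency.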

The states 2 and 5 in Table 1 correspond to the normally impossible condition of sampling a "1" midway between two "0" bits. A custom truth table can use these states to detect either a high bit-error-rate condition [10], a VCO running grossly too slowly (e.g., lump these states into the "late" condition), or taken as an indication that a link has locked onto its own VCO crosstalk, perhaps by amplification of power supply noise by pick up from a high-gain optical transimpedance amplifier [11].

Since the mid-bit samples (A,B) straddle the (T) transition sample, it is also possible to detect the lack of a transition. This condition corresponds to states 0 and 7 in Table 1. This information can be used to create an extra ternary hold-state in the PD output, causing the charge pump to hold its value during long runlengths. Both binary and ternary PDs will be discussed in turn, along with their implications on loop performance.

A. Run-length and Latency

Binary phase detectors have no hold state, so the PD continues to put out the last valid phase error indication during long data runlengths. In this situation, the loop idling jitter will be multiplied from the expected value by the maximum runlength of the data. For example, an 8B/10B code has a maximum code runlength of 5 and will have a peak jitter walk-off five times the value of that computed for a "10" repetitive data pattern. The average RMS jitter will be a function of the runlength distributions of each particular code. There is also a trade-off in effective stability factor, as a repetitive pattern such as "11110000" will be equivalent to a loop with an effective update time 4 times larger than the expected $t_{update} = 1/f_{nom}$. Since the stability factor is inversely dependent on update time, it is possible for binary PDs to become unstable with data patterns containing very long runs due to the delay in timely phase-error feedback.

[Figure 23: loop phase trajectory crossing zero at t = 0 with slope S, reaching overshoot $\Delta_0$, then crossing zero again and overshooting by $\Delta_1 = \lambda^2/\xi - S\lambda$ during the latency $\lambda$.]
Fig. 23. Setup for computing onset of loop instability with latency $\lambda$.

Fig. 23 shows the loop phase trajectory during an acquisition transient. At t=0, the loop crosses zero phase error with $d\theta/dt = S$. From this we can compute an overshoot $\Delta_0$. When the loop phase again crosses zero phase error, the phase detector is late in responding by a time $\lambda$. This time is a combination of runlength, latency in the phase detector logic, and high-order poles in the VCO tuning characteristic.

Due to the loop latency $\lambda$, the loop overshoots zero phase by $\lambda^2/\xi - S\lambda$ before the "braking" effect of the proportional branch starts to act. The onset of catastrophic instability occurs when $\Delta_1 > \Delta_0$, for this implies exponential growth of the acquisition transient. The convergence is guaranteed whenever $\xi > 2\lambda$.

Although usable for tightly constrained block codes such as 8B/10B, binary phase detectors are essentially unusable for codes such as 10Gb Ethernet 64b/66b or SONET, which can have very long runlengths of up to 66 or 80 bits, respectively.

B. Ternary Phase-Detector

The 3-state, or ternary, phase detector provides superior jitter performance for data with long runs [12]. Ternary PDs neither charge nor discharge the loop filter during long runs, causing the loop to hold the current estimate of the data frequency. Such loops effectively "stop time" during long runs.

If the charge pump does not have a hold-mode, it is possible to emulate a ternary loop, with some loss of performance, by continuously toggling the phase-detector output to approximately maintain the current charge pump voltage during long runs.

The peak idling jitter for ternary loops is unchanged from the simple 100% transition density analysis. The RMS jitter will be reduced by the average transition density. Because the loop phase cannot change during hold mode, the jitter tolerance will be derated by the average transition density. This can easily be taken into account by increasing $\theta_{bb}$ appropriately for the characteristics of the code to be used.

C. VCO Tuning Bandwidth

The previous analyses all assumed an infinite VCO tuning bandwidth for the proportional tuning input. A VCO time-constant $\tau_{vco}$ can slightly reduce hunting jitter if it is small compared to the loop update time.

Time constants larger than the loop update time prevent the loop from reversing phase slope within an update period and lengthen the loop limit cycle. If the extra pole is thought of as an extra latency $2\tau_{vco}$, then the result of the previous section can be used to give an approximate bound on loop stability. To avoid divergence: $\tau_{vco} < \xi\,t_{update}/4$. Comparison with simulation verifies this equation as a conservative limit on $\tau_{vco}$.

However, it cannot be recommended to flirt with this boundary. Unless one meticulously checks performance by numerical simulation, it is safest to design the VCO to essentially respond fully in one update time. This is usually very easy to achieve in ring-oscillators and possible with some care using low-Q LC VCOs.

VIII. CONCLUSION

Bang-bang CDR circuits have the unique advantages of inherent sampling phase alignment, adaptability to multi-phase sam-

pling structures, and operation at the highest speed at which a process can make a working flip-flop. Approximate equations for loop jitter, recovered clock spectrum, and jitter tracking performance as a function of various design parameters have been derived. The median-tracking property of the bang-bang loop resulting in an output jitter equal to the square root of the input jitter has been presented.

ACKNOWLEDGMENT

The author is grateful for the contributions of Birdy Amrutur, Bill Brown, John Corcoran, Craig Corsetto, Dave DiPietro, Brian Donoghue, Jeff Galloway, Andrew Grzegorek, Tom Hornak, Jim Homer, Tom Knotts, Benny Lai, Adolf Leiter, Bill McFarland, Charles Moore, Rasmus Nordby, Cheryl Owen, Pat Petruno, Kent Springer, Guenter Steinbach, Hugh Wallace, Bin Wu, J.T. Wu, and Chu Yen for technical discussions and helpful insights into bang-bang loop behavior.

REFERENCES

[1] C. B. Armitage, "SAW Filter Retiming in the AT&T 432 Mb/s Lightwave Regenerator," in Conference Proceedings: AT&T Bell Labs, pp. 102-103, Sept. 3-6, 1984.
[2] C. R. Hogge, Jr., "A Self Correcting Clock Recovery Circuit," IEEE Transactions on Electron Devices, vol. ED-32, no. 12, pp. 2704-2706, Dec. 1985.
[3] J. Tani, D. Crandall, J. Corcoran and T. Hornak, "Parallel Interface ICs for 120Mb/s Fiber Optic Links," in ISSCC Digest of Technical Papers, pp. 190-191,390, Feb. 1987.
[4] R. C. Walker, T. Hornak, C. Yen and K. H. Springer, "A Chipset for Gigabit Rate Data Communication," in Proceedings of the 1989 Bipolar Circuits and Technology Meeting, pp. 288-290, September 18-19, 1989.
[5] F. Gardner, Phaselock Techniques, New York: John Wiley & Sons, 1979, pp. 8-14.
[6] I. Galton, "Higher-order Delta-Sigma Frequency-to-Digital Conversion," in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 441-444, May 30 - June 2, 1994.
[7] I. Galton, "Analog-Input Digital Phase-Locked Loops for Precise Frequency and Phase Demodulation," Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 42, no. 10, pp. 621-630, Oct. 1995.
[8] M. W. Hauser, "Principles of Oversampling A/D Conversion," J. Audio Eng. Soc., vol. 39, no. 1/2, pp. 3-26, Jan./Feb. 1991.
[9] J. D. H. Alexander, "Clock Recovery from Random Binary Signals," Electronics Letters, vol. 11, no. 22, pp. 541-542, Oct. 1975.
[10] J. Hauenschild, D. Friedrich, J. Herrle, J. Krug, "A Two-Chip Receiver for Short Haul Links up to 3.5Gb/s with PIN-Preamp Module and CDR-DMUX," in ISSCC Digest of Technical Papers, pp. 308-309,452, Feb. 1996.
[11] R. C. Walker, C. Stout and C. Yen, "A 2.488Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection," in ISSCC Digest of Technical Papers, pp. 246-247,466, Feb. 1997.
[12] N. Ishihara and Y. Akazawa, "A Monolithic 156 Mb/s Clock and Data Recovery PLL Circuit Using the Sample-and-Hold Technique," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1566-1571, Dec. 1994.
[13] D. Chen, and M. O. Baker, "A 1.25 Gb/s, 460mW CMOS Transceiver for Serial Data Communication," in ISSCC Digest of Technical Papers, pp. 242-243,465, Feb. 1997.
[14] L. DeVito, J. Newton, R. Goughwell, J. Bulzacchelli and F. Benkley, "A 52MHz and 155 MHz Clock-Recovery PLL," in ISSCC Digest of Technical Papers, pp. 142-143,306, Feb. 1991.
[15] J. F. Ewen, A. X. Widmer, M. Soyuer, K. R. Wrenner, B. Parker and H. A. Ainspan, "Single-Chip 1062Mbaud CMOS Transceiver for Serial Data Communication," in ISSCC Digest of Technical Papers, pp. 32-33,336, Feb. 1995.
[16] A. Fiedler, R. Mactaggart, J. Welch and S. Krishnan, "A 1.0625Gbps Transceiver with 2x-Oversampling and Transmit Signal Pre-Emphasis," in ISSCC Digest of Technical Papers, pp. 238-239,464, Feb. 1997.
[17] B. Guo, A. Hsu, Y. Wang and J. Kubinec, "125Mb/s CMOS All-Digital Data Transceiver Using Synchronous Uniform Sampling," in ISSCC Digest of Technical Papers, pp. 112-113, Feb. 1994.
[18] Y. M. Greshishchev, P. Schvan, J. L. Showell, M. Xu, J. J. Ojha and J. E. Rogers, "A Fully Integrated SiGe Receiver IC for 10-Gb/s Data Rate," IEEE Journal of Solid State Circuits, vol. 35, no. 12, pp. 1949-1957, Dec. 2000.
[19] R. Gu, J. M. Tran, H. Lin, A. Yee and M. Izzard, "A 0.5-3.5Gb/s Low-Power Low-Jitter Serial Data CMOS Transceiver," in ISSCC Digest of Technical Papers, pp. 352-353,478, Feb. 1999.
[20] J. Hauenschild, C. Dorschky, T. W. Mohrenfels and R. Seitz, "A 10Gb/s BiCMOS Clock and Data Recovery 1:4-Demultiplexer in a Standard Plastic Package with External VCO," in ISSCC Digest of Technical Papers, pp. 202-203,445, Feb. 1996.
[21] T. He, and P. Gray, "A Monolithic 480 Mb/s AGC/Decision/Clock Recovery Circuit in 1.2 um CMOS," IEEE Journal of Solid State Circuits, vol. 28, no. 12, pp. 1314-1320, Dec. 1993.
[22] P. Larsson, "A 2-1600MHz 1.2-2.5V CMOS Clock-Recovery PLL with Feedback Phase-Selection and Averaging Phase-Interpolation for Jitter Reduction," in ISSCC Digest of Technical Papers, pp. 356-357, Feb. 1999.

[23] B. Lai, and R. C. Walker, "A Monolithic 622Mb/s Clock Extraction Data Retiming Circuit," in ISSCC Digest of Technical Papers, pp. 144-145, Feb. 1991.
[24] T. H. Lee, and J. F. Bulzacchelli, "A 155MHz Clock Recovery Delay- and Phase-Locked Loop," IEEE Journal of Solid State Circuits, vol. 27, no. 12, pp. 1736-1746, Dec. 1992.
[25] R. H. Leonowich, and J. M. Steininger, "A 45-MHz CMOS phase/frequency-locked loop timing recovery circuit," in ISSCC Digest of Technical Papers, pp. 14-15,278-279, Feb. 1988.
[26] I. Lee, C. Yoo, W. Kim, S. Chai and W. Song, "A 622Mb/s CMOS Clock Recovery PLL with Time-Interleaved Phase Detector Array," in ISSCC Digest of Technical Papers, pp. 198-199,444, Feb. 1996.
[27] M. Meghelli, B. Parker, H. Ainspan and M. Soyuer, "A SiGe BiCMOS 3.3V Clock and Data Recovery Circuit for 10Gb/s Serial Transmission Systems," in ISSCC Digest of Technical Papers, pp. 56-57, Feb. 2000.
[28] T. Morikawa, M. Soda, S. Shiori, T. Hashimoto, F. Sato and K. Emura, "A SiGe Single-Chip 3.3V Receiver IC for 10Gb/s Optical Communication System," in ISSCC Digest of Technical Papers, pp. 380-381,481, Feb. 1999.
[29] A. Pottbacker, and U. Langmann, "An 8GHz Silicon Bipolar Clock-Recovery and Data-Regenerator IC," IEEE Journal of Solid State Circuits, vol. 29, no. 12, pp. 1572-1576, Dec. 1994.
[30] M. Reinhold, C. Dorschky, F. Pullela, E. Rose, P. Mayer, P. Paschke, Y. Baeyens, J. Mattia and F. Kunz, "A Fully-Integrated 40Gb/s Clock and Data Recovery / 1:4 DEMUX IC in SiGe Technology," in ISSCC Digest of Technical Papers, pp. 84-85,435, Feb. 2001.
[31] M. Soyuer, and H. A. Ainspan, "A Monolithic 2.3 Gb/s 100mW Clock and Data Recovery Circuit," in ISSCC Digest of Technical Papers, pp. 158-159,282, Feb. 1993.
[32] S. Ueno, K. Watanabe, T. Kato, T. Shinohara, K. Mikami, T. Hashimoto, A. Takai, K. Washio, R. Takeyar and T. Harada, "A Single-Chip 10Gb/s Transceiver LSI using SiGe SOI/BiCMOS," in ISSCC Digest of Technical Papers, pp. 82-83,435, Feb. 2001.
[33] H. Wang, and R. Nottenburg, "A 1Gb/s CMOS Clock and Data Recovery Circuit," in ISSCC Digest of Technical Papers, pp. 354-355,477, Feb. 1999.
[34] P. Wallace, R. Bayruns, J. Smith, T. Laverick and R. Shuster, "A GaAs 1.5Gb/s Clock Recovery and Data Retiming Circuit," in ISSCC Digest of Technical Papers, pp. 192-193, Feb. 1990.
[35] Z. Wang, M. Berroth, J. Seibel, P. Hofmann, A. Hulsmann, Kohler, B. Raynor and J. Schneider, "19GHz Monolithic Integrated Clock Recovery Using PLL and 0.3um Gate-Length Quantum-Well HEMTs," in ISSCC Digest of Technical Papers, pp. 118-119, Feb. 1994.
[36] R. C. Walker, K. Hsieh, T. A. Knotts and C. Yen, "A 10Gb/s Si-Bipolar TX/RX Chipset for Computer Data Transmission," in ISSCC Digest of Technical Papers, pp. 302-303,450, Feb. 1998.
[37] R. C. Walker, J. Wu, C. Stout, B. Lai, C. Yen, T. Hornak and P. Petruno, "A 2-Chip 1.5Gb/s Bus-Oriented Serial Link Interface," in ISSCC Digest of Technical Papers, pp. 226-227,291, Feb. 1992.
[38] C. K. Yang, and M. A. Horowitz, "0.8um CMOS 2.5Gb/s Oversampled Receiver for Serial Links," IEEE Journal of Solid State Circuits, vol. 31, no. 12, pp. 2015-2023, Dec. 1996.

Richard Walker was born in San Rafael, CA, in 1960. He received the B.S. degree in Engineering and Applied Science from the California Institute of Technology in 1982, and an M.S. degree in Computer Science from California State University, Chico, CA in 1992. Rick joined Agilent Laboratories (formerly Hewlett-Packard Laboratories) in 1981, where he is currently a Principal Project Engineer. Since that time, he has worked in the areas of broadband-cable modem design, solid-state laser characterization, phase-locked-loop theory, linecode design, and gigabit-rate serial data transmission. He holds 15 U.S. patents.

Predicting the Phase Noise and Jitter of PLL-Based
Frequency Synthesizers
Kenneth S. Kundert

Abstract: Two methodologies are presented for predicting the phase noise and jitter of a PLL-based frequency synthesizer using simulation that are both accurate and efficient. The methodologies begin by characterizing the noise behavior of the blocks that make up the PLL using transistor-level RF simulation. For each block, the phase noise or jitter is extracted and applied to a model for the entire PLL.

I. INTRODUCTION

Phase-locked loops (PLLs) are used to implement a variety of timing related functions, such as frequency synthesis, clock and data recovery, and clock de-skewing. Any jitter or phase noise in the output of the PLL used in these applications generally degrades the performance margins of the system in which it resides and so is of great concern to the designers of such systems. Jitter and phase noise are different ways of referring to an undesired variation in the timing of events at the output of the PLL. They are difficult to predict with traditional circuit simulators because the PLL generates repetitive switching events as an essential part of its operation, and the noise performance must be evaluated in the presence of this large-signal behavior. SPICE is useless in this situation as it can only predict the noise in circuits that have a quiescent (time-invariant) operating point. In PLLs the operating point is at best periodic, and is sometimes chaotic. Recently a new class of circuit simulators has been introduced that are capable of predicting the noise behavior about a periodic operating point [1]. SpectreRF is the most popular of this class of simulators and, because of the algorithms used in its implementation, is likely to be the best suited for this application [2]. These simulators can be used to predict the noise performance of PLLs. The ideas presented in this paper allow those simulators to be applied even to those PLLs that have chaotic operating points.

A. Frequency Synthesis

The focus of this paper is frequency synthesis. The block diagram of a PLL operating as a frequency synthesizer is shown in Figure 1 [3]. It consists of a reference oscillator (OSC), a phase/frequency detector (PFD), a charge pump (CP), a loop filter (LF), a voltage-controlled oscillator (VCO), and two frequency dividers (FDs). The PLL is a feedback loop that, when in lock, forces $f_{fb}$ to be equal to $f_{ref}$. Given an input frequency $f_{in}$, the frequency at the output of the PLL is

$$f_{out} = \frac{N}{M} f_{in} \tag{1}$$

where M is the divide ratio of the input frequency divider, and N is the divide ratio of the feedback divider. By choosing the frequency divide ratios and the input frequency appropriately, the synthesizer generates an output signal at the desired frequency that inherits much of the stability of the input oscillator. In RF transceivers, this architecture is commonly used to generate the local oscillator (LO) at a programmable frequency that tunes the transceiver to the desired channel by adjusting the value of N.

Fig. 1. The block diagram of a frequency synthesizer. (Blocks: OSC, input divider, PFD, CP, LF, VCO, and a divide-by-N feedback divider; signals $f_{in}$, $f_{ref}$, $f_{fb}$, $f_{out}$.)

B. Direct Simulation

In many circumstances, SpectreRF† can be directly applied to predict the noise performance of a PLL. To make this possible, the PLL must at a minimum have a periodic steady state solution. This rules out systems such as bang-bang clock and data recovery circuits and fractional-N synthesizers because they behave in a chaotic way by design. It also rules out any PLL that is implemented with a phase detector that has a dead zone. A dead zone has the effect of opening the loop and letting the phase drift seemingly at random when the phase of the reference and the output of the voltage-controlled oscillator (VCO) are close. This gives these PLLs a chaotic nature.

To perform a noise analysis, SpectreRF must first compute the steady-state solution of the circuit with its periodic steady state (PSS) analysis. If the PLL does not have a periodic solution, as the cases described above do not, then it will not converge. There is an easy test that can be run to determine if a circuit has a periodic steady-state solution. Simply perform a transient analysis until the PLL approaches steady state and

Ken Kundert is with Cadence Design Systems, San Jose, California, kundert@cadence.com.

† Spectre is a registered trademark of Cadence Design Systems.

then observe the VCO control voltage. If this signal consists of frequency components at integer multiples of the reference frequency, then the PLL has a periodic solution. If there are other components, it does not. Sometimes it can be difficult to identify the undesirable components if the components associated with the reference frequency are large. In this case, use the strobing feature of Spectre's transient analysis to eliminate all components at frequencies that are multiples of the reference frequency. Do so by strobing at the reference frequency. In this case, if the VCO control voltage varies in any significant way the PLL does not have a periodic solution.

If the PLL has a periodic solution, then in concept it is always possible to apply SpectreRF directly to perform a noise analysis. However, in some cases it may not be practical to do so. The time required for SpectreRF to compute the noise of a PLL is proportional to the number of circuit equations needed to represent the PLL in the simulator times the number of time points needed to accurately render a single period of the solution times the number of frequencies at which the noise is desired. When applying SpectreRF to frequency synthesizers with large divide ratios, the number of time points needed to render a period can become problematic. Experience shows that divide ratios greater than ten are often not practical to simulate. Of course, this varies with the size of the PLL.

For PLLs that are candidates for direct simulation using SpectreRF, simply configure the simulator to perform a PSS analysis followed by a periodic noise (PNoise) analysis. The period of the PSS analysis should be set to the period of the reference frequency as defined in Figure 1. The PSS stabilization time (tstab) should be set long enough to allow the PLL to reach lock. This process was successfully followed on a frequency synthesizer with a divide ratio of 40 that contained 2500 transistors, though it required several hours for the complete simulation [4].

C. When Direct Simulation Fails

The challenge still remains: how does one predict the phase noise and jitter of PLLs that do not fit the constraints that enable direct simulation? The remainder of this paper attempts to answer that question for frequency synthesizers, though the techniques presented are general and can be applied to other types of PLLs by anyone who is sufficiently determined.

D. Monte Carlo-Based Methods

Demir proposed an approach for simulating PLLs whereby a PLL is described using behavioral models simulated at a high level [5, 6]. The models are written such that they include jitter in an efficient way. He also devised a simulation algorithm based on solving a set of nonlinear stochastic differential equations that is capable of characterizing the circuit-level noise behavior of blocks that make up a PLL [6, 7]. Finally, he gave formulas that can be used to convert the results of the noise simulations on the individual blocks into values for the jitter parameters for the corresponding behavioral models [8]. Once everything is ready, simulation of the PLL occurs with the blocks of the PLL being described with behavioral models that exhibit jitter. The actual jitter or phase noise statistics are observed during this simulation. Generally tens to hundreds of thousands of cycles are simulated, but the models are efficient so the time required for the simulation is reasonable.

This approach allows prediction of PLL jitter behavior once the noise behavior of the blocks has been characterized. However, it requires the use of an experimental simulator that is not readily available to characterize the jitter of the blocks.

In an earlier series of papers [9, 10], the relevant ideas of Demir were adapted to allow use of a commercial simulator, Spectre [11], and an industry standard modeling language, Verilog-A† [12]. These ideas are further refined in the latter half of this paper.

E. Predicting Noise in PLLs

There are two different approaches to modeling noise in PLLs. One approach is to formulate the models in terms of the phase of the signals, producing what are referred to as phase-domain models. In the simplest case, these models are linear and analyzed easily in the frequency domain, making it simple to use the model to predict phase noise, even in the presence of flicker noise or other noise sources that are difficult to model in the time domain. Phase-domain models are described in the first half of this paper.

The process of predicting the phase noise of a PLL using phase-domain models involves:

1. Using SpectreRF to predict the noise of the individual blocks that make up the PLL.
2. Building high-level behavioral models of each of the blocks that exhibit phase noise.
3. Assembling the blocks into a model of the PLL.
4. Simulating the PLL to find the phase noise of the overall system.

The other approach formulates the models in terms of voltage, producing what are referred to as voltage-domain models. The advantage of voltage-domain models is that they can be refined to implementation. In other words, as the design process transitions to being more of a verification process, the abstract behavioral models initially used can be replaced with detailed gate- or transistor-level models in order to verify the PLL as implemented.

A voltage-domain model is strongly nonlinear and never has a quiescent operating point, making it incompatible with a SPICE-like noise analysis. Often such models have a periodic operating point and so can be analyzed with small-signal RF noise analysis (SpectreRF), but it is also common for that not to be the case. For example, a fractional-N synthesizer does

† Verilog is a registered trademark of Cadence Design Systems licensed to Accellera.

not have a periodic operating point. Occasionally, the circuit is sensitive enough that the noise affects the large-signal behavior of the PLL, such as with bang-bang clock-and-data recovery PLLs, which invalidates any use of small-signal noise analysis.

Modeling large-signal noise in a voltage-domain model as a voltage or a current is problematic. Such signals are very small and continuously and very rapidly varying. Extremely tight tolerances and small time steps are required to accurately resolve such signals with simulation. To overcome these problems, the noise is instead represented using the effect it has on the timing of the transitions within the PLL. In other words, the noise is added to the circuit in the form of jitter. In this case there is no need for either small time steps or tight tolerances.

The process of predicting the jitter of a PLL with voltage-domain models involves:

1. Using SpectreRF to predict the noise of the individual blocks that make up the PLL.
2. Converting the noise of the block to jitter.
3. Building high-level behavioral models of each of the blocks that exhibit jitter.
4. Assembling the blocks into a model of the PLL.
5. Simulating the PLL to find the jitter of the overall system.

The simple linear phase-domain model described in the first part of this paper, and the nonlinear voltage-domain model described in the second part, represent the two ends of a continuum of models. Generally, the phase-domain models are considerably more efficient, but the voltage-domain models do a better job of capturing the details of the behavior of the loop, details such as the signal capture and escape processes. The phase-domain models can be made more general by making them nonlinear and by analyzing them in the time domain. It is common to use such models with fractional-N synthesizers. Conversely, simplifications can be made to the voltage-domain models to make them more efficient. It is even possible to use both voltage- and phase-domain models for different parts of the same loop. One might do so to retain as much efficiency as possible while allowing part of the design to be refined to implementation level. In general it is best to understand both approaches well, and use ideas from both to construct the most appropriate approach for your particular situation.

II. PHASE-DOMAIN MODEL

It is widely understood that simulating PLLs is expensive because the period of the VCO is almost always very short relative to the time required to reach lock. This is particularly true with frequency synthesizers, especially those with large multiplication factors. The problem is that a circuit simulator must use at least 10-20 time points for every period of the VCO for accurate rendering, and the lock process often involves hundreds or thousands of cycles at the input to the phase detector. With large divide ratios, this can translate to hundreds of thousands of cycles of the VCO. Thus, the number of time points needed for a single simulation could range into the millions.

This is all true when simulating the PLL in terms of voltages and currents. When doing so, one is said to be using voltage-domain models. However, that is not the only option available. It is also possible to formulate models based on the phase of the signals. In this case, one would be using phase-domain models. The high frequency variations associated with the voltage-domain models are not present in phase-domain models, and so simulations are considerably faster. In addition, when in lock the phase-domain-based models generally have constant-valued operating points, which simplifies small-signal analysis, making it easier to study the closed-loop dynamics and noise performance of the PLL using either AC or noise analysis.

A linear phase-domain model of a frequency synthesizer is shown in Figure 2. Such a model is suitable for modeling the behavior of the PLL to small perturbations when the PLL is in lock as long as you do not need to know the exact waveforms and instead are interested in how small perturbations affect the phase of the output. This is exactly what is needed to predict the phase noise performance of the PLL.

Fig. 2. Linear time-invariant phase-domain model of the synthesizer shown in Figure 1. (Blocks: OSC; FD_M with gain 1/M; PFD/CP with gain $K_{det}$; LF with transfer function $H(\omega)$; VCO with gain $2\pi K_{vco}/(j\omega)$; FD_N with gain 1/N in the feedback path.)

The derivation of the model begins with the identification of those signals that are best represented by their phase. Many blocks have large repetitive input signals with their outputs being primarily sensitive to the phase of their inputs. It is the signals that drive these blocks that are represented as phase. They are identified using a $\phi$ variable in Figure 2. Notice that this includes all signals except those at the inputs of the LF and VCO.

The models of the individual blocks will be derived by assuming that the signals associated with each of the phase variables is a pulse train. Though generally the case, it is not a requirement. It simply serves to make it easier to extract the models. Define $\Pi(t_0, \tau, T)$ to be a periodic pulse train where one of the pulses starts at $t_0$ and the pulses have duration $\tau$ and period $T$ as shown in Figure 3. This signal transitions between 0 and 1 if $\tau$ is positive, and between 0 and -1 if $\tau$ is negative. The phase of this signal is defined to be $\phi = 2\pi t_0/T$. In many cases, the duration of the pulses is of no interest, in which case $\Pi(t_0, T)$ is used as a short hand. This occurs because the input

that the signal is driving is edge triggered. For simplicity, we assume that such inputs are sensitive to the rising edges of the signal, that $t_0$ specifies the time of a rising edge, and that the signal is transitioning between 0 and 1.

Fig. 3. The pulse train waveform represented by $\Pi(t_0, \tau, T)$, shown for $\tau > 0$ and $\tau < 0$.

The input source produces a signal $v_{in} = \Pi(t_0, T)$. Since this is the input, $t_0$ is arbitrary. As such, we are free to set its phase $\phi_{in}$ to any value we like.

Given a signal $v_i = \Pi(t_0, T)$ a frequency divider will produce an output signal $v_o = \Pi(t_0, NT)$ where N is the divide ratio. The phase of the input is $\phi_i = 2\pi t_0/T$ and the phase of the output is $\phi_o = 2\pi t_0/(NT)$ and so the phase transfer characteristic of a divider is

$$\phi_o = \frac{\phi_i}{N}. \tag{2}$$

There are many different types of phase detectors that can be used, each requiring a somewhat different model. Consider a simple phase-frequency detector combined with a charge pump [13]. In this case, the detector takes two inputs, $v_i = \Pi(t_i, T)$ and $v_o = \Pi(t_o, T)$, and produces an output $i_{cp} = I_{max}\,\Pi(t_o, t_i - t_o, T)$ where $I_{max}$ is the maximum output current of the charge pump. The output of the charge pump immediately passes through a low pass filter that is designed to suppress signals at frequencies of $1/T$ and above, so in most cases the pulse nature of this signal can be ignored in favor of its average value, $\langle i_{cp} \rangle$. Thus, the transfer characteristic of the combined PFD/CP is

$$\langle i_{cp} \rangle = I_{max}\frac{t_i - t_o}{T} = I_{max}\frac{\phi_i - \phi_o}{2\pi} = K_{det}(\phi_i - \phi_o) \tag{3}$$

where $K_{det} = I_{max}/(2\pi)$. Of course, this is only valid for $|\phi_i - \phi_o| < 2\pi$ at the most. The behavior outside this range depends strongly on the type of phase detector used [3]. Even within this range, the phase detector may be better modeled with a nonlinear transfer characteristic. For example, there can be a flat spot in the transfer characteristics near 0 if the detector has a dead zone. However it is generally not productive to model the dead zone in a phase-domain model.†

† This phase-domain model is a continuous-time model that ignores the sampling nature of the PFD. A dead zone interacts with the sampling nature of the PFD to create a chaotic limit cycle behavior that is not modeled with the phase-domain model. This chaotic behavior creates a substantial amount of jitter, and for this reason, most modern phase detectors are designed such that they do not exhibit dead zones.

The model of (3) is a continuous-time approximation to what is inherently a discrete-time process. The phase detector does not continuously monitor the phase difference between its two input signals, rather it outputs one pulse per cycle whose width is proportional to the phase difference. Using a continuous time approximation is generally acceptable if the bandwidth of the loop filter is much less than $f_{ref}$ (generally less than $f_{ref}/10$ is sufficient). In practical PLLs this is almost always the case. It is possible to develop a detailed phase-domain PFD model that includes the discrete-time effects, but it would run more slowly and the resulting phase-domain model of the PLL would not have a quiescent operating point, which makes it more difficult to analyze.

The voltage-controlled oscillator, or VCO, converts its input voltage to an output frequency, and the relationship between input voltage and output frequency can be represented as

$$f_{out} = F(v_c). \tag{4}$$

The mapping from voltage to frequency is designed to be linear, so a first-order model is often sufficient,

$$f_{out} = K_{vco} v_c. \tag{5}$$

It is the output phase that is needed in a phase-domain model,

$$\phi_{out}(t) = 2\pi \int K_{vco} v_c(\tau)\, d\tau \tag{6}$$

or in the frequency domain,

$$\phi_{out}(\omega) = \frac{2\pi K_{vco}}{j\omega} v_c(\omega). \tag{7}$$

A. Small-Signal Stability

This completes the derivation of the phase-domain models for each of the blocks. Now the full model is used to help predict the small-signal behavior of the PLL. Start by using Figure 2 to write a relationship for its loop gain. Start by defining

$$G_{fwd} = \frac{\phi_{out}}{\phi_{diff}} = K_{det} H(\omega) \frac{2\pi K_{vco}}{j\omega} \tag{8}$$

to be the forward gain,

$$G_{rev} = \frac{\phi_{fb}}{\phi_{out}} = \frac{1}{N} \tag{9}$$

to be the feedback factor, and

$$T = G_{fwd} G_{rev} \tag{10}$$

to be the loop gain. The loop gain is used to explore the small-signal stability of the loop. In particular, the phase margin is an important stability metric. It is the negative of the difference between the phase shift of the loop at unity gain and 180°, the phase shift that makes the loop unstable. It should be no less than 45° [14]. When concerned about phase noise or jitter, the phase margin is typically 60° or more to reduce peaking in the closed-loop gain, which results in excess phase noise.
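As an illustration of the loop-gain definitions in (8)-(10), the following Python sketch evaluates $T = G_{fwd} G_{rev}$ with $G_{fwd} = K_{det} H(\omega)\, 2\pi K_{vco}/(j\omega)$ and $G_{rev} = 1/N$, finds the unity-gain frequency, and reports the phase margin. The series-RC loop filter $H(\omega) = R + 1/(j\omega C)$, the function names, and all component values are assumptions chosen for illustration; they are not taken from this paper.

```python
import cmath
import math


def loop_gain(omega, i_max, k_vco, n_div, r, c):
    """Loop gain T = G_fwd * G_rev at angular frequency omega (rad/s).

    Assumes a series-RC loop filter (hypothetical choice), K_vco in Hz/V.
    """
    k_det = i_max / (2 * math.pi)             # PFD/CP gain, K_det = I_max / (2*pi)
    h = r + 1 / (1j * omega * c)              # assumed loop filter impedance H(omega)
    g_fwd = k_det * h * 2 * math.pi * k_vco / (1j * omega)  # forward gain, eq. (8)
    g_rev = 1 / n_div                         # feedback factor, eq. (9)
    return g_fwd * g_rev                      # loop gain, eq. (10)


def unity_gain_frequency(gain, lo=1.0, hi=1e12, iters=200):
    """Bisect in log frequency for |T| = 1, assuming |T| decreases with omega."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if abs(gain(mid)) > 1:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def phase_margin_deg(gain):
    """Return (unity-gain frequency in rad/s, phase margin in degrees)."""
    w_u = unity_gain_frequency(gain)
    return w_u, 180.0 + math.degrees(cmath.phase(gain(w_u)))


# Illustrative (assumed) values: 100 uA charge pump, 100 MHz/V VCO,
# divide-by-40 feedback, R = 10 kOhm, C = 1 nF.
I_MAX, K_VCO, N_DIV, R, C = 100e-6, 100e6, 40, 10e3, 1e-9
w_u, pm = phase_margin_deg(lambda w: loop_gain(w, I_MAX, K_VCO, N_DIV, R, C))
print(f"unity-gain frequency: {w_u / (2 * math.pi):.3e} Hz, phase margin: {pm:.1f} deg")
```

With this assumed filter the loop has two poles at the origin and one zero at $1/(RC)$, so the phase margin reduces to $\arctan(\omega_u RC)$; a practical filter with an additional ripple-suppressing capacitor would give a smaller margin.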

B. Noise Transfer Functions

In Figure 4 various sources of noise have been added. These noise sources can represent either the noise created by the blocks due to intrinsic noise sources (thermal, shot, and flicker noise sources), or the noise coupled into the blocks from external sources, such as the power supplies or the substrate. Most are sources of phase noise, and are denoted $\phi_{\text{in}}$, $\phi_{\text{fdm}}$, $\phi_{\text{fdn}}$, and $\phi_{\text{vco}}$, because the circuit is only sensitive to phase at the point where the noise is injected. The one exception is the noise produced by the PFD/CP, which in this case is considered to be a current, and is denoted $i_{\text{det}}$.

[Figure 4: block diagram of the loop showing OSC, FD_M, PFD/CP, LF, VCO, and FD_N with the noise sources added]

Fig. 4. Linear time-invariant phase-domain model of the synthesizer shown in Figure 2 with representative noise sources added. The $\phi$'s represent various sources of noise.

Then the transfer functions from the various noise sources to the output are

$$G_{\text{ref}} = \frac{\phi_{\text{out}}}{\phi_{\text{ref}}} = \frac{G_{\text{fwd}}}{1+T} = \frac{N G_{\text{fwd}}}{N + G_{\text{fwd}}}, \tag{11}$$

$$G_{\text{vco}} = \frac{\phi_{\text{out}}}{\phi_{\text{vco}}} = \frac{1}{1+T} = \frac{N}{N + G_{\text{fwd}}}, \tag{12}$$

$$G_{\text{in}} = \frac{\phi_{\text{out}}}{\phi_{\text{in}}} = \frac{1}{M}\,\frac{G_{\text{fwd}}}{1+T} = \frac{1}{M}\,\frac{N G_{\text{fwd}}}{N + G_{\text{fwd}}}, \tag{13}$$

and by inspection,

$$G_{\text{fdn}} = \frac{\phi_{\text{out}}}{\phi_{\text{fdn}}} = -G_{\text{ref}}, \tag{14}$$

$$G_{\text{fdm}} = \frac{\phi_{\text{out}}}{\phi_{\text{fdm}}} = G_{\text{ref}}, \tag{15}$$

$$G_{\text{det}} = \frac{\phi_{\text{out}}}{i_{\text{det}}} = \frac{2\pi G_{\text{ref}}}{K_{\text{det}}}. \tag{16}$$

On this last transfer function, we have simply referred $i_{\text{det}}$ to the input by dividing through by the gain of the phase detector.

These transfer functions allow certain overall characteristics of phase noise in PLLs to be identified. As $\omega \to \infty$, $G_{\text{fwd}} \to 0$ because of the VCO and the low-pass filter, and so $G_{\text{ref}}, G_{\text{det}}, G_{\text{fdm}}, G_{\text{fdn}}, G_{\text{in}} \to 0$ and $G_{\text{vco}} \to 1$. At high frequencies, the noise of the PLL is that of the VCO. Clearly this must be so because the low-pass LF blocks any feedback at high frequencies.

As $\omega \to 0$, $G_{\text{fwd}} \to \infty$ because of the $1/(j\omega)$ term from the VCO. So at DC, $G_{\text{ref}}, G_{\text{fdm}}, G_{\text{fdn}} \to N$, $G_{\text{in}} \to N/M$ and $G_{\text{vco}} \to 0$. At low frequencies, the noise of the PLL is contributed by the OSC, PFD/CP, FD_M and FD_N, and the noise from the VCO is diminished by the gain of the loop.

Consider further the asymptotic behavior of the loop and the VCO noise at low offset frequencies ($\omega \to 0$). Oscillator phase noise in the VCO results in the power spectral density $S_{\phi_{\text{vco}}}$ being proportional to $1/\omega^2$, or $S_{\phi_{\text{vco}}} \sim 1/\omega^2$ (neglecting flicker noise). If the LF is chosen such that $H(\omega) \sim 1$, then $G_{\text{fwd}} \sim 1/\omega$, and the contribution from the VCO to the output noise power, $G_{\text{vco}}^2 S_{\phi_{\text{vco}}}$, is finite and nonzero. If the LF is chosen such that $H(\omega) \sim 1/\omega$, as it typically is when a true charge pump is employed, then $G_{\text{fwd}} \sim 1/\omega^2$ and the noise contribution to the output from the VCO goes to zero at low frequencies.

C. Noise Model

One predicts the phase noise exhibited by a PLL by building and applying the model shown in Figure 4. The first step in doing so is to find the various model parameters, including the level of the noise sources, which generally involves either direct measurement or simulating the various blocks with an RF simulator, such as SpectreRF. Use periodic noise (or PNoise) analysis to predict the output noise that results from stochastic noise sources contained within the blocks. Use a periodic AC or periodic transfer function (PAC or PXF) analysis to compute the perturbation at the output of a block due to noise sources outside the block, such as on supplies.

Once the model parameters are known, it is simply a matter of computing the output phase noise of the PLL by applying the equations in Section II-B to compute the contributions to $\phi_{\text{out}}$ from every source and summing the results. Be careful to account for correlations in the noise sources. If the noise sources are perfectly correlated, as they might be if the ultimate source of noise is in the supplies or substrate, then use a direct sum. If the sources produce completely uncorrelated noise, as they would when the ultimate source of noise is random processes within the devices, use a root-mean-square sum.

Alternatively, one could build a Verilog-A model and use simulation to determine the result. The top level of such a model is shown in Listing 1. It employs noisy phase-domain models for each of the blocks. These models are given in Listings 3-7 and are described in detail in the next few sections (III-VI). In this example, the noise sources are coded into the models, but the noise parameters are not set at the top level to simplify the model. To predict the phase noise performance of the loop in lock, simply specify these parameters in Listing 1 and perform a noise analysis. To determine the effect of injected noise, first refer the noise to the output of one of the blocks, and then add a source into the netlist of Listing 1 at the appropriate place and perform an AC analysis.
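The hand calculation described in this section can be sketched in a few lines of Python. The sketch below evaluates the noise transfer functions (11) through (16) for a given forward gain and then combines source contributions either as uncorrelated powers or as fully correlated amplitudes. The function names, dictionary keys, and the numerical values used at the end are assumptions made for illustration and do not come from the paper.

```python
import math


def noise_gains(g_fwd, n_div, m_div, k_det):
    """Transfer functions from each noise source to the output phase.

    g_fwd may be complex; n_div and m_div are the divide ratios N and M.
    """
    t = g_fwd / n_div                  # loop gain T = Gfwd * Grev, Grev = 1/N
    g_ref = g_fwd / (1 + t)
    return {
        "ref": g_ref,
        "vco": 1 / (1 + t),
        "in": g_ref / m_div,
        "fdn": -g_ref,
        "fdm": g_ref,
        "det": 2 * math.pi * g_ref / k_det,
    }


def output_psd_uncorrelated(gains, psds):
    """Sum of |G|^2 * S over sources whose noise is uncorrelated."""
    return sum(abs(gains[name]) ** 2 * psd for name, psd in psds.items())


def output_psd_correlated(gains, amplitudes):
    """Direct (amplitude) sum for perfectly correlated sources, then power."""
    return abs(sum(gains[name] * amp for name, amp in amplitudes.items())) ** 2


# Illustrative check of the asymptotes with hypothetical N = 8, M = 2.
low = noise_gains(1e12, n_div=8, m_div=2, k_det=1e-3)    # Gfwd -> infinity
high = noise_gains(1e-12, n_div=8, m_div=2, k_det=1e-3)  # Gfwd -> 0
print(abs(low["ref"]), abs(low["in"]), abs(low["vco"]))
print(abs(high["ref"]), abs(high["vco"]))
```

The two calls at the end mimic the low-frequency and high-frequency limits discussed in Section II-B: a very large forward gain drives the reference-path gains toward N and N/M while suppressing the VCO term, and a very small forward gain leaves only the VCO term.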

Listing 1: Phase-domain model for a PLL configured as a frequency synthesizer.

    `include "discipline.h"
    module pll(out);
    output out;
    phase out;
    parameter integer m = 1 from [1:inf);    // input divide ratio
    parameter real Kdet = 1 from (0:inf);    // detector gain
    parameter real Kvco = 1 from (0:inf);    // VCO gain
    parameter real c1 = 1n from (0:inf);     // loop filter C1
    parameter real c2 = 200p from (0:inf);   // loop filter C2
    parameter real r = 10K from (0:inf);     // loop filter R
    parameter integer n = 1 from [1:inf);    // fb divide ratio
    phase in, ref, fb;
    electrical c;

    oscillator OSC(in);
    divider #(.ratio(m)) FDm(in, ref);
    phaseDetector #(.gain(Kdet)) PD(ref, fb, c);
    loopFilter #(.c1(c1), .c2(c2), .r(r)) LF(c);
    vco #(.gain(Kvco)) VCO(c, out);
    divider #(.ratio(n)) FDn(out, fb);
    endmodule

Listings 1 and 3-7 have phase signals, and there is no phase discipline in the standard set of disciplines provided by Verilog-A or Verilog-AMS in discipline.h. There are several different resolutions for this problem. Probably the best solution is to simply add such a discipline, given in Listing 2, either to discipline.h as assumed here or to a separate file that is included as needed. Alternatively, one could use the rotational discipline. It is a conservative discipline that includes torque as a flow nature, and so is overkill in this situation. Finally, one could simply use either the electrical or the voltage discipline. Scaling for voltage in volts and phase in radians is similar, and so it will work fine except that the units will be reported incorrectly. Using the rotational discipline would require that all references to the phase discipline be changed to rotational in the appropriate listings. Using either the electrical or voltage discipline would require that both the name of the discipline be changed from phase to either electrical or voltage, and the name of the access function be changed from Theta to V.

Listing 2: Signal flow discipline definition for phase signals (the nature Angle is defined in discipline.h).

    `include "discipline.h"
    discipline phase
        potential Angle;
    enddiscipline

III. OSCILLATORS

Oscillators are responsible for most of the noise at the output of the majority of well-designed frequency synthesizers. This is because oscillators inherently tend to amplify noise found near their oscillation frequency and any of its harmonics. The reason for this behavior is covered next, followed by a description of how to characterize and model the noise in an oscillator. The origins of oscillator phase noise are described in a conceptual way here. For a detailed description, see the papers by Kaertner or Demir et al. [15, 16, 17].

A. Oscillator Phase Noise

Nonlinear oscillators naturally produce high levels of phase noise. To see why, consider the trajectory of a fully autonomous oscillator's stable periodic orbit in state space. In steady state, the trajectory is a stable limit cycle, $v$. Now consider perturbing the oscillator with an impulse and assume that the deviation in the response due to the perturbation is $\Delta v$, as shown in Figure 5. Separate $\Delta v$ into amplitude and phase variations,

$$\Delta v(t) = [1 + \alpha(t)]\, v\!\left(t + \frac{\phi(t)}{2\pi f_o}\right) - v(t), \tag{17}$$

where $v$ represents the unperturbed $T$-periodic output voltage of the oscillator, $\alpha$ represents the variation in amplitude, $\phi$ is the variation in phase, and $f_o = 1/T$ is the oscillation frequency.

[Figure 5: oscillator trajectory in state space]

Fig. 5. The trajectory of an oscillator shown in state space with and without a perturbation $\Delta v$. By observing the time stamps ($t_0, \ldots, t_6$) one can see that the deviation in amplitude dissipates while the deviation in phase does not.

Since the oscillation is stable and the duration of the disturbance is finite, the deviation in amplitude eventually decays away and the oscillator returns to its stable orbit ($\alpha(t) \to 0$ as $t \to \infty$). In effect, there is a restoring force that tends to act against amplitude noise. This restoring force is a natural consequence of the nonlinear nature of the oscillator that acts to suppress amplitude variations.

The oscillator is autonomous, and so any time-shifted version of the solution is also a solution. Once the phase has shifted due to a perturbation, the oscillator continues on as if never disturbed except for the shift in the phase of the oscillation. There is no restoring force on the phase and so phase deviations accumulate. A single perturbation causes the phase to permanently shift ($\phi(t) \to \Delta\phi$ as $t \to \infty$). If we neglect any short term time constants, it can be inferred that the impulse response of the phase deviation $\phi(t)$ can be approximated with

a unit step s(t). The phase shift over time for an arbitrary input at A/= 1 Hz and/ c is the flicker noise corner frequency. As
disturbance u is shown in Figure 6, n is extracted by simply extrapolating to
oo t 1 Hz from a frequency where the noise from the white
sources dominates.
<K0~ js(t-x)u(x)dx = J«(x)rfr, (18)
—oo —oo

\\^3: Flicker sources dominate


or the power spectral density (PSD) of the phase is

lHz \^
^v White sources dominate
This shows that in all oscillators the response to any form of
perturbation, including noise, is amplified and appears mainly \T2:1 External noise sources
in the phase. The amplification increases as the frequency of
the perturbation approaches the frequency of oscillation in
proportion to l/A/(or I/A/ 2 in power). a
Notice that there is only one degree of freedom — the phase f\» **
of the oscillator as a whole. There is no restoring force when /o

the phase of all signals associated with the oscillator shift Fig. 6. Extracting the noise parameters, n> a, and/ c , for an oscillator.
together, however there would be a restoring force if the The parameter a is an alternative to n where n = afo2. It is used later.
phase of signals shifted relative to each other. This observa- The graph is plotted on a log-log scale.
tion is significant in oscillators with multiple outputs, such as
Sty is not directly observable and often difficult to find, so now
quadrature or ring oscillators. The dominant phase variations
Sty is related to L, the power spectral density of the output
appear identically in all outputs, whereas relative phase varia-
voltage noise Sv normalized by the power in the fundamental
tions between the outputs are naturally suppressed by the
tone. Sv is directly available from either measurement with a
oscillator or added by subsequent circuitry and so tend to be
spectrum analyzer or from RF simulators, and £ i s defined as
much smaller [8].
SJ
B. Characterizing Oscillator Phase Noise
Above it was shown that oscillators tend to convert perturba-
««> - ^f. m
where Vj is the fundamental Fourier coefficient of v, the out-
tions from any source into a phase variation at their output
put signal. It satisfies
whose magnitude varies with I/A/ (or l / A / 2 i n power). Now
oo
assume that the perturbation is from device noise in the form
of white and flicker stochastic processes. The oscillator's »(,) = £ Vkei2«kf°'. (23)
response will be characterized first in terms of the phase noise
Sty, and then because phase noise is not easily measured, in
terms of the normalized voltage noise L. The result will be a In (41) of [15], Demir et al shows that for a free-running
small set of easily extracted parameters that completely oscillator perturbed only by white noise sources*
describe the response of the oscillator to white and flicker n
noise sources. These parameters are used when modeling the 440 =I 2 2 A,2, (24)
oscillator. 2 2 2
2w 7C + A /
Assume that the perturbation consists of white and flicker which is a Lorentzian process with corner frequency of
noise and so has the form
/comer = " * « / o - <25>
SJL6f)~\+f±. (20) At frequencies above the corner,

Then from (19) the response will take the form


which agrees with Vendelin [18],
Use (21) to extract/ c . Then use both (21) and (26) to deter-
mine n by choosing JSf well above the flicker noise corner fre-
where the factor of (2ri)2 in the denominator of (19) has been quency,/^ and the corner frequency of (25),/ c o r n e p to avoid
absorbed into «, the constant of proportionality. Thus, the ambiguity and well b e l o w / 0 to avoid the noise from other
response of the oscillator to white and flicker noise sources is sources that occur at these frequencies.
characterized using just two parameters, n &ndfc, where n is
the portion of Sty attributable to the white noise sources alone
t Demir uses c rather than «, where n = c/02.
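The two-parameter description of oscillator phase noise lends itself to a short numerical sketch. Combining the $1/\Delta f^2$ amplification of the oscillator with the white-plus-flicker perturbation of (20) suggests a phase-noise PSD of the form $S_\phi(\Delta f) = n\,(1/\Delta f^2 + f_c/\Delta f^3)$; that form is assumed in the Python code below. The helper names and the numerical values are hypothetical, and the extraction routines only illustrate the idea of reading $n$ from a point well above the flicker corner and $f_c$ from a point below it.

```python
def phase_noise_psd(df, n, fc):
    """Assumed model: S_phi(df) = n * (1/df^2 + fc/df^3), in rad^2/Hz."""
    return n * (1.0 / df**2 + fc / df**3)


def extract_n(df, s_phi):
    """Estimate n from one S_phi sample taken well above fc.

    There the 1/df^2 term dominates, so n is approximately S_phi * df^2.
    """
    return s_phi * df**2


def extract_fc(df, s_phi, n):
    """Solve the assumed model for fc, given n and a sample below the corner."""
    return df * (s_phi * df**2 / n - 1.0)


# Hypothetical oscillator: n = 1e-3 rad^2*Hz, flicker corner at 1 kHz.
n_true, fc_true = 1e-3, 1e3
n_est = extract_n(1e6, phase_noise_psd(1e6, n_true, fc_true))
fc_est = extract_fc(10.0, phase_noise_psd(10.0, n_true, fc_true), n_true)
print(n_est, fc_est)
```

Reading $n$ at an offset only a few times $f_c$ would bias the estimate upward by roughly the ratio $f_c/\Delta f$, which is why the sample in the sketch is taken three decades above the corner.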

C. Phase-Domain Models for the Oscillators

The phase-domain models for the reference and voltage-controlled oscillators are given in Listings 3 and 4. The VCO model is based on (6). Perhaps the only thing that needs to be explained is the way that phase noise is modeled in the oscillators. Verilog-AMS provides the flicker_noise function for modeling flicker noise, which has a power spectral density proportional to $1/f^\alpha$ with $\alpha$ typically being close to 1. However, Verilog-AMS does not limit $\alpha$ to being close to one, making this function well suited to modeling oscillator phase noise, for which $\alpha$ is 2 in the white-phase noise region and close to 3 in the flicker-phase noise region (at frequencies below the flicker noise corner frequency). Alternatively, one could dispense with the noise parameters and use the noise_table function in lieu of the flicker_noise functions to use the measured noise results directly. The "wpn" and "fpn" strings passed to the noise functions are labels for the noise sources. They are optional and can be chosen arbitrarily, though they should not contain any white space. wpn was chosen to represent white phase noise and fpn stands for flicker phase noise.

Listing 3: Phase-domain oscillator noise model.

    `include "discipline.h"
    module oscillator(out);
    output out;
    phase out;
    parameter real n = 0 from [0:inf);
        // white output phase noise at 1 Hz (rad^2/Hz)
    parameter real fc = 0 from [0:inf);
        // flicker noise corner frequency (Hz)
    analog begin
        Theta(out) <+ flicker_noise(n, 2, "wpn")
                    + flicker_noise(n*fc, 3, "fpn");
    end
    endmodule

Listing 4: Phase-domain VCO noise model.

    `include "discipline.h"
    `include "constants.h"
    module vco(in, out);
    input in; output out;
    voltage in;
    phase out;
    parameter real gain = 1 from (0:inf);
        // transfer gain, Kvco (Hz/V)
    parameter real n = 0 from [0:inf);
        // white output phase noise at 1 Hz (rad^2/Hz)
    parameter real fc = 0 from [0:inf);
        // flicker noise corner frequency (Hz)
    analog begin
        Theta(out) <+ 2*`M_PI*gain*idt(V(in));
        Theta(out) <+ flicker_noise(n, 2, "wpn")
                    + flicker_noise(n*fc, 3, "fpn");
    end
    endmodule

When interested in the effect of signals coupled into the oscillator through the supplies or the substrate, one would compute the transfer function from the interfering source to the phase output of the oscillator using either a PAC or PXF analysis. Again, one would simply assume that the perturbation in the output of the oscillator is completely in the phase, which is true except at very high offset frequencies. One then employs (12) and (13) to predict the response at the output of the PLL.

IV. LOOP FILTER

Even in the phase-domain model for the PLL, the loop filter remains in the voltage domain and is represented with a full circuit-level model, as shown in Listing 5. As such, the noise behavior of the filter is naturally included in the phase-domain model without any special effort, assuming that the noise is properly included in the resistor model.

Listing 5: Loop filter model.

    `include "discipline.h"
    module loopFilter(n);
    electrical n;
    ground gnd;†
    parameter real c1 = 1n from (0:inf);
    parameter real c2 = 200p from (0:inf);
    parameter real r = 10K from (0:inf);
    electrical int;
    capacitor #(.c(c1)) C1(n, gnd);
    capacitor #(.c(c2)) C2(n, int);
    resistor #(.r(r)) R(int, gnd);
    endmodule

† The ground statement is not currently supported in Cadence's Verilog-A implementation, so instead ground is explicitly passed into the module.

V. PHASE DETECTOR AND CHARGE PUMP

As with the VCO, the noise of the PFD/CP as needed by the phase-domain model is found directly with simulation. Simply drive the block with a representative periodic signal, perform a PNoise analysis, and measure the output noise current. In this case, a representative signal would be one that produced periodic switching at the output. This is necessary to capture the noise present during the switching process. Generally the noise appears as in Figure 7, in which case the noise is parameterized with $n$ and $f_c$. $n$ is the noise power density at frequencies above the flicker noise corner frequency, $f_c$, and below the noise bandwidth of the circuit.

The phase-domain model for the PFD/CP is given in Listing 6. It is based on (3). Alternatively, as before one could

use the noise_table function in lieu of the white_noise and flicker_noise functions to use the measured noise results directly.

[Figure 7: noise power density versus frequency, with the regions where flicker sources and white sources dominate, the corner frequency $f_c$, and the noise bandwidth marked]

Fig. 7. Extracting the noise parameters, $n$ and $f_c$, for the PFD/CP. The graph is plotted on a log-log scale.

Listing 6: Phase-domain phase detector noise model.

    `include "discipline.h"
    `include "constants.h"
    module phaseDetector(pin, nin, out);
    input pin, nin; output out;
    phase pin, nin;
    electrical out;
    parameter real gain = 1 from (0:inf);
        // transfer gain (A/cycle)
    parameter real n = 0 from [0:inf);
        // white output current noise (A^2/Hz)
    parameter real fc = 0 from [0:inf);
        // flicker noise corner frequency (Hz)
    analog begin
        I(out) <+ gain * Theta(pin,nin) / (2*`M_PI);
        I(out) <+ white_noise(n, "wpn")
                + flicker_noise(n*fc, 1, "fpn");
    end
    endmodule

VI. FREQUENCY DIVIDERS

There are several reasons why the process of extracting the noise produced by the frequency dividers is more complicated than that needed for other blocks. First, the phase noise is needed and, as of the time when this document was written, SpectreRF reports on the total noise and does not yet make the phase noise available separately. Secondly, the frequency dividers are always followed by some form of edge-sensitive thresholding circuit, in this case the PFD, which implies that the overall noise behavior of the PLL is only influenced by the noise produced by the divider at the time when the threshold is being crossed in the proper direction. The noise produced by the frequency divider is cyclostationary, meaning that the noise power varies over time. Thus, it is important to analyze the noise behavior of the divider carefully. The second issue is discussed first.

A. Cyclostationary Noise

Formally, the term cyclostationary implies that the autocorrelation function of a stochastic process varies with $t$ in a periodic fashion [19, 20], which in practice is associated with a periodic variation in the noise power of a signal. In general, the noise produced by all of the nonlinear blocks in a PLL is strongly cyclostationary. To understand why, consider the noise produced by a logic circuit, such as the inverter shown in Figure 8. The noise at the output of the inverter, $n_{\text{out}}$, comes from different sources depending on the phase of the output signal, $v_{\text{out}}$. When the output is high, the output is insensitive to small changes on the input. The transistor $M_P$ is on and the noise at the output is predominantly due to the thermal noise from its channel. This is region A in the figure. When the output is low, the situation is reversed and most of the output noise is due to the thermal noise from the channel of $M_N$. This is region B. When the output is transitioning, thermal noise from both $M_P$ and $M_N$ contributes to the output. In addition, the output is sensitive to small changes in the input. In fact, any noise at the input is amplified before reaching the output. Thus, noise from the input tends to dominate over the thermal noise from the channels of $M_P$ and $M_N$ in this region. Noise at the input includes noise from the previous stage and noise from both devices in the form of flicker noise and thermal noise from gate resistance. This is region C in the figure.

[Figure 8: inverter built from $M_P$ and $M_N$, with the output waveform $v_{\text{out}}$ and the output noise $n_{\text{out}}$ in regions A, B, and C]

Fig. 8. Noise produced by an inverter ($n_{\text{out}}$) as a function of the output signal ($v_{\text{out}}$). In region A the noise is dominated by the thermal noise of $M_P$, in region B it is dominated by the thermal noise of $M_N$, and in region C the output noise includes the thermal noise from both devices as well as the amplified noise from the input.

The challenge in estimating the effect of noise passing through a threshold is the difficulty in estimating the noise at the point where the threshold is crossed. There are several different ways of estimating the effect of this noise, but the simplest is to use the strobed noise feature of SpectreRF.† When the strobed noise feature is active, the noise produced by the

circuit is periodically sampled to create a discrete-time random sequence, as shown in Figure 9. SpectreRF then computes the power-spectral density of the sequence. The sample time would be adjusted to coincide with the desired threshold crossings. Since the $T$-periodic cyclostationary noise process is sampled every $T$ seconds, the resulting noise process is stationary. Furthermore, the noise present at times other than at the sample points is completely ignored.

† The strobed-noise feature of SpectreRF is also referred to as its time-domain noise feature.

[Figure 9: the noisy waveform $v_n(t)$ and a magnified view of the noise samples $n_i$ at the strobe points]

Fig. 9. Strobed noise. The lower waveform is a highly magnified view of the noise present at the strobe points in $v_n$, which are chosen to coincide with the threshold crossings in $v$.

B. Converting to Phase Noise

The act of converting the noise from a continuous-time process to a discrete-time process by sampling at the threshold crossings makes the conversion into phase noise easier. If $v_n$ is the continuous-time noisy response, and $v$ is the noise-free response (response with the noise sources turned off), then‡

$$n_i = v_n(iT) - v(iT). \tag{27}$$

Then if $v_n$ is noisy because it is corrupted with a phase noise process $\phi$, then

$$v_n(t) = v\!\left(t + \frac{\phi(t)}{2\pi f_o}\right). \tag{28}$$

Assume the phase noise $\phi$ is small and linearize $v$ using a Taylor series expansion

$$v_n(t) \approx v(t) + \frac{dv(t)}{dt}\,\frac{\phi(t)}{2\pi f_o} \tag{29}$$

and

$$n_i \approx v(iT) + \frac{dv(iT)}{dt}\,\frac{\phi(iT)}{2\pi f_o} - v(iT) = \frac{dv(iT)}{dt}\,\frac{\phi_i}{2\pi f_o}. \tag{30}$$

Finally, $\phi_i$ can be found from $n_i$ using

$$\phi_i = 2\pi f_o\, n_i \Big/ \frac{dv(iT)}{dt}. \tag{31}$$

‡ It is assumed that the sequence $n_i$ is formed by sampling the noise at $iT$, which implies that the threshold crossings also occur at $iT$. In practice, the crossings will occur at some time offset from $iT$. That offset is ignored. It is done without loss of generality with the understanding that the functions $v$ and $v_n$ can always be reformulated to account for the offset.

$v$ is $T$-periodic, which makes $dv(iT)/dt$ a constant, and so

$$S_\phi(f) = \left[\frac{2\pi f_o}{dv(iT)/dt}\right]^2 S_n(f), \tag{32}$$

where $S_n(f)$ and $S_\phi(f)$ are the power spectral densities of the $n_i$ and $\phi_i$ sequences.

C. Phase-Domain Model for Dividers

To extract the phase noise of a divider, drive the divider with a representative periodic input signal and perform a PSS analysis to determine the threshold crossing times and the slew rate ($dv/dt$) at these times. Then use SpectreRF's strobed PNoise analysis to compute $S_n(f)$. When running PNoise analysis, ensure that the maxsidebands parameter is set sufficiently large to capture all significant noise folding. A large value will slow the simulation. To reduce the number of sidebands needed, use $T$ as small as possible. $S_\phi(f)$ is then computed from (32). Generally the noise appears as in Figure 10. Notice that the noise is periodic in $f$ with period $1/T$ because $n$ is a discrete-time sequence with period $T$. The parameters $n$ and $f_c$ for the divider are extracted as illustrated. The high frequency roll-off is generally ignored because it occurs above the frequency range of interest.

[Figure 10: noise power density versus frequency, with the regions where flicker sources and white sources dominate, the noise bandwidth, and the point $1/T$ marked]

Fig. 10. Extracting the noise parameters, $n$ and $f_c$, for the divider.

With ripple counters, one usually only characterizes one stage at a time and combines the phase noise from each stage by assuming that the noise in each stage is independent (true for device noise, would not be true for noise coupling into the divider from external sources). The variation due to phase noise accumulates; however, it is necessary to account for the increasing period of the signals at each stage along the ripple counter. Consider an intermediate stage of a $K$-stage ripple counter. The total phase noise at the output of the ripple counter that results due to the phase noise $S_{\phi_k}$ at the output of stage $k$ is $(T_k/T_K)^2 S_{\phi_k}$. So the total phase noise at the output of the ripple counter is

$$S_{\phi_{\text{out}}} = \frac{1}{T_K^2} \sum_{k=0}^{K} T_k^2\, S_{\phi_k}, \tag{33}$$

where $S_{\phi_0}$ and $T_0$ are the phase noise and signal period at the input to the first stage of the ripple counter.

With undesired variations in the supplies or in the substrate the resulting phase noise in each stage would be correlated, so one would need to compute the transfer function from the sig-

nal source to the phase noise of each stage and combine in a vector sum.

Unlike in ripple counters, phase noise does not accumulate with each stage in synchronous counters. Phase noise at the output of a synchronous counter is independent of the number of stages and consists only of the noise of its clock along with the noise of the last stage.

The phase-domain model for the divider, based on (2), is given in Listing 7. As before, one could use the noise_table function in lieu of the white_noise and flicker_noise functions to use the measured noise results directly.

Listing 7: Phase-domain divider noise model.

    `include "discipline.h"
    module divider(in, out);
    input in; output out;
    phase in, out;
    parameter real ratio = 1 from (0:inf);   // divide ratio
    parameter real n = 0 from [0:inf);
        // white output phase noise (rads^2/Hz)
    parameter real fc = 0 from [0:inf);
        // flicker noise corner frequency (Hz)
    analog begin
        Theta(out) <+ Theta(in) / ratio;
        Theta(out) <+ white_noise(n, "wpn")
                    + flicker_noise(n*fc, 1, "fpn");
    end
    endmodule

VII. FRACTIONAL-N SYNTHESIS

One of the drawbacks of a traditional frequency synthesizer, also known as an integer-N frequency synthesizer, is that the output frequency is constrained to be N times the reference frequency. If the output frequency is to be adjusted by changing N, which is constrained by the divider to be an integer, then the output frequency resolution is equal to the reference frequency. If fine frequency resolution is desired, then the reference frequency must be small. This in turn limits the loop bandwidth as set by the loop filter, which must be at least 10 times smaller than the reference frequency to prevent signal components at the reference frequency from reaching the input of the VCO and modulating the output frequency, creating spurs or sidebands at an offset equal to the reference frequency and its harmonics. A low loop bandwidth is undesirable because it limits the response time of the synthesizer to changes in N. In addition, the loop acts to suppress the phase noise in the VCO at offset frequencies within its bandwidth, so reducing the loop bandwidth acts to increase the total phase noise at the output of the VCO.

The constraint on the loop bandwidth imposed by the required frequency resolution is eliminated if the divide ratio N is not limited to be an integer. This is the idea behind fractional-N synthesis. In practice, one cannot directly implement a frequency divider that implements a non-integer divide ratio except in a few very restrictive cases, so instead a divider that is capable of switching between two integer divide ratios is used, and one rapidly alternates between the two values in such a way that the time-average is equal to the desired non-integer divide ratio [13]. A block diagram for a fractional-N synthesizer is shown in Figure 11. Divide ratios of N and N + 1 are used, where N is the first integer below the desired divide ratio, and N + 1 is the first integer above. For example, if the desired divide ratio is 16.25, then one would alternate between the ratios of 16 and 17, with the ratio of 16 being used 75% of the time. Early attempts at fractional-N synthesis alternated between integer divide ratios in a repetitive manner, which resulted in noticeable spurs in the VCO output spectrum. More recently, ΔΣ modulators have been used to generate a random sequence with the desired duty cycle to control the multi-modulus dividers [21]. This has the effect of trading off the spurs for an increased noise floor; however, the ΔΣ modulator can be designed so that most of the power in its output sequence is at frequencies that are above the loop bandwidth, and so is largely rejected by the loop.

[Figure 11: block diagram with OSC, PFD, CP, LF, VCO, a ÷N, N+1 frequency divider, and a modulator]

Fig. 11. The block diagram of a fractional-N frequency synthesizer.

The phase-domain small-signal model for the combination of a fractional-N divider and a ΔΣ modulator is given in Listing 8. It uses the noise_table function to construct a simple piece-wise linear approximation of the noise produced in an nth order ΔΣ modulator that is parameterized with the low frequency noise generated by the modulator, along with the corner frequency and the order.

VIII. JITTER

The signals at the input and output of a PLL are often binary signals, as are many of the signals within the PLL. The noise on binary signals is commonly characterized in terms of jitter.

Jitter is an undesired perturbation or uncertainty in the timing of events. Generally, the events of interest are the transitions in a signal. One models jitter in a signal by starting with a noise-free signal $v$ and displacing time with a stochastic process $j$. The noisy signal becomes

$$v_n(t) = v(t + j(t)) \tag{34}$$

with $j$ assumed to be a zero-mean process and $v$ assumed to be a $T$-periodic function. $j$ has units of seconds and can be interpreted as a noise in time. Alternatively, it can be reformulated as a noise in phase, or phase noise, using

Listing 8: Phase-domain fractional-N divider model.

    `include "discipline.h"
    module divider(in, out);
    input in; output out;
    phase in, out;
    parameter real ratio = 1 from (0:inf);   // divide ratio
    parameter real n = 0 from [0:inf);
        // white output phase noise (rads^2/Hz)
    parameter real fc = 0 from [0:inf);
        // flicker noise corner frequency (Hz)
    parameter real bw = 1 from (0:inf);      // delta-sigma mod bandwidth
    parameter integer order = 1 from (0:9);  // delta-sigma mod order
    parameter real fmax = 10*bw from (bw:inf);
        // maximum frequency of concern
    analog begin
        Theta(out) <+ Theta(in) / (ratio + noise_table([
            0, n,
            bw, n,
            fmax, n*pow((fmax/bw),order)
        ], "dsn"));
    end
    endmodule

$$\phi(t) = 2\pi f_o\, j(t), \tag{35}$$

where $f_o = 1/T$ and

$$v_n(t) = v\!\left(t + \frac{\phi(t)}{2\pi f_o}\right). \tag{36}$$

A. Jitter Metrics

Define $\{t_i\}$ as the sequence of times for positive-going zero crossings, henceforth referred to as transitions, that occur in $v_n$. The various jitter metrics characterize the statistics of this sequence.

The simplest metric is the edge-to-edge jitter, $J_{ee}$, which is the variation in the delay between a triggering event and a response event. When measuring edge-to-edge jitter, a clean jitter-free input is assumed, and so the edge-to-edge jitter $J_{ee}$ is

$$J_{ee}(i) = \sqrt{\operatorname{var}(t_i)}. \tag{37}$$

Edge-to-edge jitter assumes an input signal, and so is only defined for driven systems. It is an input-referred jitter metric, meaning that the jitter measurement is referenced to a point on a noise-free input signal, so the reference point is fixed. No such signal exists in autonomous systems. The remaining jitter metrics are suitable for both driven and autonomous systems. They gain this generality by being self-referred, meaning that the reference point is on the noisy signal for which the jitter is being measured. These metrics tend to be a bit more complicated because the reference point is noisy, which acts to increase the measured jitter.

Edge-to-edge jitter is also a scalar jitter metric, and it does not convey any information about the correlation of the jitter between transitions. The next metric characterizes the correlations between transitions as a function of how far the transitions are separated in time.

Define $J_k(i)$ to be the standard deviation of $t_{i+k} - t_i$,

$$J_k(i) = \sqrt{\operatorname{var}(t_{i+k} - t_i)}. \tag{38}$$

$J_k(i)$ is referred to as k-cycle jitter or long-term jitter†. It is a measure of the uncertainty in the length of $k$ cycles and has units of time. $J_1$, the standard deviation of the length of a single period, is often referred to as the period jitter, and is denoted $J$, where $J = J_1$.

† Some people distinguish between k-cycle jitter and long-term jitter by defining the long-term jitter $J_\infty$ as being the k-cycle jitter $J_k$ as $k \to \infty$.

Another important jitter metric is cycle-to-cycle jitter. Define $T_i = t_{i+1} - t_i$ to be the period of cycle $i$. Then the cycle-to-cycle jitter $J_{cc}$ is

$$J_{cc}(i) = \sqrt{\operatorname{var}(T_{i+1} - T_i)}. \tag{39}$$

Cycle-to-cycle jitter is like edge-to-edge jitter in that it is a scalar jitter metric that does not contain information about the correlation in the jitter between distant transitions. However, it differs in that it is a measure of short-term jitter that is relatively insensitive to long-term jitter [22]. As such, cycle-to-cycle jitter is the only jitter metric that is suitable for use when flicker noise is present. All other metrics are unbounded in the presence of flicker noise.

If $j(t)$ is either stationary or $T$-cyclostationary, then $\{t_i\}$ is stationary, meaning that these metrics do not vary with $i$, and so $J_{ee}(i)$, $J_k(i)$, and $J_{cc}(i)$ can be shortened to $J_{ee}$, $J_k$, and $J_{cc}$. These jitter metrics are illustrated in Figure 12.

[Figure 12: waveforms illustrating edge-to-edge jitter, k-cycle jitter, and cycle-to-cycle jitter]

Fig. 12. The various jitter metrics.

B. Types of Jitter

The type of jitter produced in PLLs can be classified as being from one of two canonical forms. Blocks such as the PFD, CP, and FD are driven, meaning that a transition at their output is a direct result of a transition at their input. The jitter

57
exhibited by these blocks is referred to as synchronous jitter: it is a variation in the delay between when the input is received and the output is produced. Blocks such as the OSC and VCO are autonomous. They generate output transitions not as a result of transitions at their inputs, but rather as a result of the previous output transition. The jitter produced by these blocks is referred to as accumulating jitter: it is a variation in the delay between an output transition and the subsequent output transition. Table I previews the basic characteristics of these two types of jitter. The formulas for jitter given in this table are derived in the next two sections.

TABLE I: THE TWO CANONICAL FORMS OF JITTER.

Jitter Type     Circuit Type             Period Jitter
synchronous     driven (PFD/CP, FD)      $J_{ee} = \sqrt{\mathrm{var}(n_v(t_c))} \,/\, (dv(t_c)/dt)$
accumulating    autonomous (OSC, VCO)    $J = \sqrt{aT}$

IX. SYNCHRONOUS JITTER

Synchronous jitter is exhibited by driven systems. In the PLL, the PFD/CP and FDs exhibit synchronous jitter. In these components, an output event occurs as a direct result of, and some time after, an input event. Synchronous jitter is an undesired fluctuation in the delay between the input and the output events. If the input is a periodic sequence of transitions, then the frequency of the output signal is exactly that of the input, but the phase of the output signal fluctuates with respect to that of the input. The jitter appears as a modulation of the phase of the output, which is why it is sometimes referred to as phase modulated or PM jitter.
Let $\eta$ be a stationary or T-cyclostationary process, then
$$j_{sync}(t) = \eta(t) \tag{40}$$
$$v_n(t) = v(t + j_{sync}(t)) \tag{41}$$
exhibits synchronous jitter. If $\eta$ is further restricted to be a white Gaussian stationary or T-cyclostationary process, then $v_n(t)$ exhibits simple synchronous jitter. The essential characteristic of simple synchronous jitter is that the jitter in each event is independent or uncorrelated from the others, and (35) shows that it corresponds to white phase noise. Driven circuits exhibit simple synchronous jitter if they are broadband and if the noise sources are white, Gaussian and small. The sources are considered small if the circuit responds linearly to the noise, even though at the same time the circuit may be responding nonlinearly to the periodic drive signal.
For systems that exhibit simple synchronous jitter, from (37),
$$J_{ee}(i) = \sqrt{\mathrm{var}(j_{sync}(t_i))}. \tag{42}$$
Similarly, from (38),
$$J_k(i) = \sqrt{\mathrm{var}(t_{i+k} - t_i)}, \tag{43}$$
$$J_k(i) = \sqrt{\mathrm{var}\big([(i+k)T + j_{sync}(t_{i+k})] - [iT + j_{sync}(t_i)]\big)}, \tag{44}$$
$$J_k(i) = \sqrt{2\,\mathrm{var}(j_{sync}(t_i))}, \tag{45}$$
$$J_k(i) = \sqrt{2}\, J_{ee}(i). \tag{46}$$
Since $j_{sync}(t)$ is T-cyclostationary, the statistics of $j_{sync}(t_i)$ are independent of i, and so are $J_{ee}$ and $J_k$. The factor of $\sqrt{2}$ in (46) stems from the length of an interval including the independent variation from two transitions. From (46), $J_k$ is independent of k, and so
$$J_k = J \quad \text{for } k = 1, 2, \ldots \tag{47}$$
Using similar arguments, one can show that with simple synchronous jitter,
$$J_{cc} = \sqrt{3}\, J. \tag{48}$$
Generally, the jitter produced by the PFD/CP and FDs is well approximated by simple synchronous jitter if one can neglect flicker noise.

A. Extracting Synchronous Jitter
The jitter in driven blocks, such as the PFD/CP or FDs, occurs because of an interaction between noise present in the blocks and the thresholds that are inherent to logic circuits.
In systems where signals are continuous valued, an event is usually defined as a signal crossing a threshold in a particular direction. The threshold crossings of a noiseless periodic signal, $v(t)$, are precisely evenly spaced. However, when noise is added to the signal, $v_n(t) = v(t) + n_v(t)$, each threshold crossing is displaced slightly. Thus, a threshold converts additive noise to synchronous jitter.
The amount of displacement in time is determined by the amplitude of the noise signal, $n_v(t)$, and the slew rate of the periodic signal, $dv(t_c)/dt$, as the threshold is crossed, as shown in Figure 13 [23]. If the noise $n_v$ is stationary, then
$$\mathrm{var}(j_{sync}(t_c)) \approx \frac{\mathrm{var}(n_v)}{\left(dv(t_c)/dt\right)^2} \tag{49}$$
where $t_c$ is the time of a threshold crossing in $v$ (assuming the noise is small).

[Figure 13: a noisy waveform crossing a threshold at $t_c$; the noise histogram (spread Δv) at the threshold maps to a jitter histogram (spread Δt).]
Fig. 13. How a threshold converts noise into jitter.
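
As a numerical cross-check of the relations in this section, the following Python sketch (not part of the original paper) generates transition times with independent Gaussian jitter, as simple synchronous jitter assumes, and estimates the metrics defined in (37)-(39). The function names, the 1 ns period, and the 1 ps jitter level are illustrative assumptions. A one-line helper also applies the square root of (49), dividing rms noise by slew rate.

```python
import math
import random


def std(values):
    """Population standard deviation of an iterable of floats."""
    xs = list(values)
    mean = math.fsum(xs) / len(xs)
    return math.sqrt(math.fsum((x - mean) ** 2 for x in xs) / len(xs))


def jitter_from_noise(sigma_v, slew_rate):
    """Square root of (49): rms timing error = rms noise / |slew rate|."""
    return sigma_v / abs(slew_rate)


def sync_jitter_metrics(n_edges=50_000, period=1e-9, j_ee=1e-12, seed=1):
    """Estimate J_ee, J_1, J_5 and J_cc for white (simple synchronous) jitter.

    All default values are assumptions chosen only for illustration.
    """
    rng = random.Random(seed)
    # t_i = i*T + j_sync(t_i), with j_sync independent from edge to edge.
    t = [i * period + rng.gauss(0.0, j_ee) for i in range(n_edges)]

    def j_k(k):
        # Remove the nominal k*T so only the timing error remains.
        return std(t[i + k] - t[i] - k * period for i in range(n_edges - k))

    periods = [t[i + 1] - t[i] for i in range(n_edges - 1)]
    j_cc = std(periods[i + 1] - periods[i] for i in range(len(periods) - 1))
    j_edge = std(t[i] - i * period for i in range(n_edges))
    return j_edge, j_k(1), j_k(5), j_cc


if __name__ == "__main__":
    jee, j1, j5, jcc = sync_jitter_metrics()
    print("J_1/J_ee =", j1 / jee)
    print("J_5/J_1  =", j5 / j1)
    print("J_cc/J_1 =", jcc / j1)
    print("1 mV rms on a 1 V/ns edge ->", jitter_from_noise(1e-3, 1e9), "s")
```

Under these assumptions the estimated ratio of $J_1$ to $J_{ee}$ should sit near $\sqrt{2}$, $J_5$ should match $J_1$, and the ratio of $J_{cc}$ to $J_1$ should sit near $\sqrt{3}$, since (39) combines three independent transitions with weights 1, -2, 1.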

Generally $n_v$ is not stationary, but cyclostationary (refer back to Section VI-A). It is only important to know when the noisy periodic signal $v_n(t)$ crosses the threshold, so the statistics of $n_v$ are only significant at the time when $v_n(t)$ crosses the threshold,
$$\mathrm{var}(j_{sync}(t_c)) \approx \frac{\mathrm{var}(n_v(t_c))}{\left(dv(t_c)/dt\right)^2}. \tag{50}$$
The jitter is computed from (42) using (49) or (50),
$$J_{ee} = \frac{\sqrt{\mathrm{var}(n_v(t_c))}}{dv(t_c)/dt}. \tag{51}$$
To compute $\mathrm{var}(n_v(t_c))$, one starts by driving the circuit with a representative periodic signal, and then sampling $n_v(t)$ at intervals of T to form the ergodic sequence $\{n_v(t_i)\}$ where $t_i = t_c$ for some i. Then the variance is computed by computing the power spectral density for the sequence and integrating from $f = -f_o/2$ to $f_o/2$. Recall that the noise is periodic in f with period $f_o = 1/T$ because $n_v(t_i)$ is a discrete-time sequence with sample interval T.

In practice, this is done by using the strobed noise capability of SpectreRF† to compute the power spectral density of the sequence. When the strobed noise feature is active, the noise produced by the circuit is periodically sampled to create a discrete-time random sequence, as shown in Figure 9. SpectreRF then computes the power-spectral density of the sequence. The sample time should be adjusted to coincide with the desired threshold crossings. Since the T-periodic cyclostationary noise process is sampled every T seconds, the resulting noise process is stationary. Furthermore, the noise present at times other than at the sample points is completely ignored.
† The strobed-noise feature of SpectreRF is also referred to as its time-domain noise feature.

1) Extracting the Jitter of Dividers: To extract the jitter of a divider, drive the divider with a representative periodic input signal and perform a PSS analysis to determine the threshold crossing times and the slew rate (dv/dt) at these times. Then use SpectreRF's strobed PNoise analysis to compute $S_n(f)$. The sample point should be set to coincide with the point where the output signal crosses the threshold of the subsequent stage (the phase detector) in the appropriate direction. When running PNoise analysis, ensure that the maxsidebands parameter is set sufficiently large to capture all significant noise folding. A large value will slow the simulation. To reduce the number of sidebands needed, use T as small as possible. SpectreRF computes the power spectral density, which is integrated to compute the total noise at the sample points,
$$\mathrm{var}(n_v(t_c)) = \int_0^{f_o/2} S_n(f, t_c)\, df. \tag{52}$$
Then $J_{ee}$ is computed from (51).
With ripple counters, one usually only characterizes one stage at a time. The total jitter due to noise in the ripple counter is then computed by assuming that the jitter in each stage is independent (again, this is true for device noise, but not for noise coupling into the divider from external sources) and taking the square root of the sum of the squares of the jitter on each stage.
Unlike in ripple counters, jitter does not accumulate with synchronous counters. Jitter in a synchronous counter is independent of the number of stages and consists only of the jitter of its clock along with the jitter of the last stage.

2) Extracting the Jitter of the Phase Detector: The PFD/CP is not followed by a threshold. Rather, it feeds into the LF, which is sensitive to the noise emitted by the CP at all times, not just during transitions. This argues that the noise of the PFD/CP be modeled as a continuous noise current. However, as mentioned earlier, doing so is problematic for simulators and would require very tight tolerances and small time steps. So instead, the noise of the PFD/CP is referred back to its inputs. The inputs of the PFD/CP are edge triggered, so the noise can be referred back as jitter.

To extract the input-referred jitter of a PFD/CP, drive both inputs with periodic signals with offset phase so that the PFD/CP produces a representative output. Use SpectreRF's PNoise analysis to compute the output noise over the total bandwidth of the PFD/CP (in this case, use the conventional noise analysis rather than the strobed noise analysis). Choose the frequency range of the analysis so that the total noise at frequencies outside the range is negligible. Thus, the noise should be at least 40 dB down and dropping at the highest frequency simulated. Integrate the noise over frequency and apply the Wiener-Khinchin theorem [24] to determine
$$\mathrm{var}(n) = \int_{-\infty}^{\infty} S_n(f)\, df, \tag{53}$$
the total output noise current squared [19]. Then either calculate or measure the effective gain of the PFD/CP, $K_{det}$. Scale the gain so that it has the units of amperes per second. Then divide the total output noise current by the gain and account for there being two transitions per cycle over which to distribute the noise to determine the input-referred jitter for the PFD/CP,
$$J_{ee,\mathrm{PFD/CP}} = \frac{T}{2\pi K_{det}} \sqrt{\frac{\mathrm{var}(n)}{2}}. \tag{54}$$
As before, when running PNoise analysis, ensure that the maxsidebands parameter is set sufficiently large to capture all significant noise folding. A large value will slow the simulation. To reduce the number of sidebands needed, use T as small as possible.

X. ACCUMULATING JITTER

Accumulating jitter is exhibited by autonomous systems, such as oscillators, that generate a stream of spontaneous output
transitions. In the PLL, the OSC and VCO exhibit accumulating jitter. Accumulating jitter is characterized by an undesired variation in the time since the previous output transition, thus the uncertainty of when a transition occurs accumulates with every transition. Compared with a jitter-free signal, the frequency of a signal exhibiting accumulating jitter fluctuates randomly, and the phase drifts without bound. Thus, the jitter appears as a modulation of the frequency of the output, which is why it is sometimes referred to as frequency modulated or FM jitter.
Again assume that $\eta$ is a stationary or T-cyclostationary process, then
$$j_{acc}(t) = \int_0^t \eta(\tau)\, d\tau \tag{55}$$
$$v_n(t) = v(t + j_{acc}(t)) \tag{56}$$
exhibits accumulating jitter. While $\eta$ is cyclostationary and so has bounded variance, (55) shows that the variance of $j_{acc}$, and hence the phase difference between $v(t)$ and $v_n(t)$, is unbounded.
If $\eta$ is further restricted to be a white Gaussian stationary or T-cyclostationary random process, then $v_n(t)$ exhibits simple accumulating jitter. In this case, the process $\{j_{acc}(iT)\}$ that results from sampling $j_{acc}$ every T seconds is a discrete Wiener process and the phase difference between $v(iT)$ and $v_n(iT)$ is a random walk [19]. As shown next, simple accumulating jitter corresponds to oscillator phase noise that results from white noise sources.
The essential characteristic of simple accumulating jitter is that the incremental jitter that accumulates over each cycle is independent or uncorrelated. Autonomous circuits exhibit simple accumulating jitter if they are broadband and if the noise sources are white, Gaussian and small. The sources are considered small if the circuit responds linearly to the noise, though at the same time the circuit may be responding nonlinearly to the oscillation signal. An autonomous circuit is considered broadband if there are no secondary resonant responses close in frequency to the primary resonance.†
† Oscillators are strongly nonlinear circuits undergoing large periodic variations, and so signals within the oscillator freely mix up and down in frequency by integer multiples of the oscillation frequency. For this reason, any low frequency time constants or resonances in supply or bias lines would effectively act like close-in secondary resonances. In fact, this is the most likely cause of such phenomena.
For systems that exhibit simple accumulating jitter, each transition is relative to the previous transition, and the variation in the length of each period is independent, so the variance in the time of each transition accumulates,
$$J_k = \sqrt{k}\, J \quad \text{for } k = 0, 1, 2, \ldots, \tag{57}$$
where
$$J = \sqrt{\mathrm{var}(j_{acc}(t_i + T) - j_{acc}(t_i))}. \tag{58}$$
Similarly,
$$J_{cc} = \sqrt{2}\, J. \tag{59}$$
Generally, the jitter produced by the OSC and VCO is well approximated by simple accumulating jitter if one can neglect flicker noise.

A. Extracting Accumulating Jitter
The jitter in autonomous blocks, such as the OSC or VCO, is almost completely due to oscillator phase noise. Oscillator phase noise is a variation in the phase of the oscillator as it proceeds along its limit cycle.
In order to determine the period jitter $J$ of $v_n(t)$ for a noisy oscillator, assume that it exhibits simple accumulating jitter so that $\eta$ in (55) is a white Gaussian T-cyclostationary noise process (this excludes flicker noise) with a power spectral density of
$$S_\eta(f) = a, \tag{60}$$
and an autocorrelation function of
$$R_\eta(t_1, t_2) = a\,\delta(t_1 - t_2), \tag{61}$$
where $\delta$ is a Dirac delta function. Then
$$j_{acc}(t) = \int_0^t \eta(\tau)\, d\tau \tag{62}$$
is a Wiener process [19], which has an autocorrelation function of
$$R_{j_{acc}}(t_1, t_2) = a \min(t_1, t_2). \tag{63}$$
The period jitter is the standard deviation of the variation in one period, and so
$$J^2 = \mathrm{var}(j_{acc}(t + T) - j_{acc}(t)), \tag{64}$$
$$J^2 = E[(j_{acc}(t + T) - j_{acc}(t))^2], \tag{65}$$
$$J^2 = E[j_{acc}(t + T)^2 - 2 j_{acc}(t + T)\, j_{acc}(t) + j_{acc}(t)^2], \tag{66}$$
$$J^2 = E[j_{acc}(t + T)^2] - 2E[j_{acc}(t + T)\, j_{acc}(t)] + E[j_{acc}(t)^2], \tag{67}$$
$$J^2 = R_{j_{acc}}(t + T, t + T) - 2R_{j_{acc}}(t + T, t) + R_{j_{acc}}(t, t), \tag{68}$$
$$J^2 = a(t + T) - 2at + at, \tag{69}$$
$$J = \sqrt{aT}. \tag{70}$$
We now have a way of relating the jitter of the oscillator to the PSD of $\eta$. However, $\eta$ is not measurable, so instead the jitter is related to the phase noise $S_\phi$. To do so, consider simple accumulating jitter written in terms of phase,
$$\phi_{acc}(t) = 2\pi f_o\, j_{acc}(t) = 2\pi f_o \int_0^t \eta(\tau)\, d\tau, \tag{71}$$
where $f_o = 1/T$. From (60) and (71) the PSD of $\phi_{acc}$ is
$$S_{\phi_{acc}}(\Delta f) = a\,\frac{(2\pi f_o)^2}{(2\pi \Delta f)^2} = \frac{a f_o^2}{\Delta f^2}. \tag{72}$$
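
The accumulation described by (57) and (59), and the link between phase-noise PSD and period jitter given by (70) and (72), can be illustrated with a short Python sketch. It is not part of the original paper: the function names, the 1 ns period, and the 1 ps period jitter are assumptions chosen for illustration. The last helper inverts (72) for a and then applies (70); the value passed in the usage line assumes a phase PSD of 2 x 10^-11 rad^2/Hz at 100 kHz offset from a 1.1 GHz carrier, which corresponds to the single-sideband figure of -110 dBc/Hz used in the example that follows in the text.

```python
import math
import random


def std(values):
    """Population standard deviation of an iterable of floats."""
    xs = list(values)
    mean = math.fsum(xs) / len(xs)
    return math.sqrt(math.fsum((x - mean) ** 2 for x in xs) / len(xs))


def accumulating_jitter_metrics(n_periods=50_000, period=1e-9,
                                j_period=1e-12, seed=2):
    """Estimate J_1, J_4, J_16 and J_cc for simple accumulating jitter.

    Each period is T plus an independent Gaussian error, so the timing
    error of the edges is a random walk. Defaults are illustrative only.
    """
    rng = random.Random(seed)
    periods = [period + rng.gauss(0.0, j_period) for _ in range(n_periods)]
    t = [0.0]
    for p in periods:
        t.append(t[-1] + p)  # each transition is relative to the previous one

    def j_k(k):
        return std(t[i + k] - t[i] - k * period for i in range(len(t) - k))

    j_cc = std(periods[i + 1] - periods[i] for i in range(n_periods - 1))
    return j_k(1), j_k(4), j_k(16), j_cc


def period_jitter_from_phase_psd(s_phi, delta_f, f_o):
    """Invert (72) for a, then apply (70) with T = 1/f_o."""
    a = s_phi * delta_f ** 2 / f_o ** 2
    return math.sqrt(a / f_o)


if __name__ == "__main__":
    j1, j4, j16, jcc = accumulating_jitter_metrics()
    print("J_4/J_1  =", j4 / j1)    # (57) predicts sqrt(4)
    print("J_16/J_1 =", j16 / j1)   # (57) predicts sqrt(16)
    print("J_cc/J_1 =", jcc / j1)   # (59) predicts sqrt(2)
    print("J =", period_jitter_from_phase_psd(2e-11, 100e3, 1.1e9), "s")
```

The simulated ratios are statistical estimates, so they only approach the predicted values as the number of periods grows; the overlapping windows used for large k make those estimates noisier than the k = 1 case.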

From (26)
$$\mathcal{L}(\Delta f) = \frac{1}{2} S_{\phi_{acc}}(\Delta f) = \frac{a f_o^2}{2 \Delta f^2}, \tag{73}$$
$$a = 2 \mathcal{L}(\Delta f)\, \frac{\Delta f^2}{f_o^2}. \tag{74}$$
Determine a by choosing $\Delta f$ well above the corner frequency, $f_{corner}$, to avoid ambiguity and well below $f_o$ to avoid the noise from other sources that occur at these frequencies.

1) Example: To compute the jitter of an oscillator, an RF simulator such as SpectreRF is used to find $\mathcal{L}$ and $f_o$ of the oscillator. Given these, a is found with (74), $J$ is found with (70) and $J_k$ is found with (57). This procedure is demonstrated for the oscillator shown in Figure 14. This is a very low noise oscillator designed in 0.35 µm CMOS by Rael and Abidi [25]. The frequency of oscillation is 1.1 GHz and the resonator has a loaded Q of 6.

Fig. 14. Differential LC oscillator.

The procedure starts by using an RF simulator such as SpectreRF to compute the normalized phase noise $\mathcal{L}$. Its PNoise analysis is used, with the maxsidebands parameter set to at least 10 to adequately account for noise folding within the oscillator.† In this case, $\mathcal{L}$ = -110 dBc/Hz at 100 kHz offset from the carrier. Apply (74) to compute a from $\mathcal{L}$, where $\mathcal{L}(\Delta f) = 10^{-11}$, $\Delta f$ = 100 kHz, and $f_o$ = 1.1 GHz,
$$a = 2 \cdot 10^{-11} \left(\frac{10^5}{1.1 \times 10^9}\right)^2 = 165.3 \times 10^{-21}. \tag{75}$$
The period jitter $J$ is then computed from (70),
$$J = \sqrt{aT} = \sqrt{\frac{a}{f_o}} = \sqrt{\frac{165.3 \times 10^{-21}}{1.1\ \mathrm{GHz}}} = 12.3\ \mathrm{fs}. \tag{76}$$
In this example, the noise was extracted for the VCO alone. In practice, the LF is generally combined with the VCO before extracting the noise so that the noise of the LF is accounted for.
† At one point it was mistakenly suggested in the documentation for SpectreRF that maxsidebands should be set to 0 for oscillators. This causes SpectreRF to ignore all noise folding and results in a significant underestimation of the total noise.

XI. JITTER OF A PLL

If a PLL synthesizer is constructed from blocks that exhibit simple synchronous and accumulating jitter, then the jitter behavior of the PLL is relatively easy to estimate [26]. Assume that the PLL has a closed-loop bandwidth of $f_L$, and that $\tau_L = 1/(2\pi f_L)$, then for k such that $kT \ll \tau_L$, jitter from the VCO dominates and the PLL exhibits simple accumulating jitter equal to that produced by the VCO. Similarly, at large k (low frequencies), the PLL exhibits simple accumulating jitter equal to that produced by the OSC. Between these two extremes, the PLL exhibits simple synchronous jitter, the amount of which depends on the characteristics of the loop and the level of synchronous jitter exhibited by the FDs and the PFD/CP. The behavior of such a PLL is shown in Figure 15.

[Figure 15: log($J_k$) versus log(k), with three regions: accumulating jitter from the VCO at small k, synchronous jitter from the PFD/CP and FDs at intermediate k, and accumulating jitter from the OSC at large k.]
Fig. 15. Long-term jitter ($J_k$) for an idealized PLL as a function of the number of cycles.

XII. MODELING A PLL WITH JITTER

The basic behavioral models for the blocks that make up a PLL are well known and so will not be discussed here in any depth [27, 28]. Instead, only the techniques for adding jitter to the models are discussed.
Jitter is modeled in an AHDL by dithering the time at which events occur. This is efficient because it does not create any additional activity; rather, it simply changes the time when existing activity occurs. Thus, models with jitter can run as efficiently as those without.

A. Modeling Driven Blocks
A feature of Verilog-A allows especially simple modeling of synchronous jitter. The transition() function, which is used to model signal transitions between discrete levels, provides a delay argument that can be dithered on every transition. The delay argument must not be negative, so a fixed delay that is greater than the maximum expected deviation of the jitter must be included. This approach is suitable for any model that exhibits synchronous jitter and generates discrete-valued outputs. It is used in the Verilog-A divider module shown in Listing 9, which models synchronous jitter with (41) where $j_{sync}$ is a stationary white discrete-time Gaussian random process. It is also used in Listing 10, which models a simple PFD/CP.

1) Frequency Divider Model: The model, given in Listing 9, operates by counting input transitions. This is done in the
@ cross block. The cross function triggers the @ block at the precise moment when its first argument crosses zero in the direction specified by the second argument. Thus, the @ block is triggered when the input crosses the threshold in the user-specified direction. The body of the @ block increments the count, resets it to zero when it reaches ratio, then determines if count is above or below its midpoint (n is zero if the count is below the midpoint). It also generates a new random dither dt that is used later. Outside the @ block is code that executes continuously. It processes n to create the output. The value of the ?: operator is Vhi if n is 1 and Vlo if n is 0. Finally, the transition function adds a finite transition time of tt and a delay of td + dt. The finite transition time removes the discontinuities from the signal that could cause problems for the simulator. The jitter is embodied in dt, which varies randomly from transition to transition. To avoid negative delays, td must always be larger than the magnitude of dt. This model expects jitter to be specified as $J_{ee}$, as computed with (51).

Listing 9: Frequency divider that models synchronous jitter.

    `include "discipline.h"

    module divider (out, in);
    input in; output out; electrical in, out;
    parameter real Vlo=-1, Vhi=1;
    parameter integer ratio=2 from [2:inf);
    parameter integer dir=1 from [-1:1] exclude 0;
        // dir=1 for positive edge trigger
        // dir=-1 for negative edge trigger
    parameter real tt=1n from (0:inf);
    parameter real td=0 from (0:inf);
    parameter real jitter=0 from [0:td/5);  // edge-to-edge jitter
    parameter real ttol=1p from (0:td/5);   // ttol << jitter
    integer count, n, seed;
    real dt;

    analog begin
        @(initial_step) seed = -311;
        @(cross(V(in) - (Vhi + Vlo)/2, dir, ttol)) begin
            // count input transitions
            count = count + 1;
            if (count >= ratio)
                count = 0;
            n = (2*count >= ratio);
            // add jitter
            dt = jitter*$dist_normal(seed,0,1);
        end
        V(out) <+ transition(n ? Vhi : Vlo, td+dt, tt);
    end
    endmodule

2) PFD/CP Model: The model for a phase/frequency detector combined with a charge pump is given in Listing 10. It implements a finite-state machine with a three-level output: -1, 0 and +1. On every transition of the VCO input in direction dir, the output is incremented. On every transition of the reference input in the direction dir, the output is decremented. If both the VCO and reference inputs are at the same frequency, then the average value of the output is proportional to the phase difference between the two, with the average being negative if the reference transition leads the VCO transition and positive otherwise [3]. As before, the time of the output transitions is randomly dithered by dt to model jitter. The output is modeled as an ideal current source and a finite transition time provides a simple model of the dead band in the CP.

Listing 10: PFD/CP model with synchronous jitter.

    `include "discipline.h"

    module pfd_cp (out, ref, vco);
    input ref, vco; output out; electrical ref, vco, out;
    parameter real Iout=100u;
    parameter integer dir=1 from [-1:1] exclude 0;
        // dir=1 for positive edge trigger
        // dir=-1 for negative edge trigger
    parameter real tt=1n from (0:inf);
    parameter real td=0 from (0:inf);
    parameter real jitter=0 from [0:td/5);  // edge-to-edge jitter
    parameter real ttol=1p from (0:td/5);   // ttol << jitter
    integer state, seed;
    real dt;

    analog begin
        @(initial_step) seed = 716;
        @(cross(V(ref), dir, ttol)) begin
            if (state > -1) state = state - 1;
            dt = jitter*$dist_normal(seed,0,1);
        end
        @(cross(V(vco), dir, ttol)) begin
            if (state < 1) state = state + 1;
            dt = jitter*$dist_normal(seed,0,1);
        end
        I(out) <+ transition(Iout*state, td + dt, tt);
    end
    endmodule

B. Modeling Accumulating Jitter
1) OSC Model: The delay argument of the transition() function cannot be used to model accumulating jitter because of the accumulating nature of this type of jitter. When modeling a fixed frequency oscillator, the timer() function is used as shown in Listing 11. At every output transition, the next transition is scheduled using the timer() function to be $T/K + J\delta/\sqrt{K}$ in the future, where $\delta$ is a unit-variance zero-mean random process and K is the number of output transitions per period. Typically, K = 2.

C. VCO Model
A VCO generates a sine or square wave whose frequency is proportional to the input signal level. VCO models, given in
Listings 12 and 13, are constructed using three serial operations, as shown in Figure 16. First, the input signal is scaled to compute the desired output frequency. Then, the frequency is integrated to compute the output phase. Finally, the phase is used to generate the desired output signal. The phase is computed with idtmod, a function that provides integration followed by a modulus operation. This serves to keep the phase bounded, which prevents a loss of numerical precision that would otherwise occur when the phase became large after a long period of time. Output transitions are generated when the phase passes $-\pi/2$ and $\pi/2$.

[Figure 16: the input $V_{in}$ is scaled to a frequency, a jitter term is combined with it, and the result is integrated modulo $2\pi$ to produce the phase that drives the output.]
Fig. 16. Block diagram of VCO behavioral model that includes jitter.

Listing 11: Fixed frequency oscillator with accumulating jitter.

    `include "discipline.h"

    module osc (out);
    output out; electrical out;
    parameter real freq=1 from (0:inf);
    parameter real Vlo=-1, Vhi=1;
    parameter real tt=0.01/freq from (0:inf);
    parameter real jitter=0 from [0:0.1/freq);  // period jitter
    integer n, seed;
    real next, dT;

    analog begin
        @(initial_step) begin
            seed = 286;
            next = 0.5/freq + $abstime;
        end
        @(timer(next)) begin
            n = !n;
            dT = jitter*$dist_normal(seed,0,1);
            next = next + 0.5/freq + 0.707*dT;
        end
        V(out) <+ transition(n ? Vhi : Vlo, 0, tt);
    end
    endmodule

The jitter is modeled as a random variation in the frequency of the VCO. However, the jitter is specified as a variation in the period, thus it is necessary to relate the variation in the period to the variation in the frequency. Assume that without jitter, the period is divided into K equal intervals of duration $\tau = T/K = 1/(K f_c)$. The frequency deviation will be updated every interval and held constant during the intervals. With jitter, the duration of an interval is
$$\tau_i = \tau + \Delta\tau_i. \tag{77}$$
$\Delta\tau_i$ is a random variable with variance
$$\mathrm{var}(\Delta\tau_i) = \frac{J^2}{K}. \tag{78}$$
Therefore,
$$\Delta\tau_i = \frac{J\delta_i}{\sqrt{K}} \tag{79}$$
where $\delta$ is a zero-mean unit-variance Gaussian random process. The dithered frequency is
$$f_i = \frac{1}{K\tau_i} = \frac{1}{K(\tau + \Delta\tau_i)} = \frac{1/(K\tau)}{1 + \Delta\tau_i/\tau} = \frac{f_c}{1 + K\Delta\tau_i f_c}. \tag{80}$$
Let $\Delta T_i = K\Delta\tau_i$, then
$$f_i = \frac{f_c}{1 + \Delta T_i f_c}. \tag{81}$$
Finally $\mathrm{var}(\tau_i) = J^2/K$, so $\Delta\tau_i = J\delta_i/\sqrt{K}$ and $\Delta T_i = \sqrt{K} J \delta_i$.
The @ cross statement is used to determine the exact time when the phase crosses the thresholds, indicating the beginning of a new interval. At this point, a new random trial $\delta_i$ is generated.
The final model is given in Listing 12. This model can be easily modified to fit other needs. Converting it to a model that generates sine waves rather than square waves simply requires replacing the last two lines with one that computes and outputs the sine of the phase. When doing so, consider reducing the number of jitter updates to one per period, in which case the factor of 1.414 should be changed to 1.

Listing 13 is a Verilog-A model for a quadrature VCO that exhibits accumulating jitter. It is an example of how to model an oscillator with multiple outputs so that the jitter on the outputs is properly correlated.

D. Efficiency of the Models

Conceptually, a model that includes jitter should be just as efficient as one that does not because jitter does not increase the activity of the models; it only affects the timing of particular events. However, if jitter causes two events that would normally occur at the same time to be displaced so that they are no longer coincident, then a circuit simulator will have to use more time points to resolve the distinct events and so will run more slowly. For this reason, it is desirable to combine jitter sources to the degree possible.

To make the HDL models even faster, rewrite them in either Verilog-HDL or Verilog-AMS. Be sure to set the time resolution to be sufficiently small to prevent the discrete nature of time in these simulators from adding an appreciable amount of jitter.

1) Including Synchronous Jitter into OSC: One can combine the output-referred noise of FD_M and FD_N and the input-
referred noise of the PFD/CP with the output noise of OSC. A modified fixed-frequency oscillator model that supports two jitter parameters and the divide ratio M is given in Listing 14 (more on the effect of the divide ratio on jitter in the next section). The accJitter parameter is used to model the accumulating jitter of the reference oscillator, and the syncJitter parameter is used to model the synchronous jitter of FD_M, FD_N and PFD/CP. Synchronous jitter is modeled in the oscillator without using a nonzero delay in the transition function. This is a more efficient approach because it avoids generating two unnecessary events per period. To get full benefit from this optimization, a modified PFD/CP given in Listing 15 is used. This model runs more efficiently by removing support for jitter and the td parameter.

Listing 12: VCO model that includes accumulating jitter.

    `include "discipline.h"
    `include "constants.h"

    module vco (out, in);
    input in; output out; electrical out, in;
    parameter real Vmin=0;
    parameter real Vmax=Vmin+1 from (Vmin:inf);
    parameter real Fmin=1 from (0:inf);
    parameter real Fmax=2*Fmin from (Fmin:inf);
    parameter real Vlo=-1, Vhi=1;
    parameter real tt=0.01/Fmax from (0:inf);
    parameter real jitter=0 from [0:0.25/Fmax);  // period jitter
    parameter real ttol=1u/Fmax from (0:1/Fmax);
    real freq, phase, dT;
    integer n, seed;

    analog begin
        @(initial_step) seed = -561;
        // compute the freq from the input voltage
        freq = (V(in) - Vmin)*(Fmax - Fmin) / (Vmax - Vmin) + Fmin;
        // bound the frequency (this is optional)
        if (freq > Fmax) freq = Fmax;
        if (freq < Fmin) freq = Fmin;
        // add the phase noise
        freq = freq/(1 + dT*freq);
        // phase is the integral of the freq modulo 2*pi
        phase = 2*`M_PI*idtmod(freq, 0.0, 1.0, -0.5);
        // update jitter twice per period
        // 1.414=sqrt(K), K=2 jitter updates/period
        @(cross(phase + `M_PI/2, +1, ttol) or
          cross(phase - `M_PI/2, +1, ttol)) begin
            dT = 1.414*jitter*$dist_normal(seed,0,1);
            n = (phase >= -`M_PI/2) && (phase < `M_PI/2);
        end
        // generate the output
        V(out) <+ transition(n ? Vhi : Vlo, 0, tt);
    end
    endmodule

Listing 13: Quadrature Differential VCO model that includes accumulating jitter.

    `include "discipline.h"
    `include "constants.h"

    module quadVco (PIout, NIout, PQout, NQout, Pin, Nin);
    electrical PIout, NIout, PQout, NQout, Pin, Nin;
    output PIout, NIout, PQout, NQout;
    input Pin, Nin;
    parameter real Vmin=0;
    parameter real Vmax=Vmin+1 from (Vmin:inf);
    parameter real Fmin=1 from (0:inf);
    parameter real Fmax=2*Fmin from (Fmin:inf);
    parameter real Vlo=-1, Vhi=1;
    parameter real jitter=0 from [0:0.25/Fmax);  // period jitter
    parameter real ttol=1u/Fmax from (0:1/Fmax);
    parameter real tt=0.01/Fmax;
    real freq, phase, dT;
    integer i, q, seed;

    analog begin
        @(initial_step) seed = 133;
        // compute the freq from the input voltage
        freq = (V(Pin,Nin) - Vmin) * (Fmax - Fmin) / (Vmax - Vmin) + Fmin;
        // bound the frequency (this is optional)
        if (freq > Fmax) freq = Fmax;
        if (freq < Fmin) freq = Fmin;
        // add the phase noise
        freq = freq/(1 + dT*freq);
        // phase is the integral of the freq modulo 2*pi
        phase = 2*`M_PI*idtmod(freq, 0.0, 1.0, -0.5);
        // update jitter where phase crosses pi/2
        // 2=sqrt(K), K=4 jitter updates per period
        @(cross(phase - 3*`M_PI/4, +1, ttol) or
          cross(phase - `M_PI/4, +1, ttol) or
          cross(phase + `M_PI/4, +1, ttol) or
          cross(phase + 3*`M_PI/4, +1, ttol)) begin
            dT = 2*jitter*$dist_normal(seed,0,1);
            i = (phase >= -3*`M_PI/4) && (phase < `M_PI/4);
            q = (phase >= -`M_PI/4) && (phase < 3*`M_PI/4);
        end
        // generate the I and Q outputs
        V(PIout) <+ transition(i ? Vhi : Vlo, 0, tt);
        V(NIout) <+ transition(i ? Vlo : Vhi, 0, tt);
        V(PQout) <+ transition(q ? Vhi : Vlo, 0, tt);
        V(NQout) <+ transition(q ? Vlo : Vhi, 0, tt);
    end
    endmodule

2) Merging the VCO and FD_N: If the output of the VCO is not used to drive circuitry external to the synthesizer, if the divider exhibits simple synchronous jitter, and if the VCO exhibits simple accumulating jitter, then it is possible to include the frequency division aspect of the FD_N as part of the

64
VCO by simply adjusting the VCO gain and jitter. If the divide ratio of FD_N is large, the simulation runs much faster because the high VCO output frequency is never generated. The Verilog-A model for the merged VCO and FD_N is given in Listing 16. It also includes code for generating a logfile containing the length of each period. The logfile is used in Section XIII when determining $S_{\phi,VCO}$, the power spectral density of the phase of the VCO output.

Listing 14: Fixed-frequency oscillator with accumulating and synchronous jitter.

    `include "discipline.h"

    module osc (out);
    output out; electrical out;
    parameter real freq=1 from (0:inf);
    parameter real ratio=1 from (0:inf);
    parameter real Vlo=-1, Vhi=1;
    parameter real tt=0.01*ratio/freq from (0:inf);
    parameter real accJitter=0 from [0:0.1/freq);         // period jitter
    parameter real syncJitter=0 from [0:0.1*ratio/freq);  // edge-to-edge jitter
    integer n, accSeed, syncSeed;
    real next, dT, dt, accSD, syncSD;

    analog begin
        @(initial_step) begin
            accSeed = 286;
            syncSeed = -459;
            accSD = accJitter*sqrt(ratio/2);
            syncSD = syncJitter;
            next = 0.5/freq + $abstime;
        end
        @(timer(next + dt)) begin
            n = !n;
            dT = accSD*$dist_normal(accSeed,0,1);
            dt = syncSD*$dist_normal(syncSeed,0,1);
            next = next + 0.5*ratio/freq + dT;
        end
        V(out) <+ transition(n ? Vhi : Vlo, 0, tt);
    end
    endmodule

Listing 15: PFD/CP without jitter.

    `include "discipline.h"

    module pfd_cp (out, ref, vco);
    input ref, vco; output out; electrical ref, vco, out;
    parameter real Iout=100u;
    parameter integer dir=1 from [-1:1] exclude 0;
        // dir = 1 for positive edge trigger
        // dir = -1 for negative edge trigger
    parameter real tt=1n from (0:inf);
    parameter real ttol=1p from (0:inf);
    integer state;

    analog begin
        @(cross(V(ref), dir, ttol)) begin
            if (state > -1) state = state - 1;
        end
        @(cross(V(vco), dir, ttol)) begin
            if (state < 1) state = state + 1;
        end
        I(out) <+ transition(Iout * state, 0, tt);
    end
    endmodule

Recall that the synchronous jitter of FD_M and FD_N has already been included as part of OSC, so the divider model incorporated into the VCO is noiseless and the jitter at the output of the noiseless divider results only from the VCO jitter. Since the divider outputs one pulse for every N pulses at its input, the variance in the output period is the sum of the variance in N input periods. Thus, the period jitter at the output, $J_{FD}$, is $\sqrt{N}$ times larger than the period jitter at the input, $J_{VCO}$, or

    $J_{FD} = \sqrt{N}\,J_{VCO}$.    (82)

Thus, to merge the divider into the VCO, the VCO gain must be reduced by a factor of N, the period jitter increased by a factor of $\sqrt{N}$, and the divider model removed.

Listing 16: VCO with FD_N.

    `include "discipline.h"

    module vco (out, in);
    input in; output out; electrical out, in;
    parameter real Vmin=0;
    parameter real Vmax=Vmin+1 from (Vmin:inf);
    parameter real Fmin=1 from (0:inf);
    parameter real Fmax=2*Fmin from (Fmin:inf);
    parameter real ratio=1 from (0:inf);
    parameter real Vlo=-1, Vhi=1;
    parameter real tt=0.01*ratio/Fmax from (0:inf);
    parameter real jitter=0 from [0:0.25*ratio/Fmax);    // VCO period jitter
    parameter real ttol=1u*ratio/Fmax from (0:ratio/Fmax);
    parameter real outStart=inf from (1/Fmin:inf);
    real freq, phase, dT, delta, prev, Vout;
    integer n, seed, fp;

    analog begin
        @(initial_step) begin
            seed = -561;
            delta = jitter * sqrt(2*ratio);
            fp = $fopen("periods.m");
            Vout = Vlo;
        end

        // compute the freq from the input voltage
        freq = (V(in) - Vmin)*(Fmax - Fmin) / (Vmax - Vmin) + Fmin;

        // bound the frequency (this is optional)
        if (freq > Fmax) freq = Fmax;
        if (freq < Fmin) freq = Fmin;

        // apply the frequency divider, add the phase noise
        freq = (freq / ratio)/(1 + dT * freq / ratio);

        // phase is the integral of the freq modulo 1
        phase = idtmod(freq, 0.0, 1.0, -0.5);

        // update jitter twice per period
        @(cross(phase - 0.25, +1, ttol)) begin
            dT = delta * $dist_normal(seed, 0, 1);
            Vout = Vhi;
        end
        @(cross(phase + 0.25, +1, ttol)) begin
            dT = delta * $dist_normal(seed, 0, 1);
            Vout = Vlo;
            if ($abstime >= outStart) $fstrobe(fp, "%0.10e", $abstime - prev);
            prev = $abstime;
        end
        V(out) <+ transition(Vout, 0, tt);
    end
    endmodule

After simulation, it is necessary to refer the computed results, which are from the output of the divider, to the output of VCO, which is the true output of the PLL. The period jitter at the output of the VCO, $J_{VCO}$, can be computed with (82). To determine the effect of the divider on $S_\phi(\omega)$, square both sides of (82) and apply (70)

    $a_{VCO} T_{VCO} = \dfrac{a_{FD} T_{FD}}{N}$.    (83)

$T_{VCO} = T_{FD}/N$, and so

    $a_{VCO} = a_{FD}$.    (84)

From (72),

    $S_{\phi,VCO}\,\dfrac{\Delta f^2}{f_{VCO}^2} = S_{\phi,FD}\,\dfrac{\Delta f^2}{f_{FD}^2}$.    (85)

Finally, $f_{VCO} = N f_{FD}$, and so

    $S_{\phi,VCO} = N^2 S_{\phi,FD}$.    (86)

Once FD_N is incorporated into the VCO, the VCO output signal is no longer observable; however, the characteristics of the VCO output are easily derived from (82) and (86), which are summarized in Table II.

TABLE II: CHARACTERISTICS OF VCO OUTPUT RELATIVE TO THE OUTPUT OF FD_N, ASSUMING THE VCO EXHIBITS SIMPLE ACCUMULATING JITTER AND THE FD_N IS NOISE FREE.

    Frequency                Jitter                          Phase Noise
    $f_{VCO} = N f_{FD}$     $J_{VCO} = J_{FD}/\sqrt{N}$     $S_{\phi,VCO} = N^2 S_{\phi,FD}$

It is interesting to note that while the frequency at the output of FD_N is N times smaller than at the output of the VCO, except for scaling in the amplitude, the spectrum of the noise close to the fundamental is to a first degree unaffected by the presence of FD_N. In particular, the width of the noise spectrum is unaffected by FD_N. This is extremely fortuitous, because it means that the number of cycles we need to simulate is independent of the divide ratio N. Thus, large divide ratios do not affect the total simulation time.

To understand why FD_N does not affect the width of the noise spectrum, recall that while we started with a jitter that varied continuously with time, j(t) in (34), for either efficiency or modeling reasons we eventually sampled it to end up with a discrete-time version. The act of sampling the jitter causes the spectrum of the jitter to be replicated at the multiples of the sampling frequency, which adds aliasing. This aliasing is visible, but not obvious, at high frequencies in Figure 18. However, especially with accumulating jitter, the phase noise amplitude at low frequencies is much larger than the aliased noise, and so the close-in noise spectrum is largely unaffected by the sampling. The effect of FD_N is to decimate the sampled jitter by a factor of N, which is equivalent to sampling the jitter signal, j(t), at the original sample frequency divided by N. Thus, the replication is at a lower frequency, the amplitude is lower, and the aliasing is greater, but the spectrum is otherwise unaffected.

XIII. SIMULATION AND ANALYSIS

The synthesizer is simulated using the netlist from Listing 18 and the Verilog-A descriptions in Listings 14-16, modifying them as necessary to fit the actual circuit. The simulation should cover an interval long enough to allow accurate Fourier analysis at the lowest frequency of interest ($F_{min}$). With deterministic signals, it is sufficient to simulate for K cycles after the PLL settles if $F_{min} = 1/(TK)$. However, for these signals, which are stochastic, it is best to simulate for 10K to 100K cycles to allow for enough averaging to reduce the uncertainty in the result.

One should not simply apply an FFT to the output signal of the VCO/FD_N to determine $\mathcal{L}(\Delta f)$ for the PLL. The result would be quite inaccurate because the FFT samples the waveform at evenly spaced points, and so misses the jitter of the transitions. Instead, $\mathcal{L}(\Delta f)$ can be measured with Spectre's Fourier Analyzer, which uses a unique algorithm that does accurately resolve the jitter [11]. However, it is slow if many frequencies are needed and so is not well suited to this application.

Unlike $\mathcal{L}(\Delta f)$, $S_\phi(\Delta f)$ can be computed efficiently. The Verilog-A code for the VCO/FD_N given in Listing 16 writes the length of each period to an output file named periods.m. Writing the periods to the file begins after an initial delay, specified using outStart, to allow the PLL to reach steady state. This file is then processed by Matlab from MathWorks using the script shown in Listing 17. This script computes $S_\phi(\Delta f)$,
the power spectral density of $\phi$, using Welch's method [28]. The frequency range is from $f_{out}/2$ to $f_{out}/\mathrm{nfft}$. The script computes $S_\phi(\Delta f)$ with a resolution bandwidth of rbw.† Normally, $S_\phi(\Delta f)$ is given with a unity resolution bandwidth. To compensate for a non-unity resolution bandwidth, broadband signals such as the noise should be divided by rbw. Signals with bandwidth less than rbw, such as the spurs generated by leakage in the CP, should not be scaled. The script processes the output of VCO/FD_N. The results of the script must be further processed using the equations in Table II to remove the effect of FD_N.

† The Hanning window used in the psd() function has a resolution bandwidth of 1.5 bins [29]. Assuming broadband signals, Matlab divides by 1.5 inside psd() to compensate. In order to resolve narrowband signals, the factor of 1.5 is removed by the script, and instead included in the reported resolution bandwidth.

Listing 17: Matlab script used for computing $S_\phi(\Delta f)$. These results must be further processed using Table II to map them to the output of the VCO.

    % Process period data to compute S_phi(delta f)
    echo off;
    nfft=512;       % should be power of two
    winLength=nfft;
    overlap=nfft/2;
    winNBW=1.5;     % Noise bandwidth given in bins

    % Load the data from the file generated by the VCO
    load periods.m;

    % output estimates of period and jitter
    T=mean(periods);
    J=std(periods);
    maxdT = max(abs(periods-T))/T;
    fprintf('T = %.3gs, F = %.3gHz\n', T, 1/T);
    fprintf('Jabs = %.3gs, Jrel = %.2g%%\n', J, 100*J/T);
    fprintf('max dT = %.2g%%\n', 100*maxdT);
    fprintf('periods = %d, nfft = %d\n', length(periods), nfft);

    % compute the cumulative phase of each transition
    phases=2*pi*cumsum(periods)/T;

    % compute power spectral density of phase
    [Sphi,f]=psd(phases,nfft,1/T,winLength,overlap,'linear');

    % correct for scaling in PSD due to FFT and window
    Sphi=winNBW*Sphi/nfft;

    % plot the results (except at DC)
    K = length(f);
    semilogx(f(2:K),10*log10(Sphi(2:K)));
    title('Power Spectral Density of VCO Phase');
    xlabel('Frequency (Hz)');
    ylabel('S phi (dB/Hz)');
    rbw = winNBW/(T*nfft);
    RBW=sprintf('Resolution Bandwidth = %.0f Hz (%.0f dB)', rbw, 10*log10(rbw));
    imtext(0.5, 0.07, RBW);

XIV. EXAMPLE

These ideas were applied to model and simulate a PLL acting as a frequency synthesizer. A synthesizer was chosen with $f_{ref}$ = 25 MHz, $f_{out}$ = 2 GHz, and a channel spacing of 200 kHz. As such, M = 125 and N = 10,000.

The noise of OSC is -95 dBc/Hz at 100 kHz. Applying (74) to compute a, where $\mathcal{L}(\Delta f) = 316 \times 10^{-12}$, $\Delta f$ = 100 kHz, and $f_o$ = 25 MHz, gives $a = 10^{-14}$. The period jitter J is then computed from (70), giving J = 20 ps.

The noise of VCO is -48 dBc/Hz at 100 kHz. Applying (74) and (70) with $\mathcal{L}(\Delta f) = 1.59 \times 10^{-5}$, $\Delta f$ = 100 kHz, and $f_o$ = 2 GHz, gives $a = 7.9 \times 10^{-14}$ and a period jitter of J = 6.3 ps.

The period jitter of the PFD/CP and FDs was found to be 2 ns. The FDs were included into the oscillators, which suppresses the high frequency signals at the input and output of the synthesizer. The netlist is shown in Listing 18. The results (compensated for non-unity resolution bandwidth (-28 dB) and for the suppression of the dividers (80 dB)) are shown in Figures 17-20. The simulation took 7.5 minutes for 450k time-points on an HP 9000/735. The use of a large number of time points was motivated by the desire to reduce the level of uncertainty in the results. The period jitter in the PLL was found to be 9.8 ps at the output of the VCO.

Listing 18: Spectre netlist for PLL synthesizer.

    // PLL-based frequency synthesizer that models jitter
    simulator lang=spectre
    ahdl_include "osc.va"       // Listing 14
    ahdl_include "pfd_cp.va"    // Listing 15
    ahdl_include "vco.va"       // Listing 16

    Osc (in) osc freq=25MHz ratio=125 \
        accJitter=20ps syncJitter=2ns
    PFD (err in fb) pfd_cp Iout=500ua
    C1 (err c) capacitor c=3.125nF
    R (c 0) resistor r=10k
    C2 (c 0) capacitor c=625pF
    VCO (fb err) vco Fmin=1GHz Fmax=3GHz \
        Vmin=-4 Vmax=4 ratio=10000 \
        jitter=6ps outStart=10ms
    JitterSim tran stop=60ms

[Block diagram of the netlist: Osc & ÷125 drives node in; PFD & CP drives node err; VCO & ÷10,000 drives node fb, which returns to the PFD & CP.]

The low-pass filter LF blocks all high frequency signals from reaching the VCO, so the noise of the phase-locked loop at high frequencies is the same as the noise generated by the open-loop VCO alone. At low frequencies, the loop gain acts to stabilize the phase of the VCO, and the noise of the PLL is dominated by the phase noise of the OSC. There is some contribution from the VCO, but it is diminished by the gain of the loop. In this example, noise at the middle frequencies is dominated by the synchronous jitter generated by the PFD/CP and FDs. The measured results agree qualitatively with the expected results. The predicted noise is higher than one would expect solely from the open-loop behavior of each block because of peaking in the response of the PLL from 5 kHz to 50 kHz. For this reason, PLLs used in synthesizers where jitter is important are usually overdamped.

Fig. 17. Noise of the closed-loop PLL at the output of the VCO when only the reference oscillator exhibits jitter (CL) versus the noise of the reference oscillator mapped up to the VCO frequency when operated open loop (OL).

Fig. 18. Noise of the closed-loop PLL at the output of the VCO when only the VCO exhibits jitter (CL) versus the noise of the VCO when operated open loop (OL).

Fig. 19. Noise of the closed-loop PLL at the output of the VCO when only the PFD/CP, FD_M, and FD_N exhibit jitter (CL) versus the noise of these components mapped up to the VCO frequency when operated open loop (OL).

Fig. 20. Closed-loop PLL noise performance compared to the open-loop noise performance of the individual components that make up the PLL. The achieved noise is slightly larger than what is expected from the components due to peaking in the response of the PLL.

XV. CONCLUSION

A methodology for modeling and simulating the phase noise and jitter performance of phase-locked loops was presented. The simulation is done at the behavioral level, and so is efficient enough to be applied in a wide variety of applications. The behavioral models are calibrated from circuit-level noise simulations, and so the high-level simulations are accurate. Behavioral models were presented in the Verilog-A language; however, these same ideas can be used to develop behavioral models in purely event-driven languages such as Verilog-HDL and Verilog-AMS. This methodology is flexible enough to be used in a broad range of applications where phase noise and jitter are important.

REFERENCES

[1] Ken Kundert. "Introduction to RF simulation and its application." Journal of Solid-State Circuits, vol. 34, no. 9, September 1999.
[2] Cadence Design Systems. "SpectreRF simulation option." www.cadence.com/datasheets/spectrerf.html.
[3] F. Gardner. Phaselock Techniques. John Wiley & Sons, 1979.
[4] D. Yee, C. Doan, D. Sobel, B. Limketkai, S. Alalusi, and R. Brodersen. "A 2-GHz low-power single-chip CMOS receiver for WCDMA applications." Proceedings of the European Solid-State Circuits Conference, Sept. 2000.
[5] A. Demir, E. Liu, A. Sangiovanni-Vincentelli, and I. Vassiliou. "Behavioral simulation techniques for phase/delay-locked systems." Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 453-456, May 1994.
[6] A. Demir, E. Liu, and A. Sangiovanni-Vincentelli. "Time-domain non-Monte-Carlo noise simulation for nonlinear dynamic circuits with arbitrary excitations." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 5, pp. 493-505, May 1996.
[7] A. Demir, A. Sangiovanni-Vincentelli. "Simulation and modeling of phase noise in open-loop oscillators." Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 445-456, May 1996.
[8] A. Demir, A. Sangiovanni-Vincentelli. Analysis and Simulation of Noise in Nonlinear Electronic Circuits and Systems. Kluwer Academic Publishers, 1997.
[9] Ken Kundert. "Modeling and simulation of jitter in phase-locked loops." In Analog Circuit Design: RF Analog-to-Digital Converters; Sensor and Actuator Interfaces; Low-Noise Oscillators, PLLs and Synthesizers, Rudy J. van de Plassche, Johan H. Huijsing, Willy M.C. Sansen, Kluwer Academic Publishers, November 1997.
[10] Ken Kundert. "Modeling and simulation of jitter in PLL frequency synthesizers." Available from www.designers-guide.com.
[11] Kenneth S. Kundert. The Designer's Guide to SPICE and Spectre. Kluwer Academic Publishers, 1995.
[12] Verilog-A Language Reference Manual: Analog Extensions to Verilog-HDL, version 1.0. Open Verilog International, 1996. Available from www.eda.org/verilog-ams.
[13] Ulrich L. Rohde. Digital PLL Frequency Synthesizers. Prentice-Hall, Inc., 1983.
[14] Paul R. Gray and Robert G. Meyer. Analysis and Design of Analog Integrated Circuits. John Wiley & Sons, 1992.
[15] A. Demir, A. Mehrotra, and J. Roychowdhury. "Phase noise in oscillators: a unifying theory and numerical methods for characterization." IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 47, no. 5, May 2000, pp. 655-674.
[16] F. Kaertner. "Determination of the correlation spectrum of oscillators with low noise." IEEE Transactions on Microwave Theory and Techniques, vol. 37, no. 1, pp. 90-101, Jan. 1989.
[17] F. X. Kaertner. "Analysis of white and $f^{-\alpha}$ noise in oscillators." International Journal of Circuit Theory and Applications, vol. 18, pp. 485-519, 1990.
[18] G. Vendelin, A. Pavio, U. Rohde. Microwave Circuit Design. J. Wiley & Sons, 1990.
[19] W. Gardner. Introduction to Random Processes: With Applications to Signals and Systems. McGraw-Hill, 1989.
[20] Joel Phillips and Ken Kundert. "Noise in mixers, oscillators, samplers, and logic: an introduction to cyclostationary noise." Proceedings of the IEEE Custom Integrated Circuits Conference, CICC 2000. The paper and presentation are both available from www.designers-guide.com.
[21] T. A. D. Riley, M. A. Copeland, and T. A. Kwasniewski. "Delta-sigma modulation in fractional-N frequency synthesis." IEEE Journal of Solid-State Circuits, vol. 28, no. 5, May 1993, pp. 553-559.
[22] Frank Herzel and Behzad Razavi. "A study of oscillator jitter due to supply and substrate noise." IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 46, no. 1, Jan. 1999, pp. 56-62.
[23] T. C. Weigandt, B. Kim, and P. R. Gray. "Jitter in ring oscillators." 1994 IEEE International Symposium on Circuits and Systems (ISCAS-94), vol. 4, 1994, pp. 27-30.
[24] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1991.
[25] J. J. Rael and A. A. Abidi. "Physical processes of phase noise in differential LC oscillators." Proceedings of the IEEE Custom Integrated Circuits Conference, CICC 2000.
[26] J. McNeill. "Jitter in Ring Oscillators." IEEE Journal of Solid-State Circuits, vol. 32, no. 6, June 1997.
[27] H. Chang, E. Charbon, U. Choudhury, A. Demir, E. Felt, E. Liu, E. Malavasi, A. Sangiovanni-Vincentelli, and I. Vassiliou. A Top-Down Constraint-Driven Methodology for Analog Integrated Circuits. Kluwer Academic Publishers, 1997.
[28] A. Oppenheim, R. Schafer. Digital Signal Processing. Prentice-Hall, 1975.
[29] F. Harris. "On the use of windows for harmonic analysis with the discrete Fourier transform." Proceedings of the IEEE, vol. 66, no. 1, January 1978.
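The jitter values quoted in the Section XIV example can be cross-checked with a few lines of arithmetic. The sketch below is in Python rather than the Verilog-A or Matlab of the listings, purely so that it is easy to run. Equations (70) and (74) are not reproduced in this section, so the forms used here, $a = 2\mathcal{L}\,\Delta f^2/f_o^2$ and $J = \sqrt{a/f_o}$, are assumptions chosen because they reproduce the quoted numbers. The function names are hypothetical, and the second helper simply applies the Table II relations.

```python
import math


def jitter_from_phase_noise(l_dbc_hz, delta_f, f0):
    """Return (a, J) for an oscillator with simple accumulating jitter.

    Assumed forms (not quoted from the text): a = 2*L*df^2/f0^2 and
    J = sqrt(a*T) with T = 1/f0.
    """
    l_lin = 10.0 ** (l_dbc_hz / 10.0)        # dBc/Hz -> linear ratio
    a = 2.0 * l_lin * delta_f ** 2 / f0 ** 2
    period_jitter = math.sqrt(a / f0)
    return a, period_jitter


def refer_to_vco(j_fd, s_phi_fd_db, n):
    """Map divider-output jitter and phase noise to the VCO output (Table II)."""
    j_vco = j_fd / math.sqrt(n)                          # J_VCO = J_FD / sqrt(N)
    s_phi_vco_db = s_phi_fd_db + 20.0 * math.log10(n)    # S_VCO = N^2 * S_FD
    return j_vco, s_phi_vco_db


a_osc, j_osc = jitter_from_phase_noise(-95.0, 100e3, 25e6)   # reference oscillator
a_vco, j_vco = jitter_from_phase_noise(-48.0, 100e3, 2e9)    # VCO
print(f"OSC: a = {a_osc:.2e}, J = {j_osc * 1e12:.1f} ps")
print(f"VCO: a = {a_vco:.2e}, J = {j_vco * 1e12:.1f} ps")
```

With N = 10,000 the phase-noise correction in refer_to_vco is 20 log10(N) = 80 dB, which matches the divider suppression quoted in the example.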
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 31, NO. 3, MARCH 1996 331

A Study of Phase Noise in CMOS Oscillators


Behzad Razavi, Member, IEEE

Abstract-This paper presents a study of phase noise in two inductorless CMOS oscillators. First-order analysis of a linear oscillatory system leads to a noise shaping function and a new definition of Q. A linear model of CMOS ring oscillators is used to calculate their phase noise, and three phase noise phenomena, namely, additive noise, high-frequency multiplicative noise, and low-frequency multiplicative noise, are identified and formulated. Based on the same concepts, a CMOS relaxation oscillator is also analyzed. Issues and techniques related to simulation of noise in the time domain are described, and two prototypes fabricated in a 0.5-μm CMOS technology are used to investigate the accuracy of the theoretical predictions. Compared with the measured results, the calculated phase noise values of a 2-GHz ring oscillator and a 900-MHz relaxation oscillator at 5 MHz offset have an error of approximately 4 dB.

Manuscript received October 30, 1995; revised December 17, 1995.
The author was with AT&T Bell Laboratories, Holmdel, NJ 07733 USA. He is now with Hewlett-Packard Laboratories, Palo Alto, CA 94304 USA.
Publisher Item Identifier S 0018-9200(96)02456-0.
0018-9200/96$05.00 © 1996 IEEE

I. INTRODUCTION

VOLTAGE-CONTROLLED oscillators (VCO's) are an integral part of phase-locked loops, clock recovery circuits, and frequency synthesizers. Random fluctuations in the output frequency of VCO's, expressed in terms of jitter and phase noise, have a direct impact on the timing accuracy where phase alignment is required and on the signal-to-noise ratio where frequency translation is performed. In particular, RF oscillators employed in wireless transceivers must meet stringent phase noise requirements, typically mandating the use of passive LC tanks with a high quality factor (Q). However, the trend toward large-scale integration and low cost makes it desirable to implement oscillators monolithically. The paucity of literature on noise in such oscillators together with a lack of experimental verification of underlying theories has motivated this work.

This paper provides a study of phase noise in two inductorless CMOS VCO's. Following a first-order analysis of a linear oscillatory system and introducing a new definition of Q, we employ a linearized model of ring oscillators to obtain an estimate of their noise behavior. We also describe the limitations of the model, identify three mechanisms leading to phase noise, and use the same concepts to analyze a CMOS relaxation oscillator. In contrast to previous studies where time-domain jitter has been investigated [1], [2], our analysis is performed in the frequency domain to directly determine the phase noise. Experimental results obtained from a 2-GHz ring oscillator and a 900-MHz relaxation oscillator indicate that, despite many simplifying approximations, lack of accurate MOS models for RF operation, and the use of simple noise models, the analytical approach can predict the phase noise with approximately 4 to 6 dB of error.

The next section of this paper describes the effect of phase noise in wireless communications. In Section III, the concept of Q is investigated and in Section IV it is generalized through the analysis of a feedback oscillatory system. The resulting equations are then used in Section V to formulate the phase noise of ring oscillators with the aid of a linearized model. In Section VI, nonlinear effects are considered and three mechanisms of noise generation are described, and in Section VII, a CMOS relaxation oscillator is analyzed. In Section VIII, simulation issues and techniques are presented, and in Section IX the experimental results measured on the two prototypes are summarized.

II. PHASE NOISE IN WIRELESS COMMUNICATIONS

Phase noise is usually characterized in the frequency domain. For an ideal oscillator operating at $\omega_0$, the spectrum assumes the shape of an impulse, whereas for an actual oscillator, the spectrum exhibits "skirts" around the center or "carrier" frequency (Fig. 1). To quantify phase noise, we consider a unit bandwidth at an offset $\Delta\omega$ with respect to $\omega_0$, calculate the noise power in this bandwidth, and divide the result by the carrier power.

To understand the importance of phase noise in wireless communications, consider a generic transceiver as depicted in Fig. 2, where the receiver consists of a low-noise amplifier, a band-pass filter, and a downconversion mixer, and the transmitter comprises an upconversion mixer, a band-pass filter, and a power amplifier. The local oscillator (LO) providing the carrier signal for both mixers is embedded in a frequency synthesizer. If the LO output contains phase noise, both the downconverted and upconverted signals are corrupted. This is illustrated in Fig. 3(a) and (b) for the receive and transmit paths, respectively.

Referring to Fig. 3(a), we note that in the ideal case, the signal band of interest is convolved with an impulse and thus translated to a lower (and a higher) frequency with no change in its shape. In reality, however, the wanted signal may be accompanied by a large interferer in an adjacent channel, and the local oscillator exhibits finite phase noise. When the two signals are mixed with the LO output, the downconverted band consists of two overlapping spectra, with the wanted signal suffering from significant noise due to the tail of the interferer. This effect is called "reciprocal mixing."

Shown in Fig. 3(b), the effect of phase noise on the transmit path is slightly different. Suppose a noiseless receiver is to
detect a weak signal at $\omega_2$ while a powerful, nearby transmitter generates a signal at $\omega_1$ with substantial phase noise. Then, the wanted signal is corrupted by the phase noise tail of the transmitter.

Fig. 1. Phase noise in an oscillator.

Fig. 2. Generic wireless transceiver.

Fig. 3. Effect of phase noise on (a) receive and (b) transmit paths.

The important point here is that the difference between $\omega_1$ and $\omega_2$ can be as small as a few tens of kilohertz while each of these frequencies is around 900 MHz or 1.9 GHz. Therefore, the output spectrum of the LO must be extremely sharp. In the North American Digital Cellular (NADC) IS54 system, the phase noise power per unit bandwidth must be about 115 dB below the carrier power (i.e., -115 dBc/Hz) at an offset of 60 kHz.

Such stringent requirements can be met through the use of LC oscillators. Fig. 4 shows an example where a transconductance amplifier ($G_m$) with positive feedback establishes a negative resistance to cancel the loss in the tank and a varactor diode provides frequency tuning capability. This circuit has a number of drawbacks for monolithic implementation. First, both the control and the output signals are single-ended, making the circuit sensitive to supply and substrate noise. Second, the required inductor (and varactor) Q is typically greater than 20, prohibiting the use of low-Q integrated inductors. Third, monolithic varactors also suffer from large series resistance and hence a low Q. Fourth, since the LO signal inevitably appears on bond wires connecting to (or operating as) the inductor, there may be significant coupling of this signal to the front end ("LO leakage"), an undesirable effect especially in homodyne architectures [3].

Ring oscillators, on the other hand, require no external components and can be realized in fully differential form, but
their phase noise tends to be high because they lack passive resonant elements.

Fig. 4. LC oscillator.

III. DEFINITIONS OF Q

The quality factor, Q, is usually defined within the context of second-order systems with (damped) oscillatory behavior. Illustrated in Fig. 5 are three common definitions of Q. For an RLC circuit, Q is defined as the ratio of the center frequency and the two-sided -3-dB bandwidth. However, if the inductor is removed, this definition cannot be applied. A more general definition is: $2\pi$ times the ratio of the stored energy and the dissipated energy per cycle, and can be measured by applying a step input and observing the decay of oscillations at the output. Again, if the circuit has no oscillatory behavior (e.g., contains no inductors), it is difficult to define "the energy dissipated per cycle." In a third definition, an LC oscillator is considered as a feedback system and the phase of the open-loop transfer function is examined at resonance. For a simple LC circuit such as that in Fig. 4, it can be easily shown that the Q of the tank is equal to $0.5\,\omega_0\,d\Phi/d\omega$, where $\omega_0$ is the resonance frequency and $d\Phi/d\omega$ denotes the slope of the phase of the transfer function with respect to frequency. Called the "open-loop Q" herein, this definition has an interesting interpretation if we recall that for steady oscillations, the total phase shift around the loop must be precisely 360°. Now, suppose the oscillation frequency slightly deviates from $\omega_0$. Then, if the phase slope is large, a significant change in the phase shift arises, violating the condition of oscillation and forcing the frequency to return to $\omega_0$. In other words, the open-loop Q is a measure of how much the closed-loop system opposes variations in the frequency of oscillation. This concept proves useful in our subsequent analyses.

Fig. 5. Common definitions of Q: (1) $Q = \omega_0/\Delta\omega$, with $\Delta\omega$ the two-sided -3-dB bandwidth; (2) $Q = 2\pi \cdot$ (Energy Stored)/(Energy Dissipated per Cycle); (3) $Q = \dfrac{\omega_0}{2}\dfrac{d\Phi}{d\omega}$.

While the third definition of Q seems particularly well-suited to oscillators, it does fail in certain cases. As an example, consider the two-integrator oscillator of Fig. 6, where the open-loop transfer function is simply

    $H(s) = -\left(\dfrac{\omega_0}{s}\right)^2$    (1)

yielding $\Phi = \angle H(s = j\omega) = 0$, and Q = 0. Since this circuit does indeed oscillate, this definition of Q is not useful here.

Fig. 6. Two-integrator oscillator.

Fig. 7. Linear oscillatory system.

IV. LINEAR OSCILLATORY SYSTEM

Oscillator circuits in general entail "compressive" nonlinearity, fundamentally because the oscillation amplitude is not defined in a linear system. When a circuit begins to oscillate, the amplitude continues to grow until it is limited by some other mechanism. In typical configurations, the open-loop gain of the circuit drops at sufficiently large signal swings, thereby preventing further growth of the amplitude.

In this paper, we begin the analysis with a linear model. This approach is justified as follows. Suppose an oscillator employs strong automatic level control (ALC) such that its oscillation amplitude remains small, making the linear approximation
334 LEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 31, NO. 3, MARCH 1996

Fig. 8. Noise shaping in oscillators.

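valid. Since the ALC can be relatively slow, the circuit parameters can be considered time-invariant for a large number of cycles. Now, let us gradually weaken the effect of ALC so that the oscillator experiences increasingly more "self-limiting." Intuitively, we expect that the linear model yields reasonable accuracy for soft amplitude limiting and becomes gradually less accurate as the ALC is removed. Thus, the choice of this model depends on the error that it entails in predicting the response of the actual oscillator to various sources of noise, an issue that can be checked by simulation (Section VIII). While adequate for the cases considered here, this approximation must be carefully examined for other types of oscillators.

To analyze phase noise, we treat an oscillator as a feedback system and consider each noise source as an input (Fig. 7). The phase noise observed at the output is a function of: 1) sources of noise in the circuit and 2) how much the feedback system rejects (or amplifies) various noise components. The system oscillates at $\omega = \omega_0$ if the transfer function

$$\frac{Y}{X}(j\omega) = \frac{H(j\omega)}{1 + H(j\omega)} \tag{2}$$

goes to infinity at this frequency, i.e., if $H(j\omega_0) = -1$. For frequencies close to the carrier, $\omega = \omega_0 + \Delta\omega$, the open-loop transfer function can be approximated as

$$H(j\omega) \approx H(j\omega_0) + \Delta\omega\,\frac{dH}{d\omega} \tag{3}$$

and the noise transfer function is

$$\frac{Y}{X}[j(\omega_0 + \Delta\omega)] = \frac{H(j\omega_0) + \Delta\omega\,\dfrac{dH}{d\omega}}{1 + H(j\omega_0) + \Delta\omega\,\dfrac{dH}{d\omega}}. \tag{4}$$

Since $H(j\omega_0) = -1$ and for most practical cases $|\Delta\omega\, dH/d\omega| \ll 1$, (4) reduces to

$$\frac{Y}{X}[j(\omega_0 + \Delta\omega)] \approx \frac{-1}{\Delta\omega\,\dfrac{dH}{d\omega}}. \tag{5}$$

This equation indicates that a noise component at $\omega = \omega_0 + \Delta\omega$ is multiplied by $-(\Delta\omega\, dH/d\omega)^{-1}$ when it appears at the output of the oscillator. In other words, the noise power spectral density is shaped by

$$\left|\frac{Y}{X}[j(\omega_0 + \Delta\omega)]\right|^2 = \frac{1}{(\Delta\omega)^2\left|\dfrac{dH}{d\omega}\right|^2}. \tag{6}$$

This is illustrated in Fig. 8. As we will see later, (6) assumes a simple form for ring oscillators.

To gain more insight, let $H(j\omega) = A(\omega)\exp[j\Phi(\omega)]$, and hence

$$\frac{dH}{d\omega} = \left(\frac{dA}{d\omega} + jA\,\frac{d\Phi}{d\omega}\right)\exp(j\Phi). \tag{7}$$

Since for $\omega \approx \omega_0$, $A \approx 1$, (6) can be written as

$$\left|\frac{Y}{X}[j(\omega_0 + \Delta\omega)]\right|^2 = \frac{1}{(\Delta\omega)^2\left[\left(\dfrac{dA}{d\omega}\right)^2 + \left(\dfrac{d\Phi}{d\omega}\right)^2\right]}. \tag{8}$$

We define the open-loop Q as

$$Q = \frac{\omega_0}{2}\sqrt{\left(\frac{dA}{d\omega}\right)^2 + \left(\frac{d\Phi}{d\omega}\right)^2}. \tag{9}$$

Combining (8) and (9) yields

$$\left|\frac{Y}{X}[j(\omega_0 + \Delta\omega)]\right|^2 = \frac{1}{4Q^2}\left(\frac{\omega_0}{\Delta\omega}\right)^2 \tag{10}$$

a familiar form previously derived for simple LC oscillators [4]. It is interesting to note that in an LC tank at resonance, $dA/d\omega = 0$ and (9) reduces to the third definition of Q given in Section III. In the two-integrator oscillator, on the other hand, $|dA/d\omega| = 2/\omega_0$, $d\Phi/d\omega = 0$, and Q = 1. Thus, the proposed definition of Q applies to most cases of interest.

To complete the discussion, we also consider the case shown in Fig. 9, where $H_1(j\omega)H_2(j\omega) = H(j\omega)$. Therefore, $Y(j\omega)/X(j\omega)$ is given by (5). For $Y_1(j\omega)/X(j\omega)$, we have

$$\frac{Y_1}{X}[j(\omega_0 + \Delta\omega)] \approx \frac{-1}{H_2(j\omega_0)\,\Delta\omega\,\dfrac{dH}{d\omega}} \tag{11}$$

giving the following noise shaping function:

$$\left|\frac{Y_1}{X}[j(\omega_0 + \Delta\omega)]\right|^2 = \frac{1}{|H_2|^2(\Delta\omega)^2\left|\dfrac{dH}{d\omega}\right|^2}. \tag{12}$$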
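As a numerical cross-check of the open-loop Q defined in (9), the following Python sketch estimates $dA/d\omega$ and $d\Phi/d\omega$ by central differences and evaluates the definition directly. The function name `open_loop_q`, the step size, and the second-order tank response used as the LC example are illustrative assumptions, not part of the original analysis.

```python
import cmath
import math


def open_loop_q(h, w0, rel_step=1e-6):
    """Estimate Q = (w0/2) * sqrt((dA/dw)^2 + (dPhi/dw)^2) at w0, following (9).

    h  : callable returning the complex open-loop response H(jw) for a real w
    w0 : oscillation frequency in rad/s
    """
    dw = w0 * rel_step
    h_hi, h_lo = h(w0 + dw), h(w0 - dw)
    # Central difference of the magnitude A(w).
    d_mag = (abs(h_hi) - abs(h_lo)) / (2.0 * dw)
    # Central difference of the phase; the ratio avoids 2*pi wrap-around.
    d_phase = cmath.phase(h_hi / h_lo) / (2.0 * dw)
    return 0.5 * w0 * math.hypot(d_mag, d_phase)


W0 = 1.0  # normalized oscillation frequency (assumption for the example)


def two_integrator(w):
    """Open-loop response of the two-integrator oscillator, H(s) = -(w0/s)^2."""
    return -((W0 / (1j * w)) ** 2)


def lc_tank(w, tank_q=10.0):
    """Assumed second-order tank response with quality factor tank_q."""
    return -1.0 / (1.0 + 1j * tank_q * (w / W0 - W0 / w))


if __name__ == "__main__":
    print("two-integrator Q:", open_loop_q(two_integrator, W0))
    print("LC tank Q:", open_loop_q(lc_tank, W0))
```

Under these assumptions the two-integrator loop evaluates to a Q close to 1, entirely from the magnitude slope, while the tank evaluates to a Q close to its assumed quality factor, entirely from the phase slope.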
RAZAVI: A STUDY OF PHASE NOISE IN CMOS OSCILLATORS 335

Fig. 9. Oscillatory system with nonunity-gain feedback.

Fig. 11. Linearized model of CMOS VCO.

Fig. 10. CMOS VCO: (a) block diagram and (b) implementation of one stage.

V. CMOS RING OSCILLATOR

Submicron CMOS technologies have demonstrated potential for high-speed phase-locked systems [5], raising the possibility of designing fully integrated RF CMOS frequency synthesizers. Fig. 10 shows a three-stage ring oscillator wherein both the signal path and the control path are differential to achieve high common-mode rejection.

To calculate the phase noise, we model the signal path in the VCO with a linearized (single-ended) circuit (Fig. 11). As mentioned in Section IV, the linear approximation allows a first-order analysis of the topologies considered in this paper, but its accuracy must be checked if other oscillators are of interest. In Fig. 11, R and C represent the output resistance and the load capacitance of each stage, respectively ($R \approx 1/g_{m3} = 1/g_{m4}$), and $G_m R$ is the gain required for steady oscillations. The noise of each differential pair and its load devices are modeled as current sources $I_{n1}$-$I_{n3}$, injected onto nodes 1-3, respectively. Before calculating the noise transfer function, we note that the circuit of Fig. 11 oscillates if, at $\omega_0$, each stage has unity voltage gain and 120° of phase shift. Writing the open-loop transfer function and imposing these two conditions, we have $\omega_0 = \sqrt{3}/(RC)$ and $G_m R = 2$. The open-loop transfer function is thus given by

$$H(j\omega) = \frac{-8}{\left(1 + j\sqrt{3}\,\dfrac{\omega}{\omega_0}\right)^3}. \tag{13}$$

Therefore, $|dA/d\omega| = 9/(4\omega_0)$ and $|d\Phi/d\omega| = 3\sqrt{3}/(4\omega_0)$. It follows from (6) or (10) that if a noise current $I_{n1}$ is injected onto node 1 in the oscillator of Fig. 11, then its power spectrum is shaped by

$$\left|\frac{V_1}{I_{n1}}[j(\omega_0 + \Delta\omega)]\right|^2 = \frac{R^2}{27}\left(\frac{\omega_0}{\Delta\omega}\right)^2. \tag{14}$$

This equation is the key to predicting various phase noise components in the ring oscillator.

VI. ADDITIVE AND MULTIPLICATIVE NOISE

Modeling the ring oscillator of Fig. 10 with the linearized circuit of Fig. 11 entails a number of issues. First, while the stages in Fig. 10 turn off for part of the period, the linearized model exhibits no such behavior, presenting constant values for the components in Fig. 11. Second, the model does not predict mixing or modulation effects that result from nonlinearities. Third, the noise of the devices in the signal path has a "cyclostationary" behavior, i.e., periodically varying statistics, because the bias conditions are periodic functions of time. In this section, we address these issues, first identifying three types of noise: additive, high-frequency multiplicative, and low-frequency multiplicative.

A. Additive Noise

Additive noise consists of components that are directly added to the output as shown in Fig. 7 and formulated by (6) and (14).

To calculate the additive phase noise in Fig. 10 with the aid of (14), we note that for $\omega \approx \omega_0$ the voltage gain in each stage is close to unity. (Simulations of the actual CMOS oscillator indicate that for $\omega_0 = 2\pi \times 970$ MHz and noise injected at $\omega - \omega_0 = 2\pi \times 10$ MHz onto one node, the components observed at the three nodes differ in magnitude by less than 0.1 dB.) Therefore, the total output phase noise power density due to $I_{n1}$-$I_{n3}$ is

$$\overline{|V_{tot}[j(\omega_0 + \Delta\omega)]|^2} = \frac{R^2}{9}\left(\frac{\omega_0}{\Delta\omega}\right)^2\overline{I_n^2} \tag{15}$$

where it is assumed $\overline{I_{n1}^2} = \overline{I_{n2}^2} = \overline{I_{n3}^2} = \overline{I_n^2}$. For the differential stage of Fig. 10, the thermal noise current per unit bandwidth is equal to $\overline{I_n^2} = 8kT(g_{m1} + g_{m3})/3 \approx 8kT/R$. Thus,

$$\overline{|V_{tot}[j(\omega_0 + \Delta\omega)]|^2} = 8kT\,\frac{R}{9}\left(\frac{\omega_0}{\Delta\omega}\right)^2. \tag{16}$$

Fig. 12. High-frequency multiplicative noise.
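The ring-oscillator figures quoted in this section (the per-stage gain of 2, $\omega_0 = \sqrt{3}/(RC)$, and the $R^2/27$ factor in (14)) can be reproduced numerically. The Python sketch below is an illustration under stated assumptions: each stage is taken to be an inverting gain $G_m R$ with a single-pole RC load, the function names are hypothetical, the extension to a stage count other than three is an assumption, and the injected current is assumed to convert to voltage through the RC load impedance before being shaped by (10).

```python
import math


def ring_design(n_stages):
    """Return (GmR, w0*R*C) for a ring of identical inverting RC-loaded stages."""
    lag = math.pi / n_stages            # RC phase lag each stage must supply at w0
    w0_rc = math.tan(lag)               # from arctan(w0*R*C) = pi/N
    gm_r = math.sqrt(1.0 + w0_rc ** 2)  # unity stage gain at w0
    return gm_r, w0_rc


def loop_gain(w_rc, n_stages, gm_r):
    """H(jw) = -(GmR / (1 + jwRC))**N; the sign convention follows (13)."""
    return -((gm_r / (1.0 + 1j * w_rc)) ** n_stages)


def open_loop_q(n_stages, rel_step=1e-6):
    """Evaluate (9) numerically at w0, with frequency normalized to 1/(RC)."""
    gm_r, w0_rc = ring_design(n_stages)
    dw = w0_rc * rel_step
    dh = (loop_gain(w0_rc + dw, n_stages, gm_r)
          - loop_gain(w0_rc - dw, n_stages, gm_r)) / (2.0 * dw)
    # With |H| = 1 at w0, |dH/dw|^2 = (dA/dw)^2 + (dPhi/dw)^2.
    return 0.5 * w0_rc * abs(dh)


def current_noise_shaping(n_stages):
    """Coefficient k in |V1/In1|^2 = k * R^2 * (w0/dw)^2 (assumed conversion)."""
    _, w0_rc = ring_design(n_stages)
    q = open_loop_q(n_stages)
    z_over_r_sq = 1.0 / (1.0 + w0_rc ** 2)  # |Z/R|^2 of the RC load at w0
    return z_over_r_sq / (4.0 * q * q)


if __name__ == "__main__":
    print("3-stage GmR, w0*RC:", ring_design(3))
    print("3-stage open-loop Q:", open_loop_q(3))
    print("3-stage shaping coefficient:", current_noise_shaping(3))
```

For three stages the sketch gives a stage gain of 2, a normalized frequency of $\sqrt{3}$, a Q of about 1.3, and a shaping coefficient of about 1/27, in agreement with (13) and (14). Multiplying that coefficient by the three sources and by $8kT/R$ gives the $8kT\,R/9$ factor of (16).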

In this derivation, the thermal drain noise current of MOS devices is assumed equal to $\overline{I_n^2} = 4kT(2g_m/3)$. For short-channel devices, however, the noise may be higher [6]. Using a charge-based model in our simulation tool, we estimate the factor to be 0.873 rather than 2/3. In reality, hot-electron effects further raise this value.

Additive phase noise is predicted by the linearized model with high accuracy if the stages in the ring operate linearly for most of the period. In a three-stage CMOS oscillator designed for the RF range, the differential stages are in the linear region for about 90% of the period. Therefore, the linearized model emulates the CMOS oscillator with reasonable accuracy. However, as the number of stages increases or if each stage entails more nonlinearity, the error in the linear approximation may increase.

Since additive noise is shaped according to (16), its effect is significant only for components close to the carrier frequency.

B. High-Frequency Multiplicative Noise

The nonlinearity in the differential stages of Fig. 10, especially as they turn off, causes noise components to be multiplied by the carrier (and by each other). If the input/output characteristic of each stage is expressed as $V_{out} = \alpha_1 V_{in} + \alpha_2 V_{in}^2 + \alpha_3 V_{in}^3$, then for an input consisting of the carrier and a noise component, e.g., $V_{in}(t) = A_0\cos\omega_0 t + A_n\cos\omega_n t$, the output exhibits the following important terms:

$$V_{out1}(t) \propto \alpha_2 A_0 A_n \cos(\omega_0 \pm \omega_n)t$$

$$V_{out2}(t) \propto \alpha_3 A_0 A_n^2 \cos(\omega_0 - 2\omega_n)t$$

$$V_{out3}(t) \propto \alpha_3 A_0^2 A_n \cos(2\omega_0 - \omega_n)t.$$

Note that $V_{out1}(t)$ appears in band if $\omega_n$ is small, i.e., if it is a low-frequency component, but in a fully differential configuration, $V_{out1}(t) = 0$ because $\alpha_2 = 0$. Also, $V_{out2}(t)$ is negligible because $A_n \ll A_0$, leaving $V_{out3}(t)$ as the only significant cross-product.

This simplified one-stage analysis predicts the frequency of the components in response to injected noise, but not their magnitude. When noise is injected into the oscillator, the magnitude of the observed response at $\omega_n$ and $2\omega_0 - \omega_n$ depends on the noise shaping properties of the feedback oscillatory system. Simulations indicate that for the oscillator topologies considered here, these two components have approximately equal magnitudes. Thus, the nonlinearity folds all the noise components below $\omega_0$ to the region above and vice versa, effectively doubling the noise power predicted by (6). Such components are significant if they are close to $\omega_0$ and are herein called high-frequency multiplicative noise. This phenomenon is illustrated in Fig. 12. (Note that a component at $3\omega_0 + \Delta\omega$ is also translated to $\omega_0 + \Delta\omega$, but its magnitude is negligible.)

This effect can also be viewed as sampling of the noise by the differential pairs, especially if each stage experiences hard switching. As each differential pair switches twice in every period, a noise component at $\omega_n$ is translated to $2\omega_0 \pm \omega_n$. Note that for highly nonlinear stages, the Taylor expansion considered above may need to include higher order terms.

Fig. 13. Frequency modulation due to tail current noise.

C. Low-Frequency Multiplicative Noise

Since the frequency of oscillation in Fig. 10 is a function of the tail current in each differential pair, noise components in this current modulate the frequency, thereby contributing phase noise [classical frequency modulation (FM)]. Depicted in Fig. 13, this effect can be significant because, in CMOS oscillators, $\omega_0$ must be adjustable by more than ±20% to compensate for process variations, thus making the frequency quite sensitive to noise in the tail current. This mechanism is illustrated in Fig. 14.

To quantify this phenomenon, we find the sensitivity or "gain" of the VCO, defined as $K_{VCO} = d\omega_{out}/dI_{SS}$ in Fig. 13, and use a simple approximation. If the noise per unit bandwidth in $I_{SS}$ is represented as a sinusoid with the same
Fig. 14. Low-frequency multiplicative noise.

power: $I_m\cos\omega_m t$, then the output signal of the oscillator can be written as

$$V_{out}(t) = A_0\cos\left(\omega_0 t + K_{VCO}\int I_m\cos\omega_m t\,dt\right) \tag{17}$$

$$= A_0\cos\left(\omega_0 t + \frac{K_{VCO} I_m}{\omega_m}\sin\omega_m t\right). \tag{18}$$

For $K_{VCO} I_m/\omega_m \ll 1$ radian ("narrowband FM")

$$V_{out}(t) \approx A_0\cos\omega_0 t + \frac{A_0 I_m K_{VCO}}{2\omega_m}\left[\cos(\omega_0 + \omega_m)t - \cos(\omega_0 - \omega_m)t\right]. \tag{19}$$

Thus, the ratio of each sideband amplitude to the carrier amplitude is equal to $I_m K_{VCO}/(2\omega_m)$, i.e.,

$$|V_n|^2\ \text{(with respect to carrier)} = \frac{1}{4}\left(\frac{K_{VCO}}{\omega_m}\right)^2 I_m^2. \tag{20}$$

Since $K_{VCO}$ can be easily evaluated in simulation or measurement, (20) is readily calculated.

It is seen that modulation of the carrier brings the low frequency noise components of the tail current to the band around $\omega_0$. Thus, flicker noise in $I_m$ becomes particularly important.

In the differential stage of Fig. 3(b), two sources of low-frequency multiplicative noise can be identified: noise in $I_{SS}$ and noise in $M_5$ and $M_6$. For comparable device size, these two sources are of the same order and must be both taken into account.

Fig. 15. Gain stage with (a) stationary and (b) cyclostationary noise.

D. Cyclostationary Noise Sources

As mentioned previously, the devices in the signal path exhibit cyclostationary noise behavior, requiring the use of periodically varying noise statistics in analysis and simulations. To check the accuracy of the stationary noise approximation, we perform a simple, first-order simulation on the two cases depicted in Fig. 15. In Fig. 15(a), a sinusoidal current source with an amplitude of 2 nA is connected between the drain and source of $M_1$ to represent its noise with the assumption that $M_1$ carries half of $I_{SS}$. In Fig. 15(b), the current source is also a sinusoid, but its amplitude is a function of the drain current of $M_1$. Since MOS thermal noise current (in the saturation region) is proportional to $\sqrt{g_m}$, we use a nonlinear dependent source in SPICE [7] as $I_n(t) = \alpha\sqrt{g_m}\sin\omega_n t$, where $\omega_n = 2\pi \times 980$ MHz. The factor $\alpha$ is chosen such that $I_n(t) = 2\ \text{nA} \times \sin\omega_n t$ when $I_{D1}(t) = I_{SS}/2$ (balanced condition). Simulations indicate that the sideband magnitudes in the two cases differ by less than 0.5 dB.

It is important to note that this result may not be accurate for other types of oscillators.

Fig. 16. Addition of output voltages of N oscillators.

E. Power-Noise Trade-off

As with other analog circuits, oscillators exhibit a trade-off between power dissipation and noise. Intuitively, we note that if the output voltages of N identical oscillators are added in phase (Fig. 16), then the total carrier power is multiplied by $N^2$, whereas the noise power increases by N (assuming noise sources of different oscillators are uncorrelated). Thus, the phase noise (relative to the carrier) decreases by a factor N at the cost of a proportional increase in power dissipation.

Using the equations developed above, we can also formulate this trade-off. For example, from (16), since $G_m R \approx 2$, we have

$$\overline{|V_{tot}[j(\omega_0 + \Delta\omega)]|^2} = 8kT\,\frac{2}{9G_m}\left(\frac{\omega_0}{\Delta\omega}\right)^2. \tag{21}$$

To reduce the total noise power by N, $G_m$ must increase by the same factor. For any active device, this can be accomplished by increasing the width and the bias current by N. (To maintain the same frequency of oscillation, the load resistor is reduced by N.) Therefore, for a constant supply voltage, the power dissipation scales up by N.
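The intuitive argument for the power-noise trade-off (carrier power growing as $N^2$ while uncorrelated noise power grows as N) can be illustrated with a small Monte Carlo sketch in Python. This is only a toy model under stated assumptions: the noise of each oscillator is represented by independent zero-mean Gaussian samples rather than true phase noise, and the function names, sample count, and seed are hypothetical choices.

```python
import math
import random


def relative_noise_db(n_osc, n_samples=20000, amp=1.0, sigma=0.01, seed=1):
    """Noise-to-carrier power ratio (dB) when n_osc outputs are added in phase."""
    rng = random.Random(seed)
    # In-phase carriers add in amplitude, so carrier power grows as n_osc**2.
    carrier_power = (n_osc * amp) ** 2 / 2.0
    # Independent noise samples add in power; estimate the variance of their sum.
    acc = 0.0
    for _ in range(n_samples):
        total = sum(rng.gauss(0.0, sigma) for _ in range(n_osc))
        acc += total * total
    noise_power = acc / n_samples
    return 10.0 * math.log10(noise_power / carrier_power)


def improvement_db(n_osc):
    """Reduction in relative noise (dB) obtained by combining n_osc oscillators."""
    return relative_noise_db(1) - relative_noise_db(n_osc)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "oscillators:", round(improvement_db(n), 2), "dB improvement,",
              "ideal", round(10.0 * math.log10(n), 2), "dB")
```

The estimated improvement should track $10\log N$ to within the statistical error of the variance estimate, which corresponds to the factor-of-N reduction in relative phase noise described above.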

TABLE I
COMPARISON OF THREE-STAGE AND FOUR-STAGE RING OSCILLATORS

                              3-Stage VCO       4-Stage VCO
Minimum Required DC Gain      2                 …
Noise Shaping Function        …                 …
Open-Loop Q                   3√3/4 (≈ 1.3)     √2 (≈ 1.4)
Total Additive Noise          …                 …
Power Dissipation             1.8 mW            3.6 mW

Fig. 17. Substrate and supply noise in gain stage.

F. Three-Stage Versus Four-Stage Oscillators

The choice of number of stages in a ring oscillator to minimize the phase noise has often been disputed. With the above formulations, it is possible to compare rings with different number of stages (so long as the approximations remain valid). For the cases of interest in RF applications, we consider three-stage and four-stage oscillators designed to operate at the same frequency. Thus, the four-stage oscillator incorporates smaller impedance levels and dissipates more power. Table I compares various aspects of the two circuits.

We make three important observations. 1) Simulations show that if the four-stage oscillator is to operate at the same speed as the three-stage VCO, the value of R in the former must be approximately 60% of that in the latter. 2) The Q's of the two VCO's (10) are roughly equal. 3) The total additive thermal noise of the two VCO's is about the same, because the four-stage topology has more sources of noise, but with lower magnitudes.

From these rough calculations, we draw two conclusions. First, the phase noise depends on not only the Q, but the number and magnitude of sources of noise in the circuit. Second, four-stage VCO's have no significant advantage over three-stage VCO's, except for providing quadrature outputs.

G. Supply and Substrate Noise

Even though the gain stage of Fig. 10 is designed as a differential circuit, it nonetheless suffers from some sensitivity to supply and substrate noise (Fig. 17). Two phenomena account for this. First, device mismatches degrade the symmetry of the circuit. Second, the total capacitance at the common source of the differential pair (i.e., the source junction capacitance of $M_1$ and $M_2$ and the capacitance associated with the tail current source) converts the supply and substrate noise to current, thereby modulating the delay of the gain stage. Simulations indicate that even if the tail current source has a high dc output impedance, a 1-mV$_{pp}$ supply noise component at 10 MHz generates sidebands 60 dB below the carrier at $\omega_0 \pm (2\pi \times 10\ \text{MHz})$.

VII. CMOS RELAXATION OSCILLATOR

In this section, we apply the analysis methodology described thus far to a CMOS relaxation oscillator [Fig. 18(a)]. When designed to operate at 900 MHz, this circuit hardly "relaxes" and the signals at the drain and source of $M_1$ and $M_2$ are close to sinusoids. Thus, the linear model of Fig. 7 is a plausible choice. To utilize our previous results, we assume the signals at the sources of $M_1$ and $M_2$ are fully differential$^1$ and redraw the circuit as in Fig. 18(b), identifying it as a two-stage ring with capacitive degeneration ($C_A = 2C$). The total capacitance seen at the drain of $M_1$ and $M_2$ is modeled with $C_1$ and $C_2$, respectively. (This is also an approximation because the input impedance of each stage is not purely capacitive.) It can be easily shown that the open-loop transfer function is

$$H(s) = \left[\frac{g_m R C_A s}{(g_m + C_A s)(R C_D s + 1)}\right]^2 \tag{22}$$

where $C_1 = C_2 = C_D$ and $g_m$ denotes the transconductance of each transistor. For the circuit to oscillate at $\omega_0$, $H(j\omega_0) = 1$, and each stage must have a phase shift of 180°, with 90° contributed by each zero and the remaining 90° by the two poles at $-g_m/C_A$ and $-1/(RC_D)$. It follows from the second condition that

$$\omega_0 = \sqrt{\frac{g_m}{R C_A C_D}} \tag{23}$$

i.e., $\omega_0$ is the geometric mean of the poles at the drain and source of each transistor. Combining this result with the first condition, we obtain

$$g_m R = \frac{C_A}{C_A - C_D}. \tag{24}$$

After lengthy calculations, we have

…

$^1$ This assumption is justified by decomposing C into two series capacitors, each one of value 2C, and monitoring the midpoint voltage. The common-mode swing at this node is approximately 18 dB below the differential swings at the source of $M_1$ and $M_2$.
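The oscillation conditions for the relaxation oscillator can be checked numerically. The Python sketch below assumes that each stage behaves as a common-source amplifier with capacitive degeneration $C_A$, load resistance R, and load capacitance $C_D$, so that its response has one zero at the origin and poles at $-g_m/C_A$ and $-1/(RC_D)$ as described in the text; the function name `relaxation_loop` and the normalized component values are hypothetical. It sets $g_m R = C_A/(C_A - C_D)$, places $\omega_0$ at the geometric mean of the poles, and evaluates definition (9) by a central difference.

```python
import math


def relaxation_loop(c_a, c_d, r=1.0, rel_step=1e-6):
    """Return (|H(jw0)|, open-loop Q) for the assumed two-stage model."""
    gm = c_a / ((c_a - c_d) * r)            # gain condition gm*R = CA / (CA - CD)
    w0 = math.sqrt(gm / (r * c_a * c_d))    # geometric mean of the two poles

    def h(w):
        s = 1j * w
        stage = gm * r * c_a * s / ((gm + c_a * s) * (r * c_d * s + 1.0))
        return stage ** 2                   # two identical stages in the loop

    dw = w0 * rel_step
    dh = (h(w0 + dw) - h(w0 - dw)) / (2.0 * dw)
    # With |H| = 1 at w0, |dH/dw|^2 = (dA/dw)^2 + (dPhi/dw)^2, so (9) gives:
    q = 0.5 * w0 * abs(dh)
    return abs(h(w0)), q


if __name__ == "__main__":
    for c_d in (0.25, 0.5, 0.75):
        mag, q = relaxation_loop(1.0, c_d)
        print("CD/CA =", c_d, "|H| =", round(mag, 6), "Q =", round(q, 4))
```

Under these assumptions the loop gain has unity magnitude at $\omega_0$ for any $C_D < C_A$, and the numerical Q agrees with the closed form $2\sqrt{C_D(C_A - C_D)}/C_A$ that follows from applying (9) to this model, peaking at unity for $C_D = 0.5C_A$.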

For $C_D = 0.5C_A$, Q reaches its maximum value, unity. In other words, the maximum Q occurs if the (floating) timing capacitor is equal to the load capacitance. The noise shaping function is therefore equal to $(\omega_0/\Delta\omega)^2/4$.

Since the drain-source noise current of $M_1$ and $M_2$ appears between two internal nodes of the circuit [Fig. 18(c)], the transformation shown in Fig. 18(d) can be applied to allow the use of our previous derivations. It can be shown that

…

and the total additive thermal noise observed at each drain is

…

This power must be doubled to account for high-frequency multiplicative noise.

Fig. 18. (a) CMOS relaxation oscillator, (b) circuit of (a) redrawn, (c) noise current of one transistor, and (d) transformed noise current.

VIII. SIMULATION RESULTS

A. Simulation Issues

The time-varying nature of oscillators prohibits the use of the standard small-signal ac analysis available in SPICE and other similar programs. Therefore, simulations must be performed in the time domain. As a first attempt, one may generate a pseudo-random noise with known distribution, introduce it into the circuit as a SPICE piecewise linear waveform, run a transient analysis for a relatively large number of oscillation periods, write the output as a series of points equally spaced in time, and compute the fast Fourier transform (FFT) of the output. The result of one such attempt is shown in Fig. 19. It is important to note that 1) many coherent sidebands appear in the spectrum even though the injected noise is white, and 2) the magnitude of the sidebands does not directly scale with the magnitude of the injected noise!

Fig. 19. Simulated oscillator spectrum with injected white noise.

To understand the cause of this behavior, consider a much simpler case, illustrated in Fig. 20. In Fig. 20(a), a sinusoid at 1 GHz is applied across a 1-kΩ resistor, and a long transient simulation followed by interpolation and FFT is used to obtain the depicted spectrum. (The finite width results from the finite length of the data record and the "arches" are attributed to windowing effects.) Now, as shown in Fig. 20(b), we add a 30-MHz squarewave with 2 ns transition time and proceed as before. Note that the two circuits share only the ground node. In this case, however, the spectrum of the 1-GHz sinusoid exhibits coherent sidebands with 15 MHz spacing! Observed in AT&T's internal simulator (ADVICE), HSPICE, and Cadence SPICE, this effect is attributed to the additional points that the program must calculate at each edge of the squarewave, leading to errors in subsequent interpolation.

Fortunately, this phenomenon does not occur if only sinusoids are used in simulations.

B. Oscillator Simulations

In order to compute the response of oscillators to each noise source, we approximate the noise per unit bandwidth at frequency $\omega_n$ with an impulse (a sinusoid) of the same power at that frequency. As shown in Fig. 21, the "sinusoidal noise" is injected at various points in the circuit and the output spectrum is observed. This approach is justified by the fact that random Gaussian noise can be expressed as a Fourier series of sinusoids with random phase [8], [9]. Since only one sinusoid is injected in each simulation, the interaction among noise components themselves is assumed negligible, a reasonable approximation because if two noise components at, say, -60 dB are multiplied, the product is at -120 dB.

In the simulations, the oscillators were designed for a center frequency of approximately 970 MHz. Each circuit and its linearized models were simulated in the time domain in steps

Fig. 21. Simulated configuration.

of 30 ps for 8 μs, and the output was processed in MATLAB to obtain the spectrum. Since simulations of the linear model yield identical results to the equations derived above, we will not distinguish between the two hereafter.

Fig. 20. Simple simulation revealing effect of pulse waveforms, (a) single sinusoidal source and (b) sinusoidal source along with a square wave generator.

Shown in Fig. 22 are the output spectra of the linear model and actual circuit of a three-stage oscillator in 0.5-μm CMOS technology with a 2-nA, 980-MHz sinusoidal current injected into the signal path (the drain of one of the differential pairs). The vertical axis represents $10\log V_{rms}^2$. Note that the observed magnitude of the 980-MHz component differs by less than 0.2 dB in the two cases, indicating that the linearized model is indeed an accurate representation. As explained in Section VI-B, the 960-MHz component originates from third-order mixing of the carrier and the 980-MHz component and essentially doubles the phase noise.

In order to investigate the limitation of the linear model, the oscillator was made progressively more nonlinear. Shown in Fig. 23 are the output spectra of a four-stage CMOS oscillator, revealing approximately 1 dB of error in the prediction by the linear model. The error gradually increases with the number of stages in the ring and reaches nearly 6 dB for an eight-stage oscillator.

For bipolar ring oscillators (differential pairs with no emitter followers), simulations reveal an error of approximately 2 dB for three stages and 7 dB for four stages in the ring.

IX. EXPERIMENTAL RESULTS

A. Measurements

Two different oscillator configurations have been fabricated in a 0.5-μm CMOS technology to compare the predictions in this paper with measured results. Note that there are three sets of results: theoretical calculations based on linear models but including multiplicative noise, simulated predictions based on the actual CMOS oscillators, and measured values.

The first circuit is a 2.2-GHz three-stage ring oscillator. Fig. 24 shows one stage of the circuit along with the measured device parameters. The sensitivity of the output frequency to the tail current of each stage is about 0.43 MHz/μA. The measured spectrum is depicted in Fig. 25(a) and (b) with two different horizontal scales. Due to lack of data on the flicker noise of the process, we consider only thermal noise at relatively large frequency offsets, namely, 1 MHz and 5 MHz.

It is important to note that low-frequency flicker noise causes the center of the spectrum to fluctuate constantly. Thus, as the resolution bandwidth (RBW) of the spectrum analyzer is reduced [from 1 MHz in Fig. 25(a) to 100 kHz in Fig. 25(b)], the carrier power is subject to more averaging and appears to decrease. To maintain consistency with calculations, in which

Fig. 22. Simulated output spectra of (a) linear model and (b) actual circuit of a three-stage CMOS oscillator. (Horizontal axes: frequency in units of 100 MHz.)

Fig. 23. Simulated output spectra of (a) linear model and (b) actual circuit of a four-stage CMOS oscillator. (Horizontal axes: frequency in units of 100 MHz.)

the phase noise is normalized to a constant carrier power, this power (i.e., the output amplitude) is measured using an oscilloscope.

Fig. 24. Gain stage used in 2-GHz CMOS oscillator. Annotated parameters, in the order shown: $g_m = 1/(214\ \Omega)$; $M_5$-$M_6$: W/L = 13.4 μm/0.5 μm; $g_m = 1/(630\ \Omega)$; $M_9$: W/L = 13.4 μm/0.5 μm; $I_{SS}$ = 790 μA; $g_m = 1/(530\ \Omega)$.

The noise calculation proceeds as follows. First, find the additive noise power in (16), and double the result to account for third-order mixing (high-frequency multiplicative noise). Next, calculate the low-frequency multiplicative noise from (20) for one stage and multiply the result by three. We assume (from simulations) that the internal differential voltage swing is equal to 1 V$_{pp}$ (0.353 V$_{rms}$) and the drain noise current of MOSFET's is given by $\overline{I_n^2} = 4kT(0.863g_m)$. For $\Delta\omega = 2\pi \times 1$ MHz, calculations yield

high-frequency multiplicative noise = -100.1 dBc/Hz (29)
low-frequency multiplicative noise = -106.3 dBc/Hz (30)
total normalized phase noise = -99.2 dBc/Hz. (31)

Simulations of the actual CMOS oscillator predict the total noise to be -98.1 dBc/Hz. From Fig. 25(b), with the carrier power of Fig. 25(a), the phase noise is approximately equal to -94 dBc/Hz.
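The totals quoted in the calculation follow from adding the individual contributions as powers rather than as decibel values. A minimal Python sketch of that bookkeeping is given below; the helper name `sum_dbc` is hypothetical, and the contributions are assumed to be uncorrelated so that their powers add.

```python
import math


def sum_dbc(*levels_dbc):
    """Combine uncorrelated noise contributions given in dBc/Hz."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_dbc)
    return 10.0 * math.log10(total_power)


if __name__ == "__main__":
    # Contributions at 1 MHz offset, taken from (29) and (30).
    print(round(sum_dbc(-100.1, -106.3), 1))
    # Doubling a noise power, as done for third-order mixing, adds about 3 dB.
    print(round(sum_dbc(-103.1, -103.1), 1))
```

Combining -100.1 dBc/Hz and -106.3 dBc/Hz in this way reproduces the -99.2 dBc/Hz total of (31), and the same arithmetic applies to the 5-MHz offset figures.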

Fig. 25. Measured output spectrum of ring oscillator (10 dB/div. vertical scale). (a) 5 MHz/div. horizontal scale and 1 MHz resolution bandwidth, (b) 1 MHz horizontal scale and 100 kHz resolution bandwidth.

Similarly, for $\Delta\omega = 2\pi \times 5$ MHz, calculations yield

high-frequency multiplicative noise = -114.0 dBc/Hz (32)
low-frequency multiplicative noise = -120.2 dBc/Hz (33)
total normalized phase noise = -113.1 dBc/Hz (34)

and simulations predict -112.4 dBc/Hz, while Fig. 25(a) indicates a phase noise of -109 dBc/Hz. Note that these values correspond to a center frequency of 2.2 GHz and should be lowered by approximately 8 dB for 900 MHz operation, as shown in (9).

The second circuit is a 920-MHz relaxation oscillator, depicted in Fig. 26. The measured spectra are shown in Fig. 27. Since simulations indicate that the low-frequency multiplicative noise is negligible in this implementation, we consider only the thermal noise in the signal path. For $\Delta\omega = 2\pi \times 1$ MHz, calculations yield a relative phase noise of -105 dBc/Hz, simulations predict -98 dB, and the spectrum in Fig. 27 gives -102 dBc/Hz. For $\Delta\omega = 2\pi \times 5$ MHz, the calculated and simulated results are -119 dBc/Hz and -120 dBc/Hz, respectively, while the measured value is -115 dBc/Hz.

Fig. 26. Relaxation oscillator parameters. $M_1$-$M_2$: W/L = 100 μm/0.5 μm; $g_m = 1/(84\ \Omega)$; R = 275 Ω; C = 0.6 pF; $I_{SS}$ = 3 mA.

Fig. 27. Measured output spectrum of relaxation oscillator (10 dB/div. vertical scale). (a) 2 MHz/div. horizontal scale and 100 kHz resolution bandwidth and (b) 1 MHz horizontal scale and 10 kHz resolution bandwidth.

B. Discussion

Using the above measured data points and assuming a noise shaping function as in (10) with a linear noise-power trade-off (Fig. 16), we can make a number of observations.

How much can the phase noise be lowered by scaling device dimensions? If the gate oxide of MOSFET's is reduced indefinitely, their transconductance becomes relatively independent of their dimensions, approaching roughly that of bipolar transistors. Thus, in the gain stage of Fig. 24 the transconductance of $M_1$ and $M_2$ (for $I_{D1} = I_{D2} = 395$ μA) would go from $(214\ \Omega)^{-1}$ to $(66\ \Omega)^{-1}$. Scaling down the load resistance proportionally and assuming a constant oscillation frequency, we can therefore lower the phase noise by $10\log(214/66) \approx 5$ dB. For the relaxation oscillator, on the other hand, the improvement is about 10 dB. These are, of course, greatly simplified calculations, but they provide an estimate of the maximum improvement expected from technology scaling. In reality, short-channel effects, finite thickness of the inversion layer, and velocity saturation further limit the transconductance that can be achieved for a given bias current.

It is also instructive to compare the measured phase noise of the above ring oscillator with that of a 900-MHz three-stage CMOS ring oscillator reported in [10]. The latter employs single-ended CMOS inverters with rail-to-rail swings in a 1.2-μm technology and achieves a phase noise of -83 dBc/Hz at 100 kHz offset while dissipating 7.4 mW from a 5-V supply. Assuming that

$$\text{Relative Phase Noise} \propto \left(\frac{\omega_0}{\Delta\omega}\right)^2 \frac{1}{V_{swing}^2}\,\frac{1}{I_{DD}} \tag{35}$$

where $V_{swing}$ denotes the internal voltage swing and $I_{DD}$ is the total supply current, we can utilize the measured phase noise of one oscillator to roughly estimate that of the other. With the parameters of the 2.2-GHz oscillator and accounting for different voltage swings and supply currents, we obtain a phase noise of approximately -93 dBc/Hz at 100 kHz offset for the 900-MHz oscillator in [10]. The 10 dB discrepancy is attributed to the difference in the minimum channel length, 1/f noise at 100 kHz, and the fact that the two circuits incorporate different gain stages.

ACKNOWLEDGMENT

The author wishes to thank V. Gopinathan for many illuminating discussions and T. Aytur for providing the oscillator simulation and measurement results.

REFERENCES

[1] A. A. Abidi and R. G. Meyer, "Noise in relaxation oscillators," IEEE J. Solid-State Circuits, vol. SC-18, pp. 794-802, Dec. 1983.
[2] T. C. Weigandt, B. Kim, and P. R. Gray, "Analysis of timing jitter in CMOS ring oscillators," in Proc. ISCAS, June 1994.
[3] A. A. Abidi, "Direct conversion radio transceivers for digital communications," in ISSCC Dig. Tech. Papers, Feb. 1995, pp. 186-187.
[4] D. B. Leeson, "A simple model of feedback oscillator noise spectrum," Proc. IEEE, pp. 329-330, Feb. 1966.
[5] B. Razavi, K. F. Lee, and R.-H. Yan, "Design of high-speed low-power frequency dividers and phase-locked loops in deep submicron CMOS," IEEE J. Solid-State Circuits, vol. 30, pp. 101-109, Feb. 1995.
[6] Y. P. Tsividis, Operation and Modeling of the MOS Transistor. New York: McGraw-Hill, 1987.
[7] J. A. Connelly and P. Choi, Macromodeling with SPICE. Englewood Cliffs, NJ: Prentice-Hall, 1992.
[8] S. O. Rice, "Mathematical analysis of random noise," Bell System Tech. J., pp. 282-332, July 1944, and pp. 46-156, Jan. 1945.
[9] P. Bolcato et al., "A new and efficient transient noise analysis technique for simulation of CCD image sensors or particle detectors," in Proc. CICC, 1993, pp. 14.8.1-14.8.4.
[10] T. Kwasniewski et al., "Inductorless oscillator design for personal communications devices: A 1.2 μm CMOS process case study," in Proc. CICC, May 1995, pp. 327-330.

Behzad Razavi (S'87-M'91) received the B.Sc. degree in electrical engineering from Tehran (Sharif) University of Technology, Tehran, Iran, in 1985, and the M.Sc. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1988 and 1991, respectively.

From 1992 to 1996, he was a Member of Technical Staff at AT&T Bell Laboratories, Holmdel, NJ, where his research involved integrated circuit design for communication systems. He is now with Hewlett-Packard Laboratories, Palo Alto, CA. His current interests include wireless transceivers, data conversion, clock recovery, frequency synthesis, and low-voltage low-power circuits. He has been a Visiting Lecturer at Princeton University, Princeton, NJ, and Stanford University. He is also a member of the Technical Program Committee of the International Solid-State Circuits Conference. He has served as Guest Editor to the IEEE JOURNAL OF SOLID-STATE CIRCUITS and International Journal of High Speed Electronics and is currently an Associate Editor of JSSC. He is the author of the book Principles of Data Conversion System Design (IEEE Press, 1995), and editor of Monolithic Phase-Locked Loops and Clock Recovery Circuits (IEEE Press, 1996).

Dr. Razavi received the Beatrice Winner Award for Editorial Excellence at the 1994 ISSCC, the best paper award at the 1994 European Solid-State Circuits Conference, and the best panel award at the 1995 ISSCC.
928 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 6, JUNE 1998

Correspondence

Corrections to "A General Theory of Phase Noise in Electrical Oscillators"

Ali Hajimiri and Thomas H. Lee

The authors of the above paper¹ have found an error in (19) on p. 185. The factor of 8 in the denominator should be 4; therefore (19) should read

$$\mathcal{L}\{\Delta\omega\} = 10 \cdot \log\left( \frac{\dfrac{\overline{i_n^2}}{\Delta f} \displaystyle\sum_{n=0}^{\infty} c_n^2}{4 q_{max}^2 \Delta\omega^2} \right).$$

Noise power around the frequency $n\omega_0 + \Delta\omega$ causes two equal sidebands at $\omega_0 \pm \Delta\omega$. However, the noise power at $n\omega_0 - \Delta\omega$ has a similar effect as mentioned in the paper. Therefore, twice the power of noise at $n\omega_0 + \Delta\omega$ should be taken into account. This will also change the 4 in the denominator of (21) to 2 to read

$$\mathcal{L}\{\Delta\omega\} = 10 \cdot \log\left( \frac{\Gamma_{rms}^2}{q_{max}^2} \cdot \frac{\overline{i_n^2}/\Delta f}{2 \Delta\omega^2} \right).$$

Similarly, (24) must change, and its correct form is

$$\Delta\omega_{1/f^3} = \omega_{1/f} \cdot \frac{c_0^2}{4\Gamma_{rms}^2} \approx \omega_{1/f} \cdot \frac{1}{2}\left(\frac{c_0}{c_1}\right)^2.$$

This will result in the factor of 1/2 becoming redundant in (29), i.e.,

$$\mathcal{L}\{\Delta\omega\} = 10 \cdot \log\left( \frac{kT}{V_{max}^2} \cdot \frac{1}{R_p (C\omega_0)^2} \cdot \left(\frac{\omega_0}{\Delta\omega}\right)^2 \right).$$

However, note that the discussion following (29) is still valid.
The factor $c_0^2/2\Gamma_{rms}^2$ should be changed to $(c_0/2\Gamma_{rms})^2$ in the following instances:
1) p. 185, second column, last paragraph;
2) p. 190, second column, first paragraph;
3) p. 190, second column, second paragraph.
Nevertheless, the expression used to calculate the $\Gamma_{rms}$ to predict phase noise of ring oscillators is based on a simulation that takes this effect into account automatically, and therefore the predictions are still valid. The authors regret any confusion this error may have caused.

Manuscript received February 27, 1998.
The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305-4070 USA.
Publisher Item Identifier S 0018-9200(98)03730-5.
¹ A. Hajimiri and T. H. Lee, IEEE J. Solid-State Circuits, vol. 33, pp. 179–194, Feb. 1998.

Comments on "A 64-Point Fourier Transform Chip for Video Motion Compensation Using Phase Correlation"

Kevin J. McGee

Abstract— The fast Fourier transform (FFT) processor of the above paper¹ contains many interesting and novel features. However, bit reversed input/output FFT algorithms, matrix transposers, and bit reversers have been noted in the literature. In addition, lower radix algorithms can be modified to be made computationally equivalent to higher radix algorithms. Many FFT ideas, including those of the above paper,¹ can also be applied to other important algorithms and architectures.

I. INTRODUCTION

In the above paper,¹ the authors present a fast Fourier transform (FFT) processor that contains many interesting and novel features. The mathematics in the above paper¹ describe a matrix computation where both time inputs and frequency outputs are in bit-reversed order. Bit-reversed input/output FFT algorithms, while not widely known, are not new, having been previously described in [3]. Fig. 1, for example, is a 16-point, radix-4, undecimated, bit reversed input/output, constant output geometry graph based on [3].

The algorithm¹ is also described as a decimation-in-time-and-frequency (DITF) type, but the architecture appears to be based on decimation-in-time (DIT). In the above paper,¹ Figs. 4 and 10 show a first calculation stage with unity twiddles before the butterfly and a second and third calculation stage with prebutterfly twiddles. Although the butterfly implementation of Fig. 5¹ may be unique, the use of prebutterfly twiddles in all three stages, along with unity twiddles in the first, would seem to indicate DIT. The architecture¹ is also a pipeline and contains many elements common to this type of processor, such as matrix transposers and bit reversers, as will be described below.

II. MATRIX TRANSPOSERS AND BIT REVERSERS

Block serial/parallel or parallel/serial converters, sometimes called matrix transpose or corner turn buffers, are used in many systems. They perform a matrix transpose on data blocks by exchanging rows and columns. Fig. 2 (from [7]) shows, from upper left to lower right, the flow of data through a 4 × 4 shift-based transposer. The rotator lines show where data will be routed on the next clock cycle and the output is the transpose of the input. The switching action was noted in [7] and [8] and rotator designs can be found in [4], [7], and [8]. Although Fig. 6(b)¹ is also an 8 × 8 transposer, it is being used in a somewhat unusual way. By providing a complex (real and

Manuscript received January 31, 1997; revised March 5, 1998.
The author was with the Naval Undersea Warfare Center, Newport, RI 02841 USA. He is now at 33 Everett Street, Newport, RI 02840 USA.
Publisher Item Identifier S 0018-9200(98)03731-7.
¹ C. C. W. Hui, T. J. Ding, J. V. McCanny, and R. F. Woods, IEEE J. Solid-State Circuits, vol. 31, pp. 1751–1761, Nov. 1996.

0018–9200/98$10.00  1998 IEEE


790 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

Jitter and Phase Noise in Ring Oscillators


Ali Hajimiri, Sotirios Limotyrakis, and Thomas H. Lee, Member, IEEE

Abstract—A companion analysis of clock jitter and phase noise of single-ended and differential ring oscillators is presented. The impulse sensitivity functions are used to derive expressions for the jitter and phase noise of ring oscillators. The effect of the number of stages, power dissipation, frequency of oscillation, and short-channel effects on the jitter and phase noise of ring oscillators is analyzed. Jitter and phase noise due to substrate and supply noise is discussed, and the effect of symmetry on the upconversion of 1/f noise is demonstrated. Several new design insights are given for low jitter/phase-noise design. Good agreement between theory and measurements is observed.

Index Terms—Design methodology, jitter, noise measurement, oscillator noise, oscillator stability, phase jitter, phase-locked loops, phase noise, ring oscillators, voltage-controlled oscillators.

Manuscript received April 8, 1998; revised November 2, 1998.
A. Hajimiri is with the California Institute of Technology, Pasadena, CA 91125 USA.
S. Limotyrakis and T. H. Lee are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305 USA.
Publisher Item Identifier S 0018-9200(99)04200-6.
0018–9200/99$10.00 © 1999 IEEE

I. INTRODUCTION

DUE to their integrated nature, ring oscillators have become an essential building block in many digital and communication systems. They are used as voltage-controlled oscillators (VCO's) in applications such as clock recovery circuits for serial data communications [1]–[4], disk-drive read channels [5], [6], on-chip clock distribution [7]–[10], and integrated frequency synthesizers [10], [11]. Although they have not found many applications in radio frequency (RF), they can be used for some low-tier RF systems.

Recently, there has been some work on modeling jitter and phase noise in ring oscillators. References [12] and [13] develop models for the clock jitter based on time-domain treatments for MOS and bipolar differential ring oscillators, respectively. Reference [14] proposes a frequency-domain approach to find the phase noise based on a linear time-invariant model for differential ring oscillators with a small number of stages.

In this paper, we develop a parallel treatment of frequency-domain phase noise [15] and time-domain clock jitter for ring oscillators. We apply the phase-noise model presented in [16] to obtain general expressions for jitter and phase noise of the ring oscillators.

The next section briefly reviews the phase-noise model presented in [16]. In Section III, we apply the model to timing jitter and develop an expression for the timing jitter of oscillators, while Section IV provides the derivation of a closed-form expression to calculate the rms value of the impulse sensitivity function (ISF). Section V introduces expressions for jitter and phase noise in single-ended and differential ring oscillators in long- and short-channel regimes of operation. Section VI describes the effect of substrate and supply noise as well as the noise due to the tail-current source in differential structures. Section VII explains the design insights obtained from this treatment for low jitter/phase-noise design. Section VIII summarizes the measurement results.

II. PHASE NOISE

The output of a practical oscillator can be written as

$$V_{out}(t) = A(t)\, f[\omega_0 t + \phi(t)] \tag{1}$$

where the function $f$ is periodic in $2\pi$ and $A(t)$ and $\phi(t)$ model fluctuations in amplitude and phase due to internal and external noise sources. The amplitude fluctuations are significantly attenuated by the amplitude limiting mechanism, which is present in any practical stable oscillator and is particularly strong in ring oscillators. Therefore, we will focus on phase variations, which are not quenched by such a restoring mechanism.

As an example, consider the single-ended ring oscillator with a single current source on one of the nodes shown in Fig. 1. Suppose that the current source consists of an impulse of current with area $\Delta q$ (in coulombs) occurring at time $t = \tau$. This will cause an instantaneous change in the voltage of that node, given by

$$\Delta V = \frac{\Delta q}{C_{node}} \tag{2}$$

where $C_{node}$ is the effective capacitance on that node at the time of charge injection. This produces a shift in the transition time. For small $\Delta V$, the change in the phase is proportional to the injected charge

$$\Delta\phi = \Gamma(\omega_0\tau)\frac{\Delta V}{V_{swing}} = \Gamma(\omega_0\tau)\frac{\Delta q}{q_{max}} \tag{3}$$

where $V_{swing}$ is the voltage swing across the capacitor and $q_{max} = C_{node}V_{swing}$. The dimensionless function $\Gamma(x)$ is the time-varying proportionality constant and is periodic in $2\pi$. It is large when a given perturbation causes a large phase shift and small where it has a small effect [16]. Since $\Gamma(x)$ thus represents the sensitivity of every point of the waveform to a perturbation, $\Gamma(x)$ is called the impulse sensitivity function.

The time dependence of the ISF can be demonstrated by considering two extreme cases. The first is when the impulse is injected during a transition; this will result in a large phase shift. As the other case, consider injecting an impulse while the output is saturated to either the supply or the ground. This impulse will have a minimal effect on the phase of the oscillator, as shown in Fig. 2.
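As a small worked illustration of the charge-injection argument, the sketch below assumes the usual forms $\Delta V = \Delta q / C_{node}$ and $\Delta\phi = \Gamma \cdot \Delta q / q_{max}$ with $q_{max} = C_{node} V_{swing}$. The node capacitance, swing, injected charge, and the two ISF values are hypothetical numbers picked only to contrast an injection during a transition with one while the output is saturated.

```python
def phase_shift(delta_q, c_node, v_swing, isf_value):
    """Return (delta_phi, delta_v) for a small charge injection.

    Assumes delta_v = delta_q / c_node and
    delta_phi = isf_value * delta_q / q_max, with q_max = c_node * v_swing.
    """
    delta_v = delta_q / c_node
    q_max = c_node * v_swing
    return isf_value * delta_q / q_max, delta_v


# Hypothetical values, for illustration only.
c_node = 50e-15    # effective node capacitance, F
v_swing = 2.0      # voltage swing across the capacitor, V
delta_q = 1e-16    # injected charge, C (much smaller than q_max)

# Hypothetical ISF values: large during a transition, small when saturated.
during_transition, delta_v = phase_shift(delta_q, c_node, v_swing, isf_value=0.8)
while_saturated, _ = phase_shift(delta_q, c_node, v_swing, isf_value=0.01)

print("voltage step (V):", delta_v)
print("phase shift during transition (rad):", during_transition)
print("phase shift while saturated (rad):", while_saturated)
```

The same injected charge produces a phase shift that differs only through the ISF value at the injection instant, which is the time variance discussed in the text.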
HAJIMIRI et al.: JITTER AND PHASE NOISE IN RING OSCILLATORS 791

Fig. 1. Five-stage inverter-chain ring oscillator.

Fig. 2. Effect of impulses injected during transition and peak.

Being interested in its phase $\phi(t)$, we can treat an oscillator as a system that converts voltages and currents to phase. As is evident from the discussion leading to (3), this system is linear for small perturbations. It is also time variant, no matter how small the perturbations are.

Unlike amplitude changes, phase shifts persist indefinitely, since subsequent transitions are shifted by the same amount. Thus, the phase impulse response of an oscillator is a time-varying step. Also note that as long as the introduced change in the voltage due to the current impulse is small, the resultant phase shift is linearly proportional to the injected charge, and hence the transfer function from current to phase is linear.

The unit impulse response of the system is defined as the amount of phase shift per unit current impulse [16]. Based on the foregoing argument, we obtain the following time-dependent impulse response:

$$h_\phi(t,\tau) = \frac{\Gamma(\omega_0\tau)}{q_{max}}\,u(t-\tau) \tag{4}$$

where $u(t)$ is a unit step.

Knowing the response to an impulse, we can calculate $\phi(t)$ in response to any injected current using the superposition integral

$$\phi(t) = \int_{-\infty}^{\infty} h_\phi(t,\tau)\,i(\tau)\,d\tau = \frac{1}{q_{max}}\int_{-\infty}^{t}\Gamma(\omega_0\tau)\,i(\tau)\,d\tau \tag{5}$$

where $i(t)$ represents the noise current injected into the node of interest. Note that the integration arises from the closed-loop nature of the oscillator. The single-sideband phase-noise spectrum due to a white-noise current source is given by [16]¹

$$\mathcal{L}\{f_{off}\} = \frac{\Gamma_{rms}^2}{8\pi^2 f_{off}^2}\cdot\frac{\overline{i_n^2}/\Delta f}{q_{max}^2} \tag{6}$$

where $\Gamma_{rms}$ is the rms value of the ISF, $\overline{i_n^2}/\Delta f$ is the single-sideband power spectral density of the noise current source, and $f_{off}$ is the frequency offset from the carrier. In the case of multiple noise sources injecting into the same node, $\overline{i_n^2}/\Delta f$ represents the total current noise due to all the sources and is given by the sum of individual noise power spectral densities [17]. If the noise sources on different nodes are uncorrelated, the waveform (and hence the ISF) of all the nodes are the same except for a phase shift, assuming identical stages. Therefore, the total phase noise due to all $N$ noise sources is $N$ times the value given by (6) (or $2N$ times for a differential ring oscillator).

Fig. 3. Clock jitter increasing with time.
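The superposition integral can be sketched numerically. The example below assumes the form $\phi(t) = (1/q_{max})\int_{-\infty}^{t}\Gamma(\omega_0\tau)\,i(\tau)\,d\tau$ and uses a sine wave as a stand-in ISF; that choice, the frequency, the charge values, and the helper names are all hypothetical. It shows that an impulse of current leaves a permanent step in phase whose height depends on where in the period the impulse lands.

```python
import math


def phase_response(isf, current, times, omega0, q_max):
    """Rectangle-rule evaluation of
    phi(t) = (1/q_max) * integral_{-inf}^{t} isf(omega0*tau) * i(tau) dtau
    on a uniform time grid, assuming zero current before times[0]."""
    dt = times[1] - times[0]
    phi, total = [], 0.0
    for tau, i_tau in zip(times, current):
        total += isf(omega0 * tau) * i_tau * dt / q_max
        phi.append(total)
    return phi


# Hypothetical values, for illustration only.
f0 = 1e9                      # oscillation frequency, Hz
omega0 = 2 * math.pi * f0
q_max = 1e-13                 # maximum charge displacement, C
dq = 1e-16                    # area of the injected current impulse, C
steps_per_period = 1000
dt = 1.0 / (f0 * steps_per_period)
times = [k * dt for k in range(5 * steps_per_period + 1)]


def impulse(index):
    """Current impulse of area dq concentrated on one grid point."""
    i = [0.0] * len(times)
    i[index] = dq / dt
    return i


toy_isf = math.sin  # stand-in ISF, periodic in 2*pi; not a simulated ISF

# Impulse at a quarter period (toy ISF at its peak) and at half a period
# (toy ISF at a zero crossing).
phi_peak = phase_response(toy_isf, impulse(250), times, omega0, q_max)
phi_zero = phase_response(toy_isf, impulse(500), times, omega0, q_max)

print("final phase, injection at ISF peak:", phi_peak[-1])
print("final phase, injection at ISF zero:", phi_zero[-1])
```

The first result equals dq / q_max to within rounding and stays at that value for all later times, while the second is essentially zero.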
¹ A more accurate treatment [17] shows that the phase noise does not grow without bound as $f_{off}$ approaches zero (it becomes flat for small values of $f_{off}$). However, this makes no practical difference in this discussion.

From (5), it follows that the upconversion of low-frequency noise, such as $1/f$ noise, is governed by the dc value of the ISF. The corner frequency between $1/f^3$ and $1/f^2$ regions in the spectrum of the phase noise is called $f_{1/f^3}$ and is related to the $1/f$ noise corner $f_{1/f}$ through the following equation [16]:

$$f_{1/f^3} = f_{1/f}\cdot\frac{\Gamma_{dc}^2}{\Gamma_{rms}^2} \tag{7}$$

where $\Gamma_{dc}$ is the dc value of the ISF. Since the height of the positive and negative lobes of the ISF is determined by the slope of the rising and falling edges of the output waveform, respectively, symmetry of the rising and falling edges can reduce $\Gamma_{dc}$ and hence the upconversion of $1/f$ noise.

III. JITTER

In an ideal oscillator, the spacing between transitions is constant. In practice, however, the transition spacing will be variable. This uncertainty is known as clock jitter and increases with measurement interval $\Delta T$ (i.e., the time delay between the reference and the observed transitions), as illustrated in Fig. 3. This variability accumulation (i.e., "jitter accumulation") occurs because any uncertainty in an earlier transition affects all the following transitions, and its effect persists indefinitely. Therefore, the timing uncertainty when $\Delta T$ seconds have elapsed is the sum of the uncertainties associated with each transition.

The statistics of the timing jitter depend on the correlations among the noise sources involved. The case of each transition's being affected by independent noise sources has been considered in [12] and [13]. The jitter introduced by each stage is assumed to be totally independent of the jitter introduced by other stages, and therefore the total variance of the jitter is given by the sum of the variances introduced at each stage. For ring oscillators with identical stages, the variance will be given by $m\sigma_s^2$, where $m$ is the number of transitions during $\Delta T$ and $\sigma_s^2$ is the variance of the uncertainty introduced by one stage during one transition. Noting that $m$ is proportional to $\Delta T$, the standard deviation of the jitter after $\Delta T$ seconds is [13]

$$\sigma_{\Delta T} = \kappa\sqrt{\Delta T} \tag{8}$$

where $\kappa$ is a proportionality constant determined by circuit parameters.

Another instructive special case that is not usually considered is when the noise sources are totally correlated with one another. Substrate and supply noise are examples of such noise sources. Low-frequency noise sources, such as $1/f$ noise, can also result in a correlation between induced jitter on transitions over multiple cycles. In this case, the standard deviations rather than the variances add. Therefore, the standard deviation of the jitter after $\Delta T$ seconds is proportional to $\Delta T$

$$\sigma_{\Delta T} = \zeta\,\Delta T \tag{9}$$

where $\zeta$ is another proportionality constant. Noise sources such as thermal noise of devices are usually modeled as uncorrelated, while substrate and supply-noise sources, as well as low-frequency noise, are approximated as partially or fully correlated sources. In practice, both correlated and uncorrelated sources exist in a circuit, and hence a log–log plot of the timing jitter $\sigma_{\Delta T}$ versus the measurement delay $\Delta T$ for an open-loop oscillator will demonstrate regions with slopes of 1/2 and 1, as shown in Fig. 4.

Fig. 4. RMS jitter versus measurement time on a log–log plot.
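The two accumulation laws can be checked with a small Monte Carlo sketch. It is only an illustration under simplifying assumptions: each transition contributes a Gaussian timing error with a hypothetical rms value, the uncorrelated case draws a fresh error per transition, and the fully correlated case repeats a single error on every transition. The function names and all numbers are made up for the example.

```python
import math
import random


def accumulated_jitter_std(num_transitions, sigma_step, correlated, trials, rng):
    """Sample standard deviation of the timing error after num_transitions."""
    totals = []
    for _ in range(trials):
        if correlated:
            # One error value is repeated on every transition of this run,
            # so standard deviations add.
            total = num_transitions * rng.gauss(0.0, sigma_step)
        else:
            # Every transition draws its own independent error,
            # so variances add.
            total = sum(rng.gauss(0.0, sigma_step) for _ in range(num_transitions))
        totals.append(total)
    mean = sum(totals) / trials
    return math.sqrt(sum((x - mean) ** 2 for x in totals) / (trials - 1))


def loglog_slope(m1, s1, m2, s2):
    """Slope of the line through two points on a log-log plot."""
    return math.log(s2 / s1) / math.log(m2 / m1)


rng = random.Random(0)     # fixed seed so the run is repeatable
sigma_step = 1e-12         # hypothetical rms error per transition, s
m_short, m_long, trials = 10, 400, 2000

slopes = {}
for correlated in (False, True):
    s_short = accumulated_jitter_std(m_short, sigma_step, correlated, trials, rng)
    s_long = accumulated_jitter_std(m_long, sigma_step, correlated, trials, rng)
    slopes[correlated] = loglog_slope(m_short, s_short, m_long, s_long)

print("uncorrelated sources, log-log slope:", round(slopes[False], 2))
print("correlated sources, log-log slope:", round(slopes[True], 2))
```

The estimated slopes come out near 1/2 for independent errors and near 1 for fully correlated errors, which are the two regions expected on a log–log jitter plot.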

Fig. 5. ISF for ring oscillators of the same frequency with different number of stages.

In most digital applications, it is desirable for $\sigma_{\Delta T}$ to decrease at the same rate as the period $T$. In practice, we wish to keep constant the ratio of the timing jitter to the period. Therefore, in many applications, phase jitter, defined as

$$\sigma_{\Delta\phi} = 2\pi\frac{\sigma_{\Delta T}}{T} = \omega_0\,\sigma_{\Delta T} \tag{10}$$

is a more useful measure.

An expression for $\sigma_{\Delta\phi}$ can be obtained using (5). As shown in Appendix A, under the conditions on the measurement interval derived there, the phase jitter due to a single white noise source is given by

$$\sigma_{\Delta\phi}^2 = \frac{\Gamma_{rms}^2}{q_{max}^2}\cdot\frac{1}{2}\,\frac{\overline{i_n^2}}{\Delta f}\cdot\Delta T \tag{11}$$

Using (10) and (11), the proportionality constant $\kappa$ in (8) is calculated to be

$$\kappa = \frac{\Gamma_{rms}}{q_{max}\,\omega_0}\sqrt{\frac{1}{2}\,\frac{\overline{i_n^2}}{\Delta f}} \tag{12}$$

IV. CALCULATION OF THE ISF FOR RING OSCILLATORS

To calculate phase noise and jitter using (6) and (12), one needs to know the rms value of the ISF. Although one can always find the ISF through simulation, we obtain a closed-form approximate equation for the rms value of the ISF of ring oscillators, which usually makes such simulations unnecessary. It is instructive to look at the actual ISF of ring oscillators to gain insight into what constitutes a good approximation. Fig. 5 shows the shape of the ISF for a group of single-ended CMOS ring oscillators. The frequency of oscillation is kept constant (through adjustment of channel length), while the number of stages is varied from 3 to 15 (in odd numbers). To calculate the ISF, a narrow current pulse is injected into one of the nodes of the oscillator, and the resulting phase shift is measured a few cycles later in simulation.

As can be seen, increasing the number of stages reduces the peak value of the ISF. The reason is that the transitions of the normalized waveform become faster for larger $N$. Since the sensitivity during the transition is inversely proportional to the slope, the peak of the ISF drops. It should be noted that only the peak of the ISF is inversely proportional to the slope, and this relation should not be generalized to other points in time. Also, the widths of the lobes of the ISF decrease as $N$ becomes larger, since each transition occupies a smaller fraction of the period. Based on these observations, we approximate the ISF as triangular in shape and with symmetric rising and falling edges, as shown in Fig. 6. The case of nonsymmetric rising and falling edges is considered in Appendix B.

Fig. 6. Approximate waveform and ISF for ring oscillator.

Fig. 7. Relationship between rise time and delay.

The ISF has a maximum of $1/f'_{max}$, where $f'_{max}$ is the maximum slope of the normalized waveform $f$ in (1). Also, the width of the triangles is approximately $2/f'_{max}$, and hence the slopes of the sides of the triangles are $\pm 1$. Therefore, assuming equality of rise and fall times, $\Gamma_{rms}$ can be estimated as

$$\Gamma_{rms}^2 = \frac{1}{2\pi}\int_0^{2\pi}\Gamma^2(x)\,dx = \frac{4}{2\pi}\int_0^{1/f'_{max}} x^2\,dx = \frac{2}{3\pi}\left(\frac{1}{f'_{max}}\right)^3 \tag{13}$$
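The rms value of a triangular ISF can be checked numerically. The sketch below assumes two triangular lobes per period (one positive, one negative), each with peak height $1/f'_{max}$, half-width $1/f'_{max}$, and side slopes of $\pm 1$, and compares a midpoint-rule integration with the closed form $\Gamma_{rms}^2 = \frac{2}{3\pi}(1/f'_{max})^3$. The lobe positions at $\pi/2$ and $3\pi/2$ and the sample peak values are arbitrary choices for the illustration; they do not affect the rms value as long as the lobes do not overlap.

```python
import math


def triangular_isf(x, peak):
    """Triangular ISF approximation with unit-slope sides.

    `peak` stands for 1/f'_max and must not exceed pi/2 so that the
    positive and negative lobes do not overlap.
    """
    x = x % (2 * math.pi)
    positive_lobe = max(0.0, peak - abs(x - math.pi / 2))
    negative_lobe = max(0.0, peak - abs(x - 3 * math.pi / 2))
    return positive_lobe - negative_lobe


def rms_numeric(peak, samples=200_000):
    """Midpoint-rule estimate of the rms value over one period."""
    dx = 2 * math.pi / samples
    acc = sum(triangular_isf((k + 0.5) * dx, peak) ** 2 for k in range(samples))
    return math.sqrt(acc * dx / (2 * math.pi))


def rms_closed_form(peak):
    """Closed form: Gamma_rms^2 = (2 / (3*pi)) * peak**3."""
    return math.sqrt(2.0 / (3.0 * math.pi) * peak ** 3)


for peak in (0.25, 0.5, 1.0):
    print(peak, rms_numeric(peak), rms_closed_form(peak))
```

The two columns agree closely for each peak value, which confirms the factor of four in the integral: two lobes, each with two unit-slope sides.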

Fig. 8. RMS values of the ISF’s for various single-ended ring oscillators versus number of stages.

On the other hand, stage delay is proportional to the rise time

$$\hat{t}_D = \frac{\eta}{f'_{max}} \tag{14}$$

where $\hat{t}_D$ is the normalized stage delay and $\eta$ is a proportionality constant, which is typically close to one, as can be seen in Fig. 7.

The period is $2N$ times longer than a single stage delay

$$2\pi = 2N\hat{t}_D = \frac{2N\eta}{f'_{max}} \tag{15}$$

Using (13) and (15), the following approximate expression for $\Gamma_{rms}$ is obtained:

$$\Gamma_{rms} = \sqrt{\frac{2\pi^2}{3\eta^3}}\cdot\frac{1}{N^{1.5}} \tag{16}$$

Note that the $1/N^{1.5}$ dependence of $\Gamma_{rms}$ is independent of the value of $\eta$. Fig. 8 illustrates $\Gamma_{rms}$ for the ISF shown in Fig. 5 with plus signs on log–log axes. The solid line shows the line of $\Gamma_{rms} = 4/N^{1.5}$, which is obtained from (16) for $\eta = 0.75$. To verify the generality of (16), we maintain a fixed channel length for all the devices in the inverters while varying the number of stages to allow different frequencies of oscillation. Again, $\Gamma_{rms}$ is calculated, and is shown in Fig. 8 with circles. We also repeat the first experiment with a different supply voltage (3 V as opposed to 5 V), and the result is shown with crosses. As can be seen, the values of $\Gamma_{rms}$ are almost identical for these three cases.

It should not be surprising that $\Gamma_{rms}$ is primarily a function of $N$ because the effect of variations in other parameters, such as $q_{max}$ and device noise, have already been decoupled from $\Gamma(x)$, and thus the ISF is a unitless, frequency- and amplitude-independent function.

Equation (16) is valid for differential ring oscillators as well, since in its derivation no assumption specific to single-ended oscillators was made. Fig. 9 shows the $\Gamma_{rms}$ for three sets of differential ring oscillators, with a varying number of stages (4–16). The data shown with plus signs correspond to oscillators in which the total power dissipation and the drain voltage swing are kept constant by scaling the tail-current sources and load resistors as $N$ changes. Members of the second set of oscillators have a fixed total power dissipation and fixed load resistors, which result in variable swings and for whom data are shown with circles. The third case is that of a fixed tail current for each stage and constant load resistors, whose data are illustrated using crosses. Again, in spite of the diverse variations of the frequency and other circuit parameters, the $1/N^{1.5}$ dependency of $\Gamma_{rms}$ and its independence from other circuit parameters still holds. In the case of a differential ring oscillator, $\Gamma_{rms} = 3/N^{1.5}$, which corresponds to $\eta = 0.9$, is the best fit approximation for $\Gamma_{rms}$. This is shown with the solid line in Fig. 9. A similar result can be obtained for bipolar differential ring oscillators.

Although $\Gamma_{rms}$ decreases as the number of stages increases, one should not prematurely conclude that the phase noise can be reduced using a larger number of stages because the number of noise sources, as well as their magnitudes, also increases for a given total power dissipation and frequency of oscillation.

In the case of asymmetric rising and falling edges, both $\Gamma_{rms}$ and $\Gamma_{dc}$ will change. As shown in Appendix B, the $1/f^3$ corner of the phase-noise spectrum is inversely proportional to the number of stages. Therefore, the $1/f^3$ corner can be reduced either by making the transitions more symmetric in terms of rise and fall times or by increasing the number of stages. Although the former always helps, the latter has other implications on the phase noise in the $1/f^2$ region, as will be shown in the following section.
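A short numerical sketch of the stage-count dependence follows. It assumes the closed form $\Gamma_{rms} = \sqrt{2\pi^2/(3\eta^3)}\,/N^{1.5}$ for (16) and evaluates it for two illustrative values of the proportionality constant $\eta$; the function name is hypothetical.

```python
import math


def gamma_rms(n_stages, eta):
    """Approximate rms value of the ISF of an N-stage ring oscillator,
    assuming Gamma_rms = sqrt(2*pi^2 / (3*eta^3)) / N**1.5."""
    return math.sqrt(2.0 * math.pi ** 2 / (3.0 * eta ** 3)) / n_stages ** 1.5


for eta in (0.75, 0.9):
    # The value at N = 1 is the coefficient multiplying 1/N**1.5.
    print("eta =", eta, "coefficient =", round(gamma_rms(1, eta), 3))

for n in (3, 5, 9, 15):
    print("N =", n, "Gamma_rms =", round(gamma_rms(n, 0.75), 4))
```

The coefficient comes out close to 4 for $\eta = 0.75$ and close to 3 for $\eta = 0.9$, and quadrupling the number of stages reduces $\Gamma_{rms}$ by a factor of eight regardless of $\eta$.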

Fig. 9. RMS values of the ISF’s for various differential ring oscillators versus number of stages.

V. EXPRESSIONS FOR JITTER AND PHASE NOISE IN RING OSCILLATORS

In this section, we derive expressions for the phase noise and jitter of different types of ring oscillators. Throughout this section, we assume that the symmetry criteria required to minimize $\Gamma_{dc}$ (and hence the upconversion of $1/f$ noise) are already met and that the jitter and phase noise of the oscillator are dominated by white noise. For CMOS transistors, the drain current noise spectral density is given by

$$\frac{\overline{i_n^2}}{\Delta f} = 4kT\gamma g_{d0} = 4kT\gamma\mu C_{ox}\frac{W}{L}\Delta V \tag{17}$$

where $g_{d0}$ is the zero-bias drain source conductance, $\mu$ is the mobility, $C_{ox}$ is the gate-oxide capacitance per unit area, $W$ and $L$ are the channel width and length of the device, respectively, and $\Delta V$ is the gate voltage overdrive. The coefficient $\gamma$ is 2/3 for long-channel devices in the saturation region and typically two to three times greater for short-channel devices [18]. Equation (17) is valid in both short- and long-channel regimes as long as an appropriate value for $\gamma$ is used.

A. Single-Ended CMOS Ring Oscillators

We start with a single-ended CMOS ring oscillator with equal-length NMOS and PMOS transistors. Assuming that $V_{TN} = |V_{TP}|$, the maximum total channel noise from NMOS and PMOS devices, when both the input and output are at $V_{DD}/2$, is given by

$$\frac{\overline{i_n^2}}{\Delta f} = \left(\frac{\overline{i_n^2}}{\Delta f}\right)_N + \left(\frac{\overline{i_n^2}}{\Delta f}\right)_P = 4kT\gamma\mu_{eff}C_{ox}\frac{W_{eff}}{L}\Delta V \tag{18}$$

where

$$W_{eff} = W_n + W_p \tag{19}$$

and

$$\mu_{eff} = \frac{\mu_n W_n + \mu_p W_p}{W_n + W_p} \tag{20}$$

and $\Delta V$ is the gate overdrive in the middle of transition, i.e., $\Delta V = (V_{DD}/2) - V_T$.

During one period, each node is charged to $q_{max}$ and then discharged to zero. In an $N$-stage single-ended ring oscillator, the power dissipation associated with this process is $N q_{max} V_{DD} f_0$. However, during the transitions, some extra current, known as crowbar current, is drawn from the supply, which does not contribute to charging and discharging the capacitors and goes directly from supply to ground through both transistors. In a symmetric ring oscillator, these two components are approximately equal, and their difference will depend on the ratio of the rise time and stage delay. Therefore, the total power dissipation is approximately given by

(21)

Assuming waveforms that are symmetric to the first order, we have

(22)

where $t_D$ is the delay of each stage and $t_r$ and $t_f$ are the rise and fall time, respectively, associated with the maximum slope during a transition.

Assuming that the thermal noise sources of the different devices are uncorrelated, and assuming that the waveforms (and hence the ISF) of all the nodes are the same except for a phase shift, the total phase noise due to all $N$ noise sources is

$N$ times the value given by (6). Taking only these inevitable noise sources into account, (6), (16), (18), (21), and (22) result in the following expressions for phase noise and jitter:

(23)

(24)

where $V_{char}$ is the characteristic voltage of the device. For long-channel mode of operation, it is defined as $V_{char} = \Delta V/\gamma$. Any extra disturbance, such as substrate and supply noise, or noise contributed by extra circuitry or asymmetry in the waveform will result in a larger number than (23) and (24). Note that lowering threshold voltages reduces the phase noise, in agreement with [12]. Therefore, the minimum achievable phase noise and jitter for a single-ended CMOS ring oscillator, assuming that all symmetry criteria are met, occurs for zero threshold voltage

(25)

(26)

As can be seen, the minimum phase noise is inversely proportional to the power dissipation and grows quadratically with the oscillation frequency. Further, note the lack of dependence on the number of stages (for a given power dissipation and oscillation frequency). Evidently, the increase in the number of noise sources (and in the maximum power due to the higher transition currents required to run at the same frequency) essentially cancels the effect of decreasing $\Gamma_{rms}$ as $N$ increases, leading to no net dependence of phase noise on $N$. This somewhat surprising result may explain the confusion that exists regarding the optimum $N$, since there is not a strong dependence on the number of stages for single-ended CMOS ring oscillators. Note that (25) and (26) establish the lower bound and therefore should not be used to calculate the phase noise and jitter of an arbitrary oscillator, for which (6) and (12) should be used, respectively.

We may carry out a similar calculation for the short-channel case. For such devices, the drain current may be expressed as

$$I_D = \frac{\mu C_{ox}}{2} W E_c \Delta V \tag{27}$$

where $E_c$ is the critical electric field and is defined as the value of electric field resulting in half the carrier velocity expected from low field mobility. Combining (17) with (27), we obtain the following expression for the drain current noise of a MOS device in short channel:

$$\frac{\overline{i_n^2}}{\Delta f} = 4kT\gamma\mu C_{ox}\frac{W}{L}\Delta V = \frac{8kT\gamma I_D}{E_c L} \tag{28}$$

The frequency of oscillation can be approximated by

(29)

Using (28) and (29), we obtain the same expressions for phase noise and jitter as given by (23) and (24), except for a new $V_{char}$

$$V_{char} = \frac{E_c L}{\gamma} \tag{30}$$

which results in a larger phase noise and jitter than the long-channel case. Again, note the absence of any dependency on the number of stages.

B. Differential CMOS Ring Oscillators

Now consider a differential MOS ring oscillator with resistive load. The total power dissipation is

$$P = N I_{tail} V_{DD} \tag{31}$$

where $N$ is the number of stages, $I_{tail}$ is the tail bias current of the differential pair, and $V_{DD}$ is the supply voltage. The frequency of oscillation can be approximated by

$$f_0 = \frac{1}{2N t_D} \approx \frac{1}{2\eta N t_r} \approx \frac{I_{tail}}{2\eta N q_{max}} \tag{32}$$

Surprisingly, tail-current source noise in the vicinity of $f_0$ does not affect the phase noise. Rather, its low-frequency noise as well as its noise in the vicinity of even multiples of the oscillation frequency affect the phase noise. Tail noise in the vicinity of even harmonics can be significantly reduced by a variety of means, such as with a series inductor or a parallel capacitor. As before, the effect of low-frequency noise can be minimized by exploiting symmetry. Therefore, only the noise of the differential transistors and the load are taken into account. The total current noise on each single-ended node is given by

$$\frac{\overline{i_n^2}}{\Delta f} = \left(\frac{\overline{i_n^2}}{\Delta f}\right)_N + \left(\frac{\overline{i_n^2}}{\Delta f}\right)_{Load} = 4kT I_{tail}\left(\frac{1}{V_{char}} + \frac{1}{R_L I_{tail}}\right) \tag{33}$$

where $R_L$ is the load resistor, $V_{char} = (V_{GS} - V_T)/\gamma$ for a balanced stage in the long-channel limit and $V_{char} = E_c L/\gamma$ in the short-channel regime. The phase noise and jitter due to all $2N$ noise sources is $2N$ times the value given by (6) and (12). Using (16), the expression for the phase noise of the differential MOS ring oscillator is

$$\mathcal{L}\{\Delta f\} \approx \frac{8}{3\eta}\cdot N\cdot\frac{kT}{P}\cdot\left(\frac{V_{DD}}{V_{char}} + \frac{V_{DD}}{R_L I_{tail}}\right)\cdot\frac{f_0^2}{\Delta f^2} \tag{34}$$

and $\kappa$ is given by

$$\kappa \approx \sqrt{\frac{8}{3\eta}}\cdot\sqrt{N\cdot\frac{kT}{P}\cdot\left(\frac{V_{DD}}{V_{char}} + \frac{V_{DD}}{R_L I_{tail}}\right)} \tag{35}$$

Equations (34) and (35) are valid in both long- and short-channel regimes of operation with the right choice of $V_{char}$.

Note that, in contrast with the single-ended ring oscillator, a differential oscillator does exhibit a phase noise and jitter dependency on the number of stages, with the phase noise

degrading as the number of stages increases for a given frequency and power dissipation. This result may be understood as a consequence of the necessary reduction in the charge swing that is required to accommodate a constant frequency of oscillation at a fixed power level as $N$ increases. At the same time, increasing the number of stages at a fixed total power dissipation demands a proportional reduction of tail-current sources, which will reduce the swing, and hence $q_{max}$, by a factor of $1/N$.

C. Bipolar Differential Ring Oscillator

A similar approach allows us to derive the corresponding results for a bipolar differential ring oscillator. In this case, the power dissipation is given by (31) and the oscillation frequency by (32). The total noise current is given by the sum of collector shot noise and load resistor noise

$$\frac{\overline{i_n^2}}{\Delta f} = 2qI_C + \frac{4kT}{R_L} \tag{36}$$

where $q$ is the electron charge and $I_C$ is the collector current during the transition. Using these relations, the phase noise and jitter of a bipolar ring oscillator are again given by (34) and (35) with the appropriate choice of $V_{char}$.

VI. OTHER NOISE SOURCES

Other noise sources, such as tail-current source noise in a differential structure, or substrate and supply noise sources, may play an important role in the jitter and phase noise of ring oscillators. The low-frequency noise of the tail-current source affects phase noise if the symmetry criteria mentioned in Section II are not met by each half circuit. In such cases, the ISF for the tail-current source has a large dc value, which increases the upconversion of low-frequency noise to phase noise. This upconversion is particularly prominent if the tail device has a large $1/f$ noise corner.

Substrate and supply noise are among other important sources of noise. There are two major differences between these noise sources and internal device noise. First, the power spectral density of these sources is usually nonwhite and often demonstrates strong peaks at various frequencies. Even more important is that the substrate and supply noise on different nodes of the ring oscillator have a very strong correlation. This property changes the response of the oscillator to these sources.

To understand the effect of this correlation, let us consider the special case of having equal noise sources on all the nodes of the oscillator. If all the inverters in the oscillator are the same, the ISF for different nodes will only differ in phase by multiples of $2\pi/N$, as shown in Fig. 10. Therefore, the total phase due to all the sources is given by superposition of (5)

$$\phi(t) = \frac{1}{q_{max}}\int_{-\infty}^{t} i(\tau)\left[\sum_{n=0}^{N-1}\Gamma\!\left(\omega_0\tau + \frac{2\pi n}{N}\right)\right]d\tau \tag{37}$$

Fig. 10. Phasors for noise contributions from each source.

Expanding the term in brackets in a Fourier series, we can show that it is zero except at dc and multiples of $N\omega_0$, i.e.,

$$\phi(t) = \frac{N}{q_{max}}\sum_{n=0}^{\infty} c_{nN}\int_{-\infty}^{t} i(\tau)\cos(nN\omega_0\tau)\,d\tau \tag{38}$$

where $c_n$ is the $n$th Fourier coefficient of the ISF. Equation (38) means that for identical sources, only noise in the vicinity of integer multiples of $N\omega_0$ affects the phase.

To verify this effect, sinusoidal currents with an amplitude of 10 μA were injected into all five nodes of the five-stage ring oscillator of Fig. 1 at different offsets from integer multiples of the frequency of oscillation, and the induced sidebands were measured. The measured sideband power with respect to the carrier is plotted in Fig. 11.

Fig. 11. Sideband power below carrier for equal sources on all five nodes at $nf_0 + f_m$.

As can be seen in Fig. 11, only injection at low frequency and in the vicinity of the fifth harmonic are integrated, and show a 20 dB/dec slope. The effect of injection in the vicinity of harmonics that are not integer multiples of $N$ is much smaller than at the integer ones. Ideally, there should be no sideband induced by the injection in the vicinity of harmonics that are not integer multiples of $N$; however, as can be seen in Fig. 11, there is some sideband power due to the amplitude response.
Low-frequency noise can also result in correlation between
(37) uncertainties introduced during different cycles, as its value
does not change significantly over a small number of periods.
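The cancellation argument for identical sources can be checked numerically. The following is a minimal sketch, assuming a hypothetical ISF built from an arbitrary finite Fourier series (the amplitudes and phases are random illustrative values, not data from the oscillators discussed here). It sums $N$ copies of the ISF shifted by $2\pi n/N$ and compares the harmonic content of the sum with that of a single ISF.

```python
import numpy as np

def harmonic_amplitudes(y):
    """Single-sided harmonic amplitudes of one period of a sampled waveform."""
    out = 2.0 * np.abs(np.fft.rfft(y)) / len(y)
    out[0] /= 2.0  # the dc term is not doubled
    return out

def isf(x, amps, phases):
    """Hypothetical ISF: sum over k of amps[k] * cos(k * x + phases[k])."""
    k = np.arange(len(amps))[:, None]
    return np.sum(amps[:, None] * np.cos(k * x[None, :] + phases[:, None]), axis=0)

N_STAGES = 5                                   # five-stage ring, as in the experiment
rng = np.random.default_rng(0)
amps = rng.uniform(0.5, 1.0, size=13)          # harmonics 0..12, arbitrary values
phases = rng.uniform(0.0, 2.0 * np.pi, size=13)
phases[0] = 0.0                                # keep the dc term equal to amps[0]
x = 2.0 * np.pi * np.arange(1024) / 1024       # one period of the oscillation phase

single = harmonic_amplitudes(isf(x, amps, phases))
total = harmonic_amplitudes(
    sum(isf(x + 2.0 * np.pi * n / N_STAGES, amps, phases) for n in range(N_STAGES))
)

for k in range(13):
    print(k, round(float(single[k]), 3), round(float(total[k]), 3))
```

In this sketch the summed ISF keeps only the dc term and the harmonics at multiples of $N$, each scaled by $N$, which is the behavior the text attributes to equal sources on all nodes.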
798 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

Therefore, the uncertainties add up in amplitude rather than power, resulting in a region with a slope of one in the log–log plot of jitter even in the absence of external noise sources such as substrate and supply noise.

VII. DESIGN IMPLICATIONS

One can use (23) and (34) to compare the phase-noise performance of single-ended and differential MOS ring oscillators. As can be seen, for $N$ stages the phase noise of the differential ring oscillator is approximately […] times larger than the phase noise of a single-ended oscillator of equal […] and […]. Since the minimum $N$ for a regular ring oscillator is three, even a properly designed differential CMOS ring oscillator underperforms its single-ended counterpart, especially for a larger number of stages. This difference is even more pronounced if proper precautions to reduce the noise of the tail current are not taken. However, the differential ring oscillator may still be preferred in IC's because of the lower sensitivity to substrate and supply noise, as well as lower noise injection into other circuits on the same chip. The decision to use differential versus single-ended ring oscillators should be based on both of these considerations.

The common-mode sensitivity problem in a single-ended ring oscillator can be mitigated to some extent by using two identical ring oscillators laid out close to each other that oscillate out of phase because of small coupling inverters [19]. Single-ended configurations may be used in a less noisy environment to achieve better phase-noise performance for a given power dissipation.

As shown in Appendix B, asymmetry of the rising and falling edges degrades phase noise and jitter by increasing the $1/f^3$ corner frequency. Thus, every effort should be taken to make the rising and falling edges symmetric. By properly adjusting the symmetry properties, one can suppress or even eliminate low-frequency-noise upconversion [16]. As shown in [16], differential symmetry is insufficient, and the symmetry of each half circuit is important. One practical method to achieve this symmetry is to use more linear loads, such as resistors or linearized MOS devices. This method reduces the $1/f$ noise upconversion and substrate and supply coupling [20]. Another revealing implication, shown in Appendix B, is the reduction of the $1/f^3$ corner frequency as $N$ increases. Hence for a process with large $1/f$ noise, a larger number of stages may be helpful.

One question that frequently arises in the design of ring oscillators is the optimum number of stages for minimum jitter and phase noise. As seen in (23), the phase noise and jitter of single-ended CMOS ring oscillators in the $1/f^2$ region are not a strong function of the number of stages. However, if the symmetry criteria are not well satisfied and/or the process has a large $1/f$ noise, a larger $N$ will reduce the jitter. In general, the choice of the number of stages must be made on the basis of several design criteria, such as $1/f$ noise effect, the desired maximum frequency of oscillation, and the influence of external noise sources, such as supply and substrate noise, that may not scale with $N$.

The jitter and phase noise behavior are different for differential ring oscillators. As (34) suggests, jitter and phase noise increase with an increasing number of stages. Hence if the $1/f$ noise corner is not large, and/or proper symmetry measures have been taken, the minimum number of stages (three or four) should be used to give the best performance. This recommendation holds even if the power dissipation is not a primary issue. It is not fair to argue that burning more power in a larger number of stages allows the achievement of better phase noise, since dissipating the same total power in a smaller number of stages results in better jitter/phase noise as long as it is possible to maximize the total charge swing.

Another insight one can obtain from (34) and (35) is that the jitter of a MOS differential ring oscillator at a given […] and […] is smaller than that of a differential bipolar ring oscillator, at least for today's range of circuit and process parameters. As we go to shorter channel lengths, the characteristic voltage for the MOS devices given by (30) becomes smaller, and thus phase noise degrades with scaling. Bipolar ring oscillators do not suffer from this problem.

LC oscillators generally have better phase noise and jitter compared to ring oscillators for two reasons. First, a ring oscillator stores a certain amount of energy in the capacitors during every cycle and then dissipates all the stored energy during the same cycle, while an LC resonator dissipates only $2\pi/Q$ of the total energy stored during one cycle. Thus, for a given power dissipation in steady state, a ring oscillator suffers from a smaller maximum charge swing $q_{\max}$. Second, in a ring oscillator, the device noise is maximum during the transitions, which is the time where the sensitivity, and hence the ISF, is the largest [16].

VIII. EXPERIMENTAL RESULTS

The phase-noise measurements in this section were performed using three different systems: an HP 8563E spectrum analyzer with phase-noise measurement capability, an RDL NTS-1000A phase-noise measurement system, and an HP E5500 phase-noise measurement system. The jitter measurements were performed using a Tektronix CSA 803A communication signal analyzer.

Tables I–III summarize the phase-noise measurements. All the reported phase-noise values are at a 1-MHz offset from the carrier, chosen to achieve the largest dynamic range in the measurement. Table I shows the measurement results for three different inverter-chain ring oscillators. These oscillators are made of the CMOS inverters shown in Fig. 12(a), with no frequency tuning mechanism. The output is taken from one node of the ring through a few stages of tapered inverters. Oscillators number 1 and 2 are fabricated in a 2-µm 5-V CMOS process, and oscillator number 3 is fabricated in a 0.25-µm 2.5-V process. The second column shows the number of stages in each of the oscillators. The $W/L$ ratios of the NMOS and PMOS devices, as well as the supply voltages, the total measured supply currents, and the frequencies of oscillation are shown next. The phase-noise prediction using (23) and (6), together with the measured phase noise, are shown in the last three columns.
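Because the tables quote phase noise at a single 1-MHz offset, it can be convenient to refer a value to another offset. The helper below is a minimal sketch, assuming both offsets lie in the $1/f^2$ region, where phase noise falls by 20 dB per decade of offset; the function name and the example numbers are illustrative and are not taken from the tables.

```python
import math

def scale_phase_noise(l_dbc_hz, f_offset_ref, f_offset_new):
    """Refer a phase-noise value (dBc/Hz) measured at f_offset_ref to f_offset_new.

    Assumes a pure 1/f^2 region between the two offsets (-20 dB/decade).
    """
    return l_dbc_hz - 20.0 * math.log10(f_offset_new / f_offset_ref)

# Hypothetical example: a value quoted at 1 MHz, referred to 100 kHz and 10 MHz.
quoted = -100.0
print(scale_phase_noise(quoted, 1e6, 1e5))
print(scale_phase_noise(quoted, 1e6, 1e7))
```

The scaling no longer holds once the new offset enters the $1/f^3$ region or approaches the measurement noise floor, so the result should be read as an estimate only.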
HAJIMIRI et al.: JITTER AND PHASE NOISE IN RING OSCILLATORS 799

TABLE I
INVERTER-CHAIN RING OSCILLATORS

TABLE II
CURRENT-STARVED INVERTER-CHAIN RING OSCILLATORS

As an illustrative example, we will show the details of phase-noise calculations for oscillator number 3. Using (16) to calculate […], the phase noise can be obtained from (6). We calculate the noise power when the stage is halfway through a transition. At this point, the drain current is simulated to be 3.47 mA. An […] of 4 × 10^[…] V/m and a […] of 2.5 are used in (28) to obtain a noise power of […] A²/Hz. The total capacitance on each node is […] fF, and hence […] fC. There is one such noise source on each node; therefore, the phase noise is $N$ times the value given by (6), which results in $\mathcal{L}\{1\,\text{MHz}\} = […]$ dBc/Hz.

Table II summarizes the data obtained for current-starved ring oscillators with the cell structure shown in Fig. 12(b), all implemented in the same 0.25-µm 2.5-V process. Ring oscillators with a different number of stages were designed with roughly constant oscillation frequency and total power dissipation. Frequency adjustment is achieved by changing the channel length, while total power dissipation control is performed by changing device width. The $W/L$ ratios of the inverter and the tail NMOS and PMOS devices are shown in Table II. The node […] is kept at […] while node […] is at 0 V. The measured total current dissipation and the frequency of oscillation can be found in columns 7 and 8. Phase-noise calculations based on (23) and (6) are in good agreement with the measured results. The die photo of the chip containing these oscillators is shown in Fig. 13. The slightly superior phase noise of the three-stage ring oscillator (number 4) can be attributed to lower oscillation frequency and longer channel length (and hence smaller […]).

Table III summarizes the results obtained for differential ring oscillators of various sizes and lengths with the inverter topology shown in Fig. 12(c), covering a large span of frequencies up to 5.5 GHz. All these ring oscillators are implemented in the same 0.25-µm 2.5-V process, and all the oscillators, except the one marked with N/A, have the tuning circuit shown. The resistors are implemented using an unsilicided polysilicon layer. The main reason to use poly resistors is to reduce $1/f$ noise upconversion by making the waveform on each node closer to the step response of an RC network, which is more symmetrical. The value of these load resistors and the $W/L$ ratios of the differential pair are shown in Table III. A fixed 2.5-V power supply is used, resulting in different total power dissipations. As before, the measured phase noise is in good agreement with the predicted phase noise using (34) and (6). The die photo of oscillator number 26 can be found in Fig. 14.

To illustrate further how one obtains the phase-noise predictions shown in Table III, we elaborate on the phase-noise

TABLE III
DIFFERENTIAL RING OSCILLATORS

Fig. 12. Inverter stages for (a) inverter-chain ring oscillators, (b) current-starved inverter-chain ring oscillators, and (c) differential ring oscillators.

calculations for oscillator number 12. The noise current due to one of the differential pair NMOS devices is given by (28). The total capacitance on each node in the balanced case is […] fF, and the simulated voltage swing is 1.208 V; therefore, […] fC. In the balanced case, this current is half of the tail current, i.e., […] mA, and therefore the noise current of the NMOS device has a single-sideband spectral density of […] A²/Hz. The thermal noise due to the load resistor is […] A²/Hz; therefore, the total current noise density is given by […] A²/Hz. For a differential ring oscillator with $N$ stages, there is one such noise source on each node; therefore, the phase noise is $2N$ times the value given by (6), which results in $\mathcal{L}\{1\,\text{MHz}\} = […]$ dBc/Hz. The total power dissipation is […] mW, and […]. Therefore, with an […] of 0.9, (34) predicts a phase noise of $\mathcal{L}\{1\,\text{MHz}\} = […]$ dBc/Hz.

Timing jitter for oscillator number 12 can be measured using the setup shown in Fig. 15. The oscillator output is divided into two equal-power outputs using a power splitter. The CSA 803A is not capable of showing the edge it uses to trigger, as there is a 21-ns minimum delay between the triggering transition and the first acquired sample. To be able to look at the triggering edge and perhaps the edges before that, a delay line of approximately 25 ns is inserted in the signal path in front of the sampling head. This way, one may look at the exact edge used to trigger the signal. If the sampling head and the power splitter were noiseless, this edge would show no jitter. However, the power splitter and the sampling head introduce noise onto the signal, which cannot be easily

Fig. 13. Die photograph of the current-starved single-ended oscillators.

Fig. 14. Die photograph of the 12-stage differential ring oscillator.

Fig. 15. Timing jitter measurement setup using CSA803A.

distinguished from the device under test (DUT)'s jitter. This extra jitter can be directly measured by looking at the jitter on the triggering edge. This edge can be readily identified since it has lower rms jitter than the transitions before and after it. The effect of this excess jitter should be subtracted from the jitter due to the DUT. Assuming no correlation between the jitter of the DUT and the sampling head, the equivalent jitter due to the DUT can be estimated by

$$\sigma_{\text{eff}} = \sqrt{\sigma_{\Delta T}^2 - \sigma_0^2} \tag{39}$$

where $\sigma_{\text{eff}}$ is the effective rms timing jitter, $\sigma_{\Delta T}$ is the measured rms jitter at a delay $\Delta T$ after the triggering edge, and $\sigma_0$ is the jitter on the triggering edge.

Fig. 16 shows the rms jitter versus the measurement delay for oscillator number 12 on a log–log plot. The best fit for the data shown in Fig. 16 is […]. Equations (12) and (35) result in […] and […], respectively. The region of the jitter plot with the slope of one can be attributed to the $1/f$ noise of the devices, as discussed at the end of Section VI.

In a separate experiment, the phase noise of oscillator number 7 is measured for different values of […] and

Fig. 16. RMS jitter versus measurement interval for the four-stage, 2.8-GHz differential ring oscillator (oscillator number 12).

[…]. These bias voltages are chosen in such a way as to keep a constant oscillation frequency while changing only the ratio of rise time to fall time. The $1/f^3$ corner of the phase noise is measured for different ratios of the pullup and pulldown currents while keeping the frequency constant. One can observe a sharp reduction in the corner frequency at the point of symmetry in Fig. 17.

Fig. 17. Phase noise versus symmetry voltage for oscillator number 7.

IX. CONCLUSION

An analysis of the jitter and phase noise of single-ended and differential ring oscillators was presented. The general noise model, based on the ISF, was applied to the case of ring oscillators, resulting in a closed-form expression for the phase noise and jitter of ring oscillators [(6), (23), (34)]. The model was used to perform a parallel analysis of jitter and phase noise for ring oscillators. The effect of the number of stages on the phase noise and jitter at a given total power dissipation and frequency of oscillation was shown for single-ended and differential ring oscillators using the general expression for the rms value of the ISF. The upconversion of low-frequency $1/f$ noise was analyzed, showing the effect of waveform asymmetry and the number of stages. New design insights arising from this approach were introduced, and good agreement between theory and measurements was obtained.

APPENDIX A
RELATIONSHIP BETWEEN JITTER AND PHASE NOISE

The phase jitter is

[…] (40)

where

[…] (41)

Therefore

[…] (42)

For a white-noise current source, the autocorrelation function is […]; therefore

[…] (43)

which is

[…] for […] (44)
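The two accumulation regimes mentioned in the text (uncertainties adding in power for white noise, and in amplitude for noise that barely changes over a few periods) can be illustrated with a small Monte Carlo experiment. This is a minimal sketch with hypothetical per-cycle timing errors in arbitrary units; it is not a model of the measured oscillators. It fits the log–log slope of rms accumulated jitter versus the number of elapsed periods.

```python
import numpy as np

def loglog_slope(x, y):
    """Least-squares slope of log10(y) versus log10(x)."""
    return np.polyfit(np.log10(x), np.log10(y), 1)[0]

rng = np.random.default_rng(1)
n_trials, n_cycles = 4000, 256
sigma_cycle = 1.0  # rms timing error added per cycle, arbitrary units (hypothetical)

# Case 1: an independent error in every cycle (white noise).
white = rng.normal(0.0, sigma_cycle, size=(n_trials, n_cycles))
# Case 2: one error per trial, repeated in every cycle (noise too slow to change).
slow = np.repeat(rng.normal(0.0, sigma_cycle, size=(n_trials, 1)), n_cycles, axis=1)

cycles = 2 ** np.arange(9)  # measurement delays of 1, 2, 4, ..., 256 periods

def rms_accumulated(per_cycle_error):
    """rms of the timing error accumulated after each delay in `cycles`."""
    accumulated = np.cumsum(per_cycle_error, axis=1)[:, cycles - 1]
    return np.sqrt(np.mean(accumulated ** 2, axis=0))

slope_white = loglog_slope(cycles, rms_accumulated(white))
slope_slow = loglog_slope(cycles, rms_accumulated(slow))
print(f"white-noise slope: {slope_white:.2f}, slow-noise slope: {slope_slow:.2f}")
```

Independent errors give a slope near one-half, while fully correlated errors give a slope of one, which is the signature attributed to low-frequency noise in the jitter plot.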

Analog and digital designers prefer using phase noise and timing jitter, respectively. The relationship between these two parameters can be obtained by noting that timing jitter is the standard deviation of the timing uncertainty

$$\sigma_\tau^2 = \frac{1}{\omega_0^2} E\!\left[\left(\phi(t+\tau) - \phi(t)\right)^2\right] \tag{45}$$

where $E[\cdot]$ represents the expected value. Since the autocorrelation function of $\phi(t)$ is defined as

$$R_\phi(\tau) = E[\phi(t+\tau)\,\phi(t)] \tag{46}$$

the timing jitter in (45) can be written as

$$\sigma_\tau^2 = \frac{2}{\omega_0^2}\left[R_\phi(0) - R_\phi(\tau)\right]. \tag{47}$$

The relation between the autocorrelation and the power spectrum is given by the Khinchin theorem [21], i.e.,

$$R_\phi(\tau) = \int_{-\infty}^{\infty} S_\phi(f)\, e^{j2\pi f\tau}\, df \tag{48}$$

where $S_\phi(f)$ represents the power spectrum of $\phi(t)$. Therefore, (47) results in the following relationship between clock jitter and phase noise:

$$\sigma_\tau^2 = \frac{8}{\omega_0^2}\int_{0}^{\infty} S_\phi(f)\,\sin^2(\pi f\tau)\, df. \tag{49}$$

It may be useful to know that $S_\phi(f)$ can be approximated by $\mathcal{L}\{f\}$ for large offsets [22]. As can be seen from the foregoing, the rms timing jitter has less information than the phase-noise spectrum and can be calculated from phase noise using (49). However, unless extra information about the shape of the phase-noise spectrum is known, the inverse is not possible in general.

In the special case where the phase noise is dominated by white noise, […] and […] are given by (6) and (12). Therefore, […] can be expressed in terms of phase noise in the $1/f^2$ region as

[…] (50)

where $\mathcal{L}\{\Delta f\}$ is the phase noise measured in the $1/f^2$ region at an offset frequency of $\Delta f$ and $f_0$ is the oscillation frequency. Therefore, based on (8), the rms cycle-to-cycle jitter will be given by

[…] (51)

Note that for (50) and (51) to be valid, the phase noise at $\Delta f$ should be in the $1/f^2$ region.

Fig. 18. Approximate waveform and the ISF for asymmetric rising and falling edges.

APPENDIX B
NONSYMMETRIC RISING AND FALLING EDGES

We approximate the ISF in this Appendix by the function depicted in Fig. 18. The rms value of the ISF is

[…] (52)

where […] and […] are the maximum slope during the rising and falling edge, respectively, and […] represents the asymmetry of the waveform and is defined as

[…] (53)

noting that

[…] (54)

Combining (52) and (54) results in the following:

[…] (55)

which reduces to (16) in the special case of […], i.e., symmetric rising and falling edges. The dc value of the ISF, […], can be calculated from Fig. 18 in a similar manner and is given by

[…] (56)

Using (7), the $1/f^3$ corner is given by

[…] (57)

As can be seen, for a constant rise-to-fall ratio, the $1/f^3$ corner decreases inversely with the number of stages; therefore, ring oscillators with a smaller number of stages will have a larger $1/f^3$ noise corner. As a special case, if the rise and fall time are symmetric, […], and the $1/f^3$ corner approaches zero.
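The link between phase noise and jitter described in Appendix A can be exercised numerically. The sketch below assumes the jitter integral takes the standard form $\sigma_\tau^2 = (8/\omega_0^2)\int_0^\infty S_\phi(f)\sin^2(\pi f\tau)\,df$ and a pure $1/f^2$ spectrum $S_\phi(f) = a/f^2$ with $a = \mathcal{L}\{\Delta f\}\,\Delta f^2$. Under those assumptions the integral evaluates to $\sigma_\tau = \kappa\sqrt{\tau}$ with $\kappa = (\Delta f/f_0)\sqrt{\mathcal{L}\{\Delta f\}}$. The symbol $\kappa$, the function names, and the example phase-noise value are hypothetical; only the 2.8-GHz frequency is taken from the caption of Fig. 16.

```python
import numpy as np

def jitter_variance(tau, f0, a, f_max_cycles=2000.0, pts_per_cycle=64):
    """sigma_tau^2 for S_phi(f) = a / f^2, by trapezoidal integration.

    The integration range is truncated at f = f_max_cycles / tau.
    """
    n = int(f_max_cycles * pts_per_cycle) + 1
    f = np.linspace(0.0, f_max_cycles / tau, n)
    # (a / f^2) * sin^2(pi f tau) == a * (pi tau)^2 * sinc(f tau)^2, finite at f = 0
    integrand = a * (np.pi * tau) ** 2 * np.sinc(f * tau) ** 2
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(f))
    return 8.0 / (2.0 * np.pi * f0) ** 2 * integral

def kappa_from_phase_noise(l_dbc_hz, f_offset, f0):
    """Closed form under the same assumptions: kappa = (f_offset / f0) * sqrt(L)."""
    return f_offset / f0 * 10.0 ** (l_dbc_hz / 20.0)

f0 = 2.8e9          # oscillation frequency in Hz
f_offset = 1e6      # offset at which the phase noise is quoted
l_dbc_hz = -100.0   # hypothetical phase noise in dBc/Hz
tau = 10e-9         # measurement delay in seconds

a = 10.0 ** (l_dbc_hz / 10.0) * f_offset ** 2
sigma_numeric = np.sqrt(jitter_variance(tau, f0, a))
sigma_closed = kappa_from_phase_noise(l_dbc_hz, f_offset, f0) * np.sqrt(tau)
print(sigma_numeric, sigma_closed)
```

The two values agree to within the truncation error of the integral, which is a useful sanity check before applying the closed form to measured data. The check applies only while the offset used for the phase-noise value lies in the $1/f^2$ region.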

ACKNOWLEDGMENT

The authors would like to thank M. A. Horowitz, G. Nasserbakht, A. Ong, C. K. Yang, B. A. Wooley, and M. Zargari for helpful discussions and support. They would further like to thank Texas Instruments, Inc., and Stanford Nano-Fabrication facilities for fabrication of the oscillators.

REFERENCES

[1] L. DeVito, J. Newton, R. Croughwell, J. Bulzacchelli, and F. Benkley, “A 52 and 155 MHz clock-recovery PLL,” in ISSCC Dig. Tech. Papers, Feb. 1991, pp. 142–143.
[2] A. W. Buchwald, K. W. Martin, A. K. Oki, and K. W. Kobayashi, “A 6-GHz integrated phase-locked loop using AlGaAs/GaAs heterojunction bipolar transistors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1752–1762, Dec. 1992.
[3] B. Lai and R. C. Walker, “A monolithic 622 Mb/s clock extraction data retiming circuit,” in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 144–144.
[4] R. Farjad-Rad, C. K. Yang, M. Horowitz, and T. H. Lee, “A 0.4 µm CMOS 10 Gb/s 4-PAM pre-emphasis serial link transmitter,” in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 198–199.
[5] W. D. Llewellyn, M. M. H. Wong, G. W. Tietz, and P. A. Tucci, “A 33 Mb/s data synchronizing phase-locked loop circuit,” in ISSCC Dig. Tech. Papers, Feb. 1988, pp. 12–13.
[6] M. Negahban, R. Behrasi, G. Tsang, H. Abouhossein, and G. Bouchaya, “A two-chip CMOS read channel for hard-disk drives,” in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 216–217.
[7] M. G. Johnson and E. L. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[8] I. A. Young, J. K. Greason, and K. L. Wong, “A PLL clock generator with 5–110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[9] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, “A wide-bandwidth low-voltage PLL for PowerPC™ microprocessors,” IEEE J. Solid-State Circuits, vol. 30, pp. 383–391, Apr. 1995.
[10] I. A. Young, J. K. Greason, J. E. Smith, and K. L. Wong, “A PLL clock generator with 5–110 MHz lock range for microprocessors,” in ISSCC Dig. Tech. Papers, Feb. 1992, pp. 50–51.
[11] M. Horowitz, A. Chen, J. Cobrunson, J. Gasbarro, T. Lee, W. Leung, W. Richardson, T. Thrush, and Y. Fujii, “PLL design for a 500 Mb/s interface,” in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 160–161.
[12] T. C. Weigandt, B. Kim, and P. R. Gray, “Analysis of timing jitter in CMOS ring oscillators,” in Proc. ISCAS, June 1994.
[13] J. McNeill, “Jitter in ring oscillators,” IEEE J. Solid-State Circuits, vol. 32, pp. 870–879, June 1997.
[14] B. Razavi, “A study of phase noise in CMOS oscillators,” IEEE J. Solid-State Circuits, vol. 31, pp. 331–343, Mar. 1996.
[15] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Phase noise in multi-gigahertz CMOS ring oscillators,” in Proc. Custom Integrated Circuits Conf., May 1998, pp. 49–52.
[16] A. Hajimiri and T. H. Lee, “A general theory of phase noise in electrical oscillators,” IEEE J. Solid-State Circuits, vol. 33, pp. 179–194, Feb. 1998.
[17] A. Hajimiri and T. H. Lee, The Design of Low Noise Oscillators. Boston, MA: Kluwer Academic, 1999.
[18] A. A. Abidi, “High-frequency noise measurements of FET’s with small dimensions,” IEEE Trans. Electron Devices, vol. ED-33, pp. 1801–1805, Nov. 1986.
[19] T. Kwasniewski, M. Abou-Seido, A. Bouchet, F. Gaussorgues, and J. Zimmerman, “Inductorless oscillator design for personal communications devices—A 1.2 µm CMOS process case study,” in Proc. CICC, May 1995, pp. 327–330.
[20] J. G. Maneatis and M. A. Horowitz, “Precise delay generation using coupled oscillators,” IEEE J. Solid-State Circuits, vol. 28, pp. 1273–1282, Dec. 1993.
[21] W. A. Gardner, Introduction to Random Processes. New York: McGraw-Hill, 1990.
[22] W. F. Egan, Frequency Synthesis by Phase Lock. New York: Wiley, 1981.

Ali Hajimiri received the B.S. degree in electronics engineering from Sharif University of Technology, Tehran, Iran, in 1994 and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1996 and 1998, respectively.
He was a Design Engineer with Philips, where he worked on a BiCMOS chipset for GSM cellular units from 1993 to 1994. During the summer of 1995, he was with Sun Microsystems, where he worked on the UltraSparc microprocessor’s cache RAM design methodology. During the summer of 1997, he was with Lucent Technologies (Bell Labs), where he investigated low-phase-noise integrated oscillators. In 1998, he joined the Faculty of the California Institute of Technology, Pasadena, as an Assistant Professor. His research interests are high-speed and RF integrated circuits. He is coauthor of The Design of Low Noise Oscillators (Boston, MA: Kluwer Academic, 1999).
Dr. Hajimiri was the Bronze Medal Winner of the 21st International Physics Olympiad, Groningen, the Netherlands. He was a corecipient of the International Solid-State Circuits Conference 1998 Jack Kilby Outstanding Paper Award.

Sotirios Limotyrakis was born in Athens, Greece, in 1971. He received the B.S. degree in electrical engineering from the National Technical University of Athens in 1995 and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1997, where he currently is pursuing the Ph.D. degree.
In the summer of 1993, he was with K.D.D. Corp., Saitama R&D Labs, Japan, where he worked on the design of communication protocols. During the summers of 1996 and 1997, he was with the Texas Instruments Inc. R&D Center, Dallas, TX, where he focused on LNA, low-phase-noise oscillator design, and GSM mobile unit transmit path architectures. His current research interests include the design of mixed-signal circuits for high-speed data conversion and broad-band communications.
Mr. Limotyrakis received the W. Burgess Dempster Memorial Fellowship from the School of Engineering, Stanford University, in 1995.

Thomas H. Lee (S’87–M’87), for a photograph and biography, see p. 585 of the May 1999 issue of this JOURNAL.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 3, MARCH 1998 483

On-Chip Measurement of the Jitter Transfer Function of Charge-Pump Phase-Locked Loops

Benoît R. Veillette, Student Member, IEEE, and Gordon W. Roberts, Member, IEEE

Abstract—An all-digital technique for the measurement of the jitter transfer function of charge-pump phase-locked loops (PLL’s) is introduced. Input jitter may be generated using one of two methods. Both rely on delta–sigma modulation to shape the unavoidable quantization noise to high frequencies. This noise is filtered by the low-pass characteristic of the device and has little impact on the test results. For an input–output response measurement, the output jitter is compared against a threshold. As the stimulus generation and output analysis circuits are digital, do not require calibration, and demand a small area overhead, this jitter transfer function measurement scheme may be placed on the die to adaptively tune a PLL after fabrication. The technique can also implement built-in self-test (BIST) for the characterization or manufacture test of PLL’s. The validity of the scheme was verified experimentally with off-the-shelf components.

Index Terms—Mixed analog-digital integrated circuits, phase-locked loops, self-testing, semiconductor device testing, sigma–delta modulation.

Manuscript received July 15, 1997; revised October 20, 1997. This work was supported by NSERC and the Micronet, a Canadian federal network of centers of excellence dealing with microelectronic devices, circuits, and systems for ultra large scale integration.
The authors are with Microelectronics and Computer Systems Laboratory, McGill University, Montréal, PQ H3A 2A7, Canada.
Publisher Item Identifier S 0018-9200(98)01021-X.

I. INTRODUCTION

PHASE-LOCKED loops (PLL’s) operating on digital signals are fundamental in microelectronic systems. Indeed, they realize essential functions such as clock distributions and clock recovery. They can thus be found in large digital circuits such as microprocessors and on mixed-signal IC’s for digital communications. These devices process digital signals where the phase information is contained in the transition times. They usually make use of sequential phase detectors which require charge-pumps and are thus called charge-pump PLL’s [1].

The specifications for these PLL’s are extremely aggressive, especially when embedded in high-speed digital communication systems. Process variations can adversely affect circuit performance and result in low yield. To increase the number of good parts, some of the components can be trimmed. This is, however, a very expensive process. Furthermore, aging or different operating conditions can later affect circuits such that they no longer meet specifications. The solution is to allow the PLL to self-calibrate. With this property, the expensive trimming stage can be avoided and changes in the operating conditions can be tracked. However, the challenge now is to integrate on-chip a characterization scheme. This involves two functional blocks: a stimulus source and a response analyzer. These circuits should not necessitate calibration themselves. Furthermore, the silicon area overhead should be small.

Fig. 1 shows a typical test setup for the measurement of the jitter transfer function in a laboratory [2]. A signal source generates a high-frequency carrier which is phase modulated with the source output of a spectrum analyzer using an Armstrong phase modulator [3]. This jittery signal is fed to the clock input of a data generator. The recovered signal from the PLL is down-modulated and observed on the spectrum analyzer. It can be seen that this procedure requires precision analog signal sources and instruments. Looking for shortcuts, engineers are tempted to break the task and measure components independently [4] to infer the device characteristics. However, the nature of the phase-locked loop renders this solution unattractive as the tight feedback and the sensitivity of some nodes to parasitics make it difficult to relate the values of components to the PLL behavior. A significant level of accuracy may only be achieved by measuring the system as a whole.

Fig. 1. Jitter transfer function test setup.

In this paper we propose to verify charge-pump PLL’s characteristics using mostly synchronous digital circuits. It implies that the stimulus signals can only change at the clock edges and that the output signals may only be sampled at the same clock edges. This would seem like an unbearable constraint as jitter, both created and measured, is quantized to the test clock period. The results of any test would thus be severely limited in precision. However, this hurdle can be overcome using low-pass delta–sigma (ΔΣ) modulation [5]. This technique can encode high quality signals, in this case a sinewave, on one or a few bits. The quantization noise introduced in the operation is shaped to high frequencies. Since PLL’s are low-pass, high-frequency jitter is filtered
0018–9200/98$10.00 © 1998 IEEE

Fig. 2. Digital phase locked loop functional block diagram.

out. The output will thus exhibit a sufficiently high SNR to be considered a pure sinewave. Therefore, the input high-frequency jitter noise does not affect test results. This filtering principle was demonstrated for a voice codec, another low-pass circuit [6].

Two methods will be presented to create jitter. The first one is the digital modulation of the edges of a reference signal using a higher frequency clock. The alternative is the injection of a sinusoidal signal in the loop through a second charge pump. The second method requires a lower clock frequency but the first one is nonintrusive. Contrary to the usual measurement procedure, a fixed output jitter is selected and the amplitude of the input jitter at a given frequency is varied. Ultimately, the test signal amplitude that results in the selected output jitter is obtained and used to compute the jitter transfer function. It is important to note that while both jitter creation schemes are digital, the jitter loop injection method does not require a test clock frequency larger than the PLL operating frequency. Nevertheless, as will be shown, the test accuracy can be increased with the digital phase modulation and a higher clock frequency.

The outline of the paper is the following. First, an overview of the PLL will be provided in Section II to establish the notation. In Sections III and IV, the two methods for stimulating the device will be explained. Section V will study signal generation using delta–sigma modulation. The analysis of the output signal as well as the test methodology will be the topics of Section VI. Section VII will briefly look at the issue of accuracy. Experimental setup and results will be discussed in Sections VIII and IX, respectively. Overhead will be addressed in Section X. Finally, conclusions will be drawn.

II. PHASE-LOCKED LOOP OVERVIEW

A block diagram of a simple charge pump PLL is shown in Fig. 2. To the left, a sequential phase detector compares the transitions of the reference input and voltage-controlled oscillator (VCO) output signals. A two-output phase-frequency detector (PFD) is illustrated, but other types of sequential phase detectors may also be employed. The output of the PFD can be any of three logic states, and thus a charge pump is required for digital-to-analog conversion. The charge pump can be of the current or voltage type. The jitter signal injection technique of Section V relies on current charge pumps, but it could also be adapted for a voltage charge pump as long as the filter allows summation of voltages. Referring back to Fig. 2, the low-pass filter removes short term variations and shapes the PLL characteristics. The VCO in turn generates a square wave whose frequency depends on the level of its analog input.

Fig. 3. Models of charge-pump phase-locked loop: (a) continuous time and (b) discrete time.

The continuous-time linear model of this PLL in steady-state operation is illustrated in Fig. 3(a). The variables […] and […] are the phase of the reference signal and the VCO output signal, respectively. It should be understood that while these variables represent jitter in a signal, the other variables in the circuit stand for either voltage or current. The phase detector converts phase to one of the two analog quantities while the VCO performs the complementary operation. In Fig. 3(a), the parameter […] is the composite gain of the phase detector and charge-pump circuits expressed in A/rad or V/rad depending on the type of charge pump. The transfer function of the loop filter is denoted […]. The gain of the VCO is labeled […] and is stated in rad/V. If a counter is present, then its effect is lumped into this constant by dividing the VCO intrinsic gain by the counter length ([…] in Fig. 2). With this operation, a counter in the PLL becomes transparent for the proposed measurement method.

The continuous-time model allows satisfactory predictions of the PLL behavior. On the other hand, the phase of digital signals is contained in the signal transitions and is thus better represented as a discrete-time sequence. Therefore, the analysis of PLL’s operating on digital signals should be performed using difference equations or $z$-transform tools. Indeed, it has been shown that the discrete-time model is more accurate, especially at high jitter frequency [7]. This model is shown in Fig. 3(b). The closed-loop equation governing the operation of the PLL is

[…] (1)

In this equation, the discrete-time transfer function […] is the impulse invariant transform of the series combination of the loop filter and VCO transfer functions ([…]). The function […] is labeled the jitter transfer function. Many PLL specifications can be extracted from this transfer function such as the jitter bandwidth and the jitter peaking. The latter measure corresponds to the maximum value of […].

Process variations can significantly alter the charge pump and VCO gains as well as the filter passive components values.
A counter may be inserted in the feedback loop to lock the The method we propose can be used to automatically trim
VCO clock to a lower frequency reference signal. components for compliance with a desired . Alterna-
VEILLETTE AND ROBERTS: ON-CHIP MEASUREMENT OF THE JITTER TRANSFER FUNCTION 485

tively, it can be used to screen devices by comparing the


measured magnitude of against a mask.
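
As a rough illustration of how the jitter bandwidth, the jitter peaking, and a mask comparison can be read off a jitter transfer function, the sketch below evaluates an assumed continuous-time second-order closed-loop response, H(s) = (2*zeta*wn*s + wn^2)/(s^2 + 2*zeta*wn*s + wn^2). This model, the parameters fn and zeta, and all function names are assumptions chosen for illustration; they are not the discrete-time expression (1) of the paper.

```python
import math

def jitter_transfer_mag(f, fn, zeta):
    """Magnitude of the assumed second-order jitter transfer function at frequency f."""
    s = 1j * (f / fn)  # complex frequency normalized to the natural frequency fn
    return abs((2 * zeta * s + 1) / (s * s + 2 * zeta * s + 1))

def log_sweep(f_start, f_stop, points):
    """Logarithmically spaced frequency points from f_start to f_stop inclusive."""
    ratio = (f_stop / f_start) ** (1.0 / (points - 1))
    return [f_start * ratio ** k for k in range(points)]

def bandwidth_and_peaking(fn, zeta, points=4000):
    """Return (-3 dB bandwidth, peaking in dB) from a sweep of four decades centered on fn."""
    freqs = log_sweep(fn / 100.0, fn * 100.0, points)
    mags = [jitter_transfer_mag(f, fn, zeta) for f in freqs]
    # Jitter peaking is the maximum of the magnitude response.
    peaking_db = 20 * math.log10(max(mags))
    # The sweep starts at unity gain, so the first point below 1/sqrt(2)
    # marks the -3 dB jitter bandwidth (to within one sweep step).
    bandwidth = next(f for f, m in zip(freqs, mags) if m < 1 / math.sqrt(2))
    return bandwidth, peaking_db

def passes_mask(fn, zeta, mask):
    """Screen a response against a mask given as (frequency, max magnitude) pairs."""
    return all(jitter_transfer_mag(f, fn, zeta) <= limit for f, limit in mask)
```

Calling `bandwidth_and_peaking(1000.0, 0.707)` returns the two figures of merit for a hypothetical loop with a 1-kHz natural frequency, and `passes_mask` mimics the device-screening use mentioned above with a mask of the caller's choosing.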

III. INPUT DIGITAL PHASE MODULATION


The first method that we will consider to generate the test stimulus is the digital phase modulation of the PLL input. Sinusoidal jitter of arbitrary frequency and amplitude can be generated by modifying the edges of the reference signal digitally. A test clock frequency which is a multiple of the reference signal frequency is a prerequisite. This method therefore may not be suitable for high-frequency PLL’s. The principle is to control the instantaneous phase of a 1-b digital signal by delaying it by an amount set by a multibit digital signal. Fig. 4 illustrates a circuit that can realize this operation. The input signal to the delay cells string is a pulse whose period is an integer multiple of the test clock period. The digital signal generator output, denoted , is updated at every PLL cycle and delays the input signal by a variable amount. Fig. 5 shows a typical waveform at the output of this circuit for a test clock ( ) eight times the reference signal frequency ( ). The dotted lines represent the test clock edges with the thick ones used as zero jitter marks. The PLL input signal jitter can thus be expressed as

rad (2)

Obviously, the number of inputs to the multiplexer is limited by the silicon area available but also by the ratio of the frequency of the clock operating the delay cells to the frequency of the reference signal. The limited number of delay cells is likely to make the jitter of the input signal coarsely quantized such that its SNR is unacceptably low. However, PLL’s are frequency selective with respect to jitter. Indeed, the jitter transfer function is low-pass and the device filters high-frequency jitter. When incorporating a delta–sigma modulator in the signal generator, quantization noise can be shaped to high frequencies [5]. Therefore, the encoded signal from a low-pass delta–sigma modulator, such as in Fig. 5, contains a high-quality low-frequency sinewave and high-frequency quantization noise as illustrated in Fig. 6(a). While is a multibit signal as it controls an multiplexer, delta–sigma modulation can ultimately encode signals on a single bit. Because the PLL is low-pass and is designed to reject high-frequency components in the loop, the quantization noise will be suppressed, leaving only the sinusoid in the output jitter as shown in Fig. 6(b). Section V will examine how such a signal can be generated.

Fig. 4. Digital phase modulation circuit.

Fig. 5. Typical digital phase modulated waveform.

Fig. 6. Phase spectrum: (a) input signal jitter and (b) output signal jitter.

Fig. 7. Discrete-time model of phase-locked loop modified for signal injection.

IV. SIGNAL INJECTION IN THE LOOP

The second option for the generation of jitter is the injection of a test signal at the input of the loop filter as shown in Fig. 7. This signal source, represented here by the variable , is injected through a second charge pump with gain . It should be understood that this signal source is not a jittery digital signal but an analog signal embedded in a 1-b digital signal encoded using a delta–sigma modulator. However, this signal, when referred back to the input, is equivalent to input jitter. The PLL reference signal meanwhile is a square wave and therefore the input jitter will be zero. This setup will be used to evaluate the characteristics of the PLL through the measure of its transfer function. Examining the model of Fig. 7, it can be seen that this transfer function will be equal to

(3)

It is thus equivalent within a multiplicative constant to the jitter transfer function . For a spectral test such as the jitter transfer function, the test signal is a sinewave encoded into a single bit. The quantization noise is concentrated at high frequencies and is filtered out as explained in the previous section. It is important to note that the clock period for signal injection must be an integer multiple of the reference signal period to prevent aliasing of the quantization noise back in the PLL passband. This condition implies that the signal injection frequency cannot be higher than the reference signal frequency.

486 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 3, MARCH 1998

Fig. 8. Injecting a signal into the PLL.

Fig. 9. Spectrum of test jitter signals: (a) signal band and (b) Nyquist interval.

Converting the 1-b digital stream to an analog signal and summing it with the output of the phase detector is quite simple. A second current charge pump is placed in parallel with the phase detector charge pump and both outputs are connected together, forming a current summing node as shown in Fig. 8. The accuracy of this digital-to-analog conversion is a function of the matching of the two current sources and , typically ranging between 0.1–1% in monolithic form. In this schematic, the impedance implements the loop filter and is the controlling voltage of the VCO.
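
The claim that a low-pass response strips the shaped quantization noise from a 1-b stream, leaving the embedded sinusoid, can be checked numerically. The following is a minimal sketch under stated assumptions: the second-order modulator topology (two accumulators with unit feedback, which gives a noise transfer function of (1 - z^-1)^2), the 0.5 input amplitude, and the moving-average filter standing in for the low-pass jitter response of the PLL are all choices made here for illustration, not details taken from the paper.

```python
import math

def sine_samples(count, cycles, amplitude):
    """Sinewave with an integer number of cycles over `count` samples."""
    return [amplitude * math.sin(2 * math.pi * cycles * n / count) for n in range(count)]

def delta_sigma_1bit(samples):
    """Encode samples (magnitude well below 1) into a +1/-1 stream with a second-order loop."""
    acc1 = acc2 = 0.0
    previous = 0.0  # last quantizer output, fed back to both accumulators
    bits = []
    for x in samples:
        acc1 += x - previous
        acc2 += acc1 - previous
        previous = 1.0 if acc2 >= 0 else -1.0  # 1-b quantizer
        bits.append(int(previous))
    return bits

def moving_average(values, length):
    """Boxcar low-pass filter (assumed stand-in for the PLL); one output per full window."""
    window_sum = sum(values[:length])
    out = [window_sum / length]
    for n in range(length, len(values)):
        window_sum += values[n] - values[n - length]
        out.append(window_sum / length)
    return out

def max_filtered_error(samples, bits, length):
    """Largest difference between the filtered bit stream and the equally filtered input."""
    smooth_in = moving_average(samples, length)
    smooth_out = moving_average(bits, length)
    return max(abs(a - b) for a, b in zip(smooth_in, smooth_out))
```

For example, encoding `sine_samples(8192, 8, 0.5)` and calling `max_filtered_error` with a 256-sample window shows a residual far smaller than the sample-by-sample difference between the raw +1/-1 stream and the input, which is the behavior the text relies on.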
V. NOISE-SHAPED SINUSOID GENERATION
The basis of the jitter generation methods described in the previous two sections is low-pass delta–sigma modulation. This technique allows the encoding of a bandlimited signal represented by a large number of bits onto a very small number of bits. An error is created by this quantization operation but it is filtered such that it appears mostly at high frequencies. This feat is realized by feeding back the quantization error to a filter and summing it with the input before the next quantization. Using delta–sigma modulation, arbitrary signals, such as a sinusoid, can be represented by a square wave (1-b signal) with the difference being composed mostly of high-frequency noise.

Fig. 9 shows the spectrum of a sinusoid for jitter generation coded using different quantizers. Two curves show the power density spectrum of multibit signals for the digital phase modulation method. They were generated for different test clock ratios ( ) and thus different quantization steps ( ). The last curve shows a 1-b signal for the signal injection method. The same second-order low-pass delta–sigma modulator was used to produce each curve; only the quantizer was changed. The input signal amplitude was kept the same in each case. As explained previously, the signal is located at low frequencies where the noise power is small. It can also be seen that the quality of the signal improves with the number of quantization steps (smaller step size ). This is evident by the drift in the curve downward in Fig. 9 as decreases. Three possible methods, illustrated in Fig. 10, may be employed on-chip to generate any of the signals of Fig. 9. The selection of one of these for implementation is based on the available resources in the system, silicon area, PLL speed, and required accuracy. Each method is described below.

A. Direct Frequency Synthesis

The most straightforward and versatile signal generation scheme is a ROM-based digital frequency synthesizer [8] followed by a low-pass delta–sigma modulator. The output of the modulator may be a 1-b signal or a multibit signal [9] as required by the jitter creation technique used. However, the direct frequency synthesis solution requires a large silicon area and is thus usually not a good choice for built-in self-test (BIST). Nevertheless, if a large block of RAM is present in the system, then it could be enlisted to store data for signal generation during the PLL test phase.

B. Delta–Sigma Oscillator

An alternative is to use a circuit called a low-pass delta–sigma oscillator [10]. The circuit, illustrated in Fig. 10(b), is a digital resonator where the frequency setting multiplier has been replaced by the combination of a delta–sigma modulator and

a multiplexer. To avoid large multiplexers, the output of the delta–sigma oscillator is usually a single bit. Therefore, a second delta–sigma modulator may be required for multibit signal generation. However, as the structure is similar to the first one except for the quantizer, hardware can be time-shared. Because a ROM or a multiplier is not required, the silicon implementation of delta–sigma oscillators is very efficient. However, the presence of a nonlinear block in the feedback loop makes this device difficult to predict using a linear model. Off-line simulations are required to achieve maximum accuracy. Nevertheless, delta–sigma oscillators can be implemented quickly and may possibly use available on-chip computing resources such as a digital signal processor (DSP).

C. Fixed-Length Bit Streams

The last method consists in generating a data stream from a software sinewave generator and low-pass delta–sigma modulator and then selecting a subset of this stream. This subset is stored in memory and the resulting data stream is then repeated [11] as illustrated in Fig. 10(c). One can view this approach as a special case of the ROM-based digital frequency synthesis scheme presented above. This method is particularly useful for single-bit signals as they can be represented using a very small number of bits, on the order of a hundred. The downside is that signal quality will vary widely with different subsets of the same length of a given modulator output if care is not taken. Some form of optimization is necessary to obtain good results [12]. Also, a different data stream is required for each frequency and amplitude desired. Nonetheless, considerable speed can be achieved with this technique, and the overhead is the smallest of the three methods.

Fig. 10. Generating delta–sigma encoded sinusoids: (a) direct digital frequency synthesis, (b) delta–sigma oscillator, and (c) fixed-length periodic bit stream.

VI. EVALUATION OF THE OUTPUT JITTER

The exact gauging of the jitter response is rather difficult. However, a measure that can be made with good accuracy is the point when reaches a predetermined value or threshold. The edges of the test clock, assumed to be jitter free, will be used to establish this threshold. Fig. 11 illustrates how this can be accomplished. The dashed lines represent the rising edge of the test clock with the bold ones indicating the rising transition of the reference clock. The output signal is sampled at positive test clock edges until two adjacent samples yield a zero followed by a one, indicating a rising edge of the signal. If this rising edge occurred in intervals immediately before or after the reference test clock edge, then jitter is below threshold; otherwise, an error is generated. Over a time interval, the number of errors is counted and a bit error rate (BER) measure can be obtained. This averaging is done to prevent glitches and noise signals from significantly affecting the final result. In Fig. 11, a ratio of the test clock frequency over the reference signal frequency of eight allows a minimum value for the threshold of . Larger values could also be used for the threshold by allowing more test clock cycles in the valid interval.

Fig. 11. Comparing waveform against jitter threshold.

The previous method requires a test clock frequency at least three times the reference frequency. Indeed, three periods of the test clock for a PLL period is the limiting case as two of the clock periods must compose the valid interval, leaving one to catch jitter above threshold. A different scheme, however, can be used that compares jitter against a π rad threshold using a 50% duty cycle test clock of the same frequency as the reference clock. Fig. 12(a) illustrates how it can be implemented with a few gates and registers. The threshold is verified by sampling a data signal with both the reference clock, used at the input of the PLL, and the recovered clock at the VCO output. This data signal will toggle between one and zero at the reference signal falling edges, resulting in a frequency half that of the reference clock. When the output jitter exceeds π rad, sampling errors will occur with the VCO output clock because it will sample a different data value than the reference clock. This is shown in Fig. 12(b) where an error occurs when the jitter goes from 0.9π to 1.1π. This circuit is somewhat similar to the circuit typically used for a jitter tolerance test [2].

A single frequency point test is thus performed as follows: an input jitter ( or ) is applied to the PLL and its amplitude is modified until the maximum value resulting in output jitter below threshold is found. The fastest procedure for obtaining or is to use a binary search algorithm. The initial amplitude ( ) is zero and the initial increment ( ) is 0.5. A test is performed with amplitude . If the VCO output jitter is lower than the threshold, then the next amplitude will be . Otherwise, the

amplitude remains the same ( ). The increment is then divided by two and the procedure is repeated until the desired accuracy is achieved. The uncertainty associated with the signal amplitude following a step measure will be which equals 2 . However, arbitrary accuracy cannot be achieved as noise sources are always present.

Fig. 12. (a) Circuit to evaluate π rad jitter threshold. (b) Typical waveforms.

VII. ACCURACY

The accuracy of the measured results will depend on many factors. Foremost is the residual input signal jitter quantization noise which passes through the PLL. The jitter creation clock frequency along with the delta–sigma modulator noise transfer function, the number of quantization bits, and the PLL bandwidth will influence the effective SNR of the measured jitter at the output. It is interesting to note that for each frequency point on the jitter transfer function, the SNR of the measured output jitter will remain constant. This is because, under a white noise assumption for the quantizer error, the noise present in the PLL input signal jitter does not vary for a selected delta–sigma modulator implementation. It is only a function of the quantizer step and does not change with the signal amplitude and frequency. Neglecting internal noise sources, the PLL output jitter noise power density is a result of the input jitter noise power density shaped by the jitter transfer function. It can thus be seen that the PLL output jitter noise power is independent of the parameters of the test sinusoid. Furthermore, as the threshold is fixed, the PLL output sinewave jitter amplitude must also be constant. In the measurement procedure, the input jitter sinewave amplitude is varied to account for the response of the PLL at a given frequency. Consequently, for a given PLL, the SNR of the measured output jitter is fully determined from the quantizer step size for the input signal jitter and the jitter threshold at the output. Fig. 13 shows the theoretical SNR of the output jitter with respect to the ratio of the cutoff frequency of a second-order PLL over the PLL lock frequency, referred to here as the PLL relative bandwidth. A curve is displayed for each of the three input jitter quantization granularities. A second-order delta–sigma modulator is used to generate the input signal, and the output jitter amplitude is held constant at . For reference, digital data communication systems such as SONET mandate a relative loop bandwidth of about 0.1%.

Fig. 13. Signal-to-noise ratio of the output jitter versus the PLL relative bandwidth.

Also of concern is the amount of jitter present in the test clock. However, the clock signal should be generated by a tester and, provided a sound floorplan, this source of error should be negligible. A more significant problem is the jitter, both static and random, generated internally. It will add to the jitter from the input signal and thus modify the effective output jitter threshold. Its effect can only be reduced by using a jitter threshold much larger than the jitter noise. Alternatively, it could be accounted for and subtracted from the final results. Finally, for the signal injection method, the matching of the two charge pumps is obviously a cause of errors.

VIII. EXPERIMENTAL SETUP

A test setup, whose schematic is shown in Fig. 14, was implemented on a breadboard using off-the-shelf components. The device under test (DUT) is centered around the VCO from a 74HC4046 monolithic phase-locked loop. However, because this IC uses a voltage charge pump followed by a passive filter and since its phase detector could not be separated from this block, an XC4010 FPGA was programmed to implement the classical phase-frequency detector [13]. The charge pump is built out of discrete NPN and PNP transistors, a resistor, and analog switches from a 74HC4066. A circuit is also required to maintain the transistors inside their linear region when the switches are open, but it is not shown here. The PLL was operated at 100 kHz as parasitics of the board and the time constant of some components do not allow for a higher frequency. However, the measurement scheme should be extendable to much higher frequency as the test circuits are similar in nature to the PLL components.

Both digital signals for jitter creation ( and ) are generated by a low-pass delta–sigma oscillator programmed on the same field-programmable gate array (FPGA). It uses 24-b buses to achieve a tunability of 55 parts per million of its programmable clock. A 3-b quantizer in addition to the standard 1-b quantizer makes the signal generator capable of

multibit output [refer to Fig. 10(b)] for the purpose of digital phase modulation. It should be noted that apart from the quantizer, delta–sigma modulator circuitry is not duplicated as it will be operated in time-shared mode at double speed for multibit operation.

Fig. 14. Experimental setup.

The input to the PLL can be set to accommodate both jitter generation methods. For the loop jitter injection scheme, a 100-kHz square wave is presented to the input of the phase detector. The input can also be the same signal phase modulated with the help of an 800-kHz test clock. Eight jitter steps are therefore possible, resulting in a quantization.

Various jitter threshold circuits are also implemented on the FPGA. The jitter threshold circuit of Fig. 12(a) will be employed in conjunction with the loop jitter injection. On the other hand, thresholds of and are implemented for the digital phase modulation method, making use of the higher frequency test clock.

For each test, a warm-up stage of 2^14 data cycles is executed to remove transients before a 2^16 data cycle test stage is performed. The error threshold is set to 64, corresponding to a BER of 10^-3. A control module built around a finite state machine selects the amplitude of the input jitter for the ensuing test according to the output of the jitter threshold circuit, using the binary search algorithm. At each frequency point, the amplitude is resolved to an accuracy of 15 b within 13 s. The entire digital circuitry for all the experiments requires 81% of the resources of an XC4010 FPGA. This experimental setup is connected to a workstation through I/O modules to allow a driving software to set the low-pass delta–sigma oscillator frequency as well as read the amplitude.

IX. EXPERIMENTAL RESULTS

The jitter transfer function measurement was carried out for both the jitter injection and the digital phase modulation techniques on two different PLL configurations with different bandwidths and damping values. Table I summarizes the main parameters of these experiments. The same current amplitude was used for both charge pumps ( ). The transfer functions are presented in the continuous-time domain as this is more typical of what can be found in industry.

TABLE I
EXPERIMENT PARAMETERS

The results of the experiments on the first configuration are shown in Fig. 15. A measured jitter transfer function for each jitter generation method is displayed. The dotted line represents the theoretical jitter transfer function as predicted from the direct measurement of the components. The phase modulation scheme used a threshold for this experiment. The curve shows a 0.4-dB offset which can be attributed mostly to the static jitter of the PLL. For the other jitter creation scheme, the signal injection clock was chosen to be 50 kHz, that is half the PLL rate, in order to demonstrate the flexibility in selecting this parameter. The offset is larger, possibly because of mismatch between the two charge pumps realized out of discrete transistors. The first two columns of Table II summarize the features of the curves after removal of the offsets. Both methods yield similar results for the PLL bandwidth and jitter peaking. The theoretical predictions are slightly off, most probably because of the parasitics of the setup which were not accounted for in the calculations.

The jitter transfer functions measured in the second experiment are shown in Fig. 16. This PLL exhibits a larger bandwidth and is more damped. It can be seen that the curve obtained here with the jitter injection technique is of lesser quality. This came about because the larger bandwidth yields a lower output jitter SNR. From the graph of Fig. 13, it can be seen that this SNR is barely over 20 dB. On the other hand, the digital phase modulation still shows a smooth curve because of the 3-b quantization which results in lower jitter

noise levels at the output. Again, the meaningful parameters are summarized in the two right-most columns of Table II.

Fig. 15. Jitter transfer function for experiment 1.

TABLE II
RESULTS SUMMARY

Fig. 16. Jitter transfer function for experiment 2.

X. IMPLEMENTATION

For any integrated measurement scheme, the area overhead is obviously a major concern. While the digital portion of the experimental setup uses a large portion of the FPGA, a much more compact implementation is possible. Indeed, a delta–sigma oscillator was selected as the signal generator because of its versatility as a complete jitter transfer function was sought. However, in many applications, a smaller number of signal frequencies and amplitudes are necessary and a fixed-length periodic bit stream could generate the signals for the cost of a few kilobits of RAM. For example, to verify that the bandwidth of the PLL is smaller than some value, only one test is required. Specific values for overhead or measurement time depend heavily on the PLL application. Moreover, one should not have a dogmatic stance about overhead as the addition of the on-chip measurement circuits adds value to a system. The economic gains from a BIST may far outweigh the cost of extra silicon.

XI. CONCLUSIONS

We have presented a PLL jitter transfer function measurement technique which is entirely digital except for the possible addition of a charge pump. The technique is suitable for on-chip measurement since it does not require trimming and the silicon overhead is small. Two methods were introduced for the creation of jitter, allowing tradeoffs between test clock frequency on one side and loading, complexity, and accuracy on the other side. Experimental results were presented which suggest this scheme could be successfully implemented on silicon.

ACKNOWLEDGMENT

The authors acknowledge the suggestions of B. Gerson from PMC Sierra.

REFERENCES

[1] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Commun., vol. COMM-28, pp. 1849–1857, Nov. 1980.
[2] L. DeVito, “A versatile clock recovery architecture and monolithic implementation,” in Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, B. Razavi, Ed. New York: IEEE Press, 1996.
[3] E. H. Armstrong, “A method of reducing disturbances in radio signaling by a system of frequency modulation,” in Proc. IRE, May 1936, vol. 24, no. 5, pp. 689–740.
[4] P. Goteti, G. Devarayanadurg, and M. Soma, “DFT for embedded charge-pump PLL systems incorporating IEEE 1149.1,” in Proc. IEEE 1997 CICC, Santa Clara, CA, May 1997, pp. 210–213.
[5] M. W. Hauser, “Principles of oversampling A/D conversion,” J. Audio Eng. Soc., vol. 39, nos. 1/2, pp. 3–26, Jan./Feb. 1991.
[6] M. F. Toner and G. W. Roberts, “A BIST scheme for an SNR, gain tracking, and frequency response test of sigma–delta ADC,” IEEE Trans. Circuits Syst.–II, vol. 41, pp. 1–15, Jan. 1995.
[7] J. P. Hein and J. W. Scott, “z-domain model for discrete-time PLL’s,” IEEE Trans. Circuits Syst., vol. 35, pp. 1393–1400, Nov. 1988.
[8] J. Tierney, C. M. Rader, and B. Gold, “A digital frequency synthesizer,” IEEE Trans. Audio Electroacoustic, vol. 19, pp. 48–57, 1971.
[9] J. G. Kenney and L. R. Carley, “Design of multi-bit noise shaping data converter,” Analog Integrated Circuits and Signal Processing J., May 1993, vol. 3, no. 3, pp. 259–272.
[10] A. K. Lu, G. W. Roberts, and D. A. Johns, “A high-quality analog oscillator using oversampling D/A conversion techniques,” IEEE Trans. Circuits Syst.–II, vol. 41, pp. 437–444, July 1994.
[11] E. M. Hawrysh and G. W. Roberts, “An integration of memory-based analog signal generation into current DFT architectures,” in Proc. 1996 ITC, Washington, DC, Oct. 1996, pp. 528–537.
[12] B. Dufort and G. W. Roberts, “Signal generation using periodic single and multi bit sigma–delta modulated streams,” in Proc. IEEE 1997 ITC, Washington, DC, Nov. 1997, pp. 396–405.
[13] C. A. Sharpe, “A 3-state phase detector can improve your next PLL design,” EDN Mag., pp. 55–59, Sept. 1976.

Benoît R. Veillette (S’97) was born in Trois-Rivières, Québec, Canada, on January 1, 1971. He received the B.Eng. (Honors) degree and the M.Eng. degree from McGill University, Montréal, PQ, Canada, in 1993 and 1995, respectively. He is now completing the Ph.D. degree in electrical engineering from the same institution.
His current research interests are in delta–sigma modulation, analog integrated circuits for communications, and mixed-signal testing.

Gordon W. Roberts (S’85–M’85) was born in Toronto, Canada, in 1959. He received the B.A.Sc. degree in electrical engineering from the University of Waterloo in 1983 and the M.Eng. and Ph.D. degrees also in electrical engineering from the University of Toronto in 1986 and 1989.
In 1989 he joined the faculty of McGill University where he is presently an Associate Professor. He co-authored several text books and has contributed seven chapters to various edited volumes related to analog IC design and test.
He is presently an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS and Editor of the IEEE Design and Test Magazine. He has received numerous department and faculty awards for teaching, as well as several IEEE awards for his work related to mixed-signal testing.
Physical Processes of Phase Noise in Differential LC Oscillators

Systems Laboratory
University of California
Los Angeles, CA 90095-1594

Introduction
There is an unprecedented interest among circuit designers today to obtain insight into mechanisms of phase noise in LC oscillators. For only with this insight is it possible to optimize oscillator circuits using low-quality integrated resonators to comply with the exacting phase noise specifications of modern wireless systems. Various numerical simulators are now available to assist the circuit designer [1], [2], [3], in some cases accompanied by qualitative interpretations [4]. At present, therefore, the situation of the oscillator designer is similar to the designer of amplifiers who is equipped only with SPICE, but who lacks physical insight and methods for simple yet accurate analysis with which to optimize a circuit.
Over the years, various attempts at phase noise analysis have produced results that are variations on Leeson’s classic “heuristic derivation without formal proof” [5], [6]. These analyses are based on a linear model of an LC resonator in steady-state oscillation through application of either feedback or negative conductance. The results confirm Leeson by showing that phase noise is proportional to noise-to-carrier ratio and inversely to the square of resonator quality factor. However, without knowledge of the constant of proportionality, which Leeson leaves as an unspecified noise factor, the actual phase noise cannot be predicted.
It is now well understood that the large-signal periodic switching of a self-limited oscillator [7] underpins this noise factor [8]. At first sight, an accurate noise analysis of an oscillator subject to periodic bias currents appears intractable; however, by using sensible approximations, Huang has solved this problem for a Colpitts oscillator [9] and obtained good agreement between analysis and measurements of thermally induced phase noise. The mechanisms of flicker noise upconversion, which are important in CMOS oscillators, remain obscure.
In this paper we concentrate on an understanding of the popular differential LC oscillator. We introduce simple models to capture the nonlinear processes that convert voltage or current thermal noise in resistors or transistors into phase noise in the oscillator. The analysis does not require hypothetical elements, such as limiters or amplitude control loops, to fully explain phase noise. A simple expression at the end accurately specifies thermally induced phase noise, and lends substance to Leeson’s original hypothesis.

The results are validated against SpectreRF simulations and measurements on two differential CMOS oscillators tuned by resonators with very different Q’s.

Recognizing Phase Noise
For the purposes of analysis, a noise spectrum is considered as consisting of uncorrelated sinewaves in a 1 Hz bandwidth at any given frequency. Voltage or current noise produces amplitude and phase fluctuations when superimposed on a periodic signal (from now on, a large sinewave V_0 sin(2πf_0 t)). This is clearly seen [10] by isolating one sinewave v_n in the noise spectrum, say at a frequency offset +f_m from the sinewave frequency f_0. Figure 1 shows this as a phasor v_n rotating relative to the sinewave phasor V_0, which is then decomposed into two equal collinear phasors at +f_m, and two anti-phase conjugate phasors which are assigned a negative relative frequency −f_m. Grouping the phasors pairwise as ±f_m, it is seen that one pair modulates the amplitude of the sinewave with time (AM), while the other sweeps its phase (PM). Thus, half of any additive noise on a sinewave produces phase noise, the other half amplitude noise. When sin(ω_0 t) is accompanied either by noise sinewave phasors +sin(ω_0+ω_m)t, −sin(ω_0−ω_m)t or by +cos(ω_0+ω_m)t, +cos(ω_0−ω_m)t, then phase noise alone is present.

Simple Model of the Differential Oscillator
This paper treats the well-known tail-current biased differential LC oscillator (Figure 2). In steady state, the differential pair acts as a negative conductance that switches the tail current I_0 into the LC resonator. Owing to filtering in the LC circuit, the square wave of current creates a sinusoidal voltage across the resonator of amplitude (4/π)I_0 R. This voltage drives the differential pair into switching, thus sustaining oscillation. In a CMOS oscillator the amplitude may build up to several volts, eventually limited by the supply voltage.
In previous work on noise in mixers [11], we have shown how a simple model of the switching differential pair is sufficient to explain all frequency translations of noise. This model is used here. Suppose that some noise (v_n) accompanies the resonator sinewave. Assuming that a small fraction of the resonator voltage around the zero crossing is enough to fully switch the differential pair, then the noise simply advances or retards the instant of zero crossing (Figure 3(a)). The randomly pulse-width modulated current at the switch output may be decomposed into the original periodic square
Next, the upconversion of flicker noise into phase noise is traced wave in the absence of noise, superimposed with pulses of con-
to mechanisms first identified in the 1930’s, but apparently since stant height but random width (Figure 3(b)). In turn, these pulses
forgotten. Unlike thermally induced phase noise, which appears as may be approximated by a train of impulses at twice the oscillation
phase modulation sidebands, flicker noise is shown to upconvert by frequency multiplying the original noise waveform vn(t) (Figure
bias-dependent frequency modulation.
3(c)).
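The split of a single noise sideband into a PM pair and an AM pair, as described for Figure 1, can be checked numerically. The following is a minimal sketch, not taken from the paper: the function name `split_sideband` and the sideband amplitude `eps` are illustrative assumptions. It writes one upper sideband `eps*sin((w0+wm)t)` as the sum of an antisymmetric sine pair (in quadrature with the carrier `sin(w0 t)`, hence PM) and a symmetric sine pair (in phase with the carrier, hence AM).

```python
import math

def split_sideband(eps, w0, wm, t):
    """Split eps*sin((w0+wm)*t) into (pm_pair, am_pair) evaluated at time t.

    eps is a hypothetical small sideband amplitude relative to a unit carrier.
    """
    upper = math.sin((w0 + wm) * t)
    lower = math.sin((w0 - wm) * t)
    # Antisymmetric pair: equals eps*cos(w0*t)*sin(wm*t), in quadrature with the carrier.
    pm_pair = 0.5 * eps * (upper - lower)
    # Symmetric pair: equals eps*sin(w0*t)*cos(wm*t), in phase with the carrier.
    am_pair = 0.5 * eps * (upper + lower)
    return pm_pair, am_pair
```

Each pair consists of two sidebands of amplitude `eps/2`, so each pair carries half the power of the original single sideband, which is the sense in which half of any additive noise appears as phase noise.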

0-7803-5809-0/00/$10.00 © 2000 IEEE. IEEE 2000 Custom Integrated Circuits Conference.
Thermally Induced Phase Noise

Resonator Noise
Now consider a current source $i_n\sin((\omega_0+\omega_m)t+\phi)$ representing noise in the loss conductance of the resonator, where $i_n^2=4kT/R$. According to the model above, this modulates the zero crossing instants of the differential pair, producing a current which, in addition to the usual square wave, also consists of current pulses sampling this noise at $2\omega_0$. After sampling, frequency components appear at $\omega_0\pm\omega_m$, $3\omega_0\pm\omega_m$, ... However, usually the resonator will filter the 3rd and higher harmonics, leaving $\omega_0\pm\omega_m$ as the only important terms. These will induce a symmetric voltage response in the resonator, and through feedback arrive at steady state. The steady-state oscillation, in general, is of the form:

$$v_{out} = V_0\sin\omega_0 t + A\sin(\omega_0-\omega_m)t + B\cos(\omega_0-\omega_m)t + C\sin(\omega_0+\omega_m)t + D\cos(\omega_0+\omega_m)t$$

and here $A=-C=i_n\times(L\omega_0^2/4\omega_m)$, while $B=D=0$. The relative signs of $A$ and $C$ prove that the steady-state response to current noise in the resonator's resistor is phase noise in the oscillator. The single-sideband phase noise density is found by the ratio of the sideband power at a given frequency to the power in the fundamental oscillation frequency. Thus, the thermally induced phase noise density due to resonator loss is:

where $N_1=2$, the number of loss sources (in the left and right resonators) and $N_2=4$ because uncorrelated quadrature noise originating at $\omega_0\pm\omega_m$ contributes to SSB phase noise at offset $\omega_m$.

Tail Current Noise
The switching action of the differential pair commutates noise in the tail currents like a single-balanced mixer. The noise is translated up and down in frequency, and enters the resonator. The resulting voltage drives the differential pair, the noise components modulating the zero crossing instants. The resulting impulses of current feed back into the resonator. The steady-state solution is found by solving simultaneous equations of a form that anticipates the end result, much like in any feedback circuit.

The single-balanced mixer shows the largest conversion gain around the fundamental switching frequency, 1/3rd the current conversion gain around the 3rd harmonic, and so on. Therefore, only mixing by the fundamental at $\omega_0$ is important. Noise originating in the tail current at $\omega_m$ upconverts to $\omega_0\pm\omega_m$. Similarly, noise at $2\omega_0\pm\omega_m$ downconverts to $\omega_0\pm\omega_m$.

Analysis shows that the upconversion produces coefficients $A=C$, $B=-D$, both of which indicate AM only. It should be noted that AM noise superimposed on the resonator fundamental frequency does not modulate the zero crossings of the switching differential pair, and therefore does not propagate in the feedback loop back into the resonator. However, the downconversion results in phase noise only, with $A=-C$, and $B=D=0$. The phase noise caused by thermal noise originally at $2\omega_0$ is:

where $\gamma$ is the noise factor of a single FET, classically 2/3. It is important to note that the AM noise resulting from upconversion, if impressed across a varactor at the resonator, will modulate the varactor, and thus the oscillation frequency, by AM-to-FM conversion [12]. Although the process is different, the resulting sidebands are indistinguishable from PM noise sidebands. Unlike the other mechanisms of phase noise, this effect depends on the varactor characteristics and VCO tuning range and it may be significant only in certain situations.

Differential Pair Noise
Noise originating in the differential pair is unlike the previous two cases. There, only certain parts of the noise spectrum contributed significantly to the total phase noise. White noise in the resonator is filtered at harmonics of the resonant frequency. White noise in the tail current only experiences a significant conversion gain around the second harmonic of the oscillation frequency. However, the simple model says that an impulse train samples white noise in the differential pair, which if true, will cause it to accumulate without bound at any specified offset frequency $\omega_m$.

In reality, any practical differential pair requires a non-zero input voltage excursion to switch, and this is provided by the oscillation waveform across the resonator. Therefore, noise in the differential pair is actually not sampled by impulses, but by time windows of finite width. The window height is proportional to transconductance, and width is set by tail current, and slope of the oscillation waveform at zero crossing. The input-referred noise spectral density of the differential pair is inversely proportional to transconductance. Thus, the narrower the sampling window, that is, the larger the sampling bandwidth, the lower the noise spectral density [11]. Analysis shows that the noise-bandwidth product is constant, and produces pure phase noise. After taking into account the accumulation of frequency translations throughout the sampling bandwidth, the following compact yet exact expression is reached:

We note that [8] has arrived at a similar analysis for the first two sources of noise, but was unable to obtain a closed-form expression for this last term.

Proving Leeson's Hypothesis
Leeson originally postulated that thermally induced phase noise in any oscillator takes the form:

where $F$ is an unspecified noise factor. By summing the expressions obtained above for thermally induced phase noise arising from the resonator, differential pair and tail bias current, respectively, for the differential oscillator Leeson's noise factor is:

We emphasize that this simple expression captures all nonlinear effects and frequency translations. At low bias currents while the

amplitude of oscillation is smaller than the power supply, the differential pair acts as a pure current switch driving the resonator and $V_0=(4/\pi)RI_T$ [13]. Then the second term comprising $F$ simplifies to $2\gamma$. This means that as tail current increases and assuming $g_{mbias}R$ is held constant, the noise factor remains constant and phase noise improves as $V_0^2$, that is, as $I_T^2$. This has been observed by others [13]. However, beyond a critical tail current the amplitude $V_0$ is pegged constant, limited by supply voltage. Further increases in $I_T$ will cause the differential pair's contribution to noise factor to rise, degrading phase noise proportionally to $I_T$ (Figure 4). Therefore, for least phase noise the tail current should be just enough to drive the amplitude to its maximum possible value.

Flicker Noise Upconversion
Close-in to the oscillation frequency, the slope of the phase noise spectrum in all CMOS VCO's turns from -20 to -30 dB/decade. This is ascribed to the upconversion of flicker noise in FETs. To understand this, let us first see if the analysis above explains this upconversion.

Flicker noise in the tail current source at frequency $\omega_m$ indeed upconverts to $\omega_0\pm\omega_m$ and enters the resonator, but as AM, not PM noise. Therefore, in the absence of a high gain varactor to convert AM to FM, flicker noise in the tail current will not appear as phase noise. Next consider flicker noise in the differential pair. The preceding analysis says that this modulates zero crossings, and injects a noise current into the resonator consisting of flicker noise sampled by an impulse train with frequency $2\omega_0$. Thus noise originating at frequency $\omega_m$ produces currents at $\omega_m$ and at $2\omega_0\pm\omega_m$. Both frequencies are strongly attenuated in the resonator, and neither explains flicker-induced phase noise at $\omega_0\pm\omega_m$. One can only conclude that the mechanisms of flicker noise upconversion are quite different than for thermally induced phase noise.

Fundamental Sources of FM in Oscillators
In 1934, Groszkowski [15] while studying electronic oscillators realized that the steady-state oscillation frequency seldom coincides with the natural frequency of the resonator which tunes the oscillator. He found that the discrepancy arises because the active device in the oscillator, such as the differential pair current switch in the circuit considered here, drives the resonator with a harmonic-rich waveform. The harmonics will flow into the lower impedance capacitor (Figure 5) and upset the exact reactive power balance between the L and the C required for steady state. Now the frequency of oscillation must shift down until the reactive power in the inductor increases to equal the reactive power in the capacitor due to the fundamental and all harmonics. The shift, $\Delta\omega$, is:

$$\frac{\Delta\omega}{\omega_0} = \frac{1}{2Q^2}\sum_{n=2}^{\infty}\frac{n^2(1-n^2)}{(1-n^2)^2+n^2/Q^2}\,m_n^2$$

where $m_n$ is the normalized level of the $n$th harmonic. $\Delta\omega$ is the sum of all negative terms, which means that oscillation frequency slows down with more harmonic content. Now the harmonic content at the output of a periodically switching differential pair is a function of the tail current. In the autonomous oscillator, the drive to the differential pair is also a function of tail current. The sensitivity $\partial\omega/\partial I_T$ is responsible for an "indirect" FM [7] due to flicker noise in $I_T$.

However, this is not the only mechanism of indirect FM. At RF, active device capacitance is also significant, and it no longer appears as a pure negative resistance to the resonator. For example, the differential pair commutates current flowing in the capacitor $C_T$ at the tail, which presents a negative capacitor (or, equivalently, an inductor in a narrowband sense) at the differential output (Figure 6). This speeds up the oscillation frequency. Flicker noise in the differential pair FETs modulates the duty cycle of commutation, and therefore the effective negative capacitance. Here, too, Groszkowski gives a method of systematic analysis [16], which captures the reactive components in the active devices by measuring the area $\Pi$ enclosed by hysteresis in the dynamic negative resistance curve.

$$\frac{\Delta\omega}{\omega_0} = -\frac{\Pi}{2Q^2\omega_0 L} + \frac{1}{2Q^2}\sum_{n=2}^{\infty}\frac{n^2(1-n^2)}{(1-n^2)^2+n^2/Q^2}\,m_n^2$$

Thus the sensitivity of the reactance to bias current or offset voltage in the differential pair is estimated, which is another means whereby flicker noise modulates the frequency of oscillation.

Validation of Analysis
The phase noise model was validated on two CMOS differential LC oscillators. One oscillator uses a low Q, on-chip inductor, while the other uses off-chip inductors with large Q. Flicker noise is modelled as a bias-independent, gate-referred voltage source [14]. The measured data and SpectreRF simulations are plotted with predictions based on this paper. Excellent agreement (Figure 7) is found across the entire spectrum, which encompasses thermally induced phase noise and upconverted flicker noise.

References
[1] K. S. Kundert, "Introduction to RF simulation and its application," IEEE Journal of Solid-State Circuits, pp. 1298-319, 1999.
[2] A. Demir, A. Mehrotra, and J. Roychowdhury, "Phase noise in oscillators: a unifying theory and numerical methods for characterisation," in Design and Automation Conference, San Francisco, pp. 26-31, 1998.
[3] B. De Smedt and G. Gielen, "Accurate simulation of phase noise in oscillators," in European Solid-State Circuits Conference, pp. 208-11, 1997.
[4] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," IEEE Journal of Solid-State Circuits, vol. 33, no. 2, pp. 179-94, 1998.
[5] D. B. Leeson, "A Simple Model of Feedback Oscillator Noise Spectrum," Proceedings of the IEEE, vol. 54, pp. 329-330, 1966.
[6] J. Craninckx and M. Steyaert, "Low-noise voltage-controlled oscillators using enhanced LC-tanks," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 42, no. 12, pp. 794-804, 1995.
[7] K. K. Clarke and D. T. Hess, Communication Circuits: Analysis and Design. Malabar, FL: Krieger, 1971.
[8] C. Samori, A. L. Lacaita, F. Villa, and F. Zappa, "Spectrum folding and phase noise in LC tuned oscillators," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 45, no. 7, pp. 781-90, 1998.
[9] Q. Huang, "On the exact design of RF oscillators," CICC Proceedings, pp. 41-4, 1998.
[10] W. P. Robins, Phase Noise in Signal Sources. London: Peter Peregrinus, 1982.
[11] H. Darabi and A. Abidi, "Noise in CMOS Mixers: A Simple Physical Model," IEEE Journal of Solid-State Circuits, vol. 35, no. 1, in press, 2000.
[12] C. Samori, A. L. Lacaita, A. Zanchi, S. Levantino, and F. Torrisi, "Impact of Indirect Stability on Phase Noise Performance of Fully-Integrated LC Tuned VCOs," in European Solid-State Circuits Conference, Duisburg, Germany, pp. 202-205, 1999.
[13] A. Hajimiri and T. H. Lee, "Phase Noise in CMOS Differential LC oscillators," in Symposium on VLSI Circuits, Honolulu, HI, pp. 48-51, 1998.
[14] J. Chang, A. A. Abidi, and C. R. Viswanathan, "Flicker Noise in CMOS Transistors from Subthreshold to Strong Inversion at Various Temperatures," IEEE Transactions on Electron Devices, vol. 41, pp. 1965-1971, 1994.
[15] J. Groszkowski, "The Interdependence of Frequency Variation and Harmonic Content, and the problem of Constant-Frequency Oscillators," Proc. of the IRE, vol. 21, no. 7, pp. 958-981, 1934.
[16] J. Groszkowski, Frequency of Self-Oscillations. Oxford: Pergamon Press, 1964.
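Groszkowski's harmonic-content frequency shift discussed under "Fundamental Sources of FM in Oscillators" lends itself to a quick numerical evaluation. The sketch below is illustrative only and assumes the shift takes the form $\Delta\omega/\omega_0 = \frac{1}{2Q^2}\sum_{n\ge 2} \frac{n^2(1-n^2)}{(1-n^2)^2+n^2/Q^2} m_n^2$, with every summand negative as the text states. The function name `groszkowski_shift` and the harmonic levels $m_n = 1/n$ (odd harmonics of an ideal square wave) are assumptions for the example, not values from the paper; the two Q values are the loaded Q figures quoted for the validation oscillators.

```python
def groszkowski_shift(q, harmonics):
    """Fractional frequency shift d_omega/omega_0 for resonator quality factor q.

    harmonics maps harmonic number n (n >= 2) to its normalized level m_n.
    """
    total = 0.0
    for n, m_n in harmonics.items():
        if n < 2:
            raise ValueError("harmonic numbers start at n = 2")
        numerator = n**2 * (1 - n**2)            # negative for every n >= 2
        denominator = (1 - n**2) ** 2 + n**2 / q**2
        total += numerator / denominator * m_n**2
    return total / (2 * q**2)

# Hypothetical harmonic content: odd harmonics of an ideal square wave, m_n = 1/n.
square_wave = {n: 1.0 / n for n in range(3, 40, 2)}
shift_low_q = groszkowski_shift(6.0, square_wave)    # loaded Q of 6
shift_high_q = groszkowski_shift(25.0, square_wave)  # loaded Q of 25
```

Under these assumptions both shifts are negative, and the low-Q resonator is pulled further from its natural frequency, which is consistent with the claim that more harmonic content slows the oscillation.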

Figure 1. Noise phasor added to a sinewave decomposes into PM and AM sidebands.

Figure 2. Differential LC oscillator biased by tail current.

Figure 3. (a) Noise at input of differential pair modulates instants of zero crossing. (b) Output current consists of square wave, plus random noise pulses. (c) Noise pulses modelled as a train of impulses sampling noise waveform.

Figure 4. Increasing tail current first causes amplitude to rise, until limited by supply. Phase noise diminishes with rising amplitude, then worsens due to higher noise factor. (Axes: phase noise and oscillation amplitude versus bias current $I_T$.)

Figure 5. Harmonics of oscillating current flow into capacitor, increasing its reactive energy. Steady state frequency shifts down until inductor energy balances.

Figure 6. Capacitors associated with active device appear as reactances across the resonator, shifting frequency.

Figure 7. Validation of the analysis presented in this paper. Measured phase noise is compared with predictions from analysis, and with SpectreRF simulations. (a) 0.35-µm CMOS 1.1 GHz oscillator using resonator with loaded Q of 6. (b) 0.25-µm CMOS 830 MHz oscillator using discrete inductor with loaded Q of 25. (Axes: phase noise versus offset frequency, 1 to 1000 kHz.)

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998 179

A General Theory of Phase Noise in Electrical Oscillators
Ali Hajimiri, Student Member, IEEE, and Thomas H. Lee, Member, IEEE

Abstract— A general model is introduced which is capable of making accurate, quantitative predictions about the phase noise of different types of electrical oscillators by acknowledging the true periodically time-varying nature of all oscillators. This new approach also elucidates several previously unknown design criteria for reducing close-in phase noise by identifying the mechanisms by which intrinsic device noise and external noise sources contribute to the total phase noise. In particular, it explains the details of how $1/f$ noise in a device upconverts into close-in phase noise and identifies methods to suppress this upconversion. The theory also naturally accommodates cyclostationary noise sources, leading to additional important design insights. The model reduces to previously available phase noise models as special cases. Excellent agreement among theory, simulations, and measurements is observed.

Index Terms—Jitter, oscillator noise, oscillators, oscillator stability, phase jitter, phase locked loops, phase noise, voltage controlled oscillators.

Manuscript received December 17, 1996; revised July 9, 1997.
The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305-4070 USA.
Publisher Item Identifier S 0018-9200(98)00716-1.
0018–9200/98$10.00 © 1998 IEEE

I. INTRODUCTION

The recent exponential growth in wireless communication has increased the demand for more available channels in mobile communication applications. In turn, this demand has imposed more stringent requirements on the phase noise of local oscillators. Even in the digital world, phase noise in the guise of jitter is important. Clock jitter directly affects timing margins and hence limits system performance.

Phase and frequency fluctuations have therefore been the subject of numerous studies [1]–[9]. Although many models have been developed for different types of oscillators, each of these models makes restrictive assumptions applicable only to a limited class of oscillators. Most of these models are based on a linear time invariant (LTI) system assumption and suffer from not considering the complete mechanism by which electrical noise sources, such as device noise, become phase noise. In particular, they take an empirical approach in describing the upconversion of low frequency noise sources, such as $1/f$ noise, into close-in phase noise. These models are also reduced-order models and are therefore incapable of making accurate predictions about phase noise in long ring oscillators, or in oscillators that contain essential singularities, such as delay elements.

Since any oscillator is a periodically time-varying system, its time-varying nature must be taken into account to permit accurate modeling of phase noise. Unlike models that assume linearity and time-invariance, the time-variant model presented here is capable of proper assessment of the effects on phase noise of both stationary and even of cyclostationary noise sources.

Noise sources in the circuit can be divided into two groups, namely, device noise and interference. Thermal, shot, and flicker noise are examples of the former, while substrate and supply noise are in the latter group. This model explains the exact mechanism by which spurious sources, random or deterministic, are converted into phase and amplitude variations, and includes previous models as special limiting cases.

This time-variant model makes explicit predictions of the relationship between waveform shape and $1/f$ noise upconversion. Contrary to widely held beliefs, it will be shown that the $1/f^3$ corner in the phase noise spectrum is smaller than the $1/f$ noise corner of the oscillator's components by a factor determined by the symmetry properties of the waveform. This result is particularly important in CMOS RF applications because it shows that the effect of inferior $1/f$ device noise can be reduced by proper design.

Section II is a brief introduction to some of the existing phase noise models. Section III introduces the time-variant model through an impulse response approach for the excess phase of an oscillator. It also shows the mechanism by which noise at different frequencies can become phase noise and expresses with a simple relation the sideband power due to an arbitrary source (random or deterministic). It continues with explaining how this approach naturally lends itself to the analysis of cyclostationary noise sources. It also introduces a general method to calculate the total phase noise of an oscillator with multiple nodes and multiple noise sources, and how this method can help designers to spot the dominant source of phase noise degradation in the circuit. It concludes with a demonstration of how the presented model reduces to existing models as special cases. Section IV gives new design implications arising from this theory in the form of guidelines for low phase noise design. Section V concludes with experimental results supporting the theory.

II. BRIEF REVIEW OF EXISTING MODELS AND DEFINITIONS

The output of an ideal sinusoidal oscillator may be expressed as $V_{out}(t) = A\cos[\omega_0 t + \phi_0]$, where $A$ is the amplitude,

$\omega_0$ is the frequency, and $\phi_0$ is an arbitrary, fixed phase reference. Therefore, the spectrum of an ideal oscillator with no random fluctuations is a pair of impulses at $\pm\omega_0$. In a practical oscillator, however, the output is more generally given by

$$V_{out}(t) = A(t)\cdot f[\omega_0 t + \phi(t)] \tag{1}$$

where $\phi(t)$ and $A(t)$ are now functions of time and $f$ is a periodic function with period $2\pi$. As a consequence of the fluctuations represented by $\phi(t)$ and $A(t)$, the spectrum of a practical oscillator has sidebands close to the frequency of oscillation, $\omega_0$.

There are many ways of quantifying these fluctuations (a comprehensive review of different standards and measurement methods is given in [4]). A signal's short-term instabilities are usually characterized in terms of the single sideband noise spectral density. It has units of decibels below the carrier per hertz (dBc/Hz) and is defined as

$$\mathcal{L}_{total}\{\Delta\omega\} = 10\cdot\log\left[\frac{P_{sideband}(\omega_0+\Delta\omega,\ 1\ \mathrm{Hz})}{P_{carrier}}\right] \tag{2}$$

where $P_{sideband}(\omega_0+\Delta\omega,\ 1\ \mathrm{Hz})$ represents the single sideband power at a frequency offset of $\Delta\omega$ from the carrier with a measurement bandwidth of 1 Hz. Note that the above definition includes the effect of both amplitude and phase fluctuations, $A(t)$ and $\phi(t)$.

The advantage of this parameter is its ease of measurement. Its disadvantage is that it shows the sum of both amplitude and phase variations; it does not show them separately. However, it is important to know the amplitude and phase noise separately because they behave differently in the circuit. For instance, the effect of amplitude noise is reduced by the amplitude limiting mechanism and can be practically eliminated by the application of a limiter to the output signal, while the phase noise cannot be reduced in the same manner. Therefore, in most applications, $\mathcal{L}_{total}\{\Delta\omega\}$ is dominated by its phase portion, $\mathcal{L}_{phase}\{\Delta\omega\}$, known as phase noise, which we will simply denote as $\mathcal{L}\{\Delta\omega\}$.

Fig. 1. Typical plot of the phase noise of an oscillator versus offset from carrier.

Fig. 2. A typical RLC oscillator.

The semi-empirical model proposed in [1]–[3], known also as the Leeson–Cutler phase noise model, is based on an LTI assumption for tuned tank oscillators. It predicts the following behavior for $\mathcal{L}\{\Delta\omega\}$:

$$\mathcal{L}\{\Delta\omega\} = 10\cdot\log\left\{\frac{2FkT}{P_s}\cdot\left[1+\left(\frac{\omega_0}{2Q_L\Delta\omega}\right)^2\right]\cdot\left(1+\frac{\Delta\omega_{1/f^3}}{|\Delta\omega|}\right)\right\} \tag{3}$$

where $F$ is an empirical parameter (often called the "device excess noise number"), $k$ is Boltzmann's constant, $T$ is the absolute temperature, $P_s$ is the average power dissipated in the resistive part of the tank, $\omega_0$ is the oscillation frequency, $Q_L$ is the effective quality factor of the tank with all the loadings in place (also known as loaded $Q$), $\Delta\omega$ is the offset from the carrier and $\Delta\omega_{1/f^3}$ is the frequency of the corner between the $1/f^3$ and $1/f^2$ regions, as shown in the sideband spectrum of Fig. 1. The behavior in the $1/f^2$ region can be obtained by applying a transfer function approach as follows. The impedance of a parallel RLC, for $\Delta\omega \ll \omega_0$, is easily calculated to be

$$Z(\omega_0+\Delta\omega) \approx \frac{1}{G_L}\cdot\frac{1}{1+j2Q_L\dfrac{\Delta\omega}{\omega_0}} \tag{4}$$

where $G_L$ is the parallel parasitic conductance of the tank. For steady-state oscillation, the equation $G_m R_L = 1$ should be satisfied. Therefore, for a parallel current source, the closed-loop transfer function of the oscillator shown in Fig. 2 is given by the imaginary part of the impedance

$$H(\Delta\omega) = -j\,\frac{1}{G_L}\cdot\frac{\omega_0}{2Q_L\Delta\omega} \tag{5}$$

The total equivalent parallel resistance of the tank has an equivalent mean square noise current density of $\overline{i_n^2}/\Delta f = 4kTG_L$. In addition, active device noise usually contributes a significant portion of the total noise in the oscillator. It is traditional to combine all the noise sources into one effective noise source, expressed in terms of the resistor noise with

a multiplicative factor, $F$, known as the device excess noise number. The equivalent mean square noise current density can therefore be expressed as $\overline{i_n^2}/\Delta f = 4FkTG_L$. Unfortunately, it is generally difficult to calculate $F$ a priori. One important reason is that much of the noise in a practical oscillator arises from periodically varying processes and is therefore cyclostationary. Hence, as mentioned in [3], $F$ and $\Delta\omega_{1/f^3}$ are usually used as a posteriori fitting parameters on measured data.

Using the above effective noise current power, the phase noise in the $1/f^2$ region of the spectrum can be calculated as

$$\mathcal{L}\{\Delta\omega\} = 10\cdot\log\left[\frac{\tfrac{1}{2}\,|H(\Delta\omega)|^2\,\overline{i_n^2}/\Delta f}{\tfrac{1}{2}V_{max}^2}\right] = 10\cdot\log\left[\frac{2FkT}{P_s}\cdot\left(\frac{\omega_0}{2Q_L\Delta\omega}\right)^2\right] \tag{6}$$

Note that the factor of 1/2 arises from neglecting the contribution of amplitude noise. Although the expression for the noise in the $1/f^2$ region is thus easily obtained, the expression for the $1/f^3$ portion of the phase noise is completely empirical. As such, the common assumption that the $1/f^3$ corner of the phase noise is the same as the $1/f$ corner of device flicker noise has no theoretical basis.

The above approach may be extended by identifying the individual noise sources in the tuned tank oscillator of Fig. 2 [8]. An LTI approach is used and there is an embedded assumption of no amplitude limiting, contrary to most practical cases. For the RLC circuit of Fig. 2, [8] predicts the following:

$$\mathcal{L}\{\Delta\omega\} = 10\cdot\log\left[\frac{kT\cdot R_{eff}\,[1+A]\cdot\left(\dfrac{\omega_0}{\Delta\omega}\right)^2}{V_{max}^2/2}\right] \tag{7}$$

where $A$ is yet another empirical fitting parameter, and $R_{eff}$ is the effective series resistance, given by

$$R_{eff} = R_L + R_C + \frac{1}{R_p\,(C\omega_0)^2} \tag{8}$$

where $R_L$, $R_C$, $R_p$, and $C$ are shown in Fig. 2. Note that it is still not clear how to calculate $A$ from circuit parameters. Hence, this approach represents no fundamental improvement over the method outlined in [3].

Fig. 3. Phase and amplitude impulse response model.

Fig. 4. (a) Impulse injected at the peak, (b) impulse injected at the zero crossing, and (c) effect of nonlinearity on amplitude and phase of the oscillator in state-space.

III. MODELING OF PHASE NOISE

A. Impulse Response Model for Excess Phase

An oscillator can be modeled as a system with $n$ inputs (each associated with one noise source) and two outputs that are the instantaneous amplitude and excess phase of the oscillator, $A(t)$ and $\phi(t)$, as defined by (1). Noise inputs to this system are in the form of current sources injecting into circuit nodes and voltage sources in series with circuit branches. For each input source, both systems can be viewed as single-input, single-output systems. The time and frequency-domain fluctuations of $A(t)$ and $\phi(t)$ can be studied by characterizing the behavior of two equivalent systems shown in Fig. 3.

Note that both systems shown in Fig. 3 are time variant. Consider the specific example of an ideal parallel LC oscillator shown in Fig. 4. If we inject a current impulse $i(t)$ as shown, the amplitude and phase of the oscillator will have responses similar to that shown in Fig. 4(a) and (b). The instantaneous voltage change $\Delta V$ is given by

$$\Delta V = \frac{\Delta q}{C_{tot}} \tag{9}$$

where $\Delta q$ is the total injected charge due to the current impulse and $C_{tot}$ is the total capacitance at that node. Note that the current impulse will change only the voltage across the

capacitor and will not affect the current through the inductor. It can be seen from Fig. 4 that the resultant change in $A(t)$ and $\phi(t)$ is time dependent. In particular, if the impulse is applied at the peak of the voltage across the capacitor, there will be no phase shift and only an amplitude change will result, as shown in Fig. 4(a). On the other hand, if this impulse is applied at the zero crossing, it has the maximum effect on the excess phase and the minimum effect on the amplitude, as depicted in Fig. 4(b). This time dependence can also be observed in the state-space trajectory shown in Fig. 4(c). Applying an impulse at the peak is equivalent to a sudden jump in voltage at point $P$, which results in no phase change and changes only the amplitude, while applying an impulse at point $Q$ results only in a phase change without affecting the amplitude. An impulse applied sometime between these two extremes will result in both amplitude and phase changes.

There is an important difference between the phase and amplitude responses of any real oscillator, because some form of amplitude limiting mechanism is essential for stable oscillatory action. The effect of this limiting mechanism is pictured as a closed trajectory in the state-space portrait of the oscillator shown in Fig. 4(c). The system state will finally approach this trajectory, called a limit cycle, irrespective of its starting point [10]–[12]. Both an explicit automatic gain control (AGC) and the intrinsic nonlinearity of the devices act similarly to produce a stable limit cycle. However, any fluctuation in the phase of the oscillation persists indefinitely, with a current noise impulse resulting in a step change in phase, as shown in Fig. 3. It is important to note that regardless of how small the injected charge, the oscillator remains time variant.

Having established the essential time-variant nature of the systems of Fig. 3, we now show that they may be treated as linear for all practical purposes, so that their impulse responses $h_\phi(t,\tau)$ and $h_A(t,\tau)$ will characterize them completely.

The linearity assumption can be verified by injecting impulses with different areas (charges) and measuring the resultant phase change. This is done in the SPICE simulations of the 62-MHz Colpitts oscillator shown in Fig. 5(a) and the five-stage 1.01-GHz, 0.8-µm CMOS inverter chain ring oscillator shown in Fig. 5(b). The results are shown in Fig. 6(a) and (b), respectively. The impulse is applied close to a zero crossing, where it has the maximum effect on phase. As can be seen, the current-phase relation is linear for values of charge up to 10% of the total charge on the effective capacitance of the node of interest. Also note that the effective injected charges due to actual noise and interference sources in practical circuits are several orders of magnitude smaller than the amounts of charge injected in Fig. 6. Thus, the assumption of linearity is well satisfied in all practical oscillators.

Fig. 5. (a) A typical Colpitts oscillator and (b) a five-stage minimum size ring oscillator.

Fig. 6. Phase shift versus injected charge for oscillators of Fig. 5(a) and (b).

It is critical to note that the current-to-phase transfer function is practically linear even though the active elements may have strongly nonlinear voltage-current behavior. However, the nonlinearity of the circuit elements defines the shape of the limit cycle and has an important influence on phase noise that will be accounted for shortly.

We have thus far demonstrated linearity, with the amount of excess phase proportional to the ratio of the injected charge to the maximum charge swing across the capacitor on the node, i.e., $\Delta q/q_{max}$. Furthermore, as discussed earlier, the impulse response for the first system of Fig. 3 is a step whose amplitude depends periodically on the time $\tau$ when the impulse is injected. Therefore, the unit impulse response for excess phase can be expressed as

$$h_\phi(t,\tau) = \frac{\Gamma(\omega_0\tau)}{q_{max}}\,u(t-\tau) \tag{10}$$

where $q_{max}$ is the maximum charge displacement across the capacitor on the node and $u(t)$ is the unit step. We call $\Gamma(x)$ the impulse sensitivity function (ISF). It is a dimensionless, frequency- and amplitude-independent periodic function with period $2\pi$ which describes how much phase shift results from applying a unit impulse at time $t=\tau$. To illustrate its significance, the ISF's together with the oscillation waveforms for a typical LC and ring oscillator are shown in Fig. 7. As is shown in the Appendix, $\Gamma(x)$ is a function of the waveform or, equivalently, the shape of the limit cycle which, in turn, is governed by the nonlinearity and the topology of the oscillator.

Given the ISF, the output excess phase $\phi(t)$ can be calculated using the superposition integral

$$\phi(t) = \int_{-\infty}^{\infty} h_\phi(t,\tau)\,i(\tau)\,d\tau = \frac{1}{q_{max}}\int_{-\infty}^{t}\Gamma(\omega_0\tau)\,i(\tau)\,d\tau \tag{11}$$
HAJIMIRI AND LEE: GENERAL THEORY OF PHASE NOISE IN ELECTRICAL OSCILLATORS 183

where $i(t)$ represents the input noise current injected into the node of interest. Since the ISF is periodic, it can be expanded in a Fourier series

$$\Gamma(\omega_0\tau) = \frac{c_0}{2} + \sum_{n=1}^{\infty} c_n\cos(n\omega_0\tau + \theta_n) \tag{12}$$

where the coefficients $c_n$ are real-valued coefficients, and $\theta_n$ is the phase of the $n$th harmonic. As will be seen later, $\theta_n$ is not important for random input noise and is thus neglected here. Using the above expansion for $\Gamma(\omega_0\tau)$ in the superposition integral, and exchanging the order of summation and integration, we obtain

$$\phi(t) = \frac{1}{q_{max}}\left[\frac{c_0}{2}\int_{-\infty}^{t} i(\tau)\,d\tau + \sum_{n=1}^{\infty} c_n\int_{-\infty}^{t} i(\tau)\cos(n\omega_0\tau)\,d\tau\right] \tag{13}$$

Equation (13) allows computation of $\phi(t)$ for an arbitrary input current $i(t)$ injected into any circuit node, once the various Fourier coefficients of the ISF have been found.

Fig. 7. Waveforms and ISF's for (a) a typical LC oscillator and (b) a typical ring oscillator.

Fig. 8. Conversion of the noise around integer multiples of the oscillation frequency into phase noise.

As an illustrative special case, suppose that we inject a low frequency sinusoidal perturbation current $i(t)$ into the node of interest at a frequency of $\Delta\omega \ll \omega_0$

$$i(t) = I_0\cos(\Delta\omega t) \tag{14}$$

where $I_0$ is the maximum amplitude of $i(t)$. The arguments of all the integrals in (13) are at frequencies higher than $\Delta\omega$ and are significantly attenuated by the averaging nature of the integration, except the term arising from the first integral, which involves $c_0$. Therefore, the only significant term in $\phi(t)$ will be

$$\phi(t) \approx \frac{I_0 c_0\sin(\Delta\omega t)}{2 q_{max}\Delta\omega} \tag{15}$$

As a result, there will be two impulses at $\pm\Delta\omega$ in the power spectral density of $\phi(t)$, denoted as $S_\phi(\omega)$.

As an important second special case, consider a current at a frequency close to the carrier injected into the node of interest, given by $i(t) = I_1\cos[(\omega_0+\Delta\omega)t]$. A process similar to that of the previous case occurs except that the spectrum of $i(t)$ consists of two impulses at $\pm(\omega_0+\Delta\omega)$ as shown in Fig. 8. This time the only integral in (13) which will have a low frequency argument is for $n = 1$. Therefore $\phi(t)$ is given by

$$\phi(t) \approx \frac{I_1 c_1\sin(\Delta\omega t)}{2 q_{max}\Delta\omega} \tag{16}$$

which again results in two equal sidebands at $\pm\Delta\omega$ in $S_\phi(\omega)$.

More generally, (13) suggests that applying a current $i(t) = I_n\cos[(n\omega_0+\Delta\omega)t]$ close to any integer multiple of the oscillation frequency will result in two equal sidebands at $\pm\Delta\omega$ in $S_\phi(\omega)$. Hence, in the general case $\phi(t)$ is given by

$$\phi(t) \approx \frac{I_n c_n\sin(\Delta\omega t)}{2 q_{max}\Delta\omega} \tag{17}$$

B. Phase-to-Voltage Transformation

So far, we have presented a method for determining how much phase error results from a given current $i(t)$ using (13). Computing the power spectral density (PSD) of the oscillator output voltage $S_v(\omega)$ requires knowledge of how the output voltage relates to the excess phase variations. As shown in Fig. 8, the conversion of device noise current to output voltage may be treated as the result of a cascade of two processes. The first corresponds to a linear time variant (LTV) current-to-phase converter discussed above, while the second is a nonlinear system that represents a phase modulation (PM), which transforms phase to voltage. To obtain the sideband power around the fundamental frequency, the fundamental harmonic of the oscillator output $\cos[\omega_0 t + \phi(t)]$ can be used as the transfer function for the second system in Fig. 8. Note this is a nonlinear transfer function with $\phi(t)$ as the input.

Substituting $\phi(t)$ from (17) into (1) results in a single-tone phase modulation for output voltage, with $\phi(t)$ given by (17). Therefore, an injected current at $n\omega_0+\Delta\omega$ results in a pair of equal sidebands at $\omega_0\pm\Delta\omega$ with a sideband power relative to the carrier given by

$$P_{SBC}(\Delta\omega) = 10\cdot\log\left(\frac{I_n c_n}{4 q_{max}\Delta\omega}\right)^2 \tag{18}$$
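The chain from (13) to (18) can be checked numerically. The following is a minimal sketch, not the authors' simulation method: it assumes a toy ISF with hypothetical coefficients, integrates the superposition integral from $t = 0$ with the trapezoidal rule, and extracts the component of $\phi(t)$ at $\Delta\omega$ for comparison with (17). The function names, the choice $\Delta\omega = \omega_0/100$, and all numerical values are illustrative assumptions.

```python
import math

def isf(x, coeffs):
    """Toy ISF: Gamma(x) = c0/2 + sum_{k>=1} c_k*cos(k*x); phases theta_k neglected."""
    return coeffs[0] / 2.0 + sum(c * math.cos(k * x)
                                 for k, c in enumerate(coeffs) if k > 0)

def slow_phase_amplitude(coeffs, n, i_n, q_max, omega0, ratio=100, steps_per_cycle=200):
    """Integrate phi(t) = (1/q_max) * int Gamma(omega0*tau) i(tau) dtau for
    i(t) = i_n*cos((n*omega0 + delta_omega)*t) over one period of delta_omega,
    and return the amplitude of the sin(delta_omega*t) component of phi(t)."""
    delta_omega = omega0 / ratio               # assumed offset, hypothetical choice
    total = ratio * steps_per_cycle            # samples in one period of delta_omega
    dt = 2.0 * math.pi / (omega0 * steps_per_cycle)
    phi, proj = 0.0, 0.0
    prev = isf(0.0, coeffs) * i_n              # integrand at t = 0
    for k in range(1, total + 1):
        t = k * dt
        cur = isf(omega0 * t, coeffs) * i_n * math.cos((n * omega0 + delta_omega) * t)
        phi += 0.5 * (prev + cur) * dt / q_max  # trapezoidal step of the integral
        prev = cur
        proj += phi * math.sin(delta_omega * t) * dt  # projection onto sin(delta_omega*t)
    return 2.0 * proj / (total * dt), delta_omega

def sideband_power_dbc(i_n, c_n, q_max, delta_omega):
    """Sideband power relative to the carrier in dB, following the form of (18)."""
    return 20.0 * math.log10(i_n * c_n / (4.0 * q_max * delta_omega))

# Hypothetical values: c0 = 0.4, c1 = 1.0, c2 = 0.3, 1 uA injected, q_max = 100 fC.
coeffs = [0.4, 1.0, 0.3]
amp, dw = slow_phase_amplitude(coeffs, n=1, i_n=1e-6, q_max=1e-13,
                               omega0=2.0 * math.pi * 1e9)
predicted = 1e-6 * coeffs[1] / (2.0 * 1e-13 * dw)   # peak of (17) for n = 1
print(amp, predicted, sideband_power_dbc(1e-6, coeffs[1], 1e-13, dw))
```

In this sketch the fast terms of the integrand fall on other multiples of $\Delta\omega$, so the projection isolates the slowly varying term that (15) to (17) keep.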
184 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

This process is shown in Fig. 8. Appearance of the frequency deviation $\Delta\omega$ in the denominator of (18) underscores that the impulse response $h_\phi(t,\tau)$ is a step function and therefore behaves as a time-varying integrator. We will frequently refer to (18) in subsequent sections.

Fig. 9. Simulated power spectrum of the output with current injection at (a) $f_m = 50$ MHz and (b) $f_0 + f_m = 1.06$ GHz.

Fig. 10. Simulated and calculated sideband powers for the first ten coefficients.

Applying this method of analysis to an arbitrary oscillator, a sinusoidal current injected into one of the oscillator nodes at a frequency $n\omega_0+\Delta\omega$ results in two equal sidebands at $\omega_0\pm\Delta\omega$, as observed in [9]. Note that it is necessary to use an LTV model because an LTI model cannot explain the presence of a pair of equal sidebands close to the carrier arising from sources at frequencies $n\omega_0+\Delta\omega$, because an LTI system cannot produce any frequencies except those of the input and those associated with the system's poles. Furthermore, the amplitude of the resulting sidebands, as well as their equality, cannot be predicted by conventional intermodulation effects. This failure is to be expected since the intermodulation terms arise from nonlinearity in the voltage (or current) input/output characteristic of active devices of the form $v_{out} = \alpha_1 v_{in} + \alpha_2 v_{in}^2 + \cdots$. This type of nonlinearity does not directly appear in the phase transfer characteristic and shows itself only indirectly in the ISF.

It is instructive to compare the predictions of (18) with simulation results. A sinusoidal current of 10 μA amplitude at different frequencies was injected into node 1 of the 1.01-GHz ring oscillator of Fig. 5(b). Fig. 9(a) shows the simulated power spectrum of the signal on node 4 for a low frequency input at $f_m = 50$ MHz. This power spectrum is obtained using the fast Fourier transform (FFT) analysis in HSPICE 96.1. It is noteworthy that in this version of HSPICE the simulation artifacts observed in [9] have been properly eliminated by calculation of the values used in the analysis at the exact points of interest. Note that the injected noise is upconverted into two equal sidebands at $f_0+f_m$ and $f_0-f_m$, as predicted by (18). Fig. 9(b) shows the effect of injection of a current at $f_0 + f_m = 1.06$ GHz. Again, two equal sidebands are observed at $f_0+f_m$ and $f_0-f_m$, also as predicted by (18).

Simulated sideband power for the general case of current injection at $nf_0+f_m$ can be compared to the predictions of (18). The ISF for this oscillator is obtained by the simulation method of the Appendix. Here, $q_{max}$ is equal to $C_{node}V_{swing}$, where $C_{node}$ is the average capacitance on each node of the circuit and $V_{swing}$ is the maximum swing across it. For this oscillator, fF and V, which results in fC. For a sinusoidal injected current of amplitude $I_n = 10$ μA, and an $f_m$ of 50 MHz, Fig. 10 depicts the simulated and predicted sideband powers. As can be seen from the figure, these agree to within 1 dB for the higher power sidebands. The discrepancy in the case of the low power sidebands ( – ) arises from numerical noise in the simulations, which represents a greater fractional error at lower sideband power. Overall, there is satisfactory agreement between simulation and the theory of conversion of noise from various frequencies into phase fluctuations.

C. Prediction of Phase Noise Sideband Power

Now we consider the case of a random noise current $i_n(t)$ whose power spectral density has both a flat region and a $1/f$ region, as shown in Fig. 11. As can be seen from (18) and the foregoing discussion, noise components located near integer multiples of the oscillation frequency are transformed to low frequency noise sidebands for $S_\phi(\omega)$, which in turn become close-in phase noise in the spectrum of $S_v(\omega)$, as illustrated in Fig. 11. It can be seen that the total $S_\phi(\omega)$ is given by the sum of phase noise contributions from device noise in the vicinity of the integer multiples of $\omega_0$, weighted by the coefficients $c_n$. This is shown in Fig. 12(a) (logarithmic frequency scale). The resulting single sideband spectral noise density $\mathcal{L}\{\Delta\omega\}$ is plotted on a logarithmic scale in Fig. 12(b). The sidebands in the spectrum of $S_\phi(\omega)$, in turn, result in phase noise sidebands in the spectrum of $S_v(\omega)$ through the PM mechanism discussed in the previous subsection. This process is shown in Figs. 11 and 12.

The theory predicts the existence of $1/f^3$, $1/f^2$, and flat regions for the phase noise spectrum. The low-frequency noise sources, such as flicker noise, are weighted by the coefficient $c_0$ and show a $1/f^3$ dependence on the offset frequency, while

the white noise terms are weighted by other $c_n$ coefficients and give rise to the $1/f^2$ region of phase noise spectrum. It is apparent that if the original noise current $i(t)$ contains $1/f^n$ low frequency noise terms, such as popcorn noise, they can appear in the phase noise spectrum as $1/f^{n+2}$ regions. Finally, the flat noise floor in Fig. 12(b) arises from the white noise floor of the noise sources in the oscillator. The total sideband noise power is the sum of these two as shown by the bold line in the same figure.

Fig. 11. Conversion of noise to phase fluctuations and phase-noise sidebands.

Fig. 12. (a) PSD of $\phi(t)$ and (b) single sideband phase noise power spectrum, $\mathcal{L}\{\Delta\omega\}$.

To carry out a quantitative analysis of the phase noise sideband power, now consider an input noise current with a white power spectral density $\overline{i_n^2}/\Delta f$. Note that $I_n$ in (18) represents the peak amplitude, hence, $I_n^2/2 = \overline{i_n^2}/\Delta f$ for $\Delta f = 1$ Hz. Based on the foregoing development and (18), the total single sideband phase noise spectral density in dB below the carrier per unit bandwidth due to the source on one node at an offset frequency of $\Delta\omega$ is given by

(19)

Now, according to Parseval's relation we have

(20)

where $\Gamma_{rms}$ is the rms value of $\Gamma(x)$. As a result

(21)

This equation represents the phase noise spectrum of an arbitrary oscillator in the $1/f^2$ region of the phase noise spectrum. For a voltage noise source in series with an inductor, $q_{max}$ should be replaced with $\Phi_{max}$, where $\Phi_{max}$ represents the maximum magnetic flux swing in the inductor.

We may now investigate quantitatively the relationship between the device $1/f$ corner and the $1/f^3$ corner of the phase noise. It is important to note that it is by no means obvious from the foregoing development that the $1/f^3$ corner of the phase noise and the $1/f$ corner of the device noise should be coincident, as is commonly assumed. In fact, from Fig. 12, it should be apparent that the relationship between these two frequencies depends on the specific values of the various coefficients $c_n$. The device noise in the flicker noise dominated portion of the noise spectrum can be described by

$$\overline{i_{n,1/f}^2} = \overline{i_n^2}\cdot\frac{\omega_{1/f}}{\Delta\omega} \tag{22}$$

where $\omega_{1/f}$ is the corner frequency of device $1/f$ noise. Equation (22) together with (18) result in the following expression for phase noise in the $1/f^3$ portion of the phase noise spectrum:

(23)

The phase noise $1/f^3$ corner, $\Delta\omega_{1/f^3}$, is the frequency where the sideband power due to the white noise given by (21) is equal to the sideband power arising from the $1/f$ noise given by (23), as shown in Fig. 12. Solving for $\Delta\omega_{1/f^3}$ results in the following expression for the $1/f^3$ corner in the phase noise spectrum:

(24)

This equation together with (21) describe the phase noise spectrum and are the major results of this section. As can be seen, the $1/f^3$ phase noise corner due to internal noise sources is not equal to the $1/f$ device noise corner, but is smaller by the factor that appears in (24). As will be discussed later, $c_0$ depends on the waveform and can be significantly reduced if certain symmetry properties exist in the waveform of the oscillation. Thus, poor $1/f$ device noise need not imply poor close-in phase noise performance.
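The quantities this subsection relies on, namely the Fourier coefficients $c_n$, the rms value $\Gamma_{rms}$, and the dc value of the ISF, can be estimated from a sampled ISF. The sketch below assumes the ISF is available as uniformly spaced samples over one period; the two example ISFs (a pure sine, and the same sine with a dc offset of 0.3) and the function name are hypothetical and chosen only to show how a dc component appears in $c_0$.

```python
import math

def isf_statistics(samples, n_harmonics=5):
    """Return (gamma_dc, gamma_rms, c) for one period of uniformly sampled ISF values.

    c[n] is the magnitude of the nth coefficient in the convention
    Gamma(x) = c0/2 + sum_{n>=1} c_n*cos(n*x + theta_n), so c[0] = 2*gamma_dc.
    """
    m = len(samples)
    gamma_dc = sum(samples) / m
    gamma_rms = math.sqrt(sum(s * s for s in samples) / m)
    coeffs = [2.0 * gamma_dc]
    for n in range(1, n_harmonics + 1):
        a = 2.0 / m * sum(s * math.cos(2.0 * math.pi * n * k / m)
                          for k, s in enumerate(samples))
        b = 2.0 / m * sum(s * math.sin(2.0 * math.pi * n * k / m)
                          for k, s in enumerate(samples))
        coeffs.append(math.hypot(a, b))   # phase theta_n is discarded
    return gamma_dc, gamma_rms, coeffs

M = 512
xs = [2.0 * math.pi * k / M for k in range(M)]
symmetric_isf = [math.sin(x) for x in xs]          # hypothetical ISF with no dc value
asymmetric_isf = [math.sin(x) + 0.3 for x in xs]   # hypothetical ISF with a dc offset

for name, g in (("symmetric", symmetric_isf), ("asymmetric", asymmetric_isf)):
    dc, rms_value, c = isf_statistics(g)
    print(name, "c0 =", round(c[0], 4), "c1 =", round(c[1], 4),
          "(dc/rms)^2 =", round((dc / rms_value) ** 2, 4))
```

The printed ratio $(\Gamma_{dc}/\Gamma_{rms})^2$ is only an indicator of how large $c_0$ is compared with the rms value of the ISF; the exact constant relating it to the $1/f^3$ corner is the one in (24).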

Fig. 13. Collector voltage and collector current of the Colpitts oscillator of Fig. 5(a).

Fig. 14. $\Gamma(x)$, $\Gamma_{eff}(x)$, and $\alpha(x)$ for the Colpitts oscillator of Fig. 5(a).

D. Cyclostationary Noise Sources

In addition to the periodically time-varying nature of the system itself, another complication is that the statistical properties of some of the random noise sources in the oscillator may change with time in a periodic manner. These sources are referred to as cyclostationary. For instance, the channel noise of a MOS device in an oscillator is cyclostationary because the noise power is modulated by the gate source overdrive which varies with time periodically. There are other noise sources in the circuit whose statistical properties do not depend on time and the operation point of the circuit, and are therefore called stationary. Thermal noise of a resistor is an example of a stationary noise source.

A white cyclostationary noise current $i_n(t)$ can be decomposed as [13]:

$$i_n(t) = i_{n0}(t)\cdot\alpha(\omega_0 t) \tag{25}$$

where $i_n(t)$ is a white cyclostationary process, $i_{n0}(t)$ is a white stationary process and $\alpha(\omega_0 t)$ is a deterministic periodic function describing the noise amplitude modulation. We define $\alpha(\omega_0 t)$ to be a normalized function with a maximum value of 1. This way, $\overline{i_{n0}^2}$ is equal to the maximum mean square noise power, $\overline{i_n^2}(t)$, which changes periodically with time. Applying the above expression for $i_n(t)$ to (11), $\phi(t)$ is given by

$$\phi(t) = \int_{-\infty}^{t} i_n(\tau)\frac{\Gamma(\omega_0\tau)}{q_{max}}\,d\tau = \int_{-\infty}^{t} i_{n0}(\tau)\frac{\alpha(\omega_0\tau)\Gamma(\omega_0\tau)}{q_{max}}\,d\tau \tag{26}$$

As can be seen, the cyclostationary noise can be treated as a stationary noise applied to a system with an effective ISF given by

$$\Gamma_{eff}(x) = \Gamma(x)\cdot\alpha(x) \tag{27}$$

where $\alpha(x)$ can be derived easily from device noise characteristics and operating point. Hence, this effective ISF should be used in all subsequent calculations, in particular, calculation of the coefficients $c_n$.

Note that there is a strong correlation between the cyclostationary noise source and the waveform of the oscillator. The maximum of the noise power always appears at a certain point of the oscillatory waveform, thus the average of the noise may not be a good representation of the noise power.

Consider as one example the Colpitts oscillator of Fig. 5(a). The collector voltage and the collector current of the transistor are shown in Fig. 13. Note that the collector current consists of a short period of large current followed by a quiet interval. The surge of current occurs at the minimum of the voltage across the tank where the ISF is small. Functions $\Gamma(x)$, $\Gamma_{eff}(x)$, and $\alpha(x)$ for this oscillator are shown in Fig. 14. Note that, in this case, $\Gamma_{eff}(x)$ is quite different from $\Gamma(x)$, and hence the effect of cyclostationarity is very significant for the LC oscillator and cannot be neglected.

The situation is different in the case of the ring oscillator of Fig. 5(b), because the devices have maximum current during the transition (when $\Gamma(x)$ is at a maximum, i.e., the sensitivity is large) at the same time the noise power is large. Functions $\Gamma(x)$, $\Gamma_{eff}(x)$, and $\alpha(x)$ for the ring oscillator of Fig. 5(b) are shown in Fig. 15. Note that in the case of the ring oscillator $\Gamma(x)$ and $\Gamma_{eff}(x)$ are almost identical. This indicates that the cyclostationary properties of the noise are less important in the treatment of the phase noise of ring oscillators. This unfortunate coincidence is one of the reasons why ring oscillators in general have inferior phase noise performance compared to a Colpitts LC oscillator. The other important reason is that ring oscillators dissipate all the stored energy during one cycle.

E. Predicting Output Phase Noise with Multiple Noise Sources

The method of analysis outlined so far has been used to predict how much phase noise is contributed by a single noise source. However, this method may be extended to multiple noise sources and multiple nodes, as individual contributions by the various noise sources may be combined by exploiting superposition. Superposition holds because the first system of Fig. 8 is linear.
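The effective ISF of (27), introduced in the preceding subsection, can be illustrated with sampled functions. The following is a rough sketch under stated assumptions: the ISF is taken as $-\sin x$ (a toy shape that vanishes at the waveform extrema), and the noise modulation $\alpha(x)$ is a hypothetical Gaussian-shaped pulse with maximum value 1. Neither shape is taken from Fig. 14 or Fig. 15; they only mimic a noise burst at a waveform extremum versus a noise burst during a transition.

```python
import math

def rms(values):
    """Root-mean-square value of a list of samples over one period."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def effective_isf(gamma, alpha):
    """Pointwise product Gamma_eff(x) = Gamma(x) * alpha(x), as in (27)."""
    return [g * a for g, a in zip(gamma, alpha)]

def pulse(x, center, width):
    """Hypothetical periodic noise-modulation shape with a maximum value of 1."""
    d = math.atan2(math.sin(x - center), math.cos(x - center))  # wrapped distance
    return math.exp(-(d / width) ** 2)

M = 1000
xs = [2.0 * math.pi * k / M for k in range(M)]
gamma = [-math.sin(x) for x in xs]   # toy ISF: zero at the peaks of a cos(x) waveform

# Noise burst at the voltage minimum (x = pi), where this toy ISF is small.
alpha_at_extremum = [pulse(x, math.pi, 0.4) for x in xs]
# Noise burst during a transition (x = pi/2), where this toy ISF is largest.
alpha_at_transition = [pulse(x, math.pi / 2.0, 0.4) for x in xs]

eff_extremum = effective_isf(gamma, alpha_at_extremum)
eff_transition = effective_isf(gamma, alpha_at_transition)

print("rms of Gamma:", rms(gamma))
print("rms of Gamma_eff, burst at extremum:", rms(eff_extremum))
print("rms of Gamma_eff, burst at transition:", rms(eff_transition))
```

With these assumed shapes, the burst aligned with the ISF zero yields a much smaller effective rms value than the burst aligned with the ISF maximum, which is the qualitative contrast the text draws between the Colpitts and ring oscillator cases.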

Fig. 15. $\Gamma(x)$, $\Gamma_{eff}(x)$, and $\alpha(x)$ for the ring oscillator of Fig. 5(b).

The actual method of combining the individual contributions requires attention to any possible correlations that may exist among the noise sources. The complete method for doing so may be appreciated by noting that an oscillator has a current noise source in parallel with each capacitor and a voltage noise source in series with each inductor. The phase noise in the output of such an oscillator is calculated using the following method.

1) Find the equivalent current noise source in parallel with each capacitor and an equivalent voltage source in series with each inductor, keeping track of correlated and noncorrelated portions of the noise sources for use in later steps.
2) Find the transfer characteristic from each source to the output excess phase. This can be done as follows.
   a) Find the ISF for each source, using any of the methods proposed in the Appendix, depending on the required accuracy and simplicity.
   b) Find $\Gamma_{rms}$ and $\Gamma_{dc}$ (rms and dc values) of the ISF.
3) Use $\Gamma_{rms}$ and $\Gamma_{dc}$ coefficients and the power spectrum of the input noise sources in (21) and (23) to find the phase noise power resulting from each source.
4) Sum the individual output phase noise powers for uncorrelated sources and square the sum of phase noise rms values for correlated sources to obtain the total noise power below the carrier.

Note that the amount of phase noise contributed by each noise source depends only on the value of the noise power density $\overline{i_n^2}/\Delta f$, the amount of charge swing across the effective capacitor it is injecting into, $q_{max}$, and the steady-state oscillation waveform across the noise source of interest. This observation is important since it allows us to attribute a definite contribution from every noise source to the overall phase noise. Hence, our treatment is both an analysis and design tool, enabling designers to identify the significant contributors to phase noise.

F. Existing Models as Simplified Cases

As asserted earlier, the model proposed here reduces to earlier models if the same simplifying assumptions are made. In particular, consider the model for LC oscillators in [3], as well as the more comprehensive presentation of [8]. Those models assume linear time-invariance, that all noise sources are stationary, that only the noise in the vicinity of $\omega_0$ is important, and that the noise-free waveform is a perfect sinusoid. These assumptions are equivalent to discarding all but the $c_1$ term in the ISF and setting $c_1 = 1$. As a specific example, consider the oscillator of Fig. 2. The phase noise due solely to the tank parallel resistor $R_p$ can be found by applying the following to (19):

$$\frac{\overline{i_n^2}}{\Delta f} = \frac{4kT}{R_p}, \qquad q_{max} = C\,V_{max} \tag{28}$$

where $R_p$ is the parallel resistor, $C$ is the tank capacitor, and $V_{max}$ is the maximum voltage swing across the tank. Equation (19) reduces to

(29)

Since [8] assumes equal contributions from amplitude and phase portions to $\mathcal{L}\{\Delta\omega\}$, the result obtained in [8] is two times larger than the result of (29).

Assuming that the total noise contribution in a parallel tank oscillator can be modeled using an excess noise factor $F$ as in [3], (29) together with (24) result in (6). Note that the generalized approach presented here is capable of calculating the fitting parameters used in (3) ($F$ and $\Delta\omega_{1/f^3}$) in terms of coefficients of ISF and device $1/f$ noise corner, $\omega_{1/f}$.

IV. DESIGN IMPLICATIONS

Several design implications emerge from (18), (21), and (24) that offer important insight for reduction of phase noise in the oscillators. First, they show that increasing the signal charge displacement $q_{max}$ across the capacitor will reduce the phase noise degradation by a given noise source, as has been noted in previous works [5], [6].

In addition, the noise power around integer multiples of the oscillation frequency has a more significant effect on the close-in phase noise than at other frequencies, because these noise components appear as phase noise sidebands in the vicinity of the oscillation frequency, as described by (18). Since the contributions of these noise components are scaled by the Fourier series coefficients $c_n$ of the ISF, the designer should seek to minimize spurious interference in the vicinity of $n\omega_0$ for values of $n$ such that $c_n$ is large.

Criteria for the reduction of phase noise in the $1/f^3$ region are suggested by (24), which shows that the $1/f^3$ corner of the phase noise is proportional to the square of the coefficient $c_0$. Recalling that $c_0$ is twice the dc value of the (effective) ISF function, namely

$$c_0 = \frac{1}{\pi}\int_0^{2\pi}\Gamma(x)\,dx \tag{30}$$

it is clear that it is desirable to minimize the dc value of the ISF. As shown in the Appendix, the value of $c_0$ is closely related to certain symmetry properties of the oscillation

waveform. One such property concerns the rise and fall times; the ISF will have a large dc value if the rise and fall times of the waveform are significantly different. A limited case of this for odd-symmetric waveforms has been observed [14]. Although odd-symmetric waveforms have small $c_0$ coefficients, the class of waveforms with small $c_0$ is not limited to odd-symmetric waveforms.

Fig. 16. (a) Waveform and (b) ISF for the asymmetrical node. (c) Waveform and (d) ISF for one of the symmetrical nodes.

Fig. 17. Simulated power spectrum with current injection at $f_m = 50$ MHz for (a) asymmetrical node and (b) symmetrical node.

To illustrate the effect of a rise and fall time asymmetry, consider a purposeful imbalance of pull-up and pull-down rates in one of the inverters in the ring oscillator of Fig. 5(b). This is obtained by halving the channel width $W_n$ of the NMOS device and doubling the width $W_p$ of the PMOS device of one inverter in the ring. The output waveform and corresponding ISF are shown in Fig. 16(a) and (b). As can be seen, the ISF has a large dc value. For comparison, the waveform and ISF at the output of a symmetrical inverter elsewhere in the ring are shown in Fig. 16(c) and (d). From these results, it can be inferred that the close-in phase noise due to low-frequency noise sources should be smaller for the symmetrical output than for the asymmetrical one. To investigate this assertion, the results of two SPICE simulations are shown in Fig. 17. In the first simulation, a sinusoidal current source of amplitude 10 μA at $f_m = 50$ MHz is applied to one of the symmetric nodes of the oscillator. In the second simulation, the same source is applied to the asymmetric node. As can be seen from the power spectra of the figure, noise injected into the asymmetric node results in sidebands that are 12 dB larger than at the symmetric node.

Note that (30) suggests that upconversion of low frequency noise can be significantly reduced, perhaps even eliminated, by minimizing $c_0$, at least in principle. Since $c_0$ depends on the waveform, this observation implies that a proper choice of waveform may yield significant improvements in close-in phase noise. The following experiment explores this concept by changing the ratio of $W_p$ to $W_n$ over some range, while injecting 10 μA of sinusoidal current at 100 MHz into one node. The sideband power below carrier as a function of the $W_p$ to $W_n$ ratio is shown in Fig. 18. The SPICE-simulated sideband power is shown with plus symbols and the sideband power as predicted by (18) is shown by the solid line. As can be seen, close-in phase noise due to upconversion of low-frequency noise can be suppressed by an arbitrary factor, at least in principle. It is important to note, however, that the minimum does not necessarily correspond to equal transconductance ratios, since other waveform properties influence the value of $c_0$. In fact, the optimum $W_p$ to $W_n$ ratio in this particular example is seen to differ considerably from that used in conventional ring oscillator designs.

The importance of symmetry might lead one to conclude that differential signaling would minimize $c_0$. Unfortunately, while differential circuits are certainly symmetrical with respect to the desired signals, the differential symmetry disappears for the individual noise sources because they are independent of each other. Hence, it is the symmetry of each half-circuit that is important, as is demonstrated in the differential ring oscillator of Fig. 19. A sinusoidal current of 100 μA at 50 MHz injected at the drain node of one of the buffer stages results in two equal sidebands, 46 dB below carrier, in the power spectrum of the differential output. Because of the voltage dependent conductance of the load devices, the individual waveform on each output node is not fully symmetrical and consequently, there will be a large

upconversion of noise to close-in phase noise, even though differential signaling is used.

Fig. 18. Simulated and predicted sideband power for low frequency injection versus PMOS to NMOS W/L ratio.

Fig. 19. Four-stage differential ring oscillator.

Fig. 20. Measured sideband power versus injected current at $f_m = 100$ kHz, $f_0 + f_m = 5.5$ MHz, $2f_0 + f_m = 10.9$ MHz, $3f_0 + f_m = 16.3$ MHz.

Since the asymmetry is due to the voltage dependent conductance of the load, reduction of the upconversion might be achieved through the use of a perfectly linear resistive load, because the rising and falling behavior is governed by an RC time constant and makes the individual waveforms more symmetrical. It was first observed in the context of supply noise rejection [15], [16] that using more linear loads can reduce the effect of supply noise on timing jitter. Our treatment shows that it also reduces low-frequency noise upconversion into phase noise.

Another symmetry-related property is duty cycle. Since the ISF is waveform-dependent, the duty cycle of a waveform is linked to the duty cycle of the ISF. Non-50% duty cycles generally result in larger $c_n$ for even $n$. The high-$Q$ tank of an LC oscillator is helpful in this context, since a high $Q$ will produce a more symmetric waveform and hence reduce the upconversion of low-frequency noise.

V. EXPERIMENTAL RESULTS

This section presents experimental verifications of the model to supplement simulation results. The first experiment examines the linearity of current-to-phase conversion using a five-stage, 5.4-MHz ring oscillator constructed with ordinary CMOS inverters. A sinusoidal current is injected at frequencies $f_m = 100$ kHz, $f_0 + f_m = 5.5$ MHz, $2f_0 + f_m = 10.9$ MHz, and $3f_0 + f_m = 16.3$ MHz, and the sideband powers at $f_0 \pm f_m$ are measured as the magnitude of the injected current is varied. At any amplitude of injected current, the sidebands are equal in amplitude to within the accuracy of the measurement setup (0.2 dB), in complete accordance with the theory. These sideband powers are plotted versus the input injected current in Fig. 20. As can be seen, the transfer function for the input current power to the output sideband power is linear as suggested by (18). The slope of the best fit line is 19.8 dB/decade, which is very close to the predicted slope of 20 dB/decade, since excess phase is proportional to $I_n$, and hence the sideband power is proportional to $I_n^2$, leading to a 20-dB/decade slope. The behavior shown in Fig. 20 verifies that the linearity of (18) holds for injected input currents orders of magnitude larger than typical noise currents.

The second experiment varies the frequency offset from an integer multiple of the oscillation frequency. An input sinusoidal current source of 20 μA (rms) at $f_m$, $f_0 + f_m$, $2f_0 + f_m$, and $3f_0 + f_m$ is applied to one node and the output is measured at another node. The sideband power is plotted versus $f_m$ in Fig. 21. Note that the slope in all four cases is −20 dB/decade, again in complete accordance with (18).

The third experiment aims at verifying the effect of the coefficients $c_n$ on the sideband power. One of the predictions of the theory is that $c_0$ is responsible for the upconversion of low frequency noise. As mentioned before, $c_0$ is a strong function of waveform symmetry at the node into which the current is injected. Noise injected into a node with an asymmetric waveform (created by making one inverter asymmetric in a ring oscillator) would result in a greater increase in sideband power than injection into nodes with more symmetric waveforms. Fig. 22 shows the results of an experiment performed on a five-stage ring oscillator in which one of the stages is modified to have an extra pulldown

NMOS device. A current of 20 μA (rms) is injected into this asymmetric node with and without the extra pulldown device. For comparison, this experiment is repeated for a symmetric node of the oscillator, before and after this modification. Note that the sideband power is 7 dB larger when noise is injected into the node with the asymmetrical waveform, while the sidebands due to signal injection at the symmetric nodes are essentially unchanged with the modification.

Fig. 21. Measured sideband power versus $f_m$, for injections in vicinity of multiples of $f_0$.

Fig. 22. Power of the sidebands caused by low frequency injection into symmetric and asymmetric nodes of the ring oscillator.

Fig. 23. Phase noise measurements for a five-stage single-ended CMOS ring oscillator. $f_0 = 232$ MHz, 2-μm process technology.

The fourth experiment compares the prediction and measurement of the phase noise for a five-stage single-ended ring oscillator implemented in a 2-μm, 5-V CMOS process running at $f_0 = 232$ MHz. This measurement was performed using a delay-based measurement method and the result is shown in Fig. 23. Distinct $1/f^2$ and $1/f^3$ regions are observed. We first start with a calculation for the $1/f^2$ region. For this process we have a gate oxide thickness of nm and threshold voltages of V and V. All five inverters are similar with m m and m m, and a lateral diffusion of m. Using the process and geometry information, the total capacitance on each node, including parasitics, is calculated to be fF. Therefore, fC. As discussed in the previous section, noise current injected during a transition has the largest effect. The current noise power at this point is the sum of the current noise powers due to NMOS and PMOS devices. At this bias point, A²/Hz and A²/Hz. Using the methods outlined in the Appendix, it may be shown that for ring oscillators. Equation (21) for identical noise sources then predicts . At an offset of kHz, this equation predicts kHz dBc/Hz, in good agreement with a measurement of −114.5 dBc/Hz. To predict the phase noise in the $1/f^3$ region, it is enough to calculate the $1/f^3$ corner. Measurements on an isolated inverter on the same die show a $1/f$ noise corner frequency of 250 kHz, when its input and output are shorted. The ratio is calculated to be 0.3, which predicts a $1/f^3$ corner of 75 kHz, compared to the measured corner of 80 kHz.

The fifth experiment measures the phase noise of an 11-stage ring, running at $f_0 = 115$ MHz, implemented on the same die as the previous experiment. The phase noise measurements are shown in Fig. 24. For the inverters in this oscillator, m m and m m, which results in a total capacitance of 43.5 fF and fC. The phase noise is calculated in exactly the same manner as the previous experiment and is calculated to be , or −122.1 dBc/Hz at a 500-kHz offset. The measured phase noise is −122.5 dBc/Hz, again in good agreement with predictions. The ratio is calculated to be 0.17, which predicts a $1/f^3$ corner of 43 kHz, while the measured corner is 45 kHz.

The sixth experiment investigates the effect of symmetry on $1/f^3$ region behavior. It involves a seven-stage current-starved, single-ended ring oscillator in which each inverter stage consists of an additional NMOS and PMOS device in series. The gate drives of the added transistors allow independent control of the rise and fall times. Fig. 25 shows the phase noise when the control voltages are adjusted to achieve symmetry versus when they are not. In both cases the control voltages are adjusted to keep the oscillation frequency

Fig. 24. Phase noise measurements for an 11-stage single-ended CMOS ring oscillator. f0 = 115 MHz, 2-µm process technology.

Fig. 25. Effect of symmetry in a seven-stage current-starved single-ended CMOS VCO. f0 = 60 MHz, 2-µm process technology.

Fig. 26. Sideband power versus the voltage controlling the symmetry of the waveform. Seven-stage current-starved single-ended CMOS VCO. f0 = 50 MHz, 2-µm process technology.

Fig. 27. Phase noise measurements for a four-stage differential CMOS ring oscillator. f0 = 200 MHz, 0.5-µm process technology.
constant at 60 MHz. As can be seen, making the waveform more symmetric has a large effect on the phase noise in the $1/f^3$ region without significantly affecting the $1/f^2$ region.

Another experiment on the same circuit is shown in Fig. 26, which shows the phase noise power spectrum at a 10 kHz offset versus the symmetry-controlling voltage. For all the data points, the control voltages are adjusted to keep the oscillation frequency at 50 MHz. As can be seen, the phase noise reaches a minimum by adjusting the symmetry properties of the waveform. This reduction is limited by the phase noise in the $1/f^2$ region and the mismatch in transistors in different stages, which are controlled by the same control voltages.

The seventh experiment is performed on a four-stage differential ring oscillator, with PMOS loads and NMOS differential stages, implemented in a 0.5-µm CMOS process. Each stage is tapped with an equal-sized buffer. The tail current source has a quiescent current of 108 µA. The total capacitance on each of the differential nodes is calculated to be fF and the voltage swing is V, which results in fF. The total channel noise current on each node is A²/Hz. Using these numbers for , the phase noise in the $1/f^2$ region is predicted to be , or −103.2 dBc/Hz at an offset of 1 MHz, while the measurement in Fig. 27 shows a phase noise of −103.9 dBc/Hz, again in agreement with prediction. Also note that despite differential symmetry, there is a distinct $1/f^3$ region in the phase noise spectrum, because each half circuit is not symmetrical.

The eighth experiment investigates cyclostationary effects in the bipolar Colpitts oscillator of Fig. 5(a), where the conduction angle is varied by changing the capacitive divider ratio $n$ while keeping the effective parallel capacitance constant to maintain an $f_0$ of 100 MHz. As can be seen in Fig. 28, increasing $n$ decreases the conduction angle, and thereby reduces the effective $\Gamma_{rms}$, leading to an initial decrease in phase noise. However, the oscillation amplitude is approximately given by , and therefore decreases for large values of $n$. The phase noise ultimately increases for large $n$ as a consequence. There is thus a definite value of $n$ (here, about 0.2) that minimizes the phase noise. This result provides a theoretical basis for the common rule-of-thumb that one should

use ratios of about four (corresponding to $n = 0.2$) in Colpitts oscillators [17].

Fig. 28. Sideband power versus capacitive division ratio. Bipolar LC Colpitts oscillator, f0 = 100 MHz.

Fig. 29. State-space trajectory of an nth-order oscillator.

VI. CONCLUSION
This paper has presented a model for phase noise which explains quantitatively the mechanism by which noise sources of all types convert to phase noise. The power of the model derives from its explicit recognition of practical oscillators as time-varying systems. Characterizing an oscillator with the ISF allows a complete description of the noise sensitivity of an oscillator and also allows a natural accommodation of cyclostationary noise sources.

This approach shows that noise located near integer multiples of the oscillation frequency contributes to the total phase noise. The model specifies the contribution of those noise components in terms of waveform properties and circuit parameters, and therefore provides important design insight by identifying and quantifying the major sources of phase noise degradation. In particular, it shows that symmetry properties of the oscillator waveform have a significant effect on the upconversion of low frequency noise and, hence, the $1/f^3$ corner of the phase noise can be significantly lower than the $1/f$ device noise corner. This observation is particularly important for MOS devices, whose inferior $1/f$ noise has been thought to preclude their use in high-performance oscillators.

APPENDIX
CALCULATION OF THE IMPULSE SENSITIVITY FUNCTION
In this Appendix we present three different methods to calculate the ISF. The first method is based on direct measurement of the impulse response and calculating $\Gamma(x)$ from it. The second method is based on an analytical state-space approach to find the excess phase change caused by an impulse of current from the oscillation waveforms. The third method is an easy-to-use approximate method.

A. Direct Measurement of Impulse Response
In this method, an impulse is injected at different relative phases of the oscillation waveform and the oscillator is simulated for a few cycles afterwards. By sweeping the impulse injection time across one cycle of the waveform and measuring the resulting time shift $\Delta t$, $h_\phi(t, \tau)$ can be calculated noting that $\Delta\phi = 2\pi\Delta t/T$, where $T$ is the period of oscillation. Fortunately, many implementations of SPICE have an internal feature to perform the sweep automatically. Since for each impulse one needs to simulate the oscillator for only a few cycles, the simulation executes rapidly. Once $h_\phi(t, \tau)$ is found, the ISF is calculated by multiplication with $q_{\max}$. This method is the most accurate of the three methods presented.

B. Closed-Form Formula for the ISF
An $n$th-order system can be represented by its trajectory in an $n$-dimensional state-space. In the case of a stable oscillator, the state of the system, represented by the state vector, $\vec{X}$, periodically traverses a closed trajectory, as shown in Fig. 29. Note that the oscillator does not necessarily traverse the limit cycle with a constant velocity.

In the most general case, the effect of a group of external impulses can be viewed as a perturbation vector $\Delta\vec{X}$ which suddenly changes the state of the system to $\vec{X} + \Delta\vec{X}$. As discussed earlier, amplitude variations eventually die away, but phase variations do not. Application of the perturbation impulse causes a certain change in phase in either a negative or positive direction, depending on the state-vector and the direction of the perturbation. To calculate the equivalent time shift, we first find the projection of the perturbation vector on a unit vector in the direction of motion, i.e., the normalized velocity vector

$$l = \Delta\vec{X} \cdot \frac{\dot{\vec{X}}}{|\dot{\vec{X}}|} \tag{31}$$

where $l$ is the equivalent displacement along the trajectory, and $\dot{\vec{X}}$ is the first derivative of the state vector. Note the scalar nature of $l$, which arises from the projection operation. The equivalent time shift is given by the displacement divided by

the “speed”

$$\Delta t = \frac{l}{|\dot{\vec{X}}|} = \frac{\Delta\vec{X} \cdot \dot{\vec{X}}}{|\dot{\vec{X}}|^2} \tag{32}$$

which results in the following equation for excess phase caused by the perturbation:

$$\Delta\phi = \omega_0\,\Delta t = \omega_0\,\frac{\Delta\vec{X} \cdot \dot{\vec{X}}}{|\dot{\vec{X}}|^2} \tag{33}$$

In the specific case where the state variables are node voltages, and an impulse is applied to the $i$th node, there will be a change in $v_i$ given by (10). Equation (33) then reduces to

$$\Delta\phi = \omega_0\,\frac{\Delta v_i\,\dot{v}_i}{|\dot{\vec{v}}|^2} \tag{34}$$

where $|\dot{\vec{v}}|$ is the norm of the first derivative of the waveform vector and $\dot{v}_i$ is the derivative of the $i$th node voltage. Equation (34), together with the normalized waveform function $f$ defined in (1), result in the following:

$$\Delta\phi = \frac{\Delta q}{q_{\max}}\,\frac{f_i'}{\sum_j f_j'^2} \tag{35}$$

where $f_i'$ represents the derivative of the normalized waveform on node $i$, hence

$$\Gamma_i(x) = \frac{f_i'(x)}{\sum_j f_j'^2(x)} \tag{36}$$

It can be seen that this expression for the ISF is maximum during transitions (i.e., when the derivative of the waveform function is maximum), and this maximum value is inversely proportional to the maximum derivative. Hence, waveforms with larger slope show a smaller peak in the ISF function.

In the special case of a second-order system, one can use the normalized waveform $f$ and its derivative as the state variables, resulting in the following expression for the ISF:

$$\Gamma(x) = \frac{f'(x)}{f'^2(x) + f''^2(x)} \tag{37}$$

where $f''$ represents the second derivative of the function $f$. In the case of an ideal sinusoidal oscillator $f = \cos(x)$, so that $\Gamma(x) = -\sin(x)$, which is consistent with the argument of Section III. This method has the attribute that it computes the ISF from the waveform directly, so that simulation over only one cycle of $f$ is required to obtain all of the necessary information.

C. Calculation of ISF Based on the First Derivative
This method is actually a simplified version of the second approach. In certain cases, the denominator of (36) shows little variation, and can be approximated by a constant. In such a case, the ISF is simply proportional to the derivative of the waveform. A specific example is a ring oscillator with $N$ identical stages. The denominator may then be approximated by

(38)

Fig. 30 shows the results obtained from this method compared with the more accurate results obtained from methods A and B. Although this method is approximate, it is the easiest to use and allows a designer to rapidly develop important insights into the behavior of an oscillator.

Fig. 30. ISF’s obtained from different methods.

ACKNOWLEDGMENT
The authors would like to thank T. Ahrens, R. Betancourt, R. Farjad-Rad, M. Heshami, S. Mohan, H. Rategh, H. Samavati, D. Shaeffer, A. Shahani, K. Yu, and M. Zargari of Stanford University and Prof. B. Razavi of UCLA for helpful discussions. The authors would also like to thank M. Zargari, R. Betancourt, B. Amruturand, J. Leung, J. Shott, and Stanford Nanofabrication Facility for providing several test chips. They are also grateful to Rockwell Semiconductor for providing access to their phase noise measurement system.

REFERENCES
[1] E. J. Baghdady, R. N. Lincoln, and B. D. Nelin, “Short-term frequency stability: Characterization, theory, and measurement,” Proc. IEEE, vol. 53, pp. 704–722, July 1965.
[2] L. S. Cutler and C. L. Searle, “Some aspects of the theory and measurement of frequency fluctuations in frequency standards,” Proc. IEEE, vol. 54, pp. 136–154, Feb. 1966.
[3] D. B. Leeson, “A simple model of feedback oscillator noise spectrum,” Proc. IEEE, vol. 54, pp. 329–330, Feb. 1966.
[4] J. Rutman, “Characterization of phase and frequency instabilities in precision frequency sources; Fifteen years of progress,” Proc. IEEE, vol. 66, pp. 1048–1174, Sept. 1978.
[5] A. A. Abidi and R. G. Meyer, “Noise in relaxation oscillators,” IEEE J. Solid-State Circuits, vol. SC-18, pp. 794–802, Dec. 1983.
[6] T. C. Weigandt, B. Kim, and P. R. Gray, “Analysis of timing jitter in CMOS ring oscillators,” in Proc. ISCAS, June 1994, vol. 4, pp. 27–30.
[7] J. McNeil, “Jitter in ring oscillators,” in Proc. ISCAS, June 1994, vol. 6, pp. 201–204.
[8] J. Craninckx and M. Steyaert, “Low-noise voltage controlled oscillators using enhanced LC-tanks,” IEEE Trans. Circuits Syst.–II, vol. 42, pp. 794–904, Dec. 1995.

[9] B. Razavi, “A study of phase noise in CMOS oscillators,” IEEE J. Solid-State Circuits, vol. 31, pp. 331–343, Mar. 1996.
[10] B. van der Pol, “The nonlinear theory of electric oscillations,” Proc. IRE, vol. 22, pp. 1051–1086, Sept. 1934.
[11] N. Minorsky, Nonlinear Oscillations. Princeton, NJ: Van Nostrand, 1962.
[12] P. A. Cook, Nonlinear Dynamical Systems. New York: Prentice Hall, 1994.
[13] W. A. Gardner, Cyclostationarity in Communications and Signal Processing. New York: IEEE Press, 1993.
[14] H. B. Chen, A. van der Ziel, and K. Amberiadis, “Oscillator with odd-symmetrical characteristics eliminates low-frequency noise sidebands,” IEEE Trans. Circuits Syst., vol. CAS-31, Sept. 1984.
[15] J. G. Maneatis, “Precise delay generation using coupled oscillators,” IEEE J. Solid-State Circuits, vol. 28, pp. 1273–1282, Dec. 1993.
[16] C. K. Yang, R. Farjad-Rad, and M. Horowitz, “A 0.6-µm CMOS 4Gb/s transceiver with data recovery using oversampling,” in Symp. VLSI Circuits, Dig. Tech. Papers, June 1997.
[17] D. DeMaw, Practical RF Design Manual. Englewood Cliffs, NJ: Prentice-Hall, 1982, p. 46.

Ali Hajimiri (S’95) was born in Mashad, Iran, in 1972. He received the B.S. degree in electronics engineering from Sharif University of Technology in 1994 and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1996, where he is currently engaged in research toward the Ph.D. degree in electrical engineering.
He worked as a Design Engineer for Philips on a BiCMOS chipset for the GSM cellular units from 1993 to 1994. During the summer of 1995, he worked for Sun Microsystems, Sunnyvale, CA, on the UltraSparc microprocessor’s cache RAM design methodology. Over the summer of 1997, he worked at Lucent Technologies (Bell-Labs), where he investigated low phase noise integrated oscillators. He holds one European and two U.S. patents.
Mr. Hajimiri is the Bronze medal winner of the 21st International Physics Olympiad, Groningen, Netherlands.

Thomas H. Lee (M’83) received the S.B., S.M., Sc.D. degrees from the Massachusetts Institute of Technology (MIT), Cambridge, in 1983, 1985, and 1990, respectively.
He worked for Analog Devices Semiconductor, Wilmington, MA, until 1992, where he designed high-speed clock-recovery PLL’s that exhibit zero jitter peaking. He then worked for Rambus Inc., Mountain View, CA, where he designed the phase- and delay-locked loops for 500 MB/s DRAM’s. In 1994, he joined the faculty of Stanford University, Stanford, CA, as an Assistant Professor, where he is primarily engaged in research into microwave applications for silicon IC technology, with a focus on CMOS IC’s for wireless communications.
Dr. Lee was recently named a recipient of a Packard Foundation Fellowship award and is the author of The Design of CMOS Radio-Frequency Integrated Circuits (Cambridge University Press). He has twice received the “Best Paper” award at ISSCC.
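As a numerical illustration of Appendix B of the paper above, the following minimal sketch evaluates a second-order ISF of the form $\Gamma(x) = f'/(f'^2 + f''^2)$ for a normalized waveform and checks the sinusoidal case $f = \cos(x)$, for which the text states $\Gamma(x) = -\sin(x)$. The function names, the finite-difference step, and the exact form of the expression are assumptions made for this sketch; they are not taken from the paper's own tooling.

```python
import math

def isf_second_order(f, x, h=1e-4):
    """Evaluate Gamma(x) = f'(x) / (f'(x)**2 + f''(x)**2) for a normalized
    waveform f, using central differences (assumed form of the second-order ISF)."""
    d1 = (f(x + h) - f(x - h)) / (2.0 * h)            # first derivative
    d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2  # second derivative
    return d1 / (d1 * d1 + d2 * d2)

def phase_shift_from_time_shift(delta_t, period):
    """Excess phase for a measured time shift: delta_phi = 2*pi*delta_t / T."""
    return 2.0 * math.pi * delta_t / period

# Ideal sinusoidal oscillator: f = cos(x) should give Gamma(x) = -sin(x).
xs = [2.0 * math.pi * k / 8 for k in range(8)]
gammas = [isf_second_order(math.cos, x) for x in xs]
max_err = max(abs(g + math.sin(x)) for g, x in zip(gammas, xs))
```

Only one period of the waveform is sampled here, which mirrors the remark in the Appendix that a single cycle of $f$ suffices for this method.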
56 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 1, JANUARY 1999

Transactions Briefs

A Study of Oscillator Jitter Due to Supply and Substrate Noise
Frank Herzel and Behzad Razavi

Abstract—This paper investigates the timing jitter of single-ended and differential CMOS ring oscillators due to supply and substrate noise. We calculate the jitter resulting from supply and substrate noise, show that the concept of frequency modulation can be applied, and derive relationships that express different types of jitter in terms of the sensitivity of the oscillation frequency to the supply or substrate voltage. Using examples based on measured results, we show that thermal jitter is typically negligible compared to supply- and substrate-induced jitter in high-speed digital systems. We also discuss the dependence of the jitter of differential CMOS ring oscillators on transistor gate width, power consumption, and the number of stages.

Index Terms—Jitter, oscillator, phase-locked loops, supply noise.

Manuscript received October 1, 1997; revised August 2, 1998. This paper was recommended by Associate Editor B. H. Leung.
F. Herzel was with the Electrical Engineering Department, University of California at Los Angeles, Los Angeles, CA 90095, USA, on leave from the Institute for Semiconductor Physics, Frankfurt, Oder, Germany.
B. Razavi is with the Electrical Engineering Department, University of California, Los Angeles, CA 90095 USA.
Publisher Item Identifier S 1057-7130(99)01471-8.
1057–7130/99$10.00 © 1999 IEEE

I. INTRODUCTION
High-speed digital circuits such as microprocessors and memories employ phase locking at the board-chip interface to suppress timing skews between the on-chip clock and the system clock [1]–[3]. Fabricated on the same substrate as the rest of the circuit, the phase-locked loop (PLL) must typically operate from the global supply and ground busses, thus experiencing both substrate and supply noise. The noise manifests itself as jitter at the output of the PLL, primarily through various mechanisms in the voltage-controlled oscillator (VCO). As exemplified by measured results reported in the literature, we show that the contribution of device electronic noise to jitter is typically much less than that due to supply and substrate noise.

This paper describes the effect of supply and substrate noise on the performance of single-ended and differential ring oscillators, providing insights that prove useful in the design of other types of oscillators as well. Section II summarizes the oscillators studied in this work and Section III defines various types of jitter. Sections IV and V quantify the jitter due to thermal noise in the oscillation loop and frequency-modulating noise, respectively. Sections VI and VII apply the developed results to the analysis of supply and substrate noise, and Section VIII presents the dependence of jitter upon parameters such as device size, the number of stages, and power dissipation.

II. RING OSCILLATORS UNDER INVESTIGATION
In this paper, we investigate both single-ended ring oscillators (SERO’s) and differential ring oscillators (DRO’s). The latter are much more important in digital circuit applications, since DRO’s are less affected by supply and substrate noise. The circuit topologies are shown in Fig. 1 for the SERO and in Fig. 2 for the DRO.

Fig. 1. Single-ended ring oscillator: (a) block diagram and (b) implementation of one stage.

Fig. 2. Differential ring oscillator: (a) block diagram and (b) implementation of one stage.

The simulations were performed with the SPICE parameters of a 0.6-µm CMOS technology. We employed the minimum gate length throughout the paper. Furthermore, unless indicated otherwise, we use the following parameters for the differential stage: $W = 80$ µm, $R_L = 1$ kΩ, $I_{SS} = 1$ mA, $C_L = 0$, $V_{DD} = 3$ V. The rms value of $\Delta V_{DD}$ was chosen to be 71 mV, corresponding to a peak amplitude of 100 mV for a sinusoidal perturbation.

III. DEFINITIONS OF JITTER
We consider the output voltage $V_{out}(t)$ of an oscillator in the steady state. The time point of the $n$th minus-to-plus zero crossing of $V_{out}(t)$ is referred to as $t_n$. The $n$th period is then defined as $T_n = t_{n+1} - t_n$. For an ideal oscillator, this time difference is independent of $n$, but in reality it varies with $n$ as a result of noise in the circuit. This results in a deviation $\Delta T_n = T_n - \bar{T}$ from the mean period $\bar{T}$. The quantity $\Delta T_n$ is an indication of jitter.
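The period and timing-error definitions of Section III can be illustrated with a few lines of Python. This is only a sketch: the function name and the list of zero-crossing times are hypothetical values chosen for illustration, not data from the paper.

```python
def periods_and_errors(crossings):
    """Given rising zero-crossing times t_n, return the periods
    T_n = t_{n+1} - t_n, their mean, and the deviations dT_n = T_n - mean."""
    periods = [t1 - t0 for t0, t1 in zip(crossings, crossings[1:])]
    mean_period = sum(periods) / len(periods)
    errors = [p - mean_period for p in periods]
    return periods, mean_period, errors

# Hypothetical crossing times in nanoseconds (illustrative only).
example_crossings = [0.0, 1.0, 2.1, 3.0, 4.0]
periods, mean_period, errors = periods_and_errors(example_crossings)
```

By construction the deviations sum to zero, so any jitter measure built from them has to use a squared or absolute value.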

More specifically, absolute jitter or long-term jitter

$$\Delta T_{abs}(N) = \sum_{n=1}^{N} \Delta T_n \tag{1}$$

is often used to quantify the jitter of phase-locked loops. Modeling the total phase error with respect to an ideal oscillator [Fig. 3(a)], absolute jitter is nonetheless ill-suited to describing the performance of oscillators because, as shown later, the variance of $\Delta T_{abs}$ diverges with time.

Fig. 3. Illustration of (a) long-term jitter and (b) cycle-to-cycle jitter.

A better figure of merit for oscillators is cycle jitter, defined as the rms value of the timing error $\Delta T_n$ (see footnote 1)

$$\Delta T_c = \lim_{N \to \infty} \sqrt{\frac{1}{N} \sum_{n=1}^{N} \Delta T_n^2}. \tag{2}$$

Cycle jitter describes the magnitude of the period fluctuations, but it contains no information about the dynamics.

Footnote 1: In this paper, we use a time average definition of jitter which is equivalent to the stochastic average if and only if the process $\Delta T_n^2$ is ergodic.

The third type of jitter considered here is cycle-to-cycle jitter [Fig. 3(b)] given by

$$\Delta T_{cc} = \lim_{N \to \infty} \sqrt{\frac{1}{N} \sum_{n=1}^{N} (T_{n+1} - T_n)^2} \tag{3}$$

representing the rms difference between two consecutive periods. Note the difference between the cycle jitter and the cycle-to-cycle jitter: the former compares the oscillation period with the mean period and the latter compares the period with the preceding period. Hence, in contrast to cycle jitter, cycle-to-cycle jitter describes the short-term dynamics of the period. The long-term dynamics, on the other hand, are not characterized by cycle-to-cycle jitter. For example, if $1/f$ noise modulates the frequency slowly, $\Delta T_{cc}$ does not reflect the result accurately. With respect to the zero crossings, the cycle-to-cycle jitter is a double-differential quantity in that three zero crossings of the output voltage are related to each other. As discussed in Section V, this results in a completely different dependence on the modulation frequency than for the cycle jitter.

We should note that an oscillator embedded in a phase-locked loop periodically receives correction pulses from the phase detector and charge pump, and hence its long-term jitter strongly depends on the PLL dynamics. Thus, for the analysis of a free-running oscillator, cycle jitter and cycle-to-cycle jitter are more meaningful, particularly because the latter type hardly changes when the oscillator is placed in the loop.

A more general quantification of the jitter is possible by means of the steady-state autocorrelation function (ACF) defined as

$$C_{\Delta T}(m) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} (\Delta T_{n+m}\,\Delta T_n). \tag{4}$$

To obtain an intuitive understanding of this quantity, we insert (4) with $m = 0$ in (2), obtaining

$$\Delta T_c^2 = C_{\Delta T}(0). \tag{5}$$

Equation (5) states that the ACF with zero argument is the squared cycle jitter. For a nonzero argument, the ACF decreases with increasing $m$, finally approaching zero for $m \to \infty$. This indicates that the timing error $\Delta T_n$ has a finite memory. In order to express the cycle-to-cycle jitter by the ACF, we rewrite (3) as

$$\Delta T_{cc}^2 = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} (\Delta T_{n+1} - \Delta T_n)^2 = 2C_{\Delta T}(0) - 2C_{\Delta T}(1). \tag{6}$$

This expression will be used for an analytical calculation of the cycle-to-cycle jitter in Section V.

IV. JITTER DUE TO DEVICE ELECTRONIC NOISE
The electronic noise of the devices in an oscillator loop leads to phase noise and jitter [5], [7], [8]. Our objective is to express jitter in terms of phase noise and vice versa. These relationships are useful as they relate two measurable quantities.

In this paper, we neglect the effect of $1/f$ noise because it introduces only slow phase variations in the oscillator. Such variations are suppressed by the large loop bandwidth of PLL’s used in today’s digital systems.

As derived in the Appendix, for white noise sources in the oscillator, the single-sideband phase noise $S_\phi$ (phase noise with respect to the carrier) can be expressed in terms of the cycle-to-cycle jitter according to

$$S_\phi(\omega) = \frac{(\omega_0^3/4\pi)\,\Delta T_{cc}^2}{(\omega - \omega_0)^2 + (\omega_0^3/8\pi)^2\,\Delta T_{cc}^4} \approx \frac{(\omega_0^3/4\pi)\,\Delta T_{cc}^2}{(\omega - \omega_0)^2} \tag{7}$$

where $\omega_0$ is the oscillation frequency and $\omega - \omega_0$ is the offset frequency. The Appendix also shows that the cycle-to-cycle jitter can be deduced from the phase noise according to

$$\Delta T_{cc}^2 \approx \frac{4\pi}{\omega_0^3}\,S_\phi(\omega)\,(\omega - \omega_0)^2. \tag{8}$$

To obtain an estimate of the thermal jitter, we consider the differential CMOS ring oscillator in [5]. For the 2.2-GHz oscillator with a phase noise of −94 dBc/Hz at 1 MHz offset, we obtain from (8) a thermal cycle-to-cycle jitter of 0.3 ps, i.e., less than 0.3°. Similar values are obtained for the 900-MHz CMOS ring oscillators reported in [6]. In most timing applications, such small values are negligible with respect to other sources of random jitter.

The thermal absolute jitter is proportional to the square root of the measurement interval $\Delta t$. As derived in the Appendix, the absolute jitter is given by

$$\Delta T_{abs} = \sqrt{\frac{f_0}{2}}\,\Delta T_{cc}\,\sqrt{\Delta t}. \tag{9}$$

In [9], the rms value of absolute jitter has been divided by the square root of the measurement time to obtain a time-independent figure of merit. This is not possible for supply and substrate noise, since (7)–(9) are derived for white noise in the feedback loop. Supply and substrate noise, however, are generally not white.
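The thermal-jitter estimate of Section IV can be reproduced with a short calculation. The sketch below assumes the approximate relations $\Delta T_{cc}^2 \approx (4\pi/\omega_0^3)\,S_\phi\,(\omega - \omega_0)^2$ and $\Delta T_{abs} = \sqrt{f_0/2}\,\Delta T_{cc}\sqrt{\Delta t}$; the function names and the 1-µs measurement interval are illustrative choices, not values from the paper.

```python
import math

def cc_jitter_from_phase_noise(f0_hz, phase_noise_dbc_hz, offset_hz):
    """Cycle-to-cycle jitter (seconds) from single-sideband phase noise,
    assuming white noise: dTcc^2 ~ (4*pi/w0^3) * S_phi * (offset in rad/s)^2."""
    w0 = 2.0 * math.pi * f0_hz
    dw = 2.0 * math.pi * offset_hz
    s_phi = 10.0 ** (phase_noise_dbc_hz / 10.0)  # dBc/Hz to linear ratio per Hz
    return math.sqrt(4.0 * math.pi / w0 ** 3 * s_phi * dw ** 2)

def absolute_jitter(f0_hz, dt_cc, interval_s):
    """Thermal absolute jitter after a measurement interval (white noise only)."""
    return math.sqrt(f0_hz / 2.0) * dt_cc * math.sqrt(interval_s)

# Numbers quoted in the text: 2.2 GHz oscillator, -94 dBc/Hz at 1 MHz offset.
dt_cc = cc_jitter_from_phase_noise(2.2e9, -94.0, 1.0e6)
# Hypothetical measurement interval of 1 microsecond.
dt_abs = absolute_jitter(2.2e9, dt_cc, 1.0e-6)
```

With these inputs the cycle-to-cycle jitter comes out near 0.27 ps, in line with the 0.3 ps quoted in the text.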

V. JITTER OF A FREQUENCY-MODULATED OSCILLATOR
The frequency of an oscillator generally depends on the supply and substrate voltage. The variation of the oscillation frequency with a voltage may be described by a sensitivity function, also called the gain of the VCO and denoted by $K_{VCO}$.

For example, as shown in Fig. 4, the drain junction capacitance of $M_1$ and $M_2$ varies with $V_{DD}$ and $V_{Sub}$, thus modulating the frequency of the ring oscillator. In some cases, $K_{VCO}$ itself may be a function of the modulating frequency. In Fig. 4, for example, high-frequency supply noise results in fast changes in $V_P$ and hence substantial displacement current through $C_S$ (the capacitance contributed by $M_1$, $M_2$, and the current source). Note that, in general, the frequency of an oscillator depends on various bias and supply voltages, as conceptually illustrated in Fig. 5.

Fig. 4. Illustration of frequency modulation through changes of drain junction capacitances.

Fig. 5. Illustration of the VCO model of an oscillator.

An oscillator subject to supply and substrate noise may be considered as a VCO with different “control” voltages each having a different sensitivity. In this section, we attribute the cycle jitter and the cycle-to-cycle jitter of a frequency-modulated oscillator to the static sensitivity $K_0$. As shown in Sections VI and VII, these expressions describe supply and substrate noise quite accurately.

Let the modulating control voltage be a small sinusoidal perturbation

$$\Delta V_m(t) = V_m \cos \omega_m t. \tag{10}$$

We assume, as an approximation, that the frequency change follows the control voltage according to

$$\Delta f_0(t) = V_m K_0 \cos \omega_m t \tag{11}$$

where $K_0$ is the static sensitivity, also called low-frequency VCO gain. This approximation is referred to as quasi-static approximation in the remainder. The deviation of the period from the mean is

$$\Delta T(t) = \frac{1}{f_0 + \Delta f_0(t)} - \frac{1}{f_0} \tag{12}$$

$$\approx -\frac{V_m K_0}{f_0^2} \cos \omega_m t. \tag{13}$$

Multiplying this expression by $\Delta T(t + \tau)$ and averaging the result with respect to $t$, we obtain the steady-state ACF

$$\overline{\Delta T(t + \tau)\,\Delta T(t)} = \frac{V_m^2 K_0^2}{2 f_0^4} \cos \omega_m \tau. \tag{14}$$

This quantity represents the ACF of the process $\Delta T(t)$ in a continuous-time description. For the evaluation of the jitter according to (5) and (6), we need the values $C(0)$ and $C(1)$ of the discrete-time ACF. If the ACF does not change significantly during one oscillation period, these values can be determined from the continuous-time ACF at time points $\tau = 0 \times \bar{T}$ and $\tau = 1 \times \bar{T}$. A numerical verification of this approach is given below.

Inserting (14) with $\tau = 0$ in (5), we find the cycle jitter

$$\Delta T_c = \frac{V_m K_0}{\sqrt{2}\,f_0^2}. \tag{15}$$

For $\tau = 0$ and $\tau = \bar{T} = 1/f_0$, we obtain from (6) and (14) the cycle-to-cycle jitter

$$\Delta T_{cc} = \frac{V_m K_0}{f_0^2} \sqrt{1 - \cos(\omega_m/f_0)}. \tag{16}$$

Equations (15) and (16) express the jitter in terms of the low-frequency sensitivity and the modulation frequency. They will be verified numerically in Sections VI and VII. The main benefit of these equations is that the calculation of the jitter is reduced to the calculation or measurement of the oscillation frequency as a function of the supply or substrate voltage.

From (15), we note that cycle jitter is independent of the modulation frequency so long as the quasi-static approximation (11) holds. By contrast, cycle-to-cycle jitter increases with the modulation frequency. For $f_m \ll f_0$ we find from (16)

$$\Delta T_{cc} \approx \frac{V_m K_0\,\omega_m}{\sqrt{2}\,f_0^3}. \tag{17}$$

Note that the cycle-to-cycle jitter $\Delta T_{cc}$ is approximately proportional to the modulation frequency $f_m$. This can be interpreted by noting that $\Delta T_{cc}$ is a double-differential quantity, as is evident from (5) and (6).

Having reduced the jitter calculation to the static sensitivity $K_0$, we need to extract this quantity from simulations. For this purpose, we apply a dc voltage perturbation to the supply. Fig. 6 shows the oscillation frequency of the SERO and the DRO as a function of the supply voltage. The frequency varies linearly with the supply voltage over a relatively wide range of $V_{DD}$. The slope of the curves in Fig. 6 represents the low-frequency sensitivity $K_0$, indicating that the SERO is much more sensitive than the DRO. Using these values in (15) and (16), we can predict the jitter from supply and substrate noise easily.

VI. JITTER DUE TO SUPPLY NOISE
The supply and substrate noise created in a digital system is quite complex. In addition to components at the clock frequency and harmonics and subharmonics thereof, the noise spectrum generally exhibits random signals resulting from the activities of each building block as well. A rigorous treatment requires that the noise spectrum be measured in a realistic environment and subsequently incorporated in the analysis as explained in Section V.
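The quasi-static expressions of Section V are easy to evaluate numerically. The sketch below codes the cycle jitter (15), the cycle-to-cycle jitter (16), and its low-frequency approximation (17) as reconstructed here; the function names and the operating point (100-mV perturbation, 100 MHz/V sensitivity, 1-GHz oscillator, 10-MHz noise) are hypothetical values chosen only to exercise the formulas.

```python
import math

def cycle_jitter_fm(vm, k0, f0):
    """Cycle jitter for sinusoidal FM, eq. (15): Vm*K0 / (sqrt(2) * f0^2)."""
    return vm * k0 / (math.sqrt(2.0) * f0 ** 2)

def cc_jitter_fm(vm, k0, f0, fm):
    """Cycle-to-cycle jitter, eq. (16): (Vm*K0/f0^2) * sqrt(1 - cos(wm/f0))."""
    wm = 2.0 * math.pi * fm
    return vm * k0 / f0 ** 2 * math.sqrt(1.0 - math.cos(wm / f0))

def cc_jitter_fm_low(vm, k0, f0, fm):
    """Approximation for fm << f0, eq. (17): Vm*K0*wm / (sqrt(2) * f0^3)."""
    wm = 2.0 * math.pi * fm
    return vm * k0 * wm / (math.sqrt(2.0) * f0 ** 3)

# Hypothetical operating point (not from the paper).
vm, k0, f0, fm = 0.1, 100e6, 1e9, 10e6   # V, Hz/V, Hz, Hz
dt_c = cycle_jitter_fm(vm, k0, f0)
dt_cc_exact = cc_jitter_fm(vm, k0, f0, fm)
dt_cc_approx = cc_jitter_fm_low(vm, k0, f0, fm)
```

For this operating point the cycle jitter is about 7.1 ps while the cycle-to-cycle jitter is roughly 0.44 ps, which illustrates how slowly varying supply noise affects the two measures very differently.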

Fig. 6. Oscillation frequency of (a) the single-ended ring oscillator and (b) the differential ring oscillator as a function of static supply voltage.

In the following, we investigate the jitter due to sinusoidal supply voltage perturbations. The calculation of the jitter consists of the following steps: interpolation of the voltage waveform to find the zero crossings; calculation of the periods $T_n$ and subtraction of the mean period $\bar{T}$ to obtain $\Delta T_n$; and calculation of the cycle jitter (2) and the cycle-to-cycle jitter (3) by performing time averaging.

We should also mention that simulations indicate that jitter has a relatively linear dependence on the noise amplitude for supply variations as large as a few hundred millivolts.

Fig. 7 plots the analytical and simulated cycle and cycle-to-cycle jitter of single-ended and differential ring oscillators. As can be seen, the analytical results of Section V predict the jitter with reasonable accuracy.

Fig. 7. Cycle jitter and cycle-to-cycle jitter of (a) the SERO and (b) the DRO as a function of supply voltage noise frequency. The solid lines represent the quasi-static FM expressions.

VII. JITTER DUE TO SUBSTRATE NOISE
Substrate noise can be treated in the same fashion as supply noise. For the numerical simulation of substrate noise, the bulk terminal of the transistors is driven by a noise source (Fig. 8). Fig. 9 shows the calculated jitter of the DRO as a function of the noise frequency. Comparison with Fig. 7 indicates that for the DRO, a supply voltage perturbation is almost equivalent to a substrate voltage perturbation of opposite sign. To understand this, note from Fig. 10 that, with an ideal tail current source, a change of $\Delta V$ in $V_{DD}$ is equivalent to a change of $-\Delta V$ in $V_{sub}$. Simulations confirm that the static sensitivity $K_0$ is indeed equal for supply and substrate noise, apart from the sign. Fig. 9 also demonstrates that the quasi-static FM approach is suited to describing the jitter introduced by substrate noise. Furthermore, it suggests that a substantial fraction of the jitter results from the voltage dependence of $C_{db}$ and $C_{sb}$.

Fig. 8. Substrate noise modeling.

VIII. OSCILLATOR DESIGN FOR LOW JITTER
The simulation results presented thus far indicate the superior performance of differential oscillators with respect to single-ended topologies. Nonetheless, even differential configurations have a wide design space; device size, voltage swings, power dissipation, and the number of stages in a ring oscillator influence the overall sensitivity to supply and substrate noise.
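The jitter-extraction steps listed in Section VI (interpolating zero crossings, forming the periods, and time averaging) can be sketched on a synthetic frequency-modulated waveform and compared with the quasi-static expressions (15) and (16). Everything here is an assumption for illustration: the waveform is an ideal FM sinusoid in normalized units rather than a SPICE output, and the function names and parameter values are hypothetical.

```python
import math

def rising_zero_crossings(f0, k0_vm, fm, t_end, dt):
    """Sample v(t) = sin(phase(t)) with f(t) = f0 + k0_vm*cos(2*pi*fm*t) and
    locate rising zero crossings by linear interpolation between samples."""
    def v(t):
        phase = 2.0 * math.pi * (
            f0 * t + k0_vm / (2.0 * math.pi * fm) * math.sin(2.0 * math.pi * fm * t))
        return math.sin(phase)
    crossings = []
    v_prev = v(0.0)
    for i in range(1, int(round(t_end / dt)) + 1):
        t = i * dt
        v_cur = v(t)
        if v_prev < 0.0 <= v_cur:                  # minus-to-plus crossing
            frac = -v_prev / (v_cur - v_prev)      # linear interpolation
            crossings.append(t - dt + frac * dt)
        v_prev = v_cur
    return crossings

def jitter_from_crossings(crossings):
    """Return (cycle jitter, cycle-to-cycle jitter) by time averaging."""
    periods = [b - a for a, b in zip(crossings, crossings[1:])]
    mean_t = sum(periods) / len(periods)
    cycle = math.sqrt(sum((p - mean_t) ** 2 for p in periods) / len(periods))
    diffs = [b - a for a, b in zip(periods, periods[1:])]
    cc = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return cycle, cc

# Normalized, hypothetical parameters: f0 = 1, 1% peak deviation, fm = 0.05.
f0, k0_vm, fm = 1.0, 0.01, 0.05
crossings = rising_zero_crossings(f0, k0_vm, fm, t_end=400.5, dt=0.005)
sim_cycle, sim_cc = jitter_from_crossings(crossings)
pred_cycle = k0_vm / (math.sqrt(2.0) * f0 ** 2)
pred_cc = k0_vm / f0 ** 2 * math.sqrt(1.0 - math.cos(2.0 * math.pi * fm / f0))
```

The simulated values should track the predictions to within a few percent for a slow modulation like this one; the residual difference reflects the quasi-static approximation, which treats the frequency as constant over each period.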

Fig. 9. Cycle jitter and cycle-to-cycle jitter of the DRO versus substrate voltage noise frequency. Solid lines represent the quasi-static FM expressions. The empty symbols show the jitter with the drain-bulk and source-bulk capacitances set to zero.

Fig. 10. Illustration of the equivalence of supply and substrate noise for the DRO.

In this section, we study jitter as a function of three parameters: transistor gate width, power dissipation, and the number of stages. To make meaningful comparisons, the circuit is modified in each case such that the frequency of oscillation remains constant. These parameters also affect the thermal jitter to some extent, but, considering the vastly different designs reported in [5] and [6], we note that this type of jitter still remains negligible.

A. Effect of Transistor Gate Width

The differential three-stage ring oscillator of Fig. 2 begins to oscillate for $W \approx 30\ \mu\mathrm{m}$. Fig. 11 shows the effect of the gate width on the jitter, where the oscillation frequency is kept constant by adjusting $C_L$ in Fig. 2. The jitter reaches a minimum for $W \approx 80\ \mu\mathrm{m}$. For large $W$, the value of $C_L$ must be reduced so as to maintain the same oscillation frequency, yielding a larger voltage-dependent fraction due to drain and source junctions of each device and hence a higher sensitivity to noise.

Fig. 11. Jitter of the DRO versus gate width.

B. Effect of Power Consumption

The jitter resulting from device electronic noise generally exhibits an inverse dependence upon the oscillator power dissipation [5], [10]. By contrast, the effect of supply and substrate noise on the jitter of a given oscillator topology is relatively independent of the power drain. This can be understood with the aid of the conceptual illustrations in Fig. 12, where the output voltages of $N$ identical oscillators are added in phase. In Fig. 12(a), only the device electronic noise is considered [5]. Since the noise in each oscillator is uncorrelated, the output noise voltage is $\sqrt{N}$ times that of each oscillator, whereas the output signal voltage is $N \times V_j$. In Fig. 12(b), on the other hand, all oscillators are disturbed by the same noise source, thus exhibiting completely correlated noise. That is, both the noise voltage and the signal voltage are increased by a factor of $N$.

Fig. 12. Illustration of the relationship between power consumption and noise for (a) device electronic noise and (b) supply noise.

To confirm the above observation, the gate width and tail current were decreased while the load resistance was increased proportionally. Table I shows that the jitter is quite constant.

C. Effect of Number of Stages

In applications where the required oscillation frequency is considerably lower than the maximum speed of the technology, a ring

oscillator may incorporate more than three stages. Thus, the optimum number of stages with respect to the jitter is of interest.

Shown in Table II and plotted in Fig. 13 is the jitter of three-stage and six-stage oscillators designed for a frequency of 500 MHz with constant tail current and voltage swings. We note that the minimum values of cycle jitter and cycle-to-cycle jitter are smaller in a three-stage topology. This is because for the three-stage oscillator, the reduction of the oscillation frequency to the desired value is obtained by means of the fixed capacitances $C_L$ rather than by the voltage-dependent capacitances of the transistors. Hence, a smaller fraction of the total load capacitance is subject to variations with supply and substrate noise.

TABLE I
IMPACT OF POWER CONSUMPTION

TABLE II
THREE-STAGE VERSUS SIX-STAGE OSCILLATOR

Fig. 13. Jitter of the three-stage and the six-stage DRO versus gate width for an oscillation frequency of 500 MHz.

The addition of a fixed capacitor to each stage nonetheless entails the issue of substrate noise coupling to the bottom plate of the capacitor. In order to minimize this effect, a grounded shield must isolate the capacitor from the substrate, as illustrated in Fig. 14. Here, an n-well, grounded by a low-resistance n+ ring, is placed under the capacitor so as to block the noise produced in the substrate.

Fig. 14. Grounded shield used under the capacitor to block substrate noise.

IX. CONCLUSION

We have investigated the timing jitter in oscillators subject to supply and substrate noise. For digital timing applications, the effect of supply and substrate noise on the jitter is typically much more pronounced than that of thermal noise. For supply and substrate noise, we have derived analytical relationships between the cycle-to-cycle jitter and the low-frequency sensitivity of the oscillation frequency to supply or substrate noise. These relationships have been verified by means of numerical calculations for single-ended and differential CMOS ring oscillators. For differential ring oscillators, we have investigated the dependence of the jitter on the transistor gate width, power consumption, and the number of stages. As a special result, we have found that in applications where the required oscillation frequency is lower than the maximum speed of the technology, a three-stage ring oscillator with additional load capacitances gives the lowest jitter.

APPENDIX
JITTER AND PHASE NOISE DUE TO THERMAL AND SHOT NOISE

The output voltage of an oscillator can be written as

$V(t) = V_0 \cos[\omega_0 t + \phi(t)]$  (18)

where $V_0$ is the amplitude, $\omega_0$ is the oscillation frequency, and $\phi(t)$ is the slowly varying excess phase. The excess frequency is

$\Delta\omega(t) = \frac{d}{dt}\phi(t)$  (19)

and hence

$\phi(t) = \int_0^t \Delta\omega(u)\,du + \phi(0).$  (20)

Thermal and shot noise may be considered as white noise since their cutoff frequencies are typically much higher than the oscillation frequency. White noise in the feedback loop of the oscillator results in phase diffusion, a phenomenon described by a Wiener process [11]. Extensive investigations of phase noise indicate that white noise sources in all types of oscillators give rise to a phase noise power spectrum proportional to $1/(\Delta\omega)^2$, where $\Delta\omega$ is the offset frequency with respect to the carrier frequency [4], [5]. This trend is valid for offset frequencies as high as several percent of the carrier frequency. Thus, frequency noise, $\Delta\omega(t)$, can be assumed white in such a band. The autocorrelation of $\Delta\omega(t)$ is given by

$\langle \Delta\omega(t+\tau)\,\Delta\omega(t) \rangle = 2D\,\delta(\tau)$  (21)

where $D$ is the diffusivity and $\delta(\tau)$ the Delta function. The probability density of $\phi(t)$ represents a Gaussian distribution centered

at $\phi(0)$ with the variance

$\sigma^2 = 2Dt.$  (22)

As evident from (22), the variance diverges with time. The autocorrelation of $V(t)$ is known [12] and reads

$\langle V(t+\tau)\,V(t) \rangle = \frac{V_0^2}{2}\exp(-D|\tau|)\cos(\omega_0\tau).$  (23)

Performing the Fourier transformation, we obtain the one-sided power spectral density

$S_V(\omega) = V_0^2\,\frac{D}{(\omega-\omega_0)^2 + D^2}.$  (24)

This quantity is often normalized to $V_0^2/2$ and referred to as relative phase noise with respect to the carrier [5] or as single-sideband phase noise [8], given by

$S_\phi(\omega) = \frac{2D}{(\omega-\omega_0)^2 + D^2}.$  (25)

For $\omega - \omega_0 \gg D$, we obtain from (25)

$S_\phi(\omega) \approx \frac{2D}{(\omega-\omega_0)^2}.$  (26)

Next, we will relate the cycle-to-cycle jitter and the single-sideband phase noise to each other. Note that the stationary Wiener process has no memory and the increments in different time intervals are statistically independent [11]. Therefore, the rms mean increment of the excess phase $\phi(t)$ within one cycle, i.e., the cycle jitter of the phase, equals the increment of $\phi(t)$ between $t = 0$ and $t = T$. Thus, from (22), we obtain the phase cycle jitter as

$\Delta\phi_c = \sqrt{2DT}.$  (27)

The excess phase change during the $n$th cycle is referred to as $\Delta\phi_n$. The $n$th oscillation period is defined by the relation

$2\pi f_0 T_n = 2\pi + \Delta\phi_n.$  (28)

For the deviation of the $n$th period $T_n$ from the mean period $T = 1/f_0$, we then find

$\Delta T_n = \frac{\Delta\phi_n}{2\pi f_0} = \Delta\phi_n \frac{T}{2\pi}.$  (29)

Hence, the cycle jitter $\Delta T_c$ of the period during one cycle is related to $\Delta\phi_c$ according to

$\Delta T_c = \Delta\phi_c \frac{T}{2\pi}.$  (30)

For white noise sources, two successive periods are uncorrelated. Since cycle-to-cycle jitter represents the difference between two periods, the variance of cycle-to-cycle jitter is twice as large as the variance of one period, yielding

$\Delta T_{cc} = \sqrt{2}\,\Delta T_c.$  (31)

Combining (27), (30), and (31), we obtain

$\Delta T_{cc}^2 = \frac{8\pi}{\omega_0^3}\,D$  (32)

with

$\omega_0 = \frac{2\pi}{T}.$  (33)

The cycle-to-cycle jitter can now be expressed in terms of the single-sideband phase noise by inserting (32) in (26) to give

$\Delta T_{cc}^2 \approx \frac{4\pi}{\omega_0^3}\,S_\phi(\omega)\,(\omega-\omega_0)^2.$  (34)

On the other hand, the phase noise can be expressed by the cycle-to-cycle jitter by inserting (32) in (25), yielding

$S_\phi = \frac{(\omega_0^3/4\pi)\,\Delta T_{cc}^2}{(\omega-\omega_0)^2 + (\omega_0^3/8\pi)^2\,\Delta T_{cc}^4}.$  (35)

A similar expression has been derived for ring oscillators in [10] and reads in our notation

$S_\phi = \frac{f_0^3\,\Delta T_c^2}{(f-f_0)^2}$  (36)

where $f_0 = \omega_0/2\pi$. Equation (36) turns out to be a special case of (35) for $\omega - \omega_0 \gg D$.

The absolute jitter increases proportionally to the square root of the measurement interval $\Delta t$ as evident from (22). Hence, the absolute phase jitter is

$\Delta\phi_{abs} = \sqrt{2D\,\Delta t} = \kappa\sqrt{\Delta t}.$  (37)

Using (32), the proportionality constant $\kappa$ can be related to the cycle-to-cycle jitter according to

$\kappa = \sqrt{2D} = \sqrt{2}\,\pi f_0^{3/2}\,\Delta T_{cc}.$  (38)

REFERENCES

[1] I. A. Young, J. K. Greason, and K. L. Wong, “A PLL clock generator with 5 to 110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[2] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, “A wide-bandwidth low-voltage PLL for PowerPC™ microprocessors,” IEEE J. Solid-State Circuits, vol. 30, pp. 383–391, Apr. 1995.
[3] R. Bhagwan and A. Rogers, “A 1 GHz dual-loop microprocessor PLL with instant frequency shifting,” in IEEE Proc. ISSCC, San Francisco, CA, Feb. 1997, pp. 336–337.
[4] D. B. Leeson, “A simple model of feedback oscillator noise spectrum,” in Proc. IEEE, pp. 329–330, Feb. 1966.
[5] B. Razavi, “A study of phase noise in CMOS oscillators,” IEEE J. Solid-State Circuits, vol. 31, pp. 331–343, Mar. 1996.
[6] T. Kwasniewski et al., “Inductorless oscillator design for personal communications devices: A 1.2 μm CMOS process case study,” in Proc. CICC, May 1995, pp. 327–330.
[7] F. X. Kärtner, “Analysis of white and $f^{-\alpha}$ noise in oscillators,” Int. J. Circuit Theory Appl., vol. 18, pp. 485–519, Sept. 1990.
[8] W. Anzill and P. Russer, “A general method to simulate noise in oscillators based on frequency domain techniques,” IEEE Trans. Microwave Theory Tech., vol. 41, pp. 2256–2263, Dec. 1993.
[9] J. A. McNeill, “Jitter in ring oscillators,” IEEE J. Solid-State Circuits, vol. 32, pp. 870–879, June 1997.
[10] T. C. Weigandt, B. Kim, and P. R. Gray, “Analysis of timing jitter in CMOS ring oscillators,” in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS’94), London, U.K., June 1994, vol. 4, pp. 27–30.
[11] C. W. Gardiner, Handbook of Stochastic Methods. Berlin: Springer-Verlag, 1983.
[12] R. L. Stratonovich, Topics in the Theory of Random Noise. New York: Gordon and Breach, 1967.
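As a numerical illustration of the Appendix relations (25), (32), and (34), the following minimal Python sketch converts between cycle-to-cycle jitter and single-sideband phase noise. The function names and the example values (a 500-MHz oscillator with 1 ps of rms cycle-to-cycle jitter, evaluated at 1-MHz offset) are hypothetical choices for illustration only, and the sketch assumes that $S_\phi$ can be read as the usual dBc/Hz figure, which the agreement between (35) and (36) suggests.

```python
import math


def diffusivity(f0_hz, dtcc_s):
    """Phase diffusivity D, obtained by inverting (32): dTcc^2 = 8*pi*D / omega0^3."""
    omega0 = 2.0 * math.pi * f0_hz
    return omega0 ** 3 * dtcc_s ** 2 / (8.0 * math.pi)


def phase_noise_dbc_hz(f0_hz, dtcc_s, offset_hz):
    """Single-sideband phase noise from the Lorentzian (25), expressed in dB."""
    d = diffusivity(f0_hz, dtcc_s)
    dw = 2.0 * math.pi * offset_hz  # angular offset (omega - omega0)
    return 10.0 * math.log10(2.0 * d / (dw ** 2 + d ** 2))


def cc_jitter_from_phase_noise(f0_hz, l_dbc_hz, offset_hz):
    """Cycle-to-cycle jitter from (34); only meaningful for offsets well above D."""
    omega0 = 2.0 * math.pi * f0_hz
    dw = 2.0 * math.pi * offset_hz
    s_phi = 10.0 ** (l_dbc_hz / 10.0)
    return math.sqrt(4.0 * math.pi / omega0 ** 3 * s_phi * dw ** 2)


if __name__ == "__main__":
    f0, dtcc, offset = 500e6, 1e-12, 1e6  # hypothetical example values
    level = phase_noise_dbc_hz(f0, dtcc, offset)
    print(f"D = {diffusivity(f0, dtcc):.1f} rad^2/s")
    print(f"phase noise = {level:.2f} dBc/Hz at {offset / 1e6:.0f} MHz offset")
    print(f"recovered jitter = {cc_jitter_from_phase_noise(f0, level, offset):.3e} s")
```

Because (34) follows from the approximation (26), the round trip recovers the original jitter only when the offset is much larger than $D$, which holds comfortably for these example values.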
204 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

A 2-V 900-MHz Monolithic CMOS Dual-Loop Frequency Synthesizer for GSM Receivers
William S. T. Yan and Howard C. Luong, Member, IEEE

Abstract—A 900-MHz monolithic CMOS dual-loop frequency


synthesizer suitable for GSM receivers is presented. Implemented
in a 0.5-μm CMOS technology and at a 2-V supply voltage, the
dual-loop frequency synthesizer occupies a chip area of 2.64 mm²
and consumes a low power of 34 mW. The measured phase noise of
the synthesizer is −121.8 dBc/Hz at 600-kHz offset, and the measured
spurious levels are −79.5 and −82.0 dBc at 1.6 and 11.3 MHz
offset, respectively.
Index Terms—Frequency synthesis, frequency synthesizer,
phase-locked loop, radio frequency, voltage-controlled oscillator.

I. INTRODUCTION

Modern transceivers for wireless communication consist of low-noise amplifiers, power amplifiers, mixers, DSP chips, filters, and frequency synthesizers. These building blocks have been realized using hybrid technologies and require interfacing circuits, which increases the power consumption and limits the maximum operating speed of the transceivers. For this reason, it has become increasingly attractive to design and monolithically integrate all these building blocks on a single chip.

Designing fully integrated frequency synthesizers for this integration is always desirable but most challenging. The first requirement is to achieve high-frequency operation with reasonable power consumption. However, the most critical challenges for the frequency synthesizer are the phase-noise and spurious-level performance. Finally, small chip area is essential to monolithic system integration.

In recent years, monolithic frequency synthesizers with good phase-noise performance have been reported [1]–[3]. However, those designs operate at supply voltages of at least 2.7 V and power consumption of more than 50 mW. Moreover, fractional-N frequency synthesizers suffer from fractional spurs which degrade their spurious-tone performance.

This paper presents a monolithic dual-loop frequency synthesizer for GSM 900 system, which is implemented in a 0.5-μm CMOS process, that achieves high operating frequency (935.2–959.8 MHz), low power consumption (34 mW), low phase noise (−121.8 dBc/Hz at 600 kHz), low spurious level (−82.0 dBc at 11.3 MHz), and fast switching time (830 μs). Section II derives the design specification of the frequency synthesizer for GSM 900. Section III describes the architecture for the proposed dual-loop design. In Section IV, circuit implementation of critical building blocks is discussed. Section V presents the measurement results of the synthesizer, including its phase noise, spurious level, and switching time, together with a comprehensive performance evaluation.

Manuscript received December 29, 1999; revised October 5, 2000.
The authors are with the Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (e-mail: eetak@ee.ust.hk; eeluong@ee.ust.hk).
Publisher Item Identifier S 0018-9200(01)00927-1.
0018–9200/01$10.00 © 2001 IEEE

II. DESIGN SPECIFICATION

The performance of frequency synthesizers is mainly specified by their output frequency, phase noise, spurious level, and switching time. This section derives the specifications of a frequency synthesizer for GSM receivers.

A. Output Frequency

In GSM-900 systems, the receiver-channel frequencies are expressed as follows:

$f_{ch} = 935.2 + 0.2\,(N - 1)$ MHz  (1)

where $N = 1, 2, \ldots, 124$ is the channel number. To receive signals in different channels, a GSM-receiver front-end, shown in Fig. 1, is adopted. The receiver front-end consists of a low-noise amplifier (LNA) and an RF filter for filtering out-of-band noise and blocking signals. The received signal is then mixed down to an IF frequency ($f_{IF}$) of 70 MHz for base-band signal processing. To extract information from the desired channel, the local oscillator (LO) output frequency ($f_{LO}$) of the frequency synthesizer is changed accordingly, as follows:

$f_{LO} = f_{ch} - f_{IF} = 865.2$–$889.8$ MHz  (2)

which is the output-frequency range of the frequency synthesizer to be achieved.

Fig. 1. Block diagram of the GSM-receiver front-end.

B. Phase Noise

The blocking-signal specification for GSM 900 receivers is shown in Fig. 2, where the desired signal power can be as low
YAN AND LUONG: MONOLITHIC CMOS DUAL-LOOP FREQUENCY SYNTHESIZER 205

as −102 dBm. At 600-kHz offset frequency, the power of the blocking signal can be as high as −43 dBm [4]. With a correct LO frequency, the desired channel signal is downconverted to IF frequency. However, blocking signals are also downconverted with the LO signal and its phase noise. Since the power of the blocking signal is much larger than that of the desired signal, the phase-noise power falls into the IF frequency and degrades the signal to noise ratio (SNR). The phase-noise specification can be expressed as follows:

$\mathcal{L}\{600\ \text{kHz}\} < P_{sig} - P_{blk} - \mathrm{SNR} - 10\log(BW) = -121$ dBc/Hz at 600 kHz  (3)

where SNR of 9 dB is the SNR specification for the whole receiver, $BW$ is the channel bandwidth (200 kHz in GSM), and $P_{sig} = -102$ dBm and $P_{blk} = -43$ dBm are the power levels of the minimum desired signal and maximum blocking signal, respectively.

Fig. 2. SNR degradation due to the phase noise and spurious level.

C. Spurious Tones

Because of the feedthrough and modulation of the reference signal, two spurious tones appear at an offset equal to the reference frequency from the desired output frequency, as shown in Fig. 2. The derivation of the spurious-tone specification is similar to that of the phase noise except that the channel bandwidth is not considered in this case. The spurious-tone specification can be expressed as follows:

SNR
dBc at 1.6 MHz
dBc for offset MHz.  (4)

D. Switching Time

In GSM 900 systems, time-division multiple-access (TDMA) is adopted within each frequency channel. As shown in Fig. 3, each frequency channel is divided into eight time slots. Signals are received and transmitted in time slot #1 and slot #4, respectively. For system monitoring purposes, a time slot between slot #6 and slot #7 is adopted. Therefore, the most critical switching time is from the transmission period (slot #4) to the system monitoring period (slot #6.5), which is equal to 865 μs. However, to take care of the settling time of the other components, the switching time of the frequency synthesizer is recommended to be kept within one time slot (577 μs) [5].

Fig. 3. GSM 900 receive and transmit time.

III. DUAL-LOOP DESIGN

To reduce the switching time and the chip area of a synthesizer, a high loop bandwidth and a high reference frequency are desired. Moreover, to suppress the phase-noise contribution of the reference signal and reduce frequency-divider complexity, a lower frequency-division ratio is desirable. Therefore, a dual-loop frequency synthesizer is proposed [6]. As shown in Fig. 4, the dual-loop design consists of two reference signals and two phase-locked loops (PLLs) in cascade configuration. In the feedback path of the high-frequency loop, a mixer is adopted to provide the frequency shift. The output frequency of the synthesizer is expressed as follows:

(5)

where and are frequencies of the two reference signals, and , and are frequency division ratios.
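The figures quoted in Sections II and III can be cross-checked with a short script. The sketch below is an illustration under stated assumptions rather than the authors' formulation: the blocker budget follows the verbal description of the phase-noise specification (signal power, blocker power, SNR, and a 200-kHz channel bandwidth), and the frequency plan assumes the form f_LO = n3 × (f_ref2 + n1 × f_ref1 / n2), with the 1.6-MHz reference, the divide-by-32 and divide-by-4 dividers, and the 226–349 programmable ratio taken from the text. The 205-MHz second reference is not stated in this excerpt; it is inferred here because it reproduces the 865.2–889.8 MHz range. All function and parameter names are hypothetical.

```python
import math


def phase_noise_spec_dbc_hz(p_sig_dbm, p_blk_dbm, snr_db, bw_hz):
    """Blocker-limited LO phase-noise budget, normalized to a 1-Hz bandwidth."""
    return p_sig_dbm - p_blk_dbm - snr_db - 10.0 * math.log10(bw_hz)


def lo_frequency_mhz(n1, f_ref1_mhz=1.6, f_ref2_mhz=205.0, n2=32, n3=4):
    """Assumed dual-loop relation: f_LO = n3 * (f_ref2 + n1 * f_ref1 / n2).

    f_ref2_mhz = 205.0 is an inferred value, not one quoted in the text.
    """
    return n3 * (f_ref2_mhz + n1 * f_ref1_mhz / n2)


if __name__ == "__main__":
    spec = phase_noise_spec_dbc_hz(-102.0, -43.0, 9.0, 200e3)
    print(f"phase-noise spec: {spec:.1f} dBc/Hz at 600 kHz")
    for n1 in (226, 227, 349):
        print(f"n1 = {n1}: f_LO = {lo_frequency_mhz(n1):.1f} MHz")
```

Under these assumptions, stepping the programmable ratio by one moves the output by 4 × 1.6 MHz / 32 = 200 kHz, i.e., one GSM channel, which is consistent with the 124 ratios between 226 and 349.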

Fig. 4. Proposed dual-loop frequency synthesizer.

Due to the dual-loop architecture, the comparison frequen- IV. CIRCUIT IMPLEMENTATION
cies of the low-frequency and high-frequency loops are scaled This section discusses the design consideration and circuit
up from 200 kHz to 1.6 and 11.3 MHz, respectively. Therefore, implementation of the major building blocks that are unique and
the loop bandwidths of both PLLs can be increased so that the
critical to the proposed dual-loop synthesizer, namely the two
switching time and the chip area can be reduced. Compared VCOs, the frequency dividers, the charge pump, and the loop
to single-loop integer- designs, the frequency-division ratio filters. Detailed analysis and design of other building blocks will
of the programmable divider is reduced from 4326–4449
not be presented, either because they can be found somewhere
to 226–349. Such a reduction in the division ratio significantly
else or they are too obvious.
simplifies the frequency-divider design and reduces phase-noise
contribution of the input reference signal. A. Ring Oscillator VCO1
In the proposed dual-loop synthesizer, the divide-by-32 di-
vider and the high-frequency loop together greatly attenu- The schematic of the proposed two-stage ring oscillator and
ates the phase noise and the spurious tones of the low-frequency its delay cell to meet the required specification as described in
loop. As such, the low-frequency loop can be designed to have Section III are shown in Fig. 5(a) and (b), respectively. The delay
a larger loop bandwidth and a loop filter as small as one-fifth cell consists of nMOS transistors as input transconductors,
of the loop filter in the high-frequency loop. The low-frequency cross-coupled pMOS transistors for maintaining oscilla-
loop requires additional components, including the phase-fre- tion, diode-connected pMOS transistors , and a bias tran-
quency detector (PFD1), the charge pump (CP1), and the fre- sistor for frequency tuning. The source nodes of transistors
quency divider , but they are all quite small and have very are connected to supply to maximize its output amplitude
little impact on the chip area. In additional, VCO1 is imple- , which also helps suppress noise sources by turning them off
mented by a ring oscillator, which occupies a much smaller chip more often [7] and thus further enhances the phase-noise per-
area compared to VCO2. Altogether, the dual-loop design re- formance.
quires no more than 25% overhead in the chip area compared to The half circuit of the delay cell is shown in Fig. 5(c). By
a fraction- design with the same loop bandwidth. equating the delay-cell voltage gain to be unity, the oscillating
Although the input-reference frequency of the frequency of the ring oscillator can be expressed as follows:
low-frequency loop is scaled up by 8 times; the required
frequency range of the oscillator VCO1 in the low-frequency
loop is also scaled up from 25 to 200 MHz. On the other
hand, the phase-noise of the ring oscillator is attenuated by the (6)
frequency divider and is then amplified by the high-fre-
quency loop; the total phase-noise attenuation from VCO1 where is transconductance, is channel conductance, and
output to the synthesizer output is 18 dB. Consequently, this is the total capacitance at output node. Oscillation starts
voltage-controlled oscillator (VCO) requires a high operating when the is large enough to overcome the output load
frequency (600 MHz), a wide frequency range (200 MHz), and .
a low phase noise (−103 dBc/Hz at 600 kHz). A novel ring When control voltage V, transistors are turned
VCO design that meets all of these tough specifications will be on to cancel , and the oscillator operates at maximum fre-
presented in the next section. quency. When control voltage , transistor are

Fig. 5. Circuit implementation of the ring oscillator VCO1. (a) Ring oscillator. (b) Delay cell. (c) Half circuit of delay cell.

turned off , and the oscillator operates at minimum


frequency. By (6), , and frequency range are
expressed as follows:

(7)

Since is proportional to , nMOS transistors


are adopted as the input devices to minimize power con-
sumption. From (7), 50% tuning range can be achieved when
.
Based on the approximate impulse sensitivity function (ISF) and the analysis presented in [8], the phase noise of the oscillator is estimated to be approximately −107 dBc/Hz at 600-kHz offset. On the other hand, using SpectreRF [9], the phase noise is simulated to be −111.7 dBc/Hz at 600 kHz.

B. LC Oscillator VCO2

As the far-offset phase noise is dominated by the VCO2, an LC oscillator is adopted to meet the stringent phase-noise specification. Fig. 6 shows the schematic of the LC oscillator. Cross-coupled transistors are used to start and to maintain oscillation with lower parasitics. PN-junction varactors implemented by p+ diffusion on the n-well are used for frequency-tuning purpose. The common-mode output voltage is designed at 1.1 V to enhance the driving of the frequency divider. To reduce phase-noise contribution due to flicker noise, pMOS transistors are used as the current source.

Fig. 6. Circuit implementation of the LC oscillator VCO2.

To design an LC oscillator which satisfies the phase-noise requirement with minimum power consumption, inductors with large inductance and small series resistance are desired. Therefore, two-layer inductors are adopted [10] for which the inductance and the quality factor can be scaled up by 4 and 2 times, respectively. For the same reason, pn-junction varactors are interdigitated with p+ islands surrounded by n-well contacts to enhance the quality factor. Finally, the transconductance of the cross-coupled transistors is designed so that it does not overcompensate the LC tank too much (only twice) to reduce phase-noise contribution by the transistors.

Based on the method described in [11], the phase noise of the LC VCO is estimated to be −124.0 dBc/Hz at 600-kHz frequency offset, which agrees well with the simulation using SpectreRF.

C. Frequency Dividers

As the divide-by-4 frequency divider needs to convert sinusoidal signals from the VCO2 output into square-wave signals, the first stage of the divider is implemented by pseudo-nMOS logic while the second divide-by-2 divider is implemented by the TSPC-logic divide-by-2 divider [12]. The first divider is shown in Fig. 7 and consists of a pseudo-nMOS amplifier and a divide-by-2 divider. Since the pseudo-nMOS logic is a ratioed logic, the ratio between pMOS and nMOS transistors is designed to be less than 1.6 to make sure output logic “0” turns off the next stage.

D. Programmable-Frequency Divider

Fig. 8 shows the block diagram of the programmable-frequency divider [13]. At reset state, the prescaler divides

TABLE I
SYSTEM DESIGN OF PROGRAMMABLE-FREQUENCY DIVIDER N

control signal “1,” the gated inverter is by-passed,


and the prescaler is a divide-by-12 divider. When control signal
“0,” the final state “010” will be de-
tected, at which “0” and the input signal is delayed by
Fig. 7. Circuit implementation of the pseudo-nMOS divide-by-2 frequency one clock cycle. Thus the function of divide-by-13 is achieved.
divider.
The back-carrier-propagation approach allows low-frequency
signals (more significant bits) to switch to the final state much
earlier than high-frequency signals (less significant bits) and
thus reduces power consumption for a given speed.

E. Charge Pumps and Loop Filters


Fig. 10 shows the circuit implementation of the charge
pumps used in the two loops. Each charge pump consists of
two cascode-current sources for both the pull-up and pull-down
currents, four complementary switches, and a unity-gain
amplifier. By using high-swing cascode current sources, the
output impedance is increased for effective current injection.
Minimum-size complementary switches are adopted to mini-
mize clock feedthrough and charge injection of the switches.
Fig. 8. Block diagram of the programmable-frequency divider N . The unity-gain amplifier keeps the voltages of nodes VCO and
to be equal so that charge sharing between nodes VCO, ,
input signal by , and its output is counted by both and and can be minimized [15].
counters. After the counter has counted pulses, the The design of the loop filters in the two PLLs is a second-
counter changes the state of the modulus control line and order low-pass filter which is implemented using linear capac-
the prescaler divides input by . Then the counter counts itors and silicide-blocked polysilicon resistors. The values of
the remaining – cycles to reach overflow. As a whole, the capacitance, resistance, and charge-pump current are optimally
programmable divider generates one complete cycle for every designed to satisfy simultaneously the phase-noise, spurious-
input cycles. The operation tone, and switching-time requirements with minimum chip area
repeats after the counter is reset. [16]. The loop bandwidth of the low-frequency and high-fre-
As the frequency-division ratio (226 349) can be achieved quency loops are 40 and 27 kHz, respectively.
with different combinations of , , and , the most optimal
combination in terms of performance needs to be identified and F. Phase Noise of the Dual-Loop Frequency Synthesizer
chosen as the design. As the counter must finish before the Based on the linearized model shown in Fig. 11, the transfer
counter resets it, the division ratio of the counter should function from the input phase to output phase of both
be larger than that of the counter. To optimize the power con- the low-frequency and high-frequency loops can be expressed
sumption, the operating frequencies and number of bits of both as follows:
the and counters should be minimized.
Table I shows the different combinations of , , and
which can implement the desired division ratio. Case 1 requires
the highest operating frequencies and number of bits for and
counters, so it is not adopted. Case 4 has the problem that
the value is larger than the value. It seems that Case 3 is
the best one, but Case 2 is chosen as the final design because it
is much easier to implement an asynchronous divide-by-12 fre- (8)
quency divider than a divide-by-14 divider.
The dual-modulus prescaler is implemented by the back-car- where is the phase-detector gain, is the VCO gain,
rier-propagation approach as shown in Fig. 9 [14]. When and , and are the total capacitance, zero time constant,

Fig. 9. Circuit implementation of the dual-modulus prescaler.

Fig. 10. Circuit implementation of the charge pump and the loop filter.

and pole time constant of the corresponding loop filters, respec- charge-pump noise current to the output phase noise can be
tively. is the phase-detector gain, is the VCO gain, derived to be
and , and are the total capacitance, zero time constant,
and pole time constant of the corresponding loop filters, respec-
tively. Since the transfer function is a low-pass function, the
reference phase noise is highly attenuated at high offset fre-
quency. It also shows that the close-in phase noise of the ref-
erence signals is amplified by the frequency-division ratio. In
this work, the division ratio is reduced from 4449 to 349, and
the phase-noise contribution from the reference signals is sup-
pressed by dB.
Another important source of the close-in phase noise is
the charge-pump noise. The transfer functions between the (9)

Fig. 11. Linearized model of the dual-loop frequency synthesizer.

It shows that small frequency-division ratio and large phase- which are high-pass functions. Therefore, the far-offset phase
detector gain or large charge-pump current are preferred for noise of the synthesizer is dominated by the VCO phase noise.
phase-noise consideration. Another factor not included in (9) Since the loop bandwidths of both PLLs are designed in a range
which also affects the charge-pump noise is its turn-on time. of tens of kilohertz to achieve spurious-level specification, the
In this proposed synthesizer, the charge-pump turn-on time is far-offset phase noise of the PLLs only depends on the VCO
designed to be equal to 1/10 of the input period so that it is phase noise itself.
long enough to eliminate the phase-frequency detector (PFD) To evaluate the overall phase-noise performance, the relation-
dead-zone problem, but at the same time is short enough to min- ship between the phase noise of the low-frequency loop and that
imize the charge-pump phase-noise contribution. of the synthesizer output can be written as
The phase-noise contribution of the loop-filter resistors in
both PLLs can be estimated using their equivalent noise cur-
rents as follows:

(12)
(10)
which shows that there exists dB close-in
which are bandpass functions with peaks appearing between the phase-noise suppression for the low-frequency loop.
zero and the pole of the loop filter. To suppress the phase-noise The estimated phase noise of the whole synthesizer is
peaking, large loop-filter capacitors are desired at the cost of −81.4 dBc/Hz at 20.9 kHz and −123.8 dBc/Hz at 600 kHz.
large chip area. The contribution of each component is shown in Fig. 12, which
For the phase-noise contribution of the VCOs, the transfer shows that the close-in phase noise (<100 kHz) is dominated
function between the VCO phase noise and output phase noise by the charge pump CP1 and loop filter LF1, while the far-offset
can be found to be phase noise (>100 kHz) is dominated by the LC oscillator.

V. EXPERIMENTAL RESULTS
The dual-loop frequency synthesizer is implemented in a
standard 0.5- m CMOS technology. Linear capacitors are put
(11) under all the bias pins to serve as on-chip bypass capacitors.

B. Measurement of Varactors
The measurement results of the pn-junction varactor at 900 MHz are shown in Fig. 15. As the p+ diffusion of the varactors used in the LC oscillator is connected to the output of the LC oscillator core, the varactors are biased at 1.16 V, which is the dc bias of the oscillator core, during the measurement. The measured capacitance is close to the estimated results in the reverse-biased region. The series resistance is around 2 Ω due to the minimum junction spacing and the nonminimum junction width. The quality factor is around 30 in the operating region of the oscillator.

C. Measurement of Ring Oscillator VCO1


The phase noise of the oscillators are measured by a direct-
phase-noise measurement [17]. First, the carrier power is deter-
mined at large video (VBW) and resolution bandwidths (RBW).
Then, the resolution bandwidth is reduced until the noise edges
and not the envelope of the resolution filter are displayed. Fi-
Fig. 12. Estimated phase noise of the whole dual-loop frequency synthesizer
and contribution of each components at the synthesizer output.
nally, the phase noise is measured at the corresponding fre-
quency offset from the carrier. To make sure that the measured
phase noise is valid, the displayed values must be at least 10 dB
above the intrinsic noise of the analyzer.
Fig. 16 shows the measurement results of the ring oscillator
VCO1. The operating frequency is measured to be between
324.0 and 642.2 MHz, over which the measured phase noise
is between 111 and 108 dBc/Hz at 600 kHz. The power
consumption is around 10 mW.

D. Measurement of LC Oscillator VCO2


Fig. 17 shows the measurement results of the LC oscillator
VCO2. Due to the quality-factor degradation of the spiral in-
ductor, the bias current of the oscillator is increased by 15%
above its designed value to achieve the phase noise specifica-
tion ( 121 dBc/Hz at 600 kHz). The measured operating fre-
quency range is between 725.0 and 940.5 MHz. The oscillation
stops when the VCO control voltage is below 0.6 V because the
varactors become forward-biased. Over the desired frequency
Fig. 13. Die photo of the dual-loop frequency synthesizer. range between 865.2 and 889.8 MHz, the achieved phase noise
is below 121 dBc/Hz at 600 kHz.
Fig. 13 shows the die photo of the dual-loop frequency synthe-
sizer, and the active area of the synthesizer is 2.64 mm . E. Measured Phase Noise of the Frequency Synthesizer
For characterization and measurement of passive devices, Fig. 18 shows the phase-noise measurement results of the
testing structures for spiral inductors and varactors are included dual-loop frequency synthesizer at 889.8 MHz. The measured
on the same die with the synthesizer and are measured by a phase noise is 121.8 dBc/Hz at 600 kHz which satisfies
network analyzer. To de-embed the probing-pad parasitics, an the GSM requirement. At offset frequencies between 10 and
open-pad structure is also measured. 100 Hz, the phase noise is mainly contributed by the flicker
noise of the charge pump. However, the peak phase noise of
A. Measurement of Inductors 65.67 dBc/Hz at 15 kHz is measured, which is 15 dB higher
Fig. 14 shows the inductance , series resistance , and than the estimation presented in Fig. 12.
quality factor of the on-chip spiral inductor. The measured At offset frequencies above 100 kHz, where the phase
inductance is close to simulation results and drops at frequen- noise should be dominated by VCO2 and should go down
cies close to the self-resonant frequency. However, the series by 20 dB/dec, the measured phase noise goes down at a
resistance (30.2 ) is almost three times larger than the ex- rate of 40 dB/dec. It is believed that the increase in the
pected value (11.6 ). The increase in series resistance is mainly close-in phase noise is mainly due to the charge pump CP2
caused by eddy current induced within substrate and n-well fin- and the loop filter LF2, for the following reasons. First,
gers [16]. As series resistance increases significantly, the port-1 according to Fig. 12, only CP2 and LF2 have the phase-noise
quality factor is limited to be 1.6 at 900 MHz. slope of 40 dB/dec above 100-kHz frequency offset. Second,
212 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

the measured peak-to-flat close-in phase noise in Fig. 18 is around 15 dB, which is quite close to that of the estimated value in Fig. 12. Lastly, it is observed experimentally that the close-in phase noise is changed as the charge-pump current of the high-frequency loop is adjusted. Unfortunately, the phase-noise contribution of CP2 and LF2 cannot be measured individually.

Fig. 14. Measurement results and equivalent circuit model of the spiral inductors at 900 MHz.

Fig. 15. Measurement results and bias condition of the pn-junction varactors at 900 MHz.

F. Measured Spurious Tones of the Frequency Synthesizer
Fig. 19 shows the measured spurious level of the dual-loop frequency synthesizer at 865.2 MHz, which are −79.5 dBc at 1.6 MHz, −82.0 dBc at 11.3 MHz, and −82.83 dBc at 16 MHz. At 11.3 MHz, the spurious level is only 6 dB above the requirement. However, the predicted spurious level at 1.6 MHz should be below −90 dBc and the one at 16 MHz should not exist [16].

During the measurement, the 1.6-MHz reference signal is generated by a 16-MHz crystal oscillator and a decade counter. Therefore, the 16-MHz spur is caused by the substrate coupling between the crystal oscillator and the synthesizer. To verify the reason of the increased spurious level at 1.6 MHz, the low-frequency loop is disabled, and the spurious level is still −75.1 dBc at 1.6 MHz, which implies that the increase in spurious level at 1.6 MHz is mainly caused by the substrate coupling.

Fig. 16. Measurement results of the ring oscillator VCO1.

Fig. 17. Measurement results of the LC oscillator VCO2.

Fig. 18. Phase-noise measurement results of the proposed synthesizer.

Fig. 19. Measured spurious level of the proposed synthesizer.

G. Switching Time of the Frequency Synthesizer
To determine the worst-case switching time of the frequency synthesizer, the frequency division ratio is switched from 226 to 349, and the control voltages of both VCO1 and VCO2 are measured. As shown in Fig. 20, the measurable switching time of the proposed synthesizer is 830 µs for a frequency error of approximately 10 kHz due to the limited resolution of our oscilloscope. Since the VCO gain is 160 MHz/V, in order to achieve a measurement accuracy of a 100-Hz frequency error, the resolution of the oscilloscope would need to be better than 100 nV, which unfortunately is not obtainable with our equipment. On the other hand, the synthesizer suffers from a slew-rate problem during the channel switching due to a small charge-pump current (1.6 µA) and a large loop-filter capacitor (1.1 nF) in the high-frequency loop.

H. Performance Evaluation
Table II summarizes the measured performance of the proposed frequency synthesizer, and Table III lists the performance of other fully integrated synthesizers for comparison. The proposed synthesizer operates at a single 2-V supply while all other designs require supply voltages of at least 2.7 V. Note that since the designs [1] and [3] operate at higher frequencies, their power consumption should be scaled down accordingly for a fair comparison. With this frequency normalization, the power consumption of the proposed dual-loop synthesizer is still comparable to that of the other designs.
The synthesizer presented in [1] is a fractional-N design with a 26.6-MHz comparison frequency and a loop bandwidth of 45 kHz. As comparison, the low-frequency loop of this work has a comparison frequency of only 1.6 MHz but a loop bandwidth of up to 40 kHz because of the relaxed requirement of the spurious tones. On the other hand, the high-frequency loop uses a 11.3-MHz comparison frequency but a loop bandwidth limited to 27 kHz, which is the real limiting factor of the switching-time performance. Therefore, it is believed that a switching time close to that in [1] can be achieved by

eliminating the slew-limiting problem in the high-frequency loop.
The work described in [2] is an integer-N bipolar junction transistor (BJT) design with channel spacing of 600 kHz, and its comparison frequency and loop bandwidth are limited to be 600 and 4 kHz, respectively. With such a low bandwidth, the settling time is still less than 600 µs. It implies that if a larger charge-pump current or an active loop filter is adopted in the high-frequency loop of the dual-loop design, slew limiting can be suppressed and the switching time performance can be enhanced. Since the density of the BJT transistors is not as good as the CMOS transistor, the chip area is relatively large even though it does not include an on-chip loop filter.
The synthesizer in [3] is also an integer-N design but without channel programmability, and as such, its comparison frequency and loop bandwidth can be as high as 61.5 MHz and 200 kHz, respectively. For the same reason, the total loop-filter capacitance can be smaller than 60 pF, which greatly reduces chip area. However, the situation would be much different if channel programmability is included.
Although the proposed synthesizer consists of two loop filters, the chip area is just a little bit larger than that of the design in [3] due to the use of linear capacitors and silicide-blocked resistors. Compared to the designs in [1] and [3], the spurious levels are between −75 and −85 dBc, which indicates that CMOS designs suffer the same problem from the substrate coupling between the reference signal and VCO. However, for the close-in phase noise, the proposed dual-loop synthesizer suffers from the 15-dB increase at the peak due to the charge pumps and loop filters as discussed in Section V-E.

Fig. 20. Switching-time measurement results of the proposed synthesizer.

TABLE II
PERFORMANCE SUMMARY OF THE PROPOSED SYNTHESIZER

I. Generation of the Second Reference Sources
The main drawback of our dual-loop synthesizer is that it requires two reference sources. In reality, if a single reference signal is preferred for the whole frequency synthesizer, the second reference signal (204.8 MHz instead of 205 MHz) can be generated from the 1.6-MHz reference signal by a third PLL with frequency division ratio of 128. Since this third PLL has a fixed division frequency, its frequency divider can be implemented simply by cascading seven divide-by-2 dividers.
Without the divide-by-32 divider between the third PLL output and the high-frequency loop, the close-in phase-noise requirement of this PLL would be in fact 30 dB more stringent (−92 dBc/Hz) than that of the low-frequency loop. However, since the division ratio is only 128, it would offer a relaxation in requirement. The remaining 21.3 dB phase-noise suppression could be achieved by increasing the charge-pump current of the third PLL.
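The arithmetic behind the third-PLL proposal can be checked with a short sketch. The reference frequency, the seven divide-by-2 stages, the 30-dB figure, and the division ratio of 349 are taken from the text; treating the relaxation as 20·log10(349/128), the ratio between the largest quoted low-frequency-loop division ratio and 128, is an assumption that happens to reproduce the quoted 21.3 dB. All variable names are illustrative.

```python
import math

# Values quoted in the text
f_ref_hz = 1.6e6          # existing reference frequency
n_div2_stages = 7         # cascaded divide-by-2 stages in the third PLL
extra_db_quoted = 30.0    # quoted extra close-in stringency without the divide-by-32
n_max_low_loop = 349      # largest division ratio quoted for the low-frequency loop

# Seven divide-by-2 stages give the fixed division ratio of the third PLL
division_ratio = 2 ** n_div2_stages
f_ref2_hz = f_ref_hz * division_ratio

# Close-in suppression that a divide-by-32 stage would have provided
div32_db = 20 * math.log10(32)

# Assumption: the relaxation comes from dividing by 128 instead of up to 349
relaxation_db = 20 * math.log10(n_max_low_loop / division_ratio)
remaining_db = extra_db_quoted - relaxation_db

print(f"division ratio        : {division_ratio}")
print(f"second reference      : {f_ref2_hz / 1e6:.1f} MHz")
print(f"divide-by-32 in dB    : {div32_db:.1f} dB")
print(f"assumed relaxation    : {relaxation_db:.1f} dB")
print(f"remaining suppression : {remaining_db:.1f} dB")
```

Under these assumptions the sketch gives a 204.8-MHz second reference and a remaining suppression that rounds to the 21.3 dB stated above.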

TABLE III
PERFORMANCE COMPARISON OF RECENT WORK ON FULLY INTEGRATED FREQUENCY SYNTHESIZERS

In addition to the close-in phase noise, the far-offset phase-noise requirement would also be more stringent by the same amount. Assuming that VCO1 is adopted in the third PLL, its phase noise could be improved to be −116 dBc/Hz at 600 kHz at a 204-MHz operation. With a 30.6-dB filtering effect of the high-frequency loop, the phase-noise contribution by the VCO of the third PLL would become −146.6 dBc/Hz at 600 kHz, which would have negligible effect on the overall phase noise.
Basically, the implementation of the third PLL would be similar to that of the low-frequency loop, and its chip area would be less than 10% of the total area of the dual-loop synthesizer. Since the third VCO operates at half the frequency of VCO1, its power consumption would be 25% of that of VCO1 (2.5 mW). Similarly, since the divide-by-128 divider also operates at half of the frequency, it would consume only half the power as compared to the divider (0.3 mW). Although the charge-pump current should be increased by 100 times to 640 µA, the average power current would be only 64 µA since the turn-on time is only around 1/10 of the input period. In conclusion, by introducing the third PLL to generate the second reference signal, the additional power required would be less than 3 mW, and the increase in the total chip area would still be less than 10%.

VI. CONCLUSION
A 900-MHz monolithic CMOS dual-loop frequency synthesizer with good phase-noise performance for GSM receivers is presented. Compared to other fully integrated synthesizer designs, this proposed synthesizer operates at much lower supply voltage and consumes approximately the same power with frequency normalization. Implemented in a standard 0.5-µm CMOS technology and at 2-V supply voltage, the synthesizer has a power consumption of 34 mW. At 900 MHz, the measured phase noise is −121.8 dBc/Hz at 600-kHz frequency offset, and the spurious level is −82 dBc at 11.3 MHz. Due to the substrate coupling and testing setup, additional spurious levels are measured to be −79.5 dBc at 1.6 MHz and −82.8 dBc at 16 MHz. The chip area is less than 2.64 mm². Even if a third PLL is implemented to generate the second reference frequency, the increase in the total power consumption and the total chip area would be negligibly small.

REFERENCES
[1] J. Craninckx and M. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065, Dec. 1998.
[2] A. Ali and J. L. Tham, “A 900-MHz frequency synthesizer with integrated LC voltage-controlled oscillator,” Proc. IEEE Int. Solid-State Circuits Conf., vol. 1, pp. 390–391, 1996.
[3] J. F. Parker and D. Ray, “A 1.6-GHz CMOS PLL with on-chip loop filter,” IEEE J. Solid-State Circuits, vol. 33, pp. 337–343, Mar. 1998.
[4] “Digital cellular telecommunications system (Phase 2+); Radio transmission and reception (GSM 5.05),” European Telecommunications Standards Institute, 1996.
[5] D. Craninckx and D. Steyaert, Wireless CMOS Frequency Synthesizer Design. Norwell, MA: Kluwer, 1998, pp. 201–202.
[6] T. Aytur and J. Khoury, “Advantages of dual-loop frequency synthesizers for GSM applications,” in Proc. IEEE Int. Symp. Circuits and Systems, 1997.
[7] C. H. Park and B. Kim, “A low-noise 900-MHz VCO in 0.6-µm CMOS,” in Proc. Symp. VLSI Circuits, 1998.
[8] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Phase noise in multi-gigahertz CMOS ring oscillators,” in Proc. IEEE 1998 Custom Integrated Circuit Conf., 1998, pp. 49–52.
[9] “Oscillator noise analysis in SpectreRF, application note to SpectreRF,” CADENCE, 1998.
[10] R. B. Merrill, T. W. Lee, H. You, R. Rasmussen, and L. A. Moberly, “Optimization of high-Q integrated inductors for multilevel metal CMOS,” in Proc. Int. Electronic Device Meeting, 1995, pp. 983–986.
[11] A. Hajimiri and T. H. Lee, “A general theory of phase noise in electrical oscillators,” IEEE J. Solid-State Circuits, pp. 179–194, Feb. 1998.
[12] J. Yuan and C. Svenson, “High-speed CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. 24, pp. 62–70, Feb. 1989.
[13] B. Razavi, RF Microelectronics. Englewood Cliffs, NJ: Prentice Hall, 1997.

[14] P. Larsson, “High-speed architecture for a programmable frequency divider and a dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 31, pp. 744–748, May 1996.
[15] I. A. Young, J. K. Greason, and K. L. Wong, “PLL clock generator with 5 to 110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[16] W. S. T. Yan, “A 2-V 900-MHz monolithic CMOS dual-loop frequency synthesizer for GSM receivers,” M.Phil. thesis, Hong Kong University of Science and Technology. [Online.] Available: http://www.ee.ust.hk/~eetak, 1999.
[17] T. Fredrich, “Direct phase noise measurements using a modern spectrum analyzer,” Microwave J., vol. 35, pp. 94–114, Aug. 1992.

William S. T. Yan received the Bachelor and Master’s degrees in electrical and electronics engineering from the Hong Kong University of Science and Technology, Hong Kong, in 1996 and 1999, respectively.
He is currently with Maxim Integrated Products, Sunnyvale, CA. His current interests are in the areas of high-frequency integrated-circuit design.

Howard C. Luong (M’91) received the B.S. (high honors), M.S., and Ph.D. degrees in electrical engineering and computer sciences from the University of California, Berkeley, in 1988, 1990, and 1994, respectively. For his Master’s thesis, he worked on MOS analog multipliers with scaling technologies. For his Ph.D. dissertation, he designed and fabricated a superconductive flash-type analog-to-digital converter that operated at multi-gigahertz clock and input frequencies.
Since September 1994, he has been with the electrical and electronics engineering faculty at the Hong Kong University of Science and Technology, where he has been the Faculty-In-Charge of the Analog Research Lab and the Associate Director of the EEE Undergraduate Program Committee. His research interests are in high-performance analog and RF integrated circuits for wireless and portable communications.
Dr. Luong has served as an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II. He received the Faculty Teaching Excellence Appreciation Award from the Hong Kong University of Science and Technology School of Engineering in 1995, 1996, and 2000.
2048 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997

A 27-mW CMOS Fractional-N Synthesizer Using Digital Compensation for 2.5-Mb/s GFSK Modulation
Michael H. Perrott, Student Member, IEEE, Theodore L. Tewksbury III, Member, IEEE, and Charles G. Sodini, Fellow, IEEE

Abstract: A digital compensation method and key circuits are presented that allow fractional-N synthesizers to be modulated at data rates greatly exceeding their bandwidth. Using this technique, a 1.8-GHz transmitter capable of digital frequency modulation at 2.5 Mb/s can be achieved with only two components: a frequency synthesizer and a digital transmit filter. A prototype transmitter was constructed to provide proof of concept of the method; its primary component is a custom fractional-N synthesizer fabricated in a 0.6-µm CMOS process that consumes 27 mW. Key circuits on the custom IC are an on-chip loop filter that requires no tuning or external components, a digital MASH Σ–Δ modulator that achieves low power operation through pipelining, and an asynchronous, 64-modulus divider (prescaler). Measurements from the prototype indicate that it meets performance requirements of the digital enhanced cordless telecommunications (DECT) standard.

Index Terms: Compensation, continuous phase modulation, digital radio, frequency modulation, frequency shift keying, frequency synthesizers, phase locked loops, sigma–delta modulation, transmitters.

Manuscript received July 7, 1997; revised August 4, 1997. This work was supported by DARPA Contract DAAL-01-95-K-3526.
M. H. Perrott was with the Microsystems Technology Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. He is now with Hewlett-Packard Laboratories, Palo Alto, CA 94304-1392 USA.
T. L. Tewksbury III was with Analog Devices, Wilmington, MA 01887 USA. He is now with IBM Microelectronics, Waltham, MA 02254 USA.
C. G. Sodini is with the Microsystems Technology Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA.
Publisher Item Identifier S 0018-9200(97)08270-X.
0018–9200/97$10.00 © 1997 IEEE

I. INTRODUCTION
The use of wireless products has been rapidly increasing the last few years, and there has been worldwide development of new systems to meet the needs of this growing market. As a result, new radio architectures and circuit techniques are being actively sought that achieve high levels of integration and low power operation while still meeting the stringent performance requirements of today’s radio systems. Our focus is on the transmitter portion of this effort, with the objective of achieving over 1-Mb/s data rate using frequency modulation.
To achieve the goals of low power and high integration, it seems appropriate to develop a transmitter architecture that consists of the minimal topology that accomplishes the required functionality. All digital, narrowband radio transmitters that are spectrally efficient require two operations to be performed. The baseband modulation data must be filtered to limit the extent of its spectrum, and the resulting signal must be translated to a desired RF band. This paper will focus on the issue of frequency translation, which can be accomplished in at least three different ways for frequency modulation. As illustrated in Fig. 1, the modulation signal can be (a) multiplied by a local oscillator (LO) frequency using a mixer, (b) fed into the input of a voltage controlled oscillator (VCO), or (c) fed into the input of a frequency synthesizer.

Fig. 1. Methods of frequency modulation upconversion: (a) mixer based, (b) direct modulation of VCO, and (c) indirect modulation of VCO.

Approach (a) can theoretically be accomplished with either a heterodyne or homodyne approach. The heterodyne approach offers excellent radio performance but carries a high cost in implementation due to the current inability to integrate the high-Q, low-noise, low-distortion bandpass filters required at intermediate frequencies (IF) [1]. As a result, the direct conversion approach has recently grown in popularity [2]–[4]. In this case, two mixers and baseband A/D converters are required to form in-phase/quadrature (I/Q) channels and a frequency synthesizer to obtain an accurate carrier frequency.
Approach (b) is referred to as direct modulation of a VCO and has appeared in designs for the digital enhanced cordless telecommunications (DECT) standard [5], [6]. A frequency synthesizer is used to achieve an accurate frequency setting and then disconnected so that modulation can be fed into the
PERROTT et al.: CMOS FRACTIONAL- SYNTHESIZER USING DIGITAL COMPENSATION 2049

VCO unperturbed by its dynamics. This technique allows a significant reduction in components; no mixers are required since the VCO performs the frequency translation, and only one D/A converter is required to produce the modulation signal. Power savings are thus achieved, as demonstrated by the fact that the design in [5] appears to consume nearly half the power of the mixer based designs in [2]–[4]. Unfortunately, since the synthesizer is inactive during modulation, the nominal frequency setting of the VCO tends to drift as a result of leakage currents. In addition, undesired perturbations, such as the turn-on transient of the power amp, can dramatically shift the output frequency. As stated in [6], the isolation requirements for this method exclude the possibility of a one-chip solution. Therefore, while the approach offers a significant advantage in terms of power dissipation, the goal of high integration is lost.
Finally, approach (c) can be viewed as indirect modulation of the VCO through appropriate control of a frequency synthesizer that sets the VCO frequency and yields the simplest transmitter solution of those presented. The synthesizer has a digital input which allows elimination of the D/A converter that is required when directly modulating the VCO. Since the synthesizer controls the VCO during modulation, the problem of frequency drift during modulation is eliminated. Also, isolation requirements at the VCO input are greatly reduced at frequencies within the PLL bandwidth. The primary obstacle faced with this architecture is that a severe constraint is placed on the maximum achievable data rate due to the reliance on feedback dynamics to perform modulation.
This paper presents a compensation method and key circuits that allow modulation of a frequency synthesizer at rates that are over an order of magnitude faster than its bandwidth. Application of the technique allows a high data rate (>1 Mb/s) transmitter with good spectral efficiency to be realized with only two components: a frequency synthesizer and a digital transmit filter. By avoiding additional components such as mixers and D/A converters in the modulation path, a low power transmitter architecture is achieved. Since off-chip filters are not required, high integration is accomplished as well. The technique can be used in transmitter applications where frequency modulation is desired, and a moderate tolerance is allowed on the modulation index. (When using compensation, the accuracy of the modulation index, which is defined as the ratio of the peak-to-peak frequency deviation of the transmitter output to its data rate, is limited by variations in the open-loop gain of the PLL [7].) To provide proof of concept of the technique, we present results from a 1.8-GHz prototype that supports Gaussian frequency shift keying (GFSK) modulation, the same modulation method used in DECT, at data rates in excess of 2.5 Mb/s.
We begin by reviewing a fractional-N synthesizer method presented in [8]–[11] that provides a convenient structure with which to apply the technique. It is shown that high data rates and good noise performance are difficult to achieve with this topology. A method is proposed to overcome these problems, followed by discussion of issues that ensue from its use. A description of key circuits in the prototype is then given, which include an on-chip loop filter, a 64-modulus divider, and a pipelined, digital Σ–Δ modulator. Finally, experimental results are presented and conclusions made.

Fig. 2. A spectrally efficient, fractional-N modulator.

II. BACKGROUND
The fractional-N approach to frequency synthesis enables fast dynamics to be achieved within the phase-locked loop (PLL) by allowing a high reference frequency [8]; a very useful benefit when attempting to modulate the synthesizer. High resolution is achieved with this approach by allowing noninteger divide values to be realized through dithering; it has been shown that low spurious noise can be obtained by using a high-order Σ–Δ modulator to perform this operation [8], [10], [11]. This approach leads to a simple synthesizer structure that is primarily digital in nature, and is referred to as a fractional-N synthesizer with noise shaping.
Using this fractional-N approach, it is straightforward to realize a transmitter that performs phase/frequency modulation in a continuous manner by direct modulation of the synthesizer. Fig. 2 illustrates a simple transmitter capable of Gaussian minimum shift keying (GMSK) modulation [9]. The binary data stream is first convolved with a digital finite impulse response (FIR) filter that has a Gaussian shape. (Physical implementation of this filter can be accomplished with a ROM whose address lines are controlled by consecutive samples of the data and time information generated by a counter.) The digital output of this filter is then summed with a nominal divide value and fed into the input of a digital Σ–Δ converter, the output of which controls the instantaneous divide value of the PLL. The nominal divide value sets the carrier frequency, and variation of the divide value causes the output frequency to be modulated according to the input data. Assuming that the PLL dynamics have sufficiently high bandwidth, the characteristics of the modulation waveform are determined primarily by the digital FIR filter and thus accurately set.
Fig. 3 depicts a linearized model of the synthesizer dynamics in the frequency domain. The digital transmit filter confines the modulation data to low frequencies, the Σ–Δ modulator adds quantization noise that is shaped to high frequencies, and the PLL acts as a low-pass filter that passes the input but attenuates the Σ–Δ quantization noise. In the figure, the PLL transfer function is

Fig. 3. Linearized model of fractional-N modulator.

calculated as

(1)

where the quantities on the right-hand side are the loop filter transfer function, the VCO gain (in Hz/V), and the nominal divide value, respectively. (See [7] for modeling details.) An analogy between the fractional-N modulator and a Σ–Δ D/A converter can be made by treating the output frequency of the PLL as an analog voltage.
A key issue in the system is that the Σ–Δ modulator adds quantization noise at high frequency offsets from the carrier. In general, the noise requirements for a transmitter are very strict in this range to avoid interfering with users in adjacent channels. In the case of the DECT standard, the phase noise density can be no higher than −131 dBc/Hz at a 5-MHz offset [6]. Noise at low frequency offsets is less critical for a transmitter and need only be below the modulation signal by enough margin to insure an adequate signal-to-noise ratio.
Sufficient reduction of the Σ–Δ quantization noise can be accomplished through proper choice of the Σ–Δ sample rate, which is assumed to be equal to the reference frequency, and the PLL transfer function. (Note that this problem is analogous to that encountered in the design of Σ–Δ D/A converters, except that the noise spectral density at high frequencies, rather than the overall signal-to-noise ratio, is the key parameter.) One way of achieving a low spectral density for the noise is to use a high sample rate for the Σ–Δ so that the quantization noise is distributed over a wide frequency range and its spectral density reduced. Alternatively, the attenuation offered by the PLL transfer function can be increased; this is accomplished by decreasing its cutoff frequency or increasing its order.
Unfortunately, a low cutoff frequency carries a penalty of lowering the achievable data rate of the transmitter. This fact can be observed from Fig. 3; the modulation data must pass through the dynamics of the PLL so that its bandwidth is restricted by that of the PLL. Given this constraint, low noise must be achieved through proper setting of the Σ–Δ sample rate and PLL order.
It is worthwhile to quantify required values of these parameters for a given data rate and noise specification. To do so, we first choose the PLL transfer function to be a Butterworth response of order

The above expression is chosen for the sake of simplicity in calculations; other filter responses could certainly be implemented. The spectral density of the noise at the transmitter output due to quantization noise is expressed as

(2)

where is the Σ–Δ sample period, and a multistage (MASH) structure [12] of order is assumed for the modulator. By choosing the order of the MASH Σ–Δ to be the same as the order of the PLL transfer function, the rolloff of (2) and the VCO noise are matched at high frequencies (−20 dB/dec). Fig. 4 displays the resulting parameters at different data rates; these values were calculated by setting (2) to −136 dBc/Hz at 5 MHz and the ratio to 0.7, where is the data rate. (The noise specification was chosen to achieve less than −131 dBc/Hz at 5 MHz offset after adding in VCO phase noise.)

Fig. 4. Achievable data rates versus PLL order and Σ–Δ sample rate when noise from Σ–Δ is −136 dBc/Hz at 5 MHz offset.

The figure reveals that the achievement of high data rates and low noise must come at the cost of power dissipation and complexity when attempting direct modulation of the synthesizer. In particular, the power consumed by the digital circuitry is increased at a high Σ–Δ sample rate by virtue of the increased clock rate of the Σ–Δ modulator and the digital FIR filter. The power consumed by the analog section is increased for high values of PLL order since additional poles and zeros must be implemented. This issue is aggravated by the need to set these additional time constants with high accuracy in order to avoid stability problems in the PLL. If tuning circuits are used to achieve such accuracy [13], spurious noise problems can also be an issue.

III. PROPOSED METHOD
The obstacles of high data rate modulation discussed above are greatly mitigated if the modulation bandwidth is allowed to exceed that of the PLL. In this case, the bandwidth of the PLL transfer function can be set sufficiently low that an excessively high PLL order or reference frequency is not necessary to achieve the required noise performance. Fig. 5 illustrates the proposed method that achieves this goal. By cascading a compensation filter with the digital FIR filter, the transfer function seen by the modulation data can be made flat by setting
PERROTT et al.: CMOS FRACTIONAL-N SYNTHESIZER USING DIGITAL COMPENSATION 2051

TABLE I
THEORETICALLY ACHIEVABLE DATA RATES
USING COMPENSATION FOR SECOND-ORDER PLL

Fig. 5. Proposed compensation method.

This new filter is simple to implement in a digital manner: by combining it with the FIR filter, we need only alter the ROM storage values. In fact, savings in area and power of the ROM can be achieved over the uncompensated method since the number of time samples that need to be stored is dramatically reduced [7].

To illustrate the technique, we consider the case where is chosen to implement GFSK modulation with as used in the DECT standard, and is second order. Under these assumptions, the time domain version of is described as samples of

(3)

where “ ” is the convolution operator, is the Σ–Δ sample period, is the period of the data stream, and equals for and zero elsewhere. Since is the inverse of , we write

(4)

In the time domain, the digital compensated FIR filter is then calculated by taking samples of the expression

(5)

For described in (3), these derivatives are well defined and can be calculated analytically. A final form is derived by substituting (3) into (5) to yield

(6)

A. Achievable Data Rates
Equation (6) reveals that the signal swing of increases in proportion to for large values of Since is the ratio of the modulation data rate to the bandwidth of the PLL, we see that high data rates lead to large signal swings of the modulation signal when using compensation. Intuitively, this behavior makes sense since the attenuation of must be overcome by the compensated signal. If the order of is increased to the resulting signal swings will be amplified according to

The achievable data rates using compensation are limited by the ability of the PLL to accommodate this increased signal swing. PLL components that are particularly affected are the Σ–Δ modulator, the divider, and the charge pump. Assuming an appropriate multibit Σ–Δ structure and multimodulus divider topology are used, the bottleneck in dynamic range will be set by the limited duty cycle range of the charge pump.

Table I displays the achievable data rates at different Σ–Δ sample rates using compensation; the noise specification was identical to that used to generate Fig. 4. In light of the signal swing limitation and our goal of simplicity, we have restricted our attention to second-order PLL dynamics. Calculations were based on the assumption that the duty cycle of the charge pump is limited only by its transient response, which was assumed to be 5 ns. Comparison of this information with Fig. 4 reveals that compensation allows high data rates to be achieved with relatively low power and complexity. In the actual prototype, data rates as high as 2.85 Mb/s are achieved with a second-order PLL with kHz, and a Σ–Δ sample rate of 20 MHz.

Fig. 6. The effect of mismatch.

B. Matching Issues
In practice, mismatch will occur between the compensation filter and PLL dynamics. While the compensation filter is digital and therefore fixed, the PLL dynamics are analog in nature and sensitive to process and temperature variations. Fig. 6 illustrates that a parasitic pole/zero pair occurs when the bandwidth of the PLL is too high; a similar situation occurs when its bandwidth is too low. As will be seen in the results sections, the parasitic pole/zero pair causes intersymbol interference (ISI) and modulation deviation error. To mitigate this problem in the prototype, an on-chip loop filter with accurate time constants was implemented, and open-loop gain
2052 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997

Fig. 7. Prototype system.

control was used to accurately place the overall pole and zero
positions of the PLL transfer function.
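
As a rough numerical illustration of the matching issue, the sketch below cascades a fixed compensation filter with a PLL response whose bandwidth is off-nominal. It assumes a plain second-order Butterworth low-pass for the PLL dynamics, which is a simplification chosen for illustration and not the prototype's measured transfer function. The function names and the ±10% bandwidth errors are hypothetical; 84 kHz is the PLL cutoff frequency quoted for the prototype.

```python
import math

def pll_response(f_hz, f0_hz):
    """Assumed PLL model: second-order Butterworth low-pass with cutoff f0_hz."""
    s = 1j * f_hz / f0_hz  # normalized complex frequency
    return 1.0 / (s * s + math.sqrt(2.0) * s + 1.0)

def compensated_response(f_hz, f0_nominal_hz, f0_actual_hz):
    """Fixed digital compensation (inverse of the nominal PLL response)
    cascaded with the actual, possibly mismatched, PLL response."""
    compensation = 1.0 / pll_response(f_hz, f0_nominal_hz)
    return compensation * pll_response(f_hz, f0_actual_hz)

F0_NOMINAL = 84e3  # Hz, PLL cutoff frequency quoted for the prototype

if __name__ == "__main__":
    for error in (-0.10, 0.0, 0.10):  # hypothetical bandwidth errors
        f0_actual = F0_NOMINAL * (1.0 + error)
        gains = [abs(compensated_response(f, F0_NOMINAL, f0_actual))
                 for f in (1e3, 84e3, 2.5e6)]
        print(f"bandwidth error {error:+.0%}: "
              + ", ".join(f"{g:.3f}" for g in gains))
```

With no bandwidth error the cascade is flat. With an error, the gain in this model stays near unity well below the PLL bandwidth and tends toward the squared bandwidth ratio well above it, which is one way to picture the modulation deviation error described in the text.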
An additional issue related to mismatch arises from practical
concerns in the PLL implementation. The achievement
of a large dynamic range in the charge pump is aided by
including an integrator in the loop filter (see Section IV-B2),
which yields an overall PLL transfer function as

(7)

A parasitic pole/zero pair, and , is now added that occurs


well below in frequency. Unfortunately, taking the inverse
of (7) leads to a compensation filter that is IIR in nature and
cannot be implemented with a ROM. To avoid such difficulties,
we can ignore the parasitic pole/zero pair and use as
described in (4). The resulting ISI is negligible since and
are close to each other and low in value. However, the
digital compensation filter must be modified to be samples of
to accommodate the increased gain of at
frequencies greater than

Fig. 8. Die photo.

TABLE II
POWER DISSIPATION OF IC CIRCUITS

IV. IMPLEMENTATION
To show proof of concept of the proposed compensation method, the system depicted in Fig. 7 was built using a custom CMOS fractional-N synthesizer that contains several key circuits. Included are an on-chip, continuous-time filter that requires no tuning or external components, a digital MASH Σ–Δ modulator with six output bits that achieves low power operation through pipelining, and a 64-modulus divider that supports any divide value between 32 and 63.5 in half cycle increments. An external divide-by-two prescaler is used so that the CMOS divider input operates at half the VCO frequency, which modifies the range of divide values to include all integers between 64 and 127.

Fig. 8 displays a die photograph of the custom IC, which was fabricated in a 0.6-µm, double-poly, double-metal, CMOS process with threshold voltages of V and V. The entire die is 3 mm by 3 mm, and its power dissipation is 27 mW. Table II lists the power consumed by individual circuits. The power supply values given in Table II were chosen to be as low as possible to minimize power dissipation; at the cost of higher power dissipation, all circuits could be powered by a single 3.3-V supply.
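
The divide-value ranges quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the 20-MHz reference and 1.8-GHz carrier are the values quoted elsewhere in the paper. It enumerates the 64 divide values of the on-chip divider, applies the external divide-by-two prescaler, and uses the standard locked-PLL relation f_out = N · f_ref.

```python
def divider_values(num_moduli=64, first_value=32.0, step=0.5):
    """Divide values available at the CMOS divider input (half-cycle steps)."""
    return [first_value + step * k for k in range(num_moduli)]

def overall_divide_values(prescale=2):
    """Divide values seen by the VCO once the divide-by-two prescaler is included."""
    return [prescale * n for n in divider_values()]

def output_frequency_hz(divide_value, f_ref_hz=20e6):
    """Locked PLL output frequency for an average divide value N: f_out = N * f_ref."""
    return divide_value * f_ref_hz

if __name__ == "__main__":
    overall = overall_divide_values()
    print(min(overall), max(overall), len(overall))
    # Average divide value that would place the carrier at 1.8 GHz.
    print(output_frequency_hz(90) / 1e9, "GHz")
```

Under these assumptions an average divide value of 90 corresponds to the 1.8-GHz carrier, which sits comfortably inside the 64 to 127 range.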

Fig. 10. An asynchronous, 64-modulus divider implementation.

Fig. 9. An asynchronous, eight-modulus divider topology.

The 64-modulus divider and six-output-bit Σ–Δ modulator provide a dynamic range for the compensated modulation data that is wide enough to support data rates in excess of 2.5 Mb/s. The on-chip loop filter allows an accurate PLL transfer function to be achieved by tuning just one PLL parameter: the open-loop gain. A brief overview of each of these components is now presented.

A. Divider
To achieve a low-power design, it is desirable to use an asynchronous divider structure to minimize the amount of circuitry operating at high frequencies. As such, a multimodulus divider structure was designed that consists of cascaded divide-by-2/3 sections [14]; this architecture is an extension of the common dual-modulus topology [15]. The eight-modulus example in Fig. 9 shows the proposed structure which allows a wide range of divide values to be achieved by allowing a variable number of input cycles to be “swallowed” per output cycle. Each divide-by-2/3 stage normally divides its input by two in frequency, but will swallow an extra cycle per OUT period when its control input, , is set to one. As shown for the case where all control bits are set to one, the number of IN cycles swallowed per OUT period is binary weighted according to the stage position. For instance, setting causes one cycle of IN to be swallowed, while setting causes four cycles of IN to be swallowed. Proper selection of allows any integer divide value between 8 and 15 to be achieved.

The 64-modulus divider that was developed for the prototype system uses a similar principle to that discussed above, but has a modified first stage to achieve high-speed operation. Specifically, the implemented architecture consists of a high-speed divide-by-4/5/6/7 state machine followed by a cascaded chain of divide-by-2/3 state machines as illustrated in Fig. 10. The divide-by-4/5/6/7 stage accomplishes cycle swallowing by shifting between four phases of a divide-by-two circuit. Each of the four phases is staggered by one IN cycle, which allows single cycle pulse swallowing resolution despite the fact that two cascaded divide-by-two structures are used; details of this approach are discussed at length in [7]. The important point to make about the phase shifting approach, which is also advocated in [15], is that it allows a minimal number of components to operate at high frequencies: the first two stages are simply divide-by-two circuits, not state machines. Also, the fact that control signals are not fed into the first divide-by-two circuit allows it to be placed off-chip in the prototype.

Fig. 11. PFD, charge pump, and loop filter.

B. Analog Section
The achievement of accurate PLL dynamics is accomplished in the prototype system with the variable gain loop filter topology depicted in Fig. 11. The input to the filter is the instantaneous phase error between the reference frequency and divider output and is manifested as the deviation of the phase frequency detector (PFD) output duty cycle from its nominal value of 50%. As modulation data is applied, the duty cycle is swept across a range of values; the shaded region in the figure corresponds to the deviation that occurs when GFSK modulation at 2.5 Mb/s is applied. A 50% nominal duty cycle is desired to avoid the dead-zone of the PFD and thus reduce distortion of the modulation signal. The prototype used a PFD design from [16] to achieve this characteristic.

To produce a signal that is a filtered version of the phase error, the output of the PFD is converted to complementary current waveforms by a charge pump before being sent into the inputs of an on-chip loop filter. The conversion to current allows the filtering operation to be performed without resistors and also provides a convenient means of performing gain control of the resulting transfer function. An integrator is included in the loop filter which forces the average current from the charge pump to be zero and the nominal duty cycle to be, ideally, 50% when the PLL is locked.

A PFD design with 50% nominal duty cycle is seldom used in PLL circuits due to power consumption and spurious noise issues: the charge pump is always driving current into the loop filter under such conditions, and spurs at multiples of the reference frequency are produced due to the square wave output of the PFD. Fortunately, these problems are greatly mitigated in the prototype transmitter since the charge pump

Fig. 12. Loop filter implementation.

Fig. 13. Effect of transient time and mismatch on duty cycle range.

output current is very small (at its largest setting, it toggles between −3.5 and 3.5 µA), and the loop filter bandwidth is very low (84 kHz) in comparison to the reference frequency (20 MHz). The resulting spur at the transmitter output is less than −60 dBc at 20 MHz when measuring the transmitter in an unmodulated state without an RF bandpass filter at its output. When modulated, this spur is convolved with the modulation signal and thus turned into phase noise [7]; it is reasonable to assume that this noise is reduced to a negligible level when the RF bandpass filter is included due to its high frequency offset.

1) Loop Filter: The on-chip loop filter uses an opamp to integrate one of the currents and add it to a first-order filtered version of the other current. This topology, shown in Fig. 12, realizes the transfer function

kHz kHz (8)

The open loop gain, , is adjusted by varying the charge pump output current, The first-order pole is created using a switched capacitor technique, which reduces its sensitivity to thermal and process variations and removes any need for tuning. Note that, although this time constant is formed through a sampling operation, the output of the switched capacitor filter is a continuous-time signal. Finally, the value of the zero, is determined primarily by the ratio of capacitors and under the assumption that the complementary charge pump currents are matched.

A particular advantage of the filter topology is that the rate of sampling and can be set high since it is independent of the settling dynamics of the opamp. As such, and are set to the PFD output frequency, 20 MHz, to avoid aliasing problems.

The opamp is realized with a single-ended, two-stage topology chosen for its simplicity and wide output swing. Its unity gain frequency was designed to be 6 MHz; this value is sufficiently higher than the bandwidth of the GFSK modulation signal at 2.5 Mb/s to avoid significantly affecting it. It is recognized that the single-ended structure has higher sensitivity to substrate noise than a differential counterpart. However, little would be gained in this case by making it fully differential since the output of the opamp is connected directly to the varactor of an LC-based VCO, which is inherently single ended. Fortunately, measured eye diagrams and spectral plots presented at the end of this paper conform to calculations that exclude substrate noise, thereby showing that it has negligible impact on the modulation and noise performance of the prototype system. However, as even higher levels of integration are sought in future radio systems, the impact of substrate noise will need to be carefully considered.

The limited dynamics of the opamp prevent it from following the fast transitions of its input current waveforms. To prevent these waveforms from adversely affecting the performance of the opamp, the voltage swing that appears at its input terminals is reduced to a low amplitude (less than 40 mV peak-to-peak) by capacitors and In the case of , this capacitor also serves as part of the switched capacitor filter.

2) Charge Pump: Proper design of the charge pump is critical for the achievement of high data rates since it forms the bottleneck in dynamic range that is available to the modulation signal. Fig. 13 illustrates the fundamental issues that need to be considered in its design. To avoid distortion of the modulation signal, the variation in duty cycle should be limited to a range that allows the output of the charge pump to settle close to its final value following all positive and negative transitions. Fig. 13(a) shows the dynamic range available for a well-designed charge pump; the nominal duty cycle is 50% and the transition times are fast. Fig. 13(b) demonstrates the reduction in dynamic range that occurs when the nominal duty cycle is offset from 50%. This offset is caused by a mismatch between positive and negative currents produced by the charge pump. (The type II PLL dynamics force an average current of zero.) Finally, Fig. 13(c) illustrates a case in which the charge pump has slow transition times, the result again being a reduction in dynamic range.

The charge pump topology was designed with the above issues in mind and is illustrated in Fig. 14. The core component of the architecture is a differential pair and that is fed from the top by two current sources, and , and from the bottom by a tail current, Ideally, and are equal to and to where is adjusted by a 5-b D/A that

Fig. 14. Charge pump implementation.

Fig. 15. A second-order, digital MASH structure.

controls the node Transistors and are switched on and off according to which ideally causes and to switch between and

To achieve a close match between the positive and negative currents of each charge pump output, the design strives to set and In the first case, and are implemented as cascoded PMOS devices whose layout is optimized to achieve high levels of device matching. Unfortunately, device matching cannot be used to achieve a close match between and since they are generated by different types of devices. To circumvent this obstacle, a feedback stage is used to adjust by comparing currents produced by a replica stage. This technique allows to be matched to to the extent that the replica stage is matched to the core circuit.

A low transient time in the charge pump response is obtained by careful design of signal and device characteristics at the source nodes of and First, the parasitic capacitance at this node is minimized by using appropriate layout techniques to reduce the source capacitance of and , the drain capacitance of , and the interconnect capacitance between each of the devices. Second, the voltage deviation is minimized at this node that occurs when switches. The level converter depicted in Fig. 14 accomplishes this task by reducing the voltage variation at nodes and to less than 350 mV and setting an appropriate dc bias.

Fig. 16. A pipelined adder topology.

C. Σ–Δ Modulator
Fig. 15 shows the second-order MASH Σ–Δ topology used in the prototype. This structure is well known [12] and has properties that are well suited to our transmitter application. The MASH topology is unconditionally stable over its entire input range and is readily pipelined by using a technique described in this section.

The spectral density at the output of a second-order MASH Σ–Δ modulator is described by the equation

(9)

In the presence of a sufficiently active input, can be considered a white noise source with spectral density

Fig. 17. A pipelined, second-order, digital MASH structure.

Fig. 18. Pipelined digital data path to divider input.

This assumption is reasonable while the filter , which is implemented with two pipelined adders
modulation signal is applied; we have found that setting the and a delay element, A delay is inserted between these
least significant bit (LSB) of the modulator high also helps two adders in order to pipeline their sum path, which requires
to achieve this condition by forcing the internal states of the a matching delay in the path above for time alignment.
MASH structure to constantly change. A delay must also be included in the output path of
A fact that does not appear to have been appreciated in the first Σ–Δ stage to compensate for the time delay incurred
the literature is that the digital MASH Σ–Δ structure is through the second stage. Since a signal once placed in the
highly amenable to pipelining. This is a useful technique when “pipe shifted domain” can be sent through any number of
seeking a low power implementation since it allows the supply cascaded, pipelined adders and/or integrators, only one pipe
voltage to be reduced by virtue of the fact that the required shift and align shift are needed in the entire structure.
throughput can be achieved with lower circuit speed. Fig. 18 illustrates the implementation of the overall digital
To pipeline the MASH structure, we apply a well-known path using pipelining. To save area, the circuits were pipelined
technique that has been used for adders and accumulators [17], every two bits as opposed to one, and pipe shifting was not
[18]. Fig. 16 illustrates a 3-b example. Since the critical path applied to the carrier frequency signal since it is constant
in these structures is their carry chain, registers are inserted in during modulation. To achieve flexibility, the compensated
this path. To achieve time alignment between the input and the digital transmit filter was implemented in software, as opposed
delayed carry information, registers are also used to skew the to a ROM, and the resulting digital data stream fed into the
input bits. As indicated in the figure, we refer to this operation custom CMOS IC.
as “pipe shifting” the input. The adder output is realigned
in time by performing an “align shift” of its bits as shown. V. MEASURED PERFORMANCE
(Note that shading is applied to the adder block in Fig. 16 as a The primary performance criteria by which a transmitter is
reminder that its bits are skewed in time.) The same pipelining judged are its accuracy in modulation and its noise perfor-
approach can be applied to digital accumulators since there is mance. We now describe the characterization of the prototype
no feedback from higher to lower bits. in relation to these issues.
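
The pipe-shift and align-shift idea can be sketched as a small cycle-by-cycle model. This is a behavioral illustration only, not the prototype's circuit: the function name and data layout are hypothetical. Stage i handles bit i of a word i cycles after stage 0 did (the pipe shift), each carry passes through a register between adjacent stages, and collecting the sum bits back into their word stands in for the align shift, so a word is complete nbits − 1 cycles after it enters.

```python
def pipelined_add(a_words, b_words, nbits):
    """Behavioral model of a bit-level pipelined ripple-carry adder.

    Returns (a + b) mod 2**nbits for each input pair, in input order.
    """
    n_words = len(a_words)
    carry_regs = [0] * nbits      # carry_regs[i]: registered carry into stage i
    results = [0] * n_words
    for cycle in range(n_words + nbits - 1):
        next_carry = [0] * nbits
        for i in range(nbits):
            w = cycle - i         # pipe shift: stage i sees word w this cycle
            if 0 <= w < n_words:
                a_bit = (a_words[w] >> i) & 1
                b_bit = (b_words[w] >> i) & 1
                total = a_bit + b_bit + carry_regs[i]
                # Align shift: sum bit i is placed back into its own word.
                results[w] |= (total & 1) << i
                if i + 1 < nbits:
                    next_carry[i + 1] = total >> 1  # register the carry
        carry_regs = next_carry
    return results

if __name__ == "__main__":
    a = [5, 3, 7, 0, 6]
    b = [1, 7, 7, 0, 2]
    print(pipelined_add(a, b, nbits=3))
```

In this model each stage only ever waits for a one-bit carry from its neighbor, which is why the critical path no longer grows with word length. The 3-bit width mirrors the 3-b example mentioned in the text.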
Since its basic building blocks are adders and accumulators, Fig. 19 shows measured eye diagrams from the prototype
a MASH Σ–Δ modulator of any order can be pipelined using using an HP 89441A modulation analyzer. To illustrate the
this technique. Using the symbols introduced in the previous impact of mismatch between the compensation filter and PLL
two figures, Fig. 17 depicts a pipelined, second-order MASH dynamics, measurements were taken under three different
topology. Each first-order Σ–Δ is realized as a pipelined values of open-loop gain. These results indicate that the
accumulator with feedback removed from the most significant modulation performance of the transmitter is quite good even
bits in its output. The output of the second stage is fed into the when the open-loop gain is in error by 25%; the effects of

Fig. 19. Measured eye diagrams at 2.5 Mb/s for three different open-loop gain settings: (a) −25% gain error, (b) 0% gain error, and (c) 25% gain error.

TABLE III
VALUES OF NOISE SOURCES WITHIN PLL

this gain error are to produce a moderate amount of ISI and an error in the modulation deviation.

An explanation of the observed ISI and deviation error is given in [7]. In brief, the resulting mismatch creates a parasitic pole/zero pair that occurs near the cutoff frequency of the PLL (84 kHz in this case); the resulting transfer function seen by the data can be viewed as the sum of a low-pass and an all-pass filter. ISI is introduced as data excites the impulse response of the low-pass filter, and modulation deviation error occurs since the magnitude of the all-pass is changed according to the amount of mismatch present.

Fig. 20. Expanded view of PLL system.

Fig. 20 displays the dominant sources of noise in the prototype; their values are displayed in Table III. Many of these values were obtained through ac simulation of the relevant circuits in HSPICE. Note that all noise sources other than are assumed to be white, so that the values of their variance suffice for their description. This assumption is only approximate for the VCO noise in the prototype, as will be seen in the measured data.

Based on measurements, the input referred noise of the VCO was calculated in the table from the expression

dBc/Hz at MHz (10)

where is 30 MHz/V. The value of the kT/C noise current produced by the switched capacitor operation, is calculated as

(11)

where is Boltzmann’s constant, and is temperature in degrees Kelvin.

Assuming that the noise sources in Fig. 20 are independent of each other, we can express the overall phase noise spectral density at the transmitter output as

(12)

where and are the noise contributions from the dominant voltage, current, and quantization noise sources. Based on the values in Table III and the model in Fig. 20, we obtain

(13)

where

In the case of the division of by two is an approximation based on the fact that the dominant charge pump noise source is switched in and out at each opamp input with a nominal duty cycle of 50%. Note that is given by (2).

A plot of the spectra in (13) is shown in Fig. 21(a). Computation of these spectra assumed the parameter values

Fig. 21. Noise spectra of synthesizer: (a) calculated: (1) charge pump induced; (2) VCO and opamp induced; (3) Σ–Δ induced; (4) overall; and (b) measured synthesizer and open-loop VCO noise.

listed in Fig. 20 and Table III, and described by (7) with

kHz kHz
kHz

As seen in this diagram, the noise from the charge pump dominates at low frequencies, and the influence of the Σ–Δ quantization noise dominates at high frequencies.

Fig. 21(b) shows measured noise results from the transmitter prototype taken with an HP 3048A phase noise measurement system. (The spurious content of the Σ–Δ modulator was reduced to negligible levels by feeding a binary data stream into the LSB of the modulation path so that the internal states of the Σ–Δ were randomized; the binary data stream was designed to have relatively flat spectral characteristics and negligible levels of spurious energy at frequencies greater than 10 kHz.) The resulting spectrum compares quite well with the calculated curve in Fig. 21(a), especially at high frequency offsets close to 5 MHz. At lower frequencies in the range of 100 kHz, the measured noise is within about 3 dB

of the predicted value; the higher discrepancy in this region might be attributed to the fact that was calculated without considering the offset or transient response of the charge pump and/or the possible inaccuracy of the HSPICE device models at low currents. Note that the spur at 20-MHz offset (the reference frequency), which is due to the 50% nominal duty cycle of the PFD, is less than −60 dBc.

Fig. 21(b) demonstrates that the unmodulated transmitter has an output spectrum of −132 dBc/Hz at 5-MHz offset from the carrier. At this frequency offset, simulations reveal that the output spectrum of the modulated transmitter is equal to when its data rate is close to the DECT rate of 1 Mb/s [7]. This being the case, the transmitter satisfies the DECT noise specification of −131 dBc/Hz at 5-MHz offset; eye diagrams for data rates close to 1 Mb/s are found in [7].

VI. CONCLUSION
A digital compensation method and key circuits were presented that allow modulation of a frequency synthesizer at rates over an order of magnitude faster than its bandwidth. Using this technique, a transmitter prototype was built that achieves 2.5-Mb/s data rate modulation using GFSK modulation at a carrier frequency of 1.8 GHz. Measured results indicate that the architecture can achieve the modulation and noise performance required by the DECT standard with a structure that is highly integrated and has low power dissipation. In particular, the mostly digital design requires no off-chip filters, no mixers, and no D/A converters in the modulation path. Further, the structure contains only the core components required of a narrowband, spectrally efficient transmitter: a frequency synthesizer and a digital transmit filter.

ACKNOWLEDGMENT
The authors thank G. Dawe and J. Mourant for guidance in RF issues, A. Chandrakasan for discussion on low power methods, R. Weiner for bonding the die, B. Broughton for aid in phase noise measurements, and M. Trott, P. Ferguson, P. Katzin, Z. Zvonar, and D. Fague for advice.

REFERENCES
[1] P. Gray and R. Meyer, “Future directions in silicon IC’s for RF personal communications,” in IEEE Custom IC Conf., 1995, pp. 83–90.
[2] T. Stetzler, I. Post, J. Havens, and M. Koyama, “A 2.7–4.5 V single-chip GSM transceiver RF integrated circuit,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1995, pp. 150–151.
[3] J. Min, A. Rofougaran, H. Samueli, and A. A. Abidi, “An all-CMOS architecture for a low-power frequency-hopped 900 MHz spread spectrum transceiver,” in IEEE Custom IC Conf., 1994, pp. 16.1/1-4.
[4] S. Sheng, L. Lynn, J. Peroulas, K. Stone, I. O’Donnell, and R. Brodersen, “A low-power CMOS chipset for spread-spectrum communications,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1996, pp. 346–347.
[5] S. Heinen, S. Beyer, and J. Fenk, “A 3.0 V 2 GHz transmitter IC for digital radio communication with integrated VCO’s,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1995, pp. 150–151.
[6] S. Heinen, K. Hadjizada, U. Matter, W. Geppert, V. Thomas, S. Weber, S. Beyer, J. Fenk, and E. Matschke, “A 2.7 V 2.5 GHz bipolar chipset for digital wireless communication,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1997, pp. 306–307.
[7] M. H. Perrott, “Techniques for high data rate modulation and low power operation of fractional-N frequency synthesizers,” Ph.D. dissertation, MIT, 1997.
[8] T. A. Riley, M. A. Copeland, and T. A. Kwasniewski, “Delta-sigma modulation in fractional-N frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, pp. 553–559, May 1993.
[9] T. A. Riley and M. A. Copeland, “A simplified continuous phase modulator technique,” IEEE Trans. Circuits Syst. II, vol. 41, pp. 321–328, May 1994.
[10] B. Miller and B. Conley, “A multiple modulator fractional divider,” in Proc. 44th Annual Symp. on Frequency Control, May 1990, pp. 559–567.
[11] B. Miller and B. Conley, “A multiple modulator fractional divider,” IEEE Trans. Instrum. Meas., vol. 40, pp. 578–583, June 1991.
[12] J. Candy and G. Temes, Oversampling Delta-Sigma Data Converters. New York: IEEE Press, 1992.
[13] Y. Tsividis and J. Voorman, Integrated Continuous-Time Filters. New York: IEEE Press, 1993.
[14] T. Kamoto, N. Adachi, and K. Yamashita, “High-speed multi-modulus prescaler IC,” in 1995 Fourth IEEE Int. Conf. Universal Personal Communications. Record. Gateway to the 21st Century, 1995, pp. 991, 325-8.
[15] J. Craninckx and M. S. Steyaert, “A 1.75-GHz/3-V dual-modulus divide-by-128/129 prescaler in 0.7-µm CMOS,” IEEE J. Solid-State Circuits, vol. 31, pp. 890–897, July 1996.
[16] M. Thamsirianunt and T. A. Kwasniewski, “A 1.2-µm CMOS implementation of a low-power 900-MHz mobile radio frequency synthesizer,” in IEEE Custom IC Conf., 1994, pp. 16.2/1-4.
[17] S.-J. Jou, C.-Y. Chen, E.-C. Yang, and C.-C. Su, “A pipelined multiplier-accumulator using a high-speed, low-power static and dynamic full adder design,” IEEE J. Solid-State Circuits, vol. 32, pp. 114–118, Jan. 1997.
[18] F. Lu and H. Samueli, “A 200-MHz CMOS pipelined multiplier-accumulator using a quasidomino dynamic full-adder cell design,” IEEE J. Solid-State Circuits, vol. 28, pp. 123–132, Feb. 1993.

Michael H. Perrott (S’97) was born in Austin, TX, in 1967. He received the B.S.E.E. degree from New Mexico State University, Las Cruces, in 1988, and the M.S. and Ph.D. degrees in electrical engineering and computer science from Massachusetts Institute of Technology, Cambridge, in 1992 and 1997, respectively.
He currently works at Hewlett-Packard Laboratories, Palo Alto, CA. His interests include signal processing and circuit design applied to communication systems.

Theodore L. Tewksbury III (S’86–M’87) received the S.B. degree in architecture in 1983 and the M.S. and Ph.D. degrees in electrical engineering and computer science in 1987 and 1992, respectively, all from the Massachusetts Institute of Technology, Cambridge. His doctoral dissertation consisted of an experimental and theoretical investigation of the effects of oxide traps on the large-signal transient performance of analog MOS circuits.
He joined Analog Devices, Inc., in 1987 as Design Engineer for the Converter Group, where he worked on high-speed, high-resolution data acquisition circuits for video, instrumentation, and medical applications. From 1992 to 1994, as Senior Characterization Engineer, he was involved in the development of high-accuracy analog models for advanced bipolar, BiCMOS, and CMOS processes, with emphasis on the statistical modeling of manufacturing variations. In December 1994, he joined the newly formed Communications Division at Analog Devices as RF Design Engineer. He is presently involved in the design of RF integrated circuits for wireless communications, including GSM, DECT, and DBS. He is also actively involved in the development and modeling of advanced semiconductor technologies for RF applications, including ADRF (Analog Devices bipolar RF process) and silicon germanium.

Charles G. Sodini (S’80–M’82–SM’90–F’95) was born in Pittsburgh, PA, in 1952. He received the
B.S.E.E. degree from Purdue University, Lafayette,
IN, in 1974 and the M.S.E.E. and the Ph.D. degrees
from the University of California, Berkeley, in 1981
and 1982, respectively.
He was a Member of the Technical Staff at
Hewlett-Packard Laboratories from 1974 to 1982,
where he worked on the design of MOS memory
and later, on the development of MOS devices with
very thin gate dielectrics. He joined the faculty of
the Massachusetts Institute of Technology, Cambridge, in 1983, where he
is currently a Professor in the Department of Electrical Engineering and
Computer Science. His research interests are focused on IC fabrication, device
modeling, and device level circuit design, with emphasis on analog and
memory circuits and systems.
Dr. Sodini held the Analog Devices Career Development Professorship of
Massachusetts Institute of Technology’s Department of Electrical Engineering
and Computer Science and was awarded the IBM Faculty Development
Award from 1985–1987. He has served on a variety of IEEE Conference
Committees, including the International Electron Device Meeting where he
was the 1989 General Chairman. He was the Technical Program Co-Chairman
for the 1992 Symposium on VLSI Circuits and the 1993-1994 Co-Chairman of
the Symposium. He has served on the Electron Device Society Administrative
Committee from 1988–94 and is currently a member of the Solid-State Circuits
Council.
780 IEEE JOURNAL ON SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

A CMOS Frequency Synthesizer with an Injection-Locked Frequency Divider for a 5-GHz Wireless LAN Receiver
Hamid R. Rategh, Student Member, IEEE, Hirad Samavati, Student Member, IEEE, and Thomas H. Lee, Member, IEEE

Abstract—A fully integrated 5-GHz phase-locked loop (PLL) based frequency synthesizer is designed in a 0.24-μm CMOS technology. The power consumption of the synthesizer is significantly reduced by using a tracking injection-locked frequency divider (ILFD) as the first frequency divider in the PLL feedback loop. On-chip spiral inductors with patterned ground shields are also optimized to reduce the VCO and ILFD power consumption and to maximize the locking range of the ILFD. The synthesizer consumes 25 mW of power of which only 3.8 mW is consumed by the VCO and the ILFD combined. The PLL has a bandwidth of 280 kHz and a phase noise of −101 dBc/Hz at 1 MHz offset frequency. The spurious sidebands at the center of adjacent channels are less than −54 dBc.

Index Terms—CMOS RF circuits, frequency synthesizers, injection-locked frequency dividers, wireless LAN.

Fig. 1. (a) U-NII and HIPERLAN frequency bands and (b) channel allocation in our U-NII band WLAN system.

I. INTRODUCTION

THE DEMAND for wireless local area network (WLAN) systems which can support data rates in excess of 20 Mb/s with very low cost and low power consumption is rapidly increasing. The newly released unlicensed national information infrastructure (U-NII) frequency band in the United States is primarily intended for wideband WLAN and provides 300 MHz of spectrum at 5 GHz [Fig. 1(a)]. The lower 200 MHz of this band (5.15–5.35 GHz) overlaps the European high-performance radio LAN (HIPERLAN) frequency band. The upper 100 MHz of the spectrum, which overlaps the industrial, scientific, and medical (ISM) band, is not used in our system. To stay compatible with HIPERLAN the lower 200 MHz of the spectrum is divided into eight channels which are 23.5 MHz wide [Fig. 1(b)]. The minimum signal level at the receiver is −70 dBm while the maximum strength of the received signal is −20 dBm. The large dynamic range and wide channel bandwidths set very stringent requirements for the synthesizer phase noise and spurious sideband levels.

In this paper we describe the design of a fully integrated integer-N frequency synthesizer as a local oscillator (LO) for a U-NII band WLAN receiver. The front end of the receiver is described in [9].

Section II describes some of the synthesizer design challenges and reviews previously existing solutions. In Section III we present our proposed architecture of the frequency synthesizer which takes advantage of an injection-locked frequency divider (ILFD) to reduce the overall power consumption. Section IV-A is dedicated to the design of the VCO and demonstrates how on-chip spiral inductors can be optimized to reduce the VCO power consumption and to improve the phase noise performance at the same time. Section IV-B describes the design issues of ILFD’s as well as the optimization of on-chip spiral inductors for wide-locking-range and low-power ILFD’s. The pulse swallow frequency divider, charge pump, and loop filter are the subjects of Sections IV-C, IV-D, and IV-E, respectively. The measurement results are presented in Section V and conclusions are made in Section VI.

Manuscript received August 2, 1999; revised November 29, 1999. This work was supported by the Stanford Graduate Fellowship program and IBM Corporation.
The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305 USA (e-mail: hamid@smirc.stanford.edu).
Publisher Item Identifier S 0018-9200(00)02988-7.
0018–9200/00$10.00 © 2000 IEEE

II. FREQUENCY SYNTHESIZERS

Frequency synthesizers are an essential part of wireless receivers and often consume a large percentage (20–30%) of the total power (Table I). A typical PLL-based frequency synthesizer comprises both high and low frequency blocks. The high frequency blocks, mainly the VCO and first stage of the frequency dividers, are the main power consuming blocks, especially in a CMOS implementation. Therefore, BiCMOS technology has often been chosen over CMOS, where the VCO and the prescaler are designed with bipolar transistors and the low frequency blocks are CMOS [1]. Off-chip VCO’s and dividers have also been used as an alternative [4]. However, because of the increased cost neither of these two solutions is suitable for many applications, and a fully integrated CMOS solution is favorable. A dividerless frequency synthesizer [11] which eliminates power-hungry frequency dividers is one solution for such low-power and fully integrated systems. In this technique an
RATEGH et al.: CMOS FREQUENCY SYNTHESIZER 781

aperture phase detector is used to compare the phase of the reference signal and the VCO output at every rising edge of the reference signal for only a time window which is a small fraction of the reference period. Thus no frequency divider is required in this PLL. The idea of a dividerless frequency synthesizer, although suitable for systems such as a GPS receiver where only one LO signal is required, is not readily applied to wireless systems which require multiple LO frequencies with a small frequency separation.

TABLE I
POWER CONSUMPTION OF FULLY INTEGRATED WIRELESS RECEIVERS
Fig. 2. Frequency synthesizer block diagram.

III. PROPOSED SYNTHESIZER ARCHITECTURE

Our proposed architecture (Fig. 2) is an integer-N frequency synthesizer with an initial low power divide-by-two in the PLL feedback loop. The prescaler follows the fixed frequency divider and operates at half the output frequency; thus, its power consumption is reduced significantly. Furthermore, the first divider is an injection-locked frequency divider [6], [7] which takes advantage of the narrowband nature of the system and trades off bandwidth for power via the use of resonators. To further reduce the power consumption, optimization techniques are used to design the on-chip spiral inductors of the VCO and ILFD.

Because of the fixed initial divide-by-two in the loop the reference frequency in our system is half of the LO spacing and is 11 MHz. Consequently, the loop bandwidth is reduced to maintain the loop stability. This bandwidth reduction helps to filter harmonics of the reference signal, mainly the second harmonic, which generate spurs in the middle of the adjacent channels. The drawbacks of a reduced loop bandwidth are an increased settling time and a higher in-band VCO phase noise. The higher in-band VCO phase noise is not a limiting factor as the in-band noise is dominated by the upconverted noise of the reference signal. The slower settling time is only a problem in very fast frequency-hopped systems.

The synthesized LO frequency in our system is 16/17 of the received carrier frequency. This choice of LO frequency not only eases the issue of image rejection in the receiver [9], but also facilitates the generation of the second LO, which is 1/16 of the first LO, with the same synthesizer.

IV. SYNTHESIZER BUILDING BLOCKS

A. Voltage-Controlled Oscillator

Fig. 3 shows the schematic of the VCO. Two cross-coupled transistors M1 and M2 generate the negative impedance required to cancel the losses of the RLC tank. On-chip spiral inductors with patterned ground shields [15] are used in this design. The two main requirements for the VCO are low phase noise and low power consumption. If the inductors were the main source of noise, maximizing their quality factor would improve the phase noise significantly. However, in multi-GHz VCO’s with short channel transistors, inductors are not the main source of noise and a better design strategy is to maximize the effective parallel impedance of the RLC tank at resonance. This choice increases the oscillation amplitude for a given power consumption and hence reduces the phase noise caused by the noise injection from the active devices. Since inductors are the main source of loss in the tank, the LQ product should be maximized to maximize the effective parallel impedance of the tank at resonance, where L is the inductance and Q is the quality factor of the spiral inductors. It is important to realize that maximizing Q alone does not necessarily maximize the LQ product, and it is the latter that matters here.

To design the spiral inductors, we use the same inductor model reported in [14]. The inductance is first approximated with a monomial expression as in [3]. Optimization is used next to find the inductor with the maximum LQ product. The inductors in this design are 2.26 nH each with an estimated quality factor of 5.8 at 5 GHz. It is worth mentioning that at 5 GHz, the magnetic loss in the highly doped substrate of the epi process reduces the inductor quality factor significantly. Approximate calculations show that substrate inductive loss is proportional to the cube of the inductor’s outer diameter. Therefore, a multilayer stacked inductor which has a smaller area compared to a single-layer inductor with the same inductance may achieve a larger quality factor. We should mention that in our design, inductors are laid out using only the top-most metal layer.

The varactors in Fig. 3 are accumulation-mode MOS capacitors [5], [12]. The quality factor of these varactors can be substantially degraded by gate resistance if they are not laid out properly. In our design each varactor is laid out with 14 fingers which are 3 μm wide and 0.5 μm long. The quality factor of this varactor at 5 GHz is estimated to exceed 60. The losses of the RLC tank are thus dominated by the inductors, as expected.

B. Injection-Locked Frequency Divider

Fig. 4 shows the schematic of the voltage-controlled ILFD used in the frequency synthesizer. The incident signal (the VCO output) is injected into the gate of M3 and is delivered with a subunity voltage gain to Vx, the common source connection of M1 and M2. Transistor M4 is used to provide a symmetric

load for the VCO. The output signal is fed back to the gates of M1 and M2 and is summed with the incident signal across the gates and sources of M1 and M2. The nonlinearity of M1 and M2 generates intermodulation products which allow sustained oscillation at a fraction of the input frequency [6]. As shown in [6] in the special case of a divide-by-two and a third-order nonlinearity, the phase-limited locking range of an ILFD can be expressed as

(1)

where
ω0 free-running oscillation frequency;
Δω frequency offset from ω0;
Vi incident amplitude;
H0 impedance of the RLC tank at resonance;
Q quality factor of the RLC tank;
a2 second-order coefficient of the nonlinearity.

Fig. 3. Schematic of the VCO.

Fig. 4. Schematic of the differential ILFD.

As (1) suggests, a larger incident amplitude as well as a larger H0/Q result in a larger achievable Δω, which we refer to as the locking range. In an oscillator H0/Q = ω0L, so the largest practical inductance should be used to maximize the locking range.

A larger quadratic nonlinearity (a2) also increases the locking range. So a circuit architecture with a large second-order nonlinearity is favorable for a divide-by-two ILFD and in fact the circuit in Fig. 4 has such a characteristic. The common source connection node of the differential pair moves at twice the frequency of the output signal even in the absence of the incident signal. So this circuit has a natural tendency for divide-by-two operation when the incident signal is effectively injected into node Vx.

To further extend the locking range, the ILFD is designed such that the resonant frequency of its output tank tracks the input frequency. Accumulation mode MOS varactors are used to tune the ILFD and its control voltage is tied to the VCO control voltage (Fig. 2). The locking range of the ILFD therefore does not limit the tuning range of the PLL beyond what is determined by the VCO.

As in the VCO design, on-chip spiral inductors with patterned ground shields are used in the ILFD, but with a different optimization objective. As mentioned earlier the largest practical inductance maximizes the locking range. However, reduction of power consumption demands maximization of the LQ product. The inductor has its largest value when the total capacitance that resonates with it is minimized. To reduce its parasitic bottom plate capacitance the inductor should be laid out with narrow topmost metal lines. However, the large series resistance of narrow metal strips degrades the inductor quality factor and reduces the LQ product significantly. Therefore, both L and the LQ product may not be maximized simultaneously for an on-chip spiral inductor resonating with a fixed capacitance. Optimization is thus used to design for the maximum inductance such that the LQ product is large enough to satisfy the specified power budget. The inductors resulting from this trade-off are 9.5 nH each with an estimated quality factor of 4.2 at the divider output frequency (2.5 GHz).

C. Pulse Swallow Frequency Divider

The pulse swallow frequency divider consists of a divide-by-22/23 prescaler followed by a program and pulse swallow counter. Only one CMOS logic ripple counter is used for both program and pulse swallow counters. The program counter generates one output pulse for every ten input pulses. The output of the pulse swallow counter is controlled by three channel select bits. The overall division ratio is 220–227. At the beginning of the cycle the prescaler divides by 23. As soon as the first three bits of the ripple counter match the channel select bits, the prescaler begins to divide by 22. The next cycle starts after the ripple counter counts to ten.

The prescaler consists of three dual-modulus divide-by-2/3 and one divide-by-2 frequency divider made of source-coupled logic (SCL) flip-flops and gates (Fig. 5). The modulus control (MC) input selects between divide-by-22 and divide-by-23. Except for the second dual modulus divider, all other dividers including the CMOS counters are triggered by the falling edges of their input clocks, allowing a delay of as much as half the period of the input of each divider. With this arrangement we guarantee overlap between the relevant signals (Fig. 5) and prevent a race condition.

Fig. 6. Simplified schematic of the charge pump and loop filter.

Fig. 5. Block diagram of the prescaler.
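Returning to the inductor optimization of Sections IV-A and IV-B, the role of the LQ product can be made concrete with a few lines of arithmetic. The sketch below uses the common approximation that an inductor-limited tank presents a parallel resistance of about ω0·L·Q at resonance; it ignores varactor and transistor losses, treats each inductor on its own, and the comparison inductor is a hypothetical value chosen only for illustration.

```python
import math


def lq_product(inductance_h: float, q: float) -> float:
    """Figure of merit discussed in Section IV-A."""
    return inductance_h * q


def parallel_resistance_ohm(inductance_h: float, q: float, freq_hz: float) -> float:
    """Approximate tank resistance at resonance, R_p ~ w0 * L * Q.

    Assumes the inductor dominates the tank loss, as the text states.
    """
    return 2 * math.pi * freq_hz * inductance_h * q


if __name__ == "__main__":
    # Values quoted in the text.
    vco = parallel_resistance_ohm(2.26e-9, 5.8, 5e9)
    ilfd = parallel_resistance_ohm(9.5e-9, 4.2, 2.5e9)
    print(f"VCO inductor:  R_p ~ {vco:.0f} ohm")
    print(f"ILFD inductor: R_p ~ {ilfd:.0f} ohm")

    # Hypothetical inductor with a higher Q but a lower inductance.
    print(lq_product(1.0e-9, 8.0) < lq_product(2.26e-9, 5.8))
```

With the quoted values this gives roughly 412 Ω per VCO inductor and 627 Ω per ILFD inductor. The hypothetical 1-nH, Q = 8 inductor has the higher quality factor but the smaller LQ product, which is the point made in Section IV-A about not maximizing Q alone.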

D. Charge Pump

Fig. 6 shows the circuit diagram of the charge pump and loop filter. The charge pump has a differential architecture. However, only a single output node drives the loop filter. To prevent the other output node from drifting to the rails when neither of the up and down signals (U and D) is active, the unity gain buffer shown in Fig. 6 is placed between the two output nodes. This buffer keeps the two output nodes at the same potential and thus reduces the charge pump offset. The power of the spurious sidebands in the synthesized output signal is thereby reduced. In this charge pump the current sources are always on and the PMOS and NMOS switches are used to steer the current from one branch of the charge pump to the other.

E. Loop Filter

Resistor R1 and capacitor C1 in the loop filter (Fig. 6) generate a pole at the origin and a zero at 1/(R1C1). Capacitor C2 and the combination of R3 and C3 are used to add extra poles at frequencies higher than the PLL bandwidth to reduce reference feedthrough and decrease the spurious sidebands at harmonics of the reference frequency. The thermal noise of R1 and R3, although filtered by the loop, directly modulates the VCO control voltage and can cause substantial phase noise in the VCO if the resistors are not sized properly. The capacitors and resistors of the loop filter should be properly chosen to perform the required filtering function and maintain the stability of the loop without introducing too much noise. Fig. 7 shows a linearized phase-locked loop model.

Fig. 7. Linearized PLL model.

In a third-order loop, the loop filter contains only R1, C1, and C2 and its impedance can be written as

(2)

The open loop transfer function of the third-order PLL is

(3)

where K_VCO is the VCO gain constant and I_P is the charge pump current. The phase margin of the loop is

(4)

where ωc is the crossover frequency. By differentiating (4) with respect to ωc it can be shown that the maximum phase margin is achieved at

(5)

and the maximum phase margin is

(6)

Notice that the maximum phase margin is only a function of the ratio of C1 to C2, and for a ratio less than 1 the phase margin is less than 20°, which makes the loop practically unstable.

To complete our loop analysis we force the frequency in (5) to be the crossover frequency of the loop and get

(7)

Now we can define a loop filter design recipe as follows.
1) Find K_VCO from the VCO simulation.
2) Choose a desired phase margin and find the ratio of C1 to C2 from (6).
3) Choose the loop bandwidth and find the zero time constant R1C1 from (5).
4) Select C1 and I_P such that they satisfy (7).
5) Calculate the noise contribution of R1. If the calculated noise is negligible the design is complete; otherwise go back to step four and increase C1.

The same loop analysis can be repeated for a fourth-order loop. In this case the phase margin is

(8)

Fig. 8. Die micrograph.

The crossover frequency for the maximum phase margin is given by (9). Finally, for this frequency to be the crossover frequency it should satisfy (10).

(9)

(10)

As in the third-order loop the maximum phase margin is not a function of the absolute values of the R’s and C’s and is only a function of their ratios. The loop filter design recipe for the fourth-order loop is modified as follows.
1) Find K_VCO from the VCO simulation.
2) Choose a desired phase margin and find the component ratios from (8) and (9).
3) Choose the loop bandwidth and find the zero time constant from (9).
4) Select C1 and I_P such that they satisfy (10).
5) Calculate the noise contribution of R1 and R3. If their noise contribution is negligible the design is complete; otherwise go back to step four and increase C1.

Notice that in a fourth-order loop there are two degrees of freedom in choosing the component ratios to achieve a desired phase margin. Therefore, the suppression of the spurious sidebands can be improved without reducing the phase margin or the loop bandwidth.

In our system the maximum VCO gain constant is 500 MHz/V. With this VCO gain and the loop filter component values and charge pump current chosen for this design, the crossover frequency is about 280 kHz with a 46° phase margin. The calculated contribution of the loop filter resistors to the VCO phase noise at 10 MHz offset frequency is −137 dBc/Hz, which is negligible compared to the intrinsic noise of the VCO.

V. MEASUREMENT RESULTS

The frequency synthesizer is designed in a 0.24-μm CMOS technology. Fig. 8 shows the die micrograph of the synthesizer with an area of 1 mm × 1.6 mm, including pads.
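Returning to the third-order loop analysis of Section IV-E, the statement that a capacitor ratio below 1 leaves less than 20° of phase margin can be reproduced numerically. The sketch below assumes the standard passive filter in which R1 in series with C1 is shunted by C2, so that the zero time constant is tau_z = R1·C1 and the pole time constant is tau_p = R1·C1·C2/(C1 + C2). The symbol b = C1/C2 and all function names are illustrative assumptions, and the formulas are the textbook results for this topology rather than expressions copied from the paper.

```python
import math


def phase_margin_deg(wc: float, tau_z: float, tau_p: float) -> float:
    """Phase margin at crossover wc: atan(wc*tau_z) - atan(wc*tau_p)."""
    return math.degrees(math.atan(wc * tau_z) - math.atan(wc * tau_p))


def max_phase_margin_deg(b: float) -> float:
    """Largest achievable margin for b = C1/C2 (so tau_z/tau_p = b + 1)."""
    x = math.sqrt(b + 1.0)
    return math.degrees(math.atan(x) - math.atan(1.0 / x))


def ratio_for_phase_margin(pm_deg: float) -> float:
    """Invert max_phase_margin_deg using atan(x) - atan(1/x) = 2*atan(x) - 90 deg."""
    x = math.tan(math.radians((pm_deg + 90.0) / 2.0))
    return x * x - 1.0


def zero_time_constant(b: float, crossover_hz: float) -> float:
    """tau_z that places the margin peak at the chosen crossover frequency."""
    return math.sqrt(b + 1.0) / (2.0 * math.pi * crossover_hz)


if __name__ == "__main__":
    print(f"b = 1 gives {max_phase_margin_deg(1.0):.2f} degrees")
    # Illustrative target only; the fabricated loop is fourth order.
    b = ratio_for_phase_margin(50.0)
    tau_z = zero_time_constant(b, 280e3)
    print(f"50 degrees needs b = {b:.2f}, tau_z = {tau_z * 1e6:.2f} us")
```

For b = 1 the peak margin comes out just under 20°, in line with the remark in the text. Steps 2 and 3 of the third-order recipe correspond to `ratio_for_phase_margin` and `zero_time_constant`; step 4 is not sketched here because it depends on the loop gain expression.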

Fig. 9. VCO tuning range.

Fig. 10. ILFD locking range and power consumption as a function of incident amplitude.

Fig. 11. ILFD phase noise measurements.

Fig. 12. Phase noise of the synthesizer output signal.

TABLE II
ILFD PERFORMANCE SUMMARY

The analog blocks (VCO, ILFD, and prescaler) are supplied by 1.5 V while the digital portions of the synthesizer are supplied by 2 V. The reason for this choice of supplies is to achieve a larger tuning range for the VCO. The accumulation mode MOS capacitors in this technology have a flatband voltage (V_FB) around zero volts. Thus to get the full range of capacitor variation the control voltage should exceed the VCO supply to produce a net negative voltage across the varactors in Fig. 3. To eliminate the need for multiple supplies the VCO can be biased with a PMOS current source, with the sources of M1 and M2 connected to ground. More than 500 MHz (10% of the center frequency) of VCO tuning range is achieved for a 1.5-V control voltage variation (Fig. 9).

The free-running oscillation frequency of the ILFD changes more than 110 MHz for a 1.5-V control voltage variation.

Fig. 10 shows the locking range of the ILFD as a function of the incident amplitude for two different control voltages. As expected, changing the control voltage only changes the operation frequencies and not the locking range. The ILFD’s average power consumption is also shown on the same figure. Increasing the incident amplitude increases the locking range and the average power consumption. The average power at 1-V incident amplitude is less than 0.8 mW while the locking range exceeds 1000 MHz.

The ILFD phase noise measurement results are shown in Fig. 11. The solid line shows the phase noise of the HP83732B signal generator used as the incident signal. The dashed line is the phase noise of the free-running ILFD. The two other curves are the phase noise of the ILFD when locked to two different incident frequencies. The curve marked as middle frequency is measured when the incident frequency is in the middle of the locking range and the edge frequency curve is measured at the lower edge of the locking range. At low offset frequencies the output of the frequency divider follows the phase noise of the incident signal and is 6 dB lower due to the divide-by-two operation. However, at larger offset frequencies the added noise from the divider itself, the external amplifier, and measurement tools reduces the 6 dB difference between the incident and

output phase noise. The ILFD phase noise measurements for offset frequencies higher than 200 kHz are not accurate due to the dominance of noise from the external amplifier.

The spurious tones at 11-MHz offset frequency from the center frequency are more than 45 dB below the carrier. The spurs at the 22-MHz offset frequency are at −54 dBc. Since the LO spacing is twice the reference frequency, the spurs at 11-MHz offset frequency fall at the edge of each channel and are less critical than the 22-MHz spurs which are located at the center of adjacent channels. With the −54 dBc spurs at 22 MHz offset frequency, an undesired adjacent channel may be 44 dB stronger than the desired channel for a minimum 10 dB signal-to-interference ratio.

Phase noise measurements of the complete synthesizer output signal are shown in Fig. 12. The phase noise at small offset frequencies is mainly determined by the phase noise of the reference signal. The phase noise measured at offset frequencies beyond the PLL bandwidth is the inherent VCO phase noise. The phase noise at 1-MHz offset frequency is measured to be −101 dBc/Hz. The phase noise at 22 MHz offset frequency is extrapolated to be −127.5 dBc/Hz. Therefore the signal in the adjacent channel can be 43 dB stronger than that of the desired channel for a 10 dB signal-to-interference ratio.

TABLE III
MEASURED SYNTHESIZER PERFORMANCE

VI. CONCLUSION

In this work we demonstrate the design of a fully integrated, 5-GHz CMOS frequency synthesizer designed for a U-NII band WLAN system. The tracking injection-locked frequency divider used as the first divider in the PLL feedback loop reduces the power consumption considerably without limiting the performance of the PLL. Table II summarizes the performance of the ILFD. The power consumption of two flip-flop based frequency dividers at 5 GHz is also listed for comparison purposes. In a 0.24-μm CMOS technology a simulated SCL flip-flop based frequency divider loaded with the same capacitance as in the ILFD consumes almost an order of magnitude more power than the ILFD with a 600-MHz locking range. The measurement results of a fast flip-flop based divider in an advanced 0.1-μm CMOS technology show a power consumption of 2.6 mW at 5 GHz [8], which is more than four times the power of the ILFD with a 600-MHz locking range.

Table III summarizes the performance of the synthesizer. The spurious sidebands at offset frequencies of twice the reference signal are more than 54 dB below the carrier. The spurs are mainly due to charge injection from the U and D signals to the loop, and can be reduced significantly by using a cascode structure for transistors M1–M4 (Fig. 6). Better matching between the up and down current sources also improves the sideband spurs. Of the 25-mW total power consumption, less than 3.8 mW is consumed by the VCO and ILFD combined. This low power consumption is achieved by the optimized design of the spiral inductors in the VCO and ILFD. The prescaler operates at 2.5 GHz and consumes 19 mW, of which about 40% is consumed in the first 2/3 dual modulus divider. Therefore the ILFD, which takes advantage of narrowband resonators, consumes an order of magnitude less power than the first 2/3 dual modulus divider, while operating at twice the frequency.

ACKNOWLEDGMENT

The authors would like to thank Dr. M. Hershenson, Dr. S. Mohan, and T. Soorapanth for their valuable technical discussions and help. They also thank National Semiconductor for fabricating the chip.

REFERENCES

[1] T. S. Aytur and B. Razavi, “A 2-GHz, 6 mW BiCMOS frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 30, pp. 1457–1462, Dec. 1995.
[2] J. Craninckx and M. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” in ISSCC Dig., 1998, pp. 372–373.
[3] M. Hershenson, S. S. Mohan, S. P. Boyd, and T. H. Lee, “Optimization of inductor circuits via geometric programming,” in Design Automation Conf. Dig., June 1999, pp. 994–998.
[4] M. H. Perrott, T. L. Tewksbury, and C. G. Sodini, “A 27-mW CMOS fractional-N synthesizer using digital compensation for 2.5-Mb/s GFSK modulation,” IEEE J. Solid-State Circuits, vol. 32, pp. 2048–2059, Dec. 1997.
[5] A. S. Porret, T. Melly, and C. C. Enz, “Design of high-Q varactors for low-power wireless applications using a standard CMOS process,” in Custom Integrated Circuits Conf. Dig., May 1999, pp. 641–644.
[6] H. R. Rategh and T. H. Lee, “Superharmonic injection-locked frequency dividers,” IEEE J. Solid-State Circuits, vol. 34, pp. 813–821, June 1999.
[7] H. R. Rategh, H. Samavati, and T. H. Lee, “A 5GHz, 1mW CMOS voltage controlled differential injection-locked frequency divider,” in Custom Integrated Circuits Conf. Dig., May 1999, pp. 517–520.
[8] B. Razavi, K. F. Lee, and R. H. Yan, “Design of high-speed, low-power frequency dividers and phase-locked loops in deep submicron CMOS,” IEEE J. Solid-State Circuits, vol. 30, pp. 101–109, Feb. 1995.
[9] H. Samavati, H. R. Rategh, and T. H. Lee, “A 5GHz CMOS wireless-LAN receiver front-end,” IEEE J. Solid-State Circuits, vol. 35, pp. xxx–xxx, May 2000.
[10] D. Shaeffer, A. Shahani, S. Mohan, H. Samavati, H. Rategh, M. Hershenson, M. Xu, C. Yue, D. Eddleman, and T. Lee, “A 115-mW, 0.5-μm CMOS GPS receiver with wide dynamic-range active filters,” IEEE J. Solid-State Circuits, vol. 33, pp. 2219–2231, Dec. 1998.
[11] A. Shahani, D. Shaeffer, S. Mohan, H. Samavati, H. Rategh, M. Hershenson, M. Xu, C. Yue, D. Eddleman, and T. Lee, “Low-power dividerless frequency synthesis using aperture phase detector,” IEEE J. Solid-State Circuits, vol. 33, pp. 2232–2239, Dec. 1998.

[12] T. Soorapanth, C. P. Yue, D. K. Shaeffer, T. H. Lee, and S. S. Wong, “Analysis and optimization of accumulation-mode varactor for RF ICs,” in Symp. VLSI Circuits Dig., 1998, pp. 32–33.
[13] M. Steyaert, M. Borremans, J. Janssens, B. D. Muer, N. Itoh, J. Craninckx, J. Crols, E. Morifuji, H. S. Momose, and W. Sansen, “A single-chip CMOS transceiver for DCS-1800 wireless communications,” in ISSCC Dig., 1998, pp. 48–49.
[14] C. P. Yue, C. Ryu, J. Lau, T. H. Lee, and S. S. Wong, “A physical model for planar spiral inductors on silicon,” in IEDM Tech. Dig., 1996, pp. 6.5.1–6.5.4.
[15] C. P. Yue and S. S. Wong, “On-chip spiral inductors with patterned ground shields for Si-Based RF IC’s,” in Symp. VLSI Circuits Dig., 1997, pp. 85–86.

Hamid R. Rategh (S’99) was born in Shiraz, Iran in 1972. He received the B.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1994 and the M.S. degree in biomedical engineering from Case Western Reserve University, Cleveland, OH, in 1996. He is currently pursuing the Ph.D. degree in the Department of Electrical Engineering, Stanford University, Stanford, CA.
During the summer of 1997, he was with Rockwell Semiconductor System, Newport Beach, CA, where he was involved in the design of a CMOS dual-band, GSM/DCS1800, direct conversion receiver. His current research interests are in low-power radio frequency (RF) integrated circuits design for high-data-rate wireless local area network systems.
Mr. Rategh received the Stanford Graduate Fellowship in 1997. He was a member of the Iranian team in the 21st International Physics Olympiad, Groningen, the Netherlands.

Hirad Samavati (S’99) received the B.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1994, and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1996. He currently is pursuing the Ph.D. degree at Stanford University.
During the summer of 1996, he was with Maxim Integrated Products, where he designed building blocks for a low-power infrared transceiver IC. His research interests include RF circuits and analog and mixed-signal VLSI, particularly integrated transceivers for wireless communications.
Mr. Samavati received a departmental fellowship from Stanford University in 1995 and a fellowship from the IBM Corporation in 1998. He is the winner of the ISSCC Jack Kilby outstanding student paper award for the paper “Fractal Capacitors” in 1998.

Thomas H. Lee (M’96) received the S.B., S.M. and Sc.D. degrees in electrical engineering, all from the Massachusetts Institute of Technology (MIT), Cambridge, in 1983, 1985, and 1990, respectively.
He joined Analog Devices in 1990, where he was primarily engaged in the design of high-speed clock recovery devices. In 1992, he joined Rambus, Inc., Mountain View, CA, where he developed high-speed analog circuitry for 500 megabyte/s CMOS DRAM’s. He has also contributed to the development of PLL’s in the StrongARM, Alpha, and K6/K7 microprocessors. Since 1994, he has been an Assistant Professor of electrical engineering at Stanford University, where his research focus has been on gigahertz-speed wireline and wireless integrated circuits built in conventional silicon technologies, particularly CMOS. He holds 12 U.S. patents and is the author of a textbook, The Design of CMOS Radio-Frequency Integrated Circuits (Cambridge, MA: Cambridge Press, 1998), and is a coauthor of two additional books on RF circuit design. He is also a cofounder of Matrix Semiconductor.
Dr. Lee has twice received the “Best Paper” award at the International Solid-State Circuits Conference, was coauthor of a “Best Student Paper” at ISSCC, and recently won a Packard Foundation Fellowship. He is a Distinguished Lecturer of the IEEE Solid-State Circuits Society, and was recently named a Distinguished Microwave Lecturer.
536 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 5, MAY 2002

A Fully Integrated CMOS Frequency Synthesizer With Charge-Averaging Charge Pump and Dual-Path Loop Filter for PCS- and Cellular-CDMA Wireless Systems
Yido Koo, Hyungki Huh, Yongsik Cho, Jeongwoo Lee, Joonbae Park, Kyeongho Lee, Deog-Kyoon Jeong, and
Wonchan Kim

Abstract—A fully integrated CMOS frequency synthesizer for PCS- and cellular-CDMA systems is integrated in a 0.35-µm CMOS technology. The proposed charge-averaging charge pump scheme suppresses fractional spurs to the level of noise, and the improved architecture of the dual-path loop filter makes it possible to implement a large time constant on a chip. With current-feedback bias and coarse tuning, a voltage-controlled oscillator (VCO) enables constant power and low gain of the VCO. Power dissipation is 60 mW with a 3.0-V supply. The proposed frequency synthesizer provides 10-kHz channel spacing with phase noise of −121 dBc/Hz in the PCS band and −127 dBc/Hz in the cellular band, both at 1-MHz offset frequency.
Index Terms—Bonding-wire inductor, CMOS RF, coarse tuning, dual-path loop filter, fractional-N-type prescaler, frequency synthesizer, phase noise, phase-locked loop.
Manuscript received July 28, 2001; revised November 9, 2001.
Y. Koo, H. Huh, Y. Cho, D.-K. Jeong, and W. Kim are with the School of Electrical Engineering and Computer Science, Seoul National University, Seoul 151-742, Korea (e-mail: ydkoo@iclab.snu.ac.kr).
J. Lee, J. Park, and K. Lee are with GCT Semiconductor, Inc., San Jose, CA 95131 USA.
Publisher Item Identifier S 0018-9200(02)03676-4.
0018-9200/02$17.00 © 2002 IEEE

I. INTRODUCTION
Wireless systems, such as PCS-CDMA, cellular CDMA, and JSTD-018PCS, require the frequency synthesizer to have precise channel spacing and low phase noise to meet the overall noise specification and to prevent unwanted signal mixing of the interferer. Most existing frequency synthesizers are implemented in silicon germanium (SiGe) or bipolar technologies, and use several external devices such as temperature-compensated crystal oscillator (TCXO) and loop filter. Because of cost and power consumption requirements, fully integrated CMOS RF building blocks are crucial and have been widely explored [1], [2].
Fig. 1 shows an example of a dual-band RF transceiver architecture for PCS- and cellular CDMA. A local oscillator (LO) signal from a dual-band frequency synthesizer is fed to the first mixer of the receiver for downconversion and it is also used in the transmitter for upconversion. The noise requirement of the frequency synthesizer is determined by the blocking profile of the system, which is calculated from the power of signal and interferer, minimum signal-to-noise ratio (SNR), and bandwidth specification [3]. The lower the phase noise of the LO signal is, the less unwanted signal around the carrier is modulated within the in-band channel.

Fig. 1. Dual-band CDMA RF transceiver.

Table I shows worldwide mobile frequency standards and RF phase-locked loop (PLL) requirements. CDMA systems require fast switching time with precise accuracy of the channel frequency. The channel raster is 30 kHz in cellular and 50 kHz in PCS systems and, to support the dual-band solution of the CDMA system, the frequency resolution of the synthesizer must be 10 kHz. This is a major limiting factor in the reduction of the locking time and root mean square (rms) phase error. It also makes it difficult to achieve single-chip integration due to the loop filter that has a large time constant.
In Section II, the special features of the proposed frequency synthesizer are discussed. Section III describes several building blocks of the synthesizer. The measurement results are given in Section IV, and conclusions are presented in Section V.

II. SYSTEM ARCHITECTURE
The proposed PLL is a monolithic integrated circuit that performs dual-band RF synthesis for CDMA wireless communication applications without any external device. Fig. 2 shows the block diagram of the fractional-N-type frequency synthesizer architecture. The external reference frequency is mainly 19.68 MHz, and 19.8 and 19.2 MHz are also supported. The voltage-controlled oscillator (VCO) oscillates at 1.7 GHz for
the PCS band, and at 900 MHz for the cellular band, and employs bonding wire as the inductor of the inductance–capacitance (LC) tank.

TABLE I
WORLDWIDE MOBILE FREQUENCY STANDARDS AND RF-PLL REQUIREMENTS

Fig. 2. Fractional-N frequency synthesizer architecture.

To meet the frequency resolution of 10 kHz, the phase-comparison frequency is 10 kHz and the loop bandwidth is only 1 kHz with an integer-N-type prescaler. To reduce rms phase error, it is very important for the PLL to have a wide bandwidth. The fractional-N architecture, compared with its integer-N counterpart, has a wider loop bandwidth with the same frequency resolution. However, it suffers from a major drawback called fractional spur.
The fractional-N structure is employed for the prescaler in the proposed architecture. Channel selection and other control signals are provided through a serial interface. To suppress the fractional spur, a special type of charge pump is designed. The new scheme of the dual-path loop filter enables flexible filter design for on-chip integration. The VCO combined with current-feedback bias and coarse tuning enables constant power and low gain of the VCO.

III. SYNTHESIZER BUILDING BLOCKS

A. Charge Pump With Charge-Averaging Scheme
While the reference spur occurs in integer-N synthesis due to charge-pump mismatch, fractional-N synthesis suffers from the fractional spur caused by the phase difference in the locked state as well as charge-pump mismatch. Fig. 3 shows the timing diagram of the reference and the VCO inputs of the phase/frequency detector (PFD) in the locked state.

Fig. 3. Timing diagram of reference and VCO inputs of PFD in locked state when the fraction is 1/3.

The fractional spur mainly stems from the variation of the prescaler division factor, in other words, the phase difference between reference and VCO inputs at every cycle of operation of the PFD. For example, to achieve a locking at 1/3 fractional frequency, as shown in Fig. 3, the prescaler division factor is N + 1 in one cycle and N in the other two cycles in every successive three cycles. It produces voltage ripples in the control signal of the VCO and therefore a fractional spur occurs. However, the average phase error during one circulation is zero in the locked state. As seen in Fig. 3, the sum of successive phase errors during three clock cycles is zero. This is the motivation of the proposed scheme.
Fig. 4 shows the proposed charge-averaging scheme and its operation. The charge pump is composed of four current sources and four sampling capacitors. The same up/dn signals are fed to each current source, which has 1/3 of the total current. Since the fractional coefficient of fractional-N is three in this design, we use four pairs of switches and four capacitors in the charge pump. Each pair of switches – and capacitor stores the charge that is injected from the pump during and then dumps it in the next period in turn. In the locked state, the sum of the phase errors in the three cycles is zero; therefore, the voltage of after charge summing during is the same as . This results in no voltage ripple in the dump mode. The switching noise due to charge sharing and clock feedthrough may affect the amount of charge in the capacitors. These could be static errors. However, they influence each capacitor equally. This type of static dc error does not cause voltage ripples, i.e., a fractional spur.
Fig. 5 shows the behavioral simulation results of the conventional fractional-N architecture and the proposed charge-averaging scheme. In simulation, the VCO gain is set as perfectly linear and neither current mismatch nor clock feedthrough in the switch is assumed. Other characteristics are the same in both cases, except for the charge pump. In the conventional scheme
shown in Fig. 5(a), approximately −50 dBc of fractional spur is found, but in the proposed scheme shown in Fig. 5(b), the fractional spur is suppressed to the noise level.

Fig. 4. Charge-averaging scheme. (a) Block diagram. (b) Its operation.

Fig. 5. Behavioral simulation of spur in fractional-N architecture. (a) With conventional scheme. (b) With proposed charge-averaging scheme.

With respect to the locking time, the charge-averaging scheme can exhibit an undesirable effect. If the size of the sampling capacitor is very small compared to that of the loop filter capacitor, the locking time is increased. To solve this problem, in the acquisition mode, the charge pump operates in the same way as the normal charge pump, which means that the charge pump is directly connected to the loop filter. After locking to the desired frequency, the charge-averaging mode is employed (Fig. 6). Therefore, there is no additional burden in locking time.

Fig. 6. Mode change of charge-pump operation.

In the ac analysis of the loop characteristic, time delay degrades the phase margin. As the time delay grows larger, the system becomes more unstable. The averaging method in this scheme results in an added time delay in the loop characteristic. Therefore, there exists a tradeoff between loop bandwidth and loop stability.

B. Dual-Path Loop Filter

Fig. 7. Loop filter characteristics. (a) Conventional second-order loop filter. (b) Dual-path loop filter.

Most PLLs use a second-order loop filter to suppress the control voltage ripple and to guarantee an appropriate phase margin. Fig. 7(a) shows a conventional second-order loop filter and the expression for control voltage in ac analysis. As described above, the frequency resolution is 10 kHz and the bandwidth of the PLL is a few kilohertz. This means that more than 1 nF of capacitance should be integrated on a chip, which is
the major limiting factor for on-chip integration. In addition, the thermal noise generated in a large resistor is modulated to phase noise via the control signal. The dual-path loop filter in Fig. 7(b) can be a solution for this problem. It separates the loop with and , so it is possible to design the loop filter more freely in conjunction with the pump current while keeping the loop transfer function nearly the same as that of a normal second-order loop filter.
An example of the dual-path loop filter was proposed by Craninckx and Steyaert [4]. In spite of many advantages inherent in this architecture, it has two active devices, an amplifier and a current adder. These inject additional noise into the control voltage, and after modulation in the VCO, the phase noise may increase. In addition, the floating capacitor across the amplifier is implemented with the metal-to-metal capacitor, so it requires a large area.

Fig. 8. Proposed architecture of dual-path loop filter.

Fig. 8 shows the new dual-path loop filter implementation that is proposed in this paper. A unity-gain buffer is inserted between and to separate and . If continuously follows , the operation is the same as that of the normal second-order loop filter. It is less noisy because there is only one active device and requires a smaller area since there is no floating capacitor. and are implemented using two pMOS transistors, whose source, drain, and bulk are tied to a separate, quiet supply.

C. Voltage-Controlled Oscillator
Two major issues in the design of the VCO are low phase noise to meet the overall noise figure criteria and high gain linearity for robust stability. Phase noise is mainly dependent on the quality factor (Q) of the LC tank [5]. Although an on-chip spiral inductor has recently been widely explored [6], a bonding-wire inductor is superior to a spiral inductor in terms of resistance, i.e., quality factor. In addition, the bonding-wire inductor has constant inductance over a wide frequency range.

Fig. 9. Bonding wire for inductor of VCO. (a) Pad and lead frame. (b) Modeling.

Fig. 9 shows bonding-wire inductor modeling. Two pads are connected to the differential output of the VCO and the ends of two lead frames are connected as a short or by external inductance, according to the operating band, PCS or cellular. The Q factor of the inductance is expressed as with parasitic components ignored. If the parameters of the bonding wire are nH, , pH, m , which are typical values in the QFN20 package, the Q factor of the inductor at 1.7 GHz is 43.

Fig. 10. Circuit diagram of (a) VCO and (b) bias circuit.

Fig. 10 shows the circuit diagram of the VCO. The oscillation frequency is controlled by the combination of fixed and variable capacitor. There are two methods that have been previously reported for implementation of the fixed capacitor. One is a metal-to-metal capacitor [1] and the other is a MOS transistor [7]. The former is used since it is superior in terms of VCO pushing characteristics. Metal-to-metal capacitors and switches are used for coarse tuning, which are controlled by the coarse tuning controller. The size of the switch should be sufficiently large in order to avoid degradation of the Q factor of the LC tank. As a variable capacitor, an accumulation-mode MOS transistor is used for fine tuning. The MOS capacitor has an inherent nonlinear capacitance. But, with the coarse tuning scheme, the control voltage moves within 0.2 V around half of the supply voltage, thereby obtaining almost linear gain.
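
As a rough cross-check of the bonding-wire quality factor quoted in this section, the sketch below assumes the simple series model Q = ωL/R with parasitics ignored, and takes the total inductance as self-inductance minus mutual inductance using the values reported in Section IV. The function name and the series-resistance interpretation are illustrative assumptions, not the authors' expression.

```python
import math


def series_resistance_from_q(inductance_h, freq_hz, q):
    """Series resistance implied by Q = omega * L / R (assumed model, parasitics ignored)."""
    omega = 2.0 * math.pi * freq_hz
    return omega * inductance_h / q


# Values taken from the text: 1.36 nH self-inductance and 0.31 nH mutual
# inductance (Section IV), and Q = 43 at 1.7 GHz (Section III-C).
L_SELF = 1.36e-9
L_MUTUAL = 0.31e-9
L_TOTAL = L_SELF - L_MUTUAL  # about 1.05 nH, matching Section IV

r_series = series_resistance_from_q(L_TOTAL, 1.7e9, 43.0)
print(f"implied series resistance: {r_series:.3f} ohm")
```

Under these assumptions the implied series resistance comes out at roughly a quarter of an ohm, which is the order of magnitude one would expect for a short bonding wire.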
The output swing level of the VCO is another key issue, since the receiver and transmitter chips expect an LO signal of constant power. Generally bonding-wire inductance varies by ±10% and to compensate, the total capacitance varies accordingly. This produces the variation of the VCO swing level. The bias circuit in Fig. 10(b) is designed to have a constant output swing regardless of capacitance. It monitors the operating status of the fixed and variable capacitors and provides current in the direction of compensating for the output swing. Fig. 11 shows the VCO swing variation for the conventional and proposed scheme when the inductance varies.

Fig. 11. Output swing of VCO versus variations of inductance.

Fig. 12 shows the measured frequency tuning range in the cellular and PCS band. More than 25% of the tuning range is obtained in each case. It is sufficient to compensate for the variation of bonding-wire inductance. The VCO gain is a maximum of 40 MHz/V, and 30 MHz/V is typical. The total range is divided into 64 levels by coarse tuning, and the frequency spacing of two adjacent curves is approximately 7 MHz in the PCS band.

Fig. 12. VCO tuning range of (a) cellular band, and (b) PCS band.

D. Coarse Tuning Architecture
Small VCO gain is important for reduction of both spur and phase noise. The frequency spur is directly proportional to the VCO gain. Also, as the VCO gain is reduced, the fluctuations of control and supply voltages are less modulated to the phase noise of the VCO. To meet the requirement of the wide frequency range and the small VCO gain, the coarse tuning controller, shown in Fig. 13, is designed to control the fixed capacitor in the VCO.

Fig. 13. Coarse tuning controller. (a) Block diagram. (b) Timing diagram of operation.

Fig. 14. Die microphotograph.

The coarse tuning controller, shown in Fig. 13, is composed of an edge detector, counter/comparator, lock filter, and shift register. During coarse tuning, the rising edges of are counted in one period of and the result is compared to a predetermined value of desired frequency. Starting from the center frequency, a proper level is found by a successive approach. The lock-detection filter determines when to start the fine tuning process by monitoring the up/dn signal. The total elapsed time in coarse tuning is less than 100 µs.
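
The counting-and-comparing search described above can be sketched as a 6-bit successive-approximation loop over the 64 coarse levels. This is a minimal illustration, not the authors' circuit: the band model (`band_frequency`, with a 1.5-GHz lowest band and 7-MHz spacing), the 10-µs counting gate, and all function names are hypothetical. Integer arithmetic is used so the edge counts are exact.

```python
def count_edges(freq_hz, gate_ns):
    """Rising edges of a clock at freq_hz seen during a gate of gate_ns nanoseconds."""
    return freq_hz * gate_ns // 1_000_000_000


def band_frequency(code, f_min_hz=1_500_000_000, step_hz=7_000_000):
    """Hypothetical VCO model: 64 coarse bands about 7 MHz apart (fine tuning held mid-range)."""
    return f_min_hz + code * step_hz


def coarse_tune(target_hz, gate_ns=10_000, bits=6):
    """Find the highest band whose edge count does not exceed the target count."""
    target_count = count_edges(target_hz, gate_ns)
    code = 0
    # The first trial sets only the MSB (code 32), i.e. the search starts mid-range.
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        if count_edges(band_frequency(trial), gate_ns) <= target_count:
            code = trial  # keep the bit: this band is still at or below the target
    return code


code = coarse_tune(1_760_000_000)
print(code, band_frequency(code))
```

With the assumed 10-µs gate, six comparisons take about 60 µs, which is in line with the sub-100-µs coarse tuning time stated in the text; the actual gate length used on the chip is not given here.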
IV. EXPERIMENTAL RESULTS AND SUMMARY
The proposed frequency synthesizer has been fabricated in a 0.35-µm CMOS technology. Fig. 14 shows the die photograph of the synthesizer with an area of 2.5 mm × 2.0 mm including pads. The circuit has been measured with a nominal 3.0-V supply and a 2.7-V worst case. The bonding wire of the QFN20 package used in the VCO has 1.36 nH of self-inductance and 0.31 nH of mutual inductance, so the total inductance at one output of the VCO is 1.05 nH. All control signals are fed through a serial interface. The VCO at the bottom and the loop filter on the left have a common analog supply and ground, and the others are all connected to the digital supply and ground. The power consumption of the total chip is 60 mW and the VCO alone dissipates 12.3 mW.

Fig. 15. Measured carrier spectrum.

Fig. 16. Measured PLL output phase noise. (a) Cellular band. (b) PCS band.

Fig. 15 shows the measured carrier spectrum with the center frequency of 980 MHz. The output power is 1.2 dBm with an inductive load, which is sufficient for the output power requirements. Fig. 16 shows the measured phase noise in the cellular and PCS band. Phase noise is −106 dBc/Hz at 100-kHz offset and −127 dBc/Hz at 1-MHz offset in the cellular band, and −104 dBc/Hz at 100-kHz offset and −121 dBc/Hz at 1-MHz offset in the PCS band. Fractional spurs are suppressed to the phase noise level. Table II shows the performance summary.

TABLE II
SUMMARY OF SYNTHESIZER PERFORMANCE

V. CONCLUSION
In this paper, we demonstrate a fully integrated CMOS frequency synthesizer designed for PCS- and cellular-CDMA wireless systems. A charge-averaging scheme for reducing fractional spurs and a dual-path loop filter architecture are proposed. The new bias circuit of the VCO compensates for the variation of output swing of the VCO caused by the variation of bonding-wire inductance, and the proposed coarse tuning technique achieves a small VCO gain and a wide operating frequency range of the VCO simultaneously. The frequency synthesizer fabricated in a 0.35-µm CMOS technology offers −127-dBc/Hz and −121-dBc/Hz phase noise at 1-MHz offset with 980 MHz and 1.76 GHz of carrier frequency, respectively.

REFERENCES
[1] A. Kral, F. Behbahani, and A. A. Abidi, “RF-CMOS oscillators with switched tuning,” in Proc. IEEE Custom Integrated Circuits Conf., May 1998, pp. 555–558.
[2] C. Lam and B. Razavi, “A 2.6-GHz/5.2-GHz frequency synthesizer in 0.4-µm CMOS technology,” in Symp. VLSI Circuits Dig. Tech. Papers, June 1999, pp. 117–120.
[3] B. Razavi, RF Microelectronics. Upper Saddle River, NJ: Prentice Hall, 1998.
[4] J. Craninckx and M. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065, Dec. 1998.
[5] D. B. Leeson, “A simple model of feedback oscillator noise spectrum,” Proc. IEEE, vol. 54, pp. 329–330, Feb. 1966.
[6] S. Mohan, M. Hershenson, S. Boyd, and T. H. Lee, “Simple accurate expressions for planar spiral inductances,” IEEE J. Solid-State Circuits, vol. 34, pp. 1419–1424, Oct. 1999.
[7] J.-M. Mourant, J. Imbornone, and T. Tewksbury, “A low phase noise monolithic VCO in SiGe BiCMOS,” in IEEE Radio Frequency Integrated Circuits (RFIC) Symp. Dig., June 2000, pp. 65–68.

Yido Koo was born in Seoul, Korea, in 1973. He received the B.S. and M.S. degrees from the School of Electrical Engineering, Seoul National University, Seoul, Korea, in 1996 and 1998, respectively, where he is currently working toward the Ph.D. degree.
His research interests include RF building blocks and systems for wireless communication and high-speed interface for data communications. Currently, he is developing a low-noise frequency synthesizer for CDMA and GSM applications.

Hyungki Huh was born in Seoul, Korea. He received the B.S. and M.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1998 and 2001, respectively, where he is currently working toward the Ph.D. degree in electrical engineering.
His research interests are in the area of RF circuits and systems with emphasis on the fractional frequency synthesizer.

Yongsik Cho was born in Daegu, Korea. He received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2000, where he is currently working toward the M.S. degree in electrical engineering.
His research interests are in the area of RF circuits and systems.

Jeongwoo Lee received the B.S. and M.S. degrees in electronics engineering and the Ph.D. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1994, 1996, and 2000, respectively.
He is currently a Manager with the W-CDMA team of GCT Semiconductor Inc., San Jose, CA. His current research interests include CMOS transceiver circuitry for highly integrated radio applications.

Joonbae Park received the B.S. and M.S. degrees in electronics engineering and the Ph.D. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1993, 1995, and 2000, respectively.
In 1998, he joined GCT Semiconductor Inc., San Jose, CA, as Director of the Analog Division. He is currently involved in the development of CMOS RF chip sets for WLL, W-CDMA, and wireless LAN. His other research interests include data converters and high-speed communication interfaces.
Dr. Park received the Best Paper Award of VLSI Design’99, Goa, India.

Kyeongho Lee was born in Seoul, Korea, in 1969. He received the B.S. and M.S. degrees in electronics engineering and the Ph.D. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1993, 1995, and 2000, respectively.
He was with Silicon Image, Inc., Sunnyvale, CA, as a Member of Technical Staff, where he worked on CMOS high-bandwidth low-EMI transceivers. He is currently with GCT Semiconductor Inc., San Jose, CA, as a Co-Chief Executive Officer. His research interests include various CMOS high-speed circuits for wire/wireless communication systems and integrated CMOS RF systems.

Deog-Kyoon Jeong received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1981 and 1984, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1989.
From 1989 to 1991, he was with Texas Instruments Inc., Dallas, TX, where he was a Member of Technical Staff and worked on the modeling and design of BiCMOS gates and the single-chip implementation of the SPARC architecture. He joined the faculty of the Department of Electronics Engineering and Inter-University Semiconductor Research Center, Seoul National University, as an Assistant Professor in 1991. He is currently an Associate Professor of the School of Electrical Engineering, Seoul National University. His main research interests include high-speed I/O circuits, VLSI systems design, microprocessor architectures, and memory systems.

Wonchan Kim was born in Seoul, Korea, on December 11, 1945. He received the B.S. degree in electronics engineering from Seoul National University, Korea, in 1972. He received the Dip.-Ing. and Dr.-Ing. degrees in electrical engineering from the Technische Hochschule Aachen, Aachen, Germany, in 1976 and 1981, respectively.
In 1972, he was with Fairchild Semiconductor Korea as a Process Engineer. From 1976 to 1982, he was with the Institut für Theoretische Electrotecnik RWTH Aachen. Since 1982, he has been with the School of Electrical Engineering, Seoul National University, where he is currently a Professor. His research interests include development of semiconductor devices and design of analog/digital circuits.

1028 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

A Modeling Approach for Σ–Δ Fractional-N Frequency Synthesizers Allowing Straightforward Noise Analysis
Michael H. Perrott, Mitchell D. Trott, Member, IEEE, and Charles G. Sodini, Fellow, IEEE

Abstract—A general model of phase-locked loops (PLLs) is derived which incorporates the influence of divide value variations. The proposed model allows straightforward noise and dynamic analyses of Σ–Δ fractional-N frequency synthesizers and other PLL applications in which the divide value is varied in time. Based on the derived model, a general parameterization is presented that further simplifies noise calculations. The framework is used to analyze the noise performance of a custom Σ–Δ synthesizer implemented in a 0.6-µm CMOS process, and accurately predicts the measured phase noise to within 3 dB over the entire frequency offset range spanning 25 kHz to 10 MHz.
Index Terms—Delta, dithering, divider, fractional-N, frequency, modeling, noise, phase-locked loop, PLL, quantization noise, sigma, synthesizer.
Manuscript received November 14, 2000; revised March 14, 2002. This work was supported in part by the Defense Advanced Research Projects Agency under Contract DAAL-01-95-K-3526.
M. H. Perrott and C. G. Sodini are with the Microsystems Technology Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: perrott@mit.edu).
M. D. Trott is with Hewlett-Packard Laboratories, Palo Alto, CA 94304 USA.
Publisher Item Identifier 10.1109/JSSC.2002.800925.
0018-9200/02$17.00 © 2002 IEEE

I. INTRODUCTION
The use of wireless products has been rapidly increasing in the last decade, and there has been worldwide development of new systems to meet the needs of this growing market. As a result, new radio architectures and circuit techniques are being actively sought that achieve high levels of integration and low-power operation while still meeting the stringent performance requirements of today’s radio systems. One such technique is the use of Σ–Δ modulation to achieve high-resolution frequency synthesizers that have relatively fast settling times, as described by Riley et al. in [1], Copeland in [2], and Miller and Conley in [3], [4]. This method has now been used in a variety of applications ranging from accurate frequency generation [1], [5]–[7] to direct frequency modulation for transmitter applications [8]–[12].
However, despite its increasing use, a general model of Σ–Δ fractional-N synthesizers to encompass dynamic and noise performance has not previously been presented. The primary obstacle to deriving such a model is that, in contrast to classical phase-locked loop (PLL) systems, a Σ–Δ synthesizer dynamically varies the divide value in the PLL according to the output of a Σ–Δ modulator. Traditional methods of PLL analysis assume a static divide value, and the step toward allowing for dynamic variations is not straightforward. As a result, the impact of the divide value variations is often treated in isolation of other influences on the PLL [1], such as noise in the phase detector and voltage-controlled oscillator (VCO), and overall analysis of the synthesizer becomes cumbersome.
In this paper, we develop a simple model for the Σ–Δ synthesizer that allows straightforward analysis of its dynamic and noise performance. The predictions of the model compare extremely well to simulated and experimental results of implemented Σ–Δ synthesizers [9], [10], [13]. In addition, we present a PLL parameterization that simplifies calculation of the PLL dynamics and assessment of the synthesizer noise performance.
To develop the Σ–Δ synthesizer model, we first derive a general model of the PLL that incorporates the influence of divide value variations. The derivation is done in the time domain and then converted to a frequency-domain block diagram. We parameterize the resulting PLL model in terms of a single function and illustrate its usefulness in determining the noise performance of the PLL. The Σ–Δ modulator is then included in the generalized PLL model and its impact on the PLL is analyzed. Finally, the modeling approach is used to calculate the noise performance of a custom Σ–Δ synthesizer integrated in a 0.6-µm CMOS process and then compared to measured results.

Fig. 1. Block diagram of a Σ–Δ frequency synthesizer.

II. BACKGROUND
Fig. 1 displays a block diagram of a Σ–Δ frequency synthesizer, along with a snapshot of the signals associated with various nodes in this system. A PLL in essence, the synthesizer achieves accurate setting of its output frequency by locking to a reference frequency. This locking action is accomplished through feedback by dividing down the VCO output frequency and comparing its phase to the phase of the reference source
to produce an error signal. The phase comparison operation is done through the use of a phase/frequency detector (PFD) which also acts as a frequency discriminator when the PLL is out of lock. The loop filter attenuates high-frequency components in the PFD output so that a smoothed error signal is sent to the VCO input. It consists of an active or passive network, and is typically fed by a charge pump which converts the error signal to a current waveform. The charge pump is not necessary, but provides a convenient means of setting the gain of the loop filter and simplifies implementation of an integrator when required.
As illustrated in the figure, a key characteristic of Σ–Δ synthesizers is that the divide value is dynamically changed in time according to the output of a Σ–Δ modulator. By doing so, much higher frequency resolution can be achieved for a given PLL bandwidth setting than possible with classical integer-N frequency synthesizers [1].
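
To make the idea of a time-varying divide value concrete, the sketch below dithers the divider between N and N + 1 with a first-order accumulator, which is the simplest member of the Σ–Δ family. It is an illustration under stated assumptions rather than the modulator used in this work: the function name, the nominal divide value of 90, and the fraction 1/3 are hypothetical, and practical synthesizers typically use higher-order modulators to shape the quantization noise more aggressively.

```python
from fractions import Fraction


def first_order_divide_sequence(n_int, frac, cycles):
    """Divide values produced by a first-order accumulator (assumed, simplest Sigma-Delta form)."""
    acc = Fraction(0)
    values = []
    for _ in range(cycles):
        acc += frac
        if acc >= 1:
            # Accumulator overflow: divide by N + 1 for this reference cycle.
            acc -= 1
            values.append(n_int + 1)
        else:
            values.append(n_int)
    return values


seq = first_order_divide_sequence(n_int=90, frac=Fraction(1, 3), cycles=300)
average = Fraction(sum(seq), len(seq))
print(seq[:6], float(average))
```

The average divide value equals N plus the programmed fraction, so the output frequency can be set in steps much finer than the reference frequency; the price is the instantaneous divide-value error that the loop filter has to suppress.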

III. TIME-DOMAIN PLL MODEL


We now derive time-domain models for each individual PLL block shown in Fig. 1. The primary focus of our effort is on obtaining a divider model incorporating dynamic changes to its value. However, the derivation of this model requires careful attention to the way we model the PFD. In particular, we will parameterize signals associated with a tristate PFD with sequences that can be directly related to the divider operation. This approach is extended to an XOR-based PFD by relating its output to that of a tristate PFD. Following a brief derivation of the VCO model, we then obtain the divider model by relating its operation to the VCO model and the PFD sequences discussed above. Finally, the charge pump and loop filter models are described, and the overall PLL model constructed.

A. Tristate PFD

Fig. 2. Tristate phase-frequency detector and associated signals.

The tristate PFD and its associated signals are shown in Fig. 2. The output of the detector, $E(t)$, is characterized as a series of pulses whose widths are a function of the relative phase difference between rising edges of $\mathrm{Ref}(t)$ and $\mathrm{Div}(t)$. We parameterize the phases of $\mathrm{Ref}(t)$ and $\mathrm{Div}(t)$ with the discrete-time sequences $\Phi_{ref}[k]$ and $\Phi_{div}[k]$, respectively. $\Phi_{ref}[k]$ is nominally zero, and $\Phi_{div}[k]$ is defined in (1). The series of pulses that form $E(t)$ are parameterized by the following discrete-time sequences.
• $t_r[k]$: time instants at which the rising edges of the reference clock occur.
• $t_d[k]$: time instants at which the rising edges of the divider output occur.
• $\Delta t[k]$: time difference between rising edges of $\mathrm{Ref}(t)$ and $\mathrm{Div}(t)$.
Assuming a constant reference frequency, consecutive values for $t_r[k]$ are related for all $k$ as

$$t_r[k] - t_r[k-1] = T$$

where $T$ is the reference period. We will make use of the $\Delta t[k]$ parameterization in deriving the PFD model; the other sequences will be used when deriving the divider model.

Since phase detection is a memoryless operation, its influence on the PLL dynamics is sufficiently modeled by its gain. However, the pulsed behavior of the PFD output adds some complexity in deriving the value of that gain, so our derivation will consist of two steps. The first step relates the input phase difference to the $\Delta t[k]$ sequence. The second step relates the $\Delta t[k]$ sequence to an impulse approximation of the $E(t)$ waveform.

The relationship of $\Delta t[k]$ to the phase difference, $\Phi_{ref}[k] - \Phi_{div}[k]$, is defined as

$$\Delta t[k] = \frac{T}{2\pi}\left(\Phi_{ref}[k] - \Phi_{div}[k]\right) \tag{1}$$

To verify the above definition, one observes from Fig. 2 that a phase error of $2\pi$ causes $\Delta t[k]$ to be $T$.

The impact of the $\Delta t[k]$ sequence on the PLL dynamics is cumbersome to model analytically since the pulse-width modulated PFD output has a nonlinear influence on the PLL dynamics. However, a simple approximation greatly eases our efforts: we simply represent the PFD output as an impulse sequence rather than a modulated pulse sequence. Fig. 3 illustrates this approximation; pulses in $E(t)$ are represented as impulses with area equal to their corresponding pulse, as described by

$$E(t) \approx \sum_{k} \Delta t[k]\,\delta(t - kT) \tag{2}$$

We discuss the significance of the above expression when we derive the frequency-domain model of the PLL in Section IV.

Our justification for the impulse approximation is heuristic: each PFD output pulse has much smaller width than the loop filter impulse response, and therefore acts like an impulse when the two are convolved together. Obviously, the accuracy of this approximation depends on how much smaller the PFD output pulse widths are compared to the dominant time constant of the loop filter. Since the PFD pulses must be smaller than a reference period, high accuracy is achieved
1030 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

Fig. 3. Impulse sequence approximation of PFD output.

when the reference frequency is much higher than the loop filter (PLL) bandwidth. Fortunately, this condition is satisfied when dealing with Σ-Δ synthesizers since a high reference frequency to PLL bandwidth ratio is required to adequately suppress the Σ-Δ quantization noise. For additional discussion on this issue, see [13].
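
The heuristic behind the impulse approximation can be checked numerically. The sketch below is only an illustration under stated assumptions, not part of the original derivation: it assumes a hypothetical first-order low-pass loop filter with time constant `tau`, drives it with a unit-height pulse of width `width`, and compares the result with the response to an impulse of equal area. All function names are illustrative.

```python
import numpy as np

def pulse_response(t, width, tau):
    """Response of the assumed first-order low-pass to a unit-height pulse on [0, width)."""
    t = np.asarray(t, dtype=float)
    during = 1.0 - np.exp(-t / tau)  # filter charging while the pulse is high
    after = (1.0 - np.exp(-width / tau)) * np.exp(-(t - width) / tau)  # free decay afterwards
    return np.where(t < width, during, after)

def impulse_response(t, area, tau):
    """Response of the same filter to an impulse of the given area applied at t = 0."""
    t = np.asarray(t, dtype=float)
    return area * np.exp(-t / tau) / tau

def worst_relative_error(width, tau):
    """Largest relative mismatch between the two responses once the pulse has ended."""
    t = np.linspace(width, 10.0 * tau, 1001)
    exact = pulse_response(t, width, tau)
    approx = impulse_response(t, width, tau)  # impulse area equals pulse area (height 1 * width)
    return float(np.max(np.abs(exact - approx) / exact))

if __name__ == "__main__":
    for ratio in (1.0, 0.1, 0.01):
        err = worst_relative_error(ratio, 1.0)
        print(f"width/tau = {ratio:5.2f}: worst relative error = {err:.4f}")
```

For this assumed filter the mismatch is roughly half the ratio `width / tau` when the pulse is narrow, which is consistent with the claim that accuracy improves as the reference frequency rises relative to the loop bandwidth.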

B. XOR-Based PFD

Fig. 4. XOR-based PFD, associated signals, and $E(t)$ decomposition.

An XOR-based PFD is shown in Fig. 4 [13]–[15], along with associated signals that will be discussed later. Assuming the PFD is not performing frequency acquisition, the XOR gate output is simply passed to the detector output, $E(t)$, so that the detector operates as an XOR phase detector. As such, the detector outputs an average error of zero when $\mathrm{Ref}(t)$ and $\mathrm{Div}(t)$ are in quadrature, and $E(t)$ is nominally a two-level square wave rather than the trilevel short-pulse waveform obtained with the tristate design. The combination of having wide pulses and only two output levels allows the XOR-based PFD to achieve high linearity, which is desirable for Σ-Δ synthesizer applications to avoid folding down Σ-Δ quantization noise [13].

To model the XOR-based PFD, we simply relate its associated signals to the tristate detector so that the previous results can be readily applied. Fig. 4 displays the signals associated with this PFD, and reveals that the output $E(t)$ can be decomposed into the sum of a square wave, $E_{spur}(t)$, and a trilevel pulse waveform, $\hat{E}(t)$. The first component is independent of the input phase difference to the detector and presents a spurious noise signal to the PLL; its influence can be made negligible with proper design. The second component, $\hat{E}(t)$, captures the impact of the input phase difference, $\Phi_{ref}[k] - \Phi_{div}[k]$, on the PFD output, and can be parameterized according to the width of its pulses, $\Delta t[k]$, which depends on the input phase difference and on a constant phase offset set by the quadrature operating point. As with the tristate detector, the impulse approximation can be applied to obtain

$$E(t) \approx E_{spur}(t) + \sum_{k} 2\,\Delta t[k]\,\delta(t - kT)$$

which, if we ignore $E_{spur}(t)$, is the tristate expression multiplied by a factor of 2. Thus, if we ignore the phase offset and the square wave $E_{spur}(t)$, the XOR-based PFD has an identical model to that of the tristate topology except that its gain is increased by a factor of 2.

C. Voltage-Controlled Oscillator

For our purposes, only two equations are needed to model the VCO. The first relates deviations in the VCO phase, defined as $\Phi_{out}(t)$, to changes in the VCO input voltage, $v_{in}(t)$. Since VCO phase is the integral of VCO frequency, and deviations in VCO frequency are calculated as $K_v v_{in}(t)$, where $K_v$ is in units of hertz per volt, we have

$$\Phi_{out}(t) = 2\pi \int_{-\infty}^{t} K_v\, v_{in}(\tau)\, d\tau \tag{3}$$

The second equation relates the absolute VCO phase, defined as $\Phi_{vco}(t)$, to deviations in the VCO phase and the nominal VCO frequency $f_o$:

$$\Phi_{vco}(t) = 2\pi f_o t + \Phi_{out}(t) \tag{4}$$

Our modeling efforts will be primarily focused on deviations in the VCO phase, so that (3) is of the most interest. However, (4) is required in the divider derivation that follows.

D. Divider

Modeling of the divider will be accomplished by first relating the PFD pulse widths, $\Delta t[k]$, to the VCO phase deviations, $\Phi_{out}(t)$, and the divide value sequence, $N[k]$. Given this relationship, the divider model is “backed out” using the PFD gain expression in (1).

We begin by noting that the divider output edges occur whenever the absolute VCO phase, $\Phi_{vco}(t)$, completes $2\pi N[k]$ radian increments of phase. As stated in (4), $\Phi_{vco}(t)$ is composed of a ramp in time, $2\pi f_o t$, and phase variations, $\Phi_{out}(t)$. These statements are collectively illustrated in Fig. 5. Note that changes in $N[k]$ occur at the rising edges of the divider.
PERROTT et al.: MODELING APPROACH FOR Σ-Δ FRACTIONAL-N FREQUENCY SYNTHESIZERS 1031

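Fig. 5. Relationship of divider edges to instantaneous VCO phase, $\Phi_{vco}(t)$.

Fig. 6. Time-domain model of PLL.

Now, we can relate $\Delta t[k]$ to the VCO phase signal and divider sequence using (4) and Fig. 5. The first of two key equations is derived from Fig. 5 as

$$\Phi_{vco}(t_d[k]) - \Phi_{vco}(t_d[k-1]) = 2\pi N[k-1] \tag{5}$$

The second key equation is obtained by evaluating (4) at time instants $t_d[k]$ and $t_d[k-1]$ and subtracting the resulting expressions:

$$\Phi_{vco}(t_d[k]) - \Phi_{vco}(t_d[k-1]) = 2\pi f_o\left(t_d[k] - t_d[k-1]\right) + \Phi_{out}(t_d[k]) - \Phi_{out}(t_d[k-1])$$

which, since $t_d[k] = t_r[k] + \Delta t[k]$ and $t_r[k] - t_r[k-1] = T$, is equivalently written as

$$\Phi_{vco}(t_d[k]) - \Phi_{vco}(t_d[k-1]) = 2\pi f_o\left(T + \Delta t[k] - \Delta t[k-1]\right) + \Phi_{out}(t_d[k]) - \Phi_{out}(t_d[k-1]) \tag{6}$$

We combine the two key equations into one formulation by substitution of (6) into (5):

$$2\pi N[k-1] = 2\pi f_o\left(T + \Delta t[k] - \Delta t[k-1]\right) + \Phi_{out}(t_d[k]) - \Phi_{out}(t_d[k-1])$$

Rearrangement of this last expression then produces

$$\Delta t[k] - \Delta t[k-1] = \frac{1}{f_o}\left(N[k-1] - f_o T\right) - \frac{1}{2\pi f_o}\left(\Phi_{out}(t_d[k]) - \Phi_{out}(t_d[k-1])\right) \tag{7}$$

Equation (7) is a difference equation relating all variables of interest; to remove the differences we sum the formulation over all positive time samples up to sample $k$:

$$\sum_{m=1}^{k}\left(\Delta t[m] - \Delta t[m-1]\right) = \sum_{m=1}^{k}\left[\frac{1}{f_o}\left(N[m-1] - f_o T\right) - \frac{1}{2\pi f_o}\left(\Phi_{out}(t_d[m]) - \Phi_{out}(t_d[m-1])\right)\right]$$

Carrying out the summation operation, we obtain

$$\Delta t[k] - \Delta t[0] = \frac{1}{f_o}\sum_{m=1}^{k}\left(N[m-1] - f_o T\right) - \frac{1}{2\pi f_o}\left(\Phi_{out}(t_d[k]) - \Phi_{out}(t_d[0])\right)$$

Assuming initial conditions are zero, this last expression becomes

$$\Delta t[k] = \frac{1}{f_o}\sum_{m=1}^{k}\left(N[m-1] - f_o T\right) - \frac{1}{2\pi f_o}\Phi_{out}(t_d[k]) \tag{8}$$

The final form of the desired equation is obtained by modifying (8) according to the following statements:
• Define $N_{nom} = f_o T$ and $n[k] = N[k] - N_{nom}$.
• Approximate $\Phi_{out}(t_d[k]) \approx \Phi_{out}(kT)$.
As such, we obtain

$$\Delta t[k] = \frac{T}{N_{nom}}\sum_{m=1}^{k} n[m-1] - \frac{T}{2\pi N_{nom}}\Phi_{out}(kT) \tag{9}$$

We obtain the desired divider model by replacing $\Delta t[k]$ with the PFD gain expression in (1) and assuming $\Phi_{ref}[k]$ is zero.

$$\Phi_{div}[k] = \frac{1}{N_{nom}}\Phi_{out}(kT) - \frac{2\pi}{N_{nom}}\sum_{m=1}^{k} n[m-1] \tag{10}$$

It is important to note that the only approximation made in deriving (10) is that $\Phi_{out}(t_d[k]) \approx \Phi_{out}(kT)$. Essentially, we are ignoring the nonuniform time sampling of the VCO phase deviations. As discussed in [13] and verified by actual implementations [9], [10], this approximation is quite accurate in practice even when the PLL is modulated.

E. Charge Pump and Loop Filter
The charge pump and loop filter relate the PFD output $E(t)$ to the VCO input $v_{in}(t)$. We model the charge pump as a simple scaling operation on $E(t)$ of value $I_{cp}$. The time domain model of the loop filter is characterized by its impulse response, $h(t)$.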
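
Combining the charge-pump scaling with the impulse approximation of the PFD output, the VCO input becomes a superposition of scaled loop-filter impulse responses. The following is a minimal sketch of that idea, assuming a hypothetical parallel-RC loop filter driven by the charge-pump current (the prototype discussed later uses a different, switched-capacitor filter). The function name, argument names, and component values are illustrative assumptions.

```python
import numpy as np

def vco_input_voltage(t, pulse_widths, t_ref, i_cp, r, c):
    """v_in(t) = I_cp * sum_k dt[k] * h(t - k*T) for an assumed parallel-RC loop filter.

    h(t) = exp(-t / (r*c)) / c for t >= 0 is the impulse response of a parallel RC
    network driven by a current (volts per coulomb of injected charge).
    """
    t = np.asarray(t, dtype=float)
    v = np.zeros_like(t)
    for k, width in enumerate(pulse_widths):
        delay = t - k * t_ref          # time elapsed since the k-th PFD impulse
        active = delay >= 0.0          # the filter is causal
        v[active] += i_cp * width * np.exp(-delay[active] / (r * c)) / c
    return v

if __name__ == "__main__":
    t_ref, i_cp, r, c = 50e-9, 1e-6, 10e3, 25e-12   # hypothetical values
    widths = [1e-9] * 100                            # constant 1-ns PFD pulses
    t = 99 * t_ref + np.linspace(0.0, t_ref, 1000, endpoint=False)
    v = vco_input_voltage(t, widths, t_ref, i_cp, r, c)
    print("average v_in over the last period:", v.mean())
    print("I_cp * (dt / T) * R              :", i_cp * 1e-9 / t_ref * r)
```

With constant pulse widths the average voltage settles to the average charge-pump current times the filter's dc resistance, which is the behavior one would expect from the pulse-train view as well.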

F. Overall Model
We now combine the results of Section III-A–E to obtain
the overall time-domain PLL model shown in Fig. 6. The PFD
model is obtained from (1) and (2), the divider model from
(10), and the VCO model from (3). As discussed earlier, the
XOR-based PFD has a factor of two larger gain than the tristate design, which is captured by the factor $\alpha$ in the PFD model. For convenience in analysis to follow, we also define an abstract signal, $\Phi_n[k]$, as the output of the divider accumulation action.

Some observations are in order. First, the divider effectively samples the continuous-time output phase deviation of the VCO, $\Phi_{out}(t)$, and then divides its value by $N_{nom}$. The output phase of the divider, $\Phi_{div}[k]$, is influenced by the integration of deviations in the divider value, $n[k]$. The integration of $n[k]$ is a consequence of the fact that the divider output is a phase signal, whereas $n[k]$ causes an incremental change in the frequency of the divider output. Second, the PFD, charge pump, and loop filter translate the discrete-time error signal formed by $\Phi_{ref}[k]$ and $\Phi_{div}[k]$ to the continuous-time input of the VCO, $v_{in}(t)$. These elements, along with the divider, also act as a D/A converter for mapping changes in $n[k]$ to $v_{in}(t)$.
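
The statement that the divider integrates deviations in its divide value can be illustrated with a small numerical experiment. The sketch below is an assumption-laden illustration rather than the authors' model: it takes an ideal VCO running exactly at its nominal frequency (no phase deviation), zero reference phase, and the sign convention that a late divider edge corresponds to a negative divider phase. It then compares the divider phase obtained by timing the edges directly against a running sum of the divide-value deviations. Function names are illustrative.

```python
import numpy as np

def divider_phase_from_edges(divide_values, n_nom, t_ref):
    """Divider phase found by timing the divider edges of an ideal, undisturbed VCO."""
    f_vco = n_nom / t_ref  # nominal VCO frequency
    # The k-th divider edge occurs after the VCO completes N[k-1] further cycles.
    edge_times = np.cumsum(np.asarray(divide_values, dtype=float) / f_vco)
    k = np.arange(1, len(divide_values) + 1)
    delta_t = edge_times - k * t_ref           # divider edge time minus reference edge time
    return -2.0 * np.pi * delta_t / t_ref      # phase implied by the timing error (zero reference phase)

def divider_phase_from_model(divide_values, n_nom):
    """Same quantity from an accumulator: -(2*pi/N_nom) times the running sum of n."""
    n = np.asarray(divide_values, dtype=float) - n_nom
    return -2.0 * np.pi * np.cumsum(n) / n_nom

if __name__ == "__main__":
    n_seq = [92, 93, 92, 91, 93, 94, 92]       # hypothetical divide-value sequence
    print(divider_phase_from_edges(n_seq, 92, 50e-9))
    print(divider_phase_from_model(n_seq, 92))
```

Under these assumptions the two computations agree to within floating-point error, because each extra VCO cycle counted by the divider delays every later edge by one VCO period.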

IV. FREQUENCY-DOMAIN PLL MODEL


Derivation of a frequency-domain model of the PLL is complicated by the sampling operation and impulse train modulator shown in Fig. 6. We discuss a simple approximation for the sampling operation and impulse train modulator that results in a linear time-invariant PLL model. This method, known as pseudocontinuous analysis [16], takes advantage of the fact that the impulsive output of the PFD is low-pass filtered in continuous time by the loop filter.

Fig. 7. Pseudocontinuous method of modeling a sampling operation in the frequency domain.

A. Pseudocontinuous Approximation
Consider a signal $x(t)$ that is sampled with period $T$ and then converted to an impulse sequence $x_s(t)$, as described by

$$x_s(t) = \sum_{k=-\infty}^{\infty} x[k]\,\delta(t - kT)$$

where $x[k] = x(kT)$. The frequency-domain relationship between $x(t)$ and $x_s(t)$ is found by taking the Fourier transform of the above expression, which leads to

$$X_s(f) = \frac{1}{T}\sum_{n=-\infty}^{\infty} X\!\left(f - \frac{n}{T}\right)$$

This expression reveals that the Fourier transform of $x_s(t)$, $X_s(f)$, is composed of multiple copies of the Fourier transform of $x(t)$, $X(f)$, that are scaled in magnitude by $1/T$ and shifted in frequency from one another with spacing $1/T$. We assume that the frequency content of $x(t)$ is confined to frequencies between $-1/(2T)$ and $1/(2T)$, so that negligible aliasing occurs between the copies of $X(f)$ within $X_s(f)$.

Developing a frequency-domain model relating $x(t)$ to $x_s(t)$ is complicated by the many copies of $X(f)$ in $X_s(f)$ that occur due to the sampling operation. However, if we assume that $x_s(t)$ is fed into a continuous-time low-pass filter with sufficiently low bandwidth, we can obtain a simple approximation of the relationship between $x(t)$ and $x_s(t)$. Fig. 7 graphically illustrates a frequency-domain view of the sampling operation and the impact of following it with a continuous-time low-pass filter of bandwidth less than $1/(2T)$. The low-pass filter significantly attenuates all of the replicated copies of $X(f)$ within $X_s(f)$ except for the baseband copy, which allows us to approximate the relationship between $x(t)$ and $x_s(t)$ in the frequency domain as a simple scaling operation of $1/T$. In so doing, we ignore aliasing effects that will occur if there is frequency content in $x(t)$ at frequencies beyond the range of $-1/(2T)$ to $1/(2T)$. However, our analysis will be reasonably accurate when performing closed-loop analysis for most frequencies of interest in our application. The double outline of the box in the figure is meant to serve as a reminder that a sampling operation is taking place.

B. Resulting Model

Fig. 8. Frequency-domain model of PLL.

The time-domain block diagram in Fig. 6 is now readily converted to the frequency domain by taking the $z$-transform of the discrete-time blocks, the Fourier transform of the continuous-time blocks, and by applying the approximation of the sampling operation discussed above. Fig. 8 displays the resulting model. Note that all blocks are parameterized by the common variable $f$, which denotes frequency in hertz, under the assumption that all discrete-time sequences interact with the continuous-time blocks as modulated impulse trains of period $T$. Also note that all the signals in the PLL are still denoted in the time domain even though they interact
Fig. 9. Detailed view of PLL noise sources and examples of their respective
spectral densities.

through frequency-domain blocks. The reason for this notation


convention is that, in practice, these signals are stochastic and
do not have defined Fourier transforms, but rather are described
by their power spectral densities.
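
The scaling-by-1/T view of the sampling operation can be checked numerically. The sketch below is a hedged illustration, not taken from the paper: it uses a Gaussian test signal (chosen because its Fourier transform is known in closed form) and compares the spectrum of its impulse-sampled version with the continuous spectrum divided by the sampling period. The test signal, the truncation of the sum, and the function names are all assumptions made for illustration.

```python
import numpy as np

def gaussian(t, sigma):
    """Test signal x(t) = exp(-t^2 / (2 sigma^2))."""
    return np.exp(-t ** 2 / (2.0 * sigma ** 2))

def gaussian_spectrum(f, sigma):
    """Continuous-time Fourier transform of gaussian(t, sigma), with f in hertz."""
    return sigma * np.sqrt(2.0 * np.pi) * np.exp(-2.0 * (np.pi * sigma * f) ** 2)

def impulse_train_spectrum(f, sigma, period, n_samples=200):
    """Fourier transform of sum_k x(k*period) delta(t - k*period), truncated to |k| <= n_samples."""
    k = np.arange(-n_samples, n_samples + 1)
    samples = gaussian(k * period, sigma)
    f = np.atleast_1d(np.asarray(f, dtype=float))
    phase = np.exp(-2j * np.pi * np.outer(f, k * period))
    return phase @ samples

if __name__ == "__main__":
    period = 1.0
    f = np.linspace(0.0, 0.25 / period, 6)
    for sigma in (2.0, 0.3):  # slowly varying versus rapidly varying relative to the period
        sampled = np.abs(impulse_train_spectrum(f, sigma, period))
        scaled = gaussian_spectrum(f, sigma) / period
        print(f"sigma = {sigma}: max mismatch = {np.max(np.abs(sampled - scaled)):.3e}")
```

When the test signal varies slowly compared with the sampling period, the baseband portion of the sampled spectrum matches the scaled continuous spectrum closely; when it does not, the overlapping copies produce a visible mismatch, which is the aliasing the approximation ignores.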
V. PARAMETERIZATION OF PLL

Fig. 10. Parameterized model of PLL for dynamic response and noise calculations.

We now parameterize the PLL dynamics depicted in Fig. 8 in terms of a single function which we will call $G(f)$. Using this parameterization, we then develop a general noise model for frequency synthesizers in which all the relevant transfer functions are described in terms of $G(f)$.

A. Derivation

To parameterize the PLL dynamics, it is convenient to define a base function that provides a simple description of all the PLL transfer functions of interest. It turns out that the following definition works well for this purpose.

$$G(f) = \frac{A(f)}{1 + A(f)} \tag{11}$$

where $A(f)$ is the open-loop transfer function of the PLL:

$$A(f) = \frac{\alpha\, I_{cp}\, H(f)\, K_v}{2\pi\, N_{nom}\, jf} \tag{12}$$

Since $A(f)$ is low pass in nature with infinite gain at dc, $G(f)$ has the following properties:

$$G(f) \to 1 \ \text{as}\ f \to 0, \qquad G(f) \to 0 \ \text{as}\ f \to \infty \tag{13}$$

implying that $G(f)$ is a low-pass filter with a low frequency gain of one.

One may try to tie an intrinsic meaning to $G(f)$ in terms of PLL behavior. However, it is meant only as a convenient vehicle for compactly describing the PLL transfer functions of interest, as will be shown later in this section.

B. Application to Noise Analysis

The derived parameterization allows straightforward calculation of the noise performance of a synthesizer as a function of various noise sources in the PLL, which are shown in Fig. 9. Divider/reference jitter corresponds to noise-induced variations in the transition times of the Reference or Divider output waveforms. A periodic reference spur is caused by use of the XOR-based PFD, or by the tristate PFD when its output duty cycle is nonzero. Charge-pump noise is caused by noise produced in the transistors that compose the charge-pump circuit. Finally, VCO noise includes the intrinsic noise of the VCO and voltage noise at the output of the loop filter. For convenience in later discussion, we have lumped these noise sources into two categories, VCO noise and detector noise, as shown in Fig. 9.

Fig. 10 displays the transfer function relationships from each of the above noise sources to the synthesizer output. The derivation of these transfer functions is straightforward based on Fig. 9 and the $G(f)$ parameterization derived earlier. Note that two different parameterizations are shown to describe the impact of divide value variations on the PLL output phase. The alternate model relates changes in the divide value, $n[k]$, more directly to the PLL output frequency. Its derivation follows by noting that the order of linear time-invariant blocks can be switched, and that

$$\left.\frac{2\pi z^{-1}}{1 - z^{-1}}\right|_{z = e^{j2\pi fT}} \approx \frac{1}{jfT} \quad \text{for } fT \ll 1$$

Note that the validity of the dynamic model, and its alternate, presented in Fig. 10, has been verified in previous work discussed in [9], [13]. The validity of the noise model will be verified in Section VII.

Calculation of spectral noise densities using Fig. 10 is complicated by the fact that both discrete-time (DT) and
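continuous-time (CT) signals are present. Three cases are of significance, and their respective spectral noise calculations are as follows [17]:

Case 1) CT input $x(t)$ fed into CT filter $H(f)$ to produce a CT output $y(t)$:

$$S_y(f) = |H(f)|^2\, S_x(f) \tag{14}$$

Case 2) DT input $x[k]$ fed into DT filter $H(e^{j2\pi fT})$ to produce a DT output $y[k]$:

$$S_y(e^{j2\pi fT}) = |H(e^{j2\pi fT})|^2\, S_x(e^{j2\pi fT}) \tag{15}$$

Case 3) DT input $x[k]$ fed into CT filter $H(f)$ to produce a CT output $y(t)$:

$$S_y(f) = \frac{1}{T}\,|H(f)|^2\, S_x(e^{j2\pi fT}) \tag{16}$$

In Case 3), we assume that the DT input interacts with the CT filter as a modulated impulse train of period $T$.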
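
The three cases can be written as one-line helpers. The sketch below is illustrative only: it assumes the standard forms of these relations (output density equals the squared filter magnitude times the input density, with an extra 1/T factor when a discrete-time sequence drives a continuous-time filter as an impulse train). As a usage example it passes white discrete-time noise of variance 1/12 through a hypothetical first-order low-pass filter and integrates the resulting density. All names and numeric values are assumptions.

```python
import numpy as np

def psd_ct_to_ct(h, s_x):
    """Case 1: CT input through a CT filter."""
    return np.abs(h) ** 2 * s_x

def psd_dt_to_dt(h, s_x):
    """Case 2: DT input through a DT filter, both evaluated at z = exp(j*2*pi*f*T)."""
    return np.abs(h) ** 2 * s_x

def psd_dt_to_ct(h, s_x, period):
    """Case 3: DT input through a CT filter, entering as an impulse train of the given period."""
    return np.abs(h) ** 2 * s_x / period

def integrate(y, x):
    """Trapezoidal integral, written out so it does not depend on a specific NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

if __name__ == "__main__":
    period, tau = 50e-9, 1e-6                 # hypothetical 20-MHz update rate and filter time constant
    f_max = 100.0 / tau
    f = np.linspace(-f_max, f_max, 400001)
    h = 1.0 / (1.0 + 2j * np.pi * f * tau)    # assumed first-order low-pass filter
    s_y = psd_dt_to_ct(h, 1.0 / 12.0, period)
    print("integrated output power:", integrate(s_y, f))
```

The 1/T factor in the third helper is what makes a higher update rate raise the density contributed by a fixed-variance sequence, before any shaping or filtering is taken into account.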
The above spectral density calculations and Fig. 10 allow us to accurately calculate the influence of the various noise sources on the PLL output. A few qualitative observations are also in order. Detector noise is low-pass filtered by the PLL dynamics, while VCO noise is high-pass filtered by the PLL dynamics. The overall noise power in the PLL output, whose integral over frequency corresponds to the time-domain jitter of the PLL output, is a function of the PLL bandwidth. If the PLL bandwidth is very low, VCO noise will dominate over a wide frequency range due to the abundant suppression of detector noise. Likewise, a high PLL bandwidth will suppress VCO noise over a wide frequency range at the expense of allowing more detector noise through.

VI. Σ-Δ SYNTHESIZER MODEL

We are now ready to incorporate the Σ-Δ modulator into the general PLL model. We do so by first providing a brief description of Σ-Δ modulator fundamentals, and then provide intuition to the means by which they increase the frequency resolution of a synthesizer compared to a classical implementation in which the divider value is held constant. Finally, we present a frequency-domain model of the Σ-Δ synthesizer and use it to calculate the impact of the Σ-Δ quantization noise on the PLL output phase.

A. Σ-Δ Modulator
A Σ-Δ modulator achieves a high-resolution signal using only a few output levels. To do this, the modulator dithers its output at a high rate such that the “average” value of the dithered sequence corresponds to a high-resolution input signal whose energy is confined to low frequencies. Appropriate filtering of the output sequence removes quantization noise produced by the dithering, which yields a high-resolution signal closely matching that of the input.

In Σ-Δ synthesizer applications, it is important to note that the Σ-Δ modulator is purely digital in its implementation. Thus, Σ-Δ structures that are difficult to implement in the analog world due to high matching requirements, such as the MASH (or cascaded) architecture [18], [19], are trivial to implement in this application due to the precise matching offered by digital circuits.

In general, modeling of a Σ-Δ modulator is accomplished by assuming its quantization noise is independent of its input [19]. This leads to a linear time-invariant model that is parameterized by transfer functions from the input and quantization noise to the output. For instance, a MASH Σ-Δ modulator structure [19] of order $m$, input $x[k]$, and output $y[k]$ is described by

$$Y(z) = X(z) + (1 - z^{-1})^m R(z) \tag{17}$$

Thus, the modulator passes its input to the output along with quantization noise, $r[k]$, that is shaped by the filter $(1 - z^{-1})^m$. Ideally, $r[k]$ is white and uniformly distributed between 0 and 1 so that its spectrum is flat and of magnitude $1/12$ [20], [21].

It is convenient to parameterize the Σ-Δ modulator in terms of two transfer functions. The signal transfer function (STF) of the Σ-Δ modulator is defined from the input $x[k]$ to output $y[k]$, while the noise transfer function (NTF) is defined from the base quantization noise $r[k]$ to the output. Inspection of (17) reveals that a MASH structure of order $m$ is parameterized as

STF: $1$
NTF: $(1 - z^{-1})^m$

B. Application to PLL

Fig. 11. Illustration of dithering action of Σ-Δ modulator.

To understand the impact of using a Σ-Δ modulator to control the divide value in a frequency synthesizer, Fig. 11 contrasts the way the divide value is varied in classical versus Σ-Δ fractional-N frequency synthesizers based on the alternate model in Fig. 10. Note that the divide value variations are cast as continuous-time signals to get the proper scale factor such that a unit change in divide value yields an output frequency change of $1/T$ Hz. In the classical case, the divide value is static except when the output frequency is changed, and the PLL output frequency responds to the change according to the low-pass nature of the PLL dynamics $G(f)$. In contrast, a Σ-Δ fractional-N
Fig. 12. Parameterized model of a Σ-Δ synthesizer.

synthesizer constantly dithers the divide value at a high rate compared to the bandwidth of $G(f)$ such that $G(f)$ extracts out its low-frequency content. The low frequency content of the Σ-Δ output is, in turn, set by the Σ-Δ input $x[k]$, which can have arbitrarily high resolution. Thus, the Σ-Δ modulator allows the PLL output frequency to be controlled to a very high resolution independent of the reference frequency: a high reference frequency can be used while simultaneously achieving high frequency resolution.

C. Frequency-Domain Model
To obtain the frequency-domain model of a Σ-Δ synthesizer, we simply extend the PLL model in Fig. 10 to include the Σ-Δ modulator, as shown in Fig. 12. This figure depicts a general model of a Σ-Δ modulator which is characterized by its STF and NTF. The base quantization noise is assumed ideal (i.e., white) in the illustration.

Fig. 12 offers several insights to the fundamentals of Σ-Δ frequency synthesis. First, we see that the shaped Σ-Δ quantization noise passes through a digital accumulator and then the PLL dynamics, $G(f)$, before impacting the output phase of the PLL. The digital accumulator, a consequence of the integrating nature of the divider, effectively reduces the noise-shaping order of the Σ-Δ by one. The PLL dynamics, $G(f)$, act to remove the high-frequency quantization noise produced by the Σ-Δ modulator. The Σ-Δ quantization noise adds an additional noise source to those already present in the PLL, but the relationship from each noise source to the output phase remains purely a function of $G(f)$ and the nominal divide value.

D. Quantization Noise Impact on PLL

As Fig. 12 reveals, a Σ-Δ synthesizer's noise performance is impacted by the Σ-Δ quantization noise in addition to the intrinsic detector and VCO noise sources found in the classical PLL. Calculation of this impact is straightforward using the presented modeling approach. For example, given the NTF of an $m$th order MASH structure is $(1 - z^{-1})^m$, we calculate the impact of its quantization noise on the PLL output using Fig. 12 and (16) as

$$S_{\Phi_{out}}(f) = \frac{1}{T}\left|\frac{2\pi z^{-1}}{1 - z^{-1}}\, T\, G(f)\right|^2 \left|1 - z^{-1}\right|^{2m} S_r(e^{j2\pi fT}), \qquad z = e^{j2\pi fT}$$

which is also expressed as

$$S_{\Phi_{out}}(f) = T\,(2\pi)^2\, |G(f)|^2 \left(2\sin(\pi fT)\right)^{2(m-1)} S_r(e^{j2\pi fT}) \tag{18}$$

If the quantization noise spectra of $r[k]$ is white, then

$$S_r(e^{j2\pi fT}) = \frac{1}{12}$$

as previously discussed. In many cases, $r[k]$ is not white and $S_r(e^{j2\pi fT})$ must be computed numerically by simulating the Σ-Δ modulator at a given value of $x[k]$.

Equation (18) shows that the Σ-Δ quantization noise is reduced in order by one due to the integrating action of the divider. Assuming $r[k]$ is white, the shaped noise rises at $20(m-1)$ dB/decade for frequencies $f \ll 1/T$. Therefore, if the order of $G(f)$ is chosen to be the same as the order of the Σ-Δ, the quantization noise seen at the PLL output will roll off at 20 dB/decade outside the PLL bandwidth. This rolloff characteristic matches that of the VCO noise.

VII. RESULTS

Fig. 13. Block diagram of prototype system.

The above methodology is now used to analyze the noise performance of a prototype system described in [9], [13]. Fig. 13 displays a block diagram of the prototype, which consists of a custom CMOS fractional-N synthesizer IC that includes an XOR-based PFD, an on-chip loop filter that uses switched capacitors to set its time constant, a second-order digital MASH Σ-Δ modulator, and an asynchronous 64-modulus divider that supports any divide value between 32 and 63.5 in half-cycle increments. An external divide-by-2 prescaler is used so that the CMOS divider input operates at half the VCO frequency, which modifies the range of divide values to include all integers between 64 and 127. A computer interface is used to set the digital frequency value that is fed into the input of the Σ-Δ modulator.

A. Modeling
A linearized frequency-domain model of the prototype system is shown in Fig. 14. The open-loop transfer function of the system consists of two integrators, a pole at $f_p$ and a zero at $f_z$. Additional poles and zeros occur in the system due to the effects of finite opamp bandwidth and other nonidealities,
Fig. 14. Linearized frequency-domain model of prototype system.

but are not significant for the analysis to follow. The $G(f)$ parameterization is calculated from Fig. 14 and (11) as

(19)
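
To illustrate how the definition in (11) turns an open-loop response of this shape into a closed-loop parameterization, the following sketch builds a response with two integrators, one zero, and one pole and forms the ratio of that response to one plus itself. It is a hedged example: the gain and corner frequencies are hypothetical values picked so that the open-loop magnitude crosses unity near 84 kHz, and they are not the prototype's parameters. The function names are illustrative.

```python
import numpy as np

def open_loop(f, gain, f_zero, f_pole):
    """Assumed open-loop response A(f): two integrators, one zero, and one pole."""
    jf = 1j * np.asarray(f, dtype=float)
    return gain * (1.0 + jf / f_zero) / (jf ** 2 * (1.0 + jf / f_pole))

def closed_loop(f, gain, f_zero, f_pole):
    """G(f) = A(f) / (1 + A(f)), following the definition in (11)."""
    a = open_loop(f, gain, f_zero, f_pole)
    return a / (1.0 + a)

# Hypothetical values (not from the paper): unity open-loop gain near 84 kHz.
GAIN, F_ZERO, F_POLE = 8.4e8, 10e3, 700e3

if __name__ == "__main__":
    for f in (1e3, 84e3, 1e6, 1e7):
        g = closed_loop(f, GAIN, F_ZERO, F_POLE)
        print(f"f = {f:9.0f} Hz: |G| = {abs(g):.4f}, |1 - G| = {abs(1.0 - g):.4f}")
```

In this sketch the closed-loop function stays near one well inside the loop bandwidth and falls toward zero well outside it, while its complement does the opposite, which mirrors the low-pass and high-pass roles described for detector noise and VCO noise.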
Fig. 15. Expanded view of PLL system.

TABLE I
VALUES OF NOISE SOURCES WITHIN PLL

The parameters of the system were set such that the PLL had a bandwidth of 84 kHz:

kHz
kHz
kHz
(20)
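
With a loop bandwidth chosen, the contribution of the Σ-Δ quantization noise to the output phase noise can be evaluated numerically. The sketch below is an illustration under explicit assumptions: white quantization noise of variance 1/12, an order-m MASH noise transfer function, a divider that accumulates the divide-value deviations, and a hypothetical second-order Butterworth closed-loop response standing in for the prototype's actual parameterization. Under those assumptions the output density is taken as T (2π)² |G(f)|² (2 sin(π f T))^(2(m−1)) / 12. The function names and the Butterworth shape are not from the paper.

```python
import numpy as np

def butterworth_g(f, f_o):
    """Hypothetical second-order Butterworth G(f) with unity dc gain and cutoff f_o."""
    x = np.asarray(f, dtype=float) / f_o
    return 1.0 / (1.0 + 1j * np.sqrt(2.0) * x - x ** 2)

def quantization_noise_psd(f, g, period, order):
    """Output phase noise density (rad^2/Hz) from white quantization noise of variance 1/12."""
    f = np.asarray(f, dtype=float)
    shaping = (2.0 * np.sin(np.pi * f * period)) ** (2 * (order - 1))
    return period * (2.0 * np.pi) ** 2 / 12.0 * np.abs(g) ** 2 * shaping

if __name__ == "__main__":
    period, f_o, order = 1.0 / 20e6, 84e3, 2   # 20-MHz reference, 84-kHz bandwidth, second-order MASH
    for f in (1e3, 1e4, 1e5, 1e6, 5e6):
        s = quantization_noise_psd(f, butterworth_g(f, f_o), period, order)
        print(f"f = {f:9.0f} Hz: {10.0 * np.log10(s):7.1f} dB (rad^2/Hz)")
```

Inside the assumed loop bandwidth the computed density rises by about 20 dB per decade for a second-order modulator, and outside it falls by about 20 dB per decade because the second-order closed-loop response overtakes the first-order residual shaping.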

Fig. 15 expands the block diagram of the prototype to indicate the circuits of relevance and their respective noise contributions. A few comments are in order. First, a reference frequency of 20 MHz was chosen to achieve an acceptably low impact of Σ-Δ quantization noise while still allowing low-power implementation of the digital logic. This choice of reference frequency, in turn, required that $N_{nom} = 92$ to achieve an output carrier frequency of 1.84 GHz. The value of $K_v$ was set to 30 MHz/V by the external VCO. The value of the loop-filter capacitance was chosen as large as practical in order to obtain good noise performance; it was constrained to 30 pF due to area constraints on the die of the custom IC.

B. Noise Analysis

Table I displays the value of each noise source shown in Fig. 15. Many of these values were obtained through ac simulation of the relevant circuits in HSPICE. Note that all noise sources other than the Σ-Δ quantization noise are assumed to be white, so that the values of their variance suffice for their description. This assumption holds for the input-referred VCO noise provided that the output phase noise of the VCO rolls off at 20 dB/dec [22], [23]; the 20 dB/dec rolloff is achieved in the model since the input-referred noise, which has a flat spectral density, passes through the integrating action of the VCO. The actual VCO deviates from the 20 dB/dec rolloff at low frequencies due to $1/f$ noise, and at high frequencies due to a finite noise floor. However, the assumption of 20 dB/dec rolloff suffices for the frequency offsets of interest.

The input-referred noise of the VCO was calculated from an open-loop VCO phase noise measurement (shown in Fig. 17) at 5-MHz frequency offset as

dBc/Hz at $f = 5$ MHz (21)

where $K_v$ is 30 MHz/V. The value of the noise current produced by the switched-capacitor operation was calculated as

(22)

where $k$ is Boltzmann's constant and the temperature is expressed in degrees Kelvin. Finally, the spectral density of the Σ-Δ quantization noise was calculated as

$$\frac{1}{12}\left(2\sin(\pi fT)\right)^{2m} \tag{23}$$

where $m = 2$ is the order of the Σ-Δ modulator.

The noise sources in Table I can be classified as either charge-pump noise, VCO noise, or Σ-Δ quantization noise. For convenience, we will assume that the VCO noise is referred to the input of the VCO, so that it passes through the transfer function $K_v/(jf)$ before influencing the VCO output phase. Given the
values of these sources, the overall noise spectral density at the synthesizer output is described as

(24)

where the three terms are the contributions from the charge-pump noise, the VCO noise, and the Σ-Δ quantization noise, respectively. The quantization-noise contribution is given by (18) with $m = 2$. The charge-pump and VCO contributions are calculated from Fig. 10 and (14) as

(25)

Note that we have assumed that the charge-pump and VCO noise sources are white, and that $\alpha = 2$ since an XOR-based PFD is used.

The task that remains is to determine the values of the charge-pump and VCO noise densities. Examination of Fig. 15 reveals that charge-pump noise is a function of the following noise sources:

(26)

while VCO noise is a function of the noise sources

(27)

We will quickly infer the value of the functions in (26) and (27) in this paper; the reader is referred to [13] for more detail.

Let us first determine the charge-pump noise. Examination of Table I reveals that one of its noise sources is an order of magnitude larger than the other three. Since this noise source is switched alternately between the positive and negative terminals of OP1, its contribution to the charge-pump noise will be pulsed in nature. At a nominal duty cycle of 50%, we would expect the energy of this source to be split equally between the positive and negative terminals of OP1. This intuitive argument, and the charge-pump noise value it yields, was verified using a detailed C simulation of the PLL [24]. Note that a more accurate estimate will take into account any offset in the nominal duty cycle of the phase detector output, and the transient response of the charge pump.

Now let us determine the VCO noise. Since Table I reveals that its two contributing sources are of the same order, we simply add these components. The resulting expression is accurate at frequencies less than the unity gain bandwidth of OP1; the noise source is passed to its output with a gain of approximately one in this region. At frequencies beyond OP1's bandwidth, the expression is conservatively high since that source is attenuated in this frequency range.

Based on the above information, plots of the spectra in (24) are shown in Fig. 16. For convenience, we have also overlapped measured results from Fig. 17 for easy comparison, which will be discussed shortly. As shown in Fig. 16, the influence of detector noise dominates at low frequencies, and the influence of VCO and Σ-Δ quantization noise dominate at high frequencies. Note that the calculations use $G(f)$ described by (19) with the parameter values specified in (20).

Fig. 16. Calculated noise spectra of synthesizer compared to measured results.

Fig. 17. Measured closed-loop synthesizer noise and open-loop VCO noise.

Fig. 17 shows measured plots of the closed-loop synthesizer phase noise and the open-loop phase noise of the VCO from the synthesizer prototype; the plots were obtained from an HP 3048A phase-noise measurement system. It should be noted that the LSB of the Σ-Δ modulator was dithered to reduce spurious content, which was necessary due to the low order of the Σ-Δ modulator. The resulting spectra compare quite well with the calculated curve in Fig. 16 over the frequency offset range of 25 kHz to 10 MHz. Above 10 MHz, the phase-noise measurement was limited by the sensitivity of the measurement equipment. Note that the −60 dBc spur at 20-MHz offset is due to the 50% nominal duty cycle of the PFD; no effort was made to reduce it below this level during the design process since it was acceptable for the intended application of the prototype.

VIII. CONCLUSION

In this paper, we developed a general model of a PLL that incorporates the influence of divide value variations. A model for Σ-Δ fractional-N synthesizers was obtained by simply incorporating a Σ-Δ modulator model into this framework. The PLL model was parameterized by a single transfer function $G(f)$, which further simplifies noise calculations. The framework was used to calculate the noise performance of a custom Σ-Δ synthesizer, and was shown to accurately predict
measured results within 3 dB over a frequency offset range from 25 kHz to 10 MHz.

ACKNOWLEDGMENT
The authors would like to thank the Hong Kong University of Science and Technology, and in particular, J. Lau, P. Chan, and P. Ko, for their support in the writing of this paper.

REFERENCES
[1] T. A. Riley, M. A. Copeland, and T. A. Kwasniewski, “Delta–sigma modulation in fractional-N frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, pp. 553–559, May 1993.
[2] M. A. Copeland, “VLSI for analog/digital communications,” IEEE Commun. Mag., vol. 29, pp. 25–30, May 1991.
[3] B. Miller and B. Conley, “A multiple modulator fractional divider,” in Proc. 44th Annu. Symp. Frequency Control, May 1990, pp. 559–567.
[4] B. Miller and B. Conley, “A multiple modulator fractional divider,” IEEE Trans. Instrum. Meas., vol. 40, pp. 578–583, June 1991.
[5] W. Rhee, B.-S. Song, and A. Ali, “A 1.1-GHz CMOS fractional-N frequency synthesizer with 3-b third-order sigma–delta modulator,” IEEE J. Solid-State Circuits, vol. 35, pp. 1453–1460, Oct. 2000.
[6] B. Miller, “Technique enhances the performance of PLL synthesizers,” Microw. RF, pp. 59–65, Jan. 1993.
[7] T. Kenny, T. Riley, N. Filiol, and M. Copeland, “Design and realization of a digital delta–sigma modulator for fractional-N frequency synthesis,” IEEE Trans. Veh. Technol., vol. 48, pp. 510–521, Mar. 1999.
[8] T. A. Riley and M. A. Copeland, “A simplified continuous phase modulator technique,” IEEE Trans. Circuits Syst. II, vol. 41, pp. 321–328, May 1994.
[9] M. Perrott, T. Tewksbury, and C. Sodini, “A 27-mW CMOS fractional-N synthesizer using digital compensation for 2.5-Mb/s GFSK modulation,” IEEE J. Solid-State Circuits, vol. 32, pp. 2048–2060, Dec. 1997.
[10] S. Willingham, M. Perrott, B. Setterberg, A. Grzegorek, and W. McFarland, “An integrated 2.5-GHz sigma–delta frequency synthesizer with 5 microseconds settling and 2-Mb/s closed-loop modulation,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2000, pp. 200–201.
[11] N. Filiol, T. Riley, C. Plett, and M. Copeland, “An agile ISM band frequency synthesizer with built-in GMSK data modulation,” IEEE J. Solid-State Circuits, vol. 33, pp. 998–1008, July 1998.
[12] N. Filiol, C. Plett, T. Riley, and M. Copeland, “An interpolated frequency-hopping spread-spectrum transceiver,” IEEE Trans. Circuits Syst. II, vol. 45, pp. 3–12, Jan. 1998.
[13] M. H. Perrott, “Techniques for high data rate modulation and low power operation of fractional-N frequency synthesizers with noise shaping,” Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, 1997.
[24] M. H. Perrott, “Fast and accurate behavioral simulation of fractional-N frequency synthesizers and other PLL/DLL circuits,” in Proc. Design Automation Conf. (DAC), June 2002, pp. 498–503.

Michael H. Perrott received the B.S. degree in electrical engineering from New Mexico State University, Las Cruces, in 1988, and the M.S. and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (M.I.T.), Cambridge, in 1992 and 1997, respectively.
From 1997 to 1998, he was with Hewlett-Packard Laboratories, Palo Alto, CA, working on high-speed circuit techniques for Σ-Δ synthesizers. In 1999, he was a visiting Assistant Professor at the Hong Kong University of Science and Technology, where he taught a course on the theory and implementation of frequency synthesizers. From 1999 to 2001, he was with Silicon Laboratories, Austin, TX, where he developed circuit and signal-processing techniques to achieve high-performance clock and data recovery circuits. He is currently an Assistant Professor in the Department of Electrical Engineering and Computer Science at M.I.T., where his research focuses on high-speed circuit and signal processing techniques for data links and wireless applications.

Mitchell D. Trott (S’90–M’92) received the B.S. and M.S. degrees in systems engineering from Case Western Reserve University, Cleveland, OH, in 1987 and 1988, respectively, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1992.
He was an Assistant and Associate Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, Cambridge, from 1992 until 1998. He was Director of Research with ArrayComm, Inc., San Jose, CA, from 1998 to 2002. He is currently with Hewlett-Packard Laboratories, Palo Alto, CA. His research interests include multiuser communication, information theory, and coding theory.

Charles G. Sodini (S’80–M’82–SM’90–F’94) was born in Pittsburgh, PA, in 1952. He received the B.S.E.E. degree from Purdue University, Lafayette, IN, in 1974, and the M.S.E.E. and Ph.D. degrees
[14] A. Hill and A. Surber, “The PLL dead zone and how to avoid it,” RF from the University of California, Berkeley, in 1981
Design, pp. 131–134, Mar. 1992. and 1982, respectively.
[15] M. Thamsirianunt and T. A. Kwasniewski, “A 1.2-m CMOS imple- He was a Member of the Technical Staff with
mentation of a low-power 900-MHz mobile radio frequency synthe- Hewlett-Packard Laboratories from 1974 to 1982,
sizer,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), 1994, where he worked on the design of MOS memory
p. 16.2. and, later, on the development of MOS devices with
[16] J. A. Crawford, Frequency Synthesizer Handbook. Norwood, MA: very thin gate dielectrics. He joined the faculty of
Artech, 1994. the Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, in 1983,
[17] E. A. Lee and D. G. Messerschmitt, Digital Communication, 2nd where he is currently a Professor in the Department of Electrical Engineering
ed. Norwell, MA: Kluwer, 1994. and Computer Science. His research interests are focused on integrated circuit
[18] J. Candy and G. Temes, Oversampling Delta–Sigma Data Con- and system design with emphasis on analog, RF, and memory circuits and
verters. New York: IEEE Press, 1992. systems. Along with Prof. R. T. Howe, he is a coauthor of an undergraduate
[19] S. Norsworthy, R. Schreier, and G. Temes, Delta–Sigma Data Con- text on integrated circuits and devices entitled Microelectronics: An Integrated
verters: Theory, Design, and Simulation. New York: IEEE Press, Approach (Englewood Cliffs, NJ: Prentice-Hall, 1996).
1997. Dr. Sodini held the Analog Devices Career Development Professorship at
[20] A. Sripad and D. Snyder, “A necessary and sufficient condition for quan- M.I.T.’s Department of Electrical Engineering and Computer Science and was
tization errors to be uniform and white,” IEEE Trans. Acoust. Speech awarded the IBM Faculty Development Award from 1985 to 1987. He has served
Signal Proc., vol. ASSP-25, pp. 442–448, Oct. 1977. on a variety of IEEE Conference Committees, including the International Elec-
[21] W. Bennett, “Spectra of quantized signals,” Bell Syst. Tech. J., vol. 27, tron Device Meeting, of which he was the 1989 General Chairman. He was the
pp. 446–472, July 1948. Technical Program Co-Chairman in 1992 and the Co-Chairman for 1993–1994
[22] D. Leeson, “A simple model of feedback oscillator noise spectrum,” of the Symposium on VLSI Circuits. He served on the Electron Device Society
Proc. IEEE, vol. 54, pp. 329–330, Feb. 1966. Administrative Committee from 1988 to 1994. He has been a member of the
[23] A. Hajimiri and T. Lee, “A general theory of phase noise in electrical os- Solid-State Circuits Society (SSCS) Administrative Committee since 1993 and
cillators,” IEEE J. Solid-State Circuits, vol. 33, pp. 179–194, Feb. 1998. is currently President of the SSCS.
888 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 6, JUNE 2003

A Stabilization Technique for Phase-Locked Frequency Synthesizers
Tai-Cheng Lee and Behzad Razavi, Fellow, IEEE

Abstract—A stabilization technique is presented that relaxes the tradeoff between the settling speed and the magnitude of output sidebands in phase-locked frequency synthesizers. The method introduces a zero in the open-loop transfer function through the use of a discrete-time delay cell, obviating the need for resistors in the loop filter. A 2.4-GHz CMOS frequency synthesizer employing the technique settles in approximately 60 μs with 1-MHz channel spacing while exhibiting a sideband magnitude of −58.7 dBc. Designed for Bluetooth applications and fabricated in a 0.25-μm digital CMOS technology, the synthesizer achieves a phase noise of −112 dBc/Hz at 1-MHz offset and consumes 20 mW from a 2.5-V supply.
Index Terms—Charge pumps, feedforward, loop stability, oscil-
lators, phase-locked loops (PLLs), prescalers, synthesizers.

I. INTRODUCTION

THE design of phase-locked loops (PLLs) must generally deal with a tight tradeoff between the settling time and the amplitude of the ripple on the oscillator control line. For phase-locked RF synthesizers, this tradeoff limits the performance in terms of the channel switching speed and the magnitude of the reference sidebands that appear at the output.

This paper describes a loop stabilization technique that yields a small ripple while achieving fast settling [1]. Using a discrete-time delay cell, the PLL architecture creates a zero in the open-loop transfer function. Another important advantage of the technique is that it uses no resistors in the loop filter, lending itself to digital CMOS technologies. Also, it “amplifies” the value of the loop filter capacitor, thus saving a great deal of silicon area. Realized in a 2.4-GHz CMOS synthesizer, the proposed method provides a settling time of approximately 60 reference cycles with an output sideband level of −59 dBc.

Section II of the paper develops the foundation for the proposed technique. Section III describes the 2.4-GHz synthesizer architecture and the design of its building blocks and Section IV proposes fast simulation techniques for RF synthesizers. Section V summarizes the experimental results.

Manuscript received April 23, 2002; revised February 18, 2003.
T.-C. Lee was with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095 USA. He is now with the Department of Electrical Engineering, National Taiwan University, Taipei 10617, Taiwan, R.O.C.
B. Razavi is with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095 USA (e-mail: razavi@ee.ucla.edu).
Digital Object Identifier 10.1109/JSSC.2003.811879

II. STABILIZATION TECHNIQUE

Consider the PLL shown in Fig. 1(a), where a voltage-controlled oscillator (VCO) is driven by a charge pump (CP) and a phase/frequency detector (PFD). Resistor provides the stabilizing zero and capacitor suppresses the glitch generated by the charge pump at every phase comparison instant. The glitch arises from: 1) the mismatch between the arrival times of the Up and Down pulses; 2) the mismatch between the widths of the Up and Down pulses; 3) the mismatch between the charge pump current sources (both random and due to channel-length modulation); and 4) the mismatch between the charge injection and clock feedthrough of the pMOS and nMOS switches in the charge pumps. Charge sharing also exacerbates the ripple [2].

Fig. 1. (a) Conventional PLL architecture. (b) Proposed PLL architecture with delayed charge pump circuit.

The principal limitation of this architecture is that determines the settling whereas lowers the ripple on the control voltage. Since must remain below by roughly a factor of 10 so as to avoid underdamped settling, the loop must inevitably be slowed down by a large if is to sufficiently suppress the ripple. It is, therefore, desirable to seek methods of creating the stabilizing zero without the resistor so that the capacitor that defines the switching speed also directly suppresses the ripple.

A number of approaches to realizing a zero in PLLs have been reported [3]–[5]. The circuits in [3] and [4] require a transconductance amplifier, whose design for large output swings (necessary for maximizing the tuning range of LC VCOs) and low flicker noise becomes difficult. The synthesizer in [5] employs a voltage-controlled delay line but it mandates a large delay and a nearly rail-to-rail control voltage.
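The tradeoff in the conventional loop of Fig. 1(a) can be illustrated numerically with the standard second-order charge-pump PLL approximation, in which the small ripple capacitor is ignored. The sketch below is only an illustration under that assumption: the labels R1 and C1 for the series resistor and main capacitor, the function name, and every numerical value are hypothetical and are not taken from the paper.

```python
import math

def conventional_loop(i_p, k_vco_hz_per_v, n_div, r1, c1):
    """Second-order approximation of a charge-pump PLL with a series R1-C1 filter.

    Assumed textbook model; the small ripple capacitor is ignored.
    Returns (natural frequency in rad/s, damping factor,
    zero frequency in rad/s, settling time constant in s).
    """
    k_vco = 2.0 * math.pi * k_vco_hz_per_v      # VCO gain converted to rad/s/V
    omega_n = math.sqrt(i_p * k_vco / (2.0 * math.pi * n_div * c1))
    zeta = 0.5 * r1 * c1 * omega_n              # damping is set by the R1*C1 product
    omega_z = 1.0 / (r1 * c1)                   # stabilizing zero created by R1
    tau = 1.0 / (zeta * omega_n)                # time constant of the settling envelope
    return omega_n, zeta, omega_z, tau

# Hypothetical values, for illustration only.
I_P, K_VCO, N_DIV = 100e-6, 100e6, 2400
R1, C1 = 69.3e3, 100e-12

omega_n, zeta, omega_z, tau = conventional_loop(I_P, K_VCO, N_DIV, R1, C1)
print(f"zeta = {zeta:.3f}, tau = {tau * 1e6:.2f} us")

# The ripple capacitor must stay roughly ten times smaller than C1, so asking
# for more ripple suppression forces a larger C1, which lowers omega_n.
c2_max = C1 / 10.0
```

In this model a larger main capacitor lowers the natural frequency, which is the slowdown the text attributes to the conventional filter when the ripple capacitor has to grow.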
0018-9200/03$17.00 © 2003 IEEE
LEE AND RAZAVI: STABILIZATION TECHNIQUE FOR PHASE-LOCKED FREQUENCY SYNTHESIZERS 889

It is important to note that the problem of ripple becomes increasingly more serious as the supply voltage is scaled down and/or the operating frequency goes up. The relative magnitude of the primary sidebands at the output of the VCO is given by where is the peak amplitude of the first harmonic of the ripple, is the gain of the VCO, and is the synthesizer reference frequency. For a given relative tuning range (e.g., 10 ), the gain of LC VCOs must increase if the supply voltage goes down. If MHz/V and MHz, then the fundamental ripple amplitude must be less than 63 μV to guarantee sidebands 60 dB below the carrier.

In order to arrive at the stabilization technique, consider the PLL architecture shown in Fig. 1(b). Here, the primary charge pump, , drives a single capacitor while a secondary charge pump, , injects charge after some delay . The total current flowing through is thus equal to

(1)
(2)

where is assumed to be much smaller than the loop time constant. Consequently, the transfer function of the PFD/CP/LPF combination can be expressed as

(3)

Assuming , we have

(4)

obtaining a zero at

(5)

Proper choice of can, therefore, stabilize the loop. The damping factor and the settling time of the loop can be written, respectively, as

(6)
(7)

In order to achieve a sufficiently low zero frequency, must be large or close to unity. Since the accuracy in the definition of is limited by mismatches between the two charge pumps, must still be a large value. For example, if MHz, pF, A, MHz/V, , and , then a of approximately 500 ns is required to ensure a well-behaved loop response.

The architecture of Fig. 1(b) suffers from two critical drawbacks. First, it requires that the delay stage provide a very large and accommodate a wide range of Up and Down pulsewidths. Specifically, when the loop is locked, the Up and Down pulses are less than 500 ps wide. The tradeoff between delay and bandwidth, therefore, makes the design of the delay line difficult. As depicted in Fig. 1(b), if the bandwidth of each stage in the delay line is reduced so as to yield a large delay, then the narrow Up and Down pulses are heavily attenuated, giving rise to a dead zone. Conversely, if the bandwidth of each stage is wide enough to support such pulses, then a very large number of stages is required to obtain the necessary , demanding a high power dissipation.

The second issue relates to the variation of with process and temperature. Since is directly proportional to , such variations can greatly affect the loop stability.

To resolve the above difficulties, the architecture is modified as shown in Fig. 2(a), where a discrete-time analog delay line is placed after and . The delay network is realized as depicted in Fig. 2(b), consisting of two interleaved master-slave sample-and-hold branches operating at half of the reference frequency. The circuit emulates as follows. When is high, shares a charge packet corresponding to the previous phase comparison with while samples a level proportional to the present phase difference. In the next period, and exchange roles. The interleaved sampling network, therefore, provides a delay equal to the reference period .

The discrete-time delay technique of Fig. 2 allows a precise definition of the zero frequency without the use of resistors. To quantify the behavior of a PLL incorporating this method, we assume the loop settling time is much greater than so that the delay network can be represented by the continuous-time model shown in Fig. 2(b). Here, approximates the interleaved branches. Equation (4) can then be rewritten as

(8)

where it is assumed and the current through is neglected. This equation exhibits two interesting properties. First, if , then and the value of is “amplified” by . For example, if , then is multiplied by a factor of 10, saving substantial area. Second, the zero frequency is equal to

(9)

a value independent of process and temperature. Assuming , we obtain the damping factor and the settling time constant of the loop as

(10)
(11)

Note that the damping factor exhibits much less process and temperature dependence than in the conventional loop of Fig. 1(a). Interestingly, for , the proposed circuit resembles the topology of Fig. 1(a) but with the resistor replaced by a switched-capacitor network.

While providing insight and serving as design guidelines, the above results are obtained by a continuous-time approximation of the loop and their validity must be verified. Simulations using the values A, MHz/V, pF, and yield and s. Equations (10) and (11) predict these parameters to be 0.31 and 13 μs, respectively. Thus, the continuous-time approximation provides a reasonably accurate estimate of the loop behavior even for
underdamped settling (where the loop time constant is relatively short).

Fig. 2. Actual implementation of PLL with delay sampling circuit and continuous-time approximation of delay network.

For RF synthesis, the delay network of Fig. 2(b) must be designed carefully so as to minimize ripple on the control voltage. Since in the locked condition, the voltages at nodes and are nearly equal, charge sharing between or and creates only a small ripple. Furthermore, the switches in the delay stage are realized as small, complementary devices to introduce negligible charge injection and clock feedthrough.

Comparison With Conventional Architecture: In order to quantify the advantage of the proposed architecture over the conventional PLL topology, we note that capacitor in Fig. 2 appears in parallel with or . Since the sampling capacitors are typically two to three times larger than , they suppress the charge pump nonidealities by about 9 to 12 dB. The behavioral model shown in Fig. 3(a) is simulated in MATLAB for the two cases. As explained in Section IV, the reference frequency and the divide ratio are scaled by a factor of 100 to speed up the simulation. The nonideality of the charge pump is modeled by a constant current mismatch that is injected into the loop filter at each phase comparison instant. Fig. 3(b) depicts the settling behavior and the output spectrum for the two cases. (The plots are deliberately offset for clarity.) For approximately equal settling times, the proposed topology (Type A) achieves 10 dB lower sidebands than the conventional loop does.

III. SYNTHESIZER DESIGN

A 2.4-GHz CMOS synthesizer targeting Bluetooth applications has been designed using the stabilization technique described above. This section presents the architecture and building blocks of the synthesizer.

Shown in Fig. 4, the synthesizer uses an integer-N architecture with a feedback divider whose modulus is given by , where , , and – . With MHz, the output frequency covers the 2.4-GHz ISM band. The output of the swallow counter is pipelined by the flip-flop to allow a relaxed design for the level converter and the swallow counter. The buffer following the VCO suppresses the kickback noise of the prescaler when the modulus changes. It also avoids limiting the tuning range of the VCO by the input capacitance of the prescaler.

A. VCO Design

The VCO topology is shown in Fig. 5(a). To provide both negative and positive voltages across the MOS varactors, the sources of and are grounded and the circuit is biased on top by . The inductors are realized as shown in Fig. 5(b), with the bottom spiral moved down to metal 2 so as to reduce the parasitic capacitance [7]. Each inductor is about 14 nH, occupies an area of 180 μm × 180 μm, and exhibits a Q of 4 and a parasitic capacitance of 100 fF.
The varactors are implemented as accumulation-mode nMOS devices (placed inside n-well). In this design, a 160-fF varactor is employed to allow a tuning range of about 12%. The measured phase noise of the VCO is −120 dBc/Hz at 1-MHz offset.

Fig. 3. (a) MATLAB behavioral simulations for the ripples on the control lines. (b) Time-domain settling and VCO output spectrum during lock for Type A (delay-sampling loop filter) and Type B (conventional loop filter).

Fig. 4. Synthesizer architecture.

B. Pulse-Swallow Counter

Shown in Fig. 6(a), the pulse-swallow counter consists of a prescaler, a program counter and a swallow counter. The pipelining in the swallow counter allows the use of a small divider ratio in the prescaler. Simulations suggest that a 4/5 topology minimizes the overall divider power dissipation.

The prescaler must divide the 2.4-GHz signal while consuming little power. Depicted in Fig. 6(b), the circuit employs three current-steering flip-flops with diode-connected loads. The use of NOR gates obviates the need for power- and headroom-hungry level shift circuits (or large input swings) required in NAND gates. The program counter and the swallow counter incorporate static flip-flops to ensure reliable operation at low frequencies.

IV. SIMULATION TECHNIQUES

A 2.4-GHz synthesizer with a reference frequency of 1 MHz requires a transient simulation step of approximately 20 ps for a total settling time on the order of 100 μs, i.e., five million points. The simulation, therefore, requires an extremely long time (about 5 days on an Ultra 10 Sun Workstation) owing to both the vastly different time scales and the large number of devices (especially in the divider).

This section describes a number of techniques that reduce the simulation time by several orders of magnitude while revealing the loop dynamics with reasonable accuracy.

A. Linear Discrete-Time Model

The voltages at nodes and in Fig. 2(b) can be, respectively, expressed by the following discrete-time equations

(12)
(13)

where denotes the input phase error. Even though the z-transform of and can be derived, the high order of the resulting polynomials makes it difficult to derive the closed-loop response in analytical form. Thus, the two equations, along with the rest of the PLL, are realized in MATLAB. The VCO is modeled as a phase accumulator, with each new value of phase obtained as the previous phase plus the product of the time interval and the new frequency. The phase detector is simply a subtractor, generating for the above equations. The linear discrete-time model facilitates the choice of the charge pump current, the value of , and the value of for fast settling and minimum ripple on the oscillator control line.

Fig. 5. (a) LC oscillator. (b) Two-layer stacked inductor.

Fig. 6. (a) Divider. (b) Prescaler.

Fig. 7. Settling behavior of MATLAB and transistor-level simulations.
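The modeling style described above (a subtractor as phase detector and a phase accumulator as VCO) can be sketched in a few lines. The code below is not the authors' MATLAB model and does not reproduce Equations (12) and (13); it is a simplified stand-in that assumes an ideal one-reference-period delay and a secondary pump current equal to `alpha` times the primary current, subtracted from it, which is one reading of the delayed-charge-pump idea of Fig. 1(b). All names and numerical values are hypothetical.

```python
import math

def simulate_pll(alpha, n_cycles=400, f_ref=1e6, n_div=2400,
                 k_vco=100e6, i_p1=120e-6, c1=50e-12, f_step=10e6):
    """Return the VCO frequency error in Hz after each reference cycle.

    Assumed linear model: per cycle, the capacitor receives the primary pump
    charge for the present phase error minus alpha times the charge for the
    previous phase error (an ideal one-period delay).
    """
    t_ref = 1.0 / f_ref
    f_target = n_div * f_ref
    f_free = f_target - f_step        # VCO frequency at zero control voltage
    v_ctrl = 0.0
    phase_ref = 0.0
    phase_vco = 0.0
    err_prev = 0.0
    history = []
    for _ in range(n_cycles):
        err = phase_ref - phase_vco / n_div          # phase detector as a subtractor
        charge = (t_ref / (2.0 * math.pi)) * i_p1 * (err - alpha * err_prev)
        v_ctrl += charge / c1                        # single-capacitor loop filter
        f_vco = f_free + k_vco * v_ctrl
        phase_vco += 2.0 * math.pi * f_vco * t_ref   # VCO as a phase accumulator
        phase_ref += 2.0 * math.pi
        err_prev = err
        history.append(f_vco - f_target)
    return history

settled = simulate_pll(alpha=0.9)     # delayed secondary pump enabled
undamped = simulate_pll(alpha=0.0)    # capacitor-only filter, no stabilizing zero
print(abs(settled[-1]), max(abs(x) for x in undamped[-50:]))
```

With the delayed term enabled the frequency error decays toward zero, whereas with `alpha=0.0` the capacitor-only loop keeps oscillating, which is the reason a stabilizing zero is needed in the first place.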
Fig. 7 shows the settling behavior of the synthesizer as predicted by MATLAB and transistor-level implementation. The simple discrete-time model yields a moderate accuracy while requiring orders of magnitude less simulation time.

B. Transistor-Level Model

The impact of various PFD, CP, and VCO nonidealities upon the loop dynamics must ultimately be studied in a realistic transistor-level implementation. We present two techniques that reduce the simulation time from days to minutes.

The first method is based on “time contraction,” whereby the reference frequency is scaled up by a factor of 100 and the main loop filter capacitor ( in Fig. 2) and the divide ratio are scaled down by the same factor. All other loop parameters remain unchanged. From (10) and (11), we note that scaling and by 100 maintains a constant damping factor while scaling the settling time by 100. Since the PFD operates reliably at 100 MHz with no dead zone, this method directly reduces the simulation time by a factor of 100.

Fig. 8 depicts an example of time contraction by a factor of 10. Note that the time axis has a logarithmic scale. It can be observed that the loop settling behavior scales accurately by the same factor.

In the second method, the divider is realized as a simple behavioral model in HSPICE that uses a handful of ideal devices and its complexity is independent of the divide ratio. Illustrated in Fig. 9, the principle of the behavioral divider is to pump a
well-defined charge packet into an integrator in every period and reset the integrator when its output exceeds a certain level . Using an ideal op amp, comparator, and switches with proper choice of and , the circuit can achieve arbitrarily large divide ratios. (The duty cycle of output can be controlled by .) This technique yields another factor of 20 reduction in the simulation time, allowing the synthesizer to be simulated in less than 3 min on an Ultra 10 Sun Workstation. Table I summarizes the results of the two simulation techniques.

Fig. 8. Time contraction.

Fig. 9. Divider behavioral model.

TABLE I
FAST SIMULATION SUMMARY

V. EXPERIMENTAL RESULTS

The frequency synthesizer has been fabricated in a digital 0.25-μm CMOS technology. Shown in Fig. 10 is a photograph of the die, whose active area measures 0.65 mm × 0.45 mm. The circuit has been tested in a chip-on-board assembly while running from a 2.5-V power supply. The power dissipation is 20 mW.

Fig. 10. Die photo.

Fig. 11 shows the output spectrum in the locked condition. The phase noise is equal to −112 dBc/Hz at 1 MHz offset, well exceeding the Bluetooth requirement. The primary reference sidebands are at approximately −58.7 dBc. This level is lower than that achieved in [8] with differential VCO control and an 86.4-MHz reference frequency. Similarly, the designs in [9] and [10] exhibit an inferior tradeoff between the settling time and the sideband magnitudes.

Fig. 11. Measured output spectrum of the synthesizer.

Fig. 12 plots the measured settling behavior of the synthesizer when its channel number is switched by 64. Here, the channel select input is periodically switched between the two end channels and the oscillator control voltage is monitored. The settling time is about 60 μs, i.e., 60 input cycles. Table II summarizes the measured performance of the synthesizer.

VI. CONCLUSION

A PLL stabilization technique is introduced that relaxes the tradeoff between the settling time and the ripple on the control voltage, while obviating the need for resistors in the loop filter. The proposed approach creates a zero in the open loop
transfer function by adding two consecutive phase comparison results and can be extended to more consecutive samples as in a transversal filter. The method also “amplifies” the loop filter capacitor by a large number (e.g., 10), saving substantial chip area. The proposed concepts are demonstrated in a 2.4-GHz RF CMOS synthesizer.

The stabilization technique finds applications in other phase-locked systems as well. Examples include clock generators and clock and data recovery circuits.

Fig. 12. Control voltage during loop settling.

TABLE II
SYNTHESIZER PERFORMANCE SUMMARY

REFERENCES

[1] T. C. Lee and B. Razavi, “A stabilization technique for phase-locked frequency synthesizers,” in VLSI Symp. Dig. Tech. Papers, June 2001, pp. 39–42.
[2] M. G. Johnson and E. L. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[3] I. I. Novof, J. Austin, R. Kelkar, D. Strayer, and S. Wyatt, “Fully integrated CMOS phase-locked loop with 15 to 240 MHz locking range and ±50 ps jitter,” IEEE J. Solid-State Circuits, vol. 30, pp. 1259–1265, Nov. 1995.
[4] S. Sidiropoulos, D. Liu, J. Kim, G. Wei, and M. Horowitz, “Adaptive bandwidth DLL’s and PLL’s using regulated supply CMOS buffers,” in VLSI Symp. Dig. Tech. Papers, June 2000, pp. 124–127.
[5] A. Zolfaghari, A. Chan, and B. Razavi, “A 2.4-GHz 34-mW CMOS transceiver for frequency-hopping and direct-sequence applications,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2001, pp. 1259–1265.
[6] C. Lam and B. Razavi, “A 2.6-GHz/5.2-GHz frequency synthesizer in 0.4-μm CMOS technology,” IEEE J. Solid-State Circuits, vol. 35, pp. 788–794, May 2000.
[7] A. Zolfaghari, A. Chan, and B. Razavi, “Stacked inductors and transformers in CMOS technology,” IEEE J. Solid-State Circuits, pp. 620–628, Apr. 2001.
[8] L. Lin, L. Tee, and P. R. Gray, “A 1.4-GHz differential low-noise CMOS frequency synthesizer using a wideband PLL architecture,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2000, pp. 204–205.
[9] T. K. K. Kan, G. C. T. Leung, and H. C. Luong, “A 2-V 1.8-GHz fully integrated CMOS dual-loop frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 37, pp. 1012–1020, Aug. 2002.
[10] C.-W. Lo and H. C. Luong, “A 1.5-V 900-MHz monolithic CMOS fast-switching frequency synthesizer for wireless applications,” IEEE J. Solid-State Circuits, vol. 37, pp. 459–470, Apr. 2002.

Tai-Cheng Lee was born in Taiwan, R.O.C., in 1970. He received the B.S. degree from National Taiwan University, Taipei, Taiwan, R.O.C., in 1992, the M.S. degree from Stanford University, Stanford, CA, in 1994, and the Ph.D. degree from the University of California, Los Angeles, in 2001, all in electrical engineering.
He was with LSI Logic from 1994 to 1997 as a Circuit Design Engineer. He served as an Adjunct Assistant Professor with the Graduate Institute of Electronics Engineering (GIEE), National Taiwan University, from 2001 to 2002. Since 2002, he has been with the Department of Electrical Engineering and GIEE, National Taiwan University, where he is an Assistant Professor. His main research interests are in high-speed mixed-signal and analog circuit design, data converters, PLL systems, and RF circuits.

Behzad Razavi (S’87–M’90–SM’00–F’03) received the B.Sc. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1985 and the M.Sc. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1988 and 1992, respectively.
He was an Adjunct Professor at Princeton University, Princeton, NJ, from 1992 to 1994, and at Stanford University in 1995. He was with AT&T Bell Laboratories and Hewlett-Packard Laboratories until 1996. Since September 1996, he has been an Associate Professor and subsequently Professor of electrical engineering at the University of California, Los Angeles. He is the author of Principles of Data Conversion System Design (New York: IEEE Press, 1995), RF Microelectronics (Englewood Cliffs, NJ: Prentice-Hall, 1998), Design of Analog Integrated Circuits (New York: McGraw-Hill, 2001), Design of Integrated Circuits for Optical Communications (New York: McGraw-Hill, 2002), and the editor of Monolithic Phase-Locked Loops and Clock Recovery Circuits (New York: IEEE Press, 1996). His current research includes wireless transceivers, frequency synthesizers, phase-locking and clock recovery for high-speed data communications, and data converters.
Dr. Razavi received the Beatrice Winner Award for Editorial Excellence at the 1994 ISSCC, the Best Paper Award at the 1994 European Solid-State Circuits Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, and the Best Paper Award at the IEEE Custom Integrated Circuits Conference in 1998. He was the corecipient of the Jack Kilby Outstanding Student Paper Award at the 2002 ISSCC. He served on the Technical Program Committee of the International Solid-State Circuits Conference (ISSCC) from 1993 to 2002. He has also served as Guest Editor and Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and the International Journal of High Speed Electronics. He is recognized as one of the top ten authors in the 50-year history of ISSCC. He is also an IEEE Distinguished Lecturer.
490 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

An Adaptive PLL Tuning System Architecture Combining High Spectral Purity and Fast Settling Time
Cicero S. Vaucher, Member, IEEE

Abstract—An adaptive phase-locked loop (PLL) architecture for high-performance tuning systems is described. The architecture combines contradictory requirements posed by different performance aspects. Adaptation of loop parameters occurs continuously, without switching of loop filter components, and without interaction from outside of the tuning system. The relationship of performance aspects (settling time, phase noise, and spurious signals) to design variables (loop bandwidth, phase margin, and loop filter attenuation at the reference frequency) is presented, and the basic tradeoffs of the new concept are discussed. A circuit implementation of the adaptive PLL, optimized for use in a multiband (global) car-radio tuner IC, is described in detail. The realized tuning system achieved state-of-the-art settling time and spectral purity performance in its class (integer-N PLL’s): a signal-to-noise ratio of 65 dB, a 100-kHz spurious reference breakthrough signal under −81 dBc, and a residual settling error of 3 kHz after 1 ms, for a 20-MHz frequency step. It simultaneously fulfills the speed requirements for inaudible frequency hopping and the heavy signal-to-noise ratio specification of 64 dB.

Index Terms—Adaptive systems, FM noise, frequency synthesizers, phase-locked loops.

I. INTRODUCTION

Fast-settling-time frequency synthesizers are essential building blocks of modern communication systems. Typical examples are digital cellular mobile systems, which employ a combination of time-division duplex (TDD) and frequency-division duplex (FDD) techniques. In these systems, the downlink frequencies (base station to handsets) are placed in different bands with respect to uplink frequencies. In order to save cost and decrease the size of the handset, it is desirable to use the same frequency synthesizer to generate uplink and downlink frequencies. Requirements are that the synthesizer has to switch between bands and settle to another frequency within a predetermined time ( 1.7 ms for GSM and DCS-1800 systems [1]).

Car-radio receivers with optimal radio data system (RDS) performance ask for fast-settling-time tuning systems as well [2]. The RDS network transmits a list of (nationwide) alternative frequencies carrying the same program. The tuner performs

reception condition is provided when the receiver is displaced within different coverage regions. For the system to be effective, the background scanning has to be performed in a transparent (inaudible) way to the listener. A possible but expensive way to do that is to use two tuners in the receiver, with one of them being used for checking on alternative frequencies only. Single-tuner solutions, which have a much better price/performance ratio, require a tuning system architecture able to do frequency hopping in an inaudible way [2]. In other words, a fast-settling-time architecture is required for these applications.

Communication systems often pose severe requirements on the spectral purity of the tuning system local oscillator (LO) signal. There are two main reasons for this. First, to avoid problems with reciprocal mixing of adjacent channels. Reciprocal mixing decreases the receiver's selectivity and disturbs the reception of weak signals. Second, because the mixing process, which is used for down-conversion of the radio-frequency (RF) signals, superposes the phase noise of the LO on the modulation of the RF signal. Hence, the signal-to-noise ratio (SNR) at the output of the demodulator is a function of LO's phase noise level [3].

This paper describes an adaptive tuning system architecture that combines fast settling time with excellent spectral purity performance. The architecture was optimized to be used in a global car-radio tuner IC with inaudible RDS background scanning. The integer-N frequency synthesizer has an SNR of 65 dB and a 100-kHz spurious reference breakthrough under −81 dBc at the voltage-controlled oscillator (VCO) (−87 dBc at the mixer). Residual settling error for a 20-MHz frequency step is 3 kHz after 1 ms. These results are similar to those of a fractional-N implementation [4]. The complexity of our tuning system, however, is much smaller. The adaptive phase-locked loop (PLL) was integrated in a 5-GHz, 2-μm bipolar technology. The tuning system works with 8.5-V supply voltage for the charge pumps and with 5 V for the logic functions. Total current consumption is 21 mA from the 5-V supply and 12 mA from the 8.5-V supply.

The architecture of the multiband tuner IC is described in
a background scanning of these frequencies, so that optimum Section II. Section III presents relationships of settling time,
phase noise, and spurious signals to the design variables, namely
loop bandwidth, phase margin, and loop filter attenuation at the
reference frequencies. Section IV introduces the adaptive PLL
Manuscript received July 23, 1999; revised November 29, 1999. architecture and discusses the advantages and tradeoffs of the
The author is with Philips Research Laboratories, Eindhoven 5656 AA The
Netherlands (e-mail: Cicero.Vaucher@philips.com). concept. Section V describes the circuit implementation, and
Publisher Item Identifier S 0018-9200(00)02861-4. Section VI presents a summary of measured results.
0018–9200/00$10.00 © 2000 IEEE
VAUCHER: ADAPTIVE PLL TUNING SYSTEM ARCHITECTURE 491

Fig. 1. Simplified block diagram of the global car-radio tuner IC.

TABLE I
RECEPTION BANDS WITH CORRESPONDING TUNING SYSTEM PARAMETERS

II. MULTIBAND TUNER ARCHITECTURE

The block diagram of the global tuner IC with inaudible background scanning is shown in Fig. 1. The receiver and tuning system architectures have been defined such that all reception bands can be accessed with a single VCO and a single loop filter, without changes to the application. Mapping the frequency of the VCO to the different input bands is achieved by dividing its output frequency by different ratios, depending on the band to be received. The division is accomplished in the FM DIV and AM DIV dividers, which are set in between the VCO output and the RF mixers. Table I presents the VCO frequency and tuning system parameter settings for various reception bands, including the American Weather Band. By dividing the VCO output, the tuning resolution is 1 kHz in AM mode and 50 kHz in FM mode, despite the fact that reference frequencies are 20 kHz and 100 kHz, respectively.

Combining the different reception bands in one single application (the same VCO and same loop filter) complicates the design of the tuning system. A reception band with worst case
492 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

Fig. 2. Open-loop frequency response (Bode plot) of a type-2, third-order charge-pump PLL for different values of phase margin.

spectral purity requirements determines the loop filter design. Nonetheless, robustness for variations in tuning system parameters, for all reception bands, has to be ensured. The relationships between different performance aspects on system level are discussed in the following section.
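The resolution figures quoted in this section can be checked with a few lines of arithmetic. The sketch below assumes the post-divider ratios mentioned later in Section VI (division by 2 in FM DIV, division by 20 ahead of the AM mixers); the function names are illustrative and not part of the original design.

```python
import math


def tuning_resolution_hz(f_ref_hz: float, post_divider: int) -> float:
    """Frequency step at the mixer when the VCO moves by one reference step."""
    return f_ref_hz / post_divider


def division_gain_db(post_divider: int) -> float:
    """Reduction of phase noise and spurious level after frequency division."""
    return 20.0 * math.log10(post_divider)


# Divider ratios taken from Section VI: 2 for FM, 20 for AM.
fm_step_hz = tuning_resolution_hz(100e3, 2)   # 100-kHz reference, FM mode
am_step_hz = tuning_resolution_hz(20e3, 20)   # 20-kHz reference, AM mode
fm_gain_db = division_gain_db(2)
am_gain_db = division_gain_db(20)
```

Under these assumptions the steps come out as 50 kHz (FM) and 1 kHz (AM), and the same division accounts for the roughly 6-dB and 26-dB improvements reported in the measurements.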

III. SETTLING TIME AND SPECTRAL PURITY PERFORMANCE


The properties of a PLL are strongly related to its phase detector implementation [5]. Present-day PLL frequency synthesizers usually employ the tristate, sequential phase frequency detector (PFD), combined with a charge pump (CP) [6]. The analysis of the PLL properties presented in this paper assumes the use of a PFD/CP in the loop.

A. Settling Time, Loop Bandwidth, and Loop Phase Margin

Bode diagrams are a powerful tool for designing PLL tuning systems [7], [8] because they enable direct assessment of the loop's phase margin $\phi_m$ and open-loop bandwidth (0-dB frequency $f_c$). Accurate and reliable results for $\phi_m$ and $f_c$ are obtained with easy-to-implement behavioral models [9] and with fast ac simulation runs. In spite of the advantages of the “ac method,” design equations relating the settling performance of a type-2, third-order charge-pump PLL¹ [6] to its open-loop bandwidth and phase margin have, to the best of our knowledge, not yet been published in the open literature.

Fig. 2 presents Bode plots of a type-2, third-order loop for different values of phase margin $\phi_m$. Fig. 3(a) displays the transient response of such a loop for three different values of phase margin. The responses are plotted as $f_\varepsilon(t)/f_{step}$, normalized for $f_c t$. $f_\varepsilon(t)$ is the remaining frequency error with respect to the final value and $f_{step}$ is the amplitude of the frequency jump. Fig. 3(b) presents the responses as $\ln(|f_\varepsilon(t)|/f_{step})$, so that the impact of $\phi_m$ on the “long-term” transient response is easily observed.

Fig. 3. Settling transient for different values of $\phi_m$, normalized for $f_c t$. (a) Settling error (represented as $f_\varepsilon(t)/f_{step}$) versus $f_c t$. (b) Settling error (represented as $\ln(|f_\varepsilon(t)|/f_{step})$) versus $f_c t$.

The influence of the phase margin on the settling time, obtained with transient simulations similar to those of Fig. 3, is presented in Fig. 4. The figure shows the time necessary for the value of $\ln(|f_\varepsilon(t)|/f_{step})$ to reach a numerical value of $-10$. The settling time decreases with increasing phase margin, reaching a minimum for values of $\phi_m$ around 50°. Increasing the phase margin further leads to a sharp increase in the settling time.

Fig. 4. Settling time as function of the phase margin for $|f_\varepsilon|/f_{step} = e^{-10}$.

The relationship of settling time and phase margin, displayed in Fig. 4, can be understood with the help of Fig. 5. It presents the pole and zero locations of the closed-loop transfer function of a third-order loop with different values of phase margin (Bode plots presented in Fig. 2). The real part of the dominant (complex) poles approaches $-\omega_c$ (with $\omega_c = 2\pi f_c$) for values of $\phi_m$ of about 50°. When $\phi_m$ equals 53°, all three poles lie at $-\omega_c$. That is the location with the fastest damping of the transient error. The fastest response, however, is obtained with 51°. The complex parts of the poles “speed up” the settling transient a bit further (25%). For higher values of phase margin, the dominant real pole moves to the right on the real axis. This pole is responsible for the slowing down of the PLL response for values of $\phi_m > 53°$. Fig. 5 shows that the dominant pole, for 60° phase margin, lies at about $-0.4\,\omega_c$. Hence, it may be concluded that the usual practice of designing critically damped loops (which have a phase margin of about 70° [5]) is not appropriate for fast-settling-time applications.

Fig. 5. Position of the closed-loop poles and zeros of a third-order PLL corresponding to different values of $\phi_m$, as displayed in Fig. 2.

Let us consider Fig. 3(b) again. One sees that the (envelope of the) curves can be approximated by straight lines. The approach proposed here takes $\phi_m$ into account with the help of an effective damping coefficient $\zeta_e(\phi_m)$. By so doing, we arrive at the following approximation for the envelope of the curves of Fig. 3(b):

$$\ln\!\left(\frac{|f_\varepsilon(t)|}{f_{step}}\right) \approx -\zeta_e(\phi_m)\, f_c\, t. \qquad (1)$$

Numerical estimations for $\zeta_e(\phi_m)$ can be obtained from transient simulations with the help of the following expression:

$$\zeta_e(\phi_m) = -\frac{\Delta\!\left[\ln\!\left(|f_\varepsilon(t)|/f_{step}\right)\right]}{\Delta(f_c t)}. \qquad (2)$$

The settling time results presented in Fig. 4 lead to the numerical values for $\zeta_e(\phi_m)$ displayed in Fig. 6. These values represent an average value for $\zeta_e(\phi_m)$, as they are obtained from a $\Delta(\ln(\cdot))$ of ten.

Fig. 6. Average values of $\zeta_e(\phi_m)$ for a $\Delta(\ln(\cdot))$ of ten.

Manipulation of (1) results in an equation describing the minimum loop bandwidth required to achieve given settling specifications

$$f_c = \frac{1}{t_{lock}\,\zeta_e(\phi_m)}\,\ln\!\left(\frac{f_{step}}{f_a}\right). \qquad (3)$$

In (3):
$t_{lock}$ locking time (s);
$f_{step}$ amplitude of the frequency jump (Hz);
$f_a$ maximum frequency error (Hz) at $t_{lock}$;
$\zeta_e(\phi_m)$ can be read from Fig. 6.

Two points about the present treatment of the transient response need further explanation. First, the presented results are based on a linear continuous-time model for the discrete-time charge-pump PLL. It is known in the literature [6] that the continuous-time approach is a good approximation for the discrete-time PLL if the reference (sampling) frequency $f_{ref}$ of the loop is at least a factor of ten higher than its open-loop bandwidth $f_c$. Therefore, the value of $f_c$, calculated with (3), has to be checked against the loop's reference frequency $f_{ref}$. If the target ratio $f_{ref}/f_c$ is smaller than ten, then actual settling behavior will deviate from the calculations.

The second point is that usual implementations of the phase frequency detector have a limited linear phase error detection range, namely, from $-2\pi$ to $2\pi$ [9]. When the instantaneous phase error $\theta_e$ becomes larger than $2\pi$, the PFD interprets the error information as $\theta_e - 2\pi$. This effect leads to a longer settling time than predicted with (3). The maximum value of $\theta_e$, denoted $\theta_{e,max}$, was found to obey the following relationship: [...], where $N$ is the main divider ratio and [...] is a fitting factor for the influence of the phase margin on $\theta_{e,max}$. Numerical values for the fitting factor, obtained from transient simulations, lie in the range [0.7, 0.8]. Hence, the maximum phase error is contained in the interval $\pm 2\pi$, when [...]. If this condition is satisfied, then the (discrete-time) transient response is accurately predicted by the continuous-time linear model.

Inaudible RDS background scanning requires settling times of 1 ms, defined as a residual settling error of 6 kHz for a 20-MHz frequency jump. The nominal loop phase margin is set to 50°, which corresponds to a $\zeta_e(\phi_m)$ of five. On the other hand, it is appropriate to use a lower value for $\zeta_e(\phi_m)$ in the calculations (e.g., 2.5), to provide enough margin for variations in the nominal values of loop bandwidth and phase margin. Solving (3) for these settling specifications leads to a nominal value of 3.2 kHz for the loop bandwidth $f_c$.

¹The most widely used configuration in synthesizer applications.
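As a worked check of the 3.2-kHz figure, the sketch below evaluates the bandwidth equation in the form $f_c = \ln(f_{step}/f_a) / (t_{lock}\,\zeta_e)$. That form is an assumption: it is inferred from the symbol list following (3) and from the fact that it reproduces the quoted number. The function names are hypothetical.

```python
import math


def min_loop_bandwidth_hz(f_step_hz: float, f_a_hz: float,
                          t_lock_s: float, zeta_e: float) -> float:
    """Minimum open-loop bandwidth for a given settling specification.

    Assumed form of (3): f_c = ln(f_step / f_a) / (t_lock * zeta_e).
    """
    return math.log(f_step_hz / f_a_hz) / (t_lock_s * zeta_e)


def continuous_time_model_ok(f_ref_hz: float, f_c_hz: float,
                             min_ratio: float = 10.0) -> bool:
    """Check the rule of thumb that f_ref should be at least 10x f_c."""
    return f_ref_hz / f_c_hz >= min_ratio


# Settling specification from the text: 6-kHz residual error, 1 ms after a
# 20-MHz jump, with the conservative damping coefficient of 2.5.
f_c_hz = min_loop_bandwidth_hz(20e6, 6e3, 1e-3, 2.5)
model_valid_fm = continuous_time_model_ok(100e3, f_c_hz)
```

With these inputs the result is about 3.24 kHz, and the 100-kHz FM reference is roughly 31 times the loop bandwidth, so the continuous-time approximation should hold in FM mode.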

Fig. 7. FM noise density and residual FM for loop bandwidths of 800 Hz and 3 kHz.
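The pole positions discussed with Fig. 5 in Section III-A can be reproduced analytically under one assumption that the text does not state explicitly: the loop filter is dimensioned so that the open-loop phase peaks at the 0-dB frequency. With frequencies normalized to $\omega_c$ and $x = \sqrt{\tau_2/\tau_3} = \tan\phi_m + \sec\phi_m$, the closed-loop characteristic polynomial becomes $s^3 + x s^2 + x s + 1 = (s+1)(s^2 + (x-1)s + 1)$. The sketch below is illustrative only.

```python
import cmath
import math


def closed_loop_poles(phase_margin_deg: float) -> list:
    """Closed-loop poles of a type-2, third-order loop, normalized to omega_c.

    Assumes maximum open-loop phase at the 0-dB crossover frequency.
    """
    phi = math.radians(phase_margin_deg)
    x = math.tan(phi) + 1.0 / math.cos(phi)  # x = sqrt(tau2 / tau3)
    # One pole always sits at s = -1; the others solve s^2 + (x - 1) s + 1 = 0.
    disc = cmath.sqrt((x - 1.0) ** 2 - 4.0)
    return [complex(-1.0, 0.0),
            (-(x - 1.0) + disc) / 2.0,
            (-(x - 1.0) - disc) / 2.0]


def dominant_pole_real(phase_margin_deg: float) -> float:
    """Real part of the pole closest to the imaginary axis."""
    return max(p.real for p in closed_loop_poles(phase_margin_deg))
```

Under this assumption the three poles coincide at $-\omega_c$ for a phase margin of about 53.1°, and the dominant pole for 60° lies near $-0.44\,\omega_c$, in line with the values quoted in the text.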

The loop bandwidth that satisfies different settling requirements can be calculated with the help of (3). Settling specifications, however, often require loop bandwidths that are not optimal with respect to spectral purity performance, as will become clear in the next subsection.

B. Phase Noise Performance and Loop Bandwidth

The dependency of the total phase noise of a PLL tuning system on the phase noise of the loop components is well known in the literature [3], [5], [10]. The phase noise of the VCO is suppressed inside the loop bandwidth, whereas the (phase) noise from the other building blocks is transferred to the VCO output, multiplied by the closed-loop transfer function of the PLL: a low-pass function that suppresses their noise contribution outside the loop bandwidth. There is a “crossover point” for the loop bandwidth, where the noise contribution from the dividers and charge pump becomes dominant with respect to the noise from the VCO.

For terrestrial FM reception, the LO signal residual frequency noise (residual FM) determines the ultimate receiver's SNR performance. The SNR specification for the application is 64 dB, defined for a reference level of 22.5-kHz peak deviation with 50-μs deemphasis. Complying with the specification requires the residual FM in the LO signal to be less than 10 Hz rms.

The frequency (FM) noise density of the LO signal $S_f(f_m)$ is linked to its phase noise power density $S_\phi(f_m)$ by $S_f(f_m) = f_m^2\, S_\phi(f_m)$ [5]. $S_\phi(f_m)$ equals $2\mathcal{L}(f_m)$, where $\mathcal{L}(f_m)$ is the single-sideband noise-to-carrier ratio, so that $S_f(f_m) = 2 f_m^2\, \mathcal{L}(f_m)$. Finally, the residual FM can be calculated

$$\text{residual FM} = \sqrt{\int_{f_l}^{f_h} 2 f_m^2\, \mathcal{L}(f_m)\, df_m}. \qquad (4)$$

The integration limits $f_l$ and $f_h$ in (4) depend on the signal bandwidth of the application [3]. For terrestrial FM reception, the lower limit is 20 Hz and the higher is 20 kHz. Fig. 7 presents the simulated frequency noise (FM noise) power density and the residual FM, which is plotted as function of $f_h$, with $f_l$ fixed at 20 Hz. The FM noise density and the residual FM are plotted for values of loop bandwidth of 800 Hz and of 3 kHz. For 3 kHz, the residual FM amounts to 40 Hz rms, which is 12 dB higher than the specification. A loop bandwidth of 800 Hz, on the other hand, leads to a residual FM of 8 Hz rms, which satisfies the SNR requirement.

The contributions of different noise sources to the total frequency noise density, in the case of an 800-Hz loop bandwidth, are displayed in Fig. 8. The contribution of the VCO to the residual FM equals that of the other synthesizer building blocks. This is a good compromise, and 800 Hz was chosen as the nominal loop bandwidth for in-lock situations.

The settling specification requires a bandwidth of 3.2 kHz. The SNR constraint, on the other hand, asks for 800 Hz. These conflicting requirements can be combined when the loop bandwidth is made adaptive as a function of the operating mode: frequency jump or in-lock.

Adapting the value of the loop bandwidth during frequency jumps is easily accomplished by switching the nominal value of the charge-pump current [6], [13]. This method, however, often causes disturbances in the VCO tuning voltage (the so-called secondary glitch-effect) at the moment the current is switched from high to low values. These disturbances are highly undesirable, as they have to be corrected by the loop in small bandwidth mode. What is more, the “secondary glitches” may cause audible disturbances in analog systems and increase the bit error rate in digital systems.

To provide stability for a small bandwidth loop requires a transfer function zero located at low frequencies (large time constant). A low-frequency zero, however, is undesirable for operation in high bandwidth mode. It causes the phase margin to be “too” high, which increases the settling time. Note that the effective damping coefficient $\zeta_e(\phi_m)$ decreases for high values of phase margin (see Fig. 6).

Fig. 8. Contributions from different noise sources to the total FM noise density and residual FM (20 Hz–20 kHz) with 800-Hz loop bandwidth.

Therefore, for optimal settling time and phase noise, one has
not only to switch the value of the loop bandwidth but also to
change the location of the zero in the transfer function.
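The residual-FM integral of Section III-B and its link to the SNR figures can be sketched numerically. The code below assumes the integrand $2 f_m^2 \mathcal{L}(f_m)$ over 20 Hz to 20 kHz, and estimates SNR as the ratio of the rms value of a 22.5-kHz peak deviation to the residual FM. Deemphasis and any audio weighting are ignored, so this is a rough cross-check rather than the author's simulation. The `free_running_vco` profile is hypothetical: a pure $1/f^2$ slope through −100 dBc/Hz at 10 kHz.

```python
import math


def residual_fm_hz(ssb_noise, f_low_hz: float = 20.0,
                   f_high_hz: float = 20e3, points: int = 2000) -> float:
    """Residual FM (Hz rms): sqrt of the integral of 2 f^2 L(f) df.

    ssb_noise(f) returns L(f) as a linear power ratio per Hz.
    Trapezoidal integration on a logarithmically spaced grid.
    """
    ratio = (f_high_hz / f_low_hz) ** (1.0 / (points - 1))
    freqs = [f_low_hz * ratio ** i for i in range(points)]
    dens = [2.0 * f * f * ssb_noise(f) for f in freqs]
    total = 0.0
    for i in range(points - 1):
        total += 0.5 * (dens[i] + dens[i + 1]) * (freqs[i + 1] - freqs[i])
    return math.sqrt(total)


def fm_snr_db(residual_fm_rms_hz: float,
              peak_deviation_hz: float = 22.5e3) -> float:
    """SNR estimate: rms deviation of the reference tone over residual FM."""
    return 20.0 * math.log10(
        (peak_deviation_hz / math.sqrt(2.0)) / residual_fm_rms_hz)


def free_running_vco(f_hz: float) -> float:
    """Hypothetical 1/f^2 phase noise: -100 dBc/Hz at 10-kHz offset."""
    return 1e-10 * (1e4 / f_hz) ** 2


vco_only_hz = residual_fm_hz(free_running_vco)
```

With this simple ratio, 10 Hz rms corresponds to about 64 dB, 8 Hz rms to about 66 dB, and 40 Hz rms to about 52 dB, which is consistent with the 12-dB shortfall mentioned for the 3-kHz bandwidth.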

C. Reference Spurious Signals and Loop Filter Attenuation

The use of phase frequency detectors yields the minimum levels of spurious breakthrough at the reference frequency [11]. The spurious signals are due to compensation of leakage currents or to imperfections in the charge pump's implementation. Standard FM modulation theory and the small angle approximation lead to the following equation for the amplitude of the spurious signal (in dBc), which is at an offset frequency $f_m$ from the carrier:

$$A_{spurious}(f_m) = 20\log\!\left(\frac{I(f_m)\,|Z_f(f_m)|\,K_{vco}}{2 f_m}\right) \qquad (5)$$

where
$f_m$ offset frequency from the carrier (Hz);
$I(f_m)$ amplitude of ac current component with frequency $f_m$ (A);
$|Z_f(f_m)|$ impedance of the loop filter at $f_m$ (V/A);
$K_{vco}$ VCO gain (Hz/V).

The value of $I(f_{ref})$ is twice the value of the loop-filter dc leakage current [12] in loops operating with well-designed charge pumps. In cases where the charge pump has charge-sharing problems and/or charge injection into the loop filter, $I(f_m)$ may become dominated by these second-order effects. The imperfections can lead to spurious components with (much) higher amplitudes than would be expected based on the leakage current alone.

Rearranging the above equation leads to a formula that relates the required filter attenuation at $f_{ref}$ to the specified maximum level of spurious signals $A_{spurious}(f_{ref})$, to the dc leakage current $I_{leak}$, and to the VCO gain $K_{vco}$

$$|Z_f(f_{ref})| = \frac{f_{ref}}{I_{leak}\,K_{vco}}\;10^{A_{spurious}(f_{ref})/20}. \qquad (6)$$

The relevant values of $f_m$ equal $f_{ref}$ and its harmonics in a standard PLL operating with a reference frequency of $f_{ref}$ Hz. Therefore, the required loop-filter (trans)impedance for these frequencies can be readily calculated. The VCO gain, the spurious specification, and the expected (maximum) leakage current are known.

Fig. 9. Adaptive PLL tuning system architecture.

Fig. 10. Loop-filter configuration, charge-pump currents, and component values used in the global car-radio tuner IC.
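To make the spurious-level relations concrete, the sketch below assumes the narrowband-FM form $A = 20\log_{10}(I_{leak}\,|Z_f|\,K_{vco}/f_{ref})$, obtained from a peak deviation of $I\,|Z_f|\,K_{vco}$ with $I = 2 I_{leak}$ and a sideband amplitude of half the modulation index. The 1-nA leakage current is a hypothetical value, and the 13.3-MHz/V gain is only the average slope implied by the tuning range quoted in Section V-B.

```python
import math


def spur_level_dbc(i_leak_a: float, z_filter_ohm: float,
                   k_vco_hz_per_v: float, f_ref_hz: float) -> float:
    """Reference spur level (dBc) caused by dc leakage, narrowband-FM model."""
    return 20.0 * math.log10(
        i_leak_a * z_filter_ohm * k_vco_hz_per_v / f_ref_hz)


def max_filter_impedance_ohm(spur_spec_dbc: float, i_leak_a: float,
                             k_vco_hz_per_v: float, f_ref_hz: float) -> float:
    """Largest loop-filter transimpedance at f_ref that meets the spur spec."""
    return (f_ref_hz * 10.0 ** (spur_spec_dbc / 20.0)
            / (i_leak_a * k_vco_hz_per_v))


# Hypothetical leakage of 1 nA, average VCO gain of 13.3 MHz/V,
# 100-kHz reference, -81 dBc target.
z_max_ohm = max_filter_impedance_ohm(-81.0, 1e-9, 13.3e6, 100e3)
```

Under these assumed numbers the filter transimpedance at 100 kHz would have to stay below roughly 670 Ω. Note that the loop bandwidth does not appear anywhere in the calculation.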

Fig. 11. Bode plots of the adaptive loop during frequency jumps and in-lock.

Fig. 12. Implementation of the DZ building block.

An important conclusion to be taken from the above equations is that the amplitude of the spurious signals is not dependent on the absolute value of loop bandwidth. Instead, it is determined by the (trans)impedance of the loop filter. This means that, at least in principle, “any” spurious specification can be achieved simply by decreasing the impedance level of the loop filter. In practice, this is not a viable option because the PLL loop bandwidth is proportional to the value of the loop-filter resistor and to the charge-pump current [6].

For a constant value of the loop bandwidth, a decrease of the loop-filter impedance level requires a proportional increase of the nominal charge-pump current. This leads to difficulties in the charge-pump design and to higher power dissipation. To avoid these difficulties, more RC sections are added to the basic loop-filter configuration, so that the filter attenuation at higher frequencies is increased. Additional RC sections, however, inevitably cause phase lag at lower frequencies. The phase lag decreases the loop phase margin and increases the settling time in high-bandwidth mode.

Therefore, to provide optimal settling, low-power dissipation, and good spurious performance, one has not only to switch the value of the loop bandwidth but also to bypass (some) RC sections of the loop filter. The PLL architecture presented here complies with these requirements.

IV. ADAPTIVE PLL ARCHITECTURE

A. Basic Architecture

The basic idea is to have two loops working in parallel, as depicted in Fig. 9. Loop 1, built around PFD1 and CP1, is dimensioned for in-lock operation. Loop 2, built around PFD2, DZ, and CP2, is dimensioned for fast settling time. Loop 1 operates all the time, whereas Loop 2 is only active during tuning

actions. Loop 1 and Loop 2 share the crystal oscillator, the reference divider, and the main divider.

A smooth takeover from Loop 1, after a frequency jump, avoids “secondary glitch” effects. The high-current charge pump CP2 is only active during tuning. CP2 is controlled by the dead-zone (DZ) block. DZ generates a smooth transition into a well-defined dead zone for CP2 when lock is achieved, so that sudden disturbances of the VCO tuning voltage are avoided.

Additional freedom for optimization of the loop parameters is obtained by using two separate charge-pump outputs and by applying the charge-pump currents to different nodes of the loop filter. In this way, the location of the zeros for frequency jumps and in-lock can be set in a continuous way, without switching of loop components, which is a source of “secondary glitch” problems. Furthermore, the path from Icpl to Vtune may contain additional filtering sections for, e.g., attenuation of spurious signals and/or fractional-N quantization noise [14]. These filter sections may be bypassed by Icph to increase the phase margin in high-bandwidth mode.

B. Loop-Filter Implementation

The ideas described above are demonstrated with the help of Figs. 10 and 11. Fig. 10 presents the loop-filter configuration and component values used in the global tuner IC (Fig. 1). Fig. 11 shows the optimized Bode diagrams of the adaptive PLL (in FM mode) with the loop filter of Fig. 10.

During frequency jumps both CP1 and CP2 are active; the loop filter zero frequency is 1/(2πRbCa) and lies at a high frequency, matching the 0-dB open-loop frequency. It enables stability and fast tuning to be achieved. The nominal loop bandwidth in this mode is 3.2 kHz, and the phase margin is 50°. After the frequency jump only CP1 is active. The zero of the loop filter moves to a lower frequency, 1/(2π(Ra + Rb)Ca), without the switching of loop-filter components. The low-frequency zero increases the phase margin in-lock.

When the loop is in-lock, an extra pole is introduced at 1/(2πRcCc), which increases the 100-kHz reference suppression by about 20 dB. During frequency jumps, these elements are bypassed by CP2, increasing the phase margin in high-bandwidth mode. If the loop bandwidth were increased by simply switching the amplitude of CP1, one would end up with an unstable loop, because of a phase margin of less than 10° in high-bandwidth mode.

C. Dead-Zone Implementation

The new element in the adaptive PLL architecture is the combination of the DZ block with the high-current charge pump CP2. The function of DZ is to provide CP2 with a well-defined dead zone. The dead zone is centered symmetrically around the locking position of charge pump CP1 [see Fig. 13(a)].

The logic diagram of the DZ/CP2 combination is depicted in Fig. 12. The figure shows how the different logic functions influence the duty cycle of the up and dn signals from the phase frequency detector (PFD2). At the input of DZ, the up and dn signals have a finite duty cycle, even for an in-lock situation. The finite duty cycle eliminates dead-zone problems in CP1. The XOR and AND gates are used to cancel the finite in-lock duty cycle. The processed up and dn signals are then applied to low-pass filters and slicers, whose function is to prevent pulses that have too small a duty cycle from reaching CP2. The cutoff frequency of the low-pass filters, the discrimination level of the slicers, and the turn-on time of CP2 determine the size of the dead zone around the lock position.

A tradeoff among settling performance, circuit implementation, and robustness arises when the magnitude of the dead zone has to be determined. Let us start by discussing circuit aspects.

The dead zone of charge pump CP2 should be centered around the locking position of the loop for optimum settling and spectral purity performance. The locking position, however, is a function of the output voltage of charge pump CP1. The effect is depicted in Fig. 13. One sees that, as the tuning voltage Vtune increases, there is a shift of the locking position to positive values of Δt, the time difference between the phase detector inputs. The reason lies in the finite output resistance of the active element used in CP1. Different current gains in CP1's UP and DOWN branches need to be compensated by up and dn signals with different duty cycles at the locking point. Different duty cycles are accomplished by a shift in the loop's locking position.

Fig. 13 shows situations where the gain in the UP branch of the pump decreases as Vtune increases. The ideal operating situation is depicted in Fig. 13(a). Situation (b) is still allowed from the point of view of spectral purity but has asymmetrical settling performance. Finally, (c) depicts a situation that should never happen: the locking position shifts so much that the high-current charge pump CP2 becomes active and degrades the in-lock spectral purity. Therefore, increasing the size of CP2's dead zone eases the design of charge pump CP1 and increases the robustness of the system.

Fig. 13. Shift in locking position as function of VCO tuning voltage.

On the other hand, the size of CP2's dead zone influences the settling performance of the adaptive loop. The influence of the dead zone on the transient response was simulated with behavioral models. The results are displayed in Fig. 14, together with the settling requirements that ensure inaudible background scanning functionality. Table II presents the settling time for different settling accuracies and different values of the dead zone. A dead-zone value of infinity corresponds to the situation where only CP1 is active

Fig. 14. Detail of settling transient for different values of the dead zone.

TABLE II
SIMULATED IN-LOCK SNR AND SETTLING TIME (ms) FOR A 20-MHz
FREQUENCY JUMP FOR DIFFERENT VALUES OF THE DEAD ZONE AND
DIFFERENT SETTLING ACCURACIES

(nonadaptive loop). Table II shows that by using the adaptive loop architecture, it is possible to combine fast settling time with good SNR in-lock. Increasing the dead zone leaves more “residual” phase (and frequency) error to be corrected by the small bandwidth loop. The closer one comes to the locking point in high bandwidth mode, the shorter the total settling transient will be. A dead-zone value of 15 ns is a good compromise for the intended application.

Fig. 15. Micrograph of the tuner IC.
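The loop-filter corner frequencies of Section IV-B follow the usual RC form, and the quoted reference suppression of about 20 dB can be related to the position of the extra in-lock pole. The sketch below uses hypothetical component values (the actual values appear only in Fig. 10) chosen so that the pole lands near 10 kHz, one decade below the 100-kHz reference. That pole position is an inference from the 20-dB figure, not a value stated in the text.

```python
import math


def rc_corner_hz(r_ohm: float, c_farad: float) -> float:
    """Corner frequency 1 / (2 * pi * R * C) of a single RC section."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def single_pole_attenuation_db(f_hz: float, f_pole_hz: float) -> float:
    """Attenuation of a first-order low-pass section at frequency f."""
    return 10.0 * math.log10(1.0 + (f_hz / f_pole_hz) ** 2)


# Hypothetical Rc and Cc values, picked only to place the pole near 10 kHz.
f_pole_hz = rc_corner_hz(10e3, 1.59e-9)
extra_suppression_db = single_pole_attenuation_db(100e3, f_pole_hz)
```

A single real pole one decade below the reference frequency gives close to 20 dB of additional attenuation, which matches the order of magnitude reported for the in-lock configuration.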

V. CIRCUIT IMPLEMENTATION
A die micrograph of the total tuner IC is displayed in Fig. 15. The adaptive PLL has been integrated with the other functional blocks of Fig. 1 in a 5-GHz, 2-μm bipolar technology [15].

A. Programmable Dividers

The architecture of the main divider is depicted in Fig. 16. The high-frequency part of the programmable divider is based on the programmable prescaler concept described in [12] and consists of a chain of 2/3 divider cells. The modular architecture enables easy optimization of power dissipation and robustness for process variations. The division range of the basic prescaler configuration is extended by the low-frequency programmable counter. The logic functions of the PLL were implemented with current routing logic techniques (CRL) [12], [16]. The low-frequency part of the main and reference dividers operate with low current levels to limit total power dissipation. To decrease the phase noise of the reference signal going to the phase detectors, this signal is reclocked in a high-current D-flip-flop (D-FF). The clean crystal signal is used to clock the D-FF. The total main divider current consumption is 5 mA. The first 2/3 cell consumes 2.1 mA.

Fig. 16. Architecture of the main programmable divider.
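For readers unfamiliar with 2/3-cell prescalers, the sketch below illustrates the commonly cited property of such a chain: with n cells and one programming bit per cell, the division ratio is $2^n + \sum_i p_i 2^i$, so the basic range runs from $2^n$ to $2^{n+1}-1$. The number of cells used in this IC is not stated in the text, so the three-cell example is purely illustrative.

```python
def division_ratio(p_bits: list) -> int:
    """Division ratio of a chain of 2/3 cells.

    p_bits[i] is the programming bit (0 or 1) of cell i, where cell 0 is the
    first (highest-frequency) cell in the chain.
    """
    n_cells = len(p_bits)
    return 2 ** n_cells + sum(bit << i for i, bit in enumerate(p_bits))


def division_range(n_cells: int) -> tuple:
    """Minimum and maximum ratio of the basic n-cell prescaler."""
    return 2 ** n_cells, 2 ** (n_cells + 1) - 1


# Illustrative three-cell chain: ratios 8 through 15.
low, high = division_range(3)
```

Because the basic range spans less than one octave, a low-frequency programmable counter such as the one mentioned above is needed to reach a wider set of ratios.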

Fig. 17. Simplified circuit diagram of charge pump CP1.

Fig. 18. CP1 and CP2 charge-pump currents as a function of Δt.

B. Oscillators
The LC VCO uses an external tank circuit. It can be tuned
from 150 to 250 MHz, with a voltage tuning range from 0.5
to 8 V. The VCO phase noise is −100 dBc/Hz at 10 kHz offset, for
a carrier frequency of 237 MHz. The VCO core consumes
1.5 mA. The 20.5-MHz reference crystal oscillator operates
in linear mode, to avoid harmonics interfering in the FM
reception bands. Quadrature generation for the image rejection
FM mixers (see Fig. 1) is accomplished in a divider-by-two
(FM DIV), with the exception of reception in the American
Weather Band (WX). In that case, I/Q signals are generated
with an RC-CR network directly from the VCO. This avoids the
need to have the VCO operating at 346 MHz, and a change in
the LC VCO tuned circuit during WX reception.
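A few derived quantities follow directly from the numbers in this subsection. The sketch below assumes that both reference frequencies (100 kHz for FM, 20 kHz for AM) are obtained by integer division of the 20.5-MHz crystal, which is consistent with the reference divider mentioned earlier but is not spelled out. The VCO gain computed here is only the average slope over the tuning range; the real characteristic is unlikely to be linear.

```python
def integer_divider_ratio(f_in_hz: float, f_out_hz: float) -> int:
    """Divider ratio f_in / f_out, required to be an integer."""
    ratio = f_in_hz / f_out_hz
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("frequencies are not related by an integer ratio")
    return round(ratio)


def average_vco_gain_hz_per_v(f_min_hz: float, f_max_hz: float,
                              v_min: float, v_max: float) -> float:
    """Average tuning slope over the full tuning range."""
    return (f_max_hz - f_min_hz) / (v_max - v_min)


r_fm = integer_divider_ratio(20.5e6, 100e3)    # reference divider, FM
r_am = integer_divider_ratio(20.5e6, 20e3)     # reference divider, AM
n_fm = integer_divider_ratio(237e6, 100e3)     # main divider at 237 MHz
k_vco = average_vco_gain_hz_per_v(150e6, 250e6, 0.5, 8.0)
```

Under these assumptions the reference divider ratios are 205 and 1025, the main divider ratio at a 237-MHz carrier is 2370, and the average VCO gain is about 13.3 MHz/V.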

C. Charge Pumps
Fig. 17 shows the simplified circuit diagram of the low-cur-
rent charge pump CP1. The up and dn signals from the phase Fig. 19. Settling transient for a 20-MHz tuning step.
500 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

(a)

(b)

Fig. 20. Spectral purity measurements in FM mode: (a) reference spurious breakthrough and (b) close to the carrier.

detector drive the input differential pairs, which set the currents back arrangement provided by Q3 and Q4. This prevents asym-
in the PNP current switches Q1 and Q2 on and off. The collector metry in the source and sink currents, ensuring good centring of
outputs of Q1 and Q2 are kept at equal dc levels by the dc feed- the charge-pump characteristics for all tuning voltages. Q5 and
VAUCHER: ADAPTIVE PLL TUNING SYSTEM ARCHITECTURE 501

Fig. 21. Evaluation of the FM channel—VCO purity determines SNR for V > 300 V. Fin = 97:1 MHz, AF freq = 1 kHz. SNR meas.: FMdev = 22:5
kHz; 26 dB = 2:0 V. THD meas.: FMdev = 75 kHz.

Q6 provide means for stabilization of currents and for speeding VII. CONCLUSION
up the switching of Q1 and Q2. The reset circuits monitor the
This paper described an adaptive PLL architecture for
currents in Q1 and Q2 and generate the reset signals RST Up
high-performance tuning systems. The relationships of per-
and RST Dn. These signals are fed back to reset the phase de-
formance aspects to design variables were presented. It is
tectors. The high-current charge pump CP2 is a scaled-up ver-
demonstrated that design for spectral purity performance
sion of the CP1 circuit, without the reset circuits.
often leads to suboptimal settling performance, because of
different requirements on the loop bandwidth and on the
VI. MEASUREMENTS
location of the zeros and poles of the closed-loop transfer
The measured charge-pump currents as a function of the function. The adaptive architecture described here resolves
time difference between the phase detector inputs are shown these contradictory requirements, without the necessity of
in Fig. 18. Good centering of the two charge-pump outputs switching circuit elements in the loop filter. The adaptation of
is observed, and there is enough margin for variations in loop bandwidth occurs continuously, as a function of the phase
the in-lock position of CP1. The measured settling transient error in the loop, and without interaction from outside of the
response is displayed in Fig. 19. The settling performance tuning system. During frequency jumps, high bandwidth and
complies to the settling requirements and enables inaudible high phase margin are obtained by bypassing filter sections.
background scanning in single-tuner RDS applications. When the loop is locked, the architecture allows heavy filtering
The frequency spectrum of the VCO in FM mode is presented of spurious signals. The implementation of the dead-zone
in Fig. 20(a) and (b). Fig. 20(a) shows the spurious reference block was presented, and the basic tradeoffs of the concept
breakthrough at 100 kHz to be under 81 dBc. There is yet a were discussed. The adaptive PLL was optimized for use in a
6-dB improvement in noise and spurious breakthrough before multiband (global) car-radio tuner IC, which features inaudible
the VCO signal reaches the FM mixers, due to the division by background scanning. Design and architecture of the PLL
two in the FM DIV divider (see Fig. 1). Fig. 20(b) displays the building blocks were discussed, and measurement results were
phase noise spectrum close to the carrier. Spectrum measure- presented. The integrated adaptive PLL tuning system achieved
ments done in AM mode showed a reference spurious break- state-of-the-art settling and spectral purity performance in its
through of 57 dBc, at an offset of 20 kHz from the carrier. For class (integer- PLL’s). It fulfills simultaneously the speed
AM, the improvement in phase noise and spurious performance requirements for inaudible frequency hopping and the heavy
amounts to 26 dB, due to the division by 20 in between the VCO SNR specification of 64 dB.
and the AM mixers.
Finally, the SNR and THD of the total FM receiver chain are
ACKNOWLEDGMENT
displayed in Fig. 21 as a function of the antenna input signal
level . For low values of , the noise is dominated by RF The author wishes to thank D. Kasperkovitz for technical sup-
input noise and by the quality of the building blocks in the signal port during the project, K. Kianush for his tireless disposition
processing chain: low-noise amplifier, mixers, and demodulator. in bringing the car-radio project to a successful end, H. Verei-
For high values of ( 300 V), the dominant noise source jken for the optimization and layout of the synthesizer building
becomes the LO signal. The excellent measured FM sensitivity, blocks, B. Egelmeers for the implementation and evaluation
2.0 μV for 26-dB SNR, and the ultimate SNR of 65 dB verify of the concept in a bread-board functional model, and G. van
the spectrum purity of the tuning system and of the RF channel. Werven for the measurements.
502 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

REFERENCES
[1] B. Razavi, “A 900 MHz/1.8 GHz CMOS transmitter for dual-band applications,” IEEE J. Solid-State Circuits, vol. 34, pp. 573–579, May 1999.
[2] K. Kianush and C. S. Vaucher, “A global car radio IC with inaudible signal quality checks,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 1998, pp. 130–131.
[3] W. P. Robins, Phase Noise in Signal Sources, 2nd ed, ser. 9. London, U.K.: Inst. Elect. Eng., 1996.
[4] H. Adachi, H. Kosugi, T. Awano, and K. Nakabe, “High-speed frequency-switching synthesizer using fractional N phase-locked loop,” IEICE Trans. Electron., pt. 2, vol. 77, no. 4, pp. 20–28, 1994.
[5] U. L. Rohde, RF and Microwave Digital Frequency Synthesizers. New York: Wiley, 1997.
[6] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Commun., vol. 28, no. 11, pp. 1849–1858, Nov. 1980.
[7] H. Meyr and G. Ascheid, Synchronization in Digital Communications. New York: Wiley, 1990.
[8] F. M. Gardner, Phase-Lock Techniques. New York: Wiley, 1979.
[9] B. Razavi, Ed., Monolithic Phase-Locked Loops and Clock Recovery Circuits. New York: IEEE Press, 1996.
[10] V. F. Kroupa, “Noise properties of PLL systems,” IEEE Trans. Commun., vol. C-30, pp. 2244–2252, Oct. 1982.
[11] C. S. Vaucher, “Synthesizer architectures,” in Analog Circuit Design, R. J. van de Plassche, Ed. Norwell, MA: Kluwer, 1997.
[12] C. Vaucher and D. Kasperkovitz, “A wide-band tuning system for fully integrated satellite receivers,” IEEE J. Solid-State Circuits, vol. 33, no. 7, pp. 987–998, July 1998.
[13] K. Nagaraj, “Adaptive charge pump for phase-locked loops,” U.S. Patent 5 208 546, 1993.
[14] B. Miller and B. Conley, “A multi-modulator fractional divider,” in Proc. IEEE 44th Annu. Symp. Frequency Control, 1990, pp. 559–567.
[15] Philips Semiconductors, TEA6840H global car-radio tuner datasheet, 1999.
[16] W. G. Kasperkovitz, “Digital shift register,” U.S. Patent 5 113 419, 1992.

Cicero S. Vaucher (M’98) was born in São Francisco de Assis, Brazil, in 1968. He graduated in electrical engineering from the Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, in 1989.
He joined the Integrated Transceivers group of Philips Research Laboratories, Eindhoven, The Netherlands, in 1990, where he works on implementations of low-power building blocks for frequency synthesizers, on synthesizer architectures for low-noise/high-tuning-speed applications, and on CAD modeling of PLL synthesizers.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 10, OCTOBER 2000 1445

Fast-Switching Frequency Synthesizer with a Discriminator-Aided Phase Detector
Ching-Yuan Yang, Student Member, IEEE, and Shen-Iuan Liu, Member, IEEE

Abstract—A phase-locked loop (PLL) with a fast-locked discriminator-aided phase detector (DAPD) is presented. Compared with the conventional phase detector (PD), the proposed fast-locked PD reduces the PLL pull-in time and enhances the switching speed, while maintaining better noise bandwidth. The synthesizer has been implemented in a 0.35-μm CMOS process, and the output phase noise is −99 dBc/Hz at 100-kHz offset. Under the supply voltage of 3.3 V, its power consumption is 120 mW.
Index Terms—Bandwidth adjusting, fast acquisition, fast locking, frequency synthesizers, phase detectors, phase-locked loops.

Manuscript received November 30, 1999; revised April 28, 2000. This work was sponsored by the National Science Council under Contract 88-2219-E-002-024.
C.-Y. Yang was with the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617, R. O. C. He is now with the Department of Electronic Engineering, HuaFan University, Taipei, Taiwan 223, R.O.C.
S.-I. Liu is with the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617, R. O. C.
Publisher Item Identifier S 0018-9200(00)08697-2.
0018–9200/00$10.00 © 2000 IEEE

I. INTRODUCTION
PHASE-LOCKED loop (PLL) circuits have been found to be useful wherever there is a need to synchronize a local oscillator with an independent incoming signal, such as serial data links and RF wireless communications. In order to optimize the loop performance, some features should be taken care of [1], [2]. First, to minimize output phase jitter due to external noise, the loop bandwidth should be made as narrow as possible. Second, to minimize output jitter due to internal oscillator noise, or to obtain best tracking and acquisition properties, the loop bandwidth should be made as wide as possible. These principles obviously oppose each other; and therefore some compromises between these two principles are always inevitable. The block diagram of a PLL with a discriminator-aided phase detector (DAPD) is shown in Fig. 1. One could leave the discriminator connected permanently and/or merely weight the relative contributions of the system so as to obtain the desired damping. The discriminator-aided path adds to lock the PLL quickly. Once the PLL is in lock, a better bandwidth can be maintained while the discriminator is disconnected.
In this paper, a novel DAPD is presented to reduce pull-in time and to enhance the switching speed of the PLL, while maintaining the same noise bandwidth and avoiding modulation damping. Section II describes the basic concept of the proposed structure. Sections III and IV present the realization and the measurement of the system, respectively, and Section V concludes the paper.

II. BASIC IDEA AND MODEL
A simple charge-pump PLL consists of four major blocks: the phase detector (PD), the charge-pump circuit, the loop filter, and the voltage-controlled oscillator (VCO) [3]–[6]. Fig. 2 shows the linear model of a charge-pump PLL-based frequency synthesizer. The closed-loop transfer function can be represented as

(1)

The conventional PD is implemented in conjunction with a charge-pump loop filter in the PLL, as illustrated in Fig. 3. To determine the transfer function of the PD, assume there is a time interval between two input signals and in the PD, the output current of the charge-pump circuit is a pulse of duration , and the amplitude of the charge-pump current is . In the continuous-time approximation, the average value per input signal period can be given as

(2)

The transfer function curve of a linear PD is shown in Fig. 4(a), where the vertical axis represents the charge injected into the loop filter during one period of the input signal. The characteristic of a nonlinear PD, as shown in Fig. 4(b), can be divided into two regions [7]. It has the same characteristic within the locked-in region as that of the linear PD, but the acquisition time will be reduced with the steeper characteristic outside the lock-in region. When designing a PLL with the nonlinear PD, first the central slope is determined to fulfill the requirement of noise and modulation for the PLL with a standard PD. Then, the slope near is gradually increased to improve acquisition speed. The proposed nonlinear PD can be built with delay cells and standard PD circuits, as shown in Fig. 4(c). The standard PD is a digital circuit, triggered by the positive edge of the input reference signal and the output feedback signal . Considering the delay cells with , the PDs decide the position of the phase difference among these regions. According to the value of , the charge pump will output the corresponding current controlled by the up signals or the down signals . The behavior model of the nonlinear PD can be explained by the waveforms of Fig. 4(d). According to the time difference between both input signals and , the up signals are used to increase and the down signals are used to decrease the frequency of signal . The nonlinear PD always generates the right signal to equalize the frequency of both input signals as the conventional PD. The time interval is positive

Fig. 1. Block diagram of PLL with discriminator-aided phase acquisition.

Fig. 2. Linear model of PLL frequency synthesizer.

Fig. 3. Phase detector with charge-pump filter.

(negative) where leads (lags) . When is larger than , may appear “high” level; when is smaller than may appear “high” level. As the nonlinear PD is applied, two cases can occur during different time interval :
Case 1: , the injected charge .
Case 2: , the injected charge , which can be approximated as , is very small.
Generally, the total transfer function of the PD with the current-pump circuit and the loop filter can be expressed as [8]

(3)

where is the pump current of the charge-pump circuit. The impedance is the series connection of a capacitor and a resistor with a capacitor added in parallel as shown in Fig. 2. The impedance of this filter can be

(4)

with and . The open-loop gain of the PLL equals

(5)

which has a crossover frequency of

(6)

The open-loop gain of this third-order PLL can be calculated in terms of the frequency , as follows:

(7)

and its phase margin can be determined in terms of

(8)

In order to maintain the same loop gain and phase margin for
YANG AND LIU: FAST-SWITCHING FREQUENCY SYNTHESIZER 1447

Fig. 4. (a) Characteristic of the conventional linear phase detector. (b) Characteristic of the nonlinear phase detector. (c) Block diagram of the nonlinear phase
detector. (d) Operation of the nonlinear phase detector.

Fig. 5. Block diagram of the frequency synthesizer.

the sake of stability, the charge-pump current becomes instead of outside the locked-in region, and the loop-filter resistor would become instead of while increases times, i.e., the loop bandwidth increases times. It may speed up the switching capability of the PLL. Once it is locked on the correct frequency, the PLL will then return to the low-noise operation.

III. CIRCUIT REALIZATION

A. Architecture
The designed frequency synthesizer integrates the proposed DAPD, the charge-pump circuit, a prescaler, and a VCO in a single CMOS chip. It is similar to the structure of a conventional integer-N frequency synthesizer, as shown in Fig. 5. By adding

Fig. 6. Schematic of phase detector with DAPD and charge-pump filter.

the frequency-doubling block, the output frequency can be up to 900 MHz from a 450-MHz VCO.

B. Phase Detector with DAPD and Charge-Pump Filter
A schematic diagram of the DAPD is shown in Fig. 6. The phase frequency detectors are used to compare the phase difference of both input signals. The output signal ( ) of the DAPD depends on the phase difference of both input signals whether it is larger than or not. Considering the delay cells with delay , which is very small but never negligible, the DAPD decides the operating bandwidth of the loop filter. When leads , the time difference is larger than , and is “low” and is “high.” Otherwise, when lags is smaller than , and is “high” and is “low.” In a word, if the absolute value of the time difference between input signals is larger than , may appear “high” level. Also, the charge-pump current becomes and the resistor of the loop filter becomes , i.e., . Until the absolute value of is within and are both “high,” thus is brought to “low” level, then the charge-pump current and the resistor return to and , respectively, with a narrower bandwidth for better noise rejection. However, the delay cell is adopted according to the VCO’s noise. Assuming that the phase characteristic of the signal is , should be larger than to make the DAPD work.
In our design, the loop bandwidth of the PLL equals about krad/s, and the loop gain zero and the loop pole are placed on a factor of four below and above , respectively. In addition, a pump current of 560 μA is applied and the parameter is chosen. The values of the resistors and and the capacitors and are 470 Ω, 235 Ω, 33 nF, and 2.2 nF, respectively. The open-loop gain response is depicted in Fig. 7. Curve (a) is the characteristic of the PLL with the DAPD while the bandwidth is 120 kHz. However, the PLL will return curve (b) with the bandwidth of 40 kHz when it is near in lock. These curves give the same phase margin of approximately 60°. Thus the PLL would be usually stable.
Currently, most frequency synthesizers use phase-frequency detectors (PFDs) as their PDs. A PFD is a sequential circuit which can not only detect the phase error but also provides a frequency-sensitive signal to aid acquisition when the loop is out of lock. The drawback of some conventional PFDs is a dead zone in the phase characteristic, which generates the phase error in the output signals. To solve this problem, a dynamic CMOS PFD is adopted as shown in Fig. 8(b), which is similar to the one proposed in [10]. The PFD consists of two half-transparent registers, shown in Fig. 8(a), [9] and a NAND gate. It is triggered by the negative edge of input signals. The timing diagram of the PFD is shown in Fig. 8(c). Even though the input signals are in-phase, the glitches caused by the reset path always exist. So, extra filters are added in the DAPD to remove the effect of the glitches.
So far, the positive gain of the VCO is applied from the above discussion. However, since the gain of the VCO is negative as

Fig. 7. Simulated open-loop gain Bode plot.
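The component values quoted in Section III-B can be checked against the stated design rule (zero and pole a factor of four below and above the crossover, phase margin near 60°). The sketch below is an illustration under explicit assumptions, not the authors' calculation: it assumes the 470-Ω resistor and the 33-nF capacitor form the series branch, the 2.2-nF capacitor is the shunt, the 235-Ω resistor is switched in parallel with the 470-Ω resistor in the wide-bandwidth mode, and the open loop has two poles at the origin plus this one zero and one pole.

```python
import math


def parallel(a: float, b: float) -> float:
    """Parallel combination of two resistances (or series combination of two capacitances)."""
    return a * b / (a + b)


def filter_corners_hz(r_ohm: float, c_series_f: float, c_shunt_f: float):
    """Zero and pole frequencies of a series R-C branch shunted by a small capacitor."""
    f_zero = 1.0 / (2.0 * math.pi * r_ohm * c_series_f)
    f_pole = 1.0 / (2.0 * math.pi * r_ohm * parallel(c_series_f, c_shunt_f))
    return f_zero, f_pole


def phase_margin_deg(f_cross: float, f_zero: float, f_pole: float) -> float:
    """Phase margin of a type-II, third-order loop: the zero adds phase, the pole removes it."""
    return math.degrees(math.atan(f_cross / f_zero) - math.atan(f_cross / f_pole))


C_SERIES = 33e-9   # assumed to be the series capacitor
C_SHUNT = 2.2e-9   # assumed to be the shunt capacitor
R_NARROW = 470.0                    # assumed in-lock resistor
R_WIDE = parallel(470.0, 235.0)     # assumption: 235 ohm switched in parallel

fz_narrow, fp_narrow = filter_corners_hz(R_NARROW, C_SERIES, C_SHUNT)
fz_wide, fp_wide = filter_corners_hz(R_WIDE, C_SERIES, C_SHUNT)
pm_narrow = phase_margin_deg(40e3, fz_narrow, fp_narrow)   # curve (b), 40 kHz
pm_wide = phase_margin_deg(120e3, fz_wide, fp_wide)        # curve (a), 120 kHz

print(f"narrow: zero {fz_narrow/1e3:.1f} kHz, pole {fp_narrow/1e3:.1f} kHz, PM {pm_narrow:.1f} deg")
print(f"wide:   zero {fz_wide/1e3:.1f} kHz, pole {fp_wide/1e3:.1f} kHz, PM {pm_wide:.1f} deg")
```

Under these assumptions the zero and pole land close to one quarter and four times each crossover frequency, and both modes give the same phase margin, which is consistent with the text's statement that the two curves share a margin of roughly 60°.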

Fig. 8. Implementation of phase-frequency detector. (a) Half-transparent cell. (b) Phase-frequency detector (PFD). (c) Timing diagram of PFD.

described later, and of the PD connected to the charge pump should be interchanged. The charge pump, which is based on one described in [11], is adopted. It suppresses the charge sharing from the parasitic capacitance by a pair of switched-current sources.

C. Dual-Modulus Prescaler
The dual-modulus prescaler is the high-frequency building block in the frequency synthesizer. This circuit shown in Fig. 9 divides the frequency of the VCO output signal by a factor of 32 or 33 depending on the logic value of the controlled signal mode [12]–[15]. It consists of a synchronous divide-by-4/5 counter as the first stage and an asynchronous divide-by-8 counter as the second stage. The circuits in the first stage are fully differential, while the single-ended logic circuits are used in the second stage. To reduce the supply noise, an emitter-coupled logic (ECL)-like differential logic is used in the high-speed stage [16]. In the divide-by-4/5 circuit, the DFF is a differential flip-flop. Fig. 10 shows the schematic diagram of a NAND-gate logic flip-flop. Merging the logic gates to a flip-flop saves power and increases the operating speed. The toggle flip-flops are made by true single-phase clocking (TSPC) DFFs of [12] behind a differential-to-single buffer.
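How a divide-by-4/5 stage followed by a divide-by-8 stage yields 32 or 33 can be shown with a small behavioral count. This is a sketch under the usual dual-modulus assumption, not taken from the paper's schematic: the first stage divides by 5 during exactly one of the eight first-stage cycles that make up one output period when the 33 mode is selected, and by 4 otherwise. The function and parameter names are hypothetical.

```python
def vco_cycles_per_output_cycle(select_33: bool) -> int:
    """Count VCO cycles consumed by one output period of a /4-or-5 stage followed by /8."""
    total_vco_cycles = 0
    for first_stage_cycle in range(8):  # the /8 stage needs eight first-stage periods
        # Assumption: one extra VCO cycle is swallowed once per output period in /33 mode.
        swallow = select_33 and first_stage_cycle == 0
        total_vco_cycles += 5 if swallow else 4
    return total_vco_cycles


print(vco_cycles_per_output_cycle(False))
print(vco_cycles_per_output_cycle(True))
```

The count is 8 × 4 = 32 in one mode and 7 × 4 + 5 = 33 in the other, which matches the two moduli named in the text.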

Fig. 11. VCO schematic.

Fig. 9. Functional block diagram of the dual-modulus prescaler.

Fig. 12. IC microphotograph.

Fig. 10. Schematic of the differential NAND-gate flip-flop.

This buffer is used to achieve the rail-to-rail output signal in the low-speed stage.

D. VCO
The VCO is another high-frequency building block in a frequency synthesizer. Still, an ECL-like current-mode differential pair, as shown in Fig. 11, is used as a delay cell [17], [18] to achieve high common-mode rejection in a four-stage ring oscillator. The coarse tuning of the ring-oscillator’s center frequency is achieved by the bias Vbpo1 (or through the use of a digital-to-analog converter), and a fine tuning technique is needed for the PLL voltage-control path. The gain required for the oscillator is easily determined by the ratio of M1 and M2 as the current gain. The proposed delay cell has the better noise performance because the operation of the circuit is carried out by the differential signal immune to the power-supply-injected and substrate-injected noise sources. The replica bias circuit adjusts the load over a wide range in response to a swept supply current. It insures the output swing of delay cells maintain fixed and takes a changeable bias current to cover a suitable range of different output frequencies. Bypass capacitors are also an important consideration for the replica bias and voltage reference circuits. On-chip bypass capacitors can be used to help reduce their noise contribution to the ring-oscillator delay cells.

Fig. 13. Experimentally measured VCO transfer curve.

IV. MEASUREMENT RESULTS
The synthesizer is implemented in a 0.35-μm CMOS process. The microphotograph of the fabricated frequency synthesizer is shown in Fig. 12. The loop filter is off-chip, and the output signal of the VCO is connected to a source follower. The frequency synthesizer is measured at a supply voltage of 3.3 V. The frequency of the reference signal is 14 MHz. Fig. 13 shows the measured VCO transfer function by varying the controlled voltage. The measured VCO has a monotonic frequency range of 435–485 MHz. The gain of the VCO is 32.4 MHz/V at the center frequency of 460 MHz. Fig. 14 shows the output signal spectrum (using HP8560A Spectrum Analyzer after locked) of 448 MHz with the phase noise −99 dBc/Hz at 100-kHz offset. By adding an external frequency doubler, however, the phase noise is −91 dBc/Hz at 100-kHz offset from 896-MHz carrier as shown in Fig. 15. Also, the measured waveform in the time domain is also shown in Fig. 16, and its rms and peak-to-peak jitter measured by CSA803 (Communication Signal Analyzer)

Fig. 14. Measured output spectrum of the frequency synthesizer. (a) With span
50 MHz. (b) With span 1 MHz.

Fig. 16. Measured waveform. (a) In time domain. (b) Jitter performance.

Fig. 15. Measured output spectrum of the frequency synthesizer with added
frequency doubler.
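The doubled-output measurement of Fig. 15 can be compared with what an ideal frequency doubler would give. A noiseless multiplier raises phase noise by 20·log10 of the multiplication factor, so the sketch below computes that ideal figure from the reported 448-MHz phase noise of −99 dBc/Hz and compares it with the reported −91 dBc/Hz at 896 MHz. The variable names are illustrative, and the "excess" is only the arithmetic difference, not a figure stated by the authors.

```python
import math


def multiplier_penalty_db(factor: float) -> float:
    """Ideal phase-noise increase (dB) when a carrier frequency is multiplied by `factor`."""
    return 20.0 * math.log10(factor)


measured_448_dbc_hz = -99.0   # reported at 100-kHz offset, 448-MHz carrier
measured_896_dbc_hz = -91.0   # reported at 100-kHz offset, 896-MHz carrier

ideal_896_dbc_hz = measured_448_dbc_hz + multiplier_penalty_db(2)
excess_db = measured_896_dbc_hz - ideal_896_dbc_hz

print(f"ideal after doubling: {ideal_896_dbc_hz:.1f} dBc/Hz, excess: {excess_db:.1f} dB")
```

On these numbers the external doubler costs about 2 dB more than the ideal 6 dB.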

are 18.9 and 110 ps, respectively. Another important parameter is the time that the PLL takes to lock in to a new frequency when channel switches. Fig. 17 shows the switching waveforms for a frequency jump from 448 to 462 MHz from the HP53310A Modulation Domain Analyzer. Obviously, the DAPD can improve the switching speed of the PLL. The power consumption is 120 mW and the chip area is 40 2.0 mm including the pad areas.

Fig. 17. Measured frequency jump waveform of the frequency synthesizer.
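The frequency jump of Fig. 17 can be related to the other measured numbers with simple arithmetic. The sketch below assumes the quoted 14-MHz reference is compared directly with the divided VCO output and that the VCO gain of 32.4 MHz/V (taken here as a magnitude, measured at 460 MHz) is roughly constant between 448 and 462 MHz. Both are simplifying assumptions, and the names are hypothetical.

```python
F_REF_MHZ = 14.0           # quoted reference frequency
KVCO_MHZ_PER_V = 32.4      # quoted VCO gain magnitude near 460 MHz


def ratio_to_reference(f_out_mhz: float) -> float:
    """Ratio of the synthesized frequency to the reference frequency."""
    return f_out_mhz / F_REF_MHZ


def control_voltage_step_v(f_from_mhz: float, f_to_mhz: float) -> float:
    """Approximate change in VCO control voltage for a frequency jump (linear-gain assumption)."""
    return abs(f_to_mhz - f_from_mhz) / KVCO_MHZ_PER_V


n_before = ratio_to_reference(448.0)
n_after = ratio_to_reference(462.0)
delta_v = control_voltage_step_v(448.0, 462.0)

print(f"ratio {n_before:.0f} -> {n_after:.0f}, control-voltage step about {delta_v:.2f} V")
```

On these assumptions the jump corresponds to one step of the division ratio and a control-voltage change of a little over 0.4 V, which the loop filter must slew during acquisition.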

V. CONCLUSION
In this paper, a PLL with the DAPD is implemented in a 0.35-μm CMOS process. The proposed DAPD can be applied to enhance the switching speed of the PLL while maintaining better noise bandwidth. When the DAPD is added to the PLL, it controls the charge pump and loop filter and still maintains the loop stability with the same phase margin as in the steady state. The prototype frequency synthesizer using this structure is implemented at 448 MHz, and the output phase noise is −99 dBc/Hz at 100-kHz offset. By adding a frequency doubler, the synthesizer can operate at 896 MHz, and the output phase noise is −91 dBc/Hz at 100-kHz offset from the carrier.

ACKNOWLEDGMENT
The authors would like to thank the SHARP Technology Company, Japan, for the fabrication of the chip.

REFERENCES
[1] F. M. Gardner, Phaselock Techniques, 2nd ed. New York, NY: Wiley, 1979.
[2] P. Larsson, “Reduced pull-in time of phase-locked loops using a simple nonlinear phase detector,” IEE Proc. Commun., vol. 142, no. 4, pp. 221–226, Aug. 1995.
[3] D. H. Wolaver, Phase-Locked Loop Circuit Design. Englewood Cliffs, NJ: Prentice-Hall, 1991.
[4] F. M. Gardner, “Charge-pump phase-locked loops,” IEEE Trans. Commun., vol. COM-28, pp. 1849–1858, Nov. 1980.
[5] R. E. Best, Phase-Locked Loops: Theory, Design and Applications. New York, NY: McGraw-Hill, 1984.
[6] M. V. Paemel, “Analysis of a charge-pump PLL: A new model,” IEEE Trans. Commun., vol. 42, pp. 2490–2498, July 1994.
[7] C. Y. Yang, W. C. Chung, and S. I. Liu, “Effectively reduced pull-in time of PLL with nonlinear phase comparator,” in 8th VLSI/CAD Symp., Taiwan, R.O.C., Aug. 1997, pp. 205–208.
[8] D. Byrd, C. Davis, and W. O. Keese, “A fast locking scheme for PLL frequency synthesizer,” National Semiconductor, Santa Clara, CA, Application Note, July 1995.
[9] J. Yuan and C. Svensson, “Fast CMOS nonbinary divider and counter,” Electron. Lett., vol. 29, pp. 1222–1223, June 1993.
[10] S. Kim, K. Lee, Y. Moon, D. K. Jeong, Y. Choi, and H. K. Kim, “A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.
[11] I. A. Young, J. K. Greason, and K. L. Wong, “A PLL clock generator with 5- to 110-MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[12] Q. Huang and R. Rogenmoser, “Speed optimization of edge-triggered CMOS circuits for gigahertz single-phase clocks,” IEEE J. Solid-State Circuits, vol. 31, pp. 456–465, Mar. 1996.
[13] B. Chang, J. Park, and W. Kim, “A 1.2-GHz CMOS dual-modulus prescaler using new dynamic D-type flip-flop,” IEEE J. Solid-State Circuits, vol. 31, pp. 749–752, May 1996.
[14] P. Larsson, “High-speed architecture for a programmable frequency divider and a dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 31, pp. 744–748, May 1996.
[15] C. Y. Yang, G. K. Dehng, J. M. Hsu, and S. I. Liu, “New dynamic flip-flops for high-speed dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 33, pp. 1568–1571, Oct. 1998.
[16] F. Piazza and Q. Huang, “A low-power CMOS dual-modulus prescaler for frequency synthesizer,” IEICE Trans. Electron., vol. E80-C, pp. 314–319, Feb. 1997.
[17] S. J. Lee, B. Kim, and K. Lee, “A fully integrated low-noise 1-GHz frequency synthesizer design for mobile communication application,” IEEE J. Solid-State Circuits, vol. 32, pp. 760–765, May 1997.
[18] D. Y. Jeong, S. H. Chae, W. C. Song, and G. H. Cho, “High-speed differential-voltage clamped current-mode ring oscillator,” Electron. Lett., vol. 33, pp. 1102–1103, June 1997.

Ching-Yuan Yang (S’97) was born in Miaoli, Taiwan, R.O.C., in 1967. He received the B.S. degree in electrical engineering from the Tatung Institute of Technology, Taipei, Taiwan, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, in 1996 and 2000, respectively.
He is currently an Assistant Professor with the Department of Electronic Engineering, Huafan University, Taiwan. His research interests are in the area of integrated circuits and systems for high-speed interfaces and wireless communications.

Shen-Iuan Liu (S’88–M’93) was born in Keelung, Taiwan, R.O.C., on April 4, 1965. He received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1987 and 1991, respectively.
During 1991 to 1993, he served as a Second Lieutenant in the Chinese Air Force. During 1991 to 1994, he was an Associate Professor in the Department of Electronic Engineering, National Taiwan Institute of Technology. He joined the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan in 1994 and has been a Professor since 1998. He holds nine U.S. patents and fourteen R.O.C. patents, with some pending. His research interests are in analog and digital integrated circuits and systems.
2232 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998

Low-Power Dividerless Frequency Synthesis Using Aperture Phase Detection
Arvin R. Shahani, Derek K. Shaeffer, Student Member, IEEE, S. S. Mohan, Student Member, IEEE,
Hirad Samavati, Student Member, IEEE, Hamid R. Rategh, Student Member, IEEE,
Maria del Mar Hershenson, Student Member, IEEE, Min Xu, Student Member, IEEE,
C. Patrick Yue, Student Member, IEEE, Daniel J. Eddleman, Student Member, IEEE,
Mark A. Horowitz, Senior Member, IEEE, and Thomas H. Lee, Member, IEEE

Abstract—A phase-locked-loop (PLL)-based frequency synthesizer incorporating a phase detector that operates on a windowing technique eliminates the need for a frequency divider. This new loop architecture is applied to generate the 1.573-GHz local oscillator (LO) for a Global Positioning System receiver. The LO circuits in the locked mode consume only 36 mW of the total 115-mW receiver power, as a result of the power saved by eliminating the divider. The PLL’s loop bandwidth is measured to be 6 MHz, with a reference spurious level of −47 dBc. The front-end receiver, including the synthesizer, is fabricated in a 0.5-μm, triple-metal, single-poly CMOS process and operates on a 2.5-V supply.
Index Terms—Frequency synthesizers, Global Positioning System, phase detection, phase-locked loops, radio-frequency integrated circuits, radio receivers.

Manuscript received May 7, 1998; revised August 4, 1998.
The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305 USA.
Publisher Item Identifier S 0018-9200(98)09432-3.
0018–9200/98$10.00 © 1998 IEEE

I. INTRODUCTION
THE growing demand for portable, low-cost wireless-communication devices has spurred interest in radio-frequency integrated circuits. Part of offering a completely integrated solution involves identifying a low-power, monolithic gigahertz local oscillator (LO) implementation. A quartz-crystal-based oscillator cannot be used directly for the LO, since the fundamental modes of inexpensive quartz crystals are limited to approximately 30 MHz [1], and overtone orders of 50 are impractical. However, a crystal oscillator can be used as the reference in a static-modulus phase-locked-loop (PLL) frequency synthesizer. As is well known, the stability of the frequency-multiplied reference is retained by a wideband loop. This ability to synthesize a stable high-frequency source is beneficial, but it comes at the expense of significant power consumption. This paper addresses the power issue by introducing a new type of phase detector capable of phaselocking the synthesizer’s frequency-multiplied output to its reference input, without the use of a divider. Eliminating the need for the divider allows the synthesis of a 1.573-GHz output on only 36 mW of power in this technology.
Section II examines the PLL-based LO used for the Global Positioning System (GPS) receiver architecture shown in Fig. 1 [2] and introduces the element that eliminates the need for a divider: the aperture phase detector (APD). Treatment begins at the architectural level and descends into the APD’s detailed nature. Both the theory and the implementation of an APD are covered. Section III presents experimental results on the APD PLL.

Fig. 1. GPS receiver architecture.

Fig. 2. Integer-N synthesizer block diagram.

II. PLL

A. Architecture
The conventional and widely used implementation of the PLL frequency synthesizer with static modulus is the integer-N synthesizer [3]. The traditional divide-by-N block shown in Fig. 2 can be realized with a single counter. However, there are two drawbacks associated with the divider: power consumption and switching noise. Power consumption is large, particularly at high frequencies, because of the well-known CV²f relationship. For example, a recently published 1.6-
SHAHANI et al.: FREQUENCY SYNTHESIS USING APD 2233


Fig. 4. APD synthesizer block diagram.


Fig. 5. Idealized APD state diagram.
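The behavior of the idealized APD described in the text around Fig. 5 can be sketched as a small function that evaluates one window. This is an illustration only, with hypothetical names: it assumes that the reference-input edge sets the "late" terminal and the VCO-input edge sets the "early" terminal (inferred from the statement that the VCO is late when "late" is set first), that only the first edge of each input inside the window counts, and that both terminals are reset when the window closes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ApdWindowResult:
    late: bool                    # set by the first reference edge inside the window
    early: bool                   # set by the first VCO edge inside the window
    first_set: Optional[str]      # "late", "early", or None if undecided
    time_error: Optional[float]   # VCO edge time minus reference edge time


def idealized_apd(window_open: float, window_close: float,
                  ref_edges: Sequence[float], vco_edges: Sequence[float]) -> ApdWindowResult:
    """Evaluate one aperture; edges outside [window_open, window_close) are ignored."""
    ref_in = [t for t in ref_edges if window_open <= t < window_close]
    vco_in = [t for t in vco_edges if window_open <= t < window_close]
    t_ref = min(ref_in) if ref_in else None   # later edges in the window are ignored
    t_vco = min(vco_in) if vco_in else None
    if t_ref is None or t_vco is None:
        return ApdWindowResult(t_ref is not None, t_vco is not None, None, None)
    error = t_vco - t_ref                     # positive when the VCO edge arrives late
    first = "late" if error > 0 else "early" if error < 0 else None
    return ApdWindowResult(True, True, first, error)


# Example: reference period 4, VCO period 1 (a frequency ratio of 4), VCO lagging by 0.1.
vco = [0.1 + k for k in range(8)]
result = idealized_apd(3.6, 4.6, ref_edges=[0.0, 4.0], vco_edges=vco)
print(result.first_set, result.time_error)
```

Because the window is one VCO period wide, exactly one VCO edge falls inside it, so no divider is needed to decide which VCO edge to compare with the reference edge.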

GHz integer-N synthesizer built in a 0.6-μm CMOS technology reported a total power consumption of 90 mW, of which 22.5 mW were used by the divider [4]. A further disadvantage of the divider is the on-chip interference generated by its high-speed digital transitions. This is particularly worrisome if the synthesizer is to be integrated with the front end’s sensitive low-noise amplifier.
To reduce power consumption and high-frequency noise, a windowing technique that eliminates the divide-by-N block for phase comparisons is investigated here. To appreciate how windowing may be of benefit, it is worthwhile to revisit the phenomenon of locking in a conventional PLL. To retain phaselock, it is necessary to align every Nth rising or falling edge of the voltage controlled oscillator (VCO) with a corresponding reference edge. Phaselock is demonstrated in Fig. 3(a) for N = 4, where every fourth rising VCO edge lines up with a rising reference edge. A divider with a phase-frequency detector (PFD) accomplishes edge alignment by first dividing down the VCO by the right multiple so that edge alignment is unambiguous, as pictured in Fig. 3(b). Because the PFD compares phase over the entire reference cycle, a PFD cannot phaselock two inputs at different frequencies. In fact, it is precisely this property that makes the PFD popular.
Now consider using a PFD without a divider. Clearly, there would be an edge ambiguity problem, rendering the PFD quite ineffective, as seen in Fig. 3(c). The reason is that the PFD responds to every edge of the VCO, evidenced by the charge pump current’s net negative value. This erroneously commands the VCO to decrease its frequency. However, by restricting the time interval during which phase is examined, one may eliminate the edge ambiguity, and hence the frequency divider. The dashed boxes in Fig. 3(d) define the window during which phase may be compared, even if the two inputs are of different frequency. The window can be controlled by the reference time base, since it periodically opens at that rate. Furthermore, the window need only be wide enough so that a VCO edge falls within it, which is equivalent to requiring that the window be active for a time longer than the instantaneous VCO period. No dividers are thus necessary to maintain phaselock, and this phase detector, called an APD, can operate with two inputs that are at different frequencies, as shown in Fig. 4.
A more substantive description of the APD’s operation is provided in Fig. 5, which illustrates the state diagram for an idealized APD. When the window opens, the phase detector becomes active. The -input rising edge sets the (denoting “late”) terminal true, and the -input rising edge sets the

Fig. 3. Phaselock techniques. (a) Phaselocked signals. (b) Phaselock with a divider and PFD. (c) PFD alone; negative charge pump current commands the VCO to decrease its frequency, breaking phaselock. (d) Phaselock with an APD.

Fig. 7. APD PLL block diagram in lock.

where is the reference phase and

(3)

Fig. 6. APD synthesizer block diagram with integer-N FAA.

where is the VCO phase and is the angular VCO frequency. The average charge pump current over one reference
cycle can thus be written as
(denoting “early”) terminal true. Subsequent edges of the
-input are ignored until the next window opens. The time (4)
difference between the rising edges of the and signals
is proportional to the phase error between the reference phase When the loop is in lock, , giving
and the VCO phase. If is set first, the VCO phase is late;
and conversely, if is set first, the VCO phase is early. When (5)
the window closes, the and terminals are reset (to false).
Fig. 1 shows that some type of frequency acquisition aid where is the phase-detector gain constant. Note that even
(FAA) is required to bring an APD-based loop initially into though there is no explicit divider in the loop, the VCO phase
lock. This necessity is a consequence of restricting phase is divided by in (5), just as in a conventional loop.
comparisons to a window, which eliminates the phase de- This model can be used in place of the APD in Fig. 4, and
tector’s ability to perform frequency detection. This issue is the other blocks in the same figure can be replaced by their
discussed in further detail in Section II-D. For this work, an corresponding linear time-invariant (LTI) models, yielding
external acquisition aid was used for experimental purposes. the overall system model shown in Fig. 7. Fig. 7 is an LTI
An integrated implementation of the acquisition aid, Fig. 6, representation of the APD PLL in lock, from which the phase
uses the traditional divider with PFD to lock the loop and transfer function is readily found to be
then powers down the acquisition aid, transferring control to
the low-power APD. An APD can be used once in lock because (6)
the reference is derived from a stable crystal oscillator.
where is the VCO gain constant and is the loop
B. Loop Theory filter’s impedance, expressed in the -domain.
Having provided an overview of APD operation, we now
develop a linearized APD PLL model relating input and output C. APD Characteristic ( Versus )
phase. This model is important for quantitative loop design and The derivation in the previous subsection treats the APD
ensures that the synthesized output has the desired stability for small phase errors. For completeness, it is instructive to
and noise performance. examine the response of the APD to arbitrary phase errors.
From the description of the late and early APD signals given Now, the delay between the time the window opens and the
in the previous subsection, the average charge pump current time at which the reference edge occurs becomes important.
over one reference cycle is given by This delay is designated by , which is a positive quantity
whose least restrictive range is limited to , where is
(1)
the reference period. However, the loop can lock if and only
where is the magnitude of the charge pump current, is if is in the interval , where is the VCO period.
the time of the first VCO rising edge in the window, is the Otherwise, the first VCO edge within the window will always
time of the reference rising edge in the window, and is the precede the reference edge.
angular reference frequency. The current can be expressed From Fig. 8, it is apparent that the characteristic will be
as a function of the reference and VCO phases by relating these periodic in VCO phase, because when the VCO waveform has
phases to and , assuming small phase errors. Expressions moved one VCO period to the right, the situation is identical
relating edge time to signal phase are to the start. As the VCO waveform moves to the right, the
time difference varies proportionally with phase error
(2) . Therefore, to find the APD’s characteristic, and need
to be calculated at only two points, and the remainder of the
SHAHANI et al.: FREQUENCY SYNTHESIS USING APD 2235

characteristic is generated by connecting these endpoints. and are first calculated for .

(7)

(8)

Next, and are calculated at the other extreme where

(9)

(10)

From this information, the portion of the APD characteristic shown in Fig. 9 can be constructed.

Fig. 8. Position of VCO and reference edge in window.

Fig. 9. APD characteristic over (2π)/N interval.

The influence of two parameters (d, the delay between the time the window opens and when the reference edge occurs, and N, the ratio between the VCO and reference frequencies) warrants special attention. Decreasing d shifts the characteristic diagonally up (along the line of the characteristic), and increasing d shifts the characteristic diagonally down. It is desirable to have d equal to half the synthesized frequency's period. By designing for this condition, the APD characteristic will be centered about to provide a symmetrical correction range. The parameter N affects the phase error's periodicity, with larger values increasing the periodicity. Fig. 10 shows the complete APD characteristic (a variation in ) for the specific case where d = T_v/2 and N = 4.

Fig. 10. APD characteristic for d = T_v/2 and N = 4.

D. Subharmonic-Lock Modes

The existence of subharmonic-lock modes explains the need for an acquisition aid. During each window, which opens periodically at the reference rate, the APD makes a single phase comparison. It is this property that allows an APD to phaselock the VCO's output to an integer multiple of the reference input. But the ability to examine the phase of two signals at different frequencies introduces more modes than just the desired integer-lock modes. Additional subharmonic-lock modes occur if the net current delivered over multiple cycles of the reference is zero, allowing the loop to stay locked at an undesired frequency [5].

Fig. 11. M = 2, N = 7 subharmonic-lock mode.

If we designate by M the number of reference cycles over which the net charge delivered to the loop filter is zero, then an expression relating the reference frequency to the VCO frequency when phaselock occurs is . Fig. 11 displays the points on the APD characteristic between which the loop ping-pongs for the specific case where M = 2 and N = 7. Because M = 2, the charge pump alternates between pumping up on one cycle and pumping down on the next cycle, balancing the charge to the loop filter over two cycles.
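
As a rough illustration of how subharmonic-lock modes sit between the integer-lock frequencies, the sketch below enumerates candidate lock frequencies. It assumes that a mode is described by M·f_vco = N·f_ref (N VCO cycles fit into M reference cycles), which is consistent with the M = 2, N = 7 case of Fig. 11 but is an assumption here, since the original expression is not reproduced. The function name and the choice of a 143-MHz reference are illustrative only.

```python
from fractions import Fraction

def lock_frequencies(f_ref_hz, n_min, n_max, m_max):
    """Return sorted (frequency, N, M) tuples for modes with M * f_vco = N * f_ref.

    Assumption: a mode is labeled by coprime integers (N, M); M = 1 is an
    integer-lock mode and M > 1 is a subharmonic-lock mode.
    """
    modes = {}
    for m in range(1, m_max + 1):
        for n in range(n_min * m, n_max * m + 1):
            ratio = Fraction(n, m)  # reduced to lowest terms automatically
            modes[ratio] = (ratio.numerator, ratio.denominator)
    return [(float(r) * f_ref_hz, n, m) for r, (n, m) in sorted(modes.items())]

if __name__ == "__main__":
    # Hypothetical example: 143-MHz reference, ratios between 10 and 12, M <= 2.
    for f, n, m in lock_frequencies(143e6, 10, 12, 2):
        kind = "integer" if m == 1 else "subharmonic"
        print(f"N={n:3d} M={m}  f_vco={f / 1e6:8.1f} MHz  ({kind})")
```

Under these assumptions the M = 2 modes fall halfway between adjacent integer-lock frequencies, and allowing larger M packs further modes in between.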
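
Returning briefly to the loop theory of Section II-B, the small-signal behavior of an ideal APD can be sketched numerically. The code assumes that the charge pump delivers a constant current I_p for the interval between the reference edge and the first VCO edge in the window, so that the average current over one reference cycle is I_p·Δt/T_ref and the gain referred to reference phase is I_p/(2π). The symbol names, the sign convention, and the 46.5-µA pump current are assumptions chosen for illustration; they are not taken from the original equations.

```python
import math

def apd_average_current(i_p, t_vco_edge, t_ref_edge, f_ref):
    """Average charge-pump current over one reference cycle (ideal APD sketch).

    Assumption: the pump delivers +i_p while the VCO edge lags the reference
    edge and -i_p while it leads; the sign convention is illustrative.
    """
    return i_p * (t_vco_edge - t_ref_edge) * f_ref

def apd_gain(i_p):
    """Ideal detector gain in A/rad, referred to reference phase."""
    return i_p / (2.0 * math.pi)

def phase_error_at_reference(t_vco_edge, t_ref_edge, f_ref):
    """Timing offset expressed as reference phase in radians."""
    return 2.0 * math.pi * f_ref * (t_vco_edge - t_ref_edge)

if __name__ == "__main__":
    i_p, f_ref = 46.5e-6, 143e6   # hypothetical pump current and reference
    dt = 50e-12                   # VCO edge 50 ps after the reference edge
    i_avg = apd_average_current(i_p, dt, 0.0, f_ref)
    theta = phase_error_at_reference(dt, 0.0, f_ref)
    # Both routes to the average current should agree: i_avg = K_d * theta.
    print(i_avg, apd_gain(i_p) * theta)
```

In this idealized picture the detector gain scales directly with the pump current, which is why a gain discrepancy would first be attributed to the pump current.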

These subharmonic-lock modes are problematic because they are spaced, in frequency, closer than the neighboring integer-lock modes. However, the APD favors integer over subharmonic modes for two reasons. First, the loop's bandwidth imposes a limit on M. If the number of reference cycles over which the charge pump current averages to zero grows too large, the loop will act on partial information because the loop responds to signals averaged over a loop period. The loop period is the reciprocal of the closed-loop bandwidth. Another reason the APD favors integer over subharmonic modes is that the subharmonic modes have a lower detector gain because the VCO edge arrives at a different time in each of the M cycles. If the APD characteristic is nonlinear, then the overall detector gain is the average of the individual linearized detector gains.

Using an FAA to ensure frequency lock eliminates the concern of locking in a subharmonic mode. Once lock has been achieved, and control transferred to the APD, the APD is capable of maintaining lock at the desired frequency.

E. APD Circuit Implementation

Fig. 12 shows a circuit implementation of an APD. The reference clock (which has about a 50% duty cycle) is shaped by the structure preceding the delay to have fast falling edges, since these are the edges that enable the precharged gates. When the reference input is low, is off and is on, causing the output to be high. After the reference rises, shuts off before turns on due to the two inverter delays. Therefore, does not fight to pull the output low, and creates a fast falling edge. The window opens on this fast falling edge. The delay between the opening of the window and the reference edge is determined by two inverters with a capacitor in the middle. The APD uses two precharged gates to evaluate the reference and VCO phases. An advantage of precharged gates is that they only respond once while active. In this case, the precharged gates are precharged low, and rise on detection of low levels. Also, the precharge action does not affect the loop to first order, because the state has the same action as the state .

Fig. 12. APD circuit diagram.

The behavior of this APD circuit differs somewhat from the ideal APD discussed in Section II-A. In particular, the circuit implementation responds to falling edges instead of rising edges, and more precisely, the precharged gates act as level detectors of a low voltage level instead of as edge detectors.

A simulation of the APD characteristic over a interval is shown in Fig. 13, where . The phase error is plotted against the average charge pump current over one reference cycle , and includes the nonidealities of the charge pump as well. The flat section near 0.2 rad is where the signal driving the charge pump is compressing due to the level detection nature of the precharged gates. Another imperfection in this circuit's APD characteristic is the section with finite negative slope instead of a discontinuity. From the characteristic, the phase detector's gain constant in a state of zero static phase error is evaluated to be 7.4 µA/rad.

F. PLL Circuit Implementation

In Section II-B, a general model for a locked APD PLL was developed, expressing the closed-loop phase transfer function in terms of the loop filter's s-domain impedance and an idealized VCO. We now provide specific expressions for the loop as actually implemented.

The loop filter used is the conventional network shown in Fig. 14, whose s-domain impedance is

(11)

A single-pole amplifier was used to interface with the VCO's varactor; thus the ideal VCO transfer function must be modified to

(12)

where is the 3-dB bandwidth of the VCO's preamplifier. Using (11) and (12) in (6) enables us to write the complete phase transfer function of the implemented APD PLL as shown in (13). In the next section, we compare measured data to (13).

(13)

III. EXPERIMENTAL RESULTS

A test chip (see Fig. 15) containing a copy of the APD PLL used in the complete GPS receiver is used to evaluate the APD PLL. Two separate tests are performed: one to verify the derived closed-loop transfer function of the APD PLL and the other to observe the synthesized LO spectrum for the GPS receiver. In the second test, the synthesized LO is also checked

with a microwave frequency counter to verify its long-term stability.

Fig. 13. Simulated APD circuit characteristic.

Fig. 14. Loop filter used in APD PLL.

Fig. 15. Die photograph of PLL on GPS receiver test chip.

Fig. 16. PLL test setup number 1.

Fig. 16 shows the experimental setup for the first test. Phase noise is measured for offsets from 1 kHz to 10 MHz with the HP8563E spectrum analyzer, which has special phase-noise-measurement software. Ten MHz is used as the upper limit since the loop is designed to have a bandwidth less than 10 MHz. Beyond the loop's bandwidth, the PLL's phase noise is determined by the VCO's phase noise, making measurement of the PLL's transfer function difficult. One of the largest factors affecting measurement accuracy is the noise floor of the instrument. To minimize this error source, measurements of the floor with a clean source are performed first. These results are later used to calibrate the data. Reference phase noise and PLL phase noise are also both measured. After some data processing, the PLL's closed-loop phase transfer function is determined.

Fig. 17 shows the measured |H(f)| and the predicted |H(f)|, from (13), for the case where the reference frequency is 143 MHz and the VCO frequency is 1.573 GHz (N = 11). The seven loop parameters in (13) are set as follows: is known; is taken from measured VCO data; , and

are taken to be their designed loop filter values; is calculated from the technology data; and is fit. The fit value of , 6.6 µA/rad, is a little less than the simulated value noted in Section II-E, 7.4 µA/rad. Since for an ideal APD, one could argue that the discrepancy in is due to an actual pump current that is lower than the pump current used in simulations. But when is measured, it is found to be correct. Still, the discrepancy in is readily explained. The simulation in Fig. 13 establishes an upper bound on because it is measured in a state of zero static phase error. The simulated APD circuit characteristic illustrates that the detector gain (i.e., the slope) decreases the farther that one departs from zero radians. The charge pump is known to have some offset; thus, the loop has some static phase error in lock to overcome the offset, resulting in a slightly lower .

Fig. 17. Measured and predicted |H(f)|.

Fig. 18. PLL test setup number 2.

Fig. 18 shows the experimental setup for the second test. The LO spectrum is measured with the HP8563E spectrum analyzer, and the frequency is checked with an HP5350B microwave frequency counter. Fig. 19 displays the synthesized output spectrum, in which the PLL's ability to track the low close-in phase noise of the reference can be seen. The visible skirts are due to the VCO's phase noise outside the 6-MHz bandwidth of the PLL. Spurious tones at −47 dBc are primarily due to control-line ripple resulting from charge pump leakage. In GPS applications, the measured spurious level is acceptable because of the absence of blockers at the corresponding offset frequencies. In more demanding applications, one may reduce ripple through improved charge pump design and the use of analog phase interpolation [3].

Fig. 19. LO spectrum.

TABLE I
MEASURED APD PLL PERFORMANCE

Table I provides a summary of the APD PLL's performance. The PLL has a wide bandwidth of 6 MHz, and the APD circuit consumes only one-quarter of the total synthesizer power. With the elimination of the divider, the main power consumer in the synthesizer is now the VCO.

IV. CONCLUSION

A new method for performing phase detection that eliminates the divide-by-N function within a PLL has been presented. A frequency acquisition aid circuit, which can be powered down once lock is established, is required. By using an aperture phase detector, a 1.573-GHz local oscillator can be synthesized on roughly half the power of a loop containing a conventional divider. Additionally, elimination of the divider reduces the frequency of transitions that might cause substrate and supply bounce. The power savings and noise reduction make the APD PLL an attractive design for low-power, integrated frequency synthesizers.

ACKNOWLEDGMENT

The authors gratefully acknowledge Rockwell International for fabricating the receiver and Dr. C. Hull and Dr. P. Singh for their valuable assistance. In addition, the authors acknowledge Tektronix, Inc., for supplying simulation tools and E. McReynolds for his invaluable support of, and assistance with, CMOS modeling issues. Last, the authors thank IBM for generous student support through IBM fellowships.

REFERENCES

[1] M-tron Engineering Notes, Dec. 1997.
[2] D. K. Shaeffer, A. R. Shahani, S. S. Mohan, H. Samavati, H. R. Rategh, M. Hershenson, M. Xu, C. P. Yue, D. J. Eddleman, and T. H. Lee, "A 115-mW, 0.5-µm CMOS GPS receiver with wide dynamic-range active filters," IEEE J. Solid-State Circuits, vol. 33, pp. 2219–2231, Dec. 1998.
[3] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge: Cambridge Univ. Press, 1998.
[4] J. F. Parker and D. Ray, "A 1.6 GHz CMOS PLL with on-chip loop filter," IEEE J. Solid-State Circuits, vol. 33, pp. 337–343, Mar. 1998.
[5] F. M. Gardner, Phaselock Techniques, 2nd ed. New York: Wiley, 1979.

Arvin R. Shahani, for a photograph and biography, see this issue, p. 2041.

Derek K. Shaeffer (S'98), for a photograph and biography, see this issue, p. 2230.

S. S. Mohan (S'98), for a photograph and biography, see this issue, p. 2231.

Hirad Samavati (S'98), for a photograph and biography, see this issue, p. 2041.

Hamid R. Rategh (S'98), for a photograph and biography, see this issue, p. 2231.

Maria del Mar Hershenson (S'98), for a photograph and biography, see this issue, p. 2231.

Min Xu (S'97), for a photograph and biography, see this issue, p. 2231.

C. Patrick Yue (S'93), for a photograph and biography, see this issue, p. 2231.

Daniel J. Eddleman (S'98), for a photograph and biography, see this issue, p. 2231.

Mark A. Horowitz (S'77–M'78–SM'95) received the B.S. and M.S. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1978 and the Ph.D. degree from Stanford University, Stanford, CA, in 1984.
He is the Yahoo Founders Professor of Electrical Engineering and Computer Science at Stanford. His research area is in digital system design. He has led a number of processor designs including MIPS-X, one of the first processors to include an on-chip instruction cache; TORCH, a statistically scheduled, superscalar processor; and FLASH, a flexible DSM machine. He has also worked on a number of other chip design areas, including high-speed memory design, high-bandwidth interfaces, and fast floating point. In 1990, he took a leave from Stanford to help start Rambus, Inc., a company designing high-bandwidth memory interface technology. His current research includes multiprocessor design, low-power circuits, memory design, and high-speed links.
Dr. Horowitz received a 1985 Presidential Young Investigator Award and an IBM Faculty Development Award, as well as the 1993 Best Paper Award from the International Solid-State Circuits Conference.

Thomas H. Lee (S'87–M'87), for a photograph and biography, see this issue, p. 2041.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001 777

A 1.8-GHz Self-Calibrated Phase-Locked Loop with Precise I/Q Matching

Chan-Hong Park, Student Member, IEEE, Ook Kim, Member, IEEE, and Beomsup Kim, Senior Member, IEEE

Abstract: This paper describes a 1.8-GHz self-calibrated phase-locked loop (PLL) implemented in 0.35-µm CMOS technology. The PLL operates as an edge-combining type fractional-N frequency synthesizer using multiphase clock signals from a ring-type voltage-controlled oscillator (VCO). A self-calibration circuit in the PLL continuously adjusts delay mismatches among delay cells in the ring oscillator, eliminating the fractional spur commonly found in an edge-combining fractional divider due to the delay mismatches. With the calibration loop, the fractional spurs caused by the delay mismatches are reduced to −55 dBc, and the corresponding maximum phase offset between the multiphase signals is less than 0.2°. The frequency synthesizer PLL operates from 1.7 to 1.9 GHz and the closed-loop phase noise is −105 dBc/Hz at 100-kHz offset from the carrier. The overall circuit consumes 20 mA from a 3.0-V power supply.

Index Terms: Delay mismatch, fractional-N frequency synthesizer, I/Q signal generation, PLL, ring oscillator, self-calibration.

Manuscript received June 15, 2000; revised October 15, 2000.
C.-H. Park and B. Kim are with the Department of Electrical Engineering and Computer Sciences, Korea Advanced Institute of Science and Technology, Taejon 305-701, Korea (e-mail: bkim@ee.kaist.ac.kr).
O. Kim was with SK Telecom, Sungnam Kunggi-do 463-020, Korea. He is now with Silicon Image, Inc., Sunnyvale, CA 94085 USA.
Publisher Item Identifier S 0018-9200(01)03031-1.
0018–9200/01$10.00 © 2001 IEEE

I. INTRODUCTION

In a modern wireless digital data transmission system, both in-phase (I) and quadrature-phase (Q) channels are used for channel efficiency. Therefore, precise I and Q clock signals for the modulation/demodulation of both I and Q channels are needed for high-performance digital transceivers. Since any gain or phase imbalance between I and Q signals reduces the dynamic range and degrades the bit-error rate (BER) of the receivers, the accuracy of the 90° phase difference between I and Q clock signals generated from a local oscillator (LO) must be maintained as far as possible. Especially for homodyne or image rejection receivers, the effects of I/Q mismatch become more critical [1].

A lossy phase shifter utilizing resistors and capacitors [2], [3] or a quadrature generator using a frequency divider [4], [5] is widely used to derive quadrature-phase clock signals from a single-phase oscillator. However, the RC phase shifter circuit often produces phase and amplitude errors due to the mismatches of components, and the quadrature generator may suffer from phase imbalances due to the inexact input clock duty cycle. For the correction of these I/Q phase errors, some analog or digital calibration techniques have been adopted [6], [7]; however, increased phase noise or complexity caused by the added calibration system limits the performance of the system.

Fig. 1. I/Q signal generation from a ring oscillator.

On the other hand, a ring oscillator may be used to produce the quadrature clock signals without such a phase shifter [8], [9]. The multiphase clock signals from a multistage ring oscillator are easily converted into the I/Q clock signals, as shown in Fig. 1. However, in practice, the mismatches between the delay cells cause significant I/Q phase errors, and an additional phase shifter is therefore required.

A self-calibration technique to eliminate the delay mismatches between the delay cells in a ring oscillator is proposed in this paper. A delay calibration loop in the phase-locked loop (PLL) measures the delay mismatch in each delay cell at a time and eliminates it through an extra control line attached to the delay cells. The calibration loop automatically operates in the background and barely interferes with the main PLL loop behavior.

A prototype 1.8-GHz edge-combining fractional-N frequency synthesizer equipped with the calibration circuit is implemented. Fig. 2 shows the block diagram. Thanks to the calibration technique, the fractional spur caused by the mismatches is attenuated by 25 dB and the I/Q phase offset is maintained within 0.2°.

The structure of the proposed fractional-N frequency synthesizer is presented in Section II. Section III describes the delay mismatch problem of the edge-combining-type synthesizer PLL. Section IV shows the underlying algorithm for the proposed mismatch calibration scheme. Section V presents the circuit implementation of the self-calibrated frequency synthesizer PLL. Finally, experimental results are shown in Section VI, and conclusions are given in Section VII.

II. EDGE-COMBINING FRACTIONAL-N FREQUENCY SYNTHESIZER

The frequency resolution of a conventional integer-N frequency synthesizer is the same as the reference frequency of

the synthesizer PLL. Therefore, narrow channel spacing is accompanied by a small loop bandwidth, which leads to slow dynamics [10]. In the case of a fractional-N frequency synthesizer, the output frequency is a fractional multiple of the reference frequency, so that narrow channel spacing is achieved along with a higher phase detector frequency. Consequently, the loop bandwidth can be widened, and faster settling time and lower close-in phase noise of the frequency synthesizer are achieved [10].

Fig. 2. Block diagram of the proposed fractional-N frequency synthesizer PLL with a self-calibration loop.

The noninteger dividing values in a fractional-N synthesizer can be realized by the periodic dithering of the dividing ratio between integer values [11]. However, the dithering leads to a periodic phase error and introduces spurious tones in the output spectrum. To resolve the spurious noise problem, several methods, such as phase interpolation using a digital-to-analog (D/A) converter [12], or altering the dithering pattern using a Σ–Δ modulator [10], [11] have been proposed and used, but they still have limitations such as increased power dissipation and spurious noise.

On the other hand, if multiphase clock signals are available, the noninteger divider is directly implemented without dithering or interpolation. Fig. 3 shows the operation of the fractional-N divider using the proposed edge-combining technique. The output of an integer frequency divider is sequentially latched by the eight-phase voltage-controlled oscillator (VCO) output signals, as shown in Fig. 3(a). As a result, a set of phase-shifted waveforms is obtained, and the amount of the phase shift is 1/8 of the VCO period. By manipulating the waveform set, a waveform whose period is a fractional multiple of the VCO period is generated. For example, if the desired fractional value is 1/8, the pulse-switching block multiplexes the delayed waveforms with the following periodic sequence:

and the output period of the fractional divider becomes , as shown in Fig. 3(b). Here, the division ratio may periodically change in order to create the noninteger division ratio. For example, if the fractional ratio of 1/8 is required as shown in Fig. 3(b), the division ratio should periodically switch from to when the output pulse is multiplexed from out8 to out1.

Fig. 3. (a) Block diagram of the fractional divider. (b) Operation of the fractional divider when k = 1.

When the PLL is locked, the output frequency of the PLL is , and the synthesizer operates as a modulo-8 fractional-N frequency synthesizer. The value of k determines the switching sequence of the pulse-switching block. The switching sequence and division ratio are controlled by separate control logic.

III. DELAY MISMATCH PROBLEM AND FRACTIONAL SPURS

Fig. 4. Edges of the multiphase signals. (a) With delay mismatches. (b) Without delay mismatches.

Although multiphase clocks from a ring oscillator can be used to implement a fractional-N frequency synthesizer, an important problem still exists: how to deal with the delay mismatches between the delay cells in the VCO. Ideally, the phase differences among the multiphase signals from a ring oscillator are precisely equal, as shown in Fig. 4(a). However, in practice, the edges of the multiphase clocks are not uniformly spaced, as shown in Fig. 4(b). The delay mismatches arise from several causes such as mismatch, device size mismatch, and so on.
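
Before mismatch is considered, a short behavioral sketch may help make the edge-combining divider of Section II concrete. It assumes eight uniformly spaced VCO phases, a tap index that advances by k on every output cycle, and an integer division ratio that switches from N to N + 1 whenever the tap index wraps around. Under these assumptions every output period equals (N + k/8) VCO periods. The function and variable names are hypothetical, and ideal (mismatch-free) delays are assumed.

```python
from fractions import Fraction

PHASES = 8  # number of VCO output phases (eight-phase ring oscillator)

def divider_edges(n, k, count):
    """Return output edge times in units of the VCO period.

    Assumption: each output edge is retimed by tap `tap` (delay tap/8 period);
    the tap advances by k per output cycle and the integer divider counts one
    extra VCO cycle when the tap index wraps past the last phase.
    """
    edges, cycles, tap = [Fraction(0)], 0, 0
    for _ in range(count):
        tap += k
        if tap >= PHASES:
            tap -= PHASES
            cycles += n + 1   # division ratio N + 1 on wrap-around
        else:
            cycles += n       # division ratio N otherwise
        edges.append(cycles + Fraction(tap, PHASES))
    return edges

if __name__ == "__main__":
    edges = divider_edges(n=72, k=1, count=16)
    periods = {b - a for a, b in zip(edges, edges[1:])}
    print(periods)  # a single value: every period is N + k/8 VCO periods
```

With the loop locked, the divider output period equals the reference period, so this model corresponds to f_vco = (N + k/8)·f_ref. For a 25-MHz reference and k = 1, the fractional step is 25/8 = 3.125 MHz, which is the offset at which the uncalibrated spur is reported in Section VI.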
PARK et al.: SELF-CALIBRATED PHASE-LOCKED LOOP WITH PRECISE I/Q MATCHING 779

Since each edge of the fractional-N divider output is periodically synchronized with one of the multiphase signals from the VCO, the timing information on the delay mismatches is contained in the divider output. Therefore, when the PLL is locked, the delay mismatches introduce periodic phase errors at the input of the phase-frequency detector (PFD). Due to the periodic phase errors, fractional spurs appear in the output spectrum of the synthesizer. Also, if I and Q signals are tapped from the VCO, the I/Q phase offset will appear. Therefore, the delay mismatches must be eliminated to realize a low phase-noise frequency synthesizer.

The relationship between the phase errors and the fractional spurs is derived from the noise transfer function of the PLL. Fig. 5 shows the s-domain model of the PLL. If only the effect of the divider noise is considered, the output phase noise becomes

(1)

where
    PFD gain;
    loop filter transfer function;
    VCO gain;
    dividing ratio;
    output phase noise;
    phase noise generated from the fractional divider.

Fig. 5. s-domain model of a PLL.

For the edge-combining frequency synthesizer, is mainly caused by the delay mismatches between the delay cells, and is periodic because the fractional-N dividing is performed by periodic combining of the phase-shifted waveforms. Assuming that is sinusoidal, the output voltage of the PLL is presented as

(2)

and the output spectrum of the PLL appears at . The relative power of the spurious tones is given by

dBc (3)

Fig. 6 shows the simulated output spectra of the frequency synthesizer. When the maximum phase offset is set to 2.5°, −30-dBc spurious tones are shown near the carrier. During the simulation, the design parameters of the PLL are selected as described in Table I.

Fig. 6. Simulated output spectrum of the PLL with the maximum phase offset of 2.5°.

TABLE I
PARAMETERS FOR SPURIOUS NOISE SIMULATION OF THE PLL

IV. MISMATCH CALIBRATION ALGORITHM

Fig. 7 shows the input waveforms of the PFD when the PLL is locked and the fractional dividing ratio is set to 1/8. is the relative phase error corresponding to the th output signal of

the VCO. By adjusting the rising phases of the corresponding outputs, the phase errors due to the delay mismatches can be eliminated.

Fig. 7. Periodic phase error at the PFD input when PLL is locked without calibration.

Since a PLL makes the average phase error zero in the locking mode, the sum of the individual phase errors becomes zero when the PLL is locked. In other words, when the number of the multiphase clocks is eight

(4)

If the calibration circuit shifts the rising phase of the first output by , the phase errors are temporarily changed to

(5)

When the PLL is locked again

(6)

based on (4), and by assuming that the phase disturbance, , is equally distributed to all delay cells to satisfy (6), the phase errors become

(7)

If this operation is performed on each output of the VCO one by one, the phase errors are changed step by step as listed here, and this is the completion of one iteration of the calibration procedure.

initially:
1st step:
2nd step:
...
8th step:

The resulting phase errors after the first iteration are given by

(8)

where is the phase error due to the th output signal after cycles of calibration, and is the amount of the calibration for the th VCO output in the th iteration.

If the above iteration is performed continuously until is satisfied for all delay cells, the final value of the phase error due to the 1st VCO output becomes

(9)

Similarly

(10)

Consequently, all the phase errors become zero after the completion of the calibration. Note that the calibration algorithm performs correctly even if the amount of the individual phase correction is different and even if the order of the calibration is changed. Fig. 8 shows the trend of phase error during the calibration, simulated by MATLAB. Here, maximum 10-mV mismatches are assumed.

V. CIRCUIT IMPLEMENTATION

A. Overall Structure

As shown in Fig. 2, a loop for the calibration is combined with the main fractional-N synthesizer. The calibration loop periodically measures the phase error due to delay mismatch at the PFD, and compensates for the mismatches by updating the offset control voltage of delay cells one after another. This update operation must be performed only when the PLL is locked,

because the phase error due to the mismatches can be accurately measured only in the locking mode. If the calibration interval is shorter than the lock-in time of the main loop, the locking behavior of the main loop becomes disturbed and even unstable. Therefore, it is important to make not only the calibration interval long enough but also the amount of phase change generated by each calibrating operation small enough that the main loop responds quickly. In this work, the loop gain of the calibration loop is chosen to be 1/10 of the main-loop gain. The updated offset control signals are maintained until the next update by a capacitor array connected to each delay cell. The delay cell used in this work is of the same type as the low-noise delay cell used in [13]. However, to control the rising phase of each output, four transistors are added, as shown in Fig. 9. For example, if the offset control signal is low, the rising edge of the corresponding output is pulled earlier.

Fig. 8. Simulated behavior of phase error during the compensation.

Fig. 9. Schematic of the delay cell having offset control capability.

B. Calibration Loop

Fig. 10 shows the circuit for the mismatch calibration. The calibration circuit consists of a PFD shared with the main loop, an additional charge pump, and a capacitor array. When the calibration control signal is high, the PFD output signal is driven to the charge pump, and one of the offset control signals is updated. This control signal is periodically asserted, and its period is much longer than the locking time of the PLL. The output of the charge pump is sequentially connected to one of the offset-control nodes in the VCO. Since the sequence of the calibration is identical to the pulse-combining sequence in the fractional-N divider, the measured phase error affects only the corresponding VCO output. When the fractional dividing ratio is zero, the frequency synthesizer operates as an integer-N type, and only one output of the VCO is used for phase comparison. The digital logic controls the sequence of the signal updating through the switches in the capacitor array.

Fig. 10. Detail structure of the self-calibration loop.

VI. MEASUREMENTS

A self-calibrated fractional-N frequency synthesizer PLL has been fabricated in 0.35-μm CMOS technology. The microphotograph of the fabricated chip is shown in Fig. 11, and its active area is about mm². Frequency synthesizers both with and without a calibration loop have been integrated in the same chip to demonstrate the proper operation of the mismatch calibration scheme. In both cases, an external 25-MHz crystal oscillator is used as a reference clock. The bandwidth of the PLL is set to 1 MHz.

Fig. 12 shows the measured output spectrum of the fractional-N RF synthesizer. Fig. 12(a) is the output spectrum of the frequency synthesizer without a calibration loop. In this figure, the fractional dividing ratio is set to 1/8. Without the calibration circuit, a −30-dBc spur appears at an offset of 3.125 MHz (= 25/8 MHz) from the carrier frequency. In this case, the maximum phase offset is estimated as about 2.5° by the equations in Section II. On the other hand, in the output spectrum of the self-calibrated frequency synthesizer, the power of the spurious tones is attenuated to −55 dBc, as shown in Fig. 12(b), and the calculated maximum phase offset is less than 0.2°. Initial settling of the calibration loop takes about 5.0 ms. Fig. 13 shows the measured phase noise of the frequency synthesizer. The closed-loop phase noise at 100-kHz offset from the 1.8-GHz carrier is −105 dBc/Hz. Table II summarizes the measured characteristics of the PLL.
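The one-cell-at-a-time update described above can be illustrated with a small behavioral sketch. This is a toy model, not the circuit of Fig. 10: the mismatch values, the per-step correction gain of 0.1, and the number of rounds are all assumptions chosen for illustration, and the main loop is assumed to be locked during every update.

```python
import random

def calibrate(mismatch_deg, gain=0.1, rounds=50):
    """Round-robin toy model: one delay-cell offset is updated per calibration step."""
    offsets = [0.0] * len(mismatch_deg)
    for _ in range(rounds):
        for i in range(len(mismatch_deg)):        # same order as the pulse-combining sequence
            error = mismatch_deg[i] + offsets[i]  # residual phase error seen at the PFD
            offsets[i] -= gain * error            # small step so the main loop is not disturbed
    return [m + o for m, o in zip(mismatch_deg, offsets)]

random.seed(0)
# Hypothetical initial mismatches for eight delay-cell outputs, in degrees.
initial = [random.uniform(-2.5, 2.5) for _ in range(8)]
residual = calibrate(initial)
print(max(abs(r) for r in residual))
```

Each pass shrinks a cell's residual error by the factor (1 − gain), so a smaller gain trades settling time for less disturbance of the main loop, which is the tradeoff the text describes.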
782 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

Fig. 11. Microphotograph of the self-calibrated PLL.

Fig. 12. Output spectrum of the synthesizer PLL. (a) Without calibration. (b) With calibration.

VII. CONCLUSION
A self-calibrated 1.8-GHz PLL for fractional-N frequency synthesis has been fabricated in a 0.35-μm CMOS process. A ring-type oscillator is used to generate the multiphase signals, and a self-calibration loop reduces the output fractional spurs caused by delay mismatches between the delay cells. The phase offset of the I/Q signals from the ring oscillator is also reduced. With this calibration scheme, the fractional spur of the PLL is attenuated by 25 dB and the maximum phase offset is thereby reduced to less than 0.2°.
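The figures quoted in the measurements and in the conclusion can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text (25-MHz reference, fractional ratio 1/8, spur levels of −30 dBc and −55 dBc); the variable names are illustrative.

```python
f_ref_mhz = 25.0         # external crystal reference
fraction = 1 / 8         # fractional dividing ratio used for Fig. 12
spur_offset_mhz = f_ref_mhz * fraction

spur_before_dbc = -30.0  # without the calibration loop
spur_after_dbc = -55.0   # with the calibration loop
attenuation_db = spur_before_dbc - spur_after_dbc
amplitude_ratio = 10 ** (attenuation_db / 20)

print(spur_offset_mhz, attenuation_db, round(amplitude_ratio, 1))
```

The 25-dB improvement corresponds to a spur amplitude reduction of roughly a factor of 18.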

Fig. 13. Measured phase noise of the PLL.

TABLE II
PERFORMANCE SUMMARY OF THE SELF-CALIBRATED PLL

REFERENCES
[1] B. Razavi, RF Microelectronics. Englewood Cliffs, NJ: Prentice Hall, 1998.
[2] C. D. Hull, J. L. Tham, and R. R. Chu, “A direct-conversion receiver for 900-MHz (ISM band) spread-spectrum digital cordless telephone,” IEEE J. Solid-State Circuits, vol. 31, pp. 1955–1963, Dec. 1996.
[3] M. Steyaert, M. Borremans, J. Janssens, B. D. Muer, N. Itoh, J. Craninckx, J. Crols, E. Morijuji, H. S. Momose, W. Sansen, T. Yamaji, H. Tanimoto, and H. Kokatsu, “A single-chip CMOS transceiver for DCS-1800 wireless communications,” in ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 1998, pp. 48–49.
[4] A. Montalvo, A. Holden, W. Suter, C. Angell, S. White, N. Klemmer, and D. Homol, “A 22-mW NADC receiver IF Chip with integrated second IF channel filtering,” in ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 1999, pp. 48–49.
[5] J. L. Tham, M. A. Margarit, B. Pregardier, C. D. Hull, R. Magoon, and F. Carr, “A 2.7-V 900-MHz/1.9-GHz dual-band transceiver IC for digital wireless communication,” IEEE J. Solid-State Circuits, vol. 34, pp. 286–291, Mar. 1999.

[6] B. Razavi, “Design considerations for direct-conversion receivers,” IEEE Trans. Circuits Syst. II, vol. 44, pp. 428–435, June 1997.
[7] L. Yu and W. M. Snelgrove, “A novel adaptive mismatch cancellation system for quadrature IF radio receivers,” IEEE Trans. Circuits Syst. II, vol. 46, pp. 789–801, June 1999.
[8] Y. Sugimoto and T. Ueno, “The design of a 1-V 1-GHz CMOS VCO circuit with in-phase and quadrature-phase outputs,” in Proc. Int. Symp. Circuits and Systems, Hong Kong, June 1997, pp. 269–272.
[9] A. A. Abidi, “Direct-conversion radio transceivers for digital communications,” IEEE J. Solid-State Circuits, vol. 30, pp. 1399–1410, Dec. 1995.
[10] T. A. D. Riley, M. A. Copeland, and T. A. Kwasniewski, “Delta–sigma modulation in fractional-N frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, pp. 553–559, May 1993.
[11] M. H. Perrot, “Techniques for high data rate modulation and low power operations of fractional-N frequency synthesizers,” Ph.D. dissertation, Mass. Inst. of Technol., Cambridge, MA, 1997.
[12] U. L. Rohde, Digital PLL Frequency Synthesizers. Englewood Cliffs, NJ: Prentice Hall, 1983.
[13] C.-H. Park and B. Kim, “A low-noise 900-MHz VCO in 0.6-μm CMOS,” IEEE J. Solid-State Circuits, vol. 34, pp. 586–591, May 1999.

Chan-Hong Park (S’92) received B.S. and M.S. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 1994 and 1996, respectively. He is currently working toward the Ph.D. degree in electrical engineering at KAIST.
Since 1994, he has been with the Department of Electrical Engineering, KAIST, as a Graduate Researcher, where he has been involved in designing 100Base-T transceiver ICs, low-noise phase-locked loops, and RF front-ends for wireless communications. His research interests include CMOS RF circuits for wireless communication, high-frequency analog IC design, and mixed-mode signal-processing IC design.

Ook Kim (M’86) received the M.S. and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1988 and 1994, respectively.
He was with the Electronics and Telecommunications Research Institute, Taejon, Korea, from 1994 to 1998, and with SK Telecom, Seoul, Korea, from 1998 to 1999. Since 1999, he has been with Silicon Image Inc., Sunnyvale, CA. He was a Visiting Researcher at the Department of Electrical and Electronic Engineering, Adelaide University, Adelaide, Australia, during 1992, and a Visiting Scholar at the Department of Electrical Engineering, Stanford University, Stanford, CA, during 1999. His research interests are in CMOS mixed mode circuit design, high-speed data conversion, wireless circuit technology, and high-speed data communication.

Beomsup Kim (S’87–M’90–SM’95) received the B.S. and M.S. degrees in electronic engineering from Seoul National University, Seoul, Korea, in 1983 and 1985, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1990.
He worked as a Graduate Researcher and Graduate Instructor at the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, from 1986 to 1990. From 1990 to 1991, he was with Chips and Technologies, Inc., San Jose, CA, where he was involved in designing high-speed signal processing ICs for disk drive read/write channel. From 1991 to 1993, he was with Philips Research, Palo Alto, CA, conducting research on digital signal processing for video, wireless communication, and disk drive applications. During 1994, he was a Consultant, developing the partial response maximum likelihood detection scheme of the disk drive read/write channel. In 1994, he became an Assistant Professor with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, and is currently an Associate Professor. During 1999, he took sabbatical leave at Stanford University, Stanford, CA, and at the same time, consulted for Marvell Semiconductor Inc., San Jose, on the Gigabit Ethernet and wireless LAN DSP architecture. His research interests include mixed-mode signal processing IC design for telecommunication, disk drive, and LAN, high-speed analog IC design, and VLSI system design.
Dr. Kim is a corecipient of the Best Paper Award for 1990–1991 from the IEEE JOURNAL OF SOLID-STATE CIRCUITS. He received the Philips Employee Reward in 1992. Between June 1993 and June 1995, he served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II.
788 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

A 2.6-GHz/5.2-GHz Frequency Synthesizer in 0.4-μm CMOS Technology
Christopher Lam and Behzad Razavi, Member, IEEE

Abstract: This paper describes the design of a CMOS frequency synthesizer targeting wireless local-area network applications in the 5-GHz range. Based on an integer-N architecture, the synthesizer produces a 5.2-GHz output as well as the quadrature phases of a 2.6-GHz carrier. Fabricated in a 0.4-μm digital CMOS technology, the circuit provides a channel spacing of 23.5 MHz at 5.2 GHz while exhibiting a phase noise of −115 dBc/Hz at 2.6 GHz and −100 dBc/Hz at 5.2 GHz (both at 10-MHz offset). The reference sidebands are at −53 dBc at 2.6 GHz, and the power dissipation from a 2.6-V supply is 47 mW.

Index Terms: Frequency dividers, oscillators, phase-locked loops, RF circuits, synthesizers, wireless transceivers.

I. INTRODUCTION

WIRELESS local area networks (WLAN’s) provide great flexibility in the communication infrastructure of environments such as hospitals, factories, and large office buildings. While WLAN standards in the 2.4-GHz range have recently emerged in the market, the data rates supported by such systems are limited to a few megabits per second. By contrast, a number of standards have been defined in the 5-GHz range that allow data rates greater than 20 Mb/s, offering attractive solutions for real-time imaging, multimedia, and high-speed video applications. One of these standards is high-performance radio LAN (HIPERLAN) [1].

HIPERLAN operates across 5.15–5.30 GHz and provides a channel bandwidth of 23.5 MHz with Gaussian minimum shift keying (GMSK) modulation. The receiver sensitivity must exceed −70 dBm.

This paper presents the design of a frequency synthesizer for 5-GHz WLAN applications. To target realistic specifications, HIPERLAN is chosen as the framework. Employing an integer-N architecture, the circuit generates a 5.2-GHz output for the transmit path and the quadrature phases of a 2.6-GHz carrier for the receive path. Realized in a 0.4-μm CMOS technology, the synthesizer provides a channel spacing of 23.5 MHz while dissipating 47 mW from a 2.6-V supply. The phase noise at 10-MHz offset is equal to −115 dBc/Hz at 2.6 GHz and −100 dBc/Hz at 5.2 GHz.

Section II of this paper describes the synthesizer environment and general issues, and Section III introduces the synthesizer architecture. Section IV presents the design of each building block, and Section V summarizes the experimental results.

Fig. 1. Transceiver architecture.

II. SYNTHESIZER ENVIRONMENT

The design of a 5-GHz synthesizer in a 0.4-μm CMOS technology presents many difficulties at both the architecture and the circuit levels. The high center frequency of the voltage-controlled oscillator (VCO), the poor quality of inductors due to skin effect and substrate loss, the limited tuning range, the nonlinearity of the VCO input/output characteristic, the high speed required of the feedback divider, the mismatches in the charge pump, and the implementation of the loop filter are among the issues encountered in this design.

A 0.4-μm-long NMOS transistor in this technology achieves an f_T of less than 15 GHz with a gate–source overdrive voltage of about 400 mV, a typical value in this design. Also, a 5-nH inductor exhibits a self-resonance frequency of 6.5 GHz and a Q of 5 at this frequency, indicating that skin effect and substrate loss are much more significant at 5.2 GHz than at 2.6 GHz. The technology offers no high-density linear capacitors, creating difficulty in the design of the loop filter.

The foregoing limitations make it necessary that the transceiver and the synthesizer be designed concurrently so as to relax some of the synthesizer requirements. Fig. 1 shows the transceiver architecture and its interface with the synthesizer. The receive path consists of two downconversion stages, each using a local oscillator (LO) frequency of 2.6 GHz, and the transmit path modulates the VCO by the Gaussian-filtered baseband data, producing a GMSK output.

An important feature of this architecture is that the synthesizer is shared between the transmitter and the receiver, reducing the system complexity substantially. This is possible because HIPERLAN incorporates time-division duplexing (TDD). Also, the transceiver requires the generation of the quadrature phases of the 2.6-GHz carrier rather than the 5.2-GHz output, a task readily accomplished by the synthesizer itself.

Manuscript received July 30, 1999; revised December 1, 1999.
The authors are with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095 USA (e-mail: razavi@ee.ucla.edu).
Publisher Item Identifier S 0018-9200(00)02987-5.
0018–9200/00$10.00 © 2000 IEEE
LAM AND RAZAVI: 2.6-GHz/5.2-GHz FREQUENCY SYNTHESIZER 789

Fig. 2. Synthesizer architecture.

Fig. 3. Position of reference sidebands.

III. SYNTHESIZER ARCHITECTURE

The synthesizer is based on an integer-N phase-locked loop architecture (Fig. 2). The feedback divider senses the 2.6-GHz output because it is not possible to design a dual-modulus divider in 0.4-μm CMOS technology that operates reliably at 5.2 GHz. Controlled by the digital channel-select input, the ÷220–225 circuit generates frequency steps of 11.75 MHz in the 2.6-GHz band and 23.5 MHz in the 5.2-GHz band.

A critical issue in the architecture of Fig. 2 is the nonlinearity of the VCO characteristic, i.e., the variation of the VCO gain, K_VCO, with the control voltage, V_cont. This effect manifests itself in the loop settling behavior as well as the magnitude of the phase noise and reference sidebands at the output. The problem is partially resolved through the use of a correction circuit that adjusts the charge-pump current according to the value of V_cont [2].

An interesting property of the architecture of Fig. 2 is the position of the reference spurs with respect to the main carrier. Since the reference frequency is half the channel spacing, such spurs fall at the edge of the channel rather than at the center of the adjacent channel for both 2.6- and 5.2-GHz outputs (Fig. 3). Since the interference energy received by the antenna is small at the edge, the maximum allowable magnitude of the spurs can be considerably higher than if the reference frequency were equal to 23.5 MHz.

IV. BUILDING BLOCKS

A. VCO

The VCO core is based on two 2.6-GHz coupled oscillators operating in quadrature, as shown in Fig. 4(a) [3], [4]. The fully differential topology of each oscillator raises the possibility of sensing the common-source nodes A, B, C, or D as the 5.2-GHz output. In fact, since the 2.6-GHz oscillators operate in quadrature, the waveforms at A and B (or C and D) are 180° out of phase, thereby serving as a differential output at 5.2 GHz. With proper choice of device dimensions and bias current, a differential swing of 0.5 V can be achieved at this port. Note that if a frequency doubler were used, the output would be single-ended and difficult to convert to differential form at such a high frequency.

The tuning of the oscillator poses several difficulties: the varactor diode must exhibit a small series resistance and remain reverse-biased even with large swings in the oscillator, and the varactor capacitance must be large enough to yield the required tuning range, but at the cost of increasing the power dissipation or the phase noise. This design incorporates a p+-n diode inside an n-well and strapped with metal to reduce the n-well series resistance [4]. Such a structure suffers from a large parasitic n-well/substrate capacitance, making it desirable to connect the anode of the diode to the oscillator. This is accomplished as illustrated in Fig. 4(b), where only one of the two oscillators is shown for clarity. Here, the control voltage varies the dc potential at the oscillator nodes by varying the on-resistance of a transistor. However, the sharp variation of this on-resistance leads to significant change in the gain of the VCO. To make the transition smoother, another transistor in series with a resistor is added as shown in Fig. 4(c). A clamp transistor keeps the tail current source in saturation. Otherwise, the oscillator may turn off during synthesizer loop transients.

Since the minimum voltage at the 5.2-GHz output node is only a few hundred millivolts above ground, an NMOS differential pair cannot directly sense the 5.2-GHz signal at this node. Instead, a common-gate stage is used [Fig. 4(d)]. With a constant bias, however, the common-gate device turns off over part of the tuning range. Modifying the circuit as shown in Fig. 4(e) ensures that the common-gate stage carries a constant bias current across the full tuning range.

The choice of the inductors and capacitance of the varactors entails a compromise between the phase noise and the tuning range. In this design, 7-nH inductors are used, each contributing a parasitic capacitance of 120 fF. The cross-coupled transistors are relatively wide to ensure startup, yielding approximately 175 fF of gate-source capacitance. The differential pairs coupling the oscillators also load the tank. As a result, the varactor capacitance for 2.6-GHz operation must not exceed 160 fF.

The inductors are realized as stacked spirals [5] made of metal 4 and metal 3 with a width of 6 μm. Since the tuning range is inevitably narrow, it is critical to predict the oscillation frequency accurately. A distributed model is used for each inductor, yielding an error of only a few percent in the measured frequency of oscillation.

B. Frequency Divider

The design of a 2.6-GHz programmable divider with a reasonable power dissipation in 0.4-μm CMOS technology is quite difficult. A number of circuit techniques are introduced in this work to ameliorate the power–speed tradeoff.

The divider is based on a pulse-swallow topology. Shown in Fig. 5(a) is a conventional implementation, consisting of a dual-modulus prescaler, a fixed-ratio program counter, and a programmable swallow counter. The RS latch is typically included in the swallow counter and is drawn explicitly here for clarity. The prescaler begins the operation by dividing by N + 1 until the swallow counter is full. The RS latch is then set, changing
the prescaler modulus to N and disabling the swallow counter. The division continues until the program counter is full and the RS latch is reset. The overall divide ratio is therefore equal to (N + 1)S + N(P − S) = NP + S.

Fig. 4. Evolution of the VCO topology.

Fig. 5. Pulse swallow divider. (a) Conventional topology. (b) Addition of pipelining in the prescaler modulus control path.

The pulse-swallow divider used in this work is shown in Fig. 5(b). Here, the RS latch is followed by a D flip-flop to allow pipelining of the prescaler modulus control signal. This modification is justified below. The overall divide ratio is now equal to NP + S + 1. A critical decision in the design of the divider is the choice between low-swing current-steering logic and rail-to-rail CMOS logic. Simulations of the circuit with various values of N, P, and S indicate that the minimum power dissipation occurs if the prescaler incorporates current steering, its output is converted to rail-to-rail swings, and the remainder of the circuit incorporates standard dynamic and static CMOS logic. The use of current steering in the prescaler also obviates the need for large oscillator swings, saving power in the VCO buffer.

The design of the ÷8/9 prescaler for 2.6-GHz operation presents a great challenge. Shown in Fig. 6, the prescaler consists of a synchronous ÷2/3 circuit and two asynchronous ÷2 circuits. In a conventional ÷2/3 realization [Fig. 7(a)], one flip-flop is loaded by an OR gate, whereas the other is loaded by the first flip-flop, an AND gate, and an output buffer. Since the latter flip-flop limits the speed, the fanout of three inherent to this topology translates to substantial power dissipation. Furthermore, if the divider is implemented by current-steering circuits, the AND gate requires stacked logic and hence level-shift source followers. Both of these issues intensify the power–speed tradeoff.

Fig. 6. Prescaler.

Fig. 7. Divide-by-2/3 circuit: (a) conventional topology and (b) circuit used in this work.

The ÷2/3 circuit used in this work is shown in Fig. 7(b). Here, one flip-flop is loaded by a NOR gate and the other by a NOR gate and a buffer. Simulations indicate that the reduction of the load capacitance of the speed-limiting flip-flop increases the maximum operating speed by approximately 40%.

The NOR/flip-flop combination is realized as depicted in Fig. 8. The resistors are made of n-well, and the bias voltage is generated to fall midway between the high and low levels of the inputs. The output of the prescaler drives a differential to single-ended converter, producing rail-to-rail swings for the remainder of the divider.

Fig. 8. Implementation of NOR/flip-flop combination.

The divider of Fig. 5 incorporates pipelining for the prescaler modulus control, thereby relaxing the minimum delay requirement in this path. Fig. 9 illustrates the issue. When the ÷9 operation of the prescaler is finished, the circuit would have at most seven input cycles to change the modulus to eight. In this particular prescaler, the timing budget is actually about five input cycles (approximately 1.9 ns). Thus, with no pipelining, the last pulse generated by the prescaler in the ÷9 mode must propagate through the level converter, the first ÷2 stage in the swallow counter, the subsequent logic, the RS latch, and the three-input NOR gate in less than 1.9 ns. Such a delay constraint necessitates the use of current steering in this path, raising the power dissipation and complicating the design.

Fig. 9. Pipelining in the prescaler modulus control path.

With pipelining, on the other hand, the maximum tolerable delay increases to about eight input cycles (approximately 3.1 ns).
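The counting behavior of the conventional pulse-swallow divider can be checked with a short behavioral sketch. It models only the cycle counting of Fig. 5(a), not the pipelined version or any circuit timing. The prescaler modulus N = 8 follows from the ÷8/9 prescaler and the 11.75-MHz reference from the half-channel-spacing statement in Section III; the program-counter length P = 27 and the swallow values 4 to 9 are hypothetical choices that happen to produce ratios of 220 to 225.

```python
def pulse_swallow_ratio(n, p, s):
    """Count input cycles per output cycle of a conventional pulse-swallow divider."""
    if not 0 <= s <= p:
        raise ValueError("swallow count S must satisfy 0 <= S <= P")
    input_cycles = 0
    swallowed = 0
    for _ in range(p):             # the program counter counts P prescaler pulses
        if swallowed < s:          # swallow counter not yet full
            input_cycles += n + 1  # prescaler divides by N + 1
            swallowed += 1
        else:                      # RS latch set: prescaler modulus switched to N
            input_cycles += n
    return input_cycles

F_REF_MHZ = 11.75                  # half the 23.5-MHz channel spacing
N, P = 8, 27                       # P is a hypothetical value, not taken from the paper
ratios = [pulse_swallow_ratio(N, P, s) for s in range(4, 10)]
channels_2g6 = [r * F_REF_MHZ for r in ratios]  # MHz, 2.6-GHz band
channels_5g2 = [2 * f for f in channels_2g6]    # MHz, 5.2-GHz band
print(ratios)
print(channels_2g6)
print(channels_5g2)
```

Under these assumptions the doubled outputs step by 23.5 MHz and stay inside the 5.15–5.30-GHz HIPERLAN band quoted in the introduction.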

C. Charge Pump and Loop Filter

Fig. 10 shows the charge pump [6] and the loop filter. Here, the devices that operate as switches differ from those of the standard charge-pump configuration. Thus, the problem of transistor charge injection and clock feedthrough to the output is somewhat alleviated. In addition to these errors, up and down currents produced by the charge pump may also create ripple on the control voltage. Since, in the locked condition, the up and down currents turn on at every phase comparison instant, any mismatch between their magnitudes, duration, or absolute timing results in a net current that is drawn from the loop filter.

Fig. 10. Charge pump and loop filter.

To appreciate the significance of these effects, let us consider some typical values in this design. If the reference sidebands are to be 50 dB below the carrier, then with the VCO gain and reference frequency of this design, the ripple amplitude must not exceed 75 μV.¹ This indicates that great attention must be paid to the design of the phase/frequency detector, the charge pump, and the loop filter so as to minimize the above errors.

Fig. 11. (a) Addition of correction circuit to charge pump. (b) Simple folding circuit. (c) Folding circuit with one reference voltage.

Another source of ripple in the control voltage is the low output impedance of the charge-pump output transistors in Fig. 10, especially as V_cont reaches within a few hundred millivolts of the rails. This effect creates additional mismatch between the up and down currents as a function of V_cont, potentially leading to larger reference sidebands near the ends of the tuning range. Source degeneration of these transistors alleviates this issue (another advantage of this topology over the standard charge-pump configuration).

The addition of the second capacitor in the circuit of Fig. 10 to suppress the ripple potentially degrades the stability of the loop. Simulations suggest that for , the settling time increases negligibly. In this design, pF, pF, and kΩ. The two capacitors can be realized by either MOSFET’s or poly-metal sandwiches, a choice determined by the control voltage range. To achieve the maximum tuning range, V_cont must approach the supply and ground rails, demanding a reasonable capacitor linearity across this range. MOS capacitors, however, exhibit substantial change as their gate-source voltage falls below the threshold. Even a parallel combination of an NMOS capacitor (connected to ground) and a PMOS capacitor (connected to V_DD) suffers from a two-fold variation as V_cont goes from zero to V_DD. For this reason, both capacitors are formed as poly-metal sandwiches (albeit with much less density than MOS capacitors).

Another issue in the design of the loop filter of Fig. 10 relates to the thermal noise produced by the resistor. Low-pass filtered by the two capacitors, this noise modulates the VCO, raising the output phase noise. The thermal noise on the control voltage per unit bandwidth is given by

(1)

where denotes the noise density of the resistor. From the narrow-band frequency modulation theory [8], we know that if a sinusoid with a peak amplitude V_m and frequency ω_m modulates a VCO, the output sidebands fall at ω_m rad/s below and above the carrier frequency and exhibit a peak amplitude of K_VCO V_m/(2 ω_m) relative to the carrier. Approximating the noise per unit bandwidth in (1) by a sinusoid, we obtain the output relative phase noise per unit bandwidth at an offset frequency as

(2)

¹The ripple is approximated by a sinusoid here. In a more rigorous method, the ripple can be expressed as a Fourier series [7].
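The sinusoidal approximation in the footnote leads to a simple ripple budget through the standard narrow-band FM result, in which each sideband has amplitude K_VCO·V_m/(2·ω_m) relative to the carrier. The sketch below inverts that relation. The VCO gain of 1 GHz/V is an assumed value (the figure used in the text is not legible here), and the 11.75-MHz modulation frequency is the reference frequency implied by Section III; the function name is illustrative.

```python
import math

def max_ripple_amplitude(sideband_dbc, k_vco_hz_per_v, f_mod_hz):
    """Peak control-voltage ripple that keeps FM sidebands at sideband_dbc."""
    relative_amplitude = 10 ** (sideband_dbc / 20)   # sideband-to-carrier amplitude ratio
    k_vco = 2 * math.pi * k_vco_hz_per_v             # VCO gain in rad/s per volt
    omega_m = 2 * math.pi * f_mod_hz                 # ripple frequency in rad/s
    return 2 * omega_m * relative_amplitude / k_vco  # from ratio = K_VCO*V_m/(2*omega_m)

# Assumed: K_VCO = 1 GHz/V; ripple at the 11.75-MHz reference frequency.
v_max = max_ripple_amplitude(-50.0, 1e9, 11.75e6)
print(f"{v_max * 1e6:.1f} uV")
```

With these assumed inputs the budget comes out at roughly 74 μV, the same order as the 75-μV limit quoted in the text.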
Fig. 12. Die photograph.

With the values chosen in this design, the output phase noise reaches −138 dBc/Hz at 10-MHz offset for the VCO gain considered here. While it is desirable to reduce the value of the loop-filter resistor, the required increase in the capacitor leads to a severe area penalty because of the low density of the poly-metal capacitors. Note that since the stability factor is proportional to the resistance and to the square root of the capacitance, if the resistor is, say, halved, then the capacitor must be quadrupled to maintain a constant stability factor (for a given charge-pump current).
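The resistor and capacitor scaling rule can be verified numerically with the textbook damping-factor expression for a second-order charge-pump PLL, ζ = (R/2)·sqrt(I_P·K_VCO·C/(2πN)). This expression is a standard result assumed here rather than quoted from the paper, and every component value below is hypothetical.

```python
import math

def damping_factor(r_ohm, c_farad, i_pump_a, k_vco_hz_per_v, n_div):
    """Damping factor of a second-order charge-pump PLL (textbook expression)."""
    k_vco = 2 * math.pi * k_vco_hz_per_v  # convert to rad/s per volt
    return (r_ohm / 2) * math.sqrt(i_pump_a * k_vco * c_farad / (2 * math.pi * n_div))

# Hypothetical values, for illustration only.
R, C, I_P, K_VCO, N_DIV = 50e3, 20e-12, 100e-6, 500e6, 220
zeta_original = damping_factor(R, C, I_P, K_VCO, N_DIV)
zeta_scaled = damping_factor(R / 2, 4 * C, I_P, K_VCO, N_DIV)
print(zeta_original, zeta_scaled)
```

Halving R lowers the resistor's thermal noise contribution, but the fourfold capacitor needed to hold the damping factor is what creates the area penalty the text mentions.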

D. Correction Circuit
The gain of the VCO varies substantially across the tuning range, resulting in considerable change in the settling behavior. As depicted in Fig. 11(a), it is desirable to vary the charge-pump current, I_P, such that the product of K_VCO and I_P, and hence the stability factor, remains relatively constant. Rather than use piecewise linearization [2], this work incorporates an analog folding technique. Fig. 11(b) shows a possible solution. Here, both input transistors of the folding stage are off if V_cont is well below 1.1 V, and the output current is at its full value. As V_cont approaches 1.1 V, the first transistor turns on while the second remains off. Thus, the output current drops, reaching a low value as the first transistor carries most of the bias current. As V_cont approaches and exceeds 1.3 V, the second transistor turns on and the output current eventually returns to its full value. This design actually utilizes the topology shown in Fig. 11(c), where only one reference voltage is required and each differential pair provides a built-in offset by virtue of skewed device dimensions. The characteristic is similar to that shown for Fig. 11(b), with the output current driving the current mirrors in the charge pump.

The reference voltage of 1.2 V in Fig. 11(c) assumes that the gain of the VCO reaches its maximum near this control voltage. This value is somewhat process- and temperature-dependent, limiting (according to simulations) the suppression of the VCO nonlinearity to about one order of magnitude.

Fig. 13. Measured spectra at 2.6 and 5.2 GHz in locked condition.

Fig. 14. Measured spectrum at 2.6 GHz.

V. EXPERIMENTAL RESULTS
The frequency synthesizer has been fabricated in a 0.4-μm digital CMOS technology. All of the inductors and capacitors are included on the chip. Fig. 12 is a photograph of the die, which measures 1.75 × 1.15 mm². The circuit has been tested with a 2.6-V supply.

Figs. 13(a) and (b) depict the output spectra in the locked condition. The phase noise at 10-MHz offset is equal to −115 dBc/Hz at 2.6 GHz and −100 dBc/Hz at 5.2 GHz. A significant part of the phase noise at 5.2 GHz is attributed to the considerable loss of the output 50-Ω buffer. Fig. 14 shows the 2.6-GHz output along with the reference sidebands. The sidebands are approximately 53 dB below the carrier. For the 5.2-GHz output, the sidebands are buried under the noise floor.

Fig. 15. Setup for settling time measurement.

The settling behavior of the synthesizer has also been studied. Fig. 15 illustrates the setup, where the modulus of the feedback
divider is switched periodically and the control voltage is monitored. The 0.8-pF capacitor results from the trace on the printed circuit board, and the active probe presents an input capacitance of 2 pF. Given the values of the loop-filter capacitors, the addition of these parasitics markedly degrades the stability. Therefore, a 100-kΩ resistor is placed in series with the active probe to mimic the role of the corresponding loop-filter components. The low-pass filter thus formed has a corner frequency comparable to the loop bandwidth, and the 0.8-pF capacitor still produces ringing in the time response. Fig. 16 shows the measured control voltage, indicating a settling time on the order of 40 μs.

Fig. 16. Control voltage during loop settling.

Table I summarizes the measured performance of the synthesizer.

TABLE I
SYNTHESIZER PERFORMANCE

VI. CONCLUSION

The speed and quality of the devices available in an IC technology directly affect the choice of transceiver architectures, synthesizer topologies, and circuit configurations. In order to optimize the overall system performance, the transceiver and the synthesizer must be designed concurrently, with particular attention to the frequency planning.

Designing a multigigahertz synthesizer in 0.4-μm CMOS technology necessitates circuit techniques such as: 1) a quadrature VCO with inherent frequency doubling, 2) a dual-modulus divider with equalized fanout, 3) pipelining in pulse-swallow counters, and 4) use of folding stages to compensate for nonlinearity in the VCO characteristic.

REFERENCES
[1] “Radio equipment and systems (RES); High performance radio local area network (HIPERLAN); Functional specification,” ETSI, Sophia Antipolis, France, ETSI TC-RES, July 1995.
[2] J. Craninckx and M. S. J. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065, Dec. 1998.
[3] A. Rofougaran et al., “A 900-MHz CMOS LC oscillator with quadrature outputs,” in ISSCC Dig. Tech. Papers, Feb. 1996, pp. 392–393.
[4] B. Razavi, “A 1.8 GHz CMOS voltage-controlled oscillator,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 388–389.
[5] R. B. Merril et al., “Optimization of high Q inductors for multi-level metal CMOS,” in Proc. IEDM, Dec. 1995, pp. 38.7.1–38.7.4.
[6] J. Alvarez, H. Sanchez, and G. Gerosa, “A wide-band low-voltage PLL for PowerPC microprocessors,” IEEE J. Solid-State Circuits, vol. 30, pp. 383–391, Apr. 1995.
[7] B. Razavi, RF Microelectronics. Upper Saddle River, NJ: Prentice-Hall, 1998.
[8] L. W. Couch, Digital and Analog Communication Systems, 4th ed. New York: Macmillan, 1993.

Christopher Lam received the B.Sc. and M.Sc. degrees in electrical engineering from the University of California, Los Angeles, in 1997 and 1999, respectively.
He is currently with the Wireless Communication Group, National Semiconductor, Santa Clara, CA. His interests include phase-locked loops and communication circuits.

Behzad Razavi (S’87–M’90) received the B.Sc. degree from Sharif University of Technology, Tehran, Iran, in 1985 and the M.Sc. and Ph.D. degrees from Stanford University, Stanford, CA, in 1988 and 1992, respectively, all in electrical engineering.
He was with AT&T Bell Laboratories, Holmdel, NJ, and subsequently Hewlett-Packard Laboratories, Palo Alto, CA. Since September 1996, he has been an Associate Professor of electrical engineering at the University of California, Los Angeles. His current research includes wireless transceivers, frequency synthesizers, phase-locking and clock recovery for high-speed data communications, and data converters. He was an Adjunct Professor at Princeton University, Princeton, NJ, from 1992 to 1994, and at Stanford University in 1995. He is a member of the Technical Program Committees of the Symposium on VLSI Circuits and the International Solid-State Circuits Conference (ISSCC), in which he is Chair of the Analog Subcommittee. He is the author of Principles of Data Conversion System Design (New York: IEEE Press, 1995), RF Microelectronics (Englewood Cliffs, NJ: Prentice-Hall, 1998), and Design of Analog CMOS Integrated Circuits (New York: McGraw-Hill, 2000), and the editor of Monolithic Phase-Locked Loops and Clock Recovery Circuits (New York: IEEE Press, 1996).
Dr. Razavi received the Beatrice Winner Award for Editorial Excellence at the 1994 ISSCC, the Best Paper Award at the 1994 European Solid-State Circuits Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, and the Best Paper Award at the IEEE Custom Integrated Circuits Conference in 1998. He has also served as Guest Editor and Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002 835

A CMOS Monolithic ΔΣ-Controlled Fractional-N Frequency Synthesizer for DCS-1800
Bram De Muer, Student Member, IEEE, and Michel S. J. Steyaert, Senior Member, IEEE

Abstract—A monolithic 1.8-GHz ΔΣ-controlled fractional-N phase-locked loop (PLL) frequency synthesizer is implemented in a standard 0.25-µm CMOS technology. The monolithic fourth-order type-II PLL integrates the digital synthesizer part together with a fully integrated LC VCO, a high-speed prescaler, and a 35-kHz dual-path loop filter on a die of only 2 × 2 mm². To investigate the influence of the ΔΣ modulator on the synthesizer’s spectral purity, a fast nonlinear analysis method is developed and experimentally verified. Nonlinear mixing in the phase-frequency detector (PFD) is identified as the main source of spectral pollution in ΔΣ fractional-N synthesizers. The design of the zero-dead-zone PFD and the dual charge pump is optimized toward linearity and spurious suppression. The frequency synthesizer consumes 35 mA from a single 2-V power supply. The measured phase noise is as low as −120 dBc/Hz at 600 kHz and −139 dBc/Hz at 3 MHz. The measured fractional spur level is less than −100 dBc, even for fractional frequencies close to integer multiples of the reference frequency, thereby satisfying the DCS-1800 spectral purity constraints.

Index Terms—CMOS RF integrated circuits, ΔΣ modulator, fractional-N frequency synthesis, phase-locked loop, phase noise.

Manuscript received November 5, 2001; revised January 31, 2002.
The authors are with the Katholieke Universiteit Leuven, Department Elektrotechniek, ESAT-MICAS, B-3001 Heverlee, Belgium (e-mail: bram.demuer@esat.kuleuven.ac.be).
Publisher Item Identifier S 0018-9200(02)05856-0.
0018-9200/02$17.00 © 2002 IEEE

I. INTRODUCTION

THE END of the 20th century was characterized by the unrivaled growth of the telecommunication industry. The main cause was the introduction of digital signal processing in wireless communications, driven by the development of high-performance low-cost CMOS technologies for VLSI. However, the implementation of the RF analog front end remains the bottleneck. This is reflected in the large effort put into monolithic CMOS integration of RF circuits both by academics and industry [1]–[3].

The goal of this work is the monolithic integration in standard CMOS technology of a frequency synthesizer to enable the full integration of a transceiver front end in CMOS, including a low-IF receiver and a direct upconversion transmitter [1]. To achieve a high degree of integratability and fast settling under low-noise constraints, a ΔΣ fractional-N synthesizer topology has been chosen [4] (Fig. 1). ΔΣ fractional-N synthesis circumvents the severe speed–spectral purity–resolution tradeoff of the classic phase-locked loop (PLL) synthesizer, by providing synthesis of fractional multiples of the reference frequency. Spurious tones that emerge from the fractional division are whitened and noise shaped by the ΔΣ action and ultimately filtered by the loop filter. To prevent degradation of the spectral purity by digital noise coupling, the ΔΣ modulator is scheduled for integration on the digital baseband signal processing IC of the full transceiver system.

Fig. 1. Principle of ΔΣ fractional-N synthesis.

The paper describes the design of a monolithic 1.8-GHz ΔΣ-controlled fractional-N PLL frequency synthesizer. In Section II, the influence of ΔΣ noise on PLL bandwidth requirements is theoretically analyzed for multistage noise shaping (MASH) and multibit single-loop ΔΣ modulators. Next, a fast nonlinear analysis method is presented, which predicts possible degradation of the PLL spectral purity by in-band noise leakage and re-emerging of spurious tones. The nonlinearities in the phase-frequency detector (PFD) charge pumps are identified as the main trouble spots. The fourth-order type-II PLL building-block design is discussed in Section IV, focusing on integrated filter and voltage-controlled oscillator (VCO) design and on the realization of a linear phase error-to-charge-pump current conversion. In Section V, the experimental results of the ΔΣ fractional-N synthesizer prototype are presented and compared to the simulations, showing good correspondence.

Fig. 2. Third-order multibit single-loop ΔΣ modulator. The internal modulator accuracy is 16 bit. From the five output bits, only four are used for stability reasons.

II. THE ΔΣ FRACTIONAL-N SYNTHESIZER

A. Introduction

A block diagram of a ΔΣ fractional-N synthesizer is shown in Fig. 1. The ΔΣ modulator output controls the instantaneous division modulus of the prescaler, such that the mean division modulus is $N + K/2^{k}$, with $k$ the number of bits of the ΔΣ modulator and $K$ the input word. The corresponding phase changes at the prescaler output are quantized, leading to possible spurious tones and quantization noise. By selecting higher order ΔΣ modulators, the spurious energy is whitened and shaped to high-frequency noise, which can be removed by the low-pass loop filter. As a result, for a given frequency resolution, an arbitrarily high $f_{ref}$ can be chosen, by assigning the proper number of bits to the modulator. The loop bandwidth is not restricted by the reference spur suppression, resulting in faster settling and higher integratability. Additionally, the division modulus is decreased by a factor $2^{b}$ (with $b$ the minimum number of bits for the frequency resolution, i.e., 7.02 in this case), so that noise of the PLL blocks, except for the VCO, is less amplified.
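The arithmetic behind this subsection can be checked numerically. The following Python sketch is not taken from the paper. It assumes the usual fractional-N form N + K/2^k for the mean modulus, uses the 26-MHz reference, the 16-bit input word, and the division number 67.92 reported later in the paper, and assumes the 200-kHz DCS-1800 channel spacing as the required frequency resolution. The helper names are hypothetical.

```python
import math

F_REF = 26e6        # reference frequency reported in the paper (Hz)
CHANNEL = 200e3     # assumed required resolution: DCS-1800 channel spacing (Hz)
WORD_BITS = 16      # modulator input word length reported in the paper


def mean_modulus(n_int, word, bits):
    """Mean division modulus N + K / 2**k for integer part N and input word K."""
    return n_int + word / 2**bits


def frequency_resolution(f_ref, bits):
    """Smallest output frequency step for a k-bit input word."""
    return f_ref / 2**bits


def min_bits(f_ref, channel):
    """Minimum (non-integer) number of bits needed to resolve one channel."""
    return math.log2(f_ref / channel)


# Input word that approximates the fractional part 0.92 of the modulus 67.92.
word = round(0.92 * 2**WORD_BITS)
n_mean = mean_modulus(67, word, WORD_BITS)
f_out = n_mean * F_REF

print(f"resolution  = {frequency_resolution(F_REF, WORD_BITS):.2f} Hz")  # roughly 397 Hz
print(f"min bits    = {min_bits(F_REF, CHANNEL):.2f}")                   # roughly 7.02
print(f"mean modulus = {n_mean:.6f}, f_out = {f_out / 1e9:.6f} GHz")
```

Under these assumptions the sketch reproduces the "around 400 Hz" resolution and the 7.02-bit figure quoted in the text, and lands within one resolution step of the 1.76592-GHz test frequency.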

B. The ΔΣ Modulators

The influence of both third-order MASH and multibit single-loop ΔΣ modulators on the spectral purity of the fractional-N synthesizer is investigated. Since the order of the integrated PLL loop filter is three, the order of the ΔΣ modulators must also be three or higher to ensure that ΔΣ noise has at least a 20-dB/dec rolloff at intermediate offset frequencies, causing no degradation of the output phase noise. Both modulators have an internal accuracy of 16 bit and 1-LSB dithering is applied to further randomize any spurious energy. The dithering sequence is third-order noise shaped to avoid an increased noise floor.

The MASH or cascade 1-1-1 modulator is chosen because it is easy to integrate in CMOS and is unconditionally stable. The noise transfer function (NTF) of the MASH modulator is $(1 - z^{-1})^3$ and contains three poles at the origin of the $z$ plane. The result is harsh LF noise shaping and substantial HF noise. In the time domain, this is reflected in the intensive prescaler modulus switching. To synthesize a frequency of $67.92\,f_{ref}$, all moduli between 64 and 71 are employed.

The multibit single-loop ΔΣ modulator is shown in Fig. 2. For ease of integration, the feedforward and feedback coefficients are a power of 2. Only four output bits are needed to control the prescaler moduli, but five output bits are used, to avoid overlap of the intended input operating range and the unstable input regions. The NTF of the presented modulator is given in (1) and contains only one pole at the origin of the $z$ plane and two low-$Q$ Butterworth poles at $z = 0.5 \pm 0.5j$, with a passband gain of 3.2.

$$H(z) = \frac{(1 - z^{-1})^3}{1 - z^{-1} + 0.5\,z^{-2}} \qquad (1)$$

Although the single-loop modulator is more complex than the MASH modulator, it offers a higher flexibility in terms of noise shaping. The HF quantization noise of the modulator can be spread out by proper pole positioning. As a result, the prescaler modulus switching is less intense. Only the moduli between 66 and 69 are needed to synthesize $67.92\,f_{ref}$. The reduced HF switching has advantageous effects on noise coupling and sensitivity to PLL nonlinearities, as will be discussed in Section III.

Fig. 3. Maximum PLL bandwidth versus the reference frequency and different ΔΣ modulator orders, for the type-II fourth-order PLL. The dashed curve is for the third-order single-loop modulator. The targeted phase-noise specification is −136 dBc/Hz at 3 MHz for DCS-1800.

C. Theoretical Analysis

To theoretically model the impact of ΔΣ control on the spectral purity of the synthesizer, a linear-time-invariant (LTI) PLL model is employed, with the ΔΣ quantization noise as an additive noise source at the prescaler output. The prescaler with ΔΣ control can be looked upon as a digital-to-phase (D/P) converter. Every reference cycle, the prescaler subtracts $2\pi N$ rad from its input signal, with $N$ determined by the ΔΣ modulator output. The resulting quantization noise on the division modulus, and thus output phase, is approximated by uniformly distributed white noise [5]. The quantization noise power is $\Delta^2/12$, with the quantization step $\Delta$ determined for both modulators by the modulus range and the number of significant output bits. The phase noise contribution of the ΔΣ modulator at the output of the synthesizer is found in (2) [6], which involves the closed-loop transfer function of the fourth-order type-II PLL.

(2)

Since the main advantage of ΔΣ fractional-N synthesizers is the decoupling of the reference frequency and the PLL bandwidth, the influence of the ΔΣ noise on the bandwidth
requirement is examined. To comply with the most stringent DCS-1800 phase noise specification, i.e., −133 dBc/Hz at 3 MHz offset [7], the target phase noise is $\mathcal{L}$(3 MHz) = −136 dBc/Hz. In Fig. 3, the maximum PLL bandwidth is plotted versus the reference frequency for different MASH modulator orders. The dashed line is the maximum bandwidth for the single-loop multibit ΔΣ modulator of Section II-B. For a reference frequency of 26 MHz, not much is gained from increasing the modulator order. For a high bandwidth and thus a fast PLL, the reference frequency and/or the modulator order should be increased, leading to an increased power consumption and circuit complexity. The maximum bandwidth is 87 kHz for the third-order MASH modulator and 62 kHz for the single-loop multibit modulator.

Apart from the out-of-band phase-noise constraint, the integrated in-band phase noise, determining the rms phase error of the PLL, is of importance. To be sure that the ΔΣ noise does not corrupt the rms phase error, the dynamic range of the modulator must be higher than the dynamic range of the PLL [8]. The integrated in-band frequency noise is given in terms of the noise bandwidth of the PLL and the in-band phase noise in dBc/Hz. The noise bandwidth of the presented PLL is . The maximum bandwidth of the PLL is calculated in (3) [8].

(3)

The maximum PLL bandwidth is plotted versus the reference frequency of the PLL for different MASH modulator orders in Fig. 4. For the single-loop multibit modulators (dashed curve), the actual maximum bandwidth can be calculated to be 25% smaller than in (3), due to the Butterworth poles. In the case of a third-order modulator, a 1.5° rms phase error (to ensure at least an overall rms phase error of 2°) and an $f_{ref}$ of 26 MHz, the maximum bandwidth is 810 and 614 kHz, respectively. Obviously, the constraint posed on the ΔΣ modulator noise due to in-band noise contributions is much less severe than the constraint due to the out-of-band phase noise at 3 MHz.

Fig. 4. Maximum PLL bandwidth versus the reference frequency and different ΔΣ modulator orders for $\Delta\phi_{rms} < 1.5°$. The dashed curve is for the third-order single-loop modulator.

III. FAST NONLINEAR ANALYSIS METHOD

The theoretical analysis suggested that applying ΔΣ control to the prescaler would not cause any problems for the spectral purity of the PLL. Practice, however, proves this wrong. A fast nonlinear analysis method is developed which can take into account the nonlinearity of the PLL building blocks. The analysis method is at the same time sufficiently fast to sweep simulations over different degrees of nonlinearities and operating points, and is capable of performing sufficiently long transient simulations to get accurate fast Fourier transforms (FFTs) of the phase variable. The fractional operation of the PLL is simulated in discrete time and in open loop under locked conditions to avoid drift of the phase error. To further speed up the simulation, the building blocks are represented by high-level models with parameters to model any nonlinear behavior or mismatch in critical transistors. The simulations are performed in Matlab [9].

To find the phase error generated by the ΔΣ modulation of the division modulus, the variation of the number of RF pulses at the output of the divider is monitored. Every reference cycle, the number of RF pulses at the divider output is determined by the number of pulses swallowed by the ΔΣ control:

(4)

The resulting quantized phase changes are compared with the phase that would be expected when the loop would be in lock, i.e., the phase corresponding to the fractional part of the division modulus. The result is the instantaneous accumulated phase error $\Delta\phi$:

(5)

The phase error is converted to current pulses, $I_{CP}[i]$, in the charge pump. The $\Delta\phi$–$I_{CP}$ (phase-error charge-pump current) conversion is modeled to contain any PFD nonlinearity. Mismatch in the up and down current sources, resulting in gain mismatch for positive and negative phase errors, is modeled by a separate parameter. The occurrence of a dead zone is modeled by

(6)

By taking an FFT of the current pulses, the current noise spectrum is obtained. The current noise spectrum is modeled as a phase-noise source which is subjected to its corresponding closed-loop transfer function, obtained from the LTI PLL model. This means that the filter is modeled by its linear transfer function, which includes parasitic gain and pole position changes. The nonlinear conversion from voltage to frequency/phase in the VCO is modeled by the variation of the VCO gain, when changing the operating point of the PLL.

The analysis tool enables the evaluation and comparison of the effect of MASH and single-loop ΔΣ noise on the PLL. This analysis is performed with the following nonlinearities: a 0.1% dead zone and a gain mismatch of 2%. The internal accuracy of both modulators is 16 bit. The reference frequency
is 26 MHz and the fractional division number is 67.92. The output frequency is 1.76592 GHz, i.e., 2.08 MHz offset from an integer multiple of $f_{ref}$. In Fig. 5(a) and (b), the time-domain phase error $\Delta\phi$ is plotted for both modulators. Note that the ΔΣ fractional-N PLL frequency synthesizer can hardly be called a phase-locked loop, since the loop is never in lock! Due to the shaping of the HF noise in the single-loop modulator, the instantaneous phase error is smaller than for a MASH modulator. This has two important consequences. First, the on-time of the charge pumps is smaller for the single-loop modulator, making it less sensitive to noise coupling from the substrate and the power supply. Second, the sensitivity to the nonlinear $\Delta\phi$–$I_{CP}$ conversion in terms of noise leakage is reduced.

Fig. 5. Simulation results. The phase error $\Delta\phi$ for (a) the MASH modulator and (b) the single-loop multibit modulator. The FFT of the current pulses $I_{CP}[i]$ for (c) the MASH modulator and (d) the single-loop multibit modulator.

To be able to examine the effect of nonlinearities in the frequency domain, the FFTs of the charge-pump current pulses are plotted in Fig. 5(c) and (d). A noise floor appears in the output spectrum as well as spurious tones, although the ΔΣ output is perfectly randomized and dithered. Due to the nonlinear mixing in the PFD charge pump, noise at $f_{ref}/2$ folds back to lower offset frequencies, similar to the effect of a nonlinear DAC in a multibit ΔΣ ADC. Since the noise at $f_{ref}/2$ is much lower for the single-loop modulator, its noise leakage due to the nonlinear mixing in the PFD is also lower. In the time domain, this effect corresponds to the smaller phase excursions. The difference in phase error between MASH and single-loop modulators is reflected in a lower noise floor, i.e., a 10-dB difference. In addition, previously unnoticed spurious tones appear in the output spectrum at $n \times 2.08$ MHz with $n = 1, 2, \ldots$.

Fig. 6 shows the ΔΣ noise of both modulators as it appears at the PLL output for an ideal (dotted) and a nonlinear $\Delta\phi$–$I_{CP}$ conversion (solid). The results of the ideal case closely match the theoretical results of Section II-C (solid light gray). Due to nonlinearity, the simulated output spectrum of the integer-N PLL (the dash-dotted line) is seriously deteriorated by ΔΣ noise in the PLL noise bandwidth, increasing the rms phase error. Especially, the MASH converter is critical in terms of in-band noise due to the higher phase error [see Fig. 5(a)], despite the inherently lower LF noise of the MASH modulator. Note that the simulations are performed without taking into account noise coupling through the substrate or power-supply lines. As a consequence, the actual spurious performance of the ΔΣ fractional-N PLL could be worse than simulated. The presented simulation results are for a division modulus 67.92, close to an integer multiple of $f_{ref}$. When analyzing division moduli in between integer multiples of $f_{ref}$, noise leakage is still observed, but the spurious tones are well below the phase noise.
The explanation for the re-emerging of spurious tones is that the modulator is unable to sufficiently decorrelate the successive output samples. To quantify the correlation in the modulator output, the discrete time autocorrelation estimate is calculated and plotted for both modulators for inputs close to an integer value (see Fig. 7). The autocorrelation calculations show correlation, although 1-LSB noise-shaped dithering is applied. The autocorrelation of the single-loop ΔΣ modulator shows large correlation peaks, explaining the higher spurious tones in the output phase-noise spectrum of the PLL. With the autocorrelation estimate, the necessary internal accuracy of the ΔΣ modulators is found to be at least 13 bits for MASH and 16 bits for single-loop ΔΣ modulators to sufficiently decorrelate the modulator output for inputs close to integers. A second possible source of tones is the downconversion of tones which are inherently present around $f_{ref}/2$ [5], by the nonlinear mixing in the PFD. This effect can be worsened by substrate and power-supply coupling with signals at $f_{ref}/2$.

Fig. 6. Simulation results. The ΔΣ noise at the output of the PLL for (a) the MASH modulator and (b) the single-loop multibit modulator. The results are plotted for an ideal PFD (dotted), which closely corresponds to the theoretical results (solid light gray) and for a nonlinear PFD (solid). They are compared to the simulated integer PLL phase noise (the dash-dotted line).

Fig. 7. Discrete time autocorrelation estimate of the modulator outputs for (a) the MASH modulator and (b) the single-loop multibit modulator.

IV. PLL BUILDING-BLOCK CIRCUIT DESIGN

A. The Fourth-Order Type-II PLL

A fourth-order type-II PLL is integrated, including a 4-bit prescaler, a zero-dead-zone PFD, a dual charge pump, and a 3-step equalizer, together with an on-chip LC-tank VCO and a third-order dual-path 35-kHz low-pass loop filter (see Fig. 8). The equalizer performs a 3-step piecewise equalization of the loop gain, by keeping the product of the VCO gain and the charge-pump current constant. To prevent switching between different equalization states, the state transitions exhibit hysteresis.

Fig. 8. Fully integrated fourth-order type-II phase-locked loop.

TABLE I
SUMMARY OF THE LOOP PROPERTIES AND PERFORMANCE OF THE FOURTH-ORDER TYPE-II PLL

B. The 4-Bit Prescaler

The first high-speed division of the prescaler is done with two differential single-transistor-clocked (DSTC) logic n-latches [10], forming a differential dynamic D-flip-flop. The flip-flop operates with rail-to-rail internal signals to minimize the residual prescaler phase noise [11] to levels insignificant to the overall phase-noise performance. The 16-modulus division (64–79) is implemented with the phase-switching topology [12]. The division moduli are generated by switching between the 90°-spaced output phases of the second D-flip-flop. When the 90° spacing is not ideal, spurs appear at 1/4, 2/4, and 3/4 of the PLL reference frequency. It takes careful layout and circuit design to equalize the delays of the different quadrature paths, such that these spurious tones are suppressed to negligible levels.
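The relation between the modulator output range and the prescaler range can be made concrete with a few lines of Python. This is an illustrative sketch only. The modulus range 64 to 79 and the ranges 64 to 71 (MASH) and 66 to 69 (single loop) come from the text; the offset coding of the 4-bit control word and the helper names are assumptions, and the single-loop span of -1 to +2 is simply chosen to reproduce the quoted range.

```python
MOD_MIN, MOD_MAX = 64, 79          # 16-modulus prescaler range from the paper


def control_word(modulus):
    """Return a 4-bit word for a modulus (offset coding is an assumption)."""
    if not MOD_MIN <= modulus <= MOD_MAX:
        raise ValueError(f"modulus {modulus} outside {MOD_MIN}..{MOD_MAX}")
    return modulus - MOD_MIN


def moduli_used(n_int, lowest, highest):
    """Moduli visited when the modulator output spans lowest..highest."""
    return [n_int + level for level in range(lowest, highest + 1)]


# A third-order MASH 1-1-1 output spans -3..+4 around the integer part 67.
mash_moduli = moduli_used(67, -3, 4)
# Span assumed so that the result matches the 66..69 range quoted for the single loop.
single_loop_moduli = moduli_used(67, -1, 2)

print(mash_moduli, [control_word(m) for m in mash_moduli])
print(single_loop_moduli, [control_word(m) for m in single_loop_moduli])
```

For the 67.92 test case both modulators stay inside the 16 available moduli, with the MASH modulator touching the lower limit of 64.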

C. The Voltage-Controlled Oscillator

The LC VCO with on-chip inductor combines a 30% tuning range at only 2 V and an excellent phase-noise performance over a large frequency range. To minimize the VCO phase noise, a simulator-optimizer program has been developed which searches the optimal inductor geometry for a given technology. The resulting hollow octagonal balanced inductor has a $Q$ as high as 9 with an inductance of 2.86 nH, for a standard 0.25-µm CMOS technology with only two metal layers (0.6 and 1.0 µm) [13].

The VCO is implemented as a single differential pMOS-only topology, leading to an enhanced tuning range, without increasing the power consumption and the VCO gain, $K_{VCO}$ [13]. For the frequency range of interest, $K_{VCO}$ is between 100 and 200 MHz/V, explaining the need for equalization of the loop gain. The VCO output is buffered from the prescaler input to prevent kickback noise from entering the tank. The measured phase noise is as low as −127.5 dBc/Hz at 600 kHz and −142.5 dBc/Hz at 3 MHz for a carrier frequency of 1.82 GHz.

D. The 35-kHz Dual-Path Loop Filter

To achieve full integration, a dual-path filter topology has been implemented (Fig. 8). Two filter paths, one active integration and one passive low-pass filter, are added with a multiplication factor in the dual charge pumps. The addition realizes the low-frequency zero needed for loop stability in a type-II PLL, without adding the actual capacitor [12]. The total number of capacitors is the same as in a classical fourth-order type-II PLL, but for the same phase noise the integrated capacitance is more than 5 times smaller. Due to the rather high VCO gain, the integrated capacitance is still 1.4 nF to be able to comply with the DCS-1800 phase-noise requirements. An extra pole is added at 210 kHz to ensure enough suppression at higher offset frequencies. A filter optimization model is developed, determining all pole and zero positions and the capacitance–resistance tradeoff to obtain low noise and high integratability [14]. The results of the optimization at 1765.92 MHz are listed in Table I. The total phase noise is without the ΔΣ noise. The MASH and single-loop (SL) noise contributions result from the nonlinear analysis. As seen in Section II-C, the loop bandwidth needs to be smaller than 62 kHz for ΔΣ noise suppression. However, to ensure sufficient suppression of the low-frequency fractional spurious tones for inputs close to the integers, the bandwidth is designed to 35 kHz. Despite the rather low loop bandwidth for a ΔΣ fractional-N synthesizer, a settling time of less than 293 µs for a 104-MHz step is simulated.

E. The $\Delta\phi$–$I_{CP}$ Conversion

The nonlinear analysis of Section III identified nonlinearity of the $\Delta\phi$–$I_{CP}$ conversion as the main cause of noise leakage and spurious tones. Therefore, the PFD and charge-pump circuits are carefully optimized toward spurious suppression as such and toward a highly linear phase-error detection for ΔΣ spurious suppression.

First, the reference spur generation by the PFD charge-pump circuit is carefully minimized. The integration in the first path of the loop filter is done actively to keep the charge-pump output at a fixed level (see Fig. 8). Additionally, the charge-pump current is designed to be at least an order of magnitude larger than the fixed parasitic charge injection of the switch transistors. The current switches are implemented with pMOS and nMOS transistors to compensate charge injection. Finally, a timing control scheme [Fig. 9(a)] is developed to control the charge-pump switches. The up and down control pulses of the PFD are converted to synchronized control signals to drive both the output current branch and the dummy current branch of the charge pump [Fig. 9(b)]. Fig. 9(a) shows the dummy and output control signals. The dummy control is delayed versus the output control by modifying the thresholds of the second inverter string (indicated by high and low) such that the current always flows, preventing hard on/off switching of the current sources. To equalize rise and fall times and force a perfect $\pi$ rad relation between nMOS and pMOS control signals, latches at the outputs of both inverter strings are implemented. Capacitors at the control outputs slow the rising and falling edges to prevent large charge injection from fast switching.

Fig. 9. (a) Timing control circuit and signals to control the dummy and the output current branch of the charge pump. (b) Charge-pump circuit with (at the left) the dummy current branch, denoted by the suffix d, and the output branch.
To linearize the $\Delta\phi$–$I_{CP}$ conversion, the phase detection is performed by a zero-dead-zone PFD [15], to prevent a hard nonlinearity around zero phase error. Due to the delay added in the PFD, both the up and down current sources are on for small or zero phase errors, enabling the PFD to react to very small phase errors. The on-time fraction of the charge pump due to the delay is less than 10%. This value is a tradeoff between dead-zone prevention and sensitivity to noise coupling when the charge pumps are on. To further minimize digital noise coupling, the sampling in the PFD and the computational events in the ΔΣ modulator and prescaler are offset in phase. Consequently, the phase-error decision making is done in a relatively quiet environment. To make sure that the gains for positive and negative phase-error detection are equal, the current source transistors are oversized to ensure sufficient matching. As a side effect, the current source noise, which can seriously affect the in-band noise, is decreased. Additionally, the timing control of Fig. 9(a) provides synchronization between the two filter paths and the switches of the charge pumps themselves, thereby ensuring equal positive and negative phase-error detection gain. HSPICE simulations of the PFD charge-pump circuit are performed and show no dead zone and no gain mismatch with ideal transistor matching.

V. EXPERIMENTAL RESULTS

Fig. 10 shows the IC microphotograph and the measurement setup in which it is embedded. The ΔΣ fractional measurements are performed by controlling the PLL divider moduli with an HP80000 data generator, which generates the 4-bit control word. The 4-bit output bit stream is generated using Matlab. This provides a flexible way to test different kinds of ΔΣ modulators, without the need for redesigns. All presented measurements are performed with a 26-MHz reference frequency and at 1.76592 GHz, i.e., for a fractional division by 67.92 for comparison with the simulated results. The input to the ΔΣ modulators is a 16-bit word, resulting in a frequency resolution of around 400 Hz. The power-supply voltage is only 2 V. Fig. 11 shows the output spectrum of the fractional-N PLL over a span of 55 MHz. The reference spurs are well below −75 dBc, due to the careful charge-pump timing control.

Fig. 10. IC microphotograph and the measurement setup in which it is embedded.

Fig. 11. Measured output spectrum of the ΔΣ fractional-N PLL at 1.76592 GHz. All spurious tones are well below −75 dBc/Hz.

To measure the fractional performance of the frequency synthesizer, the Matlab data is stored in the data generator memory. Unfortunately, the maximum memory capacity is only 128 kbit, leading to large spurious tones at the output at low offset frequencies. These large tones corrupt the gain calibration, which is performed by the phase-noise measurement system every offset frequency decade, such that accurate measurements of the phase noise at offsets smaller than 10 kHz are not feasible. The measured phase noise of the PLL with the MASH modulator and the
single-loop multibit modulator is presented in Figs. 12 and 13. Small spurs are present at 2.08 MHz as predicted by the simulations in Fig. 6. The spur level is well below −100 dBc, due to careful PFD charge-pump design. The phase noise at 600 kHz is lower than −120 dBc/Hz.

Fig. 12. Phase-noise measurement with the ΔΣ single-loop multibit converter at 1.76592 GHz compared to the phase noise at integer division (light).

Fig. 13. Phase noise measurement with the MASH converter at 1.76592 GHz compared to the simulated ΔΣ noise at the output of the PLL (dashed), and with the simulated PLL output without ΔΣ control (dash-dotted).

In Fig. 12, the measured phase noise of the PLL with a multibit single-loop modulator (dark) is compared to the phase noise at integer division (light). Noise at lower offsets originates from the ΔΣ modulator due to noise folding in the PFD, as predicted by the simulations. As a result, the rms phase error is increased from 1.7° to 3°. Note that the phase noise of the PLL at integer divisions is as low as −124 dBc/Hz at 600 kHz, which is only 0.3 dB higher than predicted by the PLL simulations (see Table I). The measured results for ΔΣ fractional division are much noisier than predicted by simulation. The phase noise at offset frequencies close to 10 kHz is increased due to the limited memory of the data generator. The noise at higher offset frequencies is corrupted by noise coupling from the data generator. As can be seen in Fig. 10, the ΔΣ-control bonding wires, which conduct rail-to-rail, very noisy control pulses, are close to the LC tank and the bonding wires of the VCO power supply. Without proper shielding, the VCO phase noise is seriously degraded by this noise coupling.

In Fig. 13, the measured ΔΣ noise and the ΔΣ noise as simulated in Section III (dashed) are compared. The dash-dotted line is the simulated phase noise of the PLL without ΔΣ control. The simulated ΔΣ noise leakage closely matches the measured results, except at very low offsets due to the limited memory. The phase noise at high offsets is increased versus the simulated PLL results due to noise coupling. Second-order tones are larger in measurements, since the models in the simulator do not include second-order effects and noise coupling. Tones at 520 kHz are believed to come from subharmonic tones present in the ΔΣ modulator output [5], which are amplified by mixing through noise coupling. When comparing the results for the MASH and the single-loop modulator, the measured results are less pronounced than the simulated results (see Fig. 6). The measured phase noise for the single-loop modulator is, however, a few decibels lower than for the MASH modulator. Note that all measurements are performed for frequencies close to integer multiples of $f_{ref}$.

The measured settling time of the PLL is 226 µs for a 104-MHz frequency step. The power consumption of the PLL is 70 mW from a 2-V power supply. The fully integrated low-phase-noise VCO is responsible for almost 66% of the total power consumption. The IC area is 2 × 2 mm², including bonding pads and bypass capacitors. Table II shows the measured specifications compared to the DCS-1800 specifications [1]. The specifications of the IC prototype comply with the DCS-1800 specifications; only the rms phase error is degraded due to the limited resolution of the measurement setup.

TABLE II
SUMMARY OF MEASURED SPECIFICATIONS COMPARED TO THE DCS-1800 SPECIFICATIONS

VI. CONCLUSION

A monolithic 1.8-GHz ΔΣ-controlled fractional-N PLL frequency synthesizer is implemented in a standard 0.25-µm CMOS technology. The monolithic fourth-order type-II PLL
844 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

integrates the digital synthesizer part together with a fully integrated LC VCO, a high-speed prescaler, and a 35-kHz dual-path loop filter on a die of only 2 × 2 mm². To investigate the influence of the ΔΣ modulator on the synthesizer’s spectral purity, a fast nonlinear analysis method is developed, showing good correspondence with measurements, in contrast to the results of the theoretical analysis. Nonlinear mixing in the phase-frequency detector and the VCO is identified as the main source of spectral pollution in ΔΣ fractional-N synthesizers. MASH and single-loop multibit ΔΣ modulators are compared for use in fractional-N synthesis. Although the MASH is stable and easy to integrate, the single-loop modulator presents a better solution, showing less sensitivity to noise leakage and noise coupling and providing more flexibility. The measured phase noise is lower than −120 dBc/Hz at 600 kHz and −139 dBc/Hz at 3 MHz. The measured fractional spur level is lower than −100 dBc, satisfying the DCS-1800 spectral purity requirements. All measurements are performed for frequencies close to integer multiples of the reference frequency, where the synthesizer is most sensitive to spurious tones.

REFERENCES

[1] M. S. J. Steyaert, J. Janssens, B. De Muer, M. Borremans, and N. Itoh, “A 2-V CMOS cellular transceiver front-end,” IEEE J. Solid-State Circuits, vol. 35, pp. 1895–1907, Dec. 2000.
[2] T. Cho, E. Dukatz, M. Mack, D. Macnally, M. Marringa, S. Mehta, C. Nilson, L. Plouvier, and S. Rabii, “A single-chip CMOS direct-conversion transceiver for 900-MHz spread-spectrum digital cordless phones,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, Feb. 1999, pp. 228–229.
[3] A. Rofougaran, G. Chang, J. J. Rael, J. Y.-C. Chang, M. Rofougaran, P. J. Chang, M. Djafari, J. Min, E. W. Roth, A. A. Abidi, and H. Samueli, “A single-chip 900-MHz spread-spectrum wireless transceiver in 1-μm CMOS—Part II: Receiver design,” IEEE J. Solid-State Circuits, vol. 33, pp. 547–555, Apr. 1998.
[4] M. Copeland, T. Riley, and T. Kwasniewski, “Delta–sigma modulation in fractional-N frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, pp. 553–559, May 1993.
[5] S. R. Norsworthy, R. Schreier, and G. C. Temes, Delta–Sigma Data Converters: Theory, Design and Simulation. New York: IEEE Press, 1997.
[6] B. Miller and R. Conley, “A multiple modulator fractional divider,” IEEE Trans. Instrum. Meas., vol. 40, pp. 578–583, June 1991.
[7] “Digital cellular communication system (Phase 2+); Radio transmission and reception,” Eur. Telecommun. Standards Inst., ETSI 300 190 (GSM 05.05 version 5.4.1), 1997.
[8] W. Rhee, B.-S. Song, and A. Ali, “A 1.1-GHz CMOS fractional-N frequency synthesizer with a 3-b third-order ΔΣ modulator,” IEEE J. Solid-State Circuits, vol. 35, pp. 1453–1460, Oct. 2000.
[9] The Mathworks Inc., Matlab User’s Guide, Version 5. Englewood Cliffs, NJ: Prentice Hall, 1997.
[10] J. Yuan and C. Svensson, “New single-clock CMOS latches and flip-flops with improved speed and power savings,” IEEE J. Solid-State Circuits, vol. 32, pp. 62–69, Jan. 1997.
[11] B. De Muer and M. S. J. Steyaert, “A single-ended 1.5-GHz 8/9 dual-modulus prescaler in 0.7-μm CMOS with low phase-noise and high input sensitivity,” in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC), The Hague, Sept. 1998, pp. 256–259.
[12] J. Craninckx and M. S. J. Steyaert, “Low-phase-noise fully integrated CMOS frequency synthesizers,” Ph.D. dissertation, Katholieke Univ. Leuven, Belgium, 1997.
[13] B. De Muer, M. Borremans, N. Itoh, and M. S. J. Steyaert, “A 1.8-GHz highly tunable low-phase-noise CMOS VCO,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), Orlando, FL, May 2000, pp. 585–588.
[14] B. De Muer and M. S. J. Steyaert, “Fully integrated CMOS frequency synthesizers for wireless communications,” in Analog Circuit Design, W. Sansen, J. H. Huijsing, and R. J. van de Plassche, Eds. Norwell, MA: Kluwer, 2000, pp. 287–323.
[15] F. M. Gardner, Phaselock Techniques. New York: Wiley, 1979.

Bram De Muer (S’00) was born in Sint-Amandsberg, Belgium, in 1973. He received the M.Sc. degree in electrical engineering in 1996 from the Katholieke Universiteit Leuven, Belgium, where he is currently working toward the Ph.D. degree on high frequency low-noise integrated frequency synthesizers at the ESAT-MICAS laboratories.
He has been a Research Assistant with ESAT-MICAS laboratories since 1996. His research is focused on integrated low-phase-noise VCOs with on-chip planar inductors and high-speed prescaler design, leading to fully integrated ΔΣ fractional-N synthesizers in CMOS technology.

Michel S. J. Steyaert (S’85–A’89–SM’92) was born in Aalst, Belgium, in 1959. He received the M.S. degree in electrical-mechanical engineering and the Ph.D. degree in electronics from the Katholieke Universiteit Leuven (K.U. Leuven), Heverlee, Belgium, in 1983 and 1987, respectively.
From 1983 to 1986, he obtained an IWONL fellowship (Belgian National Foundation for Industrial Research) which allowed him to work as a Research Assistant at the Laboratory ESAT at K.U. Leuven. In 1987, he was responsible for several industrial projects in the field of analog micropower circuits at the Laboratory ESAT as an IWONL Project Researcher. In 1988, he was a Visiting Assistant Professor at the University of California, Los Angeles. In 1989, he was appointed by the National Fund of Scientific Research (Belgium) as a Research Associate, in 1992 as a Senior Research Associate, and in 1996 as a Research Director at the Laboratory ESAT, K.U. Leuven. Between 1989 and 1996, he was also a part-time Associate Professor and since 1997 an Associate Professor at the K.U. Leuven. His current research interests are in high-performance and high-frequency analog integrated circuits for telecommunication systems and analog signal processing.
Dr. Steyaert received the 1990 European Solid-State Circuits Conference Best Paper Award, the 1995 and 1997 ISSCC Evening Session Award, the 1999 IEEE Circuit and Systems Society Guillemin–Cauer Award, and the 1991 NFWO Alcatel-Bell-Telephone award for innovative work in integrated circuits for telecommunications.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 5, MAY 1997 691

A 960-Mb/s/pin Interface for Skew-Tolerant Bus Using Low Jitter PLL
Sungjoon Kim, Student Member, IEEE, Kyeongho Lee, Student Member, IEEE, Yongsam Moon, Student Member, IEEE,
Deog-Kyoon Jeong, Member, IEEE, Yunho Choi, and Hyung Kyu Lim, Member, IEEE

Abstract—This paper describes an I/O scheme for use in a high-speed bus which eliminates setup and hold time requirements between clock and data by using an oversampling method. The I/O circuit uses a low jitter phase-locked loop (PLL) which suppresses the effect of supply noise. Measured results show peak-to-peak jitter of 150 ps and rms jitter of 15.7 ps on the clock line. Two experimental chips with 4-pin interface have been fabricated with a 0.6-μm CMOS technology, which exhibits the bandwidth of 960 Mb/s per pin.

Index Terms— Skew-tolerant, high speed bus, oversampling, phase locked loop, jitter, CMOS, phase frequency detector, voltage controlled oscillator.

I. INTRODUCTION

As the speed of high-speed digital systems tends to be limited by the bandwidth of pins, new I/O architectures are gaining momentum over conventional ones. The advent of 64 Mb and 256 Mb DRAM’s and faster logic chips also propels the need for high-speed I/O interface while reducing the number of pins and hence the system cost. Synchronous DRAM’s increased chip bandwidth up to 220 Mb/s/pin [1]. A revolutionary architecture using delay-locked loops (DLL’s) or phase-locked loops (PLL’s) was also successful in providing over 500 Mb/s/pin bandwidth [2], [3]. Such a narrow, high-speed bus provides large bandwidth in a small, low pin-count package, but such high-speed bus architectures inevitably require strict phase relationships between clock and data. A phase-tolerant I/O scheme was also developed previously for a point-to-point link [4]. This paper describes an I/O scheme for use in a high-speed bus which eliminates setup and hold time margins by using blind 3× oversampling and data recovery. In the new scheme, the clock line delivers only frequency information. The data receiving circuits extract phase information from the data itself. An 8-b data bus employing this skew insensitive scheme can deliver over 960 MB/s. Two experimental chips with 4-pin interface were fabricated.

In Section II, the chip architecture and the skew-tolerant I/O scheme will be presented. The circuit design techniques for low jitter PLL and other circuits are discussed in Section III. The chip layout and experimental results are presented in Section IV followed by a conclusion in the final section.

Manuscript received August 20, 1996; revised December 3, 1996.
S. Kim, K. Lee, Y. Moon, and D.-K. Jeong are with the Inter-University Semiconductor Research Center, Seoul National University, Seoul 151-742, Korea.
Y. Choi and H. Lim are with Samsung Electronics Co., Yongin-City, Kyungki-Do, Korea.
Publisher Item Identifier S 0018-9200(97)02850-3.
0018–9200/97$10.00 © 1997 IEEE

II. SYSTEM ARCHITECTURE

Two chips, bus master and bus slave, were designed. Bus masters in a system bus initiate bus transactions, and slaves respond to the tenured master. For example, a memory controller works as the master chip and a memory with a high-speed interface works as the slave chip. A simplified block diagram of the two chips is shown in Fig. 1. The bus signals are composed of 4-b wide data lines, a clock line, and a reference line. A charge pump PLL multiplies the external clock by two and generates two sets of multiphase clocks for both bit serialization and data oversampling. The relationship between internal 12-phase clocks and external clock is shown in Fig. 2. The first set of multiphase clocks are 12 multiphase clocks with 30° of phase separation. These 12 clocks are shown in Fig. 2(a) as PCK[0] to PCK[11]. These multiphase clocks were laid out to minimize the interference. Fig. 2(b) shows the multiphase clock distribution. Ground lines were inserted between each multiphase clock to minimize the interference. When one clock is switching, the adjacent clocks are guaranteed to be in stable state. This configuration minimizes coupling between clocks. The second set of multiphase clocks are four multiphase clocks with 90° of phase separation. This second set of multiphase clocks, TCK[0] to TCK[3], are in phase with PCK[0], PCK[3], PCK[6], PCK[9], respectively. We generate these two separate sets of clocks to equalize loading conditions.

An 8-b parallel data stream is first converted to a 4-b data stream by an internal clock and then serialized with a serialization circuit. The serializer circuit used is the same type of circuit reported in [4]. The only difference is that four phase clocks instead of ten phase clocks of the previous design are used in this design, thereby reducing area and parasitic capacitance at high-speed nodes. The serial stream is driven by a current controlled open-drain output driver. The second set of multiphase clocks, TCK[0] to TCK[3], are used by the transmitter to serialize 4 b of data. Each pin connected to a high-speed bus has 12 oversamplers and an output driver. In [6], 32 clock phases are generated to oversample the incoming data. The decision on the degree of oversampling is a tradeoff between input data phase jitter tolerance, power, and area. If too many clock phases are used per bit period, power consumption and chip area will increase. But low oversampling ratio may affect the tolerance of phase

Fig. 1. Simplified block diagram of master and slave chip.

(a)

(b)
Fig. 2. (a) External clock and 12 multiphase clocks relationship. (b) Multiphase clock layout.

jitter on the incoming data. If the phase jitter on the incoming data is low and the PLL has low jitter characteristics, the oversampling ratio can be as low as three [7]. The oversampler oversamples the bus data three times per bit using 12 phase clocks provided by a PLL. To extract correct phase information from the data stream, a high-to-low transition is inserted at the head of each packet on each pin for correct data sampling. The slaves of the bus keep oversampling the bus signals to catch the start of a bus transfer. This process is illustrated in Fig. 3. The serial input data is sampled at the rising edges of each multiphase clock. The receiver samples the serial data blindly without any constraint on setup and hold time margins. The sampled data is amplified again regeneratively to reduce possible metastability. Fig. 3 shows two high-speed bus signals, bus signal 0 and bus signal 1, with skew between them. When the signal receiver detects the first 1-to-0 transition, it selects the next bit as the first valid data. The third bit after the first valid bit is also selected as valid. It is assumed
KIM et al.: 960-Mb/s/pin INTERFACE FOR SKEW-TOLERANT BUS 693

Fig. 3. Skew-insensitive bus operation.

Fig. 4. Byte skew handling operation.

Fig. 5. Functional block diagram of charge pump PLL.

Fig. 7. Implemented phase frequency detector.

that the next oversampled bit after the first 1-to-0 transition
was sampled near the center of data eye pattern. Each pin
of the data bus tracks the start phase of a data transfer
separately. After each pin catches the start of a data transfer,
the demultiplexed data of each pin is retimed into a single
internal clock domain. Since this process can be done in one
clock cycle, the masters can respond quickly as distance from
the signal source changes.
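The start-of-transfer detection described above can be illustrated with a minimal Python sketch. It assumes an idealized bus that idles high, exactly three clean samples per bit, and a packet that begins with a low start bit; the function names `oversample` and `recover` are hypothetical and are not taken from the paper.

```python
def oversample(bits, ratio=3, skew=0, idle=1, idle_bits=2):
    """Model blind sampling: idle level first, then each bit repeated `ratio` times.

    `skew` adds extra idle samples to mimic a pin whose data arrives later.
    """
    stream = [idle] * (idle_bits * ratio + skew)
    for b in bits:
        stream.extend([b] * ratio)
    return stream


def recover(samples, ratio=3):
    """Find the first 1-to-0 transition, take the next sample as the first
    valid data, then keep every `ratio`-th sample after it."""
    for i in range(1, len(samples)):
        if samples[i - 1] == 1 and samples[i] == 0:
            first_valid = i + 1  # assumed to sit near the center of the eye
            return samples[first_valid::ratio]
    return []  # no start of transfer seen yet


packet = [0, 1, 0, 1, 1]  # leading 0 is the inserted high-to-low start bit
for skew in range(3):
    print(skew, recover(oversample(packet, skew=skew)))
```

In this idealized model the recovered sequence is the same for every skew value, which is the property the scheme relies on. The first recovered element is the start bit itself.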
Since this scheme allows skew not only in clock line but also among data lines, there is a possibility that some of the demultiplexed parallel data are one internal clock cycle earlier or later than the other demultiplexed data after retiming.

Fig. 6. Conventional phase frequency detector.

Fig. 8. (a) PFD dead zone and (b) PLL jitter.

Fig. 9. Voltage controlled oscillator circuit diagram.

Fig. 10. Simulated UP/DOWN pulse width difference as a function of input phase difference.

The skew handler examines the parallel output of each pin and checks whether every pin is aligned properly. If some of the parallel outputs are not aligned, the skew handler delays the parallel outputs which arrived earlier. Fig. 4 illustrates the operation of the interpin skew handler.
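As a rough illustration of the interpin skew handling just described, the following Python sketch delays the early pins until every pin lines up with the latest one. The representation (one list of words per pin plus an arrival cycle per pin) and the name `align_pins` are assumptions made for this example; the paper implements the function in hardware.

```python
def align_pins(words_per_pin, arrival_cycle):
    """Delay each pin's word stream so all pins line up with the latest pin.

    words_per_pin: one list of demultiplexed words per pin.
    arrival_cycle: internal clock cycle in which each pin's first word appears
                   (pins are expected to differ by at most one cycle).
    Returns one timeline per pin, with None marking empty cycles.
    """
    latest = max(arrival_cycle)
    aligned = []
    for words, arrived in zip(words_per_pin, arrival_cycle):
        delay = latest - arrived  # extra cycles applied to early pins only
        aligned.append([None] * (arrived + delay) + list(words))
    return aligned


pins = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"], ["d0", "d1"]]
timelines = align_pins(pins, arrival_cycle=[0, 1, 0, 0])
for t in timelines:
    print(t)
```

After alignment, word k of every pin sits in the same cycle, so the parallel outputs can be read together.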
III. CIRCUIT DESCRIPTION

A. Low Jitter PLL

The performance of the PLL or DLL is one of the limiting factors of the high-speed interface or serial communications. The jitter characteristics become more important especially for such applications that require integration of PLL or DLL with noisy digital circuits. Integration with digital circuits induces noise on the supply rails or on the substrate. Since the charge pump PLL used in this design generates multiple phase clocks to divide one external clock period into many equally spaced intervals, the accuracy and the jitter characteristics become more important.

Fig. 5 shows the functional block diagram of a charge pump PLL clock generator. It consists of a phase frequency detector, charge pump, loop filter, clock divider, and a voltage-controlled oscillator (VCO). With a six-stage differential VCO, 12 clock phases are available to oversample the incoming data and to serialize parallel data into serial bit stream.

One of the critical building blocks of the PLL is the phase frequency detector (PFD). A low precision PFD has a wide dead zone (undetectable phase difference range), which results in increased jitter. The jitter caused by the large dead zone can be reduced by increasing the precision of the phase frequency detector. Fig. 6 shows a conventional implementation of a static PFD [8]. This conventional PFD is an asynchronous state machine. The delay time to reset all internal nodes determines the circuit speed. The critical path of the conventional PFD is shown in bold lines in Fig. 6. The critical path forms a feedback path with six gate delays. The dead-zone occurs when the loop is in a lock mode and the output of the charge pump

Fig. 11. Voltage controlled oscillator circuit diagram.

does not change for small changes in the input signals at the PFD. Any width of the dead-zone directly translates to jitter in the PLL and must be avoided.

To overcome the speed limitation and to reduce the dead zone, a new dynamic logic style PFD was designed. A similar dynamic comparator was reported before [9], but our implementation requires fewer transistors. Fig. 7 shows the circuit diagram of the PFD. Conventional static logic circuitry was replaced by dynamic logic gates. As a result, the number of transistors in the PFD core is reduced from 44 to 16. The critical path of this PFD is shown also in Fig. 7. The critical path of this PFD is composed of a three-gate feedback path. The shortened feedback path delay and dynamic operation allow high precision in the high-frequency operation.

Fig. 8 shows the relation between dead zone of PFD and the phase error of PLL. If the phase difference of EXT clock and VCO clock is smaller than the dead zone, the PFD cannot detect the phase difference. So the phase error signal of PFD will remain zero, resulting in unavoidable phase error between EXT clock and VCO clock. The minimum peak-to-peak phase error caused by this dead zone is

Minimum Peak-to-Peak Phase Error (1)

In order to avoid dead zone, the PFD asserts both UP and DOWN outputs as shown in Fig. 9. For in-phase inputs of EXT_CLK and VCO_CK, the charge pump will see both UP and DOWN pulse for the same short period of time. If there is a phase difference between EXT_CLK and VCO_CK, the width of UP and DOWN pulse will be proportional to the phase differences of the inputs. Fig. 10 shows the SPICE simulation result of the UP/DOWN pulse width differences as a function of the input phase differences. The dead zone of the PFD is significantly smaller than the measured maximum PLL jitter.

Fig. 12. VCO operation for step supply noise.

Several critical parameters of the PLL, such as speed, timing jitter, spectral purity, and power dissipation, strongly depend on the performance of the VCO. So the noise insensitivity of the VCO is very important. The VCO implemented in this design has a simple bias circuit to reject supply step noise. The processor or bus can have intervals when there is heavy circuit activity in switching large amounts of capacitance and intervals when there is very little circuit activity. This will show up as steps or impulses on the power supply of PLL [8]. The actual peak-to-peak jitter in this case becomes dominated by the peaks in the impulse transient noise response. The VCO used in the design is a six-stage differential-type ring oscillator with limited voltage swing and is shown in Fig. 11. Each stage is made up of a differential NMOS pair with variable resistance loads made of PMOS devices operating in the triode region. The bias voltage for the PMOS is generated by a replica bias circuit. The operation of this bias circuit is shown in Fig. 12. The voltage dynamically tracks the supply variations. The replica bias circuit, which consists of a replica delay cell and an op-amp, sets the minimum voltage level of the internal VCO swing to a level generated by two resistors and one capacitor. When the supply rail is quiet, the voltage swing of the internal VCO is the difference between the supply and this level. Let us assume that there is a supply voltage step variation at some point. After the

Fig. 13. Phase and byte sync block diagram.

Fig. 14. Sampler circuit diagram.

step change at the supply, the level settles to

(2)

with a time constant of

(3)

At the instant of supply step change, the voltage difference between the supply and this level remains the same due to the capacitor at the generator. If this difference is fixed, the delay cells run a little bit faster due to the supply voltage increase instead of keeping exact constant delay. Since the difference remains the same temporarily, the delay cells run a little bit faster due to the increased supply voltage for a short period of time. And the voltage swing at the VCO increases with a time constant determined by the resistor and capacitor values and the OPAMP bandwidth and approaches

(4)

which results in the increase of one stage delay. This gives an averaging effect on the VCO delay after the supply step change, minimizing the delay change caused by the supply step. If we select the resistor and capacitor values for a minimum average delay change, the effect of supply step change can be nullified. The values we chose for this particular process are k k and pF.

PLL circuits can be sensitive to noise pickup from the supplies and substrate. So the PLL circuit has dedicated power and ground pads. Bypass capacitors are included in the layout to stabilize VDD and GND of PLL. Guard rings are used to isolate PLL and other digital parts. The placement of multiphase clocks was carefully chosen to remove possible coupling between clocks.
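The link between PFD dead zone and jitter discussed earlier in this subsection can be illustrated with a toy discrete-time loop in Python. This is only a sketch under stated assumptions, not the paper's circuit or its Eq. (1): the loop is first order, the disturbance is uniform random phase noise, and the names and default values (`simulate_phase_error`, `gain`, `noise`) are hypothetical.

```python
import random


def simulate_phase_error(dead_zone, gain=0.5, noise=0.01, steps=20000, seed=1):
    """Return the peak-to-peak phase error (rad) of a toy first-order loop.

    The modeled PFD reports nothing while |error| < dead_zone / 2, so the
    loop cannot correct errors that fall inside the dead zone.
    """
    rng = random.Random(seed)
    half = dead_zone / 2.0
    error = 0.0
    lo = hi = 0.0
    for _ in range(steps):
        if error > half:
            detected = error - half
        elif error < -half:
            detected = error + half
        else:
            detected = 0.0  # inside the dead zone: no correction is applied
        error += rng.uniform(-noise, noise) - gain * detected
        lo, hi = min(lo, error), max(hi, error)
    return hi - lo


print("no dead zone   :", simulate_phase_error(0.0))
print("0.2 rad dead zone:", simulate_phase_error(0.2))
```

In this model the phase error wanders freely inside the dead zone, so the peak-to-peak error grows to roughly the dead-zone width, whereas without a dead zone it stays bounded by the noise and loop gain alone.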

B. Phase and Byte Sync


Fig. 15. Microphotograph of master chip.

The phase and byte sync block of Fig. 1 is shown in Fig. 13. It consists of 3-to-1 mux array, metastability resolver, start bit finder, phase memory, word memory, shifter, and D-flip-flops (DFF’s). This circuit finds the start bit and decimates the oversampled 12 b and aligns the byte boundary. The oversampled 12 b are sent from the sampler to the metastability resolver. Since the oversampled 12 b are not sampled at the center of the eye, there is a possibility that some of the bits are still at the metastable state. The metastability is practically removed by one more stage of synchronizers in the metastability resolver. The start bit finder receives information from the metastability resolver and selects one of the three phases as a correct phase and also extracts byte align information. The phase and byte align information are stored at the phase and word memory. The 3-to-1 mux array decimates 12 b into 4 b. The shifter at the final stage aligns the byte boundary according to the value of the word memory.
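The decimation and byte alignment steps above can be sketched in Python as follows. The data layout is an assumption made for illustration: 12 samples per internal clock cycle held in a list, a stored phase of 0, 1, or 2, and a bit offset standing in for the word memory value. The function names are hypothetical.

```python
def decimate(samples12, phase):
    """3-to-1 mux array: keep one of every three samples, starting at the
    stored phase (0, 1, or 2), turning 12 samples into 4 bits."""
    if len(samples12) != 12 or phase not in (0, 1, 2):
        raise ValueError("expected 12 samples and a phase of 0, 1, or 2")
    return samples12[phase::3]


def align_bytes(nibbles, shift):
    """Shifter: drop `shift` leading bits, then regroup into 8-bit words."""
    bits = [b for nib in nibbles for b in nib][shift:]
    return [bits[i:i + 8] for i in range(0, len(bits) - 7, 8)]


cycle = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]  # bits 1,0,1,0 sampled 3x each
print(decimate(cycle, phase=1))

nibbles = [[0, 0, 0, 1], [0, 1, 1, 0], [1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
print(align_bytes(nibbles, shift=4))
```

Here the phase plays the role of the phase memory and the shift plays the role of the word memory; both would be set once by the start bit finder and then reused for the rest of the transfer.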

C. Oversampler
The oversampler used in the data receiver is shown in
Fig. 14. Each oversampler is a cascaded sense amplifier and
uses four clocks for correct, timely sampling. It is very
important to reduce the probability of metastability by careful
design and layout. The same size is used for both PMOS and
NMOS in the core synchronizing amplifier to maximize the
loop bandwidth.

IV. EXPERIMENTAL RESULTS


Fig. 16. Microphotograph of slave chip.

Two prototype chips, master and slave, have been fabricated in a 0.6-μm double-metal CMOS process. Fig. 15 shows the microphotograph of the fabricated master chip. The chip occupies 4100 μm × 4300 μm including pad area. The master chip incorporates a common skew-insensitive I/O macro block, a bus protocol handler, and a self-test circuit for chip and system diagnostics. The common skew-insensitive I/O macro block includes a charge-pump PLL for multiphase generation, oversamplers, I/O buffers, parallel-to-serial converters, and a bias generator for internal use. The core area for the skew-insensitive I/O macro block is 3600 μm × 700 μm for 4-pin interface. The microphotograph of the fabricated slave chip is shown in Fig. 16. It has the same die size as the master chip. Many blocks are shared with the master chip. The skew-insensitive I/O macro block and the charge pump PLL are the same as those of the master. The slave chip includes a small internal fast SRAM to verify correct read/write operations.

The measured charge pump PLL jitter histogram of the master and the slave chips is shown in Fig. 17. Since the two chips use the same PLL, they showed similar jitter performance. The rms jitter is 15.7 ps when the tested chip is active. The peak-to-peak jitter was measured to be less than 150 ps. This PLL jitter characteristic is especially important for multiphase operation.

Fig. 18 shows an output data waveform at 960 Mb/s. The master chip is sending data to the bus according to the predetermined bus protocol. The jitter at the output data is larger than the jitter at the charge pump PLL clock due to the extra modulation effect of supply voltage fluctuation to data

Fig. 17. PLL jitter histogram.

Fig. 18. Output data waveform.

output. The speed limit came from several reasons. CMOS driving capability limitation and the signal degradation through chip packaging and printed circuit board (PCB) were among the main factors.

The skew-insensitive receiving operation was also observed. There are four high-speed pins in the prototype chip. We made a PCB with four high-speed impedance controlled bus lines. The length of normal lines is 12 cm. One of the high-speed signal paths was made intentionally longer than the other signals by 10 cm. The 960 Mb/s high-speed serial data was sent into the receiver. The receiver recovers the serial data into 8-b 120-MHz parallel data. Fig. 19 shows 120 MHz recovered parallel data. The upper waveform is from the pin with a longer trace. The lower waveform is from the normal length pin. Although the two pins have different trace lengths, the chips could receive data without errors. The power dissipation at 960 Mb/s was 0.7 W for the master chip. The chip characteristics are summarized in Table I.

TABLE I
MAIN FEATURES OF THE CHIP

Core Area        3.6 mm × 0.7 mm
Technology       0.6-μm double-metal CMOS
Supply Voltage   3.3 V
Data Rate        960 Mb/s
PLL jitter       15.8 ps rms @ 960 Mb/s
Power            0.7 W fully active
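The timing figures reported above can be cross-checked with a short Python calculation. The clock frequencies are derived here from the stated 960 Mb/s rate, 3× oversampling, 12 clock phases, and clock multiplication by two; they are not quoted directly in the text. The propagation velocity of 15 cm/ns is an assumed typical PCB value, not a measured one, and the function name `link_numbers` is hypothetical.

```python
def link_numbers(bit_rate_mbps=960.0, oversampling=3, phases=12,
                 pll_multiplier=2, extra_trace_cm=10.0,
                 velocity_cm_per_ns=15.0):  # assumed PCB propagation velocity
    """Derive clock rates and skew figures from the reported link parameters."""
    bits_per_vco_cycle = phases // oversampling       # 12 phases / 3 samples
    vco_mhz = bit_rate_mbps / bits_per_vco_cycle
    bit_time_ns = 1000.0 / bit_rate_mbps
    return {
        "bits_per_vco_cycle": bits_per_vco_cycle,
        "vco_mhz": vco_mhz,
        "external_clock_mhz": vco_mhz / pll_multiplier,
        "byte_rate_mhz": bit_rate_mbps / 8,           # 8-b parallel output
        "sample_spacing_ps": 1000.0 * bit_time_ns / oversampling,
        "trace_skew_bits": (extra_trace_cm / velocity_cm_per_ns) / bit_time_ns,
    }


for name, value in link_numbers().items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the 8-b parallel rate comes out at 120 MHz, matching the recovered data rate in the text, and the extra 10 cm of trace corresponds to well over one sample spacing of skew, which is the situation the oversampling receiver is meant to tolerate.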

Fig. 19. Skew-insensitive I/O operation.

V. CONCLUSION

A new high-speed skew-insensitive I/O scheme has been described in this paper. Two chips that incorporated the new I/O scheme using the low jitter PLL technique have been fabricated in a 0.6-μm double-metal CMOS process. Three times oversampling technique relaxed the strict requirement of setup and hold margins of high-speed chip-to-chip interfaces. Newly designed fast phase frequency detector and a high noise immunity VCO circuit improved jitter performance of PLL. The measured PLL rms jitter was 15.7 ps. Accurate multiphase clock generation for oversampling the bus signal was made possible by utilizing the low jitter PLL. By using such techniques, skew-insensitive data transfer was tested. This skew-insensitive I/O scheme is useful for high-speed ASIC-to-memory and ASIC-to-ASIC interfaces. This scheme will become more important as the chip-to-chip data transfer speed goes up.

REFERENCES

[1] M. Horiguchi et al., “An experimental 220 MHz 1 Gb DRAM,” in ISSCC 1995 Dig. Tech. Papers, pp. 252–253.
[2] M. Horowitz et al., “PLL design for a 500 MB/s interface,” in ISSCC 1993 Dig. Tech. Papers, pp. 160–161.
[3] T. H. Lee et al., “A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 Megabytes/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[4] E. Reese et al., “A phase tolerant 3.8 GB/s data-communication router for a multiprocessor supercomputer backplane,” in ISSCC 1994 Dig. Tech. Papers, Feb. 1994, pp. 296–297.
[5] S. Kim et al., “A pseudo-synchronous skew-insensitive I/O scheme for high bandwidth memories,” in Proc. Symp. VLSI Circuits, June 1994, pp. 41–42.
[6] M. Bazes and R. Ashuri, “A novel CMOS digital clock and data decoder,” IEEE J. Solid-State Circuits, vol. 27, pp. 1934–1940, Dec. 1992.
[7] S. Kim et al., “An 800 Mbps multi-channel CMOS serial link with 3× oversampling,” in Proc. IEEE Custom Integrated Circuit Conf., 1995, pp. 451–454.
[8] I. Young et al., “A PLL clock generator with 5 to 110 MHz lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[9] H. Notani et al., “A 622-MHz CMOS phase-locked loop with precharge-type phase frequency detector,” in Proc. Symp. VLSI Circuits, June 1994, pp. 129–130.

Sungjoon Kim (S’91) was born in Pusan, Korea, on June 2, 1970. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University in 1992 and 1994, respectively. Since 1994 he has been working toward the Ph.D. degree in the same university.
He spent the summer of 1995 working on the limiting factors of CMOS Gb/s transmission at SUN Microsystems, CA. His research interests include clock and data recovery for high-speed communication and high-speed I/O interface circuits.

Kyeongho Lee (S’92) was born in Seoul, Korea, on August 5, 1969. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University in 1993 and 1995, respectively. He is currently working toward the Ph.D. degree in electronics engineering of the same university.
He is working on various CMOS high-speed circuits for data communication. His research interests include high-speed CMOS interface circuits, high-speed video display system, and PLL systems for Gigabit communication.

Yongsam Moon (S’97) was born in Incheon, Korea, on March 1, 1971. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University in 1994 and 1996, respectively, where he is currently working toward the Ph.D. degree.
He has been working on architectures and CMOS circuits for microprocessors. His current research interests are in clock and data recovery circuits for high-speed data communication.

Deog-Kyoon Jeong (S’87–M’89) received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1981 and 1984, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1989.
From 1989 to 1991, he was with Texas Instruments, Dallas, TX, where he was a member of the technical staff working on the single chip implementation of the SPARC architecture. Since 1991, he has been on the faculty of the School of Electrical Engineering and the Inter-University Semiconductor Research Center, Seoul National University. His main research interests include high-speed circuits, VLSI systems design, microprocessor architectures, and memory systems.

Yunho Choi was born in Incheon, Korea, on March 29, 1960. He received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1983.
He joined Samsung Semiconductor Inc., Santa Clara, CA, in 1983, where he was engaged in the design of the 256K DRAM. Since 1986, he has been working on the design of high-density dynamic memory including synchronous DRAM at the Semiconductor Research Center, Samsung Electronic Company, Ltd., Kiheung, Korea. Currently he is in charge of specialty memory design such as graphics memory and merged DRAM and logic product development.

Hyung Kyu Lim (S’82–M’84) was born February 4, 1953, in Kyung-Nam, Korea. He received the B.S. degree from the Seoul National University, Seoul, Korea, the M.S. degree from the Korea Advanced Institute Science and Technology, and the Ph.D. degree from the University of Florida, Gainesville, all in electrical engineering, in 1976, 1978, and 1984, respectively.
Since 1976, he has been with the Semiconductor Research and Development Center, Samsung Electronics Co., Kiheung, Korea. From 1978 to 1981, he was engaged in the development of bipolar linear integrated circuits and CMOS watch chips. After finishing his Ph.D. study, he worked mainly in the area of high-density MOS memory development. Starting from a 64 Kb EEPROM design in 1984, he led various memory device research and development projects that include 256 Kb EEPROM, 16 Mb mask ROM, 1 Mb high-speed static RAM, and 1/3 inch CCD image sensor. He is currently responsible for design engineering of all MOS memory research and development projects in which dynamic RAM and specialty memories are added. He has authored or coauthored over 20 technical journal and conference papers and holds 23 patents.
Dr. Lim is a member of the IEEE Electron Device Society.
784 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

A Dual-Loop Delay-Locked Loop Using Multiple Voltage-Controlled Delay Lines
Yeon-Jae Jung, Seung-Wook Lee, Daeyun Shim, Wonchan Kim, Changhyun Kim, Member, IEEE, and Soo-In Cho

Abstract—This paper describes a dual-loop delay-locked loop (DLL) which overcomes the problem of a limited delay range by using multiple voltage-controlled delay lines (VCDLs). A reference loop generates quadrature clocks, which are then delayed with controllable amounts by four VCDLs and multiplexed to generate the output clock in a main loop. This architecture enables the DLL to emulate the infinite-length VCDL with multiple finite-length VCDLs. The DLL incorporates a replica biasing circuit for low-jitter characteristics and a duty cycle corrector immune to prevalent process mismatches. A test chip has been fabricated using a 0.25-μm CMOS process. At 400 MHz, the peak-to-peak jitter with a quiet 2.5-V supply is 54 ps, and the supply-noise sensitivity is 0.32 ps/mV.

Index Terms—Clock synchronization, delay-locked loop, duty cycle corrector, replica biasing, voltage-controlled delay lines.

Fig. 1. (a) Block diagram of a conventional DLL. (b) Lock-failure cases.

I. INTRODUCTION

FOR high-performance microprocessors and memory ICs, the use of phase-locked loops (PLLs) or delay-locked loops (DLLs) is essential to minimize the negative effects caused by skews and jitters of clock signals. In applications where the frequency multiplication is not required, a DLL is a natural choice since it is free from the jitter accumulation problem of an oscillator-based PLL. Conventional DLLs, however, suffer from the problem of their limited delay range since DLLs adjust only the phase, not the frequency.

We propose a new dual-loop DLL architecture that allows unlimited delay range by using multiple voltage-controlled delay lines (VCDLs). In our architecture, the reference loop generates four evenly spaced clocks, which are then delayed with controllable amounts by four VCDLs and multiplexed to generate the output clock in the main loop. The selection and delay control in the main loop permit the DLL to emulate the infinite delay range with multiple finite-length VCDLs. Moreover, a fully analog control technique can be applied to exploit the established benefits of conventional DLLs such as low skew and low jitter. To reduce supply-noise sensitivity further, a new low-jitter scheme is employed in a replica biasing circuit, which compensates the delay variation of a delay line against the injected supply noise. Finally, a duty cycle corrector immune to process mismatches is also used.

This paper is arranged as follows. In Section II, following a brief overview of conventional DLLs, the proposed architecture is described with design concepts and various building blocks. Section III describes circuits for the low-jitter scheme and duty cycle correction. Section IV discusses the prototype chip implementation and shows experimental results. Section V concludes this paper with a summary.

Manuscript received June 27, 2000; revised October 15, 2000.
Y.-J. Jung, S.-W. Lee, D. Shim, and W. Kim are with the School of Electrical Engineering, Seoul National University, Seoul 151-742, Korea.
C. Kim and S.-I. Cho are with Samsung Electronics Company, Kyungki-do, Korea.
Publisher Item Identifier S 0018-9200(01)03028-1.
0018–9200/01$10.00 © 2001 IEEE

II. ARCHITECTURE

A. Limited Range Problem of Conventional DLLs

A simplified block diagram of a conventional DLL [1] is outlined with its lock-failure cases in Fig. 1. In the normal condition, the DLL forces the output clock to be aligned with the input reference clock through the negative feedback loop, which comprises a voltage-controlled delay line, a phase detector, a charge pump, and a loop filter. The clock buffer (CLK-BUF) is inserted to provide the chip-wide clock. Although this simple architecture offers much design flexibility, the main problem in the conventional DLL of Fig. 1(a) is that the delay time of the VCDL has a minimum and a maximum boundary. Therefore, the DLL has states in which it does not work, as shown in Fig. 1(b). When the VCDL is at its maximum delay and the output clock leads the reference clock, DN pulses are generated but the VCDL cannot produce any more delay. On the other hand, when the VCDL is at its minimum delay and the output clock lags the reference clock, UP pulses are generated but the VCDL cannot reduce its delay any further. These lock-failure cases arise from the facts that the range of the VCDL delay is limited and the initial value of that delay is not known at loop startup. Additional loop startup control circuitry may solve this problem and let the DLL acquire lock. Unfortunately, the delay time of the clock buffer and the following clock distribution tree
deviates from the value at the simulation stage according to temperature and voltage variations [2]. When the variation of this buffer delay is excessive, the DLL loses the lock and falls into the lock-failure cases in Fig. 1(b).

A DLL relying on quadrature phase mixing [3] has been proposed to overcome the limited range problem of the conventional DLL. The phase mixing technique using quadrature clocks provides unlimited phase shift capability. However, phase mixing uses two small slew-rate clocks to obtain linear results. Therefore, this approach has the disadvantage of increased dynamic noise sensitivity and jitter. In the semidigital DLL [4], a digitally controlled phase interpolator uses internally generated 30°-spaced clocks through the dual DLL architecture. Although noise sensitivity issues on the phase interpolation could be alleviated by smaller interpolation intervals, the inherent digital nature causes dithering around zero phase error due to continuous control-bit updates. A digital DLL architecture with infinite phase capture ranges [5] is also not free from the same dithering problem and requires a large chip area for fine delay control.

Fig. 2. Block diagram of the proposed dual-loop DLL.

B. Proposed Dual-Loop DLL

Fig. 2 shows a block diagram of the proposed dual-loop DLL architecture [6]. This architecture is based on two loops: the reference loop and the main loop. The reference loop is locked at 180° phase shift through the conventional DLL architecture. Since the reference loop VCDL is composed of four main delay cells, each delay cell generates a 45° phase shift at locked condition. All delay cells including delay buffers are differential elements commonly controlled by the output of the charge pump. The delay cell named “×3” means three parallel-connected delay cells, so that the load balance between the 0° and 180° clocks is preserved. The reference loop provides two differential clocks spaced by 90° to the main loop. To cover the entire 360° phase range, clocks from the reference loop are partially inverted and fed to four VCDLs in the main loop. Each main loop VCDL is composed of three delay cells and generates a low-swing internal clock, giving four internal clocks in total. These clocks experience the analog delay time control by two kinds of four control voltages generated from two main loop charge pumps. The multiplexer selects one of the four clocks as the selected clock, and this clock feeds the clock buffer, whose function is to convert low swing to full CMOS level as well as to provide the chip-wide output clock. The output clock drives the phase detector, which compares it to the reference clock. The output of the phase detector is used by two charge pumps and four loop filters to control the delay time of each main loop VCDL. Four-to-one clock switching is implemented by the window finder and the state decoder block. The window finder monitors the boundary where the selected clock is switched and forces the state decoder to update the two-bit selection code at the switching event. The selection code not only controls the clock selection at the multiplexer but also changes the configuration of two charge pumps and four loop filters to accommodate the clock switching. Duty cycle correction (DCC) is employed to remove the duty cycle imperfections of the input clock and the output clock. Finally, although the two input clocks, the input clock and the reference clock, can be merged into one clock input, the lower-jitter clock source is preferred as the input clock, if possible, since it determines the jitter characteristics of the whole DLL.

In this architecture, the clock selection scheme enables the output clock to cover the entire phase range (modulo 2π). Furthermore, seamless clock switching is possible by optimizing the main loop VCDL delay control scheme. Moreover, the phase locking is achieved by fully analog control in all loops, so that we can apply low-skew and low-jitter techniques established in conventional DLLs.

C. Reference Loop Design

The objective of the reference loop is to provide quadrature clocks to the main loop. Since the main loop uses these multiphase clocks as references, the phase distribution in the output clocks should be preserved against a possible harmonic lock. The reference loop phase detector depicted in Fig. 3(a) has the capability to detect and escape up to the second harmonic lock. This design is made of two level-sensitive AND/NAND logic gates, which require the 45° and 90° clocks as well as the 0° and 180° clocks. At one-period lock, clocks and UP/DN output waveforms are shown in Fig. 3(b). The phase detector asserts its UP and DN outputs for equal duration due to the 45° clock in order to avoid a dead-zone problem, although the phase offset of the reference loop has a negligible effect on the offset of the main loop output clock. At the second harmonic lock, as shown in Fig. 3(c), the phase detector detects that the loop is in the harmonic lock due to the 90° clock and asserts only the UP output to escape the harmonic lock. By limiting the delay range of a delay line, there is no possibility of a third or higher harmonic lock since the reference loop is composed only of delay cells with no additional delay elements such as the clock buffer.

D. Main Loop Design

The main loop design is focused on the selection control and delay control of the main loop VCDL to achieve the infinite delay range by using four finite-length VCDLs. Fig. 4(a) shows the conceptual timing diagram of the main loop VCDL selection control. Assuming one of the internal clocks is selected, the
selected clock moves in the movable range according to the output of the main loop phase detector. Other clocks remain fixed at the initial phase relationship spaced by 90°. When the rising edge of the selected clock coincides with that of one neighboring clock (or the other), “select up” (or “select down”) is generated and the selection is changed to the corresponding adjacent clock. The newly selected clock then acts as the selected clock in a right-shifted (or left-shifted) movable range. Thus, clock switching at the quadrant boundaries can be repeated in this manner to cover the entire phase range. Fig. 4(b) shows a block diagram of the selection control logic. Since the selected clock passes through the MUX stage, a MUX replica is required for delay matching between the selected clock and all internal clocks. Therefore, clock waveforms in Fig. 4(a) are validated. In the window finder, one inverter–one NAND pair makes the window which is bounded by rising edges of two input clocks. Thus, four windows are generated. Sampled values of these windows by the selected clock enable the window finder to find which window the selected clock belongs to. If the found window is the “select up” or “select down” region, the UP or DN signal is generated, respectively. Then, the state decoder updates the two-bit selection code to change the selected clock in one clock cycle. Although clock switching occurs immediately after the switching event, there is the possibility of a small delay difference in the selected clock since the rising edge of the old selected clock may have a different time position from that of the new selected clock after clock switching. This delay difference can be represented as a switching jitter at the lock state.

Fig. 3. Reference loop phase detector. (a) Block diagram. (b) Operation at 1 period lock. (c) Operation at 2 period lock.

Fig. 4. Selection control of the main loop VCDL. (a) Conceptual timing diagram. (b) Block diagram of the control logic.

The delay control of the main loop VCDL should be optimized between two conflicting conditions, delay range and power consumption. More delay cells mean larger delay range but their power consumption is proportional to the number of required delay cells. Furthermore, a larger delay causes a larger jitter. Intuitively, we apply a single control scheme as shown in Fig. 5(a), where only the selected clock rotates and other clocks remain fixed in phase space. Thus, clock switching occurs at the quadrant boundaries. Unfortunately, since the required delay range is from −90° to +90°, this control scheme consumes the same number of delay cells per VCDL as those in the reference loop. In order to reduce the number of required delay cells, a differential delay control scheme is employed. The differential control means that when the selected clock rotates counterclockwise, all other clocks rotate clockwise with their phase relationship fixed. If all clocks move with the same speed, the required delay range is from −45° to +45°, as shown in Fig. 5(b). However, if the selected clock must rotate in the opposite direction after switching due to the delay fluctuation of the reference clock or the clock buffer, there is the problem of losing the lock since the delay range of a VCDL was already exhausted. In Fig. 5(c), we adopt a differential delay control with 3× speed difference, where the selected clock moves three times faster than other clocks, so that 3/4 of the delay cells in the single delay control case satisfy the required delay range, −67.5° to +67.5°. Since the 3× speed difference provides a shared region in the available delay range of two neighboring clocks, seamless clock switching is possible in any direction without losing the lock with three delay cells per VCDL.

Fig. 6 shows the configuration of the main loop phase detector, charge pumps, and loop filters. Outputs of the phase detector are connected to charge pump 1 (CP1) directly and to charge pump 2 (CP2) with inversion. Thus, if CP1 generates an increasing control voltage for the VCDL which generates the selected clock, CP2 generates a decreasing control voltage for all other VCDLs. As a result, two substantially identical charge pumps are used for the differential delay control scheme. The three-times speed difference is implemented by the fact that CP1 has one loop filter and CP2 has three loop filters. In
case of clock switching, the selection code alters the connection between charge pumps and loop filters. Consequently, charge redistribution occurs between three loop filters, excluding the loop filter for the new selected clock. This charge redistribution proceeds rapidly since two different voltages converge into one value. The fast VCDL control voltage change prevents possible dithering around the clock switching phase.

Fig. 5. Delay control of the main loop VCDL. (a) Single control with other clocks fixed. (b) Differential control with same speed. (c) Differential control with 3× speed difference.

Fig. 6. Configuration of the main loop phase detector, charge pumps, and loop filters.

Fig. 7. Example of the main loop VCDL control procedure.

Fig. 7 shows one example of the main loop VCDL control procedure starting at the unlock state. Let us assume the selected clock should be near 180° in phase space to acquire the lock. Initially, assuming the selection code is “00,” the corresponding internal clock is selected. The selected clock rotates counterclockwise in phase space according to outputs of the phase detector. All clocks excluding the selected clock rotate clockwise with one-third speed compared to that of the selected clock. Before the delay range of the VCDL generating the selected clock reaches its limit, the selection is changed to the next clock. Thus, the selection code is “01.” All clocks except the new selected clock settle near their original phase positions with 90° phase spacing by the charge redistribution of the loop filters. After clock switching, the selected clock still moves counterclockwise to be switched to the following clock. Since this “10” state is near the lock state, the DLL can acquire the lock by a minor delay control. However, let us assume the delay time of the selected clock must decrease due to the delay fluctuation of the reference clock or the clock buffer. Similarly to the delay increase case, before a VCDL delay range is exhausted, the selection is returned to the previous clock, the “01” state. As a result, the proposed DLL covers the entire phase range and remains at the lock state for switching in any direction by optimizing the control schemes of multiple VCDLs. Therefore, since this architecture makes it possible to emulate the infinite-length VCDL by using multiple finite-length VCDLs, the DLL overcomes the problem of conventional DLLs, described by the limited delay range and the initial phase relationship constraint.

III. LOW JITTER SCHEME AND DCC

A. Low-Jitter Scheme

The jitter performance of the DLL is degraded by various noise sources, typically in the form of supply and substrate noise in high speed and highly integrated circuits. To reduce the jitter, the loop bandwidth should be set as high as possible but must have an upper limit for stability issues. Thus, low-jitter DLL designs strongly depend on the delay characteristics of a delay line with supply-noise injection. In order to design the delay line with low supply-noise sensitivity, the replica biasing for the delay control must be considered in a noisy environment. The replica biasing circuit, which consists of a half-replica of a differential delay cell and an operational amplifier (op-amp), sets the low swing level of the delay cell to the reference voltage. In the conventional replica biasing, the reference voltage tracks the supply variation with the same amount. Unfortunately, this is not the optimal solution. The variation of the op-amp gain and
the tail-current source distorts the delay characteristics of the delay cell. The delay equation of this case is described by

(1)

where the quantities involved are the delay time of a delay cell, the load capacitance, the swing voltage of the delay cell, and the current of the tail-current source. For a positive supply variation, since one of the variations in (1) is positive and the other negative, the delay time greatly decreases.

Fig. 8. (a) Circuit diagram of a replica biasing. (b) Operation of a reference voltage generator under supply-noise injection.

In the design depicted in Fig. 8(a), an additional reference voltage generator is attached to the replica biasing circuit. The reference voltage generator is composed of one transistor and two resistors and generates the reference voltage in the nominal supply condition. When there is a supply variation, the reference voltage generator produces a predetermined variation of the reference voltage, which is a reduced swing compared to the supply variation, as shown in Fig. 8(b). The reduced swing compensates the delay variation due to the aforementioned variations induced by supply noise. Thus, supply-noise sensitivity can be minimized. For a given supply noise, the desired reference-voltage variation is a function of the voltage across the transistor as follows:

(2)

The sensitivity of the ratio of the reference-voltage variation to the supply variation with respect to process variations should be analyzed to guarantee a reliable operation. For example, the sensitivity to the threshold voltage variation of the transistor can be obtained by

(3)

The resulting sensitivity value, evaluated for a threshold-voltage variation of 100 mV, is small. Similar analyses with other process parameters also show that the predetermined ratio is kept nearly constant under moderate process variations. This replica biasing circuit is commonly applied to all VCDLs of the reference loop and the main loop to achieve the low-jitter characteristics through the whole DLL.

Fig. 9. (a) Block diagram of the duty cycle corrector [3]. (b) Circuit diagram of the proposed duty detection stage.

B. Duty Cycle Corrector

The duty cycle of clock signals within the DLL deviates from its ideal value of 50% due to various asymmetries in signal paths and voltage offsets in an off-chip generated reference clock. For applications in which the timing of both edges of the clock is critical, a duty cycle corrector (DCC) is required to maximize timing margins. A DCC [3] in Fig. 9(a) is configured as the error-voltage feedback with a corrector stage and a duty detection stage. The duty detection stage outputs the differential control voltage, which is proportional to the duty cycle error of its input clocks. This differential control voltage then effectively introduces an offset voltage to the clock inputs at the corrector stage to correct the duty cycle of the output clocks.
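The error-voltage feedback described for the DCC can be illustrated with a minimal behavioral sketch. This is not the circuit of Fig. 9; it is a discrete-time model under stated assumptions: the duty detection stage is assumed to accumulate (integrate) the duty cycle error into the control voltage, the corrector stage is assumed to shift the duty cycle linearly with that voltage, and the function name, gain values, and step count are hypothetical choices made only for illustration.

```python
def simulate_dcc(input_duty, detector_gain=0.2, corrector_gain=1.0, steps=200):
    """Behavioral sketch of an error-voltage-feedback duty cycle corrector.

    All names and gain values are hypothetical; they are not taken from the paper.
    Duty cycles are expressed as fractions (0.5 means 50%).
    """
    control = 0.0  # differential control voltage, arbitrary units
    history = []
    for _ in range(steps):
        # Corrector stage (assumed linear): the control voltage acts as an
        # offset that shifts the duty cycle of the clock passing through.
        output_duty = input_duty + corrector_gain * control
        output_duty = min(max(output_duty, 0.0), 1.0)  # keep within [0, 1]
        history.append(output_duty)
        # Duty detection stage (assumed to integrate): accumulate the
        # deviation of the output duty cycle from the ideal 50%.
        control += detector_gain * (0.5 - output_duty)
    return history


if __name__ == "__main__":
    trace = simulate_dcc(0.40)
    print(f"first output duty: {trace[0]:.3f}, final output duty: {trace[-1]:.6f}")
```

With these assumed gains, the remaining duty cycle error shrinks by a constant factor each iteration, so the output settles toward 50% whether the input duty cycle starts above or below it. A real detector stage with finite gain would leave a residual error that this idealized model does not capture.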
As the clock frequency is increased, a tighter bound is placed on the performance of the DCC. Even worse, process mismatches between transistors work as a serious error factor in the DCC, especially under deep-submicron technology. Although process mismatches plague all devices, special care must be paid to the duty detection stage since near-ideal performance of this stage can remove the duty cycle distortion caused by the mismatches of all other nonideal blocks. The proposed duty detection block is based on a configuration of two stacked source-coupled pairs, as shown in Fig. 9(b). The source-coupled pair is immune to device mismatches due to its current steering capability: for fairly large input signals, the source-coupled pair conducts the current set by the tail-current source through only one branch, so various mismatch effects in transistors can be hidden. The common-mode problem of this approach is solved by the transistors in the boxed area, comprising the self-biasing technique [7], which enables the output common mode to be dynamically adjusted by input clocks. Two transistors with source and drain tied are added to eliminate the load imbalance caused by the self-biasing circuit. Fig. 10 shows the simulated mismatch sensitivity characteristics of the DCC with the proposed duty detection stage over typical process mismatch parameters [8]. Under 50% duty cycle of the input clock, the duty cycle error is less than 2 ps, which guarantees a robust operation against process mismatches.

Fig. 10. Simulated mismatch sensitivity characteristics of the DCC with the proposed duty detection stage.

Fig. 11. Prototype chip layout.

Fig. 12. Selection code waveforms with the refCLK input grounded.

Fig. 13. Jitter histograms at 400 MHz. (a) Quiet supply. (b) Added 2.5-MHz 300-mV square-wave supply noise.

IV. EXPERIMENTAL RESULTS

The test chip has been fabricated using a 0.25-μm five-metal CMOS process. The threshold voltages in this process are
0.57 V (nMOS) and 0.55 V (pMOS). The gate-oxide thickness is 5.8 nm. Fig. 11 shows the layout of the prototype chip. The active area of the DLL occupies 0.13 mm².

The waveforms depicted in Fig. 12 show the two-bit selection code with the reference clock input grounded, while running the input clock at its nominal frequency of 400 MHz. In this configuration, the main loop phase detector always asserts DN signals. Therefore, the selection code is continuously updated in accordance with the sequence “00,” “01,” “10,” and “11.” This means endless rotation of the output clock throughout the full 0°–360° range.

Fig. 13(a) and (b) show the jitter histograms of the DLL clock output at 400 MHz. Fig. 13(a) shows 6.7 ps RMS and 54 ps peak-to-peak jitter characteristics with a quiet power supply. With a 300-mV 2.5-MHz square-wave supply noise, the peak-to-peak jitter increases to 150 ps, as shown in Fig. 13(b). The ratio of the peak-to-peak jitter to the RMS jitter is well maintained in spite of supply-noise injection. Supply-noise sensitivity is measured to be 0.32 ps/mV.

Table I summarizes the DLL performance characteristics. The DLL operates over a 150- to 600-MHz frequency range with a 2.5-V supply. Static phase error between the reference clock and the output clock of the DLL is less than 20 ps. Operating at 400 MHz, the DLL dissipates 60 mW.

TABLE I
PERFORMANCE CHARACTERISTICS OF THE PROTOTYPE DLL

V. CONCLUSION

We have described a dual-loop DLL architecture that allows the unlimited delay range by using multiple VCDLs. The reference loop generates four evenly spaced clocks without a possible harmonic lock. Clock selection in the main loop enables the DLL to cover the entire phase range and seamless clock switching is achieved by optimizing the main loop VCDL delay range control. Thus, this architecture can emulate the infinite-length VCDL with multiple finite-length VCDLs. To obtain low supply-noise sensitivity, the low-jitter scheme generates a reduced swing voltage compared to supply noise for the delay compensation of a delay line. Finally, a duty cycle corrector presents a high immunity to process mismatches with the help of a configuration of two stacked source-coupled pairs. A prototype fabricated using 0.25-μm CMOS technology achieves 54-ps peak-to-peak jitter and 0.32-ps/mV jitter supply-noise sensitivity.

REFERENCES
[1] M. Johnson and E. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[2] T. Yoshimura, Y. Nakase, N. Watanabe, Y. Morooka, Y. Matsuda, M. Kumanoya, and H. Hamano, “A delay-locked loop and 90° phase shifter for 800-Mb/s double data rate memories,” in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 66–67.
[3] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, “A 2.5-V CMOS delay-locked loop for an 18-Mb 500-Mbyte/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[4] S. Sidiropoulos and M. A. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[5] K. Minami et al., “A 1-GHz portable digital delay-locked loop with infinite phase capture ranges,” in ISSCC Dig. Tech. Papers, Feb. 2000, pp. 350–351.
[6] Y.-J. Jung, S.-W. Lee, D. Shim, W. Kim, C.-H. Kim, and S.-I. Cho, “A low-jitter dual-loop DLL using multiple VCDLs with a duty cycle corrector,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2000, pp. 50–51.
[7] M. Bazes, “Two novel fully complementary self-biased CMOS differential amplifiers,” IEEE J. Solid-State Circuits, vol. 26, pp. 165–168, Feb. 1991.
[8] M. J. M. Pelgrom, H. P. Tuinhout, and M. Vertregt, “Transistor matching in analog CMOS applications,” in IEDM Dig. Tech. Papers, Dec. 1998, pp. 915–918.

Yeon-Jae Jung was born in Korea in 1974. He received the B.S. and M.S. degrees from the School of Electrical Engineering, Seoul National University, Seoul, Korea, in 1997 and 1999, respectively, where he is currently working toward the Ph.D. degree.
He has worked on architectures and CMOS circuits for high-speed I/O interfaces. His current research interests include high-speed CMOS circuits and communication ICs.

Seung-Wook Lee was born in Seoul, Korea, in 1971. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1995 and 1997, respectively, where he is currently working toward the Ph.D. degree in the School of Electrical Engineering.
His research interests include CMOS RF circuit design and high-speed communication interfaces.
Mr. Lee is the winner of the Bronze Prize of the IC design contest held by the Federation of Korean Industries in 1995.

Daeyun Shim was born in Seoul, Korea, in 1962. He received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1985, 1987, and 2000, respectively. His Ph.D. dissertation was related to the design of high-speed locking clock generators.
Since 1987, he has been working on digital video signal processing and ASIC design at Samsung Electronics Corporation. His research interests are video signal processing and compression, high-speed digital circuit design, and high-speed locking systems. He is currently working on DVD-PRML system design.
Wonchan Kim was born in Seoul, Korea, in 1945. He received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1972. He received the Dipl.-Ing. and Dr.-Ing. degrees in electrical engineering from the Technische Hochschule Aachen, Aachen, Germany, in 1976 and 1981, respectively.
In 1972, he was with Fairchild Semiconductor Korea as a Process Engineer. From 1976 to 1982, he was with the Institut für Theoretische Elektrotechnik RWTH, Aachen. Since 1982, he has been with the School of Electrical Engineering, Seoul National University, where he is currently a Professor. His research interests include development of semiconductor devices and design of analog/digital circuits.

Changhyun Kim (M’85–S’90–M’95) received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1982 and 1984, respectively, and the Ph.D. degree in electrical engineering from the University of Michigan, Ann Arbor, in 1994.
In 1984 he joined Samsung Electronics Company, Ltd. (SEC), Kyungki-do, Korea, where he was involved with the circuit design for high speed dynamic RAM, ranging from 64 kb to 16 Mb densities. From 1989 until 1994, he was a Research Assistant in the Center for Integrated Sensors and Circuits, University of Michigan. His present research interest is in the area of circuit design for low-voltage high-performance gigascale DRAMs and future DRAM architecture. He has served as a Committee Member of the Symposium on VLSI Circuits since 1995.
Dr. Kim received the Grand Prize of the Samsung group for the successful development of 1-Mb and 1-Gb DRAMs in 1986 and 1996, respectively. His work on the characterization of submicron devices and reliability issues in high-density DRAM, including reducing soft-error rate and reducing sensitivity to electrostatic discharge problems, earned a technical achievement award from the Samsung R&D Center in 1988. At the Center for Integrated Sensors and Circuits in 1991 and 1993, he won first prizes for design excellence in student VLSI design contests sponsored by several U.S. companies.

Soo-In Cho was born in Seoul, Korea, in 1957. He received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1979.
He joined the Semiconductor Research and Development Center, Samsung Electronics Company, Ltd., Kyungki-Do, Korea, in 1979, where he was engaged in the design of CMOS logic LSI. Since 1983, he has been working on MOS dynamic memory design.
582 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997

A Low Jitter 0.3–165 MHz CMOS PLL Frequency Synthesizer for 3 V/5 V Operation
Howard C. Yang, Lance K. Lee, and Ramon S. Co

Abstract—This paper describes a phase-locked loop (PLL)-based frequency
synthesizer. The voltage-controlled oscillator (VCO) utilizing a ring of
single-ended current-steering amplifiers (CSA) provides low noise, wide
operating frequencies, and operation over a wide range of power supply voltage.
A programmable charge pump circuit automatically configures the loop gain and
optimizes it over the whole frequency range. The measured PLL frequency ranges
are 0.3–165 MHz and 0.3–100 MHz at 5 V and 3 V supplies, respectively (the VCO
frequency is twice PLL output). The peak-to-peak jitter is 81 ps (13 ps rms) at
100 MHz. The chip is fabricated with a standard 0.8-μm n-well CMOS process.

Index Terms—CMOS phase-locked loop, current-steering amplifier,
current-steering logic, frequency synthesizer, low noise, low voltage VCO.

Manuscript received June 25, 1996; revised October 4, 1996.
H. C. Yang and L. K. Lee are with the Shanghai Belling Microelectronics
Manufacturing Co., Ltd., Shanghai 200233, P.R. China.
R. S. Co is with the Kingston Technology Corp., Fountain Valley, CA 92708 USA.
Publisher Item Identifier S 0018-9200(97)02477-3.
0018–9200/97$10.00 © 1997 IEEE

I. INTRODUCTION

WITH the ever-increasing performance and decreasing price of microprocessors
and PC/workstation systems, much more stringent requirements have been placed
on the design of system clock synthesizers. Today’s high-performance clock
synthesizers are often required: 1) to operate over a wide frequency range
(high frequency for increased performance and low frequency for power saving)
using a crystal oscillator input with constant frequency; 2) to have small
phase jitter and frequency variation; 3) to operate from 3 V/5 V supplies for
both portable and desktop systems; 4) to have smooth transition between high
and low frequencies; and 5) to have integrated loop filter on the chip. To
satisfy all these requirements simultaneously, a stable loop over the entire
operating frequency range is needed; and a low jitter, wide frequency range,
and variable supply voltage-controlled oscillator (VCO) circuit is essential.
Several design techniques that improve the performance of a phase-locked loop
(PLL) are presented in this paper. A current D/A converter is implemented to
control the PLL bandwidth so that the loop performance is optimized over the
operating frequency range of 0.3–165 MHz. The VCO is formed by a ring of
single-ended current-steering amplifier (CSA) cells, which were first
introduced as a current-steering logic (CSL) family for low noise and low power
supply applications [1]. This VCO circuit can operate over a wide frequency
range with low phase jitter at variable power supply. To achieve smooth
frequency transitions, a pulse width limiting circuit is used to control the
pulse width of the phase/frequency detector output.
In Section II of this paper, the PLL architecture is described with an analysis
of the loop stability and loop optimization. The circuit design techniques for
the PLL are considered in Section III. The measured results are discussed in
Section IV. Finally, conclusions are made in Section V regarding this work.

II. PLL ARCHITECTURE

It is often difficult to design a PLL that can operate over a wide frequency
range due to the practical limit of the capacitor size that can be integrated
for the loop filter. One method to widen the frequency range is to vary the PLL
bandwidth as a function of the desired output frequency. This principle is
applied in our design by utilizing a current D/A converter which controls the
charge pump current. With this technique, we can also optimize the loop
performance, including damping factor and loop gain, over the entire operating
frequency range.
A block diagram of a conventional frequency synthesizer is enclosed in the
dashed-line box in Fig. 1. The output frequency is synthesized as

(1)

Using linear approximations, the loop equations of the PLL [2] for stability
analysis are

(2)

(3)

where is the loop gain, is the damping factor, is the VCO gain, is the charge
pump current, is the loop filter resistor, and is the integration capacitor of
the loop filter. In order to have an adequate margin of stability, (2) and (3)
must satisfy the constraints and , where is the operating frequency of the
phase/frequency detector. The former constraint is required to prevent aliasing
effects as a result of the Nyquist Criterion, while the latter constraint is
required to have a satisfactory transient response. In practice, a loop gain
which is ten times less than the phase/frequency detector frequency [2] is more
than adequate.
For a clock synthesizer with a frequency range of 0.3–165 MHz, the feedback
divider in the loop equations can vary by more than 20 times. The tolerances of
the loop parameters ( , and ) would also have to be accounted for. These
conditions make the constraints for stability margin extremely difficult to
satisfy if is kept within a reasonable size (a few hundred picofarads) for
on-chip integration. As shown in Figs. 1 and 2, the proposed
PLL architecture uses a current D/A converter to control the charge pump
current, . The product is optimized for given values of and such that the
stability margin constraints are satisfied by (2) and (3). Using a decoding
table, the optimization is performed by the logic block in Fig. 1 which sets
the charge pump current upon examination of and . Typically, a loop bandwidth
which is ten times less than for the entire frequency range can be easily
achieved using this architecture.

Fig. 1. Block diagram of frequency synthesizer.

Fig. 2. Programmable charge pump with D/A converter control.

III. CIRCUIT DESIGN TECHNIQUES

A. VCO Circuit
Two types of VCO based on the ring oscillator topology are commonly used in
CMOS PLL design: the current-starved inverter based VCO [3]–[5] and the
differential-pair based VCO [6]–[8]. In spite of a wide frequency range, the
first type of VCO is sensitive to power supply noise. Although an on-chip
voltage regulator can be used to reduce the effect of power supply noise [5],
it is not effective for operation at high frequency since a voltage regulator
inherently has poor ac rejection. Another drawback of using an on-chip
regulator is that it reduces the useful power supply range, making it
undesirable for low-supply applications. A local feedback loop on the VCO can
also be used for reducing the PLL jitter; however, it is likely to cause
glitches or overshoots whenever the frequency transition mode is activated due
to complicated feedback loops [8]. Hence, it is not a suitable approach for
microprocessor applications, wherein a smooth transition between frequencies is
usually required. While the second type of VCO rejects power supply noise well,
the frequency range of operation may not be sufficient for some applications.
To widen the frequency range of differential-pair-based VCO’s, complex MOS
resistors [7] can be used at the cost of higher supply and more complex design.
The VCO design in this work utilizes a simple CSA circuit [1]. Fig. 3(a) shows
a CSA cell which consists of a current source, , and a pair of NMOS devices. is
the input device and is the load. When is high, turns on, sinking the bias
current , while shuts off. Under this condition, the on resistance of defines
the output low voltage, . When is low, turns off and is steered to . Under this
condition, the resistance of the diode-connected defines the output high
voltage, . By varying the bias current, , a current-controlled CSA-based ring
oscillator is formed with an output voltage swing of

(4)

Equation (4) indicates that (typically between 1 and 2 V) varies with . Thus,
the voltage swing of the CSA cell increases correspondingly with frequency.
This is a desirable feature because the signal level improves at high frequency
when the power supply switching noise becomes worse. Since the voltage swing is
limited by the diode-connected , the current source always operates in the
saturation region; consequently, very small switching noise is generated. For
an n-well process, the PMOS current source can be guarded by its own well and
isolated from the noisy p-substrate. The current source also buffers the output
from , thereby reducing the noise injected from to the output. Any ground noise
(coupled from other circuitry within the chip) is rejected by the CSA as a
common mode noise because both its output and input are referred to the same
ground. By referring the charge pump, loop filter, V/I converter, VCO, and
other analog
circuits in the PLL to the same ground, i.e., p-substrate, the ground noise can
be substantially rejected [9].

Fig. 3. (a) Current steering amplifier (CSA) cell. (b) VCO using a three-stage
CSA ring oscillator.

Fig. 3(b) shows the VCO circuit using a three-stage CSA ring oscillator. The
current sources in the CSA ring oscillator are cascoded with high-swing bias.
In the V/I converter, a single degenerated stage provides a first-order linear
relationship between the oscillation frequency and the control voltage. is
forced in the linear region to provide the high-swing cascoded bias for the
VCO. This VCO is suitable for low supply voltage operation since it only needs
about 2.5 V supply for proper operation. At V, the frequency of the CSA ring
oscillator is practically independent of power supply. The maximum useful
frequency of the VCO is limited by the saturation voltage ( 0.5 V) of the
cascoded PMOS current source in the charge-pump circuit.
The measured VCO performance is illustrated in Fig. 4. The frequency range of
the VCO was observed from 174 kHz to 378.8 MHz at 5 V supply (the VCO frequency
is twice the PLL output frequency). At V, the VCO frequency achieved 200 MHz
indicating that the PLL can operate at 100 MHz for 3 V power supply.

Fig. 4. Measured VCO performance.

B. Phase/Frequency Detector
A modified dead-zone free, phase/frequency detector (PFD) is used in this
design [2]. In the frequency transition mode, a pulse width limiting circuit in
the PFD limits the pulse width of the UP and DN signals. Such limited UP/DN
pulses provide a finite amount of charge to the integration capacitor of the
loop filter in order to slow down the frequency ramp of the VCO. The UP/DN
pulses, however, cannot be made arbitrarily narrow. The noise level in the chip
may dominate over the minute UP/DN correction pulses causing the PLL not to
properly acquire in frequency. The maximum frequency transition rate of less
than 0.1%, i.e., the difference in period between two consecutive clock cycles,
is achieved by using this technique.

Fig. 5. Histogram of PLL period jitter at 100 MHz.

C. Charge Pump and Loop Filter
The charge pump shown in Fig. 2 is designed using cascoded current sources with
CMOS switches. The amount
of the current, typically 10–150 μA, is controlled by the current
digital-to-analog converter (DAC). A bandgap current reference circuit is used
to compensate the variation over temperature. The opamp in the charge pump
circuit reduces the transients caused by the charge transfer [4] as is
switched. The capacitors in the loop filter are formed by NMOS devices with the
sources and drains connected to ground and the gate connected to the filter
output node. The capacitor C2 is about 400 pF.

Fig. 6. Measured PLL output at 155 MHz after 5 ms delay, i.e., after 775 000
clock cycles. The rms jitter (standard deviation) is 28 ps as shown.

Fig. 7. Measured frequency transition from 33.3 to 100 MHz.

IV. MEASURED RESULTS

The measured results show that the PLL operates from 0.3 to 165 MHz and 0.3 to
100 MHz at 5 V and 3 V power supplies, respectively. Since the output frequency
is the VCO frequency divided by two, a PLL output with 50% duty cycle is
guaranteed. Fig. 5 shows the histogram of PLL period jitter (also referred to
as short-term or clock-to-clock jitter) at 100 MHz after 42 321 hits using a
standard 14.318 MHz crystal as the input reference frequency. The peak-to-peak
jitter is 81 ps with an rms jitter of 13 ps. The long-term jitter of the PLL
was also measured. An rms jitter of 28 ps is observed at 155 MHz with 5 ms
delay, i.e., measured after a delay of 775 000 clock cycles from the triggered
clock cycle, as shown in Fig. 6. In order to appreciate the noise rejection
capability of this PLL, the chip was used to generate the clock for an HP laser
printer. During the printing process, which is very noisy electrically and
thermally, both the period jitter and the long-term jitter of the clock must be
low for good printing quality. A
comparison of test results between the use of this PLL and the original crystal
clock showed no perceptible difference even under high magnification. Fig. 7
shows a smooth frequency transition of the PLL from 33 to 100 MHz. No frequency
glitches and overshoots were observed during the transition time. Since there
are two independent PLL’s on the same chip, crosstalk between the two PLL’s was
also measured. The signal coupling between the two PLL’s is at least 50 dB
down. The measured performance of the PLL is summarized in Table I.

TABLE I
SUMMARY OF MEASURED PLL PERFORMANCE

Frequency Range                          0.3–165 MHz
Period (Short-Term) Jitter at 100 MHz    13 ps rms, 81 ps peak-to-peak
Long-Term Jitter at 155 MHz              28 ps rms after 5 ms delay
Supply Voltage Range                     2.5 V to 7 V
Output Duty Cycle                        50%, VCO frequency is twice PLL output
VCO Linearity                            2% from 10–200 MHz
VCO Current Consumption                  500 μA at 200 MHz
Crosstalk between 2 PLL’s                50 dB down

V. CONCLUSION

In this paper, we demonstrated the design of a fully integrated CMOS PLL
circuit that achieves wide operating frequency range and low jitters (both
short-term and long-term) over a wide range of power supply voltage. The key
element in the design that provides all these features is the CSA-based VCO
circuit. A programmable current DAC is also used to optimize the loop gain of
the PLL. Smooth frequency transition is realized by using a modified PFD with a
pulse width limiting circuit. The chip is implemented in a standard 0.8-μm CMOS
process.

REFERENCES

[1] D. J. Allstot, G. Liang, and H. C. Yang, “Current-mode logic techniques for
    CMOS mixed-mode ASIC’s,” in Proc. IEEE Custom Integrated Circuits Conf.,
    1991, pp. 25.2.1–25.2.4.
[2] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Commun., vol.
    28, pp. 1849–1858, Nov. 1980.
[3] R. Shariatdoust, K. Nagaraj, M. Saniski, and J. Plany, “A low jitter 5 MHz
    to 180 MHz clock synthesizer for video graphics,” in Proc. IEEE Custom
    Integrated Circuits Conf., 1992, pp. 24.2.1–25.2.5.
[4] M. G. Johnson and E. L. Hudson, “A variable delay line PLL for
    CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23,
    pp. 1218–1223, Oct. 1988.
[5] K. M. Ware, H.-S. Lee, and C. G. Sodini, “A 200-MHz CMOS phase-locked loop
    with dual phase detectors,” IEEE J. Solid-State Circuits, vol. 24, pp.
    1560–1568, Dec. 1989.
[6] B. Kim, D. N. Helman, and P. R. Gray, “A 30-MHz hybrid analog/digital clock
    recovery circuit in 2-μm CMOS,” IEEE J. Solid-State Circuits, vol. 25, pp.
    1385–1394, Oct. 1990.
[7] I. A. Young, J. K. Greason, and K. L. Wong, “A PLL clock generator with 5
    to 110 MHz of lock range for microprocessors,” IEEE J. Solid-State
    Circuits, vol. 27, pp. 1599–1606, Nov. 1992.
[8] D. Mijuskovic et al., “Cell-based fully integrated CMOS frequency
    synthesizers,” IEEE J. Solid-State Circuits, vol. 29, pp. 271–279, Mar.
    1994.
[9] D. J. Allstot and W. C. Black Jr., “A substrate-referenced data-conversion
    architecture,” IEEE Trans. Circuits Syst., vol. 38, pp. 1212–1217, Oct.
    1991.
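
As a rough illustration of the loop optimization described in Section II of
the preceding paper, the sketch below selects a charge pump current for each
feedback divider value so that the loop gain stays at or below one tenth of the
phase/frequency detector frequency. Equations (2) and (3) are not legible in
this copy, so the code assumes the textbook second-order charge-pump PLL
expressions usually associated with [2], K = Ko·Ip·R/(2πN) and
ζ = (R/2)·sqrt(Ko·Ip·C/(2πN)). The function names, the VCO gain, the resistor
value, the detector frequency, and the 10 μA DAC step are hypothetical; only
the 10–150 μA current range comes from the text, and treating the 400 pF
capacitor C2 as the integration capacitor is likewise an assumption.

```python
import math


def loop_gain(ko, ip, r, n):
    """Loop gain K in rad/s for VCO gain ko (rad/s/V), pump current ip (A),
    filter resistor r (ohm), and feedback divider n (assumed formula)."""
    return ko * ip * r / (2.0 * math.pi * n)


def damping_factor(ko, ip, r, c, n):
    """Damping factor for the same loop with integration capacitor c (F)."""
    return 0.5 * r * math.sqrt(ko * ip * c / (2.0 * math.pi * n))


def pick_pump_current(ko, r, n, f_pfd, i_min=10e-6, i_max=150e-6, i_step=10e-6):
    """Largest DAC setting whose loop gain is at most one tenth of the PFD
    frequency (expressed in rad/s). Falls back to i_min if none qualifies."""
    k_limit = 2.0 * math.pi * f_pfd / 10.0
    n_steps = int(round((i_max - i_min) / i_step)) + 1
    best = i_min
    for s in range(n_steps):
        ip = i_min + s * i_step
        if loop_gain(ko, ip, r, n) <= k_limit:
            best = ip
    return best


if __name__ == "__main__":
    KO = 2.0 * math.pi * 100e6   # hypothetical VCO gain, rad/s/V
    R = 2.0e3                    # hypothetical loop filter resistor, ohm
    C = 400e-12                  # assumed integration capacitor, F
    F_PFD = 1.0e6                # hypothetical PFD frequency, Hz
    for n in (8, 16, 32, 64):
        ip = pick_pump_current(KO, R, n, F_PFD)
        print(n, round(ip * 1e6), round(damping_factor(KO, ip, R, C, n), 2))
```

Because the assumed loop gain is proportional to Ip/N, raising the pump current
together with the divider value keeps the gain roughly constant, which is the
behavior the decoding table in the paper is described as providing.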
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999 513

A Low-Jitter PLL Clock Generator for Microprocessors with Lock Range of 340–612 MHz
David W. Boerstler

Abstract—A fully integrated, phase-locked loop (PLL) clock generator/phase
aligner for the POWER3 microprocessor has been designed using a 2.5-V, 0.40-μm
digital CMOS6S process. The PLL design supports multiple integer and noninteger
frequency multiplication factors for both the processor clock and an L2 cache
clock. The fully differential delay-interpolating voltage-controlled oscillator
(VCO) is tunable over a frequency range determined by programmable frequency
limit settings, enhancing yield and application flexibility. PLL lock range for
the maximum VCO frequency range settings is 340–612 MHz. The charge-pump
current is programmable for additional control of the PLL loop dynamics. A
differential on-chip loop filter with common-mode correction improves noise
rejection. Cycle–cycle jitter measurements with the microprocessor actively
executing instructions were 10.0 ps rms, 80 ps peak to peak (P-P) measured from
the clock tree. Cycle–cycle jitter measured for the processor in a reset state
with the clock tree active was 8.4 ps rms, 62 ps P-P. PLL area is
1040 × 640 μm². Power dissipation is <100 mW.

Index Terms—Clock generator, clocking, microprocessors, phase-locked loop
(PLL).

Manuscript received December 10, 1997; revised August 10, 1998.
The author is with the IBM Research Division, IBM Austin Research Laboratory,
Austin, TX 78758 USA.
Publisher Item Identifier S 0018-9200(99)02429-4.
0018–9200/99$10.00 © 1999 IEEE

I. BACKGROUND

THE use of phase-locked loops (PLL) for generating phase-synchronous,
frequency-multiplied clocks in microprocessors has been prevalent in industry
[1]–[4]. In recent years, the trend toward ever increasing clock frequency has
made PLL’s even more attractive due to the difficulties in distributing
high-frequency clocks through several levels of packaging [5], [6], but the
jitter penalty for using a PLL has not kept pace with the rate of reduction in
processor cycle time. Until this year,¹ the best reported microprocessor PLL
jitter penalties ranged from 82 to 83 ps peak to peak (P-P) for inactive
processors [1], [5], and a PLL on a small (600-K transistor) graphics display
chip has been reported with 80 ps P-P jitter for a quiet supply at 320 MHz [7].
Many examples of higher jitter PLL designs exist in the literature.
Power-supply noise created from the digital switching activity on a
microprocessor is recognized as a major source of PLL jitter, and the primary
focus of designers has been directed toward reducing this sensitivity.

¹ Recent announcements of a 1-GHz microprocessor PLL [12], [13] and a PLL with
an on-chip regulator [14] reported jitter of <±9 ps (quiet conditions)/<±36 ps
(processor active) and <±10 ps (sinusoidal external noise)/<±20 ps (square wave
external noise), respectively.

II. INTRODUCTION

This paper describes a fully integrated PLL-based clock generator/phase aligner
used for the POWER3 microprocessor. The microprocessor is fabricated in IBM
CMOS6S technology and contains approximately 12 million transistors. With the
microprocessor actively executing instructions, this PLL achieved cycle–cycle
jitter of 10.0 ps rms, 80 ps P-P in its application environment and 8.4 ps rms,
62 ps P-P with the microprocessor in a reset state with a portion of the clock
tree active.
A simplified block diagram of the PLL clock generator is shown in Fig. 1. The
external reference or BUSCLK enters a receiver and is divided by two by divider
stage before entering the phase/frequency detector (PFD) as The internal
feedback signal from divider is compared to by the PFD, which generates an
error signal , which is used by the charge-pump and filter network to control
the voltage-controlled oscillator (VCO). The output frequency of the VCO is
divided by and is used as the main processor clock (PCLK) after passing through
four levels of clock buffering in an H-tree clock distribution network. The
processor clock is passed through a delay-matching receiver before entering
divider completing the feedback path. Since at equilibrium the inputs of the
PFD will be matched in frequency (and phase), the processor-to-bus frequency
ratio is equal to the ratio which is equal to the ratio allowing integer or
noninteger frequency synthesis by changing divider ratios. Since the technique
does not require clock choppers [2], the duty cycle and phase alignment are
relatively insensitive to environment and process tolerances. The output of the
VCO is also connected to frequency divider which is used for the L2 cache clock
(L2CLK). Since the processor-to-L2 clock-frequency ratio is also adjustable to
integer or noninteger ratios. Other phase-synchronous clocks may be designed in
a similar fashion, and quadrature or interstitial clocks may be created by a
polarity change at the divider input. Using the structure of Fig. 1, the VCO
frequency is equal to times the processor clock frequency For cases when is
even, the processor clock edges are generated from only one VCO clock edge;
hence a nearly ideal 50% processor clock duty cycle may be achieved through its
independence from the VCO duty cycle.

III. PROCESS TECHNOLOGY

The microprocessor and integral clock generator PLL are fabricated in a
five-layer CMOS process with 0.4-μm feature
sizes. Table I lists some of the relevant attributes of this process
technology.

TABLE I
CMOS6S PROCESS SUMMARY

The PLL clock generator is shown in the microprocessor die photograph of
Fig. 2(a). The dimensions of the entire PLL are 1040 × 640 μm². It is shown
with the major features identified in Fig. 2(b).

Fig. 1. PLL block diagram.

IV. PLL CLOCK GENERATOR COMPONENTS

A. Phase/Frequency Detector
The digital PFD generates a signal that conveys relative phase and frequency
error information about its inputs to the charge pump and filter. The PFD
design is based on a three-state machine structure [8], as depicted in
Fig. 3(a). From the initial reset state, a rising edge on the input will assert
the UP output until the rising edge of appears, which deasserts UP and forces a
reset of both flip-flops [Fig. 3(b)]. A rising edge first appearing on
similarly asserts DOWN until a rising edge arrives at followed by a subsequent
reset. Complementary outputs are generated by the PFD for use in the
differential charge-pump stage that follows the PFD. The pulse width of the
output varies proportionally with the phase error between the two inputs,
except for the dead-zone region as the difference approaches zero. This dead
zone exists when the phase error becomes small relative to the combined
response time of the PFD, charge pump, and filter circuits. Circuit simulation
results show a nominal dead zone of 25 ps. Concerns of current mismatch in the
charge-pump and filter networks are reduced at the expense of increased dead
zone by preventing simultaneous assertions of UP and DOWN

B. Power-Supply Isolation
A separate analog power connection (AVDD) is used for the analog circuits
[current reference, charge pump, common-mode rejection (CMR), filter
initialization, and VCO circuits] to increase the isolation of the sensitive
circuits from the logic-induced switching noise present on the main power
supply. To allow the detection of potential defects using conventional testing,
the AVDD pin is held low, disabling the analog devices that normally draw dc
current. Both on-chip and on-module decoupling is used on AVDD.

C. Reference Circuit
A thermal voltage-referenced current source is used to provide temperature- and
supply-independent biasing for the analog circuits in the PLL. The circuit
contains an array of P diffusions in the N-well connected to form two
forward-biased diodes with areas that differ by a factor of ten. When connected
as shown in Fig. 4, the current through each leg has two stable operating
points, A or The startup circuit prevents the zero current state from occurring
by injecting current into one leg during initial power-on. The resistor is
implemented using the precision resistor available in the process, which has a
temperature coefficient (TC) of 2000 ppm/°C. The positive TC’s of the thermal
voltage term and the resistor tend to cancel, providing a reference current TC
of 785 ppm/°C at 85 °C. The reference current is used for subsequent generation
of reference currents and the PMOS bias voltage through mirroring. Sensitivity
to power-supply change is 1.7%/V for 20% change on VDD.

Fig. 2. (a) Die photograph of POWER3 microprocessor. (b) PLL layout.

Fig. 3. (a) PFD state diagram. (b) Phase detector implementation.

Fig. 4. Reference circuit.

D. Process and Temperature Compensation
Variations in due to process are monitored using the circuit shown in
Fig. 5(a). All of the current sources are generated directly from the reference
circuit current A constant current is passed through a branch containing
short-channel NMOS devices, creating a monitoring voltage , which is sensitive
to NMOS device length variations. This voltage is compared to a reference
voltage generated by a constant current through a long-channel NMOS device that
is relatively insensitive to length variations. The devices and bias currents
used for length sensing are sized so that and are equivalent for a nominal
process. To minimize temperature sensitivity, the bias currents correspond to
the zero-temperature coefficient (0-TC) region of the devices.
The two voltages are compared using a differential amplifier, which generates a
current proportional to the NMOS offset from nominal. This current is mirrored
to produce a current that is injected into a precision resistor used for
combining various process monitors to generate a compensating reference
voltage. The compensating reference voltage is connected to the active load
elements of the VCO, which control the VCO’s voltage swing. A current generated
from a similar PMOS circuit also is injected into the resistor.

Fig. 5. (a) Process compensation circuit. (b) Temperature compensation circuit.

Weighted combinations of standard bias circuits with differing voltage and
temperature coefficients have been used previously to compensate reference
circuits for VCO’s [9]. In this case, however, temperature was monitored
directly by comparing the voltage of two series-connected devices biased by
current below their 0-TC operating point to the voltage of two parallel devices
biased by current significantly above their effective 0-TC point [Fig. 5(b)].
The devices and bias currents are sized so that both branches of the
differential amplifier are balanced at for nominal temperature conditions. The
inset shows the I–V characteristics as a function of temperature for the series
(subscript 2) and parallel (subscript 1) connected devices; the 0-TC points
correspond to the crossing point where the current is invariant with
temperature. The current in one leg of the differential amplifier varies
proportionally with temperature and is mirrored and added to the summing
junction of the resistor A constant bias current is also added to the summing
junction to establish the correct weighting of the various compensating
currents and to correct for the TC of the summing junction resistor.
Using a statistical process model, the process compensation was designed to
favor the stabilization of the “best case” side of the distribution over the
“worst case” side in anticipation of future process trends. Given the limited
range over which a circuit may be practically compensated, the performance for
the “best case” devices was not sacrificed at the expense of extensive
compensation of the poorest performing devices. For the unsorted population,
this approach allowed a reduction in the sensitivity of the VCO to process
variability by a factor of 3.6 (from 55.4% to 15.2%) over the uncompensated
VCO; temperature sensitivity was reduced by a factor of 4.7 (from 38.6% to
8.2%).

E. Charge Pump
The reference circuit is used to generate the currents and for use within the
charge pump. The peak charge-pump current may be adjusted in 30-μA increments
from 30 to 240 μA by scaling the mirror currents as shown in Fig. 6. The error
signals and generated by the PFD are used to switch the peak current selected.
Adjusting the charge pump allows for optimization of the loop characteristics
for different divider and VCO settings. Differential outputs P and P are
included for high CMR in the subsequent analog circuits.

Fig. 6. Charge-pump circuit.

F. Loop Filter
The differential loop filter and initialization circuits are shown in Fig. 7.
Currents to and from the charge-pump circuit enter the filter at nodes P, P.
The input to the filter contains NMOS transmission-gate clamping devices to
limit the maximum filter voltage to where is the NMOS threshold voltage for a
large source-bulk voltage. For the CMOS6S process, the clamps prevent the
filter voltage from exceeding approximately 1.8 V, eliminating concern for the
VCO input stage’s shutting off. The filter capacitors are accumulation-mode
gate-oxide devices, and are interleaved to improve the matching. Both
loop-filter capacitors together occupy an area of approximately 865 × 280 μm²
and are approximately 450 pF each. Precision resistors (1.2 kΩ each) are used
to produce a zero in the filter transfer function.
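
As a quick numerical check on the loop filter and charge pump values quoted in
Sections IV-E and IV-F, the sketch below estimates where the filter zero falls
and lists the selectable peak charge-pump currents. It assumes each side of the
differential filter behaves as a simple series R-C branch, so that the zero
sits at 1/(2πRC), and it reads the printed resistor value as 1.2 kΩ; neither
point is confirmed by the paper, and the function names are hypothetical. Under
those assumptions the zero lands close to 295 kHz.

```python
import math

R_ZERO_OHM = 1.2e3    # per-side precision resistor, read as 1.2 kilohm (assumption)
C_FILTER_F = 450e-12  # per-side loop-filter capacitor, about 450 pF


def zero_frequency_hz(r_ohm, c_farad):
    """Zero of an assumed series R-C branch: f_z = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def pump_settings_ua(step_ua=30, max_ua=240):
    """Selectable peak charge-pump currents in microamps (30 uA steps)."""
    return list(range(step_ua, max_ua + 1, step_ua))


if __name__ == "__main__":
    print(round(zero_frequency_hz(R_ZERO_OHM, C_FILTER_F) / 1e3, 1), "kHz")
    print(pump_settings_ua())
```

The eight current settings map naturally onto a 3-bit control code, although
the paper does not say how the selection is encoded.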
The filter output is connected to the VCO control input at nodes An
initialization circuit activated during the initial system power-on-reset is
used to precharge the filter capacitors to the nominal common-mode voltages at
nodes

Fig. 7. Loop filter and filter initialization.

G. Common-Mode Control
It is possible for common-mode voltages to develop in the filter from leakage,
drift, or device mismatch. Since the common-mode voltage can introduce
frequency offsets in the VCO or even inhibit operation for extreme cases, the
circuit shown in Fig. 8 was used in conjunction with the filter clamps
described earlier. The common-mode voltage of the filter is sensed by
generating currents proportional to and and summing them across a load device
to produce A differential amplifier compares to a reference voltage and
generates a current , which is proportional to the common-mode voltage. The
current is mirrored by two identical current sources, which bleed current from
both filter capacitors simultaneously without affecting the differential
voltage between them. The maximum drain currents for this structure, which
corresponds to the case when both clamps have activated, are approximately
16 μA. For typical cases where the common-mode voltage is below 600 mV, the
bleed currents are 1 μA. Stability of the network is assured by heavy
dominant-pole compensation.

Fig. 8. Common-mode control.

H. Voltage-Controlled Oscillator
The VCO design is based upon a delay-interpolating ring oscillator structure
[9]–[11], as shown in Fig. 9. In contrast to the current-starved and
current-modulated VCO’s, which are very commonly used for microprocessor clock
generators, delay-interpolating VCO’s have relatively low-to-moderate VCO gains
and are well suited to fully differential control and signal path circuit
implementations. The lower VCO gain of the delay-interpolating VCO’s produces
significantly less jitter due to coupled noise than higher gain structures. The
limited operating frequency range for delay-interpolating VCO’s, which must be
less than 2 : 1 to ensure monotonicity, may be effectively augmented by
selecting suitable divider ratios or by adding programmability to the VCO
signal paths.

Fig. 9. Voltage-controlled oscillator.

Fig. 10. (a) Delay element. (b) Mixer circuit.

The frequency limits of the VCO are determined by the longest and shortest path
delays through the structure. Fig. 9 shows an example high-frequency limit of
period composed of three delay units and one mixer unit, and a low-frequency
518 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999

Fig. 11. Cycle–cycle processor clock jitter: (a) quiet processor and (b) active processor.

limit with a period composed of six delay units and one mixer unit. These frequency limits also affect the VCO gain (for a given mixer design) as well as the center frequency. The frequency limits may be independently controlled using the multiplexers shown in Fig. 9, allowing flexible control of the VCO operating range and greater than ten-to-one adjustment range for VCO gain.

The delay elements and mixer designs are based upon PMOS source-coupled pair differential amplifiers with NMOS load networks [Fig. 10(a) and (b)], which allow voltage-controlled swing adjustment through effective load-line translation by adjusting a control voltage. The high impedance provided by the current source improves the supply noise rejection for the source-coupled pair, and the N-well improves the isolation from p-bulk substrate noise. The variation of the threshold voltage due to bulk effect is eliminated using bulk-to-source biasing throughout the structure. Sensitivity of the VCO to low-repetition-rate, 100-mV steps on VDD and AVDD is 0.418 ps/mV. Center-frequency common-mode voltage sensitivity is 3.5% over the full input range dictated by the common-mode control circuit. Nominal VCO gain for the settings that produce the maximum VCO range is 185 MHz/V. The worst case VCO power dissipation is 30 mW.

I. Dividers and Receivers
The dividers (Fig. 1) may be individually programmed and support division by 2, 3, 4, 5, 6, 8, or 10. The dividers are placed in pairs within the layout to improve device matching between the dividers of each pair. The receivers shown in Fig. 1 are also placed together and are located near the I/O pad for BUSCLK.

V. PLL MEASUREMENTS
The damping factor, loop gain, and natural frequency of the PLL may be adjusted over a wide range to match the application by changing the charge-pump and VCO gain as described above. System testing was conducted with 90-µA peak charge-pump current using the maximum frequency and range on the VCO with a variety of divider settings and BUSCLK
BOERSTLER: LOW-JITTER PLL CLOCK GENERATOR 519

frequencies. The processor clock was accessed from the clock tree through a series of inverters. A time-interval measurement (TIM) system was used to measure cycle–cycle period jitter statistics for a number of packaged die representing various process skews. The processor was operated using an array initialization program loop with the fixed-point and floating-point processors active for the “active” processor tests, and was also operated in a “quiet” mode reset state. All tests were performed at room temperature with ambient forced-air cooling. Conventional first-cycle oscilloscope-based jitter measurements were performed periodically and provided P-P jitter results that were consistent with those measured on the TIM system. The external clock was provided by a high-frequency pulse generator, with 7.3 ps rms, 36 ps P-P jitter.

Fig. 11(a) shows a histogram of cycle–cycle period measurements taken with the processor in an inactive reset state but with the clock tree active. The frequencies of the reference clock, processor clock, and VCO are 85, 170, and 340 MHz, respectively, which corresponds to a 3-dB loop bandwidth of 2 MHz. The distribution of samples in the histogram follows a Gaussian distribution with period jitter of 8.4 ps rms, 62 ps P-P. The minimum period measured for this sample size was 26.2 ps less than the mean (3.1 sigma away). Assuming that cycle-time failures only occur on the minimum period side, the worst case clock jitter penalty for this system (i.e., a “quiet” processor) is 26.2 ps at 3.1 sigma confidence (or 25.2 ps penalty at 3.0 sigma). Since a peak-to-peak jitter approximately equal to the PFD dead zone can exist for the PLL, the 25 ps simulated value for the dead zone may be a significant component of the measured jitter.

Fig. 11(b) shows a clock-jitter histogram for the processor executing the array initialization routine for a large population. A Gaussian curve has been superimposed on the histogram for comparison purposes. The frequencies of the reference clock, processor clock, and VCO are 90, 180, and 360 MHz, respectively. For this system (i.e., an “active” processor), the period jitter has increased to 10.0 ps rms, 80 ps P-P, and the worst case clock-jitter penalty is 37.1 ps at 3.7 sigma confidence (or 30.1 ps at 3.0 sigma). The effective noise penalty for running the array initialization routine is 4.9 ps at 3.0 sigma.

VI. CONCLUSION
This work demonstrates the viability of a low-jitter PLL design approach amenable to high-speed microprocessors. Measured jitter for the design was 8.4 ps rms, 62 ps P-P for quiet conditions and 10.0 ps rms, 80 ps P-P for the processor active. A tunable, moderate-gain VCO with active process and temperature compensation provides high power-supply rejection and low sensitivity to temperature and process variability. A differential design approach maintains noise immunity in both control and signal paths within the analog portions of the PLL.

ACKNOWLEDGMENT
The author wishes to thank J. Peter for layout of the PLL, N. James and H. Casal for the hardware characterization and divider implementation, R. Kodali for circuit simulation and specification, D. Woeste and J. Strom for the divider and lock detector circuits, and S. Dhong and M. Papermaster for their continuous support of this work.

REFERENCES
[1] I. Young, M. Mar, and B. Bhushan, “A 0.35 µm CMOS 3–880 MHz PLL N/2 clock multiplier and distribution network with low jitter for microprocessors,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 330–331.
[2] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, “A wide-bandwidth low-voltage PLL for PowerPC microprocessors,” IEEE J. Solid-State Circuits, vol. 30, pp. 383–391, Apr. 1995.
[3] J. Cho, “Digitally-controlled PLL with pulse width detection mechanism for error correction,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 334–335.
[4] I. Young, J. Greason, and K. Wong, “A PLL clock generator with 5–110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[5] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, “A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation,” in ISSCC Dig. Tech. Papers, Feb. 1996, pp. 132–133.
[6] P. E. Gronowski, P. Bannon, M. Bertone, R. Blake-Campos, G. Bouchard, W. Bowhill, D. Carlson, R. Castelino, D. Donchin, R. Fromm, M. Gowan, A. Jain, B. Loughlin, S. Mehta, J. Meyer, R. Mueller, A. Olesin, T. Pham, R. Preston, and P. Rubinfeld, “A 433 MHz 64b quad-issue RISC microprocessor,” in ISSCC Dig. Tech. Papers and Slide Supplement, Feb. 1996, pp. 222–223.
[7] Z. Zhang, H. Du, and M. Lee, “A 360 MHz 3V CMOS PLL with 1 V peak-to-peak power supply noise tolerance,” in ISSCC Dig. Tech. Papers, Feb. 1996, pp. 134–135.
[8] D. H. Wolaver, Phase-Locked Loop Circuit Design. Englewood Cliffs, NJ: Prentice-Hall, 1991, pp. 59–61.
[9] J. F. Ewen, A. Widmer, M. Soyuer, K. Wrenner, B. Parker, and H. Ainspan, “Single-chip 1062 Mbaud CMOS transceiver for serial data communication,” in ISSCC Dig. Tech. Papers, Feb. 1995, pp. 32–33.
[10] B. Lai and R. Walker, “A monolithic 622 Mb/s clock extraction and data retiming circuit,” in ISSCC Dig. Tech. Papers, Feb. 1991, pp. 144–145.
[11] S. K. Enam and A. Abidi, “NMOS IC’s for clock and data regeneration in gigabit-per-second optical fiber receivers,” IEEE J. Solid-State Circuits, vol. 27, pp. 1763–1774, Dec. 1992.
[12] D. W. Boerstler and K. Jenkins, “A phase-locked loop clock generator for a 1 GHz microprocessor,” in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 212–213.
[13] J. Silberman, N. Aoki, D. Boerstler, J. Burns, S. Dhong, A. Essbaum, U. Ghoshal, D. Heidel, P. Hofstee, K. Lee, D. Meltzer, H. Ngo, K. Nowka, S. Posluszny, O. Takahashi, I. Vo, and B. Zoric, “A 1.0 GHz single-issue 64b PowerPC integer processor,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 230–231.
[14] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, “A 600 MHz CMOS PLL microprocessor clock generator with a 1.2 GHz VCO,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 396–397.

David W. Boerstler received the B.S. degree in electrical engineering from the University of Cincinnati, Cincinnati, OH, in 1978 and the M.S. degrees in computer engineering and in electrical engineering from Syracuse University, Syracuse, NY, in 1981 and 1985, respectively.
Since joining IBM in 1978, he has held a variety of assignments, including the design of high-frequency PLL’s for clock generation and recovery, fiber-optic transceiver and system design, and other analog, digital, and mixed-signal bipolar and CMOS circuit development projects. He currently is a Research Staff Member with the High-Performance VLSI group at the IBM Austin Research Laboratory, Austin, TX. His current research interests include high-frequency synchronization techniques and signaling approaches for high-speed interconnect.
Mr. Boerstler has received IBM Outstanding Technical Achievement Awards for his work on the design of the serializer/deserializer for the ESCON fiber-optic channel products and for the clock-generator design of IBM’s 1-GHz PowerPC microprocessor prototype. He has received seven IBM Invention Achievement Awards.
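The 3.0-sigma jitter penalties quoted in Section V of the preceding paper can be checked with a few lines of arithmetic. The sketch below assumes Gaussian period jitter, as the measurement discussion does, and uses only the rms and sigma values reported there; the function and variable names are illustrative. Scaling the rounded 10.0-ps rms value gives 30.0 ps for the active case, while rescaling the reported 37.1 ps at 3.7 sigma gives about 30.1 ps, so the two estimates agree to within rounding.

```python
def sigma_penalty_ps(rms_jitter_ps, n_sigma=3.0):
    """One-sided jitter penalty at n_sigma, assuming Gaussian period jitter."""
    return n_sigma * rms_jitter_ps


quiet_rms_ps = 8.4    # reported rms period jitter, quiet processor
active_rms_ps = 10.0  # reported rms period jitter, active processor (rounded)

quiet_3s = sigma_penalty_ps(quiet_rms_ps)
active_3s = sigma_penalty_ps(active_rms_ps)

# Alternative estimate for the active case: rescale the reported worst case
# (37.1 ps at 3.7 sigma) down to 3.0 sigma.
active_3s_rescaled = 37.1 * 3.0 / 3.7

# Extra penalty attributed to running the array initialization routine.
noise_penalty = active_3s_rescaled - quiet_3s

print("quiet, 3.0 sigma:             %.1f ps" % quiet_3s)
print("active, 3.0 sigma (from rms): %.1f ps" % active_3s)
print("active, 3.0 sigma (rescaled): %.1f ps" % active_3s_rescaled)
print("activity penalty:             %.1f ps" % noise_penalty)
```

The rescaled figures reproduce the 25.2-ps, 30.1-ps, and 4.9-ps values reported in the text.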
726 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 6, JUNE 2002

A Low-Jitter Wide-Range Skew-Calibrated Dual-Loop DLL Using Antifuse Circuitry for High-Speed DRAM
Se Jun Kim, Sang Hoon Hong, Jae-Kyung Wee, Joo Hwan Cho, Pil Soo Lee, Jin Hong Ahn, and Jin Yong Chung

Manuscript received October 2, 2001; revised January 29, 2002.
S. J. Kim, S. H. Hong, J. H. Cho, P. S. Lee, J. H. Ahn, and J. Y. Chung are with the Advanced Design Team, Memory Research and Development, Hynix Semiconductor Inc., Ichon-si, Kyoungki-Do 467-701, Korea (e-mail: sejun.kim@hynix.com).
J.-K. Wee is with the Department of Electronics Engineering, Hallym University, Chunchun-si, Kangwon-Do 200-702, Korea.
Publisher Item Identifier S 0018-9200(02)04934-X.

Abstract—This paper describes a delay-locked loop (DLL) circuit with two advancements: a dual-loop operation for a wide lock range, and programmable replica delays using antifuse circuitry and an internal voltage generator for post-package skew calibration. The dual-loop operation uses information from the initial time difference between the reference clock and the internal clock to select one of the differential internal loops. This extends the lock range of the DLL to lower frequencies. In addition, incorporation of the programmable replica delay using antifuse circuitry and the internal voltage generator allows for the elimination of skews between the external clock and the internal clock that arise from on-chip and off-chip variations after the package process. The proposed DLL, fabricated on a 0.16-µm DRAM process, operates over the wide range of 42–400 MHz with a 2.3-V power supply. The measured results show 43-ps peak-to-peak jitter and 4.71-ps rms jitter while consuming 52 mW at 400 MHz.

Index Terms—Delay-locked loop, dual-loop operation, high-speed DRAM, programmable replica delay, skew calibration.

I. INTRODUCTION
The delay-locked loop (DLL) has become an indispensable component in high-speed synchronous DRAMs such as DDR SDRAM. Since the DLL determines the operation range of the DRAM and has a large effect on the data valid window, a high-performance DLL that has a wider range and lower jitter is essential for increasing the speed of DRAM. A DLL can be categorized into either of two types, the digital and the analog type. Although the digital DLL has robustness, process portability, and design simplicity, it is difficult to use on a very high-bandwidth DRAM (over 600 Mb/s) due to poor jitter performance [1], [2]. Therefore, in spite of its sensitivity to process variation, the analog DLL, which ensures lower jitter through the continuous characteristics of analog operation, is more suitable for higher speed DRAM. In addition to the jitter performance, another important issue of the DLL is the lock range. Process variation makes the lock range of the analog DLL more limited and results in a narrower operation range of the DRAM. The limited range of the DLL limits the flexibility of implementation in memory applications and increases test costs in mass production. To solve the limited lock-range problem, various types of DLLs have been developed [3]–[6]. However, such DLLs resulted in complex architectures that faced such problems as increased area, added power consumption, and degradation of jitter performance.

To address these issues, a novel dual-loop architecture, which increases the lock range without degrading jitter performance and with a relatively small overhead in area and power, is proposed in this paper. Another enhancement in the proposed DLL is the post-package skew calibration. On-chip process variations and small mismatches in off-chip parameters can result in a large static skew in addition to the phase offset of the phase detector. In the proposed DLL, an improved scheme using antifuse circuitry is applied for reducing the skew. It enables a practical calibration of inevitable skews after the package process.

This paper is arranged as follows. The limited range problem of the conventional DLL is described in Section II. In Section III, the concept of the proposed dual loop for wide locking range is briefly explained, followed by presentation of the architecture and physical implementation based on the concept. The skew calibration method using antifuse circuitry is described in Section IV. Section V discusses the fabricated chip and shows the experimental results. Finally, the paper is concluded in Section VI.

II. LIMITED RANGE PROBLEM OF CONVENTIONAL DLL
Fig. 1(a) shows the architecture of the conventional analog DLL and the delay characteristic of the voltage-controlled delay line (VCDL). When (minimum delay of VCDL) (maximum delay of VCDL), the range of the operation frequency of the DLL is determined by the control voltage of the loop filter at the initial state. When the control voltage of the loop filter is at its minimum at the initial state and (the cycle time of reference clock), lock failure occurs because the phase detector produces a DN pulse which discharges the capacitor in the loop filter, as shown in Fig. 1(b). Therefore, in this case, it must be at the initial state for satisfying the condition without lock failure. Therefore, the range of is . In the other case, when the control voltage of the loop filter is at its maximum at the initial state and , lock failure occurs because of the UP pulse of the phase detector shown in Fig. 1(c). In this case, the range of is when the initial is . For utilizing the full range of
0018-9200/02$17.00 © 2002 IEEE
KIM et al.: SKEW-CALIBRATED DUAL-LOOP DLL 727

Fig. 1. (a) Block diagram and delay characteristic of conventional DLL. Cases of lock failure at initial control voltage: (b) initial control voltage at its minimum and (c) initial control voltage at its maximum.

without the lock failures as in Fig. 1(b) and (c), the initial control voltage must be set at a level such that the initial is approximately . In this condition, the range of the operation frequency is determined as . But this method can cause stuck/harmonic lock and makes the jitter performance worse. Therefore, if the operation range is desired at the higher frequency range, the initial control voltage should be set to its minimum value, since this setting is stuck/harmonic lock free and the delay cell has a fast slew rate that produces less phase noise [7]. But in reality, the minimum VCDL delay is very sensitive to process, voltage, and temperature (PVT) variation. As a result, designing the minimum delay to fall in the target range requires more careful and difficult work as the operation frequency becomes higher. Therefore, considering the PVT variation, the operation range becomes more limited at higher operating frequencies.
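The narrowing of the usable frequency range under PVT variation can be illustrated numerically. The sketch below does not use the range expressions of this section. It assumes the common textbook condition that a DLL starting from its minimum VCDL delay locks correctly only when that delay lies between one half and one full reference clock period, and the corner delays are hypothetical values chosen for illustration.

```python
def lock_range_hz(t_min_s):
    """Lock range (f_low, f_high) in Hz for one value of the minimum VCDL delay.

    ASSUMPTION (not taken from the paper): the DLL starts at its minimum delay
    and locks correctly only if T_clk/2 < t_min < T_clk, which is equivalent to
    1/(2*t_min) < f_clk < 1/t_min.
    """
    return 1.0 / (2.0 * t_min_s), 1.0 / t_min_s


def guaranteed_range_hz(t_min_corners_s):
    """Frequency range that locks at every corner, or None if there is none."""
    ranges = [lock_range_hz(t) for t in t_min_corners_s]
    f_low = max(low for low, _ in ranges)     # tightest lower bound
    f_high = min(high for _, high in ranges)  # tightest upper bound
    return (f_low, f_high) if f_low < f_high else None


# Hypothetical minimum VCDL delays at fast, typical, and slow corners (seconds).
corners_s = [2.0e-9, 2.5e-9, 3.2e-9]

nominal = lock_range_hz(2.5e-9)
guaranteed = guaranteed_range_hz(corners_s)

print("typical corner only: %.1f to %.1f MHz" % (nominal[0] / 1e6, nominal[1] / 1e6))
print("all corners:         %.1f to %.1f MHz" % (guaranteed[0] / 1e6, guaranteed[1] / 1e6))
```

With these illustrative numbers, the range that locks at every corner (250 to 312.5 MHz) is much narrower than the range at the typical corner alone (200 to 400 MHz), which is the effect described above.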

Fig. 2. (a) Concept of proposed DLL. Two cases of loop selection according to the initial time difference: (b) initial time difference between one half and one clock period and (c) initial time difference between zero and one half clock period.

III. PROPOSED DLL IN HIGH-SPEED DRAM

A. Range of the Proposed DLL
The concept of the proposed dual-loop DLL is shown in Fig. 2(a). After the DLL starts up at the initial control voltage, the initial time difference between REFCLK and ICLK is monitored at the first REFCLK cycle. The first REFCLK cycle refers to the cycle of the REFCLK when ICLK is produced first in the loop after the DLL starts up. If the initial time difference falls in the range shown in Fig. 2(b), the delay is adjusted by the phase detector and charge pump as in the conventional DLL. As a result, ICLK becomes FCLK, the synchronized output clock of the DLL. In the other case, if the initial time difference falls in the range shown in Fig. 2(c), the input clock for phase comparison in the phase detector is switched from ICLK to ICLKB (the differential clock of ICLK), becomes , and FCLK is also switched from ICLK to ICLKB. This means that ICLKB is synchronized to REFCLK. Therefore, in our proposed DLL, the locking frequency range is .

As a consequence, although the same delay source is used, the operation range of the proposed DLL can be extended to a lower frequency than that of conventional DLLs. As an analogous concept, the phase inversion technique was developed for wide range [8]. It applies an instantaneous phase inversion at the VCDL input at the final moment, when it determines by monitoring its control voltage that the current lock-in process cannot meet the range. The proposed concept achieves faster lock-in time since it utilizes the dual-loop operation from the beginning by selecting an optimized path.

Fig. 3. Architecture of proposed DLL.

B. Architecture and Implementation of the Proposed DLL
Fig. 3 shows the proposed dual-loop architecture of the DLL for the wide lock range. Unlike conventional DLLs, the proposed DLL is composed of dual negative feedback loops (Loop1, Loop2). Loop1 and Loop2 are the feedback loops of the differential internal clocks. For the correct operation of the dual loops, a loop selector, an initial circuit, a reset controller, and a 2:1 MUX are implemented. In the proposed dual-loop operation, the DLL selects one of the two differential internal clocks in Loop1 and Loop2 according to the initial time difference between the internal clock and the reference clock (shown in Fig. 3). Before the DLL starts up, the initial circuit sets VBP (control voltage of loop filter in Fig. 3) to the minimum value, which minimizes the delay of the VCDL to ensure harmonic lock-free operation. After RESET (DLL enable signal) transitions from the low state to the high state, CLK and CLKB (the external differential clocks) are provided to the reset controller. The reset controller outputs IRESET to the clock buffer in the time between the falling edge of the next CLK and the next rising edge, as shown in Fig. 4(c). Since RESET can be asserted at any time, the direct application of this signal to the clock buffer is not feasible because it can make the clock buffer produce internal clocks with a distorted cycle and cause an incorrect initial time difference in the loop selection cycle, as

shown in Fig. 4(a). Fig. 4(b) shows the schematic of the reset controller. When IRESET is asserted, the clock buffer produces the three clocks (ICLK, ICLKB, and REFCLK). ICLK and ICLKB, which are the differential clocks, are input to the VCDL, and REFCLK is input to the phase detectors as the reference clock.

Fig. 4. (a) Case where the clock buffer produces an incorrect initial time difference without reset controller. (b) Schematic and (c) timing diagram of the reset controller.

Fig. 5(a) shows the delay cell and the biasing scheme used in the proposed DLL. To reduce the supply voltage sensitivity, the VCDL is implemented as a series of differential delay cells with symmetric loads, as shown in Fig. 5(a) [9]. VBP is a control voltage from the loop filter and VBN is generated by the replica bias circuit [boxed area in Fig. 5(a)]. The replica bias circuit keeps the swing constant and independent of VBP, which provides better jitter performance and a wider operation range [10]. For shielding the analog biases from external noise, VBP and VBN are physically enclosed with inter- and intra-layers as shown in Fig. 5(b). This shielding technique improves the jitter characteristic.

Fig. 5. (a) Delay cell of the replica bias circuit. (b) Cross-sectional view of bias line in the proposed DLL.

DCLK and DCLKB in Fig. 3, which are converted from a small swing output of the VCDL to a full swing output by amplifiers, each form negative feedback loops and are also changed to LCLK and LCLKB by replica delays. LCLK and LCLKB are input to the phase detectors and LCLK is also input to the loop selector as shown in Fig. 6(a). If the time difference between the first LCLK (the first LCLK produced after the DLL is enabled) and REFCLK falls in the range shown in Fig. 6(b), Lsel, the output of the loop selector, preserves the low state initialized by IRESET. Lsel at the low state enables PD1 (the phase detector of Loop1) and disables PD2 (the phase detector of Loop2). Furthermore, this state makes the MUX select DCLK as FCLK (the output clock of the DLL). Since only PD1 is enabled, the phase of LCLK is compared with that of REFCLK. The selected PD1, as shown in Fig. 7(a), produces UP/DN pulses having a pulsewidth matching the phase difference between REFCLK and LCLK, as shown in Fig. 7(b). This PD has a small phase offset due to the fast operation and precision of dynamic logic. Also, it does not have phase dithering problems because no pulses are produced in the locked state. The simulation results show about 40-ps phase offset in the worst case. The UP/DN pulse of PD1 is transferred to the charge pump and generates VBP on the loop filter. The linear capacitor of the loop filter is designed to achieve a large capacitance value in a small area while minimizing the substrate noise, as shown in Fig. 8 [11]. If a MOS capacitor is used instead, larger capacitance can be achieved in a smaller area, although at the cost of linearity. The replica bias generator produces VBN to control the current source transistor of the delay cell according to VBP. Finally, the phase of LCLK is synchronized with that of REFCLK. In the other case, where the time difference between the first LCLK and REFCLK falls in the range shown in Fig. 6(c), Lsel is changed from the low state initialized by IRESET to the high state. It enables PD2 and disables PD1, and makes the MUX select DCLKB as FCLK. In contrast to the prior case, the phase of LCLKB is compared with that of REFCLK. Through the same locking process, LCLKB is

synchronized with REFCLK. Once one of the two loops is selected, unless the DLL is disabled or the power is down, the selected loop is never changed by the time difference between LCLK and REFCLK after the loop selection cycle. Therefore, malfunction is avoided during the lock-in process. Consequently, the proposed dual-loop operation eliminates the initial-condition requirement between the reference clock and the internal clock and enables the delay of the VCDL to be utilized fully. The lock range is also extended to lower frequency without compromising the jitter characteristic.

Fig. 6. (a) Schematic of the loop selector and timing diagrams for (b) a time difference between one half and one clock period and (c) a time difference between zero and one half clock period.

Fig. 7. (a) Schematic and (b) timing diagram of the phase detector.

Fig. 8. Linear capacitor in the loop filter.

IV. SKEW CALIBRATION BY PROGRAMMABLE REPLICA DELAY AND ANTIFUSE CIRCUITRY
Although the replica delay is well matched with the sum of on-chip and off-chip delay at design time, on-chip process variation and unexpected changes in off-chip conditions such as output load, clock slew rate, and so on, may result in unavoidable skew. There are two methods for skew elimination: wafer-level trimming by laser [12] and post-package tuning by antifuse [13]. The wafer-level tuning is not effective because the wafer tester is not precise, and the off-chip condition cannot be considered. Although post-package tuning by antifuse is more practical, the previous post-package method has some problems. The previous post-package method uses an external high voltage applied through pins for rupturing the antifuse. Providing a sufficiently high voltage for rupturing the antifuse can cause physical damage to other circuits connected to the pin and can negatively affect the reliability of the device. To remove the high-voltage problem, the antifuse programming scheme by internal negative voltage is used [14]. Fig. 9 shows the programmable replica delay circuit. The circuitry has three functional parts: the replica delay of the clock, the replica delay of the output buffer, and the tunable delay circuit. The tunable delay circuit is connected to the antifuse circuitry and the antifuse is made of ONO (oxide–nitride–oxide) dielectrics, as shown in Fig. 10.

Fig. 9. Replica delay including the programmable delay.

Fig. 10. Antifuse circuit for skew calibration and SEM photograph of the antifuse.

The sequence of skew calibration is explained as follows. When the DLL is enabled by RESET for the test, nodes fd[1]–[8] and bd[1]–[8] in Fig. 9 are all fixed at the high state, because the initial program voltage is at ground level, and RESET initializes node A and B as level. In this state, no address code can have an effect on the fixed levels of fd[1]–[8] and bd[1]–[8]. First, the skew between the external clock and the data strobe signal is measured. The number of delay loads needed to cancel the measured skew is then estimated. After the program mode (PGM) is activated, the program code signifying the estimated number of delay loads is applied to the address pins and the skew is remeasured. This process is iterated to increase or decrease replica delay times by left-shift–right-shift (LSRS) for minimizing the skew. When the skew is almost eliminated, the inserted program address code is fixed and the on-chip negative voltage generator is enabled to produce a program voltage ( V) for rupturing the antifuses. The replica delay is tuned through the flow shown in Fig. 11. According to the simulation results, the programmable tuning range using the eight antifuses is from −350 to 350 ps and the minimum tuning resolution is approximately 10 ps.

Fig. 11. Flow of skew calibration after package process.

Fig. 12. Microphotograph of the proposed DLL.

V. EXPERIMENTAL RESULTS
The proposed DLL has been fabricated using a 0.16-µm DRAM process. Fig. 12 shows a microphotograph of the fabricated chip. The active area of the DLL occupies 0.27 mm². The loop filter consumes 50% of the total area. For high-frequency measurements of the proposed DLL, a chip-on-board (COB) has been fabricated both to reduce parasitics and to match the 50-Ω impedance of the measurement instrument. The proposed DLL operates from 42 to 400 MHz with a 2.3-V power supply. Fig. 13 shows the synchronized waveforms at

42 and 400 MHz. At 400 MHz, the peak-to-peak jitter is 43 ps and the rms jitter is 4.71 ps, as shown in Fig. 14(a). When a ±300-mV 1-MHz square wave is injected externally on the power supply, the peak-to-peak jitter and the rms jitter are measured to be 80 and 7.46 ps, respectively, at 400 MHz, as shown in Fig. 14(b).

Fig. 13. Synchronized waveforms at (a) 42 MHz and (b) 400 MHz.

Fig. 14. Measured jitter characteristics at 400 MHz (a) with a quiet supply and (b) with injected 1-MHz ±300-mV square wave noise.

Fig. 15(a) shows the skew, which is composed of the phase offset of the phase detector and the replica mismatch caused by process variation, before skew calibration and the tuned skew after skew calibration. Before the calibration, the measured skew is 55 ps. After the calibration, the remeasured skew is reduced to 9 ps with measured peak-to-peak jitter of 46 ps. In theory, error reduction resulting in negative phase shift will have increased jitter due to increased load in the replica delay, but error reduction resulting in positive phase shift will have decreased jitter due to decreased load in the replica delay. However, from analyzing the measured results, the amount of increased jitter by negative phase reduction is insignificant compared to the reduced phase error. Fig. 15(b) shows the resolution and partial range of the skew calibration through antifuse programming. Minimum resolution is about 10 ps and total

calibration range is from −350 to 350 ps, as expected from simulation. These results show that the skew caused by on-chip or off-chip variation can be eliminated through programmable replica delays using the antifuse circuitry, and also verify that the improved skew calibration technique can effectively eliminate the skews after packaging without degradation of the jitter characteristic. The power dissipation of the proposed DLL is 52 mW at 400 MHz. Table I summarizes the measured characteristics of the proposed DLL.

Fig. 15. (a) Measured skew at 400 MHz before skew calibration and after skew calibration. (b) Range (full range not displayed for limitation of tester) and resolution of skew calibration.

TABLE I
PERFORMANCE CHARACTERISTICS OF THE PROPOSED DLL

VI. CONCLUSION
In this paper, the dual-loop architecture with the improved skew calibration method was presented. The dual-loop architecture enabled the wide range of the DLL by using the loop selection decided by an initial time difference between the reference clock and the internal clock of the DLL. Also, an improved skew calibration method demonstrated a practical post-package skew calibration using the antifuse circuitry and the internal negative voltage generator. The proposed DLL, fabricated on a 0.16-µm DRAM process, achieves a wide range from 42 to 400 MHz, with 43-ps peak-to-peak jitter and 4.71-ps rms jitter at 400 MHz, which makes it applicable to high-speed DRAMs.

ACKNOWLEDGMENT
The authors are grateful to H. Ryu and Dr. Y. Kim for helpful discussion about COB-type PCB design.

REFERENCES
[1] A. Hatakeyama et al., “A 256-Mb SDRAM using a register-controlled digital DLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 1728–1734, Nov. 1997.
[2] Y. Okajima et al., “Digital delay-locked loop and design technique for high-speed synchronous interface,” IEICE Trans. Electron., vol. E79-C, pp. 798–807, June 1996.
[3] T. H. Lee et al., “A 2.5-V CMOS delay-locked loop for an 18-Mbit 500-Mbyte/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[4] S. Tanoi et al., “A 250–622-MHz deskew and jitter-suppressed clock buffer using two-loop architecture,” IEEE J. Solid-State Circuits, vol. 31, pp. 487–493, Apr. 1996.
[5] S. Sidiropoulos et al., “A semi-digital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[6] Y. Okuda et al., “A 66–400-MHz adaptive-lock-mode DLL circuit with duty-cycle error correction,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2001, pp. 37–38.
[7] C. H. Park et al., “A low-noise 900-MHz VCO in 0.6-µm CMOS,” IEEE J. Solid-State Circuits, vol. 34, pp. 586–591, May 1999.
[8] T. Yoshimura et al., “A delay-locked loop and 90-degree phase shifter for 800-Mb/s double data rate memories,” in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 66–67.
[9] J. G. Maneatis, “Low-jitter and process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.

[10] I. A. Young et al., “A PLL clock generator with 5 to 110 MHz of lock range for
microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[11] F. Herzel et al., “A study of oscillator jitter due to supply and substrate noise,”
IEEE Trans. Circuits Syst. II, vol. 46, pp. 56–62, Jan. 1999.
[12] T. Hamamoto et al., “A skew and jitter suppress DLL architecture for high-frequency
DDR SDRAMs,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2000, pp. 76–77.
[13] S. Kuge et al., “A 0.18-µm 256-Mb DDR-SDRAM with low-cost post-mold-tuning method for
DLL replica,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb.
2000, pp. 402–403.
[14] K. S. Min et al., “A post-package bit-repair scheme using static latches with
bipolar-voltage programmable antifuse circuit for high-density DRAMs,” in Symp. VLSI
Circuits Dig. Tech. Papers, June 2001, pp. 67–68.

Se Jun Kim was born in Seoul, Korea, in 1974. He received the B.S. and M.S. degrees in
electronics engineering from Hanyang University, Seoul, in 1998 and 2000, respectively.
In 2000, he joined the Memory Research and Development Division, Hynix Semiconductor
Inc., Kyungki-Do, Korea, as a Research Engineer, where he has been working on CMOS circuit
and architecture for high-speed digital/analog interface. His current interests include
clock recovery circuits, data converters, clock distribution, and I/O circuits for
high-speed digital/analog interface.

Sang Hoon Hong received the B.S. degree in electronic engineering from Yonsei University,
Seoul, Korea, in 1993. He received the M.S. and Ph.D. degrees in engineering sciences from
Harvard University, Cambridge, MA, in 1998 and 2001, respectively.
He is currently with the Memory Research and Development Division of Hynix Semiconductor
Inc., Ichon-si, Kyongki-Do, Korea, working on high-speed dynamic memories with a
particular interest in low-voltage/power circuits and architectures.

Jae-Kyung Wee was born in Seoul, Korea, in 1966. He received the B.S. degree in physics
from Yonsei University, Seoul, in 1988 and the M.S. degree from Seoul National University
in 1990. In August 1998, he received the Ph.D. degree in electronics engineering on
modeling and characterization of interconnects for high-speed and high-density circuits
from Seoul National University.
In 1990, he joined Hyundai Electronic Company working on the process integration of
16 MDRAM and LOGIC devices. In 1996, he was engaged in the development of the
manufacturable 0.35-µm CMOS logic technology for high-performance logic products at
Hyundai Electronics. In August 1998, he became a Project Leader of the Antifuse Repair
Circuit Development Team. From August 1999 to June 2000, he was a Project Leader of 1-G
DDR SDRAM using 0.13-µm technology. Beginning in July 2000, he also worked on
next-generation DRAM and its related systems. He is currently with the faculty of Hallym
University, Chunchun-si, Kangwon-Do, Korea. His research interest is in the area of future
DRAM architecture including high-speed DRAM with 200–400 MHz clock, interconnect
modeling, charge pump, DLL, I/O, and module designs for high-speed chips. He holds several
patents and is an author or co-author of several papers.

Joo Hwan Cho was born in Seoul, Korea, in 1968. He received the B.S. degree in electronic
materials engineering from Kwang-Woon University, Seoul, in 1992.
He joined the Semiconductor Research and Development Center, Hynix Semiconductor Inc,
Ichon-si, Kyungki-Do, Korea, in 1992. Since then, he has been working on DRAM design and
failure analysis.

Pil Soo Lee was born in Seoul, Korea, in 1963. He received the B.S. and M.S. degrees from
Inchon University, Korea, in 1990 and 1992, respectively.
In 1993, he joined KEC, Kumi, Korea, where he worked on power device design and analysis.
In 1997, he joined Hynix Semiconductor Inc., Ichon-si, Kyungki-Do, Korea, where he has
been working on signal integrity analysis of high-frequency devices, circuits, and boards.

Jin Hong Ahn was born in Busan, Korea, in 1958. He received the B.S. and M.S. degrees in
electronic engineering from Seoul National University, Seoul, Korea, in 1982 and 1984,
respectively.
He joined Gold-Star Semiconductor Company, Gumi, Korea, in 1984. From 1986 to 1990, he was
involved in designing SRAMs and mask ROMs. In 1991, he moved to the DRAM design group,
Gold-Star Electron Company, Seoul. From 1991 to 1998, he managed several generations of
advanced DRAM design projects, including 64-Mb, 256-Mb, MML, and intelligent RAM. His
interests in DRAM design include new DRAM architectures, next-generation DRAM circuit
technologies, and low-cost DRAM design techniques. In 1999, he joined the Memory Research
and Development Group, Hynix Semiconductor Inc., Ichon-si, Korea, where he was engaged in
the development of 0.15-µm 256-M DRAM. He is currently a Technical Director in DRAM Design
technology.

Jin Yong Chung received the B.S.E.E. degree from Seoul National University, Seoul, Korea,
in 1974 and the M.S.E.E. degree from Korea Advanced Institute of Science and Technology,
Taejon, Korea, in 1976.
From 1976 to 1978, he worked for Korea Semiconductor Inc., which later became
Semiconductor Business Unit of Samsung Electronics, where he was involved in the design of
timepieces and custom CMOS chip designs. Since 1979, he was involved in memory design area
and worked for various companies including National Semiconductor, Synertek, Vitelic,
developing CMOS SRAMs, 4 K to 64 K and mask ROMs and CMOS DRAMs. In 1987, he joined LG
Semiconductor, Korea, where he developed 256 K to 16 M DRAMs and other standard logic
products. In 1992, he joined Mosel-Vitelic, where he developed high-speed DRAMs and the
256 K × 8 high-speed DRAM became the first semi-standard DRAM, which helped the company to
go public. Since 1996, he has worked for Hynix Semiconductor Inc., Ichon-si, Kyoungki-Do,
Korea, as a Senior Vice President and Chief Architect in the Memory Research and
Development Division. His current research interest is in development of ultrahigh-speed,
super low-voltage and low-power memory products, novel device research in ferroelectric
and magnetic memories, and new-generation 3-D devices.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 8, AUGUST 2000 1137

A Low-Noise Fast-Lock Phase-Locked Loop with Adaptive Bandwidth Control
Joonsuk Lee, Student Member, IEEE, and Beomsup Kim, Senior Member, IEEE

Abstract—This paper presents a salient analog phase-locked loop (PLL) that adaptively
controls the loop bandwidth according to the locking status and the phase error amount.
When the phase error is large, such as in the locking mode, the PLL increases the loop
bandwidth and achieves fast locking. On the other hand, when the phase error is small,
this PLL decreases the loop bandwidth and minimizes output jitters. Based on an analog
recursive bandwidth control algorithm, the PLL achieves the phase and frequency lock in
less than 30 clock cycles without pre-training, and maintains the cycle-to-cycle jitter
within 20 ps (peak-to-peak) in the tracking mode. A feed forward-type duty-cycle corrector
is designed to keep the 50% duty cycle ratio over the entire operating frequency range.

Index Terms—Adaptive bandwidth PLL, analog implementation, clock recovery, fast locking
time, frequency hopping, gear-shifting algorithm, low jitter, phase-locked loops,
time-varying channel.

Manuscript received October 29, 1999; revised February 23, 2000.
J. Lee was with the Boston Design Center, IBM Microelectronics, Lowell, MA 01851 USA. He
is now with the Korea Advanced Institute of Science and Technology, Taejon 305-701, Korea.
B. Kim is with the Korea Advanced Institute of Science and Technology, Taejon 305-701,
Korea (e-mail: bkim@ee.kaist.ac.kr).
Publisher Item Identifier S 0018-9200(00)06435-0.

I. INTRODUCTION

PHASE-LOCKED loops (PLL’s) have been widely used in high-speed data communication systems
such as Ethernet receivers, disk drive read/write channels, digital mobile receivers,
high-speed memory interfaces, and so forth, because PLL’s efficiently perform clock
recovery or clock generation with relatively low cost. Those PLL’s used in the systems are
required to generate low-noise or low-jitter clock signals and at the same time need to
achieve fast locking.

Conventional analog PLL’s in clock recovery applications use a narrow-band loop filter to
reduce output jitters at the expense of elongated locking time. In order to improve the
locking-time characteristics, digital or hybrid analog/digital PLL’s with a loop bandwidth
stepping capability have been studied [1], [2]. Since the stepping hardware is implemented
with complex digital building blocks, these PLL’s usually suffer from high power
dissipation, low operating speed and large die size. In order to reduce power consumption
and die size, simpler algorithms such as a gear-shifting or a lock-detection algorithm
were attempted [3], [4]. The PLL’s with such algorithms control the loop bandwidth
according to a prestored charge-pump current control sequence in memory during the
start-up mode. However, in clock recovery applications such as HDD and DVD, where the
channel characteristics vary in time, the prestored control sequence cannot make the PLL’s
respond properly to unpredictable phase fluctuation, instant frequency shift, and
time-varying jitter because the sequence was calculated with preknown fixed noise
statistics.

Discrete-time PLL’s, which are programmed on DSP processors, based on a recursive least
squared (RLS) algorithm [5] or the Kalman filter algorithm [6] can respond to such
unpredictable jitter variations, but require an enormous amount of hardware. The outputs
generated from the discrete-time PLL’s are in a digital domain, and therefore the
discrete-time PLL’s require digital-to-analog converters (DAC) and an analog-to-digital
converter (ADC) to sample input signals for detection. Slow signal-processing speed of the
digital-to-analog conversion in the discrete-time PLL’s limits the operating frequency and
confines the use of the PLL’s to the applications dealing with low-frequency signals like
digital wireless base stations.

This paper presents a new analog adaptive PLL (AAPLL) architecture capable of varying the
loop bandwidth according to an adaptively updated control sequence under a time-varying
noise environment. Since the control sequence is generated from analog signal processing,
the PLL operates at several hundred megahertz and can be easily modified to run at
gigahertz frequency ranges.

This paper consists of six sections including the present section. Section II describes
the recursive bandwidth-update algorithm and the analog adaptive bandwidth controller.
Section III presents the circuit implementation of the AAPLL. Stability and jitter
analysis for the AAPLL are given in Section IV. AAPLL locking behaviors are also discussed
in that section. Section V shows the AAPLL IC implementation and measurement results.
Finally, a brief summary of this paper is given in Section VI.

II. RECURSIVE EQUATION AND ANALOG LOOP BANDWIDTH CONTROLLER

In this section, a recursive bandwidth update algorithm for the analog adaptive controller
and its implementation are described.

A. Adaptive Bandwidth Control

As mentioned in the introduction, a common approach to improve the locking speed of a PLL
is to use a gear-shifting method for loop bandwidth control. In such a PLL, when fast
locking is required, as in the initial frequency/phase acquisition mode, the loop
bandwidth of the PLL is expanded by the increased charge-pump current or the phase
detector gain [2]–[4]. Zero phase start (ZPS) is also helpful to reduce the phase
acquisition time [4], but limited to the case when the initial-frequency locking has been
already established. For the case where rapid initial-frequency locking is required,
various techniques with a prestored gear-shifting sequence have been studied [4],

[5]. However, in the case where the channel characteristics vary in time, such as in a
disk drive, the prestored gear-shifting sequence is not helpful. Unpredictable phase
fluctuation, instant frequency shift, and varying input jitter force such a PLL to use an
indefinite wide loop bandwidth in order not to lose the locking. Although a discrete-time
adaptive PLL can adjust the bandwidth according to the input noise statistics, it still
requires complex hardware and its applications are limited to the low frequency operating
systems [5].

A linearized model of a charge-pump PLL (CP-PLL) is shown in Fig. 1. The transfer function
in the z domain is represented by

(1)

where . Here and are the phase detector gain given by and the voltage-controlled
oscillator (VCO) gain given by , respectively, and is the z-transform of the sampled
version of , where is the PLL loop filter in the s domain given by if a simple passive
low-pass filter ( ) is assumed to be used as a loop filter. The quantity is called the PLL
loop gain .

Fig. 1. Linearized model of a CP-PLL.

Discrete-time PLL’s that have an adaptive stepping capability can control the loop
bandwidth by a loop-gain update equation minimizing the RLS error [7]. However, it is
difficult to fully implement the update equation used in the discrete-time PLL because it
requires a significant amount of die size and power consumption. A simpler but still
effective loop gain update equation is required and used in the proposed AAPLL. The update
equation is given by (2), in terms of the loop gain $K[n]$ that is proportional to the
loop bandwidth.1

$K[n+1] = \lambda K[n] + \mu\,|\theta_e[n]|$    (2)

1Here, the loop bandwidth and the loop gain have the following relationship: K =W =f .

Here, $\lambda$ is a forgetting factor that has a positive value close to but less than
unity. $\mu$ is a coefficient that normalizes and converts the absolute value of
input–output phase errors from radians to dimensionless numbers. The loop gain $K[n]$ is
calculated in a recursive manner according to (2). When the input–output phase error
becomes zero, the forgetting factor makes the loop gain converge to zero as the discrete
time increases. Equation (2) reflects the most recent input–output phase error most
significantly. This recursive relation is similar to the RLS algorithms commonly used for
an estimator [8].

The loop gain $K[n+1]$ at time $n+1$ is calculated as the weighted sum of the present loop
gain $K[n]$ and the absolute value of the present input–output phase error,
$|\theta_e[n]|$, at time $n$. The equation indicates that the loop gain, thus the loop
bandwidth, rapidly grows when the recent absolute phase errors become large, and greatly
improves the PLL loop tracking capability. When the recent absolute phase errors become
small, the loop gain shrinks, as does the loop bandwidth, because the first part of (2)
dominates. The reduced loop bandwidth improves the PLL’s input jitter rejection
capability. Therefore, (2) satisfies the necessary loop bandwidth control under the
presence of unpredictable jitter variation.

Fig. 2. Conceptual diagram of an analog adaptive controller.

B. Analog Adaptive Bandwidth Control

Equation (2) is achieved by a CP-PLL with a small amount of extra hardware. The second
term of (2), an absolute phase error, is obtained from outputs of a phase frequency
detector (PFD). Since the PFD and the following charge-pump circuit generate up/down
current signals proportional to the input–output phase difference, simply combining these
up/down signals through an OR gate gives the absolute phase error signal
$|\theta_e[n]|$ at time $n$. Fig. 2 shows a conceptual diagram of how the recursive loop
gain of the AAPLL is calculated in the controller. The bandwidth voltage becomes the bias
voltage of the following current source in the charge-pump circuit and controls the amount
of charge-pump currents. The current switch steers the current proportional to the phase
error and increases the voltage across the capacitor by the corresponding amount at a
constant rate while the resistor exponentially discharges the capacitor. The resistor and
the capacitor realize the first part of (2) with the forgetting factor $\lambda$. As
derived in the Appendix, in the steady state the voltage across the capacitor at time is
given by

(3)

Here, the forgetting factor equals , is , and is the amount of the charging current in the
controller.

The loop gain is asymptotically proportional to the bandwidth voltage governed by (3)
because the charge-pump current is directly controlled by this voltage. It means that the
bandwidth of the AAPLL follows (2).

Fig. 3. AAPLL total block diagram.
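The recursion described in this section can be tried out numerically. The following is a minimal sketch, not the authors' implementation: it assumes the update has the form "new gain = forgetting factor times old gain plus coefficient times absolute phase error", as the text describes, and it assumes that an ideal RC discharge over one clock period gives a forgetting factor of exp(-T/RC). The function names, the 10 kΩ resistor, and the values of `lam` and `mu` are hypothetical; only the 20 pF capacitor and the 250 MHz clock appear in the paper.

```python
import math


def update_loop_gain(k, phase_error, lam, mu):
    """One step of the recursion: K[n+1] = lam * K[n] + mu * |theta_e[n]|."""
    return lam * k + mu * abs(phase_error)


def forgetting_factor(period, r, c):
    """Fraction of the capacitor voltage left after one clock period.

    Assumes an ideal exponential RC discharge (an assumption of this sketch).
    """
    return math.exp(-period / (r * c))


def run(errors, k0=0.0, lam=0.9, mu=0.1):
    """Apply the recursion to a sequence of phase errors; return all gains."""
    gains = [k0]
    for e in errors:
        gains.append(update_loop_gain(gains[-1], e, lam, mu))
    return gains


if __name__ == "__main__":
    # Large errors during acquisition, then zero error in tracking.
    gains = run([1.0] * 10 + [0.0] * 10)
    print("gain after acquisition:", round(gains[10], 4))
    print("gain after tracking:   ", round(gains[-1], 4))
    # Hypothetical R = 10 kOhm with C = 20 pF at a 4 ns (250 MHz) period.
    print("forgetting factor:", round(forgetting_factor(4e-9, 10e3, 20e-12), 4))
```

With a constant error the gain settles toward `mu * |error| / (1 - lam)`, and with zero error it decays geometrically by `lam` per cycle, which matches the qualitative behavior described above.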

III. CIRCUIT IMPLEMENTATION


This section describes the circuit implementation of the
adaptive bandwidth controller, the charge-pump circuit, the
VCO, and the duty cycle correction circuit. Fig. 3 shows
the overall block diagram of the AAPLL, which modifies
a conventional PLL by attaching an analog adaptive band-
width-controlling block. Due to the minor change, the AAPLL
is easily applicable to various PLL applications and still takes
advantage of the full adaptability.

Fig. 4. PFD schematic with simplified TSPC D-flip–flops.

A. Adaptive Bandwidth Controller and Charge Pump


The well-designed PFD is used as a phase-detecting block
instead of a mixer, though the input signal frequency is high,
in order to achieve a wideband capturing capability. The PFD
shown in Fig. 4 consists of two simplified true single-phase
clock (TSPC) D-flip–flops and one NOR gate. Since the input
frequency of the AAPLL is selected to recover the clock signal
in DVD systems, whose clock frequency is about 250 MHz, the
PFD should generate up and down signals at such a high speed.
In order to minimize the abnormal operation of the PFD, TSPC
D-flip–flops are used as leaf cells since these intrinsic delays
are smaller than those of conventional ones.
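How a set/reset PFD followed by an OR gate yields an absolute phase error can be illustrated with a simple timing model. This is a sketch under stated assumptions, not the circuit itself: it assumes an idealized PFD in which UP is set by the reference edge, DOWN is set by the VCO edge, and both are cleared a fixed reset delay after the later edge. The function names and the 0.2 ns reset delay are hypothetical.

```python
import math


def pfd_pulse_widths(dt, reset_delay):
    """Return (up_width, down_width, or_width) in seconds.

    dt is t_vco - t_ref: positive when the reference edge leads the VCO edge.
    Both outputs stay high until reset_delay after the later edge, so the
    OR of the two pulses lasts |dt| + reset_delay.
    """
    up = max(dt, 0.0) + reset_delay      # UP starts at the reference edge
    down = max(-dt, 0.0) + reset_delay   # DOWN starts at the VCO edge
    return up, down, max(up, down)


def phase_error_rad(dt, period):
    """Convert a timing error into a phase error in radians."""
    return 2.0 * math.pi * dt / period


if __name__ == "__main__":
    period = 4e-9          # 250 MHz clock
    reset_delay = 0.2e-9   # hypothetical PFD reset delay
    for dt in (-1e-9, 0.0, 1e-9):
        up, down, or_width = pfd_pulse_widths(dt, reset_delay)
        print(f"dt={dt:+.1e} s  phase={phase_error_rad(dt, period):+.3f} rad  "
              f"OR width={or_width:.2e} s")
```

In this model the OR pulse width depends only on the magnitude of the timing error, and it never drops below the reset delay. That residual pulse is one way to picture why the controller keeps a minimum bandwidth even at zero phase error.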
The adaptive bandwidth controller shown in Figs. 3 and 6 consists of an OR gate and a
differential switch, which takes the differential signals from the OR gate. The OR gate
sums the up and down signals generated from the PFD and gives the absolute phase error.
The differential switch controls current paths from to the bandwidth capacitor according
to the phase difference for one clock period.

Fig. 5. Up/Down and phase-error signal diagram.

When the phase difference signal is mostly on over a period, such as in the initial-phase
acquisition state, the charging rate of the capacitor exceeds the discharging rate. Hence
the capacitor voltage of the capacitor and the pumping current in the charge pump
increase. As a result, the phase detector gain increases and so does the loop bandwidth.
On the other hand, when the phase error signal is off for the most part
of one period, such as in the tracking state, the discharging rate exceeds the charging
rate and the capacitor voltage decreases. Therefore both the phase-detector gain and the
loop bandwidth decrease. In the steady-state tracking mode, the AAPLL loop bandwidth can
be very narrow because the phase error becomes zero. However, the AAPLL still maintains
the minimum loop bandwidth even in such a case because of the up/down signals generated
from the set/reset type PFD, as shown in Fig. 4. In the zero-phase-error and perfect
locking case in Fig. 5, the OR-gated effective phase-error signal can still supply
currents to the bandwidth capacitor. The statistical variation of the input signal also
contributes to maintaining this minimum bandwidth. Fig. 5 shows the relation between the
phase error and the bandwidth control of the AAPLL.

Fig. 6 shows the circuit diagram of the analog adaptive bandwidth controller with a
charge-pump circuit. As mentioned before, the voltage across the capacitor in parallel
with a resistor controls the phase-detection gain. In order to control the discharging
rate of the capacitor, a voltage-controlled resistor (VCR), as shown in Fig. 7, is used.
The VCR is designed to have fully linear I–V characteristics for a given power supply
range. By adjusting the magnitude of the bias current in the VCR branches, the resistance
of the VCR is changed and so is the discharging rate of the capacitor.

Fig. 6. Adaptive bandwidth controller and CP schematics.

Fig. 7. VCR schematic.

The output node of is connected to the gates of nMOS transistors, and controls the
charge-pump current by adjusting the bias point of and in Fig. 6. The charge pump consists
of two differential input stages, a mirror stage, an output stage, and two small extra
current sources. These two small current sources help the rapid turn-on/off operation for
MOS , and . The differential PFD signals drive the charge-pump inputs. When the down
signal goes high, the current controlled by the voltage of the bandwidth capacitor is
drawn from the loop filter. When the up signal goes high, the same amount of current is
supplied to the loop filter.

B. Voltage-Controlled Oscillator (VCO)

A four-stage VCO as shown in Fig. 8 is used for the AAPLL. The basic delay cell consists
of six transistors. The cross-coupled pMOS transistors, and , guarantee the differential
operation of the delay cell without a tail-current bias. Auxiliary pMOS transistors, and ,
control the oscillation frequency. Unlike conventional differential VCO’s with a current
bias, this VCO allows the AAPLL to operate under a single 1.5-V power supply, consuming
1.5 mA. Because the output signal of the VCO swings rail-to-rail, no additional level
shifter with a carefully designed replica bias circuit is required to generate CMOS level
outputs. The latch, configured with pMOS , and , sharpens the edge of the output signal so
that the added noise has little chance to be converted into jitter. Eventually this latch
helps the reduction process of the VCO jitter [9], [10].

C. Duty-Cycle Corrector

Maintaining a 50% duty-cycle ratio for a clock signal is extremely important in most
high-speed clock recovery and clock generation applications because several systems, such
as double-data rate (DDR) SDRAM’s and pipelined microprocessors, use negative transition
edges of a clock signal in order to increase total system throughputs. This is often
achieved by a VCO running at twice the desired clock frequency, and then dividing the VCO
frequency by 2. Other approaches use a feedback-type duty-cycle corrector. Since precise
placement of the falling edge between two successive rising edges of the VCO output signal
is generally controlled by an additional feedback loop, the duty-cycle correctors require
an extra training period to stabilize the feedback loop.

The AAPLL uses a feed forward-type duty-cycle corrector instead of the feedback type in
order to eliminate the extra feedback hardware and the training period, as shown in
Fig. 9(a). The duty cycle corrector utilizes multiphase signals generated from a
multistage differential VCO. The signal in Fig. 9(b) selected from the multiphase signals
turns on MOS , and
, and charges the output node of the duty-cycle corrector almost instantaneously, because
the discharge path of the node is already off due to the signal . The signal , which is
also selected from the multiphase signals, is the one whose rising edge is shifted by 180°
in phase from that of . Similarly, the signal rapidly discharges the node and delivers the
desired 50% duty-cycle signal. Since this duty-cycle correction circuit consists of only
two transmission gates and two inverters, the silicon area is minimal and the power
consumption is negligible. In HSPICE simulation, the proposed duty-cycle corrector keeps
the output duty cycle almost perfectly at 50% with the input duty cycles varying from 10
to 90%.

Fig. 8. Four-stage VCO.

Fig. 9. Feed forward-type duty-cycle corrector. (a) Duty-cycle corrector schematic.
(b) Conceptual diagram of the correcting operation.

Fig. 10. Stability diagram for loop gain K .

IV. ANALYSIS AND SIMULATION

In this section, the stability of the AAPLL is analyzed for the adaptively generated loop
sequence, and behavioral simulation results for fast lock and large jitter reduction are
described.

A. Stability

Since the AAPLL automatically changes the loop bandwidth, a careful loop stability
analysis is required. As mentioned in the previous section, an analog adaptive controller
adjusts the phase-detector gain of the CP-PLL. Therefore, stability checking for the PLL
for each different phase gain should be accomplished first. A complete stability analysis
for the CP-PLL is cumbersome because a PLL operates in both a linear and a nonlinear
region. A simplified stability analysis for a second-order CP-PLL [8] is used in this
section. When the criterion is extended to include the logic delay effect, it can be
expressed as

(4)

Here, , , and are the clock period, the logic delay, and the RC time constant of a loop
filter respectively. The stability limit for the loop gain of the AAPLL is derived and
simulated using this criterion as shown in Fig. 10. The adaptively generated loop gain
sequence by the recursive equation is also shown in the same figure to verify the AAPLL
stability. The sequence converges to the minimum bandwidth and the amplitude of this
bandwidth is almost similar to that derived from the MMSE criterion [4]. Equation (4) can
be written to obtain the stability criterion for the bandwidth voltage by solving a MOS
I-V
characteristic equation for , in Fig. 6 assuming a saturation condition.

(5)

Here, , , , and are , the VCO gain, the nMOS threshold voltage, and the size of a resistor
of the loop filter in Fig. 3, respectively. Here, is the size of , . The stability limit
for the bandwidth voltage, which is one of the observable values in the measurement setup,
is visualized in Fig. 11 for various capacitor and resistor values. The figure shows that
all the sequences and are within the stable region for the various resistor and capacitor
values.

Fig. 11. Stability limit graph for bandwidth voltage V .

B. Output Jitter

Recently, it was reported that a CP-PLL has an optimum loop bandwidth that generates
minimum jitter in the steady state [11]. A clean tone, which is assumed to have only a
noise floor and no random-walk phase noise, is used as a reference signal for the jitter
derivation. Because the AAPLL eventually achieves the steady state locking with a clean
reference signal like other conventional PLL’s, the output cycle-to-cycle jitter of the
AAPLL can be calculated by

(6)

Here, , , , and are the internal jitter from the VCO, the jitter of the input signal, the
rms value of the charge-pump current variation, and the rms value of VCO control voltage
noise in the steady state, respectively.

C. Behavioral Simulation of a Locking Feature

Closed-form analysis of locking behaviors for the AAPLL is difficult because of its
nonlinear operation. In this paper, a simulation-based approach like the Monte Carlo
Method is used instead. The AAPLL is modeled in a SPICE circuit simulator and extensively
tested by the circuit simulator. Fig. 12 shows the simulation setup for the AAPLL. Fig. 13
compares the simulated locking behavior of the AAPLL with that of a conventional PLL. The
bandwidths of the conventional PLL are selected to have two typical values. One is
optimized for the initial locking, and the other for the steady-state tracking. The gray
line in Fig. 13(a) indicates an incoming signal in the phase domain. The solid line in the
figures shows the phase of the AAPLL output signal from initial locking to steady-state
tracking. The phase variation of the conventional PLL optimized for steady-state tracking
with a narrow bandwidth is shown in the same figure as a dashed line.

Fig. 12. Simulation setup for the locking behavior measurement.

Fig. 13. Simulation results for the locking behavior of the AAPLL and conventional ones.
(a) Fixed narrow-bandwidth PLL. (b) Fixed wide-bandwidth PLL.
Fig. 14. Micrograph of the fabricated AAPLL chip.

Fig. 15. Control voltage change for a 0–250 MHz frequency input.

Fig. 16. Loop bandwidth voltage change for a 0–250-MHz frequency input.

Fig. 17. Experimental results of the locking for a 150–200-MHz input signal.

Fig. 18. Experimental results of the locking for a 180–220-MHz input signal by four steps.

Fig. 19. 2.544-ps (rms) and 20-ps (peak-to-peak) cycle-to-cycle jitter at 250-MHz input.

In Fig. 13(b), the phase change of the conventional PLL optimized for initial locking is
also shown as a dashed line. This simulation result gives several characteristics of the
AAPLL. The AAPLL controlled by the recursive algorithm achieves fast lock in the initial
locking period, comparable to the speed obtained from a wide-bandwidth PLL because the
consecutive error signals rapidly increase the loop bandwidth of the AAPLL. In the
steady-state tracking mode, the AAPLL substantially rejects the input jitter due to the
narrower loop bandwidth.
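The trade-off discussed in this section can be reproduced qualitatively with a toy behavioral model. The sketch below is an illustration only: it uses a first-order phase-domain loop rather than the second-order charge-pump PLL simulated in SPICE, a deterministic alternating jitter instead of random noise, and hypothetical values for every parameter (`lam`, `mu`, `k_min`, `k_max`, the gains, and the jitter amplitude).

```python
def simulate_loop(k_init, lam=None, mu=None, n_steps=400,
                  step=1.0, sigma=0.01, k_min=0.01, k_max=1.0):
    """Output phase of a first-order loop tracking a phase step plus jitter.

    If lam and mu are given, the loop gain is updated every cycle as
    k <- lam * k + mu * |error|, clipped to [k_min, k_max] (assumed limits).
    Otherwise the gain stays fixed at k_init.
    """
    y, k, trace = 0.0, k_init, []
    for n in range(n_steps):
        jitter = sigma if n % 2 == 0 else -sigma   # deterministic input jitter
        err = (step + jitter) - y
        y += k * err
        if lam is not None:
            k = min(k_max, max(k_min, lam * k + mu * abs(err)))
        trace.append(y)
    return trace


def lock_time(trace, target=1.0, tol=0.05):
    """Number of cycles until the output stays within tol of the target."""
    last_bad = -1
    for n, y in enumerate(trace):
        if abs(y - target) >= tol:
            last_bad = n
    return last_bad + 1


def rms_jitter(trace, target=1.0, tail=100):
    """RMS deviation from the target over the last `tail` cycles."""
    vals = trace[-tail:]
    return (sum((y - target) ** 2 for y in vals) / len(vals)) ** 0.5


if __name__ == "__main__":
    cases = {
        "fixed narrow": simulate_loop(k_init=0.05),
        "fixed wide": simulate_loop(k_init=0.5),
        "adaptive": simulate_loop(k_init=0.025, lam=0.8, mu=0.5),
    }
    for name, trace in cases.items():
        print(f"{name:12s} lock={lock_time(trace):3d} cycles  "
              f"rms jitter={rms_jitter(trace):.2e} rad")
```

In this toy model the adaptive loop locks far sooner than the fixed narrow-bandwidth loop while ending with much less residual jitter than the fixed wide-bandwidth loop, which is the behavior the simulation results above attribute to the AAPLL.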

V. EXPERIMENTAL RESULTS

The AAPLL is fabricated in a 0.6-µm single-poly triple-metal n-well CMOS process [12]. The
die size for the AAPLL is 0.11 mm². The total power consumption is less than 15 mW with a
single 3-V supply. A microphotograph of the AAPLL is shown in Fig. 14. To get the
forgetting factor , k , 20 pF are used. And 100 µA is selected to get . The locking-speed
measurement is carried out using an abrupt change of the input signal frequency from 0 to
250 MHz. Fig. 15 shows the corresponding
VCO control-voltage variations. In order to measure the locking speed precisely, the
running cycles of the output waveform are counted from the frequency triggering point in
the initial locking state. The AAPLL requires less than 30 clock cycles for both frequency
and phase lock in this case. The voltage of , representing the AAPLL bandwidth, is also
measured. Fig. 16 shows the measured voltage and describes the adaptation of the loop
bandwidth in the AAPLL. Figs. 17 and 18 show the measured control voltages and the
corresponding output signals when the input frequencies vary from 150 to 200 MHz and from
180 to 220 MHz by four steps respectively. In this case, the frequency and the phase
locking require less than 10 symbol periods because the frequency steps are much smaller
compared to the previous case. The measured cycle-to-cycle jitters of the AAPLL output
signal at a 250-MHz input signal are 2.54 ps (rms) and 20 ps (peak-to-peak) as depicted in
Fig. 19. This jitter value contains the inherent measurement setup jitter [13].

In order to test the performance of the duty cycle correction circuit, the duty cycle
ratio of the output signal is measured with the input signals from 90 to 260 MHz. Fig. 20
shows the measured result of the corresponding duty cycle ratio. This result indicates the
feed forward-type duty-cycle corrector maintains 50% duty-cycle ratio within 2% error for
the region. Fig. 21 shows a VCO linearity diagram. The VCO gain is about 100 MHz/V at a
250-MHz input frequency. The AAPLL operates from 80 to 290 MHz with a 3-V supply voltage.
Fig. 22 compares the normalized peak-to-peak jitter and the lock time of the AAPLL with
those of recently reported PLL’s and DLL’s. Measured characteristics are summarized in
Table I.

Fig. 20. 50% duty-cycle correction operation over the entire frequency range.

Fig. 21. VCO linearity.

Fig. 22. Comparison between recently reported PLL’s and DLL’s and this work.

TABLE I
AAPLL CHARACTERISTICS SUMMARY

VI. CONCLUSION

This paper presents the design of a 250-MHz low-jitter fast-lock analog adaptive
bandwidth-controlled PLL on a single chip. The chip is implemented in a 0.6-µm standard
CMOS process. Simple recursive control logic is proposed to control the bandwidth
effectively. The measured locking time is less than 10 cycles in a 10-MHz frequency step
and less than 30 cycles from an unknown frequency signal to the 250-MHz signal
respectively. The measured output jitters are 2.6 ps (rms) and 20 ps (peak-to-peak). All
the components are designed
LEE & KIM: LOW-NOISE FAST-LOCK PHASE-LOCKED LOOP 1145

using analog technique and hence the required die size and the [8] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice
power consumption are minimal. Hall, 1995.
[9] T. C. Weigandt, B. Kim, and P. R. Gray, “Analysis of timing jitter in
CMOS ring oscillators,” in Proc. Int. Symp. Circuit and Systems, vol. 4,
APPENDIX London, U.K., June 1994, pp. 27–30.
[10] C. H. Park and B. Kim, “A low-noise 900-MHz VCO in 0.6-m CMOS,”
As shown in Fig. 2, the OR gate gives the control signal for IEEE J. Solid-State Circuits, vol. 34, pp. 586–591, May 1999.
the switch according to the phase error signal in the band- [11] K. Lim, C. H. Park, and B. Kim, “Low noise clock synthesizer design
using optimal bandwidth,” in Proc. Int. Symp. Circuit and Systems, Mon-
width controller. When the phase error of the signal is high, the terey, CA, June 1998, pp. 163–166.
controller signal from the OR gate feeds current to the band- [12] J. Lee and B. Kim, “A 250 MHz low jitter adaptive bandwidth PLL,”
width capacitor and the voltage across the capacitor in- ISSCC Dig. Tech. Papers, pp. 346–347, Feb. 1999.
[13] J. McNeil, “Jitter in ring oscillators,” IEEE J. Solid-State Circuits, vol.
creases at a constant rate . As a result, the bandwidth 32, pp. 870–879, June 1997.
voltage increases proportional to the normalized phase
error . After the charging process, the controller signal
from the OR gate disconnects the path from the current source
and connects to the resistor. So the capacitor discharges Joonsuk Lee (S’99) received the B.S. and M.S. de-
through the resistor . The switching action occurs every grees in electrical engineering and computer sciences
from Korea Advanced Institute of Science and Tech-
clock cycle period. nology (KAIST), Taejon, Korea, in 1995 and 1997,
The voltage of the bandwidth capacitor at time respectively. Since 1997 he has been working toward
can be written as the Ph.D. degree at the same university.
From 1999 to 2000, he was with IBM Microelec-
tronics, Boston, MA, as an Analog and Mixed Signal
Designer involved in a high performance sigma–delta
ADC/DAC project with Motorola, Lowell, MA. His
research interests include PLL/DLL, timing recovery
(7) algorithms, high-speed SDRAM interface, and LAN and mixed-mode signal
processing technique for telecommunication IC’s.
where is the voltage of the previous capacitor voltage Mr. Lee is the Gold Medal winner of the Human-Tech Thesis Prize from Sam-
sung Electronics Co. Ltd. in 1997, the Gold Medal winner of the Chip Design
at time . The voltage equation can be simplified to (8). Contest from LG Semicon Co. Ltd. in 1998, and the Gold Medal winner of the
Integrated Design Center (IDEC) Award in 1998.
(8)

Here ,
. In the initial locking mode, Beomsup Kim (S’87–M’90–SM’95) received the
the AAPLL does the locking operation based on (8). Once B.S. and M.S. degrees in electronic engineering
from Seoul National University, Seoul, Korea, in
the AAPLL finished the phase and frequency locking, the 1983 and 1985, respectively, and the Ph.D. degree in
phase error is far less than . In this case, the forgetting electrical engineering and computer sciences from
factor and the proportional coefficient can be be replaced by the University of California, Berkeley, in 1990.
From 1986 to 1990, he worked as a Graduate Re-
and . searcher and Graduate Instructor at Department of
Electrical Engineering and Computer Sciences, Uni-
REFERENCES versity of California, Berkeley. From 1990 to 1991,
he was with Chips and Technologies, Inc., San Jose,
[1] J. Dunning et al., “An all-digital phase-locked loop with 50-cycle lock CA, where he was involved in designing high speed-signal processing IC’s for
time suitable for high-performance microprocessors,” IEEE J. Solid- disk drive read/write channels. From 1991 to 1993, he was with Philips Re-
State Circuits, vol. 30, pp. 412–422, Apr. 1995. search, Palo Alto, CA, where he was conducting research on digital signal pro-
[2] B. Kim, D. N. Helman, and P. R. Gray, “A 30-MHz hybrid analog/digital cessing for video, wireless communication, and disk drive applications. During
clock recovery circuit in 2-m CMOS,” IEEE J. Sold-State Circuits, vol. 1994, he was a Consultant, developing the partial-response maximum likeli-
25, pp. 1385–1394, Dec. 1990. hood detection scheme of the disk drive read/write channel. In 1994, he became
[3] M. Mizuno et al., “A 0.18 m CMOS hot-standby phase-locked loop an Assistant Professor with the Department of Electrical Engineering, Korea
using a noise immune adaptive-gain voltage-controlled oscillator,” Advanced Institute of Science and Technology (KAIST), Taejon, Korea, and
ISSCC Dig. Tech. Papers, pp. 268–269, Feb. 1995. is currently an Associate Professor. During 1999, he took a sabbatical leave
[4] G. Roh, Y. Lee, and B. Kim, “An optimum phase-acquisition technique and stayed at Stanford University, Stanford, CA, and also consulted for Marvell
for charge-pump phase-locked loops,” IEEE Trans. Circuit Syst. II, vol. Semiconductor Inc., San Jose, CA, on the Gigabit Ethernet and wireless LAN
44, pp. 729–740, Sept. 1997. DSP architecture. His research interests include mixed-mode signal processing
[5] B. Chun, Y. Lee, and B. Kim, “Design of variable loop gain of dual-loop IC design for telecommunications, disk drive, local area network, high-speed
DPLL,” IEEE Trans. Commun., vol. 45, pp. 1520–1522, Dec. 1997. analog IC design, and VLSI system design.
[6] P. F. Driessen, “DPLL bit synchronizer with rapid acquisition using Dr. Kim is a corecipient of the Best Paper Award (1990–1991) for the IEEE
adaptive Kalman filtering techniques,” IEEE Trans. Commun., vol. 452, JOURNAL OF SOLID-STATE CIRCUITS, and received the Philips Employee Reward
pp. 2673–2675, Sept. 1994. in 1992. Between June 1993 and June 1995, he served as an Associate Editor for
[7] B. Kim, “Dual-loop DPLL gear-shifting algorithm for fast synchroniza- the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL
tion,” IEEE Trans. Circuits Syst. II, vol. 44, pp. 577–586, July 1997. SIGNAL PROCESSING.
632 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

A Portable Digital DLL for High-Speed CMOS Interface Circuits
Bruno W. Garlepp, Kevin S. Donnelly, Associate Member, IEEE, Jun Kim, Pak S. Chau,
Jared L. Zerbe, Charles Huang, Chanh V. Tran, Clemenz L. Portmann, Member, IEEE,
Donald Stark, Yiu-Fai Chan, Member, IEEE, Thomas H. Lee, Member, IEEE, and Mark A. Horowitz

Abstract: A digital delay-locked loop (DLL) that achieves infinite phase range and 40-ps worst case phase resolution at 400 MHz was developed in a 3.3-V, 0.4-µm standard CMOS process. The DLL uses dual delay lines with an end-of-cycle detector, phase blenders, and duty-cycle correcting multiplexers. This more easily process-portable DLL achieves jitter performance comparable to a more complex analog DLL when placed into identical high-speed interface circuits fabricated on the same test-chip die. At 400 MHz, the digital DLL provides <250 ps peak-to-peak long-term jitter at 3.3 V and operates down to 1.7 V, where it dissipates 60 mW. The DLL occupies 0.96 mm².

Index Terms: Delay circuits, delay-locked loops (DLL’s), digital control, digital DLL, phase blending, phase control, phase synchronization.

Manuscript received September 15, 1998; revised December 23, 1998.
B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, and Y.-F. Chan are with Rambus, Inc., Mountain View, CA 94040 USA.
T. H. Lee and M. A. Horowitz are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305 USA.
Publisher Item Identifier S 0018-9200(99)03668-9.
¹ Documentation is available at http://www.rambus.com/html/direct_documentation.html.

I. INTRODUCTION

In recent years, there has been a great deal of interest in delay-locked loops (DLL’s) for clock alignment. Both analog and digital DLL’s have been developed [1]–[6], with analog loops generally providing better jitter performance at the expense of greater complexity. This paper describes a digital DLL that achieves jitter performance comparable to an analog DLL. Although the digital DLL uses more area and power than the analog DLL, its greater simplicity, easier portability, and lower minimum required supply voltage make it very attractive in many clock alignment applications. Additionally, the digital DLL not only operates at lower supply voltages than the analog DLL but also demonstrates that digital DLL’s have the potential for good power-consumption scaling as supply voltage is decreased.

The motivation for the development of this digital DLL was the need for a clock alignment circuit for use in the CMOS interface cells [6] of a high-speed memory system as in [7].¹ The memory system operates at 400 MHz, with data transferred on both edges of the clock, producing an effective 800-Mb/s/pin transfer rate. This corresponds to a 1.25-ns bit time. With such tight timing requirements, it becomes imperative to include clock alignment circuits in the interface cells to provide internal on-chip clocks that are aligned in phase with an external system clock. The clock alignment circuits must provide a phase resolution better than 50 ps and produce a worst case long-term jitter of less than 250 ps peak-to-peak (p–p). To facilitate the use of many different application-specific integrated-circuit controllers with the memory system, the clock alignment circuit should be easily portable across multiple processes without compromising performance.

The clock alignment function can be provided using either phase-locked loops (PLL’s) or DLL’s. Because frequency synthesis is not needed in this application, DLL’s are preferred for their unconditional stability, lower phase-error accumulation, and faster locking time. In previous designs of the interface cells for this memory system, we have used an analog DLL with a two-step coarse/fine architecture. A high-level drawing of this approach is shown in Fig. 1. This analog DLL includes a quadrature generator, which produces four reference signals spaced 90° apart in phase to evenly cover the full 360° of phase space. A phase interpolator circuit in the analog DLL receives these reference signals and selects a phase-adjacent pair that defines a phase quadrant for interpolation to produce an output signal phase-aligned to a reference signal, RefClk.

Analog DLL’s constructed with this approach provide several significant benefits. Because most of the elements in the signal path can be made from differential analog blocks with good power-supply rejection ratio (PSRR), the analog DLL architecture of Fig. 1 can provide very good jitter performance. Additionally, it can be carefully designed to occupy relatively little area and consume relatively little current. Furthermore, the analog DLL can provide very small phase steps when locked (below 50 ps). Finally, the architecture of Fig. 1 provides infinite phase range, and one set of quadrature reference signals can be fed to multiple phase interpolators, allowing phase alignment to multiple reference signals simultaneously. However, because of the relatively high analog complexity of this DLL and its individual elements, the analog DLL of Fig. 1 requires a detailed, process-specific implementation, making it relatively labor intensive to port across multiple processes.

Although we have traditionally used analog DLL’s to provide the clock alignment function in the CMOS interface cells of the memory system described above, we decided to consider using a digital DLL. Digital DLL’s are characterized by their use of a digital delay line and are typically made from
GARLEPP et al.: PORTABLE DIGITAL DLL 633

Fig. 1. Block diagram of a two-step, coarse/fine analog DLL architecture.

simple, digital circuit elements. This facilitates their design and portability across multiple processes. Additionally, because phase information in a digital DLL is stored as a digital state, digital DLL’s can provide very fast timing recovery after being placed into a low power mode. However, conventional digital DLL’s provide only moderate phase resolution and jitter performance [8], [9].

Another benefit of digital DLL’s is their ability to readily operate at lower voltages than analog DLL’s. Because analog DLL’s require the use of saturated current sources, they experience voltage headroom problems as supply voltages decrease. Digital DLL’s, on the other hand, need only enough voltage to ensure the proper operation of their digital gate elements. For the same reason, digital DLL’s better utilize the power-saving benefits of digital CMOS voltage scaling than analog DLL’s. The power of an analog DLL is typically distributed between $IV$ power (where $I$ is current and $V$ is voltage) from the constant current (differential) stages and $CV^2f$ power (where $C$ is capacitance and $f$ is frequency) from the CMOS (single-ended) stages (if any). The power of digital DLL’s, on the other hand, is determined primarily by $CV^2f$ power, which decreases quadratically with supply voltage.

This paper describes a digital DLL [10] used as the clock alignment circuit in the CMOS interface cells of a high-speed memory system. This work improves upon the performance of previous digital DLL’s by paralleling the two-step coarse/fine analog DLL architectures presented in [4], [5], [7], and [11], allowing the digital DLL to achieve jitter performance comparable to the analog DLL’s.

This paper is arranged as follows. Section II describes delay-generation techniques used in conventional digital DLL’s and describes the improved techniques implemented in the new DLL. This section also describes infinite phase generation with the new delay-line scheme. Section III describes several new circuit techniques used for enhancing the phase resolution and signal quality in the new digital DLL. Section IV describes the overall DLL architecture. Section V discusses our test chip and measured results, with special attention given to making a direct, side-by-side comparison of the new digital DLL with an analog DLL placed into identical CMOS interface cells on the same test-chip die. Section VI concludes this paper.

The terms phase and delay are used throughout this paper to describe the DLL’s operation. It is helpful to recall that at a given system frequency, the two quantities are related by the simple equation

$$\phi = 360 \cdot t \cdot f \qquad (1)$$

where $\phi$ is phase in degrees, $t$ is delay in seconds, and $f$ is frequency in hertz.

II. DIGITAL DELAY CIRCUIT TECHNIQUES

A. Conventional Digital Delay Lines

As mentioned above, the purpose of a DLL in a clock alignment application is to provide an output clock signal that is aligned in phase with a reference clock signal of the same frequency. To do this, the DLL must include a mechanism for providing a variable delay to an input signal. The DLL then adjusts this variable delay such that the input signal passes through the delay mechanism and emerges at the output of the DLL aligned in phase with the reference signal.

Digital DLL’s generally incorporate a tapped digital delay line as the variable-delay mechanism. The delay line receives an input clock signal (e.g., a buffered version of the reference signal) and passes it through a series of delay elements. The outputs of the delay elements are tapped and buffered to provide a series of phase-adjacent signals. The DLL then selects the delay-line tap that provides the signal that produces an output with a phase that most closely matches the desired phase.

A conventional delay line suitable for a CMOS digital DLL is shown in Fig. 2. The delay elements could be implemented with almost any circuit block, but because the phase resolution of the delay line is determined by the delay through the delay elements, delay elements that provide minimal delay are generally preferred. Thus, the delay line of Fig. 2 uses inverters, since they provide the shortest delay of any CMOS digital gate. Because of the inverting characteristic of all standard CMOS gates, the delay line is tapped only at every other inverter

Fig. 2. Conventional digital delay line with inverter delay elements.

Fig. 3. Complementary delay line with inverter delay elements for improved phase resolution.

output to ensure that each successive tap provides a signal that is adjacent in phase to the signals at its adjacent taps.

Although conventional delay lines are attractive for their simplicity, DLL’s designed around such conventional delay lines suffer from several significant limitations. First, the delay line provides fairly coarse phase resolution. For example, the delay line in Fig. 2 provides a minimum phase step corresponding to two inverter delays. Such coarse phase resolution is not fine enough for our clock alignment application. Second, conventional delay lines deliver only a finite phase range. Typically, in order to cover at least one full cycle of phase, the delay-line length and element delays are adjusted to provide at least 360° of phase under the fastest process, voltage, and temperature (PVT) conditions and minimum operating frequency. More often, however, the delay line is designed with as much as 720° (i.e., two cycles) of phase under these conditions. This requires the use of a long delay line, occupying a large silicon area and dissipating additional power as the input signal propagates through the many delay elements. Additionally, because inverters offer poor PSRR, voltage supply noise-induced jitter can accumulate as the signal propagates down the delay line. This causes the signals available from the later taps in the delay line to be more jitter prone than the signals from the earlier taps. Last, even with an extended delay line, the DLL can nonetheless run out of phase range and lose lock in a system with slowly drifting phase (e.g., spread-spectrum clocking). These limitations prohibited the use of a conventional delay line in our DLL design.

B. Delay-Line Improvements

To overcome some of these limitations, we developed a complementary delay line as shown in Fig. 3 for our DLL. In this architecture, two parallel delay lines with weak cross coupling are driven by complementary input signals ClkIn and ClkInb. Because of the use of complementary inputs, the two delay lines are tapped after every inverter to provide phase-adjacent signals separated by only one inverter delay, thereby improving the phase resolution by a factor of two. An example of how this delay-line scheme provides single inverter delay resolution is shown by the shaded paths in Fig. 3. The signal that emerges from Tap 2 has passed through three inverter delays, while the signal that emerges from Tap 3 has passed through four inverter delays. However, ClkInb is exactly 180° out of phase with ClkIn, providing the additional inversion required to ensure that the signals emerging from Taps 2 and 3 are indeed separated in phase by exactly one inverter delay.

This complementary delay-line architecture also allows the delay lines to be made shorter. The true taps from the delay line can provide the first 180° of phase, while the complement taps can provide the second 180° of phase. Thus, each of the two delay lines can be tuned for only 180° of phase under the fastest PVT conditions and minimum operating frequency. Shorter delay lines provide the additional benefits of reduced maximum jitter accumulation, smaller silicon area, and lower power consumption. The problem that this design creates is a need to determine when to switch from the true taps to the complement taps and vice versa to ensure full and even coverage of the entire 360° phase plane. This is particularly important because the number of delay elements (and output taps) needed to cover 180° changes with PVT conditions and operating frequency.

C. Infinite Phase Generation

To solve the problem of determining when to switch between the true and complement taps of the complementary delay line, we developed an end-of-cycle (EOC) detector, as shown in Fig. 4, for use with the complementary delay line. An EOC detector is essentially a bank of data flip-flops arranged as a time-to-digital converter for measuring the delay through the delay line. The EOC detector produces a thermometer code
indicating the first 180° of delay in the delay lines. The first state transition in the EOC code indicates the first true tap from the delay line that provides a signal with phase that lags the phase of the signal from Tap 1 by more than 180°. With this information, the DLL logic knows when to switch between the true and complement taps of the delay line to ensure full coverage of all 360° of phase space, with phase steps of at most one inverter delay. Use of the EOC code also prevents negative phase steps in the phase-transfer function as taps are successively selected from the delay line. This allows the complementary delay lines to provide infinite, monotonic phase range for the DLL. The clocking signal for the EOC detector, SampClk, is synchronized to the signal from Tap 1 by a replica timing network (not shown).

To illustrate the principle of infinite phase generation using the EOC code with this delay-line scheme, refer to Fig. 5, which shows a phasor diagram of the signals from the first five true and complement taps of a complementary delay line like the one shown in Fig. 3. The figure assumes that the PVT conditions and operating frequency are such that the propagation delay of each inverter stage is equal to 50° of phase. In the figure, the solid lines correspond to signals from the true taps, while dashed lines correspond to signals from the complement taps. Because Tap 5 delivers a signal that is delayed by 200° from the signal at Tap 1, the EOC detector’s thermometer code would indicate that Tap 5 is the first true tap to provide a signal with phase beyond 180° relative to the signal from Tap 1. With this information, the DLL knows to switch between the true and complement taps after four stages. In other words, to travel counterclockwise around the phase plane, the DLL would successively select Taps 1–4, then Taps 1b–4b, then Taps 1–4, etc., to provide infinite phase range. In this manner, all phase steps are equivalent to at most one inverter delay (i.e., 50°), except for the Tap 4 to Tap 1b and the Tap 4b to Tap 1 transitions, which are less (30°).

Fig. 4. EOC detector circuit (180°).

Fig. 5. Phasor diagram with phasors of signals from the taps of a complementary delay line with one inverter delay = 50°.

III. RESOLUTION-ENHANCING CIRCUIT TECHNIQUES

A. Phase Blending

Although the delay-line improvements discussed above reduced the required power and area of the delay line, improved its jitter accumulation performance, enabled infinite phase range, and improved the available phase resolution by a factor of two, this phase resolution was still not good enough to meet the requirements of our memory system. In the 0.4-µm process we used, the propagation delay of one inverter over all anticipated PVT conditions varied from 100 to 300 ps. This is much larger than the worst case phase step specification of 50 ps. Therefore, to ensure compliance with this specification, the DLL’s phase resolution needed to be improved by at least six times over what the delay line provided.

To solve this problem, we used inverter phase blending. A simple, single-stage phase-blender circuit is shown in Fig. 6(a). This circuit receives two phase-adjacent input signals, which are separated in phase by one inverter delay. The phase blender directly passes these two signals with a simple delay to produce two output signals. However, it also uses a pair of phase-blending inverters to interpolate between the two input signals to produce a third, blended output signal having a phase between that of the other two outputs. This effectively doubles the available phase resolution.

However, it is not sufficient to use equal-sized inverters for the phase blending. Fig. 6(b) illustrates a simple model [12] used for determining the ideal relative sizes of the two phase-blending inverters to ensure that the phase of the blended output lies directly between that of the other two outputs. The model approximates the two inverters with two simple switched current sources sharing a common resistance–capacitance (RC) load. For two rising edge input signals separated in time by $\Delta t$, the model yields the equation

$$V_{OUT}(t) = V_{DD} + I R \left[ w\,u(t)\left(e^{-t/RC} - 1\right) + (1 - w)\,u(t - \Delta t)\left(e^{-(t - \Delta t)/RC} - 1\right) \right] \qquad (2)$$

where $R$ is the total resistive load, $C$ is the output capacitance, $I$ is the total pulldown current of the two phase-blending inverters, $u(t)$ is the unit step function, and $w$ is the phase-blending inverter relative size ratio [refer to Fig. 6(a), where $w = W_A/(W_A + W_B)$ is the ratio of the device widths in inverter A to the total device widths in both inverters A and B]. Equation (2) is the sum of two decaying exponential terms, and Fig. 6(c) shows a plot of the resulting waveform according to this equation for the case where $w = 0.50$. Because the second exponential term is delayed in time by $\Delta t$ relative to the first, it only begins to affect the slope of the decay after this delay has elapsed. Therefore, without explicitly solving the equation for each case of $w$ and $\Delta t$, it is not obvious when the output will cross the switching threshold.

For input signals separated in phase by one inverter delay, the model specifies that in order to ensure that the phase of the blended output lies directly in between that of the other two outputs, the phase-blending inverters must be sized with $w = 0.60$, such that the leading phase is coupled to an inverter that is bigger than the one that receives the lagging phase. This ratio was also confirmed empirically with simulations. The effect of the relative sizing of the phase-blending inverters is illustrated in Fig. 6(d) and (e), which show the resulting output signal edges for $w = 0.50$ and $w = 0.60$, respectively. Clearly, the blended output is not centered between the other two output signals when the phase-blending inverter size ratio is $w = 0.50$. Although asymmetrical inverter sizing ($w = 0.60$) ensures good, evenly spaced edge placement of the three output signals, it requires that the input driving inverter A lead the input driving inverter B. Reversing the phase of these two input signals would result in a severely misplaced blended output, since the effective sizing ratio would then be $w = 0.40$.

Fig. 6. Phase blending for phase-resolution improvement. (a) Single-stage phase-blender circuit, (b) simple model of phase-blending inverters, (c) plot of signal voltages in the simple model for $w = W_A/(W_A + W_B) = 0.50$, (d) phase-blender output signal edges for $w = 0.50$, and (e) phase-blender output signal edges for $w = 0.60$.

Another design constraint of the phase-blender circuit is that all paths through the circuit must provide precisely the same loading and delay to ensure that the phase relationship between the two input signals is maintained by the two directly passed output signals.

The phase-blender idea can be extended to multiple cascaded stages for further phase-resolution improvement, with each additional stage improving the resolution by a factor of two. Fig. 7 shows a two-stage cascaded phase-blender circuit that provides a 4x improvement in phase resolution from input to output. Although it is theoretically possible to increase phase resolution indefinitely by adding more and more phase-blender stages, there is a practical limit. The number of inverters in each signal path increases by two with each additional phase-blending stage, making the circuit increasingly susceptible to voltage supply noise-induced jitter due to the additional delay in the signal path. Therefore, it is prudent to increase the number of blending stages to improve phase resolution only until the output phase step size from the phase blender is approximately equivalent to the anticipated voltage supply noise-induced jitter.

There are several design limitations that must be considered when designing a cascaded phase blender. First, the importance of proper (asymmetrical) sizing of the phase-blending inverters grows with the number of cascaded blending stages because edge misplacement has a compounding effect as the signals travel through the multiple stages. Additionally, close attention must be paid to ensuring equal loading for equal delay through all paths, requiring the use of dummy devices on otherwise unbalanced paths. Finally, like a single-stage phase blender, a cascaded phase blender also requires the phase of one particular input to lead that of the other to ensure even output phase spacing.
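The sizing argument for the phase-blending inverters can be checked numerically. The following is a minimal sketch of the switched-current-source model behind (2), not the authors' simulation. It assumes a normalized supply (V_DD = 1), a full-swing pulldown (I·R = V_DD), a switching threshold at V_DD/2, and an input separation equal to one RC time constant; the function names and all of these values are hypothetical choices made for illustration only.

```python
import math

def blender_output(t, w, dt, rc=1.0, vdd=1.0, ir=1.0):
    """Output of two switched current sources sharing one RC load.

    w  : share of the total pulldown current switched by the leading input
    dt : delay of the lagging input edge relative to the leading edge
    The values of rc, vdd and ir are normalized assumptions.
    """
    def step(x):
        return 1.0 if x >= 0.0 else 0.0

    lead = w * step(t) * (math.exp(-max(t, 0.0) / rc) - 1.0)
    lag = (1.0 - w) * step(t - dt) * (math.exp(-max(t - dt, 0.0) / rc) - 1.0)
    return vdd + ir * (lead + lag)

def crossing_time(w, dt, threshold=0.5, rc=1.0):
    """Bisection for the time at which the falling output reaches the threshold."""
    lo, hi = 0.0, dt + 20.0 * rc
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if blender_output(mid, w, dt, rc) > threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def edge_placement(w, dt, rc=1.0):
    """Position of the blended edge between the leading-only edge (0.0)
    and the lagging-only edge (1.0); 0.5 means perfectly centered."""
    t_lead = crossing_time(1.0, dt, rc=rc)
    t_lag = crossing_time(0.0, dt, rc=rc)
    t_mix = crossing_time(w, dt, rc=rc)
    return (t_mix - t_lead) / (t_lag - t_lead)

if __name__ == "__main__":
    for w in (0.40, 0.50, 0.60):
        print(f"w = {w:.2f}: blended edge at {edge_placement(w, dt=1.0):.3f}")
```

Under these assumptions, equal sizing places the blended edge noticeably closer to the lagging edge, while weighting the leading input more heavily moves it back toward the center. The exact ratio that centers the edge depends on the assumed ratio of input separation to RC, so the sketch illustrates the trend rather than reproducing the reported design value.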
Fig. 7. Two-stage, cascaded phase-blender circuit for 4x phase-resolution improvement.

Fig. 8. Three-stage, symmetrical phase-blender circuit.

To overcome these design limitations of the cascaded phase blender, we developed a symmetrical phase blender. A block diagram of a three-stage symmetrical phase blender is shown in Fig. 8. This circuit is essentially two parallel cascaded phase-blender circuits, sharing some common paths. When the first input leads the second, one set of outputs provides equal output phase spacing. When the second input leads the first, the other set of outputs provides equal output phase spacing. Therefore, the circuit provides phase blending with an 8x improvement in phase resolution and equally spaced output signals regardless of which input signal leads in phase.

Additionally, the symmetrical blender allows for seamless input switching for continuous phase blending over multiple input delays. For example, assume that the first input leads the second in phase. Beginning with the output that passes the first input directly, successive outputs can be selected to evenly span the phase range between the two inputs. Once the output that passes the second input directly is selected, the first input can be changed to another signal that lags the second input. This switching is possible without affecting the selected output, because that output has no dependence on or coupling from the first input. Then the outputs of the other set can be successively selected to evenly span the phase range between the second input and the new first input. Once the output that passes the first input directly is selected, the second input can be changed to yet another signal that lags the first. Again, this is possible without any change in the selected output, because it has no dependence on or coupling from the second input. This process can continue indefinitely. Also, because all paths through the symmetrical phase blender are inherently balanced, no dummy devices are needed.
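The combination of tap selection and blending can be illustrated with a short sketch. It first computes how many blender stages are needed for a given inverter delay and step specification, using the doubling of resolution per stage described in the text. It then lists the phases visited while stepping around the phase plane for the 50° per inverter example, with four true taps followed by four complement taps. The function names are hypothetical, and the sketch assumes that the blender divides every interval between adjacent taps evenly, including the shorter 30° interval at each true/complement boundary.

```python
import math

def stages_needed(inverter_delay_ps, step_spec_ps):
    """Smallest number of blender stages n with inverter_delay / 2**n <= step_spec."""
    return max(0, math.ceil(math.log2(inverter_delay_ps / step_spec_ps)))

def phase_walk(inverter_deg=50.0, taps_per_half=4, stages=3, cycles=1):
    """Unwrapped phases (degrees) visited while stepping through taps and blender outputs.

    True taps cover the first 180 degrees and complement taps the second 180 degrees.
    Each interval between adjacent taps is split into 2**stages equal steps (assumption).
    """
    steps = 2 ** stages
    coarse = [k * inverter_deg for k in range(taps_per_half)]
    coarse += [180.0 + k * inverter_deg for k in range(taps_per_half)]
    phases = []
    for cycle in range(cycles):
        for i, start in enumerate(coarse):
            # The interval ends at the next tap, or at 360 degrees after the last complement tap.
            end = coarse[i + 1] if i + 1 < len(coarse) else 360.0
            for s in range(steps):
                phases.append(360.0 * cycle + start + (end - start) * s / steps)
    return phases

if __name__ == "__main__":
    print("stages for 300 ps inverter delay, 50 ps spec:", stages_needed(300, 50))
    walk = phase_walk(cycles=2)
    increments = [b - a for a, b in zip(walk, walk[1:])]
    print("largest phase step (degrees):", max(increments))
    print("smallest phase step (degrees):", min(increments))
```

With the slowest quoted inverter delay of 300 ps and the 50-ps specification, the required improvement of six times rounds up to three stages, which matches the three-stage blender of Fig. 8. The walk is strictly increasing across the true/complement switch and across the cycle boundary, which is the monotonic, unbounded phase range the text describes.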

Fig. 9. (a) A 16:1 duty-cycle correcting multiplexer circuit. (b) Duty-cycle correction control circuit.

B. Signal Selection and Duty-Cycle Correction
Since the digital DLL was to be placed into a memory system that exchanges data on both edges of
the clock, good duty cycle (i.e., close to 50%) is required to ensure that the data exchanged on
either edge of the clock have equal bit times. Duty-cycle distortion is usually addressed in PLL’s by
simply running the PLL’s voltage-controlled oscillator (VCO) at twice the system frequency and
using a postdivider triggered on one edge of the VCO output to produce the output clock from the
PLL [13]–[15]. This ensures good, 50% duty cycle. In a DLL, however, no frequency multiplication
is possible. The duty cycle of the output signal must be directly corrected to 50%, for example, by
using a duty-cycle correcting amplifier in the signal path as in Fig. 1 and in [4].
Although duty-cycle correction can be addressed by placing a duty-cycle corrector at the output of
the DLL, this approach has several limitations. First, since duty cycle is corrected only at the
output of the DLL, internal DLL signals may have poor duty cycle. It is good practice, however, to
maintain 50% duty cycle throughout the signal path to maximize signal propagation as frequency is
increased. Second, performing all the duty-cycle correction in one stage at the output of the DLL
places a great deal of strain on the duty-cycle correcting circuit; it must have a large duty-cycle
correction range to compensate for all the duty-cycle distortion that can accumulate in the signal
path. Finally, adding a duty-cycle corrector directly into the signal path increases signal path
delay, and thus susceptibility to voltage supply noise-induced jitter.
To address the issue of duty cycle, we developed the idea of duty-cycle correcting multiplexers.
Since multiplexers would be needed in our DLL regardless, by adding duty-cycle correcting
functionality to the multiplexing circuitry, we implemented duty-cycle correction while requiring
minimal additional power, area, and delay.
A 16 : 1 duty-cycle correcting multiplexer is shown in Fig. 9(a) with a corresponding control
circuit in Fig. 9(b). To facilitate understanding of this circuit’s operation, consider an example.
Assume that one input signal is selected and has duty-cycle distortion such that the output signal
has a high duty cycle. Assume also that the output signal is sensed by a duty-cycle error detector,
which produces a differential output error signal proportional to the difference in duty cycle
between the output signal and the ideal 50%. Thus, in our example, one side of the differential
error signal will be greater than the other, causing more current to be steered through the right
branch of the control circuit in Fig. 9(b) than through the left side. This in turn increases the
strength of one set of correcting transistors compared to the complementary set in the duty-cycle
correcting multiplexer of Fig. 9(a). These transistors alter the duty cycle of the signal as it
passes from the selected input to the output, driving the output to the ideal 50% duty cycle. The
use of both PMOS and NMOS devices to perform the duty-cycle correction ensures a symmetrical
duty-cycle correction range. Furthermore, because duty-cycle correction has been distributed through
two stages, the requirements on each individual duty-cycle correcting stage are reduced. By
combining both necessary functions of signal selection and duty-cycle correction, this circuit
minimizes signal path delay, jitter accumulation, circuit area, and power compared to performing
both functions separately.

IV. DLL ARCHITECTURE
Fig. 10 is a block diagram of the entire digital DLL, with shading indicating the circuit blocks
that were described in
GARLEPP et al.: PORTABLE DIGITAL DLL 639

Fig. 10. Complete block diagram of the new digital DLL.

greater detail above. The DLL receives an input clock ExtClk and passes it through a clock amplifier
and splitter to provide the two complementary input signals (ClkIn and ClkInb) to a 16-stage,
32-tap complementary delay line with EOC detector. The delay line provides 32 signals at its output
taps, which then feed into two 32 : 1 duty-cycle correcting multiplexers. Each multiplexer selects
one of a pair of phase-adjacent signals from the delay line. The two selected signals then pass to
a three-stage, 2 : 16 symmetrical phase-blender circuit, which improves the phase resolution by a
factor of eight. A final 16 : 1 duty-cycle correcting multiplexer selects one of the phase-blender
output signals and passes it through a clock tree to provide the DLL’s output signal ClkOut. The
digital DLL also includes two independent duty-cycle correction loops as shown in the figure. By
using two separated duty-cycle correcting loops, duty-cycle correction is distributed throughout
the signal path. This ensures a good duty cycle throughout the signal path and reduces the
duty-cycle correcting requirements of any one stage.

Fig. 11. Test-chip micrograph showing on the left side (a) the analog DLL of [6] and on the right
side (b) the new digital DLL integrated into identical interface cells.

The DLL uses bang-bang-type, all-digital feedback to lock the phase of its output signal ClkOut to
that of a reference signal RefClk. A phase detector compares the phase of ClkOut to RefClk and
produces a binary error signal, which passes through an optional digital filter to a control logic
circuit. The digital filter is a simple majority detector, which has no effect when the loop is
acquiring lock but reduces dithering once lock is acquired. The control logic is composed of simple
combinational logic and counters that drive the multiplexers to select the two phase-adjacent
coarse phase signals from the delay line and the fine phase signal from the phase blender that
minimize the phase error between ClkOut and RefClk. Because the phase information is stored in this
DLL as a digital state, the DLL can quickly recover from low-power modes, requiring only enough
time for the signals to propagate through the signal path of the circuit from ExtClk to ClkOut to
provide a phase-locked output signal.
It is important to recognize the role of the EOC detector and code in this architecture. Because the
delay line and blender are uncontrolled, open-loop circuits, the architecture relies on the control
circuit’s use of the EOC code to ensure proper coarse phase selection, small maximum phase step
size, and phase transfer function monotonicity. The EOC code enables the control logic to determine
when to switch between the true and complement taps of the delay line to ensure that phase-adjacent
taps are always selected by the coarse multiplexers for the phase blender. The EOC code also
enables the control logic to determine which set of blender taps provides evenly spaced output
signals.
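
The bang-bang feedback described above can be illustrated with a small behavioral model. The following Python sketch is not the authors' control logic: it assumes a phase code that wraps modulo a hypothetical 256 steps per clock cycle, an ideal binary phase detector, and a majority filter that votes over a block of `vote_window` detector samples (the text does not specify the filter's internals). All function and parameter names are illustrative.

```python
def wrap_error(target, code, n_steps):
    """Signed phase error (target - code), wrapped into [-n_steps/2, n_steps/2)."""
    return (target - code + n_steps / 2) % n_steps - n_steps / 2


def run_loop(target, n_steps=256, cycles=900, vote_window=1, start_code=0):
    """Simulate a bang-bang phase loop; return the phase code after each cycle.

    vote_window=1 models the unfiltered loop. A larger odd value models an
    assumed block majority filter placed between detector and counter.
    """
    code = start_code
    votes = []
    history = []
    for _ in range(cycles):
        # Binary phase detector: 1 asks for a later phase, 0 for an earlier one.
        votes.append(1 if wrap_error(target, code, n_steps) > 0 else 0)
        if len(votes) == vote_window:
            ones = sum(votes)
            if 2 * ones > vote_window:
                code = (code + 1) % n_steps   # counter wraps: unlimited phase range
            elif 2 * ones < vote_window:
                code = (code - 1) % n_steps
            votes.clear()
        history.append(code)
    return history


def count_changes(seq):
    """Number of cycle-to-cycle code changes, a simple measure of dithering."""
    return sum(a != b for a, b in zip(seq, seq[1:]))


if __name__ == "__main__":
    unfiltered = run_loop(100.5, vote_window=1)
    filtered = run_loop(100.5, vote_window=3)
    print(sorted(set(unfiltered[-300:])), count_changes(unfiltered[-300:]))
    print(sorted(set(filtered[-300:])), count_changes(filtered[-300:]))
```

In this model both loops settle on the two codes adjacent to the target, but the filtered loop changes code only once per vote, which is one way a majority detector can reduce dithering after lock. Starting at code 250 with a target of 5 makes the code wrap through 0 instead of running out of range, mirroring the unlimited phase range of the architecture.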
Fig. 12. Measured transmit eye diagrams at 3.3 V and 400 MHz of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.

V. MEASURED PERFORMANCE

A. Test Chip
Both the digital DLL presented here and an implementation of the analog DLL of Donnelly et al. [6]
were integrated into identical high-speed CMOS interface cells on opposite sides of a single test
chip. A micrograph of this test chip is shown in Fig. 11. The test chip I/O was laid out
symmetrically so that either interface cell could be tested on the same hardware by simply removing
the test chip from the test socket, rotating it 180°, and reinserting it into the socket. This
allowed a true side-by-side comparison of the two DLL’s operating in a system. The test-chip
circuits were fabricated using a standard 0.4-μm, 3.3-V CMOS process with 0.65-V threshold voltages.

B. Test Results
Unless indicated otherwise, all test results described in this section were measured with the analog
and digital DLL’s operating in their respective high-speed interface cells at 3.3 V and 400 MHz
(800 Mb/s/pin) using the same test vectors. Additionally, the test chip included noise-generator
circuits, which produced digital switching noise during the testing of both interfaces.
Fig. 12(a) and (b) shows eye diagrams of the two interfaces with the analog and digital DLL’s,
respectively. The diagrams indicate the output timing performance of the interface cells in the
test system. Although the interface with the analog DLL provided slightly better timing
performance, 320 ps p–p versus 380 ps p–p for the interface with the digital DLL, the performances
of both interfaces (and therefore, both DLL’s) were comparable. This is surprisingly good
considering the extensive use of poor PSRR elements, such as inverters, in the signal path of the
digital DLL. (Note: I/O circuit duty-cycle distortion produced the unequal eyes in both diagrams.
This is unrelated to the DLL’s.)
Fig. 13(a) and (b) shows receive shmoo diagrams for the two interfaces with the analog and digital
DLL’s, respectively. The diagrams indicate the CMOS interfaces’ valid timing windows for receiving
data. On the diagrams, one axis is supply voltage (2.5 V to 4.0 V) while the other axis indicates
input data positioning along a bit period (1/800 Mb/s = 1.25 ns). The normal data position is in
the center of the bit period. A black dot in the diagram indicates incorrectly received data for
that combination of bit position and supply voltage. Ideally, the window should be entirely white,
but realistically, it is limited by jitter from the DLL and other sources. Therefore, this test
measures the amount of tolerable skew on the input timing over a range of supply voltages. Although
the interface with the analog DLL delivers better timing performance than the interface with the
digital DLL (1.02 versus 0.92 ns), both meet the component specification of 0.85 ns.
Fig. 14 is a circle plot of the measured phase of the DLL’s output signal ClkOut, illustrating the
DLL’s ability to provide infinite phase range. One axis indicates delay [or phase, as in (1)] of
the ClkOut signal relative to a fixed 400-MHz signal. The other axis indicates cycle count. These
data were measured by probing the on-chip DLL output signal (ClkOut) and forcing the DLL’s
phase-detector output low. This caused the DLL’s output phase to continually advance over time. The
term circle plot is used because this diagram is equivalent to sweeping a phasor that represents
the phase of ClkOut around the phase plane, thereby drawing a circle in the phase plane. Because
the phase of ClkOut is measured relative to a fixed 400-MHz signal, the plotted delay appears
modulo 2.5 ns, where 2.5 ns corresponds to one full cycle (360°)
Fig. 13. Measured shmoo diagrams showing the 400-MHz receive timing windows of the high-speed interface cells with (a) the analog DLL of [6]
and (b) the new digital DLL.

Fig. 14. Measured circle plot illustrating the infinite phase transfer characteristic of the digital DLL.
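
The delay axis of the circle plot can be related to phase with simple arithmetic: one period at 400 MHz is 2.5 ns, so a delay is equivalent to a phase once it is reduced modulo the period. The helper below is a minimal sketch of that conversion; the function names are illustrative and do not come from the paper.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / freq_mhz


def delay_to_phase_deg(delay_ns, freq_mhz=400.0):
    """Map a delay to a phase in degrees, wrapping once per clock period."""
    t = period_ns(freq_mhz)
    return (delay_ns % t) / t * 360.0


if __name__ == "__main__":
    print(period_ns(400.0))            # one period at 400 MHz, in ns
    print(delay_to_phase_deg(1.25))    # half a period
    print(delay_to_phase_deg(0.020))   # a 20-ps step expressed in degrees
```

Under this conversion, delays that differ by a whole period (for example 3.4 ns and 5.9 ns) map to the same phase, which is why the absolute delay offset in the plot carries no information.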

at 400 MHz. The absolute value of delay (i.e., from 3.4 to 5.9 ns) is irrelevant since it includes
some test-system setup time. The data were measured and plotted using a time-interval analyzer.
The circle plot illustrates the DLL’s phase transfer function, showing its reasonably good
linearity, monotonicity, and lack of discontinuities. The small bumps in the transfer function
indicate a change in coarse reference phase selected from the delay line. The slope of the transfer
function depends on PVT conditions and system frequency, since these conditions determine how many
delay-line taps are required to provide 180° of phase. In this case, nine taps were required,
resulting in an average phase step size of 20 ps or 2.9°.
Table I presents a summary of many of the measured and simulated results of the analog and digital
DLL’s operating in their respective CMOS interfaces. Although the analog DLL
Fig. 15. Measured DLL power consumption (a) as a function of frequency for VDD = 3.3 V and (b) as a function of supply voltage for f = 400 MHz.
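
The two trends plotted in Fig. 15 follow from splitting DLL power into a constant-current (IV) term and a switched-capacitance (CV²f) term. The sketch below encodes that two-term model so the frequency and supply dependence of each term can be checked numerically. The function name, the units, and every numeric value used with it are illustrative assumptions, not measured data from the test chip.

```python
def dll_power_mw(vdd, freq_mhz, i_static_ma, c_eff_pf):
    """Two-term power model: P = I*V + C*V^2*f.

    vdd in volts, freq_mhz in MHz, i_static_ma in mA, c_eff_pf in pF.
    Returns power in milliwatts.
    """
    static_mw = i_static_ma * vdd                       # mA * V = mW
    dynamic_mw = c_eff_pf * vdd ** 2 * freq_mhz * 1e-3  # pF * V^2 * MHz = uW
    return static_mw + dynamic_mw


if __name__ == "__main__":
    # Hypothetical "mostly CMOS" block and "mostly constant-current" block.
    for f in (200.0, 400.0):
        print(f, dll_power_mw(3.3, f, 2.0, 100.0), dll_power_mw(3.3, f, 20.0, 20.0))
```

In this model a block dominated by the CV²f term scales linearly with frequency and with the square of the supply voltage, while a block dominated by the IV term is independent of frequency and scales linearly with supply voltage.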

uses less power and area, and provides better timing performance (smaller long-term jitter) and
phase resolution (smaller maximum phase step), both DLL’s enable the interface cells to meet the
component requirements when operating in the test system. Additionally, the digital DLL has a
higher maximum operating frequency, works at lower supply voltages, and requires much less effort
to port to other processes (one versus four man-months).

TABLE I
ANALOG AND DIGITAL DLL PERFORMANCE SUMMARY AT 3.3 V AND 400 MHz

Fig. 15(a) and (b) shows plots of measured DLL power versus frequency at VDD = 3.3 V and measured
DLL power versus voltage supply at f = 400 MHz, respectively. Although both plots show that the
digital DLL dissipated more power than the analog DLL for all measured conditions, the plots
illustrate the different characteristics of the power consumed by the two DLL’s. As mentioned
earlier, the power of both DLL’s is distributed between IV power in the constant-current stages and
$CV^2f$ power in the CMOS stages. The curves in Fig. 15(a) show that the digital DLL’s power
dissipation has a greater dependence on frequency than does the analog DLL’s power. The curves in
Fig. 15(b) show that the digital DLL’s power dissipation has a predominantly square-law dependence
on supply voltage, whereas the analog DLL’s power dissipation has a mixed square-law and linear
dependence. These trends confirm that the power of the analog DLL has a relatively higher IV term,
whereas the power of the digital DLL has a relatively higher $CV^2f$ term. This indicates that
digital DLL’s have the potential for providing better power scaling than analog DLL’s as supply
voltages decrease in the future.
Finally, we have shown in Table I and in Fig. 15(b) that the digital DLL operates at lower supply
voltages than the analog DLL. Although the operation of the digital DLL was limited to 1.7 V, this
limitation was due to our use of several analog elements in the digital DLL (i.e., it was a mostly
digital DLL). The digital DLL used an analog clock amplifier, two analog duty-cycle error detectors
(see Fig. 10), and an analog quadrature phase detector (in a second loop, not shown). Using an
analog design for these circuit blocks in the digital DLL was faster to implement without
preventing evaluation of the key digital blocks in the DLL, but their use determined the minimum
supply voltage of the digital DLL.

VI. CONCLUSION
We have described the architecture of a portable digital DLL and demonstrated that it provides
jitter performance comparable to an analog DLL when fabricated in the same 3.3-V, 0.4-μm standard
CMOS process. Several circuits were developed to enable the DLL to provide very fine phase
resolution, infinite phase range, and good duty-cycle performance throughout the signal path.
Despite its relatively simple architecture, the digital DLL meets all system specifications, and it
operates down to lower supply voltages than its analog counterpart. Utilizing essentially only
simple digital CMOS gates, the DLL can be ported to new processes in minimal time. For these
reasons, this digital DLL provides an alternative to analog DLL’s for clock alignment applications.

ACKNOWLEDGMENT
The authors thank J. McBride and P. Gordon for layout support and S. Sidiropoulos for helpful
insights.

REFERENCES
[1] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, “Multifrequency zero-jitter delay-locked loop,” IEEE J. Solid-State Circuits, vol. 29, pp. 67–70, Jan. 1994.

[2] J.-M. Han, J. Lee, S. Yoon, S. Jeong, C. Park, I. Cho, S. Lee, and D. Seo, “Skew minimization techniques for 256 Mb synchronous DRAM and beyond,” in VLSI Circuits Dig. Tech. Papers, June 1996, pp. 192–193.
[3] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H. Tsuboi, S. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura, K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T. Anezaki, M. Hasegawa, and M. Taguchi, “A 256 Mb SDRAM using register-controlled digital DLL,” in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 72–73.
[4] T. Lee, K. Donnelly, J. Ho, J. Zerbe, M. Johnson, and T. Ishikawa, “A 2.5 V CMOS delay-locked loop for 18 Mbit, 500 megabyte/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[5] S. Sidiropoulos and M. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[6] K. Donnelly, Y. Chan, J. Ho, C. Tran, S. Patel, B. Lau, J. Kim, P. Chau, C. Huang, J. Wei, L. Yu, R. Tarver, R. Kulkarni, D. Stark, and M. Johnson, “A 660MB/s interface megacell portable circuit in 0.3 μm–0.7 μm CMOS ASIC,” IEEE J. Solid-State Circuits, vol. 31, pp. 1995–2003, Dec. 1996.
[7] N. Kushiyama, S. Ohshima, D. Stark, H. Noji, K. Sakurai, S. Takase, T. Furuyama, R. Barth, A. Chan, J. Dillon, J. Gasbarro, M. Griffin, M. Horowitz, T. Lee, and V. Lee, “A 500-Megabyte/s data-rate 4.5M DRAM,” IEEE J. Solid-State Circuits, vol. 28, pp. 490–508, Apr. 1993.
[8] M. Hasegawa, M. Nakamura, S. Narui, S. Ohkuma, Y. Kawase, H. Endoh, S. Miyatake, T. Akiba, K. Kawakita, M. Yoshida, S. Yamada, T. Sekiguchi, I. Asano, Y. Tadaki, R. Nagai, S. Miyaoka, K. Kajigaya, M. Horiguchi, and Y. Nakagome, “A 256 Mb SDRAM with subthreshold leakage current suppression,” in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 80–81.
[9] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara, T. Matano, Y. Hoshino, K. Miyano, S. Isa, E. Kakehashi, J. Drynan, M. Komuro, T. Fukase, H. Iwasaki, J. Sekine, M. Igeta, N. Nakanishi, T. Itani, K. Yoshida, H. Yoshino, S. Hashimoto, T. Yoshii, M. Ichinose, T. Imura, M. Uziie, K. Koyama, Y. Fukuzo, and T. Okuda, “A 2.5 ns clock access 250 MHz 256 Mb SDRAM with synchronous mirror delay,” in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 374–375.
[10] B. Garlepp, K. Donnelly, J. Kim, P. Chau, J. Zerbe, C. Huang, C. Tran, C. Portmann, D. Stark, Y. Chan, T. Lee, and M. Horowitz, “A portable digital DLL architecture for CMOS interface circuits,” in VLSI Circuits Dig. Tech. Papers, June 1998, pp. 214–215.
[11] M. Griffin, J. Zerbe, A. Chan, Y. Jun, Y. Tanaka, W. Richardson, G. Tsang, M. Ching, C. Portmann, Y. Li, B. Stonecypher, L. Lai, K. Lee, V. Lee, D. Stark, H. Modarres, P. Batra, J. Louis-Chandran, J. Privitera, T. Thrush, B. Nickell, J. Yang, V. Hennon, and R. Sauve, “A process independent 800 MB/s DRAM bytewide interface featuring command interleaving and concurrent memory operation,” in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 156–157.
[12] S. Sidiropoulos, “High-performance interchip signalling,” Ph.D. dissertation, Computer Systems Laboratory, Stanford University, Stanford, CA, Apr. 1998. Available as Tech. Rep. CSL-TR-98-760 from http://elib.stanford.edu/.
[13] I. Young, M. Mar, and B. Bhushan, “A 0.35 μm CMOS 3-880 MHz PLL N/2 multiplier and distribution network with low jitter for microprocessors,” in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 330–331.
[14] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, “A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation,” in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 132–133.
[15] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, “A 600 MHz CMOS PLL microprocessor clock generator with a 1.2 GHz VCO,” in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 396–397.

Bruno W. Garlepp was born in Bahia, Brazil, on October 29, 1970. He received the B.S.E.E. degree
from the University of California, Los Angeles, in 1993 and the M.S.E.E. degree from Stanford
University, Stanford, CA, in 1995.
In 1993, he joined the Hughes Aircraft Advanced Circuits Technology Center, Torrance, CA. There, he
designed high-precision analog integrated circuits for A/D applications, as well as CMOS, bipolar,
and SiGe RF circuits for wide-band communications applications. In 1996, he joined Rambus, Inc.,
Mountain View, CA, where he designs and develops high-speed CMOS clocking and I/O circuits for
synchronous chip-to-chip communication.

Kevin S. Donnelly (A’93) was born in Los Angeles, CA, in 1961. He received the B.S. degree in
electrical engineering and computer science from the University of California, Berkeley, in 1985
and the M.S. degree in electrical engineering from San Jose State University, San Jose, CA, in 1992.
He was with Memorex, Sipex, and National Semiconductor, specializing in bipolar and BiCMOS analog
circuits for disk-drive read/write and servo channels. In 1992, he joined Rambus, Inc., Mountain
View, CA, where he has designed high-speed CMOS PLL circuits for clock recovery and data
synchronization, and high-speed I/O circuits. He currently manages a group developing I/O circuits
and PLL’s. His interests include PLL’s and DLL’s, I/O circuits, and data converters. He is a Member
of the ISSCC Digital Subcommittee. He has received several circuit design patents.
Mr. Donnelly is a coauthor of the paper that won the Best Paper Award at the 1994 ISSCC.

Jun Kim was born in Tokyo, Japan, on November 14, 1966. He received the B.S.E.E. degree from the
University of California, Berkeley, in 1989.
From 1989 to 1991, he was with Vitelic, Inc., where he worked on SRAM and DRAM development.
Between 1991 and 1994, he was with Sun Microsystems, where he was involved in microprocessor and
digital circuit design. Since 1994, he has been with Rambus, Inc., Mountain View, CA, as a Designer
of high-speed CMOS I/O and DLL circuits.

Pak S. Chau was born in Hong Kong in 1966. He received the B.S. degree in computer system
engineering from the University of Massachusetts, Amherst, in 1989 and the M.S. degree in
electrical engineering from the University of California, Davis, in 1991.
He was with National Semiconductor and Chrontel, Inc., where he worked as an Analog Circuit
Designer. In 1994, he joined Rambus, Inc., Mountain View, CA, where he has engaged in designing
high-speed I/O and DLL circuits.

Jared L. Zerbe was born in New York, NY, in 1965. He received the B.S. degree in electrical
engineering from Stanford University, Stanford, CA, in 1987.
He joined VLSI Technology, Inc., in 1987, where he worked on semicustom ASIC design. In 1989, he
joined MIPS Computer Systems, where he designed high-performance floating-point blocks. Since 1992,
he has been with Rambus Inc., Mountain View, CA, where he has specialized in the design of
high-speed I/O and PLL/DLL clock recovery and data synchronization circuits.

Charles Huang received the B.S. degree in electrical engineering from the University of Fuzhou,
China, in 1982 and the M.S. degree in electrical engineering from the University of Arkansas,
Fayetteville, in 1990.
He was with ULSI and SGI, working in the area of PLL and cache circuit design. He joined Rambus,
Inc., Mountain View, CA, in 1994, where he has been engaged in high-speed CMOS DLL and I/O circuit
design.

Chanh V. Tran was born in Vietnam in 1964. He received the B.S. degree in electrical engineering
and computer science from the University of California, Berkeley, in 1989.
From 1989 to 1992, he was with National Semiconductor Corp., Santa Clara, CA, where he worked on
CMOS mixed-signal IC design in the Data Acquisition Group. In 1992, he joined Rambus Inc., Mountain
View, CA, where he has been involved in DLL and high-speed I/O design.

Clemenz L. Portmann (S’92–M’95) received the B.S.E.E. degree from the University of Washington,
Seattle, in 1986, the M.S.E.E. degree from the University of Hawaii at Manoa, Honolulu, in 1988,
and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1995.
From 1988 to 1989, he was a Visiting Researcher at Nagoya University, Nagoya, Japan, and the
Toyohashi University of Technology, Toyohashi, Japan, under the Monbusho (Ministry of Education)
scholarship program. From 1989 to 1990, he was a Design Engineer for VLSI Technology, Inc., San
Jose, CA, where he designed standard cell libraries and SRAM’s for ASIC designs. In 1995, he joined
Rambus, Inc., Mountain View, CA, where he is engaged in the design of high-speed I/O circuits and
DLL’s for DRAM interfaces.

Donald Stark received the B.S. degree from the Massachusetts Institute of Technology, Cambridge,
in 1985 and the M.S. and Ph.D. degrees from Stanford University, Stanford, CA, in 1987 and 1991,
respectively, all in electrical engineering.
His research interests at Stanford included circuit design and CAD tools for analysis of voltage
and current distributions in VLSI circuits. From 1987 to 1991, he was also a Member of the Western
Research Laboratory, Digital Equipment Corp., Palo Alto, CA, working on CAD development and ECL
circuit design. From 1991 to 1993, he was with the Semiconductor Device Engineering Laboratory,
Toshiba Corp., Kawasaki, Japan, working on DRAM design. In 1993, he joined Rambus, Inc., Mountain
View, CA, where he currently works on DRAM, high-speed I/O design, and CAD.

Yiu-Fai Chan (S’76–M’78) received the B.S. and M.S. degrees in electrical engineering and computer
science (with highest honors) from the University of California (UC), Berkeley, in 1972 and 1973,
respectively.
He joined Rambus, Inc., Mountain View, CA, in 1992, where he is Director of Engineering,
responsible for the development, application engineering, and customer support of high-speed
mixed-signal circuits, device packaging, signal integrity, and system engineering. Prior to that,
he was with Tera Microsystems in charge of developing chips for workstations based on the Sparc
architecture. He was with Altera Corp. from 1983 to 1990, where he led a team of engineers to
develop the industry’s first CMOS programmable logic devices. From 1976 to 1983, he held various
technical and management positions at Intersil, Inc. (later a division of General Electric), where
he was engaged in the development of various CMOS memories, microprocessors, and peripheral
devices. It was there that he developed the first EPROM devices in CMOS technology. From 1974 to
1976, he designed calculator and TV game integrated circuits at National Semiconductor. He has
received several patents in circuits and systems technologies.
Mr. Chan is a member of Tau Beta Pi, Phi Beta Kappa, and Eta Kappa Nu. He received the University
Science Fellowship from UC Berkeley and conducted research on solid-state devices and microwave
acoustics. He has published in various IEEE technical publications and presented papers at IEEE
technical conferences.

Thomas H. Lee (S’87–M’87), for a photograph and biography, see this issue, p. 585.

Mark A. Horowitz, for a photograph and biography, see p. 528 of the April 1999 issue of this
JOURNAL.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999 565

A Register-Controlled Symmetrical DLL for Double-Data-Rate DRAM


Feng Lin, Jason Miller, Aaron Schoenfeld, Manny Ma, and R. Jacob Baker

Abstract—This paper describes a register-controlled symmetrical delay-locked loop (RSDLL) for use
in a high-frequency double-data-rate DRAM. The RSDLL inserts an optimum delay between the clock
input buffer and the clock output buffer, making the DRAM output data change simultaneously with
the rising or falling edges of the input clock. This RSDLL is shown to be insensitive to variations
in temperature, power-supply voltage, and process after being fabricated in 0.21-μm CMOS
technology. The measured rms jitter is below 50 ps when the operating frequency is in the range of
125–250 MHz.
Index Terms—Delay-locked loops, double-data rate, DRAM.

Manuscript received September 3, 1998; revised November 2, 1998. This work was supported by Micron
Technology, Inc.
F. Lin and R. J. Baker are with the Microelectronics Research Center, University of Idaho, Boise,
ID 83712 USA (e-mail: danlin@uidaho.edu).
J. Miller, A. Schoenfeld, and M. Ma are with Micron Technology, Inc., Boise, ID 83707-0006 USA.
Publisher Item Identifier S 0018-9200(99)02438-5.

Fig. 1. Data timing chart for DDR DRAM.

I. INTRODUCTION
In synchronous DRAM, the output data strobe (DQS) should be locked to the data outputs (DQ outputs)
for high-speed performance. The clock-access and output-hold times of conventional DRAM designs are
determined by the delay time of the internal circuits such as the clock input and output buffers.
Variations in temperature and process shifts will change the access time and make the valid data
window small. To optimize and stabilize the clock-access and output-hold times, an internal
register-controlled delay-locked loop (RDLL) [1], [2] has been used to adjust the time difference
between the output and input clock signals in SDRAM. Since the RDLL is an all-digital design, it
provides robust operation over all process corners. Another solution to the timing constraints
found in SDRAM was given in [3] with the synchronous mirror delay (SMD). Compared to RDLL, SMD does
not provide as tight of locking but has the advantage that the time to acquire lock between the
input and output clocks is only two clock cycles. As the clock speeds used in DRAM continue to
increase, the skew becomes the dominating concern, outweighing the disadvantage of the added time
to acquire lock needed in an RDLL.
This paper describes a modified register-controlled symmetrical delay-locked loop (RSDLL) used to
meet the requirements of double-data-rate (DDR) SDRAM (read/write accesses occur on both rising and
falling edges of the clock). Here, “symmetrical” means that the delay line used in the DLL has the
same delay whether a high-to-low or a low-to-high logic signal is propagating along the line. The
data output timing diagram of a DDR SDRAM is shown in Fig. 1. The RSDLL is used to increase the
valid output data window and diminish the undefined interval by synchronizing both rising and
falling edges of the DQS signal with the output data DQ.
The target specifications for the DLL described in this paper are:
1) robust operation eliminating the need for postproduction tuning (something required in an
analog implementation);
2) operating frequency ranging from 143 (286 Mb/s/pin) to 250 MHz (500 Mb/s/pin);
3) tight synchronization (skew less than 5% of the cycle time) between the output clock and data
on both rising and falling edges of the output clock;
4) low skew between the input and output clocks (with low, 5% duty cycle distortion);
5) power-supply-voltage operating range from 2.5 to 3.5 V;
6) portability for ease of use in other processes.

II. RSDLL ARCHITECTURE
Fig. 2 shows the block diagram of the RSDLL. The replica input buffer dummy delay in the feedback
path is used to match the delay of the input clock buffer. The phase detector (PD) is used to
compare the relative timing of the edges of the input clock signal and the feedback clock signal,
which comes through the delay line, controlled by the shift register. The outputs of the PD,
shift-right and shift-left, are used to control the shift register. In the simplest case, one bit
of the shift register is high. This single bit is used to select a point of entry for CLKIn in the
symmetrical delay line (more on this later). When the rising edge of the input clock is within the
rising edges of the output clock and one unit delay of the output clock, both outputs of the PD,
shift-right and shift-left, go to logic LOW and the loop is locked. The basic operation of the PD
is shown in Fig. 3. The resolution of this RSDLL is determined by the size of a unit delay used in
the delay
0018–9200/99$10.00 © 1999 IEEE

Fig. 4. Symmetrical delay element used in RSDLL.


Fig. 2. Block diagram of RSDLL.

Fig. 5. Delay line and shift register for RSDLL.
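
The register-controlled delay line of Fig. 5 lends itself to a short behavioral sketch. The code below models two statements from the text: the point of entry is set by the first HIGH bit found when the shift register is scanned from right to left (bits further left are don't-cares), and the lowest lockable frequency is the reciprocal of the number of stages times the delay per stage. The function names and the example numbers (50 stages, 160 ps per stage) are hypothetical and chosen only for illustration.

```python
def entry_tap(register_bits):
    """Index (0 = leftmost) of the first HIGH bit found scanning right to left.

    Bits to the left of that position are don't-cares. Returns None if no
    bit is set, which the text's scheme is meant to rule out in operation.
    """
    for i in range(len(register_bits) - 1, -1, -1):
        if register_bits[i]:
            return i
    return None


def min_lock_freq_mhz(n_stages, unit_delay_ps):
    """Lowest lockable frequency: 1 / (n_stages * unit_delay), in MHz."""
    return 1e6 / (n_stages * unit_delay_ps)


if __name__ == "__main__":
    print(entry_tap([0, 1, 0, 1, 0, 0]))   # rightmost HIGH bit wins
    print(entry_tap([1, 1, 1, 1, 0, 0]))   # bits to its left are ignored
    print(min_lock_freq_mhz(50, 160.0))    # hypothetical 50 stages x 160 ps
```

Because only the rightmost HIGH bit matters in this model, an arbitrary power-up pattern in the register still selects exactly one entry point, which is the property the selection scheme is designed to guarantee.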

Fig. 3. Phase detector used in RSDLL.

line. The locking range is determined by the number of delay stages used in the symmetrical delay
line. Since the DLL circuit inserts an optimum delay time between CLKIn and CLKOut, making the
output clock change simultaneously with the next rising edge of the input clock, the minimum
operating frequency to which the RSDLL can lock is the reciprocal of the product of the number of
stages in the symmetrical delay line with the delay per stage. Adding more delay stages will
increase the locking range of the RSDLL at the cost of increased layout area.

III. CIRCUIT IMPLEMENTATION

A. Basic Delay Element
Instead of using an AND gate as the unit-delay stage (NAND + inverter), as was done in [1], we used
a NAND-gate-based delay element. The implementation of a three-stage delay line is shown in Fig. 4.
The problem when using a NAND + inverter as the basic delay element is that the propagation delay
through the unit delay resulting from a HIGH-to-LOW transition is not equal to the delay of a
LOW-to-HIGH transition ($t_{PHL} \neq t_{PLH}$). Further, this delay varies from one run to
another. If the skew between $t_{PHL}$ and $t_{PLH}$ is 50 ps, for example, the total skew of the
falling edges through ten stages will be 0.5 ns. Because of this skew, the NAND + inverter delay
element cannot be used in a DDR DRAM. In our modified symmetrical delay element, another NAND gate
is used instead of an inverter (two NAND gates per delay stage). This scheme guarantees that
$t_{PHL} = t_{PLH}$ independent of process variations, since while one NAND switches from a HIGH to
LOW, the other switches from LOW to HIGH. An added benefit of the two-NAND delay element is that
two point-of-entry control signals are now available. Both are used by the shift register to solve
the possible problem caused by the power-up ambiguity in the shift register.

B. Control Mechanism of the Shift Register
As shown in Figs. 4 and 5, the input clock is a common input to every delay stage. The shift
register is used to select a different tap of the delay line (the point of entry for the input
clock signal into the symmetrical delay line). The complementary outputs of each register cell are
used to select the different tap: $Q$ is connected directly to an input of a delay element, and
$\overline{Q}$ is connected to an input of the previous stage. From right to left, the first
LOW-to-HIGH transition in the shift register sets the point of entry into the delay line. The input
clock will pass through the tap with a high logic state in the corresponding position of the shift
register. Since the $\overline{Q}$ of this tap is equal to a LOW, it will disable the previous
stages; therefore, it does not matter what the previous states of the shift register are (shown as
“don’t cares,” X, in Fig. 5). This control mechanism guarantees that only one path is selected.
This scheme also eliminates power-up concerns since the selected tap is simply the first, from the
right, LOW–HIGH transition in the shift register.

C. Phase Detector
To stabilize the movement in the shift register, after making a decision, the phase detector will
wait at least two clock cycles before making another decision (Fig. 3). A divide by two was
included in the phase detector so that every other decision, resulting from comparing the rising
edges of the external clock and the feedback clock, was used. This will provide enough time for the
shift register to operate and the output waveform to stabilize before another decision by the PD
is implemented. The unwanted side effect of this delay is an increase in the lock time. The shift
register is clocked by combining the

shift-left and shift-right signals. The power consumption will decrease when there are no shift-left or -right signals and the loop is locked. Another concern with the phase-detector design is the design of the flip-flops (FF’s). To minimize the static phase error, very fast FF’s should be used, ideally with zero setup time. Also, the metastability of the flip-flops becomes a concern as the loop becomes locked. This together with possible noise contributions and the need to wait, as discussed above, before implementing a shift-right or -left may increase the desirability of adding additional filtering in the phase detector. Some possibilities include increasing the divider ratio used in the phase detector or using a shift register in the phase detector to determine when a number (say, four) of shift-rights or -lefts have occurred. For the present design, we were forced to use a divide by two in the phase detector because of lock time requirements.

Fig. 6. Measured rms jitter versus input frequency.

Fig. 7. Measured delay per stage versus VCC and temperature.

IV. EXPERIMENTAL RESULTS
The RSDLL was fabricated in a 0.21-μm, four-poly, double-metal CMOS technology (a DRAM process). We used a 48-stage delay line with an operation frequency of 125–250 MHz. The maximum operating frequency was limited by delays external to the DLL such as the input buffer and interconnect. There was no noticeable static phase error on either rising or falling edges. Fig. 6 shows the resulting rms jitter versus input frequency. One sigma of jitter over the 125–250-MHz frequency range was below 50 ps. The peak-to-peak jitter over this frequency range was below 100 ps. The measured delay per stage versus VCC and temperature is shown in Fig. 7. Note that the 150-ps typical delay of a unit-delay element was very close to the rise and fall times on-chip of the clock signals and represents a practical minimum resolution of a DLL for use in a DDR DRAM fabricated in a 0.21-μm process. The power

consumption (current draw of the DLL at a fixed VCC) of the prototype RSDLL is illustrated in Fig. 8. We found that the power consumption was mainly determined by the dynamic power dissipation of the symmetrical delay line. Our NAND delays in this test chip were implemented with 10/0.21-μm NMOS and 20/0.21-μm PMOS. By reducing the widths of both the NMOS and PMOS transistors, the power dissipation can be greatly reduced without a speed or resolution penalty (with the added benefit of reduced layout size).

Fig. 8. Measured ICC (DLL current consumption) versus input frequency.

V. CONCLUSIONS

The concept of a register-controlled symmetrical delay-locked loop has been presented. The modified symmetrical delay element makes the RSDLL useful in DDR DRAM’s. Experimental results verify that this RSDLL is stable against temperature, process, and power-supply variations.

Further development of the RSDLL will include investigations into reducing power consumption, implementing phase-locked loops where the symmetrical delay is used as part of a purely digital register-controlled oscillator, and developing two-loop architectures where coarse loops (resolutions on the order of 100 ps) are used with fine loops (resolutions on the order of 10 ps [2]) for wide tuning range and small static phase errors.

REFERENCES
[1] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H. Tsuboi, S.-Y. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura, K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T. Anezaki, M. Hasegawa, and M. Taguchi, “A 256-Mb SDRAM using a register-controlled digital DLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 1728–1732, Nov. 1997.
[2] S. Eto, M. Matsumiya, M. Takita, Y. Ishii, T. Nakamura, K. Kawabata, H. Kano, A. Kitamoto, T. Ikeda, T. Koga, M. Higashiro, Y. Serizawa, K. Itabashi, O. Tsuboi, Y. Yokoyama, and M. Taguchi, “A 1Gb SDRAM with ground level precharged bitline and non-boosted 2.1V word line,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 82–83.
[3] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara, T. Matano, Y. Hoshino, K. Miyano, S. Isa, S. Nakazawa, E. Kakehashi, J. M. Drynan, M. Komuro, T. Fukase, H. Iwasaki, M. Takenaka, J. Sekine, M. Igeta, N. Nakanishi, T. Itani, K. Yoshida, H. Yoshino, S. Hashimoto, T. Yoshii, M. Ichinose, T. Imura, M. Uziie, S. Kikuchi, K. Koyama, Y. Fukuzo, and T. Okuda, “A 2.5-ns clock access 250-MHz, 256-Mb SDRAM with synchronous mirror delay,” IEEE J. Solid-State Circuits, vol. 31, pp. 1656–1665, Nov. 1996.
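
The tap-selection rule of Section III-B and the arithmetic quoted in Sections II, III-A, and IV of the RSDLL paper can be checked with a short behavioral sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the register is modeled as a plain list of bits written left to right, and the lock-frequency estimate ignores the delays external to the delay line (input buffer, interconnect) that the text mentions, so it should not be read as the measured operating range.

```python
def select_tap(register_bits):
    """Return the list index of the selected tap, or None if no cell is HIGH.

    Assumed model: register_bits is written left to right; the scan runs from
    the right, and the first HIGH cell found marks the LOW-to-HIGH transition
    that sets the point of entry. Cells to its left are "don't cares".
    """
    for index in range(len(register_bits) - 1, -1, -1):
        if register_bits[index]:
            return index
    return None


def accumulated_skew(stages, skew_per_stage):
    """Total edge skew (seconds) when every stage adds the same skew."""
    return stages * skew_per_stage


def min_lock_frequency(stages, delay_per_stage):
    """Reciprocal of (number of stages x delay per stage), in hertz."""
    return 1.0 / (stages * delay_per_stage)


if __name__ == "__main__":
    # The two leftmost cells are don't cares: only the first HIGH from the right matters.
    print("selected tap index:", select_tap([1, 0, 1, 0, 0]))
    # Ten stages with 50 ps of skew each (the example in Section III-A).
    print("skew over ten stages [s]:", accumulated_skew(10, 50e-12))
    # 48 stages at the 150-ps typical delay per stage (Section IV values).
    print("estimated minimum lock frequency [Hz]:", min_lock_frequency(48, 150e-12))
```

Because the per-stage delay varies with VCC and temperature (Fig. 7), the last figure moves with operating conditions; the sketch only shows how the stage count and the stage delay enter the estimate.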
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997 1683

A Semidigital Dual Delay-Locked Loop


Stefanos Sidiropoulos, Student Member, IEEE, and Mark A. Horowitz, Senior Member, IEEE

Abstract—This paper describes a dual delay-locked loop architecture which achieves low jitter, unlimited (modulo 2π) phase shift, and large operating range. The architecture employs a core loop to generate coarsely spaced clocks, which are then used by a peripheral loop to generate the main system clock through phase interpolation. The design of an experimental prototype in a 0.8-μm CMOS technology is described. The prototype achieves an operating range of 80 kHz–400 MHz. At 250 MHz, its peak-to-peak jitter with quiescent supply is 68 ps, and its jitter supply sensitivity is 0.4 ps/mV.
Index Terms—Clock synchronization, delay-locked loops, phase interpolation, phase-locked loops.
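
As a rough illustration of the phase-interpolation idea summarized in the abstract, the sketch below models the output phase as a position counter. The six 30° core clocks, the optional inversion that extends coverage to 0–360°, the 1/16 interpolation step, and the binary up/down ("bang–bang") control are taken from the paper; the single 192-position counter, the `decode` mapping, and every function name are assumptions made for this example and are not the authors' FSM.

```python
CORE_CLOCKS = 6                                # six core clocks, 30 degrees apart (from the text)
WEIGHT_STEPS = 16                              # 1/16 of a 30-degree interval per step (from the text)
TOTAL_STEPS = 2 * CORE_CLOCKS * WEIGHT_STEPS   # assumed model: 192 positions per 360 degrees
STEP_DEG = 360.0 / TOTAL_STEPS                 # 1.875 degrees per position


def decode(position):
    """Split a position into (core clock index, inverted?, mixing weight)."""
    interval, weight = divmod(position % TOTAL_STEPS, WEIGHT_STEPS)
    return interval % CORE_CLOCKS, interval >= CORE_CLOCKS, weight


def phase_deg(position):
    """Output phase in degrees; it wraps, so there is no hard phase limit."""
    return (position % TOTAL_STEPS) * STEP_DEG


def wrapped_error(target_deg, position):
    """Signed phase error wrapped into [-180, 180) degrees."""
    return (target_deg - phase_deg(position) + 180.0) % 360.0 - 180.0


def bang_bang_lock(target_deg, start=0, cycles=400):
    """Move one step per cycle using only the sign of the phase error."""
    position = start
    for _ in range(cycles):
        step = 1 if wrapped_error(target_deg, position) > 0 else -1
        position = (position + step) % TOTAL_STEPS
    return position


if __name__ == "__main__":
    locked = bang_bang_lock(275.0)
    print("decoded selection:", decode(locked))
    print("residual error [deg]:", wrapped_error(275.0, locked))
```

In this model the loop never settles: it keeps stepping back and forth around the target, which mirrors the dithering the paper attributes to the digital control, and stepping past position 191 simply wraps back to 0.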
Manuscript received April 10, 1997; revised June 5, 1997. This work was supported by ARPA under contract DABT63-94-C-0054.
The authors are with the Computer Systems Laboratory, Stanford University, Stanford, CA 94305 USA and with Rambus Inc., Mountain View, CA 94040 USA.
Publisher Item Identifier S 0018-9200(97)08033-5.
0018–9200/97$10.00 © 1997 IEEE

I. INTRODUCTION

Phase-locked loops (PLL’s) and delay-locked loops (DLL’s) are routinely employed in microprocessor and memory IC’s in order to cancel the on-chip clock amplification and buffering delays and improve the I/O timing margins. However, the increasing clock speeds and integration levels of digital circuits create a hostile operating environment for these phase alignment circuits. The supply and substrate noise resulting from the switching of digital circuits affects the PLL or DLL operation and results in output clock jitter which subtracts from the I/O timing margins.

In applications where no clock synthesis is required, DLL’s offer an attractive alternative to PLL’s due to their better jitter performance, inherent stability, and simpler design. The main disadvantage of conventional DLL’s, however, is their limited phase capture range. This paper presents a dual DLL architecture which combines several techniques to achieve unlimited phase capture range, low jitter and static-phase error, and four orders of magnitude operating frequency range. This architecture is based on a cascade of two loops. The core loop generates six clocks evenly spaced by 30° which are then used by the peripheral loop to generate the output clock, under the control of a digital finite state machine (FSM). By using phase interpolation, the dual loop can provide unlimited phase shift without the use of a voltage controlled oscillator (VCO). Using an FSM for phase control offers the advantage of enabling the flexible implementation of complicated phase capture algorithms in the digital domain. Finally, by utilizing self-biased techniques, the loop achieves large operating range and low jitter.

This paper begins with a brief overview of conventional DLL design. After outlining some of the disadvantages of conventional approaches, Section II presents the dual interpolating DLL architecture. Section III discusses circuit design issues that arose in the prototype implementation of the architecture in a 0.8-μm CMOS technology. Section IV discusses the experimental results, and concluding remarks follow in Section V.

II. ARCHITECTURE

A. Conventional DLL’s
A simplified block diagram of a conventional DLL [1] is outlined in Fig. 1. The components are a voltage controlled delay line (VCDL), a phase detector, a charge pump, and a first-order loop filter. The input reference clock drives the delay line which comprises a number of cascaded variable delay buffers. The output clock clk drives the loop phase detector (depicted in this example as a conventional flip-flop). The output of the phase detector is integrated by the charge pump and the loop filter capacitor to generate the loop control voltage. The loop negative feedback drives the control voltage to a value that forces a zero phase error between the output clock and the reference clock.

Fig. 1. Block diagram of a conventional DLL.

This simple design offers many advantages compared to VCO-based PLL’s. Due to frequency acquisition constraints, PLL’s usually resort to a specific type of phase detector, the state-machine-based phase frequency detector (PFD). In contrast, DLL’s can be easily implemented by using “bang–bang” control; i.e., the control signal of the loop, rather than being proportional to the phase error magnitude, can simply be a binary “up” or “down” indication. Thus, in a “bang–bang” DLL the phase detector can be a replica of the input data receiver resulting in an optimal placement of the sampling clock in the center of the input receiver’s sampling uncertainty window. Additionally, since DLL’s do not use a VCO, phase errors induced by supply or substrate noise do not accumulate over many clock cycles. This improved noise immunity is

the main reason for the increased adoption of DLL’s in applications that do not require clock synthesis.

The conventional DLL architecture of Fig. 1 suffers from two important disadvantages: clock jitter propagation and limited phase capture range. Since the VCDL simply delays the reference clock by a single clock cycle, the reference clock jitter directly propagates to the output clock. This all-pass filter behavior with respect to the frequency of the jitter of the reference clock results in reduced I/O timing margins, especially in “source-synchronous” interfaces where the reference clock emanates from another noisy digital chip. To overcome this problem, a separate low-jitter differential clock can be used as the input to the delay line. This way the on-chip common-mode noise and the reference clock jitter do not affect the I/O timing margins.

A more important problem is that a VCDL does not have the cycle slipping capability of a VCO. Therefore, at a given operating clock frequency, the DLL can delay its input clock by an amount bounded by a minimum and a maximum delay. As a consequence, extra care must be taken by the designer so that the loop will not enter a state in which it tries to lock toward a delay which is outside these two limits. A compromising solution is to extend the VCDL range and use an FSM that controls the loop start-up. However, DLL’s relying on quadrature phase mixing [2], [3] completely eliminate this problem. This approach is based on the fact that quadrature clocks can be easily generated, given a clock of the correct frequency. The quadrature clocks are then fed to a phase mixer which can produce a clock whose phase can span the complete 0–360° phase interval. This approach eliminates the limited phase range problem of conventional DLL’s since it can essentially rotate the output clock phase infinite times providing seamless switching at the quadrant boundaries. The main disadvantage of quadrature mixing is that the output of the phase mixer is a clock with a slew rate inherently limited by the output swing of the phase mixer and the period of the clock. This slow clock exhibits increased dynamic noise sensitivity, thus degrading the jitter performance of quadrature mixing DLL’s.

The approach presented here overcomes this limitation of quadrature mixing DLL’s since it generates the output clock by interpolating between smaller 30° phase intervals [5]. Simultaneously, by avoiding the use of a VCO it eliminates the phase error accumulation problem of similar approaches [4].

B. Dual Interpolating DLL
Fig. 2 shows a high-level block diagram of the proposed architecture. This architecture is based on cascading two loops. A conventional first-order core DLL is locked at 180° phase shift. Assuming that the delay line of the core DLL comprises six buffers, their outputs are six clocks which are evenly spaced by 30°. The peripheral digital loop selects a pair of clocks to interpolate between. The selected clocks can be potentially inverted in order to cover the full 0–360° phase range. The resulting clocks drive a digitally controlled interpolator which generates the main clock. The phase of this clock can be any of the quantized phase steps between the phases of the two interpolator input clocks, as set by the interpolation controlling word.

Fig. 2. Dual interpolating DLL architecture.

The output clock of the interpolator drives the phase detector which compares it to the reference clock. The output of the phase detector is used by the FSM to control the phase selection, the selective phase inversion, and the interpolator phase mixing weight. The FSM moves the phase of the clock according to the phase detector output. In the more common case this means just changing the interpolation mixing weight by one. If, however, the interpolator controlling word has reached its minimum or maximum limit, the FSM must change

the phase of one of the selected clocks to the next appropriate selection. This phase selection change might also involve an inversion of the corresponding clock if the current interpolation interval is adjacent to the 0° or 180° boundary. Since these phase selection changes happen only when the corresponding phase mixing weight is zero, no glitches occur on the output clock. The digital “bang–bang” nature of the control results in dithering around the zero phase error point in the lock condition. The dither amplitude is determined by the interpolator phase step and the delay through the peripheral loop.

In this architecture the output clock phase can be rotated, so no hard limits exist in the loop phase capture range: the loop provides unlimited (modulo 2π) phase shift capability. This property eliminates boundary conditions and phase relationship constraints, common in conventional DLL’s. The only requirement is that the DLL input clock and the reference clock are plesiochronous (i.e., their frequency difference is bounded), making this architecture suitable for clock recovery applications. Since the system does not use a VCO, it does not suffer from the phase error accumulation problem of conventional PLL’s. Moreover, the input clocks of the phase interpolator are spaced by just 30°, so the output of the phase interpolator does not exhibit the noise sensitivity of the quadrature mixing approach. Finally, the fact that the capture algorithm can be completely implemented in the digital domain gives great flexibility in its implementation as will be discussed in Section III. Although the prototype described in this paper is implemented with an analog core loop, possible implementations of the architecture can use digital control in both loops, further enhancing the system versatility. Moreover, the architecture can be easily extended to use a clock recirculating scheme in the core loop, so that the output clock frequency is a multiple of the input clock [7].

Fig. 3. Linearized dual DLL model.

C. Dual-Loop Dynamics
Cascading two loops can compromise the overall system stability and lead to undesired jitter peaking effects. However, as the analysis in this section will show, this dual-loop architecture does not exhibit any jitter peaking irrespective of the dynamics of the two loops. The behavior of the DLL can be analyzed with respect to two types of perturbations: i) input or reference clock delay variations and ii) delay variations resulting from supply and substrate noise. The frequency response of the dual loop can be analyzed by making a continuous time approximation, in which the sampling operation of the phase detectors and the digital nature of the peripheral loop are ignored. This approximation is valid for core and peripheral loop bandwidths at least a decade below the operating frequency. This constraint needs to be satisfied anyway in a DLL in order to eliminate the effects of higher order poles resulting from the delays around the loop.

Fig. 3 shows the dual loop linearized model including both the loop clocks and the delay errors introduced by supply or substrate noise $D_N$. Each of the two loops is modeled as a single pole system, in which the input, output, and error variables are delays, similar to the single-loop analysis published in [7]. For example, the output delay of the core loop $D_{OC}$ (in seconds) is the delay established by the core loop delay line, while the input delay $D_{IC}$ is the delay for which the core loop phase detector and charge pump do not generate an error signal. Since the core loop VCDL spans half a clock cycle, $D_{IC}$ is equal to half an input clock period. By using these loop variables, the input-to-output transfer function of the core loop can be easily derived

$$\frac{D_{OC}(s)}{D_{IC}(s)} = \frac{1}{1 + s/p_c} \qquad (1)$$

where $p_c$ (in rad/s) is the pole of the core loop as determined by the charge pump current, the phase detector and delay line gain, and the loop filter capacitor. Similarly, the noise-to-delay error transfer function of the core loop can be shown to be

$$\frac{D_{EC}(s)}{D_N(s)} = \frac{s/p_c}{1 + s/p_c} \qquad (2)$$

where $D_N$ is the additional delay introduced in the core loop from supply or substrate noise, and $D_{EC}$ is the delay error seen by the core loop phase detector. This transfer function indicates that noise induced delay errors can be tracked up to the loop bandwidth and that the response of the loop to a supply step consists of an initial step followed by a decaying exponential with a time constant equal to $1/p_c$.

Before proceeding to analyze the response of the dual loop, it should be noted that the linearized model of Fig. 3 uses a simplifying assumption. The assumption is that the delay error $D_N$ introduced by supply or substrate variations is identical in both loops and does not depend on the state of the phase selection multiplexers. Since the supply and substrate sensitivity of the peripheral loop depends on the phase selection and will be typically higher due to the presence of the final CMOS system clock buffer, this assumption is not necessarily accurate. However, it does not affect the conclusions drawn below about the stability of the loop, since it only removes a modifying constant, which is equal to the ratio in the delay sensitivities of the two loops. This constant only affects the relative location of the poles and zeros of the resulting transfer function, and, as it will be shown below, the loop is unconditionally stable irrespective of the relation between the individual poles and zeros. Using the model of Fig. 3, it is straightforward to show that the transfer function of the peripheral loop is identical in form to that of the core loop. This result agrees with intuition since in the dual loop system reference clock perturbations do not

affect the core loop. More interesting is the transfer function from the input clock to the dual-loop error, $D_{EP}(s)/D_{IC}(s)$, since changes in the period of the input clock will cause both the core and peripheral loop to react. Based on (1) and (2), this transfer function can be shown to be

$$\frac{D_{EP}(s)}{D_{IC}(s)} = \frac{s/p_p}{(1 + s/p_c)(1 + s/p_p)} \qquad (3)$$

This bandpass transfer function exhibits no peaking at any frequency regardless of the relative magnitudes of $p_c$ and $p_p$. The step response of the system, shown in Fig. 4(a), reveals that unit-step changes in the core loop input delay $D_{IC}$ (i.e., step changes in the input clock period) will initially peak at a less than unity value determined by the ratio of the two poles.¹ Moreover, as the magnitude of $p_p$ increases, the disturbance on the output is reduced since the peripheral loop compensates quickly for disturbances at the output caused by changes of the input clock.

¹ It should be noted that in case in-CLK and ref-CLK are identical or correlated, the resulting transfer function exhibits a low-pass peaked behavior. Nevertheless the resulting peaking is small, exhibiting a maximum of 15% when $p_p = p_c$, while it is less than 5% as long as $p_p$ and $p_c$ are an order of magnitude apart in frequency.

Finally, the transfer function from supply or substrate noise-induced delay errors $D_N$ to the delay error of the dual loop $D_{EP}$ can be derived

$$\frac{D_{EP}(s)}{D_N(s)} = \frac{(s/p_p)(1 + 2s/p_c)}{(1 + s/p_c)(1 + s/p_p)} \qquad (4)$$

Equation (4) also exhibits no peaking at any frequency since the location of the last zero can never be above that of the poles. The step response of the system is plotted in Fig. 4(b) for various ratios of the core to peripheral pole frequencies. Under all conditions, the initial delay error is equal to twice the injected unity error since this error is added on both loops. When the peripheral loop bandwidth is less than half that of the main loop, there is no overshoot in the dual-loop step response. This result occurs because the core loop compensates for its delay error quickly, while the slower peripheral loop compensates for the output delay error later. When the pole frequencies of the two loops are very close, the system overshoots since the peripheral loop compensates for the output delay error at approximately the same rate as the core loop. The worst case overshoot of approximately 4.5% of the initial disturbance occurs when the peripheral loop bandwidth is twice that of the core loop. As the peripheral loop bandwidth increases, the overshoot becomes progressively smaller since the peripheral loop corrects for both the peripheral and core delay errors. Subsequently, the influence of the slower core loop correction on the output delay error is compensated by the peripheral loop. Therefore, even in the worst case, the dual loop cascade exhibits only minor overshoot.

Fig. 4. Dual loop response to: (a) step change in input clock and (b) supply noise step.

III. CIRCUIT DESIGN

A. Overview
A more detailed block diagram of the dual loop is shown in Fig. 5. This design uses a separate local differential clock as the input to the delay line. Although the use of this clock is not inherent in the loop architecture, it minimizes the supply sensitivity in applications such as “source synchronous” interfaces. To minimize the effects of input clock duty cycle imperfections and common-mode mismatches, a duty cycle adjuster (DCA) [2] is employed after the first clock receiving buffer. The 50% duty cycle clock drives the core DLL. The core delay line consists of six differential buffers. An extra pair of buffers (labeled B) generate two clocks which drive the core loop 180° phase detector. The output of the phase detector controls the charge pump which forces the two sensed core clocks (labeled C) to be 180° out of phase. Since all the buffers in the core delay line (including the B buffers) have the same size, all the core VCDL stages have the same fan-out and delay. Therefore, forcing the two C clocks to be 180° out of phase will generate six clocks evenly spaced by 30° at the outputs of the core delay line.

Fig. 5. Dual DLL detailed block diagram.

The phase selection and phase inversion multiplexers are differential elements controlled by the core loop control voltage. In order to eliminate jitter-sensitive slow clocks, all buffers in the clock path need to have approximately the same

bandwidth. For this reason, the phase selection in this design is implemented as a combination of a 3-to-1 and a 2-to-1 multiplexer, instead of a single 6-to-1 differential multiplexer with lower total power. Since the phase selection multiplexer can affect the phase shift of the core delay line through data-dependent loading, the six output clocks are buffered before driving the phase selection multiplexers. This way, changing the multiplexer select does not affect the core delay line phase shift.

The outputs of the phase inversion multiplexer drive the phase interpolator which generates the low swing differential clock. This clock is then amplified and buffered through a conventional CMOS inverter chain generating the main clock (CLK). The peripheral loop phase detector [1] compares that clock to the reference clock, generating a binary phase error indication that is then fed to the FSM. The FSM, based on the phase detector (PD) output, selects the interpolator input phases and controls the phase interpolation.

B. Core Loop
To minimize the jitter supply sensitivity, all the delay buffers in the design, from the input clock (in-CLK) to the output of the phase interpolator, use differential elements with replica feedback biasing [6]. In order to linearize the loop gain and obtain large operating range, the core loop charge pump current is scaled along with the VCDL buffer current as illustrated in Fig. 6 [7]. One bias voltage is generated through the replica-feedback biasing circuit, while the other is a buffered version of the charge pump control voltage. In addition to the core VCDL buffers, these two bias voltages control the differential buffer elements of the peripheral loop. This ensures that all the buffers in the design have approximately equal delays and that the edge rates of the interpolator input clocks scale with the operating frequency of the loop.

Fig. 6. (a) Core loop delay buffer and (b) charge pump.

The sensitivity of the dual-loop architecture to the core loop phase offset depends on the particular application. For the case that the dual DLL is used to just generate a clock whose phase is directly controlled by the phase detector output, the phase offset of the core loop does not affect the system phase offset. In this case, the loop operation will not be affected as long as the core loop phase offset is bounded. An absolute core loop offset less than 30° ensures monotonic switching at the 0° and 180° interpolation boundaries, so the interpolating loop functions correctly, albeit with a larger than nominal interpolation phase step. Core loop phase offsets larger than this amount will result in a hysteretic locking behavior at the quadrant boundaries, which will increase the dither jitter if the reference clock phase forces the dual loop to lock at this point.

The dual-loop operation becomes more sensitive to core loop phase offsets in case the designer wants to use this architecture to generate an additional clock that is offset by 90° relative to the reference clock. In such an application, the quadrature clock would be generated by using an extra pair of phase selection and inversion multiplexers whose selects would be offset by three relative to those generating the main clock. This would create a 90° interpolation interval offset, resulting in the required quadrature phase shift. In this case the core loop phase offset would impact the quadrature phase if the selects of the extra multiplexers happen to wrap around the 0° or 180° interpolation interval boundaries.

Even though the prototype does not implement quadrature phase generation, a low offset phase detector and careful matching of the layout were used to ensure uniform spacing of the six clocks. A self-biased DLL requires a linear phase detector. To avoid start-up problems that would result from the use of a conventional state machine PFD [7], the core loop uses the phase detector depicted in Fig. 7. This design comprises an S–R latch augmented with two input pulse generators. The absence of extra state storage in this design eliminates any start-up false locking conditions. Additionally, its symmetric structure and the use of pulse triggering minimize the core loop phase offset.

Fig. 7. Core loop phase detector.

The core of the phase detector is an S–R latch-based phase detector. The S–R latch ensures a 180° phase shift between the falling edges of its inputs only when the duty cycle of the two input clocks is identical. However, when the duty cycle of the two input clocks is different, this mismatch will propagate as a core loop phase locking offset. This happens because an unbalanced overlap of the two input clocks causes the output of the S–R latch to have a duty cycle deviating from 50%. To compensate for this effect, the S–R latch is

Fig. 9. Phase interpolator (type-I) schematic.


Fig. 8. Phase detector and charge pump simulated transfer function.
case, the interpolation step is 1/16 of the 30° interval resulting
augmented with two pulse generators which propagate a low in approximately 2° peripheral loop nominal dither. Another
pulse on the positive edges of the input clocks. Since potential important requirement is that the design should provide for
overlaps are minimized, the design can tolerate large duty seamless interpolation-boundary switching. This means that
cycle imperfections and still provide an accurate 180° lock when the input code is such that the weight on one of the
in the core loop. input clocks is zero, this clock should have no influence on
Fig. 8 shows the simulated transfer characteristics of the the output.
phase detector and charge pump over three extreme process Fig. 9 shows a schematic diagram of the interpolator used
and environment conditions. The cycle time of the two input in the prototype chip. This design is a dual input differential
clocks is set at 4 ns, while their duty cycles are mismatched buffer which uses the same symmetric loads as all the core
by 0.5 ns such that the duty cycle of is 37.5% while VCDL buffers and peripheral loop multiplexers. The bias
the duty cycle of clock is 62.5%. It can be seen that voltages and are identical with those biasing the rest
the transfer function is linear and has no offset or dead-band of the loop, ensuring that its total delay is approximately 30°
around the 2-ns point where the loop actually locks. However, of the clock period which is the same as the rest of the loop
the combination of input pulsing and duty cycle imperfections buffers. Therefore, the transition time of the interpolator input
results in nonlinear transfer function characteristics at the clocks is larger than the minimum delay through the inter-
vicinity of the boundaries of the locking range (i.e., 0 and polator, and the two input transitions overlap. This condition
4 ns). The only effect of this nonlinearity is that the core ensures that the interpolator outputs never settle at half of
loop can exhibit an initial slew-rate limited reduction of its the swing range. The current sources of the two differential
phase error, since the output of the phase detector and charge pairs are thermometer controlled elements. The thermometer
pump is constant. After the phase error has been reduced, codes are generated by a 16-b long up/down shift register
such that the phase detector operates within its linear region, which is controlled by the peripheral loop FSM. By changing
the core loop will exhibit a conventional single-pole response. the thermometer code, the FSM adjusts in a complementary
Harmonic locking problems, common in PLL’s using S–R fashion the currents of the two input differential pairs resulting
phase detectors, are eliminated in this design since the core in a mixing of the two input clock phases. This design
loop is reset to its minimum delay at system start-up. (type-I) does not completely satisfy the seamless boundary-
switching requirement. Even when the current through one of
C. Phase Interpolator Design the differential pairs is zero, the input still influences the output
The most critical circuit in the design of the peripheral of the interpolator. This influence is due to the capacitive
digital loop is the phase interpolator. The phase interpolator coupling of the gate-to-drain capacitance of the differential
receives two clocks and generates the main clock pair input transistors.
whose phase is the weighted sum of the two input phases. Fig. 10 shows an alternative design which does not suffer
Essentially, the phase interpolator converts a digital weight from this problem. In this design (type-II), the interpolator dif-
code generated from the FSM to the phase of clock . ferential pairs consist of unit cell differential pairs. Therefore,
Linearity is not important in the design of this digital-to-phase when one of the interpolation weight thermometer codes is
converter since it is enclosed in the peripheral loop feedback. zero, the corresponding input is completely cut off from the
The important requirement is that the interpolation process output, eliminating the gate-to-drain coupling capacitance.
is monotonic to ensure that no hysteresis exists in the loop Fig. 11 shows the simulated transfer function of the inter-
locking characteristics. Additionally, the phase step must be polator alternative designs. This simulation includes random
minimized since it determines the loop dither amplitude. In this ( 20 mV) threshold voltage offsets in the thermometer code
SIDIROPOULOS AND HOROWITZ: SEMIDIGITAL DUAL DELAY-LOCKED LOOP 1689

Fig. 12. (a) Simplified FSM algorithm and (b) resulting loop behavior.

Fig. 10. Phase interpolator (type-II) schematic.

Fig. 11. Simulated phase interpolator transfer function.

Fig. 13. Prototype chip microphotograph.

current sources. The type-I design exhibits a nominal step of approximately 2°. However, due to the gate-to-drain capacitive coupling effect, the maximum step of 3.8° occurs at the interpolation boundary when the input clock is switched to the next selection. In the lower power implementation where no buffering is used at the core delay line outputs (type-I-unbuf), the data-dependent loading on the previous stage results in a double phase step at the interpolation interval boundaries. Although the alternative design (type-II) does not exhibit a boundary phase step, it was not used since it occupies more layout area and exhibits more nonlinear characteristics due to data-dependent loading of the previous stage. So in the present implementation, worst-case dithering occurs at the interpolation interval boundaries and has an approximate magnitude of 3.8°.

D. Finite State Machine
A simplified version of the peripheral loop FSM algorithm is outlined in Fig. 12(a). The single state Early of the FSM indicates the relationship of the two interpolator input clocks. On every cycle of its operation, the FSM might undertake two actions.

• In the more frequent case of in-range interpolation (i.e., weight ≠ 0), the FSM simply increments or decrements the interpolation weight by shifting up or down the interpolator controlling shift register. The direction of the shift is decided based on the phase detector output and the current value of the state Early.
• If the peripheral loop has run out of range in the current interpolation interval, the FSM seamlessly slides the current interpolation interval by switching one of the two interpolator input phases to the next selection. The fact that the interpolation has run out of range in the current interval is simply indicated by a combination of the current value of the state Early, the most or least significant bit of the thermometer register, and the output of the phase detector. In case the current selection of either input phase is adjacent to the 0° or 180° interpolation interval boundary, switching to the next selection involves toggling the select of the second-stage phase inversion multiplexer.

Fig. 14. Noise generation and monitoring circuits.

The loop phase capture behavior resulting from this simple algorithm is illustrated in Fig. 12(b). The phase error decreases at a linear rate until the system achieves lock. Subsequently, the loop dithers around the zero phase error point with a dither magnitude of one phase interpolation interval. This occurs because in this type of “bang–bang” system, the output of the phase detector is just a binary phase error without any indication of the magnitude of the phase error. The complementary interpolation weights slew linearly, changing direction at the interpolation interval boundaries. Once the system finds lock, they either dither by one or they stay constant if the dither point happens to lie on an interval boundary.
The magnitude of the peripheral loop phase dither is determined by the minimum interpolation step and the delay through the feedback loop. In conventional analog “bang–bang” DLL’s, the loop delay is largely determined by the delay through the delay line and the clock distribution network. However, this digital implementation has a larger minimum loop delay. The underlying reason is that driving the FSM directly from the phase detector output might lead to metastability problems, especially since the whole loop operation is driving the phase detector to its metastable point of operation. For this reason, in this implementation the output of the phase detector is delayed by three metastability hardened flip-flops. This increases the mean time between failures (MTBF) of the system to a calculated worst case of approximately 100 years, but at the same time increases the peripheral loop delay by three cycles. To compensate for that delay and decrease the loop dither, the FSM logic implements a front-end filter which counts eight continuous phase detector “up” or “down” results before propagating this signal to the core FSM. This causes the FSM to delay its next decision until the results of its previous action have been propagated to the phase detector output and reduces the inherent peripheral loop dither to one phase interpolation interval.
The digital nature of the peripheral loop control enabled the implementation of the FSM to be done through synthesis of a behavioral Verilog model followed by a simple standard cell place and route. The FSM behavioral model was verified by simulation in conjunction with a behavioral core loop model. The significance of this automated methodology is that other more complicated algorithms can be implemented requiring minimal effort from the designer. Faster phase acquisition can be obtained by disabling the front end counter/filter and changing the interpolation step by a larger amount while the loop is not in lock. The loop can also implement a periodic phase calibration algorithm. In this case, the FSM is activated initially to drive the loop to zero phase error. Then it is shut down to save power and it is periodically turned on to compensate for slow phase drifts. Since the FSM can run at a frequency slower than that of the system clock, the implementation of different algorithms is not in the system critical path.

IV. EXPERIMENTAL RESULTS
To verify the dual DLL architecture, a chip has been fabricated through MOSIS in the HP CMOS26B process. This is a 1.0-μm drawn process with the channel lengths scaled to 0.8 μm. Although the gate oxide in this process is 170 Å, allowing 5-V operation, the loop design and testing was done with a 3.3-V power supply voltage.
Fig. 13 is a micrograph of the chip. The chip integrates the dual DLL, along with noise injection and monitoring circuits and current-mode differential output buffers. The dual DLL occupies 0.8 mm² of silicon area, the majority (about 60%) of which is devoted to the peripheral loop logic. This is mainly due to the relatively large standard cell size of the library used in this implementation.
The block labeled NOISE-GEN in Fig. 13 is used to inject and measure on-chip supply noise. Fig. 14 shows a schematic diagram of these circuits. The 1000-μm wide transistor shorts the on-chip supply rails, creating a voltage drop across the off-chip 4-Ω resistor. In order to monitor the droop on the on-chip supply, a second device and the external 5-Ω load resistor form a broadband attenuating buffer which drives the 50-Ω scope. The gain of the buffer is computed during an initial calibration step. The use of these circuits enables the injection and monitoring of fast (about 1-ns rise time) steps on the on-chip supply.
The dither jitter of the loop with quiescent on-chip supply varies with the input phase. This occurs because the offset of the interpolator and the phase selection multiplexers change according to the point of lock. Fig. 15 shows the worst-

Fig. 15. Jitter histogram with quiet supply.

Fig. 16. Jitter histogram with 1-MHz 750-mV square wave supply noise.

case jitter (68 ps) with quiescent supply. The jitter histogram consists of the superposition of two Gaussian distributions resulting from the switching of the peripheral loop between two adjacent interpolation intervals. The distance between the peaks of the two superimposed distributions is about 40 ps, which is in fair agreement with the simulation results. With the noise generation circuits injecting a 750-mV 1-MHz square wave on the chip supply, the peak-to-peak jitter increases to 400 ps (Fig. 16). It should be noted that simulation results indicate that approximately 50% of this jitter is not inherent to the loop, but is due to the supply sensitivity of the succeeding static CMOS clock buffer and off-chip driver.
Fig. 17 illustrates the linearity of the interpolation process in the peripheral loop. The figure shows the histogram of the output clock with the peripheral loop FSM continuously rotating that clock. The histogram was generated by keeping the reference clock to a constant voltage while the input clock ran at its nominal frequency of 250 MHz. The histogram valleys correspond to the interpolation interval boundaries. The spacing of the valleys is within 10% of their nominal 333-ps distance, indicating good matching of the delays of the core loop buffers. The absence of one valley at the 180° interpolation boundary indicates a slight offset in the core loop. The fact that the magnitude of the highest peak of the histogram is smaller than the magnitude of the deepest valley indicates that the interpolator achieves the 4-b target linearity (the 4-b linearity of the interpolator was also confirmed by a similar histogram of a single interpolation interval). Thus the overall linearity of the DLL is limited by the steps at the interpolation interval boundaries.
Table I summarizes the loop performance characteristics. With a 3.3-V supply, the loop operates from 80 kHz to

Fig. 17. Interpolation process linearity.

400 MHz. The phase offset between the reference clock and the output clock of the loop is less than 40 ps. Operating at 250 MHz, the dual DLL draws 31 mA dc from a 3.3-V power supply.

TABLE I
PROTOTYPE PERFORMANCE SUMMARY

V. SUMMARY
Although DLL’s are easier to design than PLL’s and offer better jitter performance, their main disadvantage is their limited phase capture range. This disadvantage limits their application to completely synchronous environments and complicates start-up circuitry. This paper presented a dual DLL architecture which removes this limitation by using a core DLL to generate coarsely spaced clocks which are then used by a peripheral DLL to generate the output clock by using phase interpolation. This architecture has unlimited (modulo 2π) phase shift capability, therefore removing boundary conditions and phase relationship constraints between the system clocks. The only requirement is that the DLL input and reference clocks are plesiochronous, making the dual DLL suitable for clock recovery applications. In addition, the digital nature of the peripheral loop control enables implementation of complicated phase alignment algorithms in a straightforward manner.
A prototype using a linear self-biased core loop has been implemented in a 0.8-μm technology. The prototype achieves 68-ps peak-to-peak jitter, 0.4-ps/mV supply sensitivity, and 0.08–400 MHz operating range.

ACKNOWLEDGMENT
The authors are grateful to M. Johnson, T. Lee, J. Maneatis, and K. Yang for helpful discussions.

REFERENCES
[1] M. Johnson and E. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, Oct. 1988.
[2] T. Lee et al., “A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 MB/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[3] M. Izzard et al., “Analog versus digital control of a clock synchronizer for a 3 Gb/s data with 3.0 V differential ECL,” in Dig. Tech. Papers 1994 Symp. VLSI Circuits, June 1994, pp. 39–40.
[4] M. Horowitz et al., “PLL design for a 500 MB/s interface,” in Dig. Tech. Papers Int. Solid State Circuits Conf., Feb. 1993, pp. 160–161.
[5] S. Sidiropoulos and M. Horowitz, “A semi-digital delay locked loop with unlimited phase shift capability and 0.08–400 MHz operating range,” in Dig. Tech. Papers Int. Solid State Circuits Conf., Feb. 1997, pp. 332–333.
[6] J. Maneatis and M. Horowitz, “Precise delay generation using coupled oscillators,” IEEE J. Solid-State Circuits, vol. 28, pp. 1273–1282, Dec. 1993.
[7] J. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.

Stefanos Sidiropoulos (S’93), for a photograph and biography, see p. 690 of the May 1997 issue of this JOURNAL.

Mark A. Horowitz (S’77–M’78–SM’95), for a photograph and biography, see p. 690 of the May 1997 issue of this JOURNAL.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002 1021

A Wide-Range Delay-Locked Loop With a Fixed Latency of One Clock Cycle
Hsiang-Hui Chang, Student Member, IEEE, Jyh-Woei Lin, Ching-Yuan Yang, Member, IEEE, and Shen-Iuan Liu, Member, IEEE

Abstract: A delay-locked loop (DLL) with wide-range operation and fixed latency of one clock cycle is proposed. This DLL uses a phase selection circuit and a start-controlled circuit to enlarge the operating frequency range and eliminate harmonic locking problems. Theoretically, the operating frequency range of the DLL can be from $1/(N \times T_{D\max})$ to $1/(3\,T_{D\min})$, where $T_{D\min}$ and $T_{D\max}$ are the minimum and maximum delay of a delay cell, respectively, and $N$ is the number of delay cells used in the delay line. Fabricated in a 0.35-μm single-poly triple-metal CMOS process, the measurement results show that the proposed DLL can operate from 6 to 130 MHz, and the total delay time between input and output of this DLL is just one clock cycle. Over the entire operating frequency range, the maximum rms jitter does not exceed 25 ps. The DLL occupies an active area of 880 μm × 515 μm and consumes a maximum power of 132 mW at 130 MHz.
Index Terms: Delay-locked loops, latency, phase-locked loops, wide range.

Manuscript received November 5, 2001; revised March 27, 2002.
H.-H. Chang and S.-I. Liu are with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan 10617, R. O. C. (e-mail: lsi@cc.ee.ntu.edu.tw).
J.-W. Lin was with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan 10617, R. O. C. He is now with Sunplus Corporation, Hsinchu 300, Taiwan.
C.-Y. Yang is with the Department of Electrical Engineering, Huafan University, Taipei, Taiwan 223, R. O. C.
Publisher Item Identifier 10.1109/JSSC.2002.800922.
0018-9200/02$17.00 © 2002 IEEE

I. INTRODUCTION
WITH THE evolution and continuing scaling of CMOS technologies, the demand for high-speed and high integration density VLSI systems has recently grown exponentially. However, the important synchronization problem among IC modules is becoming one of the bottlenecks for high-performance systems.
Phase-locked loops (PLLs) [1]–[3] and delay-locked loops (DLLs) [4]–[7] have been typically employed for the purpose of synchronization. Due to the difference of their configuration, the DLLs are preferred for their unconditional stability and faster locking time than the PLLs. Additionally, a DLL offers better jitter performance than a PLL because noise in the voltage-controlled delay line (VCDL) does not accumulate over many clock cycles.
Conventional DLLs may suffer from harmonic locking over wide operating range. If the DLLs are to operate at lower frequency without harmonic locking, the number of delay stages must be increased to let the maximum delay of the delay line be equal to the period of the lowest frequency. However, the maximum operating frequency of a DLL will be limited by the minimum delay of the delay line.
In this paper, a DLL with wide-range operation and fixed latency of one clock cycle is proposed by using the phase selection circuit and the start-controlled circuit. The proposed DLL not only locks the delay equal to one clock cycle but also operates without the restrictions stated above. The operating frequency range of the proposed DLL can also be increased.
The range problem of conventional DLLs will be discussed in Section II. The architecture of the proposed DLL will be introduced in Section III and the building blocks in this DLL will be described in Section IV. Measurement results are given in Section V. Conclusions are given in Section VI.

Fig. 1. Block diagram of the conventional analog DLL.

II. RANGE PROBLEM OF CONVENTIONAL DLLS
A conventional DLL, as shown in Fig. 1, consists of four major blocks: the phase detector (PD), the charge-pump circuit, the loop filter, and the VCDL. In the DLL, the reference clock, ref_clk, is propagated through the VCDL. The output signal, vcdl_clk, at the end of the delay line is compared with the reference input. If a delay different from integer multiples of the clock period is detected, the closed loop will automatically correct it by changing the delay time of the VCDL. However, the conventional DLL will fail to lock or falsely lock to two or more periods, $T_{CLK}$, of the input signal if the initial delay of the VCDL is shorter than $0.5\,T_{CLK}$ or longer than $1.5\,T_{CLK}$, as shown in Fig. 2. Therefore, if the DLL is required to lock the delay to one clock cycle of the input reference signal, the initial delay of the VCDL needs to be located between $0.5\,T_{CLK}$ and $1.5\,T_{CLK}$ [7], regardless of the initial voltage of the loop filter. Assume that the maximum and the minimum delay of the VCDL are

Fig. 2. DLL in normal lock and false lock conditions.


Fig. 3. System architecture of the proposed DLL.

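$T_{VCDL,\max}$ and $T_{VCDL,\min}$, respectively. As a result, the period of the input signal, $T_{CLK}$, should satisfy the following inequality [7]:

$$\max\left(T_{VCDL,\min},\ \tfrac{2}{3}\,T_{VCDL,\max}\right) < T_{CLK} < \min\left(2\,T_{VCDL,\min},\ T_{VCDL,\max}\right) \qquad (1)$$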
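The lock window can be checked numerically. The sketch below assumes the window follows directly from the requirement stated above: the VCDL delay must lie between 0.5 and 1.5 clock periods at both extremes of the control voltage, and the target delay of one period must be reachable. The function name, argument names, and example delays are hypothetical.

```python
def lock_period_window(t_vcdl_min, t_vcdl_max):
    """Return (low, high) bounds on the clock period, or None if no period fits.

    Conditions assumed, all taken from the stated 0.5T..1.5T requirement:
      t_vcdl_min > 0.5 * T   ->  T < 2 * t_vcdl_min
      t_vcdl_max < 1.5 * T   ->  T > (2 / 3) * t_vcdl_max
      t_vcdl_min < T < t_vcdl_max  (a delay of one period must be reachable)
    """
    low = max(t_vcdl_min, 2.0 * t_vcdl_max / 3.0)
    high = min(2.0 * t_vcdl_min, t_vcdl_max)
    return (low, high) if low < high else None


# Hypothetical delay line tunable from 4 ns to 10 ns.
print(lock_period_window(4.0, 10.0))
# A delay line with too wide a tuning ratio leaves no safe period at all.
print(lock_period_window(2.0, 7.0))
```

The second call illustrates why a wide tuning range alone does not help a conventional DLL: once the maximum delay reaches three times the minimum delay, the window closes.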

Fig. 4. Small-signal model of the conventional analog DLL.

Fig. 5. Block diagram of the phase selection circuit.

Equation (1) shows that the DLL is prone to the false locking problem when process variations are taken into account [7]. Therefore, some solutions [6]–[10] are proposed to overcome this problem. They are described as follows.
First, the basic idea is to use a phase-frequency detector (PFD) [5], because it has a capture range of $(-2\pi, +2\pi)$, wider than other phase detectors. So, the PFD is a better choice for wide range operation. However, the PFD cannot be used in the DLL alone without any control circuit because the DLL will try to lock a zero delay. A PFD combined with a control circuit is presented in [6]. Nevertheless, in some cases, especially for high-frequency operations, the initial delay between ref_clk and vcdl_clk, as shown in Fig. 1, may be larger than two clock cycles and harmonic locking will occur.
Second, a solution called an all-analog DLL using a replica delay line [7] has been developed to solve the narrow frequency range problem of a conventional DLL. If the delay range of the VCDL satisfies the relation , the DLL will have a maximum operation range of 7:1.
Third, a digital-controlled DLL called the self-correcting DLL is proposed in [8]. The problem of false locking is solved by the addition of a lock-detect circuit and the modified phase detector. Although this self-correcting DLL avoids false locking, the outputs of the VCDL are required to have an exact 50% duty cycle.
The DLL developed in [9] uses a stage selector for fast-locked and wide-range operations, but the DLL requires an additional VCDL, which increases the area. A similar DLL can automatically change its lock mode to extend the operation range, but the latency of the DLL will be larger than one clock cycle [10].
The approach presented in this work uses a phase selection circuit to automatically decide what number of delay cells should be used. This can enable the DLL to operate in the wide-frequency range. A new start-controlled circuit is also presented for the DLL to solve false locking problems and keep the latency of one clock cycle. The exact 50% duty cycle is not necessary.

III. ARCHITECTURE OF THE PROPOSED DLL
The architecture of the proposed DLL is shown in Fig. 3. It is composed of a conventional analog DLL, a phase selection circuit, and a start-controlled circuit. Before the DLL begins to lock, the phase selection circuit will choose an appropriate delay cell to be a feedback signal (vcdl_clk) according to different frequencies of input signal. In other words, the number of the delay cells may change at different input frequencies. The minimum delay of the delay line is determined by one unit-delay cell. The maximum delay can be decided as $N \times T_{D\max}$, where $N$ is the number of unit-delay cells. Thus, the operating frequency range of the DLL can be from $1/(N \times T_{D\max})$ to $1/(3\,T_{D\min})$.
The linear model of the DLL is shown in Fig. 4, where the summer stands for a phase detector, $I_{CP}$ is the charge-pump cur-
CHANG et al.: WIDE-RANGE DELAY-LOCKED LOOP WITH FIXED LATENCY OF ONE CLOCK CYCLE 1023

Fig. 6. Schematic of edge detection circuit. (a) Edge detection circuits. (b) Clock edge generation. (c) Latch N.

Fig. 7. Timing diagram of edge detection circuit.

rent, $T_{ref}$ is the period of the input reference clock, $C$ is the capacitor value in the loop filter, and $K_{VCDL}$ is the gain of the VCDL, which is proportional to the number of delay cells. In the steady-state locked condition, the s-domain transfer function can be expressed as [11]

$$\frac{D_O(s)}{D_I(s)} = \frac{1}{1 + s/\omega_N} \qquad (2)$$

where $D_I(s)$ is the input delay time and $D_O(s)$ is the output delay time. The loop bandwidth, in terms of the charge-pump current $I_{CP}$ and the other quantities defined above, can be expressed as [11]

$$\omega_N = \frac{I_{CP}\,K_{VCDL}}{T_{ref}\,C} \qquad (3)$$

Since the transfer function is inherently stable, a wider loop bandwidth can be used to achieve fast acquisition time, but the jitter performance will be degraded. Hence, the following tradeoff design guideline was suggested in [12]:

(4)

where .
When the input frequency is higher, the phase selection circuit will select the smaller number of delay cells and the VCDL gain will become smaller. In order to have an adequate loop bandwidth for the DLL, the capacitances used in the loop filter must become smaller. In this work, the 3-bit control signals generated from the phase selection circuit will switch the number of capacitors in the loop filter depending on the selected phase.
After the vcdl_clk is decided, the DLL will start the locking process, which is controlled by the start-controlled circuit. First, the delay between input and output of the VCDL is initially set to the minimum value and then allows the down signal of the PFD output to activate, supposing that the VCDL’s delay increases with control voltage decreasing. Therefore, the delay between

Fig. 8. Schematic of start-controlled circuit associated with PFD.

Fig. 9. Timing diagram of start-controlled circuit.

input and output of the VCDL will increase until it reaches one
clock period of the input signal. Thus, the DLL will not fall into
false locking and the latency is fixed to one clock cycle no matter
how long a delay the VCDL provides.
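The start-controlled locking sequence can be sketched with a per-cycle, first-order model. This is an illustration under stated assumptions, not the measured circuit: the PFD and charge pump are reduced to a linear correction proportional to the remaining timing error, the VCDL gain is used as a magnitude, and all component values and names are hypothetical.

```python
def simulate_start_controlled_lock(t_ref, d_min, i_cp, k_vcdl, c_loop, cycles=200):
    """Return the VCDL delay after each reference cycle, starting at d_min.

    Per cycle the charge pump moves the control voltage by i_cp * error / c_loop,
    and the VCDL converts that into a delay change of k_vcdl times the voltage
    step, so the delay closes a fraction `gain` of the remaining error.
    """
    gain = i_cp * k_vcdl / c_loop        # dimensionless per-cycle loop gain
    delay = d_min                        # start-controlled circuit: minimum delay
    history = [delay]
    for _ in range(cycles):
        error = t_ref - delay            # time still missing to one full period
        delay += gain * error            # delay only grows while error > 0
        history.append(delay)
    return history


# Hypothetical values: 10-ns clock, 3-ns minimum delay (below half a period),
# 10-uA charge pump, 5 ns/V VCDL gain magnitude, 1-pF loop capacitor.
trace = simulate_start_controlled_lock(10e-9, 3e-9, 10e-6, 5e-9, 1e-12)
print(trace[0], trace[-1])
```

Starting from the minimum delay and approaching the period from below, the modeled delay settles at one clock period even though the initial delay is under half a period, which is the case where a conventional DLL would fail.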

IV. CIRCUIT DESCRIPTION


A. Phase Selection Circuit
Fig. 10. Schematic of the PFD circuit [12].

The phase selection circuit consists of two blocks: an edge detector and a multiplexer with a decoder, as shown in Fig. 5. The schematic and timing diagram of the edge detector are shown in Figs. 6 and 7, respectively. To guarantee that the latency of the DLL is just one clock cycle, the first two clock phases in Fig. 6 are reserved for measurement. In practice, the first two clock phases could be included in the phase selection circuit to improve the operating frequency range of the DLL. At the initial state, the signal startb is set to low to reset the edge detector outputs (i.e., d3–d10) and the delay of the VCDL is set to its minimum value. When the signal startb goes high, the edge detector will detect the rising edge of input signals in sequence during the next two rising edges of ref_clk. Referring to Fig. 7(a), suppose that the signals all have rising edges in sequence during one clock cycle; therefore, the outputs (d3–d10) are all high and the multiplexer will select phase 10 as the output signal, vcdl_clk. However, if the input frequency is higher, suppose that the timing diagram is similar to Fig. 7(b). All the inputs have rising edges during one clock cycle, but only the rising edges of phases 1–4 in sequence lead the selected phase to be 4. The vcdl_clk will be low until the selected phase is chosen. After the vcdl_clk is decided, the DLL will start the locking process, which will be explained later. By the decoder, signals (d3–d10) are decoded to generate 3-bit control signals, which switch the number of capacitors used in the loop filter for tuning the loop bandwidth.

B. Start-Controlled Circuit
The schematic of the start-controlled circuit and the associated PFD are shown in Fig. 8. It is composed of only two rising-edge trigger D-flip-flops (DFFs), two NAND gates, and

Fig. 11. Schematic of the charge-pump circuit [11].

Fig. 12. Schematic of the delay cell with replica bias [12].

Fig. 14. Microphotograph of the chip.

Fig. 13. Simulated transfer curve of the VCDL.

two inverters. The timing diagram of this start-controlled circuit is shown in Fig. 9. Initially, startb is set to low in order to clear the two DFF’s outputs. Therefore, setupb is low and pulls the control voltage to $V_{DD}$, as shown in Fig. 3 (i.e., set the VCDL delay to its minimum value). In this way, the two inputs of the PFD are in low level. When startb goes to high, setupb will also go to high. After two consecutive falling edges of vcdl_clk trigger the DFFs, the down signal of the PFD will be activated and let the delay of the VCDL increase. The delay of the VCDL will increase until it is equal to one clock period of the input signal due to the nature of negative feedback architecture. Since the start-controlled circuit forces the delay of the VCDL to its minimum value and controls the delay of the VCDL to increase until its delay is equal to one clock period, the DLL will not fall into false locking even when . In order to get equal delays for path1 and path2, dummy loads should be added in point A. In comparison with [6], this start-controlled circuit

Fig. 15. DLL at initial state when operating frequency is 6 MHz.

Fig. 17. Jitter histogram when DLL operates at 130 MHz.

Fig. 16. DLL at initial state when operating frequency is 130 MHz.

Fig. 18. Measurement results of rms jitter over different frequencies.

has two advantages: the proposed circuit is simple, and the duty cycle of ref_clk and vcdl_clk is not required to be exactly 50%.

TABLE I
PERFORMANCE SUMMARY

C. Other Circuits
In this work, the dynamic logic style PFD [13] is adopted to
avoid the dead-zone problem and improve the operating speed.
To mitigate charge injection errors induced by the parasitic
capacitors of the switches and current source transistors, the
charge-pump circuit developed in [11] is used here. The delay
cell circuit is similar to [11]. The schematics of these circuits
are shown in Figs. 10–12. The control voltage of the loop filter
is directly connected to nMOS rather than pMOS. Therefore,
the transfer curve of delay versus control voltage is monotonic
decreasing, as shown in Fig. 13.
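The tap choice made by the phase selection circuit of Section IV-A can be summarized in a short behavioral sketch. It assumes, as a simplification, that with the VCDL at minimum delay tap k produces its rising edge k unit delays after the ref_clk edge, and that the circuit keeps the highest tap whose edge still falls inside one clock period. The 10 taps and the reserved first two phases come from the text; the function name and the 1800-ps unit delay are hypothetical.

```python
def select_phase(t_clk_ps, t_dmin_ps, n_cells=10, first_selectable=3):
    """Return the selected tap index, or None if the clock is too fast.

    Times are integers in picoseconds to avoid floating-point edge cases.
    """
    # Largest k with k * t_dmin_ps strictly less than t_clk_ps.
    k = (t_clk_ps - 1) // t_dmin_ps
    k = min(k, n_cells)                  # all edge-detector outputs high
    if k < first_selectable:
        return None                      # period shorter than three unit delays
    return k


UNIT_DELAY_PS = 1800                     # hypothetical minimum unit-cell delay
print(select_phase(166_667, UNIT_DELAY_PS))   # about 6 MHz
print(select_phase(7_692, UNIT_DELAY_PS))     # about 130 MHz
print(select_phase(5_000, UNIT_DELAY_PS))     # beyond the supported range
```

With these assumed numbers the slow clock selects the last tap, as in Fig. 7(a), and the fast clock selects tap 4, as in Fig. 7(b). A clock period below three unit delays has no selectable tap, which corresponds to the upper frequency limit set by three minimum cell delays.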

V. EXPERIMENTAL RESULTS
The prototype chip is fabricated in a 0.35-μm single-poly triple-metal standard CMOS process. The microphotograph of the chip is shown in Fig. 14. The capacitors used in the loop filter are integrated in the chip and formed by metal-to-metal capacitors. The experimental results show that the DLL can operate in the frequency range of 6–130 MHz. Figs. 15 and 16 show the first four cycles of the DLL in the locking process when the operating frequency is 6 and 130 MHz, respectively. After the signal startb is high, the phase selection circuit will select one of the outputs of the VCDL as close as possible to the next rising edge of the input clock, ref_clk. Figs. 15 and 16 also show that after the signal startb is high, the first rising edge of

the output clock of the VCDL, vcdl_clk, leads that of the input clock, ref_clk. Since the signal startb will set the control voltage in Fig. 3 to $V_{DD}$, the proposed phase detector and the charge-pump circuit will discharge the loop filter to increase the delay of the VCDL. It will align the phases between the input clock and output clock of the VCDL. Fig. 17 shows the jitter histogram when the DLL operates at 130 MHz. Fig. 18 shows the measurement results of rms jitter over different frequencies. Table I gives the performance summary. The proposed DLL can be seen to have a wide operational range and a fixed latency of one clock cycle.

VI. CONCLUSION
A DLL with wide-range operation and fixed latency of one clock cycle is proposed. First, the multiphase outputs of the VCDL are all sent to the phase selection circuit. Then the phase selection circuit will automatically select one of the delayed outputs to feed back. As a result, this DLL can operate over a wide range without suffering from harmonic locking problems. Ideally, this DLL can operate from $1/(N \times T_{D\max})$ to $1/(3\,T_{D\min})$. The experimental results also demonstrate the functionality of the proposed DLL. Moreover, at different operating frequencies, the jitter performances are all in an acceptable range and the latency is just one clock cycle. Since the speed of the proposed circuits can be increased if a more advanced process is used, the performance of the DLL such as the operating frequency range can be improved with little hardware and design effort. The power consumption of the digital part in the DLL and the total die area will also be reduced.

[10] Y. Okuda, M. Horiguchi, and Y. Nakagome, “A 66–400 MHz adaptive-lock-mode DLL circuit with duty-cycle error correction,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2001, pp. 37–38.
[11] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.
[12] A. Chandrakasan, W. J. Bowhill, and F. Fox, Design of High-Performance Microprocessor Circuits. New York: IEEE Press, 2001, p. 240.
[13] S. Kim et al., “A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.

Hsiang-Hui Chang (S’01) was born in Taipei, Taiwan, R.O.C., on February 4, 1975. He received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, in 1999 and 2001, respectively. He is currently working toward the Ph.D. degree in electrical engineering at National Taiwan University.
His research interests are PLL, DLL, and high-speed interfaces for gigabit transceivers.

Jyh-Woei Lin was born in Kaoshiung, Taiwan, R.O.C., in 1974. He received the B.S. degree in electrical engineering from National Taipei University of Technology in 1996, and the M.S. degree in electrical engineering from National Taiwan University in 2001.
He joined Sunplus Corporation, Hsinchu, Taiwan, in 2001 as an Analog Circuit Designer. His research interests include PLL, DLL, and interface circuits for high-speed data links.
REFERENCES Ching-Yuan Yang (S’97–M’01) was born in Miaoli,


Taiwan, R.O.C., in 1967. He received the B.S. de-
[1] B. Razavi, Monolithic Phase-Locked Loops and Clock Recovery Cir-
gree in electrical engineering from the Tatung Insti-
cuits: Theory and Design. Piscataway, NJ: IEEE Press, 1996.
tute of Technology, Taipei, Taiwan, R.O.C., in 1990,
[2] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans.
and the M.S. and Ph.D. degrees in electrical engi-
Commun., vol. COM-28, pp. 1849–1858, Nov. 1980.
neering from National Taiwan University, Taipei, in
[3] R. E. Best, Phase-Locked Loops: Theory, Design and Applica-
1996 and 2000, respectively.
tions. New York: McGraw-Hill, 1998.
He has been on the faculty of Huafan University,
[4] R. L. Aguitar and D. M. Santos, “Multiple target clock distribution
Taiwan, since 2000, where he is currently an Assis-
with arbitrary delay interconnects,” Electron. Lett., vol. 34, no. 22, pp.
tant Professor with the Department of Electronics En-
2119–2120, Oct. 1998.
gineering. His research interests are in the area of
[5] R. B. Watson Jr. and R. B. Iknaian, “Clock buffer chip with multiple
mixed-signal integrated circuits and systems for high-speed interfaces and wire-
target automatic skew compensation,” IEEE J. Solid-State Circuits, vol.
less communication.
30, pp. 1267–1276, Nov. 1995.
[6] C. H. Kim et al., “A 64-Mbit 640-Mbyte/s bidirectional data strobed,
double-data-rate SDRAM with a 40-mW DLL for a 256-Mbyte memory
system,” IEEE J. Solid-State Circuits, vol. 33, pp. 1703–1710, Nov.
1998. Shen-Iuan Liu (S’88–M’93) was born in Keelung,
[7] Y. Moon, J. Choi, K. Lee, D. K. Jeong, and M. K. Kim, “An all-analog Taiwan, R.O.C., on April 4, 1965. He received both
multiphase delay-locked loop using a replica delay line for wide-range the B.S. and Ph.D. degrees in electrical engineering
operation and low-jitter performance,” IEEE J. Solid-State Circuits, vol. from National Taiwan University, Taipei, in 1987 and
35, pp. 377–384, Mar. 2000. 1991, respectively.
[8] D. J. Foley and M. P. Flynn, “CMOS DLL-based 2-V 3.2-ps jitter 1-GHz During 1991–1993, he served as a Second Lieu-
clock synthesizer and temperature-compensated tunable oscillator,” tenant in the Chinese Air Force. During 1991–1994,
IEEE J. Solid-State Circuits, vol. 36, pp. 417–423, Mar. 2001. he was an Associate Professor in the Department
[9] H. Yahata, T. Okuda, H. Miyashita, H. Chigasaki, B. Taruishi, T. Akiba, of Electronic Engineering of National Taiwan
Y. Kawase, T. Tachibana, S. Ueda, S. Aoyama, A. Tsukinori, K. Shibata, Institute of Technology. He joined the Department of
M. Horiguchi, Y. Saiki, and Y. Nakagome, “A 256-Mb double-data-rate Electrical Engineering, National Taiwan University,
SDRAM with a 10-mW analog DLL circuit,” in Symp. VLSI Circuits Taipei, in 1994, where he has been a Professor since 1998. His research
Dig. Tech. Papers, June 2000, pp. 74–75. interests are in analog and digital integrated circuits and systems.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 11, NOVEMBER 2000 1553

Active GHz Clock Network Using Distributed PLLs


Vadim Gutnik, Member, IEEE, and Anantha P. Chandrakasan, Member, IEEE

Abstract: A novel clock network composed of multiple synchronized phase-locked loops is analyzed, implemented, and tested. Undesirable large-signal stable (mode-locked) states dictate the transfer characteristic of the phase detectors; a matrix formulation of the linearized system allows direct calculation of system poles for any desired oscillator configuration. A 16-oscillator 1.3-GHz distributed clock network in 0.35-µm CMOS is presented here.

Index Terms: Clock network, multiple oscillator system, phase-locked loop.

Manuscript received March 24, 2000; revised June 24, 2000. This paper was supported by the MARCO Focused Research Center on Interconnects, which was funded at the Massachusetts Institute of Technology through a subcontract from the Georgia Institute of Technology, and supported in part by a Graduate Fellowship from the Intel Corporation.
V. Gutnik was with M.I.T. Microsystems Technology Lab, Cambridge, MA 02139 USA. He is now with Silicon Laboratories, Austin, TX 78749 USA (e-mail: gutnik@mit.edu).
A. P. Chandrakasan is with M.I.T. Microsystems Technology Lab, Cambridge, MA 02139 USA (e-mail: anantha@mtl.mit.edu).
Publisher Item Identifier S 0018-9200(00)09441-5.
0018–9200/00$10.00 © 2000 IEEE

I. INTRODUCTION
THE CLOCK distribution network of a modern microprocessor uses a significant fraction of the total chip power and has substantial impact on the overall performance of the system. For example, the 72-W 600-MHz Alpha processor [1] dissipates 16 W in the global clock distribution, and another 23 W in the local clocks: more than half the power goes to driving the clock net. The clock uncertainty budget for a global clock is 10% of a clock period, which translates to a 10% reduction in maximum operating speed; as argued below, this penalty is likely to increase for currently popular clock architectures.
Most conventional microprocessors use a balanced tree to distribute the clock [1]–[3]. Because the delays to all nodes are nominally equal, trees may be expected to have low skew. However, at gigahertz clock speeds a large fraction of skew and jitter comes from random variations in gate and interconnect delay. The majority of jitter in a clock tree is introduced by buffers and inter-line coupling to the clock wires; a relatively small amount comes from noise in the source oscillator [4]. Therefore, a primary consideration in clock design is matching delay along the clock path.
As clock speed increases, signal delay across a chip becomes comparable to a clock cycle. For example, a 2-cm-long wire in a 0.25-µm process has a delay of 0.86 ns, while the clock might be as high as 1 GHz; scaling to 4 GHz, the same wire (with optimal buffering) will have a delay of approximately 0.43 ns, compared to a clock period of 0.25 ns. In all practical cases a signal that takes longer than a clock cycle to propagate would be pipelined, and hence re-clocked. The fundamental weakness of tree distribution (and networks that depend on tree matching) is that skew is only relevant between communicating latches, but the clock path is always the length of the chip. Clock speeds increase with gate delay, and processor architectures can exploit both locality of blocks and pipelining to avoid penalty due to long signal paths, but the error in a global clock scales with the total path delay, and is thus a growing fraction of a clock cycle.
In this paper, we consider the effects of static and dynamic mismatch on a few representative clock networks in Section II and propose a distributed generation scheme that needs only local synchronization to generate a global clock. Large and small-signal stability of the proposed network is analyzed in Section III. This clock was implemented on a test chip; circuit details and results are presented in Sections IV and V.

II. MODELING RANDOM SKEW

A. Assumptions
Given sufficiently accurate models, systematic skew can be corrected at design time. Therefore, the primary interest is random zero-mean variations. For the sake of comparing architectures, we make several simplifying assumptions.
1) Delay mismatch, both static and dynamic, is proportional to total delay.
2) Wire RC delay is independent of gate delay ( ).
3) The clock period is proportional to gate delay.
4) Chip size is independent of gate delay.
5) In 0.25-µm technology, signal delay across a die equals one clock period.
Assumption 1 is inaccurate, but convenient. Mismatch due to gradients scales as delay squared; purely random short-distance mismatch scales as the square root of delay. For the sake of analysis, however, we will assume that uncertainty scales linearly. Assumptions 2, 3, and 4 are approximately true, given historical data: as the geometries scale the resistance increase in clock wires is offset by lower capacitance; processor cycle time is generally on the order of 8–16 gate delays; and chip sizes hover around mm .
Assumption 5 serves to normalize signal delay, chip size, and clock speed. It is not coincidental that random variation has become a noticeable issue at about the time when cross-die signal delay is comparable to one clock cycle: as a heuristic, 10% of a clock cycle is allocated for unmodeled skew and jitter margin, and delay uncertainty is about 5%–10% of delay. Hence, when delay across a chip is comparable to clock cycle time, random delay is a considerable fraction of the total clock error budget.

B. Tree
To keep internal clock skew low, a tree is generally made deep enough that a tile driven by a single leaf is small compared to the size of the chip [5], [6]. In turn, this means that the path from the
clock source to the load is comparable to the size of the entire die. Because the worst-case skew occurs between two adjacent leaves for which the clock path was completely different, worst case mismatch depends on the entire source-to-leaf delay. Worse still, the problem grows with process scaling. Because RC delay does not scale, delay along an optimally buffered line scales only as ; hence the skew as a fraction of the clock period grows as with falling .

C. Grid
Modern grids are H-tree-grid hybrids: a short H-tree distributes clock to a few (4 or 16, for example) buffers around a chip, and those buffers drive a clock grid in parallel. Shorting the buffers together helps drive down some of the uncertainty at the cost of increased short-circuit power during switching and somewhat slower edge rates. However, rise time scales linearly with , so by the same reasoning as applied to the tree scaling arguments, skew as a fraction of rise time will increase with as gate delay falls. When the tree skew exceeds rise time, short circuit power dissipation increases rapidly, and the clock edges begin to show an unacceptable kink. Fig. 1 shows simulated edge shapes with increasing input skew for a grid driven from a 4-level tree with skews from 0 to 200 ps, and Fig. 2 shows the corresponding short-circuit power dissipation, plotted as a fraction of -power for the clock grid.

Fig. 1. Simulated edge in a grid with skew to the drivers.

Fig. 2. Short circuit power in a grid vs. input tree skew.

D. Active Feedback
As is evident from the given examples, most of the skew comes from the initial long-distance distribution of a clock to relatively small loads. A delay-locked loop (DLL) could be adapted to measure and cancel out wire variations, as shown in Fig. 3. If the round-trip delay is tuned to an even number of clock cycles, the wire has nominally 0 delay.
Unfortunately, despite the apparent symmetry, the forward and reverse paths do not match well for two reasons. First, “matched” buffers are physically separated. In Fig. 3, should match , although it would be physically near . is not as far away from its matched pair as it might be in a tree, but it will still typically be millimeters away. Second, there is no temporal correlation. The clock signal passes at a different time than it passes , so any time-dependent variations, including those due to power supply and signal coupling, do not match.
Another approach, proposed by Intel, is shown in Fig. 4 [7]. Here, a DLL matches delays to two half-trees; an obvious generalization, with four DLLs matching quarter-trees, is shown in Fig. 5. Static delay variations of some nearest neighbors are canceled out by the DLL to within the precision of the matching of the comparators. The drawback is that some neighboring nodes, as and in Fig. 5, are only related through multiple DLLs.
A much better result can be obtained by using DLLs that take multiple reference inputs, and adjust output phase to be aligned exactly between the two inputs. The network can then be redrawn somewhat more symmetrically, as in Fig. 6. (For clarity, the local tree was not drawn, and the connections to the comparators are abstracted.)
Optimization of the number of tiles is straightforward. Internal skew scales with tile area, so as the number of tiles increases, internal skew falls. However, every boundary between tiles introduces some skew because of mismatch in the phase detector (PD). Hence, as the number of tiles increases, the number of boundaries increases. Fig. 7 shows the optimization curves calculated for this clock metric. As in other clock networks, faster clocks require a more finely grained architecture. Jitter in a DLL network will rise in exactly the same way as it increases in clock trees, and for the same reasons. Skew scales linearly with because it is comprised of comparator mismatches and delays across each leaf-patch. Note, however, that in a phase-locked loop (PLL) the noise can be expected to scale with ; a PLL network like the one in Fig. 6 would have total clock uncertainty that is a constant fraction of the clock period.

III. STABILITY
We propose a distributed clock network comprised of an array of synchronized PLLs. Independent oscillators generate the clock signal at multiple points (“nodes”) across a chip; each oscillator distributes the clock only to a small section of the chip (“tile”) (Fig. 8). PDs at the boundaries between tiles produce error signals that are summed by an amplifier in each tile and used to adjust the frequency of the node oscillator. In general, the network need not be square or regular.
With locally generated clocks, there are no chip-length clock lines to couple in jitter; skew is introduced only by asymmetries in PDs instead of mismatches in physically separated buffers,
GUTNIK AND CHANDRAKASAN: ACTIVE GHz CLOCK NETWORK USING DISTRIBUTED PLLs 1555

and the clock is regenerated at each node, so high-frequency jitter does not accumulate with distance from the clock source. Unlike earlier work on multiple clock domains which suggested the use of multiple independent clocks [8], this approach produces a single fully synchronized clock. The rest of this section examines small and large signal stability of a distributed PLL.

Fig. 3. Low-skew wire with DLL.

Fig. 4. Matching tree leaves with a DLL.

Fig. 5. DLL architecture.

Fig. 6. Multi-input delay cell DLL architecture.

Fig. 7. Tile number optimization.

Fig. 8. Distributed clocking network.

A. Small Signal
In a multiple-oscillator PLL large-signal and small-signal behavior are interrelated. In normal operation, the oscillators are phase-locked, and jitter depends on the network response to noise. Because startup is expected to take a negligibly small fraction of time, the connection of the oscillators is optimized for small-signal behavior rather than to make initial acquisition more efficient. The linearized small-signal behavior, valid when the oscillators are nearly in phase, is analyzed first.

B. General Derivation
The block diagram (Fig. 9) of a multiple-oscillator PLL is essentially identical to the one for a conventional PLL, except that the connections between blocks are vectors instead of individual signals, and the gains and transfer functions are matrices instead of scalars. This means that the PD becomes matrix ,
of size , and the loop filter becomes , a corresponding matrix. is an intuitively meaningful matrix. The network of oscillators is similar to a lumped circuit with a node for each oscillator and a branch for each connection between pairs of oscillators. Node voltages in represent oscillator phase, and branch currents represent the error signals on the output of the PD. is the conductance matrix for with unity conductance branches. for a four-oscillator network is shown in (1). Each off-diagonal entry is 1 if there is a PD between node and node ; is the number of detectors attached to node .

(1)

DC gain in the loop can be lumped into . Writing the transfer function in matrix form gives

(2)

where is the phase error input to each phase comparator. is the reference phase, and are the noise contributions from interconnect and PD mismatch.

Fig. 9. Linear system model of a multi-oscillator PLL.

C. Examples
Matrix is determined by the geometry of the tiles, and hence will be constrained by the placement of clock loads, which for this problem is fixed. Assuming the simplest possible PLL, . This leaves , , and as design variables. There are still far too many choices to find the general optimum, but a few examples may help guide the search.
1) One-Dimensional Array: A one-dimensional array of oscillators with PDs between neighbors is the simplest generalization of a single PLL. In a perfectly asymmetrical array (call this system ), the output of PLL is the input to PLL , as shown in Fig. 10. is described by

(3)

This system has multiple poles at the same place where a single-oscillator PLL has single poles.
On the other hand, in a perfectly symmetrical array (call it ), the input to each oscillator is the phase of oscillators and (Fig. 10, with the dotted-line connections). The matrix is the same because the physical arrangement of nodes is identical, but changes:

(4)

To achieve the same phase margin in as in , it is necessary to lower the gain . This can be shown with a geometrical argument: in , when the phase of oscillator changes by , the change is measured at two PDs, so oscillator feels twice the feedback that it would have felt in , and at the same time, oscillators and both adjust in the opposite direction, giving four times the effective gain. Hence, the gain must be decreased by a factor of approximately four. Mathematically, the largest eigenvalue of is 1, but the largest eigenvalue of is 3.5. Poles of the symmetrical system, solved via (2), are plotted in Fig. 12(a). The key difference between and is the systems’ response to noise. In both cases, noise at frequencies higher than the unity gain frequency is attenuated. For frequencies much lower than , the response can be calculated via (2). Fig. 11 shows a Bode plot of noise at node in response to a noise source at node . Noise performance of is much worse for intermediate frequencies because there is no feedback so errors propagate forever. In , the feedback limits the influence of preceding stages, and this in turn attenuates noise. For this reason, networks with feedback are preferred, despite the more complicated stability calculation.

Fig. 10. One-dimensional PLL array; symmetrical with the dotted-line connections.

2) Two-Dimensional Array: A two-dimensional array is analyzed exactly the same as is a one-dimensional array, except that the gain has to decrease by another factor of two because the center oscillators see four neighbors rather than two. A 16-element array in a 4 × 4 grid is implemented in this thesis. Its poles are shown in Fig. 12(b).
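
The eigenvalue comparison in the examples above can be illustrated numerically. The sketch below is not the authors' formulation: the matrices in (1), (3), and (4) did not survive in this text, so it assumes the standard graph Laplacian (off-diagonal −1 per PD, diagonal equal to the number of attached PDs) as the "conductance matrix with unity conductance branches", and a lower-triangular matrix for the one-way cascade. All function names are hypothetical.

```python
import math
import random


def add_branch(g, i, j):
    """Add one unity-conductance branch (one PD) between nodes i and j."""
    g[i][i] += 1.0
    g[j][j] += 1.0
    g[i][j] -= 1.0
    g[j][i] -= 1.0


def line_conductance(n):
    """Assumed symmetric 1-D array: n nodes in a line, PD between neighbours."""
    g = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        add_branch(g, i, i + 1)
    return g


def grid_conductance(rows, cols):
    """Assumed symmetric 2-D array: PD between horizontal and vertical neighbours."""
    n = rows * cols
    g = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            k = r * cols + c
            if c + 1 < cols:
                add_branch(g, k, k + 1)
            if r + 1 < rows:
                add_branch(g, k, k + cols)
    return g


def cascade_matrix(n):
    """Assumed one-way chain: node i compares itself only with node i-1."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0
        if i > 0:
            m[i][i - 1] = -1.0
    return m


def largest_eigenvalue(sym, iters=3000, seed=0):
    """Power iteration; adequate here because sym is symmetric and PSD."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in sym]
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in sym]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = [sum(a * x for a, x in zip(row, v)) for row in sym]
    return sum(x * y for x, y in zip(v, w))  # Rayleigh quotient


# A triangular matrix has its eigenvalues on the diagonal.
cascade_eigs = [row[i] for i, row in enumerate(cascade_matrix(4))]
line_max = largest_eigenvalue(line_conductance(4))
grid_max = largest_eigenvalue(grid_conductance(4, 4))

if __name__ == "__main__":
    print("one-way cascade eigenvalues:", cascade_eigs)
    print("4-node line, largest eigenvalue: %.3f" % line_max)
    print("4 x 4 grid, largest eigenvalue: %.3f" % grid_max)
```

Under these assumptions the one-way cascade has every eigenvalue equal to 1, the four-node line gives 2 + √2 (about 3.41, in the neighbourhood of the 3.5 quoted above, whose exact network is not recoverable from this text), and the 4 × 4 grid gives twice that, consistent with the extra factor-of-two gain reduction described for the two-dimensional case.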
Fig. 11. Comparison of noise responses for symmetrical and asymmetrical networks.

Fig. 12. Root locus for 1-D and 2-D PLL arrays. (a) 1-D array. (b) 2-D array.

D. Large Signal: Mode Locking
The analysis of the previous section indicates that fully connected networks should have a better noise response than asymmetrical networks. However, the feedback allows the possibility of undesirable large-signal modes. Consider the matrices for a PLL network:

(5)

Because phase is periodic with period 2π, the phase measured at the PDs . For small , , so the nonlinearity is irrelevant. However, with

(6)

so is a stationary point. This is intuitively easy to see, in reference to Fig. 13: each oscillator leads one neighbor, and lags behind another neighbor by exactly the same amount. The net phase error is zero, so clearly there is no restoring force to drive the phases to 0. Because the nonlinearity does not change for small deviations from , dynamics about are the same as those about 0 and hence this state is stable. The locking of a distributed oscillator to nonzero relative phases has been called mode-locking [9]. At startup, each oscillator in a distributed PLL starts at a random phase, so there is a nonzero chance of converging to a mode-locked state. Simulations show that for a network like the one shown here, the system ends mode-locked from of random initial states. The probability goes up rapidly with the size of the system; a array ends up mode-locked well over 99% of the time.

Fig. 13. Mode-locking example.

Pratt and Nguyen proved several useful properties about systems in mode-lock [9]. The key result, generalized for non-Cartesian networks, is that for a system in mode-lock, there must be a phase difference between two oscillators such that where is the number of nodes in the largest minimal loop in the network and a minimal loop is a loop in the graph that cannot be decomposed into multiple loops.
This result suggests a way to distinguish between mode-locked states and the desired 0-phase state: in mode-lock, there must be at least one branch with a large phase error. If the gain of the PD is designed to be negative for a phase difference larger than , then all mode-locked states are made unstable without affecting the in-phase equilibrium. Pratt and Nguyen suggest that XOR PDs preclude mode-lock in a rectangular network of oscillators because the response decreases for phase errors larger than , [9]. This result follows directly from the result derived above: in a rectangular array, the largest minimal loop has four nodes, so . A PD described in the next section, with , would be useful in nonrectangular networks, and where more gain near 0 phase is desirable.
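
The mode-locking argument above can be illustrated with a toy simulation. The sketch below is not the network or the phase detector of the paper: it assumes a ring of three first-order oscillators (each phase simply integrates the summed PD outputs of its two neighbours), an idealised linear sawtooth PD, and an idealised triangular, XOR-like PD whose slope turns negative beyond π/2. The ring starts near the state in which neighbours differ by 2π/3; all names and parameter values are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap(x):
    """Map a phase difference into [-pi, pi)."""
    return (x + math.pi) % TWO_PI - math.pi


def sawtooth_pd(x):
    """Idealised linear PD: output equals the wrapped phase error."""
    return wrap(x)


def triangular_pd(x):
    """Idealised XOR-like PD: slope +1 up to pi/2, slope -1 beyond it."""
    x = wrap(x)
    if abs(x) <= math.pi / 2:
        return x
    return math.copysign(math.pi - abs(x), x)


def simulate_ring(pd, n=3, steps=20000, dt=0.01, gain=1.0):
    """Euler-integrate d(phase_i)/dt = gain * sum of PD outputs to both neighbours.

    The ring starts close to the twisted state phase_k = 2*pi*k/n, with a
    small asymmetric nudge. Returns each phase relative to node 0, wrapped.
    """
    nudge = [0.05, -0.02, 0.0]
    phase = [TWO_PI * k / n + nudge[k % 3] for k in range(n)]
    for _ in range(steps):
        updated = []
        for i in range(n):
            left = phase[(i - 1) % n]
            right = phase[(i + 1) % n]
            error = pd(left - phase[i]) + pd(right - phase[i])
            updated.append(phase[i] + dt * gain * error)
        phase = updated
    return [wrap(p - phase[0]) for p in phase]


if __name__ == "__main__":
    print("sawtooth PD  :", simulate_ring(sawtooth_pd))
    print("triangular PD:", simulate_ring(triangular_pd))
```

With the sawtooth PD the nudge decays and the ring stays in the twisted state, each oscillator leading one neighbour and lagging the other by the same amount. With the triangular PD the same starting point is unstable, because the branch errors of 2π/3 sit on the negative-slope part of the characteristic, and the ring settles to the in-phase state instead.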

IV. IMPLEMENTATION
The distributed clock network generates the clock signal with PLLs at multiple points (“nodes”) across a chip, and distributes each only to a small section of the chip (“tile”) (Fig. 8). PDs at the boundaries between tiles produce error signals that are summed by an amplifier in each tile and used to adjust the frequency of the node oscillator.
Because the proposed network has many nodes, the power and size constraints on each node are even more stringent than the constraints on a single global PLL. The oscillator, PD, and loop filter of a working demonstration chip, fabricated in a standard 0.35-µm single-poly triple metal process, are considered in turn below.

A. Oscillator
The demonstration chip used an nMOS-loaded differential ring oscillator as a voltage-controlled oscillator (VCO) (Fig. 14). Transistors comprise the differential inverter. The differential pair is , the tail current is driven by , and act as the nMOS load. The nMOS loads allow fast oscillation and shield the output signal from noise. is a low-pass version of generated by subthreshold leakage through PFET ; supply noise coupling in through of is bypassed by . The oscillation frequency is only dependent on the supply voltage through capacitor nonlinearity and the output conductance of , and feedback of the PLL compensates drift of and .

Fig. 14. Ring oscillator schematic.

B. Phase Detector (PD)
The PD, shown in Fig. 15, has a sufficient nonlinearity, higher gain at small input phase difference and less high-frequency content than an XOR PD. The core ( ) is an nMOS-loaded arbiter which acts as a nonlinear PD. For no input phase difference, the output is balanced. As the phase difference increases from zero, one output will be asserted for the full duration of an input pulse, while the other output will be asserted for only the remainder of the input pulse duration after the first input pulse ends, which is equal to the input phase difference. Thus the detector has very high gain near zero phase error that drops off to zero as the input phase difference approaches the input pulse width (Fig. 16).

Fig. 15. Phase detector (PD).

Fig. 16. Simulated phase transfer curve.

Fig. 17. Locking behavior of the PLL array.

The pulse generators and enable this arbiter to give frequency-error feedback. If one input is at a higher frequency than the other, its output will be asserted for more input pulses than the other. Because the width of the pulses is independent of input frequency, the average output voltage corresponds to frequency. Unlike a typical phase-frequency detector, however, the strength of the error signal falls to zero as frequency difference goes to 0, so there can be no mode-lock problems, yet large signal frequency (and hence, phase) locking is enhanced. Fig. 17 shows the large signal correction and small signal behavior of the entire array of PLLs as the already internally locked array
approaches and locks to the reference clock. The detector fits in m m.

C. Loop Filter
The loop filter is shown in Fig. 18. make up amplifier , while make up . The differential output currents from the PDs at the edges of each tile are summed at nodes and , and drive both amplifiers. is a single stage differential pair so it has relatively low gain but a bandwidth limited by . has a high-gain cascoded stage driving a common source PFET . is a large gate capacitor which serves to set the dominant pole of such that the PLL network is stable. is biased at very low current to boost gain and enable a low time constant (as low as 12 kHz) with a m m gate capacitor. The simple design and feed-forward compensation allow the loop filter to fit in only m m. Each clock node, consisting of an oscillator and a loop filter, takes just m m.

Fig. 18. Loop filter schematic.

V. RESULTS
A chip was fabricated with a 4 × 4 array of nodes and PDs between nearest neighbors. Counting one node and two PDs, the area overhead is approximately 0.0038 mm² per tile. Another PD was placed between one of the nodes and the chip clock input to lock the network to an external reference. The output of the 16 oscillators was divided by 64 and driven off chip. At V, the divided outputs were seen to be frequency locked at 17 to 21 MHz, corresponding to oscillator phase lock at 1.1 to 1.3 GHz. An oscilloscope plot of four locked output signals is shown in Fig. 19. Long-term jitter between neighbors is less than 30 ps. Cycle-to-cycle jitter is less than 10 ps. The oscillators, amplifiers and all the biasing draw 130 mA at 3 V. A chip plot is shown in Fig. 20. (The rest of the area on the mm mm chip is taken up by test circuits.)

Fig. 19. Frequency-locked divider outputs.

Fig. 20. Micrograph of the 16-oscillator 1.3-GHz chip.

VI. CONCLUSION
Design and measurements on this chip confirm that generating and synchronizing multiple clocks on chip is feasible. Neither the power nor the area overhead of multiple PLLs is substantial compared to the cost of distributing the clock by conventional means. Most importantly, a distributed clock network can take advantage of improved devices by shrinking the size of the cells, lowering the overall skew and jitter, so performance will scale with the speed of devices, rather than with the much slower improvement of on-chip interconnect speed.
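
The figures quoted in Sections I and V can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text (divide-by-64 outputs of 17 to 21 MHz, 130 mA at 3 V, the Alpha clock power breakdown, and the 0.43-ns wire delay against a 0.25-ns period); the variable names are hypothetical and no new measurements are implied.

```python
DIVIDE_RATIO = 64  # off-chip outputs are the oscillator outputs divided by 64


def oscillator_ghz(divided_mhz):
    """Oscillator frequency in GHz implied by a divided output in MHz."""
    return divided_mhz * DIVIDE_RATIO / 1000.0


# Section V: divided outputs locked at 17 to 21 MHz.
osc_low_ghz = oscillator_ghz(17)
osc_high_ghz = oscillator_ghz(21)

# Section V: oscillators, amplifiers and biasing draw 130 mA at 3 V.
supply_power_w = 0.130 * 3.0

# Section I: 16 W global plus 23 W local clock power in a 72-W processor.
clock_power_fraction = (16.0 + 23.0) / 72.0

# Section I: 0.43-ns buffered wire delay against a 0.25-ns (4-GHz) period.
wire_delay_cycles = 0.43 / 0.25

if __name__ == "__main__":
    print("oscillator range: %.3f to %.3f GHz" % (osc_low_ghz, osc_high_ghz))
    print("clock network power: %.2f W" % supply_power_w)
    print("Alpha clock power fraction: %.1f%%" % (100 * clock_power_fraction))
    print("cross-chip wire delay at 4 GHz: %.2f cycles" % wire_delay_cycles)
```

The divided outputs correspond to roughly 1.09 to 1.34 GHz, in line with the rounded 1.1 to 1.3 GHz quoted above, and the clock share of the Alpha power budget comes out just over half, as stated in the introduction.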
REFERENCES
[1] D. W. Bailey and B. J. Benschneider, “Clocking design and analysis for a 600-MHz Alpha microprocessor,” J. Solid State Circuits, vol. 33, no. 11, pp. 1627–1633, Nov. 1998.
[2] C. F. Webb, “A 400-MHz S/390 microprocessor,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 168–169.
[3] T. Yoshida, “A 2-V 250-MHz multimedia processor,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 266–267.
[4] I. A. Young, M. F. Mar, and B. Bhushan, “A 0.35-µm CMOS 3-880-MHz PLL N/2 clock multiplier and distribution network with low jitter for microprocessors,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 330–331.
[5] H. B. Bakoglu, J. T. Walker, and J. D. Meindl, “A symmetric clock-distribution tree and optimized high-speed interconnections for reduced clock skew in ULSI and WSI circuits,” in IEEE Int. Conf. Computer Design, NY, Oct. 1986, pp. 118–122.
[6] P. Zarkesh-Ha, T. Mule, and J. D. Meindl, “Characterization and modeling of clock skew with process variations,” in Proc. IEEE 1999 Custom Integrated Circuits Conf., pp. 441–444.
[7] G. Geannopoulos and X. Dai, “An adaptive digital deskewing circuit for clock distribution networks,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 400–401.
[8] F. Ançeau, “A synchronous approach for clocking VLSI systems,” J. Solid State Circuits, vol. SC-17, no. 1, pp. 51–56, Feb. 1982.
[9] G. A. Pratt and J. Nguyen, “Distributed synchronous clocking,” IEEE Trans. Parallel and Distributed Systems, Mar. 1995.

Vadim Gutnik (M’00) received the B.S. degree in electrical engineering and materials science from the University of California, Berkeley, in 1994, and the S.M. and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1996 and 2000, respectively.
Previous research interests have included micromechanical resonators, and variable-voltage power supplies. He is currently working as a Design Engineer at Silicon Laboratories, Austin, TX.
Dr. Gutnik received an NDSEG fellowship in 1994, and the Intel Foundation Fellowship in 1997.

Anantha P. Chandrakasan (M’95) received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer sciences from the University of California, Berkeley, in 1989, 1990, and 1994, respectively.
Since September, 1994, he has been the Analog Devices Career Development Assistant Professor of electrical engineering at the Massachusetts Institute of Technology, Cambridge. His research interests include the ultra-low-power implementation of custom and programmable digital signal processors, wireless sensors and multimedia devices, emerging technologies, and CAD tools for VLSI. He is a co-author of the book titled Low Power Digital CMOS Design (Norwood, MA: Kluwer, 1995). He has served on the technical program committee of various conferences including ISSCC, VLSI Circuits Symposium, DAC, ISLPED, and ICCD. He is the Technical Program Co-Chair for the 1997 International Symposium on Low-Power Electronics and Design and for VLSI Design’98.
He received the National Science Foundation Career Development Award in 1995, the IBM Faculty Development Award in 1995, and the National Semiconductor Faculty Development Award in 1996. He received the IEEE Communications Society 1993 Best Tutorial Paper Award for the IEEE Communications Magazine paper titled, “A Portable Multimedia Terminal.”
ISSCC 2000 / SESSION 10 / CLOCK GENERATION AND DISTRIBUTION / PAPER TA 10.5

TA 10.5 Active GHz Clock Network using Distributed PLLs

Vadim Gutnik, Anantha Chandrakasan

MIT Microsystems Technology Lab, Cambridge, MA

Most modern microprocessors use a balanced tree to distribute the clock [1]. However, at gigahertz clock speeds an increasing fraction of skew and jitter comes from random variations in gate and interconnect delay. The majority of jitter in a clock tree is introduced by buffers and inter-line coupling to the clock wires. A relatively small amount comes from noise in the source oscillator [2].

This distributed clock network generates the clock signal with phase locked loops (PLLs) at multiple points (nodes) across a chip, and distributes each only to a small section of the chip (tile) (Figure 10.5.1). Phase detectors (PD) at the boundaries between tiles produce error signals that are summed by an amplifier in each tile and used to adjust the frequency of the node oscillator.

With locally-generated clocks, there are no chip-length clock lines to couple in jitter; skew is introduced only by asymmetries in phase detectors instead of mismatches in physically separated buffers; and the clock is regenerated at each node, so high frequency jitter does not accumulate with distance from the clock source. Unlike earlier work on multiple clock domains which suggests use of multiple independent clocks, this approach produces a single fully-synchronized clock. This arbitrary network of tiles, each with its own PLL, is more general than previous active skew management approaches [3].

However, because there are many nodes, the power and size constraints on each element of a distributed clock generation architecture are even more stringent than the constraints on a single, global PLL. Furthermore, there must be a way to ensure that the multiple nodes get and stay synchronized. The oscillator, phase detector, and loop filter of a working demonstration chip, fabricated in a standard 0.35µm, single-poly triple-metal process, are considered in turn below.

This chip uses an nMOS-loaded differential ring oscillator as a voltage-controlled oscillator (VCO) to minimize power supply noise (Figure 10.5.2). Transistors M1 - M7 comprise the differential inverter. The differential pair is M2,3, the tail current is driven by M1 and M4:7 act as the nMOS load. The nMOS loads allow fast oscillation and shield the output signal from VDD noise. Vbias is a low-pass version of VDD generated by subthreshold leakage through pFET M8; supply noise coupling in through Cgd of M4,7 is bypassed by M9. The oscillation frequency is dependent on the supply voltage only through capacitor nonlinearity and the output conductance of M4:7, and feedback of the PLL compensates drift of VDD and Vbias.

Because phase is periodic, a set of oscillators need not all be in phase to each have zero net phase error. This phenomenon, modelock, is described in Reference [4], which notes that modelock can be avoided by using PDs whose response decreases monotonically beyond a phase difference of π/2, as from an XOR phase detector. Note that this solution precludes the use of a phase-frequency detector (PFD). Lack of a PFD is problematic because the capture bandwidth of a memoryless PD is limited to a few percent of the center frequency, while the center frequencies of widely-spaced oscillators on a chip can easily vary by 10-20%.

The PD proposed here, shown in Figure 10.5.3, has sufficient nonlinearity, higher gain at small input phase difference and less high-frequency content than an XOR PD. The core (M1 - M[?]) is an nMOS-loaded arbiter which acts as a nonlinear phase detector. For no input phase difference, the output is balanced. As the phase difference increases from zero, one output is asserted for the full duration of an input pulse, while the other output is asserted for only the remainder of the input pulse duration after the first input pulse ends, which is equal to the input phase difference. Thus the detector has high gain near zero phase error that drops off to zero as the input phase difference approaches the input pulse width (Figure 10.5.4).

The pulse generators P1 and P2 enable this arbiter to give frequency error feedback. If one input is at a higher frequency than the other, its output will be asserted for more input pulses than the other. Because the width of the pulses is independent of input frequency, the average output voltage corresponds to frequency. Unlike a typical phase-frequency detector, however, the strength of the error signal falls to zero as frequency difference goes to 0, so there can be no modelock problems, yet large signal frequency- (and hence, phase-) locking is enhanced. Figure 10.5.5 shows the large-signal correction and small-signal behavior of the entire array of PLLs as the already internally-locked array approaches and locks to the reference clock. The PD fits in 30x30µm2.

One loop filter is associated with each VCO. To avoid the series resistor of a charge pump with passive RC compensation, a feed-forward compensation method is used. The loop filter of Figure 10.5.6 consists of two differential amplifiers. M1 - M8 make up amplifier A1, while M9 - M17 make up A2. The differential output currents from the PDs at the edges of each tile are summed at nodes In+ and In-, and drive both amplifiers. A1 is a single-stage differential pair so it has relatively low gain but a bandwidth limited by gm/C. A2 has a high gain cascoded stage driving a common source pFET M17. M18 is a large gate capacitor which serves to set the dominant pole of A2 such that the PLL network is stable. M17 is biased at low current to boost gain and enable time constant as low as 12kHz with a 15x15µm2 gate capacitor. The simple design and feed-forward compensation allow the loop filter to fit in only 15x45µm2. Each clock node, consisting of an oscillator and a loop filter, takes just 45x45µm2.

A chip was fabricated with a 4x4 array of nodes and PD between nearest neighbors. Counting one node and two PDs, the area overhead is approximately 0.0038mm2 per tile. Another PD is between one of the nodes and the chip clock input to lock the network to an external reference. The output of the 16 oscillators is divided by 64 and driven off chip. At VDD = 3V, the divided outputs are frequency-locked at 17 to 21MHz, corresponding to oscillator phase lock at 1.1 to 1.3GHz. An oscilloscope plot of four locked output signals is shown in Figure 10.5.7.

Long-term jitter between neighbors is less than 30ps rms. Cycle-to-cycle jitter is less than 10ps. The oscillators, amplifiers and all the biasing draws 130mA at 3V. A chip plot is shown in Figure 10.5.8. The rest of the area on the 3x3mm2 chip is taken up by test circuits.

Design and measurement on this chip confirm that generating and synchronizing multiple clocks on chip is feasible. Neither the power nor the area overhead of multiple PLLs is substantial compared to the cost of distributing the clock by conventional means. Most importantly, a distributed clock network can take advantage of improved devices by shrinking the size of the cells, lowering the overall skew and jitter, so performance will scale with device speed, rather than with the much slower improvement of on-chip interconnect speed.

Acknowledgments:

The authors acknowledge support from the MARCO Focused Research Center on Interconnects funded at MIT through a subcontract from Georgia Tech. Vadim Gutnik was partly supported by a graduate fellowship from Intel Corp.
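The reported chip figures can be cross-checked with a little arithmetic. The sketch below is only a consistency check, reading the node as 45x45µm2 and each PD as 30x30µm2, with a divide-by-64 output stage; the function names are illustrative and do not come from the paper.

```python
# Consistency check on the reported chip figures (values as read from the text).

def tile_overhead_mm2(node_side_um, pd_side_um, pds_per_tile):
    """Area of one clock node plus its phase detectors, in mm^2."""
    area_um2 = node_side_um ** 2 + pds_per_tile * pd_side_um ** 2
    return area_um2 / 1e6  # 1 mm^2 = 1e6 um^2


def oscillator_ghz(divided_output_mhz, divide_ratio):
    """Oscillator frequency implied by a divided-down output, in GHz."""
    return divided_output_mhz * divide_ratio / 1e3


# One node and two PDs per tile, as counted in the text.
overhead = tile_overhead_mm2(node_side_um=45, pd_side_um=30, pds_per_tile=2)
# Divided outputs of 17 to 21 MHz after a divide-by-64.
f_low = oscillator_ghz(17, 64)
f_high = oscillator_ghz(21, 64)

print(f"area overhead per tile: {overhead:.4f} mm^2")
print(f"oscillator range: {f_low:.2f} to {f_high:.2f} GHz")
```

Under these readings the overhead comes to roughly 0.0038mm2 per tile and the oscillator range rounds to 1.1 to 1.3GHz, which agrees with the numbers quoted in the text.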

174 2000 IEEE International Solid-State Circuits Conference 0-7803-5853-8/00/$10.00 ©2000 IEEE
ISSCC 2000 / February 8, 2000 / Salon 9 / 10:45 AM

References:

[1] Bailey, D. W. and B. J. Benschneider, “Clocking design and analysis for a 600 MHz Alpha microprocessor,” Journal of Solid State Circuits, vol. 33, no. 11, pp. 1627-1639, November 1998.
[2] Young, I. A., M. F. Mar, and B. Bhushan, “A 0.35µm CMOS 3-880MHz PLL N/2 clock multiplier and distribution network with low jitter for microprocessors,” in ISSCC Digest of Technical Papers, February 1997, pp. 330-331.
[3] Geannopoulos, G. and X. Dai, “An adaptive digital deskewing circuit for clock distribution networks,” in ISSCC Digest of Technical Papers, February 1998, pp. 400-401.
[4] Pratt, G. A. and J. Nguyen, “Distributed synchronous clocking,” IEEE Transactions on Parallel and Distributed Systems, February 1995.

Figure 10.5.1: Distributed clocking network.
Figure 10.5.2: Ring oscillator schematic.
Figure 10.5.3: Phase detector.
Figure 10.5.4: Simulated PD transfer curve. (Horizontal axis: time difference, nanoseconds.)
Figure 10.5.5: Locking behavior of the PLL array. (Horizontal axis: simulation time, microseconds.)
Figure 10.5.6: Loop filter schematic.
Figure 10.5.7: See page 454.
Figure 10.5.8: See page 455.

DIGEST OF TECHNICAL PAPERS 175


ISSCC 2000 PAPER CONTINUATIONS

Figure 10.1.5: Clock and data recovery (CDR) with MPM.

[Plot: Simulation @ 3.1 GHz, Simplified Formula @ 3.1 GHz, and Complete Formula @ 3.1 GHz, versus spacing between signal line and ground line.]

Figure 10.4.8: Micrograph. Figure 10.4.R: Wire inductance formulae.

Figure 10.4.7: Wire inductance with substrate effect.

454 2000 IEEE International Solid-State Circuits Conference 0-7803-5853-8/00/$10.00 ©2000 IEEE


Figure 10.5.8: Distributed clock chip.

Figure 11.1.7: Range finder ASIC in 0.8µm CMOS and packaged transmitter-receiver. Core area of chip is 1x1.67mm2.

Figure 11.3.4: 130 channel prototype micrograph. Chip is 2x8mm2.

Figure 11.3.8: Measured prototype characteristics.

Figure 11.2.7: Die micrograph.

DIGEST OF TECHNICAL PAPERS 455


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 3, MARCH 2000 377

An All-Analog Multiphase Delay-Locked Loop Using
a Replica Delay Line for Wide-Range Operation and
Low-Jitter Performance
Yongsam Moon, Student Member, IEEE, Jongsang Choi, Kyeongho Lee, Member, IEEE,
Deog-Kyoon Jeong, Member, IEEE, and Min-Kyu Kim

Abstract—This paper describes an all-analog multiphase delay-locked loop (DLL) architecture that achieves both wide-range operation and low-jitter performance. A replica delay line is attached to a conventional DLL to fully utilize the frequency range of the voltage-controlled delay line. The proposed DLL keeps the same benefits of conventional DLL's such as good jitter performance and multiphase clock generation. The DLL incorporates dynamic phase detectors and triply controlled delay cells with cell-level duty-cycle correction capability to generate equally spaced eight-phase clocks. The chip has been fabricated using a 0.35-µm CMOS process. The peak-to-peak jitter is less than 30 ps over the operating frequency range of 62.5–250 MHz. At 250 MHz, its jitter supply sensitivity is 0.11 ps/mV. It occupies smaller area (0.2 mm2) and dissipates less power (42 mW) than other wide-range DLL's [2]–[7].

Index Terms—Delay-locked loop, duty-cycle correction, dynamic phase detector, multiphase clock generation, replica delay line, triply controlled delay cell.

Manuscript received July 20, 1999; revised October 6, 1999.
Y. Moon, J. Choi, and D.-K. Jeong are with the School of Electrical Engineering, Seoul National University, Seoul 151-742 Korea (e-mail: ysmoon@griffin.snu.ac.kr).
K. Lee was with the School of Electrical Engineering, Seoul National University, Seoul 151-742 Korea. He is now with Global Communication Technology Inc., Los Altos, CA 94024 USA.
M.-K. Kim is with Silicon Image, Inc., Sunnyvale, CA 94086 USA.
Publisher Item Identifier S 0018-9200(00)00538-2.
0018–9200/00$10.00 © 2000 IEEE

I. INTRODUCTION

AS THE SPEED performance of VLSI systems increases rapidly, more emphasis is placed on suppressing skew and jitter in the clocks. Phase-locked loops (PLL's) and delay-locked loops (DLL's) have been typically employed in microprocessors, memory interfaces, and communication IC's for the generation of on-chip clocks. However, it becomes increasingly difficult to reduce the clock skew and jitter, whether they are inherent or result from substrate and supply noise, as the clock speed and circuit integration levels are increased.

While the phase error of PLL's is accumulated and persists for a long time in a noisy environment, that of DLL's is not accumulated, and thus, the clock generated from DLL's has lower jitter. Therefore, DLL's offer a good alternative to PLL's in cases where the reference clock comes from a low-jitter source, although their usage is excluded in applications where frequency tracking is required, such as frequency synthesis and clock recovery from an input signal. However, the main problem of conventional DLL's [1] is that they are very difficult to design to work over process, voltage, and temperature (PVT) variations. Since DLL's adjust only phase, not frequency, the operating frequency range is severely limited. We propose a new DLL architecture that operates in a wide frequency range while keeping the low-jitter performance.

Various wide-range DLL architectures [2]–[7], with similar motivations, have been developed, which can be classified into three categories: analog type [2], digital type [3], [4], and dual-loop type [5]–[7]. While a conventional analog DLL [1] uses a voltage-controlled delay line (VCDL), the wide-range analog DLL [2] uses phase mixers for wide-range operation. However, because of its relatively high analog complexity, the analog DLL requires a process-specific implementation, making it relatively difficult to port across multiple processes [4]. Thus, digital DLL's [3], [4] have been proposed for better process portability. However, skew error and jitter are increased due to continuous change of phase selections among quantized delay times with supply and temperature variations. To overcome these problems, dual-loop architectures have been proposed [5]–[7]. In [5], a PLL is added to make the core DLL lock to a reference frequency, and a phase mixer interpolates two intermediate clocks in the core DLL and produces the final output clock. Or, almost continuous phase is obtained with addition of a fine delay line [6] or a phase interpolator [7] to a digital DLL. However, additional chip area and power consumption of these wide-range DLL's are excessive, and furthermore, their jitter performance gets worse compared with conventional DLL's since the number of delay cells or gates in the clock propagation paths becomes larger.

We propose a new DLL architecture that achieves a large operating range by attaching a replica delay line in parallel with a conventional analog DLL. Since the replica delay line occupies one-fourth the area of the core DLL, it incurs only a small increase in chip area and power consumption. Since the replica delay line is out of the clock propagation path, it does not do any harm on low-jitter performance. While other wide-range DLL's [2]–[7] use phase mixers or phase selections to generate a single output, the proposed DLL uses a similar multistage analog VCDL to what conventional analog DLL's use. Therefore, the proposed DLL can generate multiphase clocks without using excessive amount of hardware. Furthermore, by incorporating a dynamic phase detection circuit and cell-level duty cycle correction method, the multiphase clocks are equally spaced even in high-frequency operations. A prototype DLL designed for eight-phase clocks can be used in applications such as gigabit serial interfaces [8], [9].

Fig. 1. Block diagram of (a) a conventional DLL and (b) a DLL locking operation and operating frequency range limitation.

Fig. 2. Block diagrams of (a) a digital DLL and (b) a dual-loop DLL.

This paper is arranged as follows. Section II describes a conventional analog DLL and includes an analysis of its operational frequency range. This section also overviews other wide-range DLL architectures. In Section III, the proposed architecture is presented with design ideas, issues, and various analyses. Section IV describes various circuits used in the design. Section V discusses the prototype chip implementation and shows experimental results. Section VI concludes this paper with a summary.

II. CONVENTIONAL ARCHITECTURES

A. Range Problem of Conventional DLL's

A simplified block diagram of a conventional DLL [1] is outlined with its operation mechanism in Fig. 1. When the delay time of the VCDL, $T_{VCDL}$, is initially smaller (or larger) than the period of the reference clock (Ref-CLK), $T_{CLK}$, the DLL adjusts $T_{VCDL}$ until the phase difference disappears in a negative feedback loop, as shown in Fig. 1(b). The phase difference is detected by sampling the reference clock with the rising edge of the output clock (DLL-CLK). Depending on the sampled value, a DOWN or UP pulse is generated. These pulses discharge (or charge) a capacitor in the loop filter, thereby decreasing (or increasing) the control voltage $V_C$ and reducing the phase difference gradually.

However, if the sampling edge of DLL-CLK deviates from the lock range indicated in Fig. 1(b), the DLL falls prey to a stuck or a harmonic lock problem. In order to avoid this problem, the minimum of $T_{VCDL}$ should be located between $0.5 \times T_{CLK}$ and $T_{CLK}$, and the maximum between $T_{CLK}$ and $1.5 \times T_{CLK}$. These stuck-free conditions can be expressed as the following inequality:

$$0.5\,T_{CLK} < T_{VCDL,min} < T_{CLK} < T_{VCDL,max} < 1.5\,T_{CLK} \qquad (1)$$

or, equivalently, in terms of $T_{CLK}$

$$\mathrm{Max}\left(T_{VCDL,min},\ \tfrac{2}{3}T_{VCDL,max}\right) < T_{CLK} < \mathrm{Min}\left(2\,T_{VCDL,min},\ T_{VCDL,max}\right) \qquad (2)$$

The range of stuck-free clock period is determined by inequality (2). If the target clock period satisfies inequality (2), the DLL works without the stuck problem. However, it should be noted that inequality (2) has the maximum range, $\tfrac{2}{3}T_{VCDL,max} < T_{CLK} < T_{VCDL,max}$ (a span of only 1.5:1), when $1.5\,T_{VCDL,min} \le T_{VCDL,max} \le 2\,T_{VCDL,min}$. In addition, if $T_{VCDL,max} \ge 3\,T_{VCDL,min}$, there is no range of $T_{CLK}$ that satisfies inequality (2), and the DLL is prone to the stuck problem. Since the PVT variations of $T_{VCDL}$ can be as much as 2:1 in a typical CMOS process, the stuck-free condition can be satisfied over only a very narrow range of $T_{CLK}$, and thus a time-consuming and tedious circuit trimming job is required when process migrations are performed across different processes.

Fig. 3. Block diagram of the proposed analog DLL.

Fig. 4. Configuration and operation of a replica delay line.

B. Digital DLL's and Dual-Loop DLL's

Digital DLL's [3], [4] have been developed to overcome the narrow frequency range problem of conventional analog DLL's. A simplified block diagram of a typical digital DLL [3] is outlined in Fig. 2(a). Multistage delay cells in the VCDL provide fixed and quantized delay times. The finite-state machine selects one clock output with closest phase to the reference clock's by using digital control bits instead of using an analog control voltage.

Therefore, major drawbacks in the digital DLL's are large skew due to quantized delay time and large jitter due to control-bit updates during operation. To increase the resolution and cover a wide delay range, a large delay cell array must be used, and that inevitably increases chip area and power consumption. In order to cope with these problems, Garlepp et al. [4] proposed a phase blending technique in a hierarchical structure for improved phase resolution. However, the inherent problems of digital DLL's are not solved entirely.

Dual-loop DLL's [6], [7] have been proposed to minimize the problems of digital DLL's. A simplified block diagram of architecture proposed in [6] is shown in Fig. 2(b). A fine delay line, which is analog controlled, is attached to a digital DLL in the subsequent stage. In [7], a phase interpolator is cascaded to a digital DLL for unlimited phase capture range. These dual-loop architectures achieve both a wide frequency range and relatively low jitter performance. However, due to digital DLL's inherent nature, jitter histogram of the generated clock shows the superposition of two Gaussian distributions [7] resulting from the control-bit updates. In addition, the overhead of chip area and power consumption is significant.
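The stuck-free condition of Section II-A lends itself to a quick numerical check. The following is a minimal sketch, assuming the condition takes the form described there (minimum VCDL delay between 0.5 and 1 clock period, maximum between 1 and 1.5 periods); the function name and the normalized delay values are illustrative assumptions, not part of the paper.

```python
def stuck_free_range(t_vcdl_min, t_vcdl_max):
    """Return the (low, high) bounds on the clock period for which a
    conventional DLL avoids the stuck/harmonic-lock problem, or None if
    no clock period qualifies.

    Assumed conditions: 0.5*T < t_vcdl_min < T and T < t_vcdl_max < 1.5*T,
    rearranged as bounds on the clock period T.
    """
    low = max(t_vcdl_min, 2.0 * t_vcdl_max / 3.0)
    high = min(2.0 * t_vcdl_min, t_vcdl_max)
    return (low, high) if low < high else None


# Delay ranges normalized to the minimum VCDL delay (hypothetical values).
for ratio in (1.5, 2.0, 2.5, 3.0):
    print(ratio, stuck_free_range(1.0, ratio))
```

In this sketch the usable span of clock periods never exceeds 1.5:1, and it vanishes once the VCDL delay range reaches 3:1, which is the narrow-range behavior the text attributes to conventional DLL's.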
Fig. 5. (a) Delay capture range of the replica delay line and (b) gain curve of CSPD.

III. PROPOSED ARCHITECTURE

A. All-Analog DLL Using a Replica Delay Line

Fig. 3 shows a high-level block diagram of the proposed architecture. The delay time $T_{VCDL}$ of the main analog DLL (core DLL) is primarily controlled by a control voltage Vcr, which is generated from a replica delay line. Another control voltage Vcp fine-tunes $T_{VCDL}$. The replica delay line consists of only one replica delay cell, a current steering phase detector (CSPD), and a low-pass filter (LPF). The replica delay cell is identical to the delay cells in the core DLL. Due to sharing of Vcr, the delay time $T_D$ of the replica delay cell is almost equal to the delay time $T_{DC}$ of each delay cell in the core DLL. They are not exactly the same unless Vcp equals bias. Due to the characteristics of the proposed CSPD [10], $T_D$ is forced to be one-eighth of the reference clock period $T_{CLK}$. Therefore, $T_{VCDL}$ of the core DLL becomes equal to $T_{CLK}$ when the number of delay cells in the core DLL is eight. With the replica delay line with a wide frequency range, the core DLL's operating frequency bounds will be established, and thus the core DLL will not fall into such a harmonic lock problem as conventional analog DLL's do. With only a negligible increase in chip area and power consumption, the proposed architecture offers many advantages compared with other wide-range DLL's. Since the DLL is analog controlled and the clock path is not extended, the DLL can keep the low-jitter performance of the conventional DLL. In addition, because it uses a multistage analog VCDL, the proposed DLL can generate multiphase clocks.

B. Delay Capture Range of the Replica Delay Line

Fig. 4 shows the circuit diagram and operation waveforms of the replica delay line. The replica delay line generates a control voltage Vcr to pass to the core DLL. Vcr is used as a reference voltage in the core DLL to lock to the input frequency. The CSPD takes two inputs ICLK and QCLK. ICLK is directly connected to Ref-CLK, and QCLK is delayed from Ref-CLK by one delay cell. $T_D$ is equal to the delay difference between ICLK's and QCLK's rising edges. In the charge pump, the pullup current $I_{UP}$ is tuned to three times the pulldown current $I_{DN}$ ($I_{UP} = 3 I_{DN}$). When the XNOR output is high, for a duration $T_{high}$, the charge on the filter capacitors will be decreased by $I_{DN} \times T_{high}$, and Vcr will go down. On the other hand, when the XNOR output is low, for a duration $T_{low}$, the charge will be increased by $I_{UP} \times T_{low}$, and Vcr will go up three times faster. When the feedback loop is locked, a stable value of Vcr will be obtained with the relation of $I_{DN} \times T_{high} = I_{UP} \times T_{low}$. Therefore, in the locked state, XNOR output has the low-to-high duration ratio of 1:3 ($T_{low} : T_{high} = 1:3$), and the rising edge of ICLK leads that of QCLK by one-eighth of $T_{CLK}$.

Fig. 5(a) shows the capture range of the replica delay line when $I_{UP} = 3 I_{DN}$. This can be derived from the gain curve of the CSPD shown in Fig. 5(b). If $T_D$ is smaller than $\tfrac{1}{8}T_{CLK}$, the change of Vcr, denoted as $\Delta V_{cr}$, will become negative and $T_D$ will increase. That action is indicated in the gain curve as the corresponding arrow pointing to the right. If $T_D$ is between $\tfrac{1}{8}T_{CLK}$ and $\tfrac{7}{8}T_{CLK}$, $\Delta V_{cr}$ will be positive and the corresponding arrow points to the left. Eventually, $T_D$ will be settled at a phase difference of $\tfrac{1}{4}\pi$, which represents $\tfrac{1}{8}T_{CLK}$. However, if $T_D$ is larger than $\tfrac{7}{8}T_{CLK}$, the settling point will run away and a harmonic lock problem will occur.
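The lock point and the runaway threshold of the replica delay line follow from a charge balance over one clock period. The sketch below is a simplified model under stated assumptions: ideal 50% duty-cycle inputs, an ideal XNOR gate, delay expressed as a fraction of the clock period, and a pump-current ratio k (pullup over pulldown). The function names and the symbol k are illustrative and not taken from the paper.

```python
def xnor_low_fraction(delay):
    """Fraction of one clock period for which the XNOR of two 50% duty
    clocks, offset by `delay` (in periods, 0..1), is low."""
    d = delay % 1.0
    return 2.0 * min(d, 1.0 - d)


def net_charge(delay, k=3.0):
    """Net charge per period delivered to the loop filter, normalized to
    (pulldown current x clock period). Positive raises Vcr."""
    low = xnor_low_fraction(delay)
    return k * low - (1.0 - low)


def lock_points(k=3.0):
    """Delays (in periods) where the net charge is zero: the stable lock
    point and the upper, unstable edge of the capture range."""
    stable = 1.0 / (2.0 * (k + 1.0))
    return stable, 1.0 - stable


def capture_ratio(k=3.0):
    """Ratio of the unstable edge to the stable lock point."""
    stable, unstable = lock_points(k)
    return unstable / stable


for k in (3.0, 4.0, 5.0):
    print(k, lock_points(k), round(capture_ratio(k), 6))
```

With k = 3 the model places the stable point at one-eighth of the period and the runaway threshold at seven-eighths, a 7:1 span. In this model the span grows as 2k + 1 when the pump-current ratio is raised.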
The operating conditions explained above, in which the delay line locks correctly, can be summarized in the inequalities as follows, where $T_{CLK}$ is the period of Ref-CLK and $T_{D,min}$ and $T_{D,max}$ are the minimum and maximum delays of the replica delay cell:

$$T_{D,min} < \tfrac{1}{8}T_{CLK} < T_{D,max} < \tfrac{7}{8}T_{CLK} \qquad (3)$$

or equivalently in terms of $T_{CLK}$

$$\mathrm{Max}\left(8\,T_{D,min},\ \tfrac{8}{7}T_{D,max}\right) < T_{CLK} < 8\,T_{D,max} \qquad (4)$$

If the delay range of the controlled delay cell satisfies the relation $T_{D,max} < 7\,T_{D,min}$, the DLL will have a frequency range determined by the entire delay range of the delay cell. However, even if we make the delay range wider and satisfy $T_{D,max} > 7\,T_{D,min}$ in an effort to increase the frequency range, the lock range is limited to only 7:1.

In some applications where the frequency range must be larger than 7:1, changing the pump-current ratio of the CSPD can make the frequency range wider. For example, with a pullup current of four times the pulldown current ($I_{UP} = 4 I_{DN}$), the frequency range of 9:1 can be obtained. With $I_{UP} = 5 I_{DN}$, the frequency range of 11:1 can be obtained.

In high-frequency operations, $T_{CLK}$, especially the delay difference $T_D$, may be too short to drive the XNOR gate. So, a divide-by-two circuit and a pair of delay cells are used to slow down the frequency of Ref-CLK [11]. The new configuration shown in Fig. 6 is effectively the same as the one in Fig. 4 but offers a more robust operation in the high-frequency operations.

Fig. 6. Replica delay line for high-frequency operation.

C. Core DLL

Fig. 7 shows a simplified block diagram of the core DLL. It consists of a VCDL, a dynamic phase detector, a charge pump, and a loop filter. The core DLL generates eight-phase clock outputs through eight delay cells (DC's) in the VCDL. The core DLL is the same as a conventional analog DLL except that it has another control voltage Vcr. Vcr from the replica delay line coarsely determines the delay time of the VCDL, $T_{VCDL}$, so that $T_{VCDL}$ is equal to $T_{CLK}$ in the locked state. In the locked state, the eighth clock output, CLK7 in Fig. 7, is aligned with Ref-CLK.

In high-frequency operations, there may be some static phase mismatch between CLK7 and Ref-CLK due to the long rise/fall times of signal transition edges compared with the period of the clock. So the fine-tuning is required. The dynamic phase detector (PD) in the core DLL generates control signal Vcp, fine-tunes $T_{VCDL}$, and removes residual phase mismatch so that the rising edge of Ref-CLK is exactly aligned with that of CLK7.

Fig. 7. Block diagram of the core DLL.

D. Cell-Level Duty-Cycle Correction

Fig. 8 shows the core DLL with a cell-level duty-cycle correction mechanism. In high-frequency operations, clock outputs with a short cycle time can be severely distorted as the clock passes through many delay cells. Even if the duty cycle of Ref-CLK is 50% at the entrance, that of CLK7 may deviate significantly from 50%. It causes multiphase clock outputs to have phase error, which could be fatal, especially in high-speed communication applications. A conventional solution is to attach duty-cycle correction circuits to all clock output drivers with the price of added area, increased jitter, and further phase mismatch due to elongated path. So a cell-level duty cycle correction is proposed.

Fig. 8. (a) Core DLL with cell-level duty-cycle correction and (b) rising and falling edge alignment.

The second phase detector shown in Fig. 8 takes inverted Ref-CLK and inverted CLK7 as its inputs, generating a control signal Vduty as the output. It fine-tunes the cell current ratio, and thus aligns the falling edges of Ref-CLK and CLK7. In the steady state, therefore, both rising and falling edges of CLK7 and Ref-CLK are synchronized in phase, and both clocks have the same duty cycle. It should be noted that the duty-cycle correction circuit (DCC) used right at the input of Ref-CLK corrects the duty cycle of Ref-CLK only. With cell-level duty cycle correction, not only CLK7 but also the other intermediate clock outputs maintain a 50% duty cycle without any additional circuits. Although two control voltages Vcp and Vduty are simultaneously adjusted in the coupled negative feedback loops, the stability is guaranteed by making one of its loops have a sufficiently low bandwidth.

Fig. 9. Triply controlled DC. (a) Circuit diagram of a DCE and (b) configuration of a triply controlled DC.

Fig. 10. (a) Dynamic phase detector and (b) its operations.

Fig. 11. Prototype chip microphotograph.
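The effect described in Section III-D can be illustrated with a toy edge-delay model. This is only a sketch under assumptions that do not come from the paper: each delay cell is taken to delay rising edges by `t_rise` and falling edges by `t_fall`, Ref-CLK enters with a 50% duty cycle, and the rising-edge loop has already locked so that eight rising-edge delays equal one period. All names and numbers are hypothetical.

```python
def duty_cycles(period, t_rise, t_fall, n_cells=8):
    """Duty cycle of each tap CLK0..CLK(n_cells-1) in a chain of delay
    cells with separate rising-edge and falling-edge delays.

    After n cells the rising edge sits at n*t_rise and the falling edge
    at period/2 + n*t_fall, so the high time changes by n*(t_fall - t_rise).
    """
    taps = []
    for n in range(1, n_cells + 1):
        high_time = period / 2.0 + n * (t_fall - t_rise)
        taps.append(high_time / period)
    return taps


period_ns = 4.0            # 250 MHz reference (hypothetical example)
t_rise_ns = period_ns / 8  # rising-edge loop locked: 8 cells span one period

# Without cell-level correction: falling edges slightly slower per cell.
uncorrected = duty_cycles(period_ns, t_rise_ns, t_rise_ns + 0.02)
# With cell-level correction: falling-edge delay forced equal to rising.
corrected = duty_cycles(period_ns, t_rise_ns, t_rise_ns)

print([round(d, 3) for d in uncorrected])
print(corrected)
```

In this model a per-cell mismatch accumulates along the chain, so the last tap shows the largest duty-cycle error, while equalizing the two edge delays inside every cell keeps all taps at 50%. That is the motivation for correcting the duty cycle at the cell level rather than at each output driver.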
IV. CIRCUIT DESIGN

A. Triply Controlled Delay Cell

According to the noise analyses of [12] and [13], a fast-slewing (short rise/fall time) delay cell with a fully switching capability offers less phase noise. Although offering a full swing output, a shunt-capacitor delay cell [14], with its capacitor, would increase the chip area and power. Therefore, we decided to use the current-starved inverter [15] as a basic controlled delay cell. Since the current-starved inverter does not require a level conversion circuit, which is required for a differential delay cell, it has less chip area and power, although substrate and supply noise might cause detrimental influence.

A triply controlled delay cell is used as the basic delay cell element (DCE). The circuit diagram of the DCE and the configuration of one unit of DC are shown in Fig. 9. Four DCE's and two inverters compose a DC and make its rising/falling delay times symmetric. The delay time $T_{DC}$ of the triply controlled delay cell is determined by six control signals: Vcr, Vcr_b, Vcp, Vcp_b, Vduty, and Vduty_b. Of those signals, Vcr and Vcr_b come from the replica delay line. In the DCE, the sizes of MP1 and MN1 are made larger than the others' so that Vcr and Vcr_b can control the pull-up and pull-down currents, $I_P$ and $I_N$, primarily. The other control signals, which are generated by the core DLL, make only small adjustments to $I_P$ and $I_N$ for the fine-tuning of $T_{DC}$. Vcp and Vcp_b are used to align the rising edges of Ref-CLK and CLK7. Vduty and Vduty_b are responsible for maintaining the correct duty cycle and, thus, aligning the falling edges.

Since the high and low levels alternate in an inverter chain, duty-cycle control signals must alternate between Vduty and Vduty_b as well. Therefore, Vduty_b controls DCE0 and DCE3 and Vduty controls DCE1 and DCE2, as shown in Fig. 9(b). In the delay circuit, either Vduty or Vduty_b changes the duty cycle of the clock outputs by adjusting the current ratio of $I_P$ to $I_N$. With this mechanism, the multiphase clock outputs, CLK0–CLK7, will be duty-cycle corrected and equally spaced. There is no need to attach a DCC circuit in each clock output.

B. Dynamic Phase Detector

Since the tuning precision of the core DLL depends on the characteristics of the phase detector, we propose a new high-precision dynamic phase detector. Fig. 10(a) shows the circuit diagram of the proposed dynamic phase detector, which is improved from the published phase-frequency detector [8] by removing a feedback path and replacing the feedback input with an REF and DCLK signal. The phase detector can operate with less phase offset at high frequencies due to symmetry of circuit, shallow logic depth of only two gates, and fast operation with a dynamic logic circuit. While the widths of UP and DOWN pulses are proportional to the phase difference of the inputs as shown in Fig. 10(b), there remains a chain of short pulses in the locked state. These pulses in the locked state serve to reduce the dead zone of the phase detector [8]. However, the accuracy of the phase detector is improved when the pulse duration is shorter. Furthermore, smaller capacitor in the loop filter can be used since the amount of pumped charge is smaller compared
382 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 3, MARCH 2000

Fig. 12. Clock waveforms at 62.5 MHz. (a) CLK0, CLK2 and (b) CLK0, CLK4.

with a conventional “bang-bang” type of phase detector or a proportional phase detector with wider pulse width.

V. EXPERIMENTAL RESULTS

The test chip has been fabricated using a 0.35-µm, N-well, triple-metal CMOS process. The threshold voltages in this process are 0.42 V (NMOS) and −0.22 V (PMOS). The gate-oxide thickness is 75 nm. Fig. 11 shows a microphotograph of the fabricated chip. The chip integrates the DLL with an on-chip decoupling capacitance of 270 pF. The active area of the DLL occupies 0.08 mm² and the decoupling capacitor 0.12 mm². Since the pulse currents of the multiphase clock outputs are interspersed, the ac component of the supply current is present at the eighth harmonic frequencies of the clock. Therefore, the 270-pF on-chip capacitor is adequate to reduce the on-chip supply noise induced by switching of digital circuits.

The prototype chip operates from 62.5 to 250 MHz with a 3.3-V power supply. Fig. 12(a) shows the waveforms of CLK0 and CLK2 at 62.5 MHz. These clock outputs are the first and the third clocks, respectively, and have a 90° phase difference. Fig. 12(b) shows the waveforms of CLK0 and CLK4, which are an inversion of each other with a 180° phase difference. Fig. 13(a) and (b) shows the same waveforms at 250 MHz. In spite of some ringing due to capacitance and inductance of the board and measurement instrument, the measurement results show that the clock outputs are aligned with precise phase relationships of less than 1% error over an operating frequency range from 62.5 to 250 MHz. The delay range of the VCDL is estimated to be between 4 and 16 ns. With minor change of device sizes of the VCDL, the operating frequency range could be extended toward a higher frequency range.

Fig. 14(a) and (b) shows the jitter histograms in the clock output CLK7. The frequency of Ref-CLK is 250 MHz. Fig. 14(a) shows 4-ps rms and 29-ps peak-to-peak jitter characteristics in a quiet power supply, where only the DLL is activated in the chip. When other digital circuits are turned on, rms and peak-to-peak jitter are increased to 6.4 and 44 ps, respectively, and internal supply noise of about 200 mV is measured. If a 500-mV, 1.1-MHz square wave is injected externally on the power supply, the peak-to-peak jitter increases to 83 ps, as shown in Fig. 14(b). At 250 MHz, jitter supply sensitivity is measured to be only 0.11 ps/mV. Furthermore, from 62.5 to 250 MHz, the clock outputs show almost flat jitter performance. Since the delay range of the VCDL in the core DLL is primarily set by Vcr and Vcr_b, the gain of the VCDL is nearly flat over a wide range of operating frequency. The jitter performance of the proposed DLL is better than or at least comparable to other wide-range DLL's [2]–[7].

Table I summarizes the DLL performance characteristics. The power dissipation is proportional to the operating frequency. Operating at 250 MHz, the DLL draws 12.6-mA dc from a 3.3-V power supply.
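
The supply-sensitivity and power figures in this section can be cross-checked with simple arithmetic. The sketch below assumes that the 0.11 ps/mV figure is the increase in peak-to-peak jitter (29 ps to 83 ps) divided by the 500-mV amplitude of the injected noise; the text does not state how the figure was derived, so this is only one plausible reading. The variable names are illustrative.

```python
# Reported measurements at 250 MHz (values taken from the text).
quiet_pp_jitter_ps = 29.0     # peak-to-peak jitter with a quiet supply
noisy_pp_jitter_ps = 83.0     # peak-to-peak jitter with injected supply noise
injected_noise_mv = 500.0     # amplitude of the injected 1.1-MHz square wave
supply_v = 3.3                # supply voltage
supply_current_ma = 12.6      # dc supply current at 250 MHz

# Assumption: sensitivity = added peak-to-peak jitter per mV of injected noise.
sensitivity_ps_per_mv = (noisy_pp_jitter_ps - quiet_pp_jitter_ps) / injected_noise_mv

# DC power drawn from the supply (V * mA gives mW).
power_mw = supply_v * supply_current_ma

print(f"supply sensitivity ~ {sensitivity_ps_per_mv:.3f} ps/mV")
print(f"power at 250 MHz   ~ {power_mw:.2f} mW")
```

Under that assumption the result is 0.108 ps/mV, which rounds to the reported 0.11 ps/mV, and the dc power at 250 MHz comes to about 41.6 mW.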
MOON et al.: ALL-ANALOG MULTIPHASE DLL 383

Fig. 13. Clock waveforms at 250 MHz. (a) CLK0, CLK2 and (b) CLK0, CLK4.
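
The phase relationships shown in Figs. 12 and 13 follow from dividing one reference period into eight equal steps. The sketch below is a minimal illustration, assuming ideal equal spacing and that CLKk lags CLK0 by k/8 of a period; `phase_table` is a hypothetical helper, not part of the original design.

```python
def phase_table(ref_freq_mhz, num_phases=8):
    """Return (period_ns, rows) for ideally spaced clock phases.

    Each row is (k, phase_deg, delay_ns) for clock output CLKk.
    """
    period_ns = 1e3 / ref_freq_mhz
    rows = [(k, 360.0 * k / num_phases, period_ns * k / num_phases)
            for k in range(num_phases)]
    return period_ns, rows

for freq_mhz in (62.5, 250.0):
    period_ns, rows = phase_table(freq_mhz)
    print(f"{freq_mhz} MHz: period = {period_ns} ns")
    for k, phase_deg, delay_ns in rows:
        print(f"  CLK{k}: {phase_deg:5.1f} deg, {delay_ns:.3f} ns after CLK0")
```

With these assumptions CLK2 sits at 90° and CLK4 at 180°, and the reference periods at the two ends of the operating range (16 ns and 4 ns) coincide with the estimated 4 to 16 ns delay range of the VCDL.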

Fig. 14. Jitter histograms at 250 MHz in (a) a quiet supply and (b) with added 1.1-MHz, 500-mV square wave noise.

TABLE I
PERFORMANCE CHARACTERISTICS OF PROTOTYPE CHIP

VI. CONCLUSION

By including a replica delay line with a CSPD, the core DLL operates in a wide frequency range from 62.5 to 250 MHz. Since the replica delay line occupies a quarter of the area of the core DLL, the area cost and power consumption of the prototype chip are much smaller than those of other wide-range DLL's [2]–[7]. Both the analog-control scheme and the flat gain of the VCDL offer a low-jitter performance of 4-ps rms and 29-ps peak-to-peak, and a low supply sensitivity of 0.11 ps/mV. The DLL incorporates dynamic phase detectors and triply controlled delay cells with cell-level duty-cycle correction capability in order to generate equally spaced eight-phase clocks.

The DLL can be used not only as an internal clock buffer of microprocessors and memory IC's but also as a multiphase clock generator for gigabit serial interfaces. With a faster VCDL with minor change of device sizes, the DLL will operate at a higher and wider frequency range.

REFERENCES
[1] M. Johnson and E. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[2] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, “A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 Megabyte/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[3] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, “Multifrequency zero-jitter delay-locked loop,” IEEE J. Solid-State Circuits, vol. 29, pp. 67–70, Jan. 1994.
[4] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz, “A portable digital DLL for high-speed CMOS interface circuits,” IEEE J. Solid-State Circuits, vol. 34, pp. 632–644, May 1999.
[5] S. Tanoi, T. Tanabe, K. Takahashi, S. Miyamoto, and M. Uesugi, “A 250–622 MHz deskew and jitter-suppressed clock buffer using two-loop architecture,” IEEE J. Solid-State Circuits, vol. 31, pp. 487–493, Apr. 1996.
[6] K. Lee, Y. Moon, and D.-K. Jeong, “Dual loop delay-locked loop,” U.S. patent pending.
[7] S. Sidiropoulos and M. A. Horowitz, “A semi-digital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[8] S. Kim, K. Lee, Y. Moon, D.-K. Jeong, Y. Choi, and H. K. Lim, “A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.
[9] D.-L. Chen and M. O. Baker, “A 1.25 Gb/s, 460 mW CMOS transceiver for serial data communication,” in IEEE ISSCC Dig. Tech. Papers, Feb. 1997, pp. 242–243.
[10] Y. Moon, D.-K. Jeong, and G. Kim, “Clock dithering for electromagnetic compliance using spread spectrum phase modulation,” in IEEE ISSCC Dig. Tech. Papers, Feb. 1999, pp. 186–187.
[11] Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim, “A 62.5–250 MHz multi-phase delay-locked loop using a replica delay line with triply controlled delay cells,” in Proc. IEEE Custom Integrated Circuits Conf., May 1999, pp. 299–302.
[12] B. Kim, “High speed clock recovery in VLSI using hybrid analog/digital techniques,” Ph.D. dissertation, Univ. of California, Berkeley, Memo. UCB/ERL M90/50, June 1990.
[13] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and phase noise in ring oscillators,” IEEE J. Solid-State Circuits, vol. 34, pp. 790–804, June 1999.
[14] M. Bazes, “A novel precision MOS synchronous delay line,” IEEE J. Solid-State Circuits, vol. SC-20, pp. 1265–1271, Dec. 1985.
[15] D.-K. Jeong, G. Borriello, D. A. Hodges, and R. H. Katz, “Design of PLL-based clock generation circuits,” IEEE J. Solid-State Circuits, vol. SC-22, pp. 255–261, Apr. 1987.

Yongsam Moon (S'96) was born in Incheon, Korea, on March 1, 1971. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1994 and 1996, respectively, where he is currently pursuing the Ph.D. degree.
He has been working on architectures and CMOS circuits for microprocessors. His current research interests include clock and data recovery for high-speed communication and high-speed I/O interface circuits.

Jongsang Choi was born in Korea on September 11, 1974. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1997 and 1999, respectively, where he is currently pursuing the Ph.D. degree.
He has been working on architectures and CMOS circuits for high-speed communication. His current research interests include high-speed CMOS circuits and gigabit network systems.

Kyeongho Lee (S'92–M'00) was born in Seoul, Korea, on August 5, 1969. He received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1993, 1995, and 2000, respectively.
Since 2000 he has been with Global Communication Technology, Inc., Los Altos, CA. He is working on various CMOS high-speed circuits for RF communication. His research interests include high-speed CMOS circuits and PLL systems.

Deog-Kyoon Jeong (S'87–M'89) received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1981 and 1984, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California at Berkeley, Berkeley, CA, in 1989.
From 1989 to 1991, he was with Texas Instruments Incorporated, Dallas, TX, where he was a Member of the Technical Staff. He worked on modeling and design of BiCMOS circuits and single-chip implementation of the SPARC architecture. Since 1991, he has been on the Faculty of the School of Electrical Engineering, Seoul National University, Seoul, Korea, as an Associate Professor. His research interests include high-speed circuits, microprocessor architectures, and memory systems.

Min-Kyu Kim was born in Seoul, Korea, in 1965. He received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1988, 1990, and 1998, respectively.
From 1995 to 1996, he was with the Electronics and Telecommunications Research Institute, Taejon, Korea, working on the development of high-speed communication IC's for ATM switches. Since 1998, he has been working on high-speed serial link technologies at Silicon Image, Inc., Cupertino, CA. His current interests include circuit design for high-speed communication systems and digital-interface display systems.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 3, MARCH 2001 417

CMOS DLL-Based 2-V 3.2-ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated Tunable Oscillator
David J. Foley, Student Member, IEEE, and Michael P. Flynn, Senior Member, IEEE

Abstract—This paper describes a low-voltage low-jitter clock synthesizer and a temperature-compensated tunable oscillator. Both of these circuits employ a self-correcting delay-locked loop (DLL) which solves the problem of false locking associated with conventional DLLs. This DLL does not require the delay control voltage to be set on power-up; it can recover from missing reference clock pulses and, because the delay range is not restricted, it can accommodate a variable reference clock frequency. The DLL provides multiple clock phases that are combined to produce the desired output frequency for the synthesizer, and provides temperature-compensated biasing for the tunable oscillator. With a 2-V supply the measured rms jitter for the 1-GHz synthesizer output was 3.2 ps. With a 3.3-V supply, rms jitter of 3.1 ps was measured for a 1.6-GHz output. The tunable oscillator has a 1.8% frequency variation over an ambient temperature range from 0 °C to 85 °C. The circuits were fabricated on a generic 0.5-µm digital CMOS process.

Index Terms—CMOS analog integrated circuits, delay-locked loops, frequency synthesizers, tunable oscillators, voltage controlled oscillators.

Manuscript received July 19, 2000; revised October 24, 2000. This work was supported by Parthus Technologies.
D. J. Foley is with the Department of Microelectronics, National University of Ireland, Cork, Ireland.
M. P. Flynn is with Parthus Technologies, Cork, Ireland.
Publisher Item Identifier S 0018-9200(01)01483-4.

I. INTRODUCTION

Traditionally, phase-locked loops (PLLs) have been used for clock synthesis. The synthesizer and tunable oscillator outlined in this paper employ a delay-locked loop (DLL). A DLL is more stable than higher order PLLs and requires only one capacitor in its first-order loop filter. On the other hand, a PLL generally requires a more complex second-order filter. This filter usually employs larger components which may need to be off chip. Additionally, a DLL offers better jitter performance than a PLL because phase errors induced by supply or substrate noise do not accumulate over many clock cycles [1].

The self-correcting DLL overcomes problems of false locking associated with conventional DLLs. A self-correcting circuit detects when the DLL is locked, or is attempting to lock, to an incorrect delay and then brings the DLL into a correct locked state. This DLL does not require the delay control voltage to be set on power-up; it can recover from missing reference clock pulses and, because the delay range is not restricted, it can accommodate a variable reference clock frequency. This paper describes how a small number of additional digital logic gates are required to convert a conventional DLL into a wider range self-correcting DLL. For comparison, in [2] a second DLL is added to achieve wider range operation.

The synthesizer outlined in this paper operates over a wide range of input reference clock frequencies and generates a low-jitter output clock running at nine times the reference frequency. Jitter measurements of 3.2 ps rms and 20 ps peak-to-peak, for a 2-V supply and 1-GHz output frequency, show that the core DLL compares well with recently reported DLLs [2], [3]. Multiple clock phases from the DLL are combined using digital logic to produce the synthesizer output [4]. An alternative approach requiring a pair of on-chip tuned LC-tanks is described in [5].

The tunable voltage-controlled oscillator (VCO) is intended for use in a transceiver where the receive and transmit clocks are plesiochronous. It is possible to tune the VCO around a center frequency while still maintaining good temperature independence. In some applications it may also act as a replacement for a fractional-N-type synthesizer. This circuit is similar to the oscillator described in [6] but it uses a lower jitter DLL in place of the PLL and can operate over a wider frequency range.

In Section II the DLL architecture is discussed, starting with a review of a conventional DLL and progressing to the new self-correcting architecture. Section III outlines the clock synthesizer architecture. This is followed in Section IV by an outline of the temperature-compensated tunable oscillator architecture. Section V discusses the circuit layout and Section VI introduces measured performance results for the two circuits. This paper then concludes in Section VII with a summary of the achievements of this work.

II. DLL ARCHITECTURE

A. Conventional DLL

A simplified block diagram of a conventional DLL is illustrated in Fig. 1. This circuit contains a voltage-controlled delay line (VCDL), a phase detector, a charge pump, and a first-order loop filter. The delay line, consisting of cascaded variable delay stages, is driven by the input reference clock, ckref. The output of the delay line’s final stage and the ckref falling edges are compared by the phase detector to determine the phase alignment error. The phase detector output is integrated by the charge pump and loop filter capacitor to generate the control voltage, vcntl, of the delay stages.

When correctly locked, the total delay of the delay line should equal one period of the reference clock. A conventional
0018–9200/01$10.00 © 2001 IEEE

DLL may lock or attempt to lock to an incorrect delay. In Fig. 2 we show correct and false locking for a three-stage delay line [Fig. 2(a)]. Fig. 2(b) shows the output phases at each stage , and with the delay line in correct lock. The DLL control loop has aligned and ckref. The total delay is one period of the reference clock. In Fig. 2(c) and ckref are again aligned but the total delay is two clock periods. The DLL can also falsely lock to three or more periods of delay or can attempt to lock to zero delay.

Fig. 1. Conventional DLL architecture.

Fig. 2. (a) Three-stage VCDL. (b) Waveforms with correct lock. (c) Waveforms with false lock.

B. Self-Correcting DLL Architecture

Fig. 3 shows a block diagram of the new self-correcting DLL. The problem of false locking is solved by the addition of a lock-detect circuit and by some slight modifications to the conventional phase detector. The DLL incorporated in the two designs reported in this paper employs a nine-stage VCDL as shown in Fig. 3.

Fig. 3. Self-correcting DLL architecture.

In a conventional DLL, only the state of the output of the last delay element is used. From the example in Fig. 2, we can see that the state at the outputs of the other delay elements can provide additional information about the nature of the locked delay. In the prototype the delayed phases, (1:9), are decoded to indicate the VCDL delay. If the delay is outside an acceptable delay range then the lock-detect circuit takes control of the loop from the phase detector. The lock-detect circuit signals the charge pump to charge or discharge the filter capacitor until it is safe for the phase detector to regain control of the loop.

Three control signals are produced by the lock-detect circuit: over to indicate that the VCDL delay is greater than 1.5 reference clock periods, under to indicate that the delay is less than 0.75 clock periods, and release is activated when the delay reaches 1.25 clock periods. The release signal clears the over and under control signals and removes the phase detector from reset. The phase detector then regains control of the loop. If neither under nor over is active then the phase detector has control of the loop and the DLL is either in correct lock or approaching correct lock.

If the DLL is in lock and it is brought out of lock because of missing reference clock pulses or a step in the input reference frequency, then the DLL may inadvertently try to lock to an incorrect delay. The DLL is allowed to attempt to reach the undesired lock delay until it triggers either an over or an under signal at which time the lock-detect circuit takes control of the DLL loop.

C. Lock-Detect Circuit

The VCDL output phases are first level shifted to CMOS levels. The level shift circuitry is designed to have high gain and fast rise and fall times. This helps to minimize any jitter contribution from this circuitry. The level-shifted output phases, (1:9), are latched on the rising edge of the reference clock. The outputs from these latches are processed by the decode circuitry as shown in the schematic of Fig. 4. The inputs, (1:8), correspond to the (1:8) output phases of the VCDL. Fig. 5 shows example output waveforms for a nine-stage VCDL. In Fig. 5(a) when the state of the VCDL output phases is decoded none of the control signals are activated as the VCDL is correctly locked to one period of the reference clock. In Fig. 5(b) the VCDL is incorrectly locked to two periods of the reference clock and the state of the output phases is decoded to activate the over control signal.

The phase detector outputs, up and dn, signal the charge pump to charge or discharge the filter capacitor. An active over output
FOLEY AND FLYNN: CMOS DLL-BASED CLOCK SYNTHESIZER AND TEMPERATURE-COMPENSATED TUNABLE OSCILLATOR 419

from the lock-detect circuit disables the phase detector and activates the up control signal. Similarly, the lock detect under output activates dn. Following power-on reset the lock-detect circuit is initialized by setting over active. This ensures a faster acquisition time for the DLL because the filter capacitor is continuously charged to a voltage level corresponding to 1.25 reference clock periods. At this VCDL delay, the release signal is activated and the phase detector gains control of the loop and brings the DLL to lock. The state of the output phases corresponding to a delay of nine reference clock periods is the same as that corresponding to a single reference clock period delay. This circuitry is therefore only capable of detecting incorrect delays up to eight periods of the reference clock. This is not a limitation of the design as any delays above this would be outside the delay range of the VCDL. In general, the error detection logic can detect an incorrect lock delay up to periods of the reference clock, where is equal to the number of VCDL output phases.

Fig. 4. Lock-detect decode circuitry.

Fig. 5. Nine-stage VCDL waveforms with (a) correct lock and (b) false lock.

D. Voltage-Controlled Delay Line (VCDL)

Fig. 6 shows one of the VCDL delay stages. The stage is designed to operate from a supply as low as 1.8 V and is similar to that used in [7]. The stage propagation delay is proportional to the tail current for the output charging and to the voltage-controlled resistor (VCR) resistance for the output discharging. A three-transistor VCR structure is adopted for better control linearity. The DLL negative feedback control loop compensates for variations in the stage delay due to process and temperature. The differential delay stage structure and coupling capacitors between bias lines and supply help to minimize supply-induced jitter noise.

Fig. 6. VCDL delay stage schematic.

E. Charge Pump

The charge pump charges or discharges the filter capacitor. The voltage on this capacitor, vcntl, sets the VCDL stage propagation delay. To minimize the temperature variation of the VCDL delay, the charging and discharging currents are proportional to absolute temperature. This helps to maintain a constant loop gain and phase margin over temperature.

F. Phase Detector

The phase detector, shown in Fig. 7, employs the conventional sequential-phase-frequency detection scheme [7] but extra gates have been included. This extra logic enables the lock-detect circuit to over-ride the phase detector control of the loop. The lock-detect output signals, over and under, now have direct control of the charge pump. The lock-detect circuit can therefore charge or discharge the VCDL control voltage, vcntl, to a voltage from which it is safe for the phase detector to regain control of the loop.

Fig. 7. Phase detector schematic.

III. CLOCK SYNTHESIZER ARCHITECTURE

The clock synthesizer generates a differential output clock running at nine times the input reference frequency. The clock

synthesizer employs the DLL structure shown in Fig. 3 to generate the multiple clock phases that are then combined to produce the output clock. There are two steps in the generation of the output clock. The first step combines the nine DLL output phases, (1:9), to generate three clocks ck1, ck2, and ck3. Fig. 8 shows the clock waveforms. These three clocks are phase separated by one-ninth of a reference clock period and have a frequency three times that of the reference clock. Fig. 9 shows how the 1, 4, and 7 output phases are combined in an optimized AND-OR structure with symmetrical delays to generate the ck1 clock. Using identical logic the 2, 5, and 8 phases produce the ck2 clock and the 3, 6, and 9 phases produce the ck3 clock.

Fig. 8. Clock synthesis waveforms.

Fig. 9. Optimized AND-OR block diagram.

The second step in generating the synthesizer output clock is to combine these three clocks in another AND-OR structure to produce a differential output clock, and , running at nine times the reference clock frequency; see Fig. 8. This design produces a 1.62-GHz output clock frequency for a 180-MHz reference clock frequency. For a 0.5-µm 3.3-V CMOS process there is a bandwidth limitation of approximately 500 MHz for reliable on-chip clock transmission [8]. The high bandwidth available at the chip outputs is utilized (determined by the external pull-up resistor and load capacitance) [8] to produce the 1.62-GHz clock as shown in Fig. 10. The AND function of the clock generation is performed in the chip core and the analog OR function is performed in the I/O ring. External pull-up resistors set the output swing and match the output impedance to that of the test equipment. Damping resistors are included to avoid any oscillations resulting from the combination of the lead and pin inductance and load capacitance. This removes the necessity to double bond these high-frequency outputs.

Fig. 10. 1.62-GHz clock generation schematic.

IV. TEMPERATURE-COMPENSATED TUNABLE VCO ARCHITECTURE

Fig. 11. Tunable VCO architecture.

Fig. 12. Tunable VCO stage block diagram.

The temperature-compensated oscillator utilizes the control loop voltage, vcntl, of the DLL (Fig. 3) to compensate for any temperature and supply voltage induced frequency fluctuation in a VCO. Fig. 11 shows how the VCO and VCDL stages are both connected to vcntl. (For ease of illustration a conventional DLL is shown in Fig. 11 but in practice the new DLL architecture of Fig. 3 is employed). The VCDL in the DLL tracks temperature and process variations in the VCO circuit. The VCO is

composed of the same delay stages as the VCDL and its temperature (and process) variations will therefore be the same (apart from some minor random mismatch effects and thermal gradients across the die). vcntl thus compensates for the VCDL and VCO temperature fluctuations. The last VCO stage has an additional tuning voltage, tune, which fine tunes the VCO frequency. By varying the tune voltage it is possible to tune the VCO center frequency to within 3%. A wider tuning range can be achieved by varying the frequency of the DLL reference clock, ckref.

The schematic of the last VCO stage is shown in Fig. 12. This stage is identical to the other VCO and VCDL stages except that the VCR contains a transistor which is connected to the external tune voltage. In all other stages this transistor is connected to ground. The extra charging current required in this VCO stage is provided by the controlled current source bias .

V. CIRCUIT LAYOUT

The synthesizer and temperature-compensated tunable oscillator were fabricated on a standard 0.5-µm triple-metal single-poly digital CMOS process. The die photomicrograph of the device, containing both the synthesizer and temperature-compensated tunable oscillator, is shown in Fig. 13. The synthesizer has an active area of 0.6 mm² and the temperature-compensated tunable oscillator has an active area of 0.7 mm².

Fig. 13. Die photo of the synthesizer and tunable oscillator.

VI. TEST RESULTS

Fig. 14 shows a histogram of the edge jitter on the 1.62-GHz synthesizer output clock for a supply of 3.3 V. Edge jitter of 3.1 ps rms and 20 ps peak-to-peak were measured. The jitter measurements of 3.2 ps rms and 20 ps peak-to-peak, for a 2-V supply and 1-GHz output frequency, show that the DLL core exhibits better jitter performance than that reported for the higher voltage DLLs (3.3-V supply, 0.35-µm CMOS, 4-ps rms jitter) in [2] and (5-V supply, 0.7-µm CMOS, 10-ps rms jitter) in [3]. The measured jitter (rms) variation versus synthesizer output frequency for a 3.3-V supply is shown in Fig. 15. With the supply reduced to 1.8 V, the rms jitter was measured at 4.9 ps for an output frequency of 720 MHz. Fig. 16 shows this 720-MHz synthesizer output. Mismatched propagation delays and interblock routing in the frequency multiplication block (Fig. 9) resulted in 100-ps interperiod jitter.

Fig. 14. 1.62-GHz synthesizer edge jitter histogram.

Fig. 15. Variation of measured jitter over output frequency.

Fig. 16. 720-MHz synthesizer output for V = 1.8 V.

Fig. 17 shows the temperature-compensated tunable oscillator frequency variation with temperature. Varying the ambient temperature from 0 °C to 85 °C resulted in a total frequency variation of 1.8%. Fig. 18 shows the variation of frequency with the tune voltage. As can be seen from the plot, the relationship is close to linear. It is possible to tune the frequency around a center frequency in the range from 200 to 500 MHz by selecting an appropriate input reference frequency. This ensures that this scheme can be used for a wide variety of applications. The measured jitter on the 400-MHz output was 29 ps rms and 180 ps peak-to-peak. Table I shows the measured synthesizer characteristics. Table II summarizes the measured characteristics of the temperature-compensated tunable oscillator.

Fig. 17. VCO frequency variation with temperature.

Fig. 18. VCO frequency variation with tune voltage.

TABLE I
MEASURED SYNTHESIZER CHARACTERISTICS

TABLE II
MEASURED TUNABLE OSCILLATOR CHARACTERISTICS

VII. CONCLUSION

In this paper, a robust self-correcting low-jitter DLL was used as the basis for a low-voltage high-frequency synthesizer and a temperature-compensated tunable oscillator. The DLL does not require the VCDL control voltage to be set on power-up. The DLL can recover from missing reference clock pulses and it can track step changes in a variable reference clock frequency. The synthesizer has significantly lower edge jitter than the traditional PLL-type synthesizer [9] and other reported DLL circuits [10], [11]. The temperature-compensated tunable oscillator provides a temperature-stable tunable frequency that varies by just 1.8% over the 0 °C to 85 °C temperature range.

ACKNOWLEDGMENT

The authors wish to acknowledge contributions from the following Parthus Technologies employees: J. Ryan, J. Horan, C. Cahill, F. Fuster, J. Collins, B. Kinsella, M. Erett, and S. Murphy. The authors also wish to thank R. Fitzgerald from the NMRC for the die photo micrographs. The device was fabricated on the ESM (Newport) Wafer Fab through Europractice.

REFERENCES
[1] B. Kim, T. C. Weingandt, and P. R. Gray, “PLL/DLL system noise analysis for low-jitter clock synthesizer design,” in Proc. ISCAS, June 1994, pp. 151–154.
[2] Y. Moon, J. Choi, K. Lee, D. Jeong, and M. Kim, “An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low jitter,” IEEE J. Solid-State Circuits, vol. 35, pp. 377–384, Mar. 2000.
[3] M. Mota and J. Christiansen, “A high-resolution time interpolator based on a delay-locked loop and an RC delay line,” IEEE J. Solid-State Circuits, vol. 34, pp. 1360–1366, Oct. 1999.
[4] D. Foley and M. Flynn, “CMOS DLL-based 2-V 3.2-ps jitter 1-GHz clock synthesizer and temperature compensated tunable oscillator,” in Proc. IEEE Custom Integrated Circuits Conf., May 2000, pp. 371–374.
[5] G. Chien and P. R. Gray, “A 900-MHz local oscillator using a DLL-based frequency multiplier technique for PCS applications,” in ISSCC Dig. Tech. Papers, Feb. 2000, pp. 202–203.
[6] H. Chen, E. Lee, and R. Geiger, “A 2-GHz VCO with process and temperature compensation,” in Proc. ISCAS, June 1999, pp. II-569–II-572.
[7] A. Young, J. K. Greason, and K. L. Wong, “A PLL clock generator with 5 to 110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. SC-27, pp. 1599–1607, Nov. 1992.
[8] M. Horowitz, C.-K. K. Yang, and S. Sidiropoulos, “High-speed electrical signaling: Overview and limitations,” IEEE Micro., vol. 18, pp. 12–24, Jan./Feb. 1998.
[9] H. C. Yang, L. K. Lee, and R. S. Co, “A low-jitter 0.3-165 MHz CMOS PLL synthesizer for 3-V/5-V operation,” IEEE J. Solid-State Circuits, vol. 32, pp. 582–586, Apr. 1997.
[10] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.
[11] S. Sidiropoulos and M. A. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
FOLEY AND FLYNN: CMOS DLL-BASED CLOCK SYNTHESIZER AND TEMPERATURE-COMPENSATED TUNABLE OSCILLATOR 423

David J. Foley (S’00) received the B.Eng. degree from the National University of Ireland, Limerick, in June 1988. In 1994 he received the M.Eng.Sc. degree from the National University of Ireland, Cork, where he is currently working toward the Ph.D. degree.
He has worked in IC design with NEC Corporation, Tamagawa, Japan, from 1988 to 1990, AT&T Bell Labs, Tokyo, Japan, from 1990 to 1992, and Parthus Technologies, Dublin, Ireland, from 1994 to 1998.

Michael P. Flynn (S’92–M’95–SM’98) was born in Cork, Ireland. He received the B.E. and M.Eng.Sc. degrees from the National University of Ireland, Cork, in 1988 and 1990, respectively. He received the Ph.D. degree in electrical engineering from Carnegie Mellon University, Pittsburgh, PA, in 1995.
From 1988 to 1991, he was with the National Microelectronics Research Center, Cork. He was a Co-op Engineer with National Semiconductor in Santa Clara, CA, from 1993 to 1995. From 1995 to 1997, he was a Member of Technical Staff with Texas Instruments DSPS R&D Lab, Dallas, TX. He is now a Technical Director with Parthus Technologies, Cork. He is also a part-time Lecturer in the Department of Microelectronics at the National University of Ireland, Cork.
Dr. Flynn received the 1992–1993 IEEE Solid-State Circuit Predoctoral Fellowship. He is a member of Sigma Xi.
ISSCC 2002 / SESSION 4 / BACKPLANE INTERCONNECTED ICs / 4.1

4.1 A 1.5V 86mW/ch 8-Channel 622–3125Mb/s/ch CMOS SerDes Macrocell with Selectable Mux/Demux Ratio

Fuji Yang, Jay O’Neill, Patrik Larsson, Dave Inglis, Joe Othmer
Agere Systems, Holmdel, NJ

2.5-3.125Gb/s serial links are commonly used for chip-to-chip interconnects in high-speed network systems. In a SONET OC-768 application, at least 16 on-chip SerDes transceivers are required to guarantee a total full-duplex I/O throughput of 40Gb/s. Published 2.5Gb/s SerDes transceivers consume between 150 and 200mW, which is not suitable for applications requiring hundreds of on-chip SerDes transceivers [1]. Developing a low-power SerDes transceiver is important for high-throughput network ICs [2]. Another challenge is the reduction of inter-channel noise coupling when integrating many transceivers on the same chip. This low-power 8-channel SerDes macrocell employs a shared-PLL architecture. As shown in Figure 4.1.1, on the transmitter side, the on-chip TxPLL provides a half-rate clock to all transmitters. On the receiver side, the RxPLL distributes I- and Q-phase clocks to 8 receivers. Each receiver has a phase interpolator to generate an output phase-aligned with the incoming data for clock and data recovery. Sharing a single PLL between a group of transmitters or receivers reduces the power and avoids the potential multi-VCO coupling problem found in a conventional one-PLL-per-channel configuration. The macrocell, realized in a 0.16µm CMOS process, consumes an average power of 86mW per channel at a 1.5V power supply.

The transmitter 16:1 or 20:1 serialization starts with 4 shift-register-based selectable 4:1 or 5:1 multiplexers. Their 4 outputs are sent to a tree-based 4:1 multiplexer (Figure 4.1.1). A pMOS CML output driver with on-chip 50Ω terminations is employed. The output signal, referenced to ground, makes the interface independent of the power supply. The output amplitude is set to 1Vpp,diff.

The receiver employs an interleaved integrate-and-dump front-end (Figure 4.1.1) [3, 4]. The integrate-and-dump operation improves the SNR and eliminates the quadrature clock required in a conventional half-rate front-end [5]. The integrator outputs are de-multiplexed by the decision-latches controlled respectively by ck2i and ck2q, which are divide-by-2 clocks of the recovered clock. The decision-latch outputs d1-d4 are fed into 4 shift-registers to realize the 4:16 or 4:20 de-serialization. The integrator is implemented in a way similar to that proposed in Reference [4], but with a pMOS input stage. It has a gain of 2, which relaxes the offset and noise requirements of the latches. The receiver achieves 30mVpp,diff sensitivity with BER < 10⁻¹² at 2.5Gb/s.

Clock recovery is performed by a DLL based on an analog phase interpolator [6]. In contrast to the implementation in Reference [6], a four-quadrant phase mixer is used here. Referring to Figure 4.1.2, the DLL consists of a bang-bang phase detector (PD), a PD polarity control circuit, an amplitude control circuit, I- and Q-charge-pumps and the four-quadrant mixer-based phase interpolator. The analog phase interpolation is done by mixing the I- and Q-phase clocks from the RxPLL with respective weights α (=Va-Vref) and β (=Vb-Vref): CLK=α*(I-IB)+β*(Q-QB). Va and Vb are independently generated by the I- and Q-charge-pumps. The weights α and β, ranging from negative to positive, directly control the quadrant changes. This eliminates the potential phase discontinuity at quadrant crossings found in the circuit of Reference [6]. Figure 4.1.3 shows the schematic of one 4-quadrant mixer, where the nMOS differential pair converts Va to differential currents Ip and In, which are mirrored into the pMOS current sources to be steered by the high-speed differential clock (I-IB). A self-biased nMOS load is used with MP1 and MP2 to control the output common-mode voltage.

The phase interpolator exhibits an infinite phase shift range, allowing the DLL to easily track the frequency offset between the local clock and the incoming data, and enables a shared-PLL architecture for multi-channel serial links with plesiochronous clocking.

Figure 4.1.4 illustrates the non-monotonic relation between the phase shift introduced by the interpolator and the two weights α and β. To have a 2π interpolation range, the bang-bang phase detector polarity must be updated to provide the correct up/down signals for different quadrants. This is done by a PD-polarity-control circuit in association with a Q-detect circuit. The Q-detect circuit detects the output vector quadrant by determining the signs of α and β. The Q-detect circuit uses a replica of the V-I converter in the phase mixer.

Although the phase mixer has control weights α and β, the phase interpolation is only a function of α/β, and is independent of the amplitude of α and β. The loop, sensitive only to the phase variation, thus controls α/β. As a consequence, α and β can grow or shrink arbitrarily. To prevent α and β from being too small, an offset current is intentionally introduced in the charge-pumps. It is controlled as follows: if α>0, Iup = I0 + Ioffset, and if α<0, Idown = I0 + Ioffset (the same algorithm is applied to the Q-charge-pump). As a result, α and β are always pulled away from zero, which eliminates any possibility of shrinking. To prevent overflow on α and β, the amplitude control circuit clips α and β by blocking the UP or DOWN signal. As shown in Figure 4.1.4, Va or Vb will be kept within [Vmin, Vmax].

The test chip in a 0.16µm 5-level metal CMOS technology uses a 217-pin PBGA package. The chip micrograph is shown in Figure 4.1.5. Active area is about 2mm². Figure 4.1.6a shows the measured jitter tolerance of the receiver. The CDR works with VDD as low as 1V for a 1Gb/s maximal input data rate. With a 1.5V power supply, the receiver covers an input data rate range of 622 to 3125Mb/s. Measured recovered clock jitter is 87.1ps pp at 2.5Gb/s. Figure 4.1.6b shows the Tx output eye diagram measured at 3.2Gb/s with a 2³¹−1 PRWS. The measured jitter is 57.8ps pp and the static VDD sensitivity is 0.06ps/mV. Measured results are summarized in Figure 4.1.7.

References:
[1] R. Gu et al., “A 0.5-3.5Gb/s Low-Power Low-Jitter Serial Data CMOS Transceiver,” ISSCC Digest of Technical Papers, pp. 352-353, Feb. 1999.
[2] M-J. Lee et al., “An 84mW 4Gb/s clock and data recovery circuit for serial link applications,” VLSI Symposium 2001, pp. 149-152, 2001.
[3] S. Sidiropoulos et al., “A 700Mb/s/pin CMOS signaling interface using current integrating receivers,” IEEE JSSC, vol. 32, no. 5, pp. 681-690, May 1997.
[4] J. Savoj et al., “A CMOS Interface Circuit for Detection of 1.2Gb/s RZ Data,” ISSCC Digest of Technical Papers, pp. 278-279, Feb. 1999.
[5] P. Larsson, “An Offset-Cancelled CMOS Clock Recovery/Demux with Half-Rate Linear Phase Detector for 2.5Gb/s Optical Communication,” ISSCC Digest of Technical Papers, pp. 74-75, Feb. 2001.
[6] T. Lee et al., “A 2.5V CMOS delay-locked loop for an 18Mb, 500Mb/s DRAM,” IEEE JSSC, vol. 29, no. 12, Dec. 1994.

• 2002 IEEE International Solid-State Circuits Conference 0-7803-7335-9 ©2002 IEEE
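The four-quadrant interpolation described in paper 4.1 can be illustrated with a small behavioral sketch. It assumes ideal quadrature clocks (I = cos ωt, Q = sin ωt), so that CLK = α·I + β·Q has phase atan2(β, α). The function names, the unit step gain, and the hard clamp used to stand in for the UP/DOWN blocking are hypothetical choices made for illustration; they are not taken from the paper.

```python
import math

def interpolated_phase(alpha, beta):
    """Phase in [0, 2*pi) of CLK = alpha*I + beta*Q for ideal quadrature
    clocks I = cos(wt), Q = sin(wt) (an assumption of this sketch)."""
    return math.atan2(beta, alpha) % (2 * math.pi)

def quadrant(alpha, beta):
    """Quadrant (1..4) of the output vector, decided only from the signs
    of the weights, as the Q-detect circuit is described as doing."""
    if alpha >= 0 and beta >= 0:
        return 1
    if alpha < 0 and beta >= 0:
        return 2
    if alpha < 0 and beta < 0:
        return 3
    return 4

def charge_pump_step(v, up, i0, i_offset, v_ref, v_min, v_max, gain=1.0):
    """One update of a control voltage (Va or Vb).

    Offset rule from the text: if the weight (v - v_ref) is positive the
    UP current is I0 + Ioffset, if it is negative the DOWN current is
    I0 + Ioffset, so the weight is pulled away from zero.
    The clamp to [v_min, v_max] is a simplification of the amplitude
    control, which blocks the UP or DOWN signal at the limits.
    """
    weight = v - v_ref
    i_up = i0 + (i_offset if weight > 0 else 0.0)
    i_down = i0 + (i_offset if weight < 0 else 0.0)
    dv = gain * (i_up if up else -i_down)
    return min(max(v + dv, v_min), v_max)

if __name__ == "__main__":
    # Scaling both weights leaves the phase unchanged: only alpha/beta
    # (together with the signs) sets the output phase.
    for a, b in [(1.0, 1.0), (0.2, 0.2), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]:
        deg = math.degrees(interpolated_phase(a, b))
        print(f"alpha={a:+.1f} beta={b:+.1f} quadrant={quadrant(a, b)} phase={deg:.1f} deg")
```

Because the phase depends only on the ratio and signs of the weights, a control loop that observes phase alone cannot regulate their magnitude, which is why the text adds the offset current and the clipping.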


ISSCC 2002 / February 4, 2002 / Salon 8 / 1:30 PM

Figure 4.1.1: Overall architecture. Figure 4.1.2: DLL block diagram.


Figure 4.1.3: Four-quadrant mixer schematic. Figure 4.1.4: Relation between the phase shift and the weights Va and Vb.
Figure 4.1.5: Die micrograph (blocks labeled RxPLL, RX, TX, and TxPLL).

Figure 4.1.7: Summary of measured results.

Technology                   0.16µm CMOS with 5 metal levels
Supply voltage               1.5V
Power dissipation            75mW per transceiver with Tx output set to 1Vpp,diff
                             85mW for Tx and Rx PLLs + clock buffers
Active area                  2mm² (PLL: 0.1mm², single transceiver: 0.25mm²)
BER                          < 10⁻¹² (all measurements were done with BER < 10⁻¹²)
Receiver sensitivity         30mVpp,diff
Recovered clock jitter       87.1ps pk-pk
Max. offset frequency        400ppm at 2.5Gb/s
Input data rate range        622-3125Mb/s
Transmitter's output jitter  57.8ps pk-pk at 3.2Gb/s
Tx output VDD sensitivity    0.06ps/mV
Output amplitude             1Vpp,diff
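The 86mW/ch figure in the title can be cross-checked against the summary numbers. This is a minimal sketch, assuming the 85mW for the PLLs and clock buffers is shared equally by the 8 channels; the function name is hypothetical.

```python
def power_per_channel_mw(transceiver_mw, shared_pll_mw, channels):
    """Average power per channel when the PLL and clock-buffer power is
    split evenly over all channels (assumption of this sketch)."""
    return transceiver_mw + shared_pll_mw / channels

if __name__ == "__main__":
    avg = power_per_channel_mw(transceiver_mw=75.0, shared_pll_mw=85.0, channels=8)
    print(f"average power: {avg:.1f} mW per channel")
```

Under that assumption the result is a little under 86 mW, which is consistent with the rounded per-channel power stated in the paper.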

Figure 4.1.6: Measured Rx jitter tolerance and Tx eye diagram at 3.2Gb/s.



IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998 1987

A Fully Integrated VCO at 2 GHz


Markus Zannoth, Bernd Kolb, Joseph Fenk, and Robert Weigel, Senior Member, IEEE

Abstract—A fully integrated voltage-controlled oscillator at a frequency of 2 GHz with low phase noise has been implemented in a standard bipolar process with an f_T of 25 GHz. The design is based on an LC-resonator with vertically coupled inductors. Only two metal layers have been used. The supply voltage of the oscillator is 2.7 V. The phase noise is only −136 dB/Hz at 4.7-MHz frequency offset. A tuning range of 150 MHz is achieved with integrated tuning diodes.

Index Terms— Bipolar, fully integrated VCO, noise requirement for cordless phones.

Manuscript received April 10, 1998; revised July 17, 1998.
M. Zannoth, B. Kolb, and J. Fenk are with Siemens AG, Muenchen D-81541, Germany.
R. Weigel is with the University of Linz, Institute for Communication and Information Engineering, Linz A-4040 Austria.
Publisher Item Identifier S 0018-9200(98)08854-4.
0018–9200/98$10.00 © 1998 IEEE

I. INTRODUCTION

With the fast growth of the wireless application market, there is a growing need for smaller designs and higher levels of integration for the reduction of costs and size. Because of the very poor performance of integrated resonators on silicon IC’s, local RF oscillators are difficult to integrate with regard to the phase noise requirements. At the moment, external resonators are used with external hyperabrupt tuning diodes. The integrated tuning diodes have low performance because it is not possible to produce a hyperabrupt pn-junction in our standard bipolar process. The limiting factor is the inductor of the resonator, which can only achieve quality factors of about four. While mobile telecommunication standards like global system for mobile communication (GSM) or digital cellular system (DCS-1800) require a noise performance that cannot be reached in our technology with a fully integrated local oscillator, the requirements for cordless phones, such as those of the Digital European Cordless Telecommunications (DECT) standard, seem to be achievable.
In a DECT system, the critical point concerning oscillator phase noise is the emissions due to modulation. There the emitted power at the output in the third adjacent channel is specified to be smaller than 20 nW, which equals −47 dBm [1]. With a maximum output power of 25 dBm, there is a difference of 72 dB. The specified measurement filter to get this power has a bandwidth of 1 MHz. So the noise requirement at this point becomes −132 dB/Hz, which results from

$-47\,\mathrm{dBm} - 25\,\mathrm{dBm} - 60\,\mathrm{dB\,Hz} = -132\,\mathrm{dB/Hz}$.

This is the difference between the noise level and the output power, normalized to 1 Hz. The frequency offset for this specification is the start frequency of the measurement filter. As this is in the third adjacent channel, three times the channel spacing of 1.728 MHz has to be taken (see Fig. 1). Since the filter is centered in the channel, half of the filter bandwidth of 1 MHz has to be subtracted to get the offset frequency

$3 \cdot 1.728\,\mathrm{MHz} - \tfrac{1}{2} \cdot 1\,\mathrm{MHz} = 4.684\,\mathrm{MHz} \approx 4.7\,\mathrm{MHz}$.

Normalized to a 100-kHz frequency offset, the requirement is −98.6 dB/Hz, if the phase noise has a constant slope of −20 dB/decade, as assumed by [3]. These are the requirements for the main transmitter voltage-controlled oscillator (TX-VCO), when the following blocks are not dominating in noise. This is indeed fulfilled.
With fully integrated oscillators these requirements seem to be possible to realize. This paper presents a fully integrated oscillator, which achieves the required specification. It uses an LC oscillator consisting of integrated tuning diodes and integrated vertically coupled inductors. In this design only two metal layers in a standard bipolar process are used.

Fig. 1. DECT specification.

II. OSCILLATOR WITH COUPLED INDUCTORS

The phase noise in oscillators depends on the quality factor of the resonator, the noise figure of the amplifier creating a negative resistance, and the energy in the resonator [3], [4]. For low phase noise, passive resonators are chosen. With active inductors, noise is added by the active devices that cannot be compensated by increasing the quality factor. For best integration and reproducibility spiral inductors are used, as described in [5]. The quality factor of the resonator is mainly limited by the inductors. Here it is limited to four, as only two metal layers are available. One metal layer has a thickness of 0.8 µm. With limited chip area for the inductor and the use of vertically coupled inductors, a quality factor of only four was achievable without changing technology parameters.

Fig. 2. Possibilities of feedback.

Fig. 3. Equivalent circuit of the tuning diode.
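The DECT phase-noise requirement worked out in the Introduction can be retraced numerically. The sketch below uses only the figures quoted there (20 nW emission limit, 25 dBm output power, 1 MHz filter bandwidth, 1.728 MHz channel spacing, −20 dB/decade slope); the function and variable names are hypothetical.

```python
import math

def dbm_from_watts(p_watts):
    """Convert a power in watts to dBm."""
    return 10 * math.log10(p_watts / 1e-3)

def dect_requirement():
    emission_dbm = dbm_from_watts(20e-9)            # limit in third adjacent channel
    tx_power_dbm = 25.0                             # maximum output power from the text
    filter_bw_hz = 1e6                              # measurement filter bandwidth
    # Noise relative to carrier, normalized to 1 Hz bandwidth
    req_dbc_hz = emission_dbm - tx_power_dbm - 10 * math.log10(filter_bw_hz)
    # Start frequency of the filter in the third adjacent channel
    offset_hz = 3 * 1.728e6 - filter_bw_hz / 2
    # Refer the requirement to 100 kHz assuming a -20 dB/decade slope
    req_100k = req_dbc_hz + 20 * math.log10(offset_hz / 100e3)
    return emission_dbm, req_dbc_hz, offset_hz, req_100k

if __name__ == "__main__":
    emission, req, offset, req_100k = dect_requirement()
    print(f"emission limit      : {emission:.1f} dBm")
    print(f"requirement         : {req:.1f} dB/Hz at {offset / 1e6:.3f} MHz")
    print(f"referred to 100 kHz : {req_100k:.1f} dB/Hz")
```

The results agree with the rounded values in the text: about −47 dBm, −132 dB/Hz at roughly 4.7 MHz, and −98.6 dB/Hz at 100 kHz.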
The simplest way to reduce phase noise is increasing the resonator energy by applying higher voltages to the resonator. In this design an emitter-coupled pair with cross feedback is used as a negative resistance, which is responsible for undamping the resonator. The limit of the maximum oscillation amplitude depends on the feedback. There are three ways of feeding the output signal back to the input (see Fig. 2). The easiest way is direct coupling, where no biasing network is needed and very low power consumption can be achieved. Using direct coupling the voltage across the resonator is limited by the base-collector diode of the transistors. When forward biased, this diode inserts additional damping and current noise to the resonator, causing increased phase noise.
With capacitive coupling [6] this can be avoided. Here no resistive element is inserted into the feedback. With capacitive feedback a phase noise of −100 dB/Hz at 100 kHz could be achieved by [6], having quality factors of eight. The disadvantage is the need for a high-impedance biasing network at the transistor bases. This biasing network can be realized by noisy resistors or by large inductors that cost a lot of chip space. If resistors are used, uncorrelated noise is introduced to both halfwaves of the signal when the oscillator acts in its linear region. This noise is nearly negligible when a low-Q resonator is used. In our case the impedance at the input of the feedback amplifier is about 500 Ω. This impedance consists of the feedback capacitor and the tank impedance at resonance. The bias resistor in parallel is about 4 kΩ, and so the effect of adding noise is not very dominant.
With inductive coupling the bias current can be fed through the inductor. This allows connecting a low-impedance biasing network which can be made of a voltage source. The advantage of connecting a voltage source directly to the circuit is the absence of resistive elements that cause white noise, which would be converted to phase noise by the nonlinear elements. Every DC path can be blocked carefully against emissions from the supplies without any resistive element. The maximum voltage at the resonator can be adjusted by the biasing voltage so that the base-collector diode is not the limiting element. Now the amplitude of the swing is limited by the base-emitter diode of the transistors and the limitation of the current source. The energy in the resonator can be increased and so the phase noise is reduced. Now the maximum voltage is not one diode voltage, as in the case of direct coupling. In the presented design the resonator voltage reaches a value of 3 V_pp.
The limiting elements for the maximum voltage of the resonator are now the two series-connected tuning diodes. To decrease their voltage without reducing the resonator energy, a capacitor is added in series at the cost of tuning range. This capacitor is also responsible for getting a linear tuning characteristic. To provide the DC path for the tuning diodes, resistors are connected in parallel to the coupling capacitances. These resistors have a negligible effect on the quality factor because they have a large value of 1 kΩ, which is much larger than the capacitance's impedance of 40 Ω (see Fig. 6). The quality factor of the capacitance is about 24, which is in the same range as that of the varactor. These quality factors are negligibly high relative to that of the inductor.
The inductors are produced as symmetrical square spirals. In our standard bipolar process only two metal layers could be used to create vertically coupled inductors. The crossings are made in the gap between two metal lines (see Fig. 7). The cost of this technique is the wide gap between the lines, which causes an increase in size and in parasitic effects like series resistance and substrate capacitance. The quality factor is as low as four. This is caused by the technology, where the metal layers have a poor conductivity and high capacitances to the medium-doped substrate. In this design an inductor of 2.7 nH was used. Its series resistance is 4.2 Ω. The coupling factor was estimated to be 0.85. The values of the equivalent circuit (see Fig. 8) were first calculated by algorithms from [7] and then fitted to measurement. The coupling capacitor was estimated from the plate capacitance of the two metal layers.
For tuning, the base-emitter diode of a transistor is used (see Fig. 3). This has the disadvantage of a high series resistance (base resistance) of 2.6 Ω and a relatively low capacitance variation by a factor of 1.75 when applying a voltage difference of 2.7 V (see Fig. 4). However, this represents the only way to create a tuning diode without changing the standard bipolar process, where no hyperabrupt pn-junctions are available. The Q of this varactor was simulated to be about 25 (see Fig. 5) when it is calculated from 1/(ωRC). The base-collector diode could not be used for tuning, because it does not have such a large capacitance variation.
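The quality-factor relations used in this section can be tried out with the numbers quoted above. This is a rough sketch under stated assumptions: the values are evaluated at 2 GHz, the series-RC form Q = 1/(ωRC) is used for the varactor, and Q = R_p/|X_C| is used for a capacitor shunted by a resistor. The capacitances it prints are derived here for illustration and are not stated in the paper; all function names are hypothetical.

```python
import math

def capacitance_from_reactance(x_ohm, f_hz):
    """Capacitance whose reactance magnitude is x_ohm at frequency f_hz."""
    return 1.0 / (2 * math.pi * f_hz * x_ohm)

def series_rc_q(r_series_ohm, c_farad, f_hz):
    """Q of a capacitor with series resistance: Q = 1 / (w * R * C)."""
    return 1.0 / (2 * math.pi * f_hz * r_series_ohm * c_farad)

def parallel_rc_q(r_parallel_ohm, x_ohm):
    """Q of a capacitor (reactance x_ohm) shunted by r_parallel_ohm."""
    return r_parallel_ohm / x_ohm

if __name__ == "__main__":
    f0 = 2e9  # assumed evaluation frequency

    # Coupling capacitor: 40-ohm reactance with a 1-kohm parallel resistor
    c_couple = capacitance_from_reactance(40.0, f0)
    print(f"coupling capacitance: {c_couple * 1e12:.2f} pF")
    print(f"Q limit set by the 1-kohm resistor: {parallel_rc_q(1e3, 40.0):.0f}")

    # Varactor: capacitance implied by Q = 25 with 2.6-ohm series resistance
    c_var = 1.0 / (2 * math.pi * f0 * 2.6 * 25.0)
    print(f"implied varactor capacitance: {c_var * 1e12:.2f} pF")
    print(f"check: Q = {series_rc_q(2.6, c_var, f0):.1f}")
```

The parallel-resistor limit comes out at 25, which is close to the value of about 24 quoted for the coupling capacitance, although the text does not say how that figure was obtained.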
ZANNOTH et al.: FULLY INTEGRATED VCO 1989

Fig. 4. Capacitance of the tuning diode.

Fig. 5. Quality factor of the tuning diode.

Fig. 6. VCO with coupled inductors.

Fig. 7. Simplified layout of the inductor.

Fig. 8. Equivalent circuit of the inductor.

Fig. 9. Output buffer.

To get the signal into a 50-Ω measurement system a buffer is added (see Fig. 9). As the signal is taken directly from the resonator, a high impedance is required for minimizing the effects of the load. The signal is fed through small coupling capacitances (400 fF) to emitter-followers that provide this high input impedance. This first stage drives a differential amplifier with open collector outputs. A balun can be connected that transforms the differential signal to a single-ended one that can be connected to 50 Ω. The current consumption of the amplifier is about 6 mA. Its output power is about −8 dBm at 50 Ω, which is enough for noise measurements.

III. RESULTS

The measured phase noise of the oscillator can be calculated from expressions by Leeson [3] and has a slope of −20 dB/Dec

measured at offset frequencies between 100 kHz and 50 MHz, which represent the measurement limits. If the phase noise is calculated from the expression given in [5], we get a value of −135 dB/Hz at an offset frequency of 4.7 MHz. The effective series resistance is about 16 Ω. It is taken from the series resistances of the inductor, the tuning diodes, and the coupling capacitances. The amplitude of the oscillation was simulated to be 1.5 V (3 V_pp) at a resonance frequency of 2 GHz. The noise factor of the amplifier was assumed to be about two. The expressions from [3] give an approximation for the expected noise. For better calculation, nonlinear effects have to be considered. In [12], such calculations as the correlation between waveform and phase noise are shown.
The simulation with SpectreRF shows nearly the same values as the measured ones. In Fig. 11 the simulated and measured noise are shown up to an offset frequency of 6 MHz. In the simulation the equivalent circuits are taken from above. The simulator uses nonlinear methods [13], [14] to calculate noise. It gives nearly the same results as the measurement and the small-signal approximation from above.

TABLE I
SUMMARY OF THE VCO CHARACTERISTICS

Supply voltage                  2.7 V
Current of oscillator core      12 mA
Current of amplifier            6 mA
Output power                    −8 dBm
Center frequency                1.96 GHz
Tuning constant (K_VCO)         −55 MHz/V
Tuning range (0 . . . 2.7 V)    150 MHz
Phase noise @ 100 kHz           −102 dBc/Hz
Phase noise @ 4.7 MHz           −136 dBc/Hz
Suppression of harmonics        23 dB
Size of one inductor            215 µm × 215 µm
Chip size without pads          600 µm × 600 µm

Fig. 10. Measured tuning characteristic.

Fig. 11. Measured and simulated phase noise.

Fig. 12. Chip photograph.

The measurement shows that the free-running oscillator has a resonant frequency of 1.96 GHz, a phase noise of −136 dB/Hz at 4.7 MHz offset frequency, and a VCO constant of 55 MHz/V (see Figs. 10 and 11). The phase noise is measured in the middle of the tuning range. With a thicker metal layer (1.2-µm aluminum instead of 0.8 µm) a phase noise of even −137 dB/Hz can be achieved. This improvement is achieved by reducing the series resistance of the inductor and increasing its quality factor. The tuning characteristic is nearly linear (see Fig. 10) and the noise performance varies by less than 1 dB over the whole tuning range from 0 V to 2.7 V. The linearity of this characteristic is improved by the series capacitance (Fig. 6) added to the varactor diode. The current through the oscillator core is about 12 mA; the supply voltage is 2.7 V. At tuning voltages above the supply voltage the varactor diodes get forward biased and so the frequency stays nearly constant and the noise rises.
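The small-signal estimate of about −135 dB/Hz quoted in this section can be approximated numerically. The equation itself is not legible in this copy, so the sketch below assumes the form commonly associated with [5], L(Δf) = kT · R_eff · (1 + A) · (f0/Δf)² / (V²/2), and further assumes T = 300 K and A = 2 for the amplifier noise factor. The remaining values (16 Ω, 1.5 V amplitude, 2 GHz, 4.7 MHz) are taken from the text; the function name is hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def lc_phase_noise_dbc_hz(r_eff_ohm, a_factor, v_amp, f0_hz, offset_hz, temp_k=300.0):
    """Single-sideband phase noise estimate of an LC oscillator.

    Assumed form: L = kT * R_eff * (1 + A) * (f0 / df)^2 / (V^2 / 2),
    which falls at -20 dB/decade with offset frequency.
    """
    ratio = (K_BOLTZMANN * temp_k * r_eff_ohm * (1 + a_factor)
             * (f0_hz / offset_hz) ** 2 / (v_amp ** 2 / 2))
    return 10 * math.log10(ratio)

if __name__ == "__main__":
    at_4m7 = lc_phase_noise_dbc_hz(16.0, 2.0, 1.5, 2e9, 4.7e6)
    at_100k = lc_phase_noise_dbc_hz(16.0, 2.0, 1.5, 2e9, 100e3)
    print(f"estimate at 4.7 MHz: {at_4m7:.1f} dBc/Hz")
    print(f"estimate at 100 kHz: {at_100k:.1f} dBc/Hz")
```

With these assumptions the estimate lands within a fraction of a dB of −135 dB/Hz at 4.7 MHz, and about 33 dB higher at 100 kHz, which is consistent with the −20 dB/decade slope and with the measured values in Table I.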

The noise rise at tuning voltages above the supply voltage occurs because of the reduction of the quality factor and the introduction of additional current noise due to the forward-biased diodes.

IV. CONCLUSION

A fully integrated bipolar VCO is realized (see Fig. 12) that achieves a measured phase noise of −136 dB/Hz at 4.7 MHz. The oscillator has a linear tuning characteristic with a tuning range of 150 MHz at a center frequency of 1.96 GHz. Further characteristics are given in Table I.
In this design two metal layers are used to build vertically coupled integrated inductors. These have quality factors of about four. Integrated varactor diodes are implemented by using base-emitter diodes of transistors. With this design the noise requirements of the DECT specification of −132 dB/Hz at 4.7 MHz frequency offset are achieved with a margin of 4 dB. The output power is −8 dBm at 50 Ω, with a center frequency of 1.95 GHz. For the use of this oscillator in a DECT product, the varactor capacitance will be increased until the required center frequency of 1.88 GHz is reached. The design has been realized in a standard high-volume bipolar process with an f_T of 25 GHz.

REFERENCES

[1] ETSI, Digital European Cordless Telecommunications (DECT) Common Interface, Part 2: Physical Layer, Oct. 1992.
[2] L. E. Larson, RF and Microwave Circuit Design for Wireless Communications. Boston: Artech House, 1996.
[3] D. B. Leeson, “A simple model of feedback oscillator noise spectrum,” Proc. Lett. IEEE, pp. 329–330, Feb. 1966.
[4] G. Sauvage, “Phase noise in oscillators: A mathematical analysis of Leeson’s model,” IEEE Trans. Instrum. Meas., vol. IM-26, pp. 408–410, Dec. 1977.
[5] J. Craninckx and M. S. J. Steyaert, “A 1.8-GHz low-phase-noise CMOS VCO using optimized hollow spiral inductors,” IEEE J. Solid-State Circuits, vol. 32, pp. 736–744, May 1997.
[6] G. Palmisano, M. Paparo, F. Torrisi, and P. Vita, “Noise in fully integrated PLL’s,” in Proc. 6th Workshop Advances in Analog Circuit Design AACD’97, Como, Italy, pp. 1–19.
[7] J. Crols, P. Kinget, J. Craninckx, and M. Steyaert, “An analytical model of planar inductors on lowly doped silicon substrates for high frequency analog design up to 3 GHz,” in IEEE Symp. VLSI Circuit Dig. Tech. Papers, 1996, pp. 28–29.
[8] J. N. Burghartz, M. Soyuer, and K. A. Jenkins, “Microwave inductors and capacitors in standard multilevel interconnect silicon technology,” IEEE Trans. Microwave Theory Tech., vol. 44, pp. 100–104, Jan. 1996.
[9] L. Dauphinee, M. Copeland, and P. Schvan, “A balanced 1.5 GHz voltage controlled oscillator with an integrated LC resonator,” in Proc. ISSCC’97, Session 23, Analog Techniques, pp. 390–391.
[10] I. B. Jansen, K. Negus, and D. Lee, “Silicon bipolar VCO family for 1.1 to 2.2 GHz with fully-integrated tank and tuning circuits,” in Proc. ISSCC’97, Session 23, Analog Techniques, p. 392.
[11] B. Razavi, “A 1.8 GHz CMOS voltage-controlled oscillator,” in Proc. ISSCC’97, Session 23, Analog Techniques, pp. 388–389.
[12] A. Hajimiri and T. H. Lee, “A general theory of phase noise in electrical oscillators,” IEEE J. Solid-State Circuits, vol. 33, pp. 179–194, Feb. 1998.
[13] CADENCE, Oscillator Noise Analysis in SpectreRF, application note to SpectreRF, 1998.
[14] F. X. Kärtner, “Untersuchung des Rauschverhaltens von Oszillatoren,” Ph.D. dissertation, Tech. Univ. Munich, Munich, Germany, 1988.

Markus Zannoth was born in Munich, Germany, in 1971. He received the Dipl. Ing. degree in electrical engineering in 1996 from the Technical University of Munich, Munich, Germany. Since 1996, he has been working towards the Dr.Ing. degree at Siemens AG and the Technical University of Munich.
His doctoral research is on integrated oscillators.

Bernd Kolb was born in 1972. He studied electrical engineering with an emphasis on telecommunication techniques at the Georg-Simon-Ohm-Polytechnic Nuremberg. There, he received the Dipl.Ing. (FH) degree in 1995.
He joined the Siemens High Frequency IC Department in 1995. Since then, he has worked in the field of oscillators, frequency dividers, and vector modulators. He has focused on designing highly integrated transmitter IC’s for mobile communication. He is now with Lucent Network Systems GmbH Nuremberg, Germany, where he designs high-frequency parts of base stations for mobile communication.

Joseph Fenk received the diploma in electronics from the Technical University of Munich, Munich, Germany, in 1968.
He is responsible for product definition and project management of communications RF-integrated circuits at Siemens Components, Inc., Integrated Circuit Division. After joining Siemens in 1968, he worked as a Development Engineer on high-frequency components in the Discrete Components Group, developing transmitters, aerial and tuner transistors, FET’s, and Varactor and PIN diodes. In 1976, he joined the Integrated Circuits Group as a Design Engineer for consumer products. He has been engaged in the development of integrated circuits for infrared preamplifiers, prescalers, IF-amplifiers/demodulators for FM-radio and satellite-TV, mixer/oscillators FM radio, TV-and SAT-TV, and TV UHF/VHF modulator IC’s, as well as circuits for narrowband FM mobile radio. He holds more than 50 patents relating to IC and system design and has presented technical papers at numerous industry conferences and forums.

Robert Weigel (S’88–M’89–SM’95) was born in Ebermannstadt, Germany, in 1956. In 1989, he received the Dr.Ing. degree, and in 1992 the Dr.Ing.habil degree, both in electrical engineering from the Technical University of Munich, Munich, Germany.
From 1982 to 1988, he was a Research Assistant, from 1988 to 1994, he was a Senior Research Engineer, and from 1988 to 1996, he was a Professor at the Technical University of Munich. In the winter of 1994–1995, he was a Guest Professor at the Technical University of Vienna, Vienna, Austria. Since 1996, he has been Head of the Institute for Communication and Information Engineering at the University of Linz, Austria. He has been engaged in research and development on microwave theory and techniques, integrated optics, high-temperature superconductivity, surface acoustic wave (SAW) technology, and digital and microwave communication systems. In these fields, he has published more than 120 papers and has given more than 90 international presentations. His work includes European research projects and international journals.
Dr. Weigel is a senior member of the IEEE Microwave Theory and Techniques and the Ultrasonics, Ferroelectrics, and Frequency Control Societies. He is also a member of the Institute for Systems and Components of the Electromagnetics Academy, the Informationstechnische Gesellschaft (ITG) in the Verband Deutscher Elektrotechniker (VDE), and the Society of Photo-Optical Instrumentation Engineers (SPIE). In 1993 he was a co-recipient of the MIOP-award.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998 295

A Simple Precharged CMOS Phase Frequency Detector


Henrik O. Johansson

Abstract—We propose a simple precharged CMOS phase frequency detector (PFD). The circuit uses 18 transistors and has a simple topology. Therefore, the detector, in a 0.8-µm CMOS process, works up to clock frequencies of 800 MHz according to SPICE simulations on extracted layout. Further, the detector has no dead zone in the phase characteristic, which is important in low-jitter applications. The phase and frequency characteristics are presented and comparisons are made to other PFD’s. The phase offset of the detector is sensitive to differences of the duty cycle between the inputs. Mixed-mode simulations are presented of the lock-in procedure for a phase-locked loop (PLL) where the detector is used. Measurements on the detector are presented for a test chip with a delay-locked loop (DLL) where the phase detection ability of the detector has been verified.

Index Terms—CMOS integrated circuits, delay lock loops, phase detectors, phase lock loops.

Fig. 1. Conventional phase frequency detector (conPFD) from [2].

I. INTRODUCTION

A part of a phase-locked loop (PLL) is the phase detector (PD) [1]. The PD detects the phase difference between the reference frequency and the controlled slave frequency. Some PD’s also detect frequency errors; they are then called phase frequency detectors (PFD’s). A PFD is usually built as a state machine with memory elements such as flip-flops [2], [3] (Figs. 1 and 2, respectively). We propose a new simple PFD, the ncPFD, which uses two nc-stages [4] and six inverters, Fig. 3(a).
Fig. 2. Precharge type phase frequency detector (ptPFD) from [3].

A drawback with some phase detectors is a dead zone in the phase characteristic at the equilibrium point. The dead zone generates phase jitter since the control system does not change the control voltage when the phase error is within the dead zone.
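The link between a dead zone and jitter can be illustrated with a toy model. The sketch below is not the circuit or the simulation setup of this paper: it assumes a first-order loop, a random-walk oscillator phase, and a detector whose output is zero inside the dead zone. The function name `rms_phase_error` and all parameter values are hypothetical.

```python
import math
import random


def rms_phase_error(dead_zone, gain=0.5, noise=0.01, steps=20000, seed=1):
    """RMS phase error (rad) of a toy first-order loop.

    dead_zone is the half-width (rad) of the detector's dead zone. Inside it
    the detector output is zero, so the loop applies no correction.
    """
    rng = random.Random(seed)
    err = 0.0
    acc = 0.0
    for _ in range(steps):
        # Oscillator phase noise modelled as a random walk (assumption).
        err += rng.gauss(0.0, noise)
        # The detector only responds to the part of the error outside the dead zone.
        if abs(err) > dead_zone:
            excess = err - math.copysign(dead_zone, err)
            err -= gain * excess
        acc += err * err
    return math.sqrt(acc / steps)


ideal = rms_phase_error(dead_zone=0.0)
with_dz = rms_phase_error(dead_zone=0.2)
print(f"no dead zone: {ideal:.4f} rad rms")
print(f"0.2-rad dead zone: {with_dz:.4f} rad rms")
```

With these assumed values the phase error wanders freely across the dead zone, so its RMS value is set by the dead-zone width rather than by the loop gain.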
In Section II the ncPFD circuit is described. The phase
and frequency characteristics are discussed in Sections III and
IV, respectively, and comparisons are made to other PFD’s.
Behavioral mixed-mode simulations are made to check the
lock-in properties of the ncPFD detector and these simulations
are shown in Section V. Experiments on the phase detection
abilities of the ncPFD are presented in Section VI.

Manuscript received March 11, 1997; revised August 21, 1997.
The author is with Electronic Devices, Department of Physics and Measurement Technology, Linköping University, S-58183 Linköping, Sweden.
Publisher Item Identifier S 0018-9200(98)00732-X.
0018–9200/98$10.00 © 1998 IEEE

Fig. 3. (a) The ncPFD in zero degree phase offset version. (b) Modified version with π rad phase offset.

Fig. 4. Waveforms for the case when slave lags after the reference signal. The pulse width of the up signal is larger than for the down signal.

Fig. 5. Phase characteristics of the ncPFD (solid line), conPFD (dashed line), and the ptPFD (dash-dot line) from SPICE level-2 simulations of extracted layout, VDD = 3.0 V and f = 50 MHz.

Fig. 6. Magnified phase characteristics at zero phase error of the ncPFD (solid line), conPFD (dashed line), and the ptPFD (dash-dot line) from SPICE level-2 simulations of extracted layout, VDD = 3.0 V and f = 50 MHz.

II. CIRCUIT

The transistor schematic of the ncPFD is shown in Fig. 3(a). The detector has a 0-rad phase offset. The main part of the circuit is the nc stage [4]. Delays (two inverters) are inserted at the reference and slave inputs in order to remove the dead zone in the phase characteristics around π rad phase error. In Fig. 4, waveforms for the circuit in Fig. 3(a) are shown when the slave input lags the reference input.

The detector can easily be modified to one with π-rad phase offset, as shown in Fig. 3(b), where one, or in general an odd number, of inverter(s) is used for the delays.

If the phase detector is used only as a phase detector, i.e., not as a frequency detector, the circuit in Fig. 3(a) can be used as a π-rad phase detector by switching the up and down signals. The equilibrium point will then be on the negative slope of the phase characteristics at π rad instead of at the positive slope at zero, Fig. 5. Similarly, the π-rad phase detector, Fig. 3(b), can be modified to a 0-rad phase detector.
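Why switching the up and down signals moves the equilibrium from 0 to π rad can be seen in a toy first-order loop. The sketch below is not the ncPFD characteristic: it assumes a sinusoidal stand-in that, like the characteristic described above, has a positive slope at 0 and a negative slope at π. The names `settle`, `gain`, and `swapped` are hypothetical.

```python
import math


def settle(phase0, swapped=False, gain=0.1, steps=2000):
    """Phase error (rad, wrapped to (-pi, pi]) after the toy loop has settled.

    The stand-in detector output is sin(phase): positive slope at 0,
    negative slope at pi. Swapping up and down flips the sign of the output.
    """
    sign = -1.0 if swapped else 1.0
    phase = phase0
    for _ in range(steps):
        # Negative feedback: the loop steps the phase against the detector output.
        phase -= gain * sign * math.sin(phase)
    # Wrap the result so that 0 and pi are easy to recognise.
    return math.atan2(math.sin(phase), math.cos(phase))


normal = settle(1.0)                # locks on the positive slope
swapped = settle(1.0, swapped=True)  # locks on the former negative slope
print(f"normal: {normal:.3f} rad, swapped: {abs(swapped):.3f} rad")
```

Starting from the same initial error, the unswapped loop settles at 0 rad and the swapped loop at π rad, because only a zero crossing with the correct slope sign is a stable equilibrium.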
III. PHASE CHARACTERISTIC

The phase characteristic of the proposed ncPFD is shown in Fig. 5 together with the characteristics of the conventional PFD (conPFD) of Fig. 1 [2] and the precharge type PFD (ptPFD) shown in Fig. 2 [3]. Unlike the conPFD and ptPFD, there is no dead zone in the characteristics of the ncPFD. A magnification of the characteristics at zero phase is shown in Fig. 6. The dead zone of the conPFD can be reduced by inserting delay at the output of the four-input NAND gate. But if delays are inserted in the feedback signals from the up and down outputs of the ptPFD, the dead zone unfortunately increases.

In an ncPFD, when the PLL is locked, both up and down signals are active. Therefore the phase offset of the PLL depends on the matching between the up and down currents of the charge pump.

All data in this section are based on simulations of extracted layout with SPICE (level-2) when VDD = 3.0 V and f = 50 MHz unless otherwise stated. The layout was made in a 0.8-µm standard CMOS process and the N and P-transistors are 2.0 and 4.0 µm wide, respectively. The outputs were connected to 4.0 fF capacitors, and the inputs were driven with inverters with a tapering factor of one.

Fig. 7. Phase characteristics for three cases with different duty cycles. The reference input duty cycle is 50% for all cases and the slave input has the duty cycles 45%, 50%, and 55% for the dashed, solid, and dashed–dotted lines, respectively.

A. Duty-Cycle and Transition-Time Dependence

The output of the ncPFD depends on the pulse width of the input signals. Hence, the duty cycle will affect the phase characteristics. The phase characteristics are checked for three different duty cycles, 45, 50, and 55%.

When both the reference and slave have the same duty cycle, the phase offset is not affected. There is a dead zone at π rad when the duty cycle is less than 50%. A duty cycle of 45% gives a dead zone width of 0.50 rad, 1.6 ns, at π rad. This dead zone may result in a metastable state of the control loop.

When the duty cycle is different for the two inputs, the phase offset will be nonzero, Fig. 7. A duty cycle difference of 5% at 50 MHz, i.e., 1 ns, gives a phase offset of 0.20 rad, i.e., 630 ps.

The phase characteristic of the ncPFD is not affected by variations of the rise and fall times when they are in the range of 300 ps up to 600 ps.

B. Maximum Operation Frequency

A maximum operation frequency definition can be found in [3]. The definition is that the maximum operation frequency is one over the shortest period with correct up and down signals when the inputs have the same frequency and 90° phase difference. This definition is easily applicable to flip-flop-based PFD’s, where this frequency is easily identified. Unfortunately, the degradation of the performance of the
ncPFD is gradual for increasing frequency and this makes it hard to find a specific frequency where the circuit starts to malfunction.

Therefore, we define the maximum operation frequency to be the frequency where the size of the dead zone starts to deviate significantly from the low-frequency value. This definition gives similar results for the flip-flop-based phase detectors as the definition in [3], and it is applicable to the ncPFD. An example of how the dead-zone width varies with the frequency is shown in Fig. 8.

The maximum speeds for different supply voltages are plotted in Fig. 9 for the three PFD’s of Figs. 1, 2, and 3(a). As seen, the maximum speeds of the ncPFD and the ptPFD are similar and the conPFD is approximately three times slower.

Fig. 8. The width of the dead zones of the ncPFD (solid), ptPFD (dashed), and conventional PFD (dash-dot) as function of frequency. The frequency resolution is 100 MHz and the supply voltage is 5.0 V. The plot is based on SPICE simulations of extracted layout.

Fig. 9. Maximum frequency as function of supply voltage for the ncPFD (solid line), the ptPFD (dash-dot line), and the conPFD (dashed line). The frequency resolution is 25 MHz. The plot is based on simulations of extracted layout. The layouts are made in a standard 0.8-µm CMOS process.

IV. FREQUENCY CHARACTERISTICS

A frequency-dependent phase detector always has some kind of memory. For the ncPFD, the memory consists of the two dynamic nodes at the output of the nc-stages. In Fig. 10, the frequency of the slave input is approximately three times higher than the reference input frequency; as a result, the down signal has a higher duty cycle than the up signal. Thus the slave frequency should decrease.

Fig. 10. Waveforms for the case when the slave has a higher frequency than the reference signal. The down signal has higher duty cycle than the up signal.

Fig. 11. Frequency sensitivity for the ncPFD (solid), ptPFD (dash-dot), and conPFD (dashed). The plot is based on behavioral simulations with 20 different initial phases for each frequency and the mean value for each frequency is plotted. The reference frequency is 50 MHz.

The average frequency sensitivities of the ncPFD, ptPFD, and conPFD are shown in Fig. 11. The frequency sensitivity is represented by the rate of change in the control voltage of the loop filter of a PLL when the slave input is driven by a pulse generator with a fixed frequency instead of the voltage-controlled oscillator (VCO) output. Each frequency is simulated 20 times with different initial phases, i.e., skew between the inputs.

The ptPFD has the largest sensitivity, followed by the conPFD, and the ncPFD has the lowest. The sensitivity goes to zero as the slave frequency approaches the reference frequency for both the ncPFD and ptPFD. But for the conPFD, the sensitivity is relatively high even for frequencies close to the reference.

In Fig. 12 the sensitivity for the ncPFD is shown with the mean, minimum, and maximum values from the 20 simulations for each frequency. Note that the behavior of the minimum and maximum values is almost random.

For the ncPFD, the minimum absolute value of the sensitivity is close to zero for certain frequencies, Fig. 12. Actually, the sensitivity is zero for some frequency ratios and phase combinations. This is the case also for the ptPFD but not for the conPFD. The condition for this seems to be that when the frequency ratio of the reference and slave inputs is a rational number and the ratio is in the interval 1/2 to 2, including the limits, the sensitivity is zero for certain initial phases. We have no general proof of the previous statement but, for example, the sensitivity of the ncPFD for a slave frequency of 4/5 of the reference frequency as function of initial phase is shown in Fig. 13. The sensitivity is zero for the phases 0.0, 2.5, and 5.0 ns. This lack of sensitivity may lead to false locking for a PLL in operation. However,
this false locking will not be stable, since a small phase change results in a nonzero sensitivity and drives the loop back to lock. One way to add small phase changes to the simulation is to include phase noise which is always present in an oscillator. When we add phase noise of approximately 300 ps peak-to-peak to the simulations, the normalized minimum sensitivity which was zero will increase to approximately 0.01. The improvement is not significant but the sensitivity will be nonzero and positive for all phases. Hence, false locking is avoided. To further enhance the phase noise during the lock-in process, one could use dithering techniques, i.e., add the signal from a noise/signal source to the control voltage of the VCO.

Fig. 12. Frequency sensitivity for the ncPFD for a number of frequencies. The plot is based on behavioral simulations with 20 different initial phases for each frequency. The solid line is the mean value and the “+” symbols are the minimum and maximum values. The reference frequency is 50 MHz.

Fig. 13. Frequency sensitivity for the ncPFD when the slave frequency is 4/5 of the reference frequency. For the initial phases of 0.0, 2.5, and 5.0 ns the sensitivity is zero.

Fig. 14. Lock-in process of a third-order PLL with the ncPFD as phase frequency detector. The loop filter and PLL data are shown in the upper right corner.

V. BEHAVIORAL MIXED-MODE SIMULATIONS

In order to understand the sensitivity to frequency errors and lock-in properties of the proposed detector, a complete third-order charge pump PLL system was simulated using a multilevel mixed-mode simulator, Lsim [5]. The PFD was represented by a schematic simulated in switch mode. The VCO, phase-noise generator, and charge pump are represented by behavioral models written in the hardware description language M [6]. The loop filter used ideal R and C models in circuit mode with analog voltages. The loop filter and PLL data are shown as an inset in Fig. 14. A lock-in simulation is shown in Fig. 14. The simulation is done with the presence of 300 ps peak-to-peak phase noise.

Because of the sawtooth-shaped frequency sensitivity of the ncPFD (for a fixed frequency offset and varied initial phase), Fig. 13, and the presence of noise, the lock-in time is not deterministic but random. The lock-in times for 60 simulations have been analyzed. Most simulations show a lock-in time of 7 µs and the largest time is 16 µs. There is no upper limit on the lock-in time. One simulation took approximately 3 cpu-min on a SPARC 10 workstation.

VI. EXPERIMENTS

The phase detection properties of the ncPFD have been verified experimentally with a test chip. The test chip is a line receiver for serial data that utilizes several parallel samplers to receive bit rates of 2.0 Gb/s [7]. The phase detector was used in a delay-locked loop (DLL) which generates control signals for the sampling switches used in the line receiver. The ncPFD, Fig. 3(a), was used as a π-rad phase detector and the delay line was half a wavelength long.

The skew between the reference and slave signals is not possible to measure directly. This quantity has been measured indirectly through measurement error compensation circuits to be about 125 ps at MHz. Unfortunately, there is no control of how large the measurement error is.

The circuit blocks used to measure the offset are shown in Fig. 15. The two clocks that we want to compare come from the beginning and the end of the delay line. They are fed into two matched inverter chains where the propagation delays for rising and falling edges are matched against process variations [8]. The delays from the multiplexer inputs to the oscilloscope screen for the two signal paths are not matched. Two measurements are done to compensate for this: one where the delay line input signal goes uninverted through Output buffer 1 and one where the same signal goes inverted through Output buffer 2. The measured skew including the measurement error
for the measurements will be as follows:

skew inv mux Buf
inv mux Buf (1)

skew inv mux Buf
inv mux Buf (2)

where is the real skew and inv and inv are the delays through the four inverters’ long chains for falling and rising edges through the left and right chain, respectively. Similarly, inv and inv are for the five inverters’ long chains. And mux and mux are the delays through the multiplexers. The Buf and Buf are the delays through the output buffers and through the oscilloscope input channels. The sum of the skews (1) and (2) is

skew skew inv inv
inv inv (3)

Note that the expression is independent of the mux and Buf delays. Hence, theoretically, if the rise and fall delays of the inverter chains are matched properly, there will not be any measurement error.

In Fig. 16 an oscilloscope screen dump with four lock-in procedures is shown. The signal is the drain voltage of an NMOS transistor with an external pull-up resistor and with the gate connected to the control voltage as shown in Fig. 15. The lock-in time is less than 200 µs. Ideally, the control voltage should go monotonically to the equilibrium voltage. Therefore, the beating in the lock-in procedure when the initial control voltage is 3.0 V is unexpected. The reason for this is unknown.

Fig. 15. DLL, phase offset measurement circuitry, and NMOS transistor to access the control voltage.

Fig. 16. Oscilloscope screen dump of the drain voltage of an NMOS transistor with external pull-up resistor where the gate is connected to the control voltage. Four different lock-in procedures are shown. The initial control voltages are 0.0, 1.0, 2.0, and 3.0 V for the curves from top to bottom, respectively.

VII. CONCLUSIONS

A new PFD without a dead zone has been proposed. The circuit topology is simple and has no feedback loops. Simulation results indicate that the circuit can operate up to 800 MHz in 0.8-µm CMOS with a 5-V supply. The detector’s phase offset depends on the duty cycle of the inputs. Measurements have been performed on the detector when it was used in a DLL as a phase detector and the functionality was verified.

REFERENCES

[1] R. E. Best, Phase-Locked Loops, 2nd ed. New York, NY: McGraw-Hill, 1993.
[2] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, 2nd ed. Reading, MA: Addison Wesley, 1993.
[3] H. Kondoh, H. Notani, T. Yoshimura, H. Shibata, and Y. Matsuda, “A 1.5-V 250-MHz to 3.0-V 622-MHz operation CMOS phase-locked loop with precharge type phase-detector,” IEICE Trans. Electron., vol. E78-C, no. 4, pp. 381–388, Apr. 1995.
[4] P. Larsson and C. Svensson, “Skew safety and logic flexibility in a true single phase clocked system,” in Proc. IEEE Int. Symp. Circuits Syst., 1995, pp. II:941–944.
[5] Mentor Graphics, Explorer Lsim User’s Manual. Mentor Graphics Corp., 1992.
[6] Mentor Graphics, M Language User’s Guide. Mentor Graphics Corp., 1991.
[7] H. O. Johansson, J. Yuan, and C. Svensson, “A 4 Gsamples/s Line-Receiver in 0.8 µm CMOS,” in Proc. Symp. VLSI Circuits, 1996, pp. 116–117.
[8] M. Shoji, “Elimination of process-dependent clock skew in CMOS VLSI,” IEEE J. Solid-State Circuits, vol. SC-21, pp. 875–880, Oct. 1986.
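The phase and time figures quoted in Section III-A of the paper above can be cross-checked with the usual relation between phase and time at a fixed frequency, phase = 2πft. The sketch below assumes the 50-MHz reference frequency stated for that section; the helper names `phase_to_time` and `time_to_phase` are hypothetical.

```python
import math


def phase_to_time(phase_rad, freq_hz):
    """Time interval (s) that corresponds to a phase (rad) at freq_hz."""
    return phase_rad / (2.0 * math.pi * freq_hz)


def time_to_phase(time_s, freq_hz):
    """Phase (rad) that corresponds to a time interval (s) at freq_hz."""
    return 2.0 * math.pi * freq_hz * time_s


F_REF = 50e6  # reference frequency used for the phase characteristics

# Dead-zone width of 0.50 rad expressed in nanoseconds.
dead_zone_ns = phase_to_time(0.50, F_REF) * 1e9
# Phase offset of 630 ps expressed in radians.
offset_rad = time_to_phase(630e-12, F_REF)
# A 5% duty-cycle difference as a fraction of the 20-ns period.
duty_step_ns = 0.05 / F_REF * 1e9

print(f"0.50 rad -> {dead_zone_ns:.2f} ns")
print(f"630 ps   -> {offset_rad:.3f} rad")
print(f"5% duty  -> {duty_step_ns:.2f} ns")
```

The results agree with the rounded values in the text: 0.50 rad is about 1.6 ns, and 630 ps is about 0.2 rad at 50 MHz.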
1654 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

Rotary Traveling-Wave Oscillator Arrays:
A New Clock Technology
John Wood, Terence C. Edwards, Member, IEEE, and Steve Lipa, Student Member, IEEE

Abstract—Rotary traveling-wave oscillators (RTWOs) represent a new transmission-line approach to gigahertz-rate clock generation. Using the inherently stable LC characteristics of on-chip VLSI interconnect, the clock distribution network becomes a low-impedance distributed oscillator. The RTWO operates by creating a rotating traveling wave within a closed-loop differential transmission line. Distributed CMOS inverters serve as both transmission-line amplifiers and latches to power the oscillation and ensure rotational lock. Load capacitance is absorbed into the transmission-line constants whereby energy is recirculated giving an adiabatic quality. Unusually for an LC oscillator, multiphase (360°) square waves are produced directly. RTWO structures are compact and can be wired together to form rotary oscillator arrays (ROAs) to distribute a phase-locked clock over a large chip. The principle is scalable to very high clock frequencies. Issues related to interconnect and field coupling dominate the design process for RTWOs. Taking precautions to avoid unwanted signal couplings, the rise and fall times of 20 ps, suggested by simulation, may be realized at low power consumption. Experimental results of the 0.25-µm CMOS test chip with 950-MHz and 3.4-GHz rings are presented, indicating 5.5-ps jitter and 34-dB power supply rejection ratio (PSRR). Design errors in the test chip precluded meaningful rise and fall time measurements.

Index Terms—Clocks, MOSFET oscillators, phase-locked oscillators, phased arrays, synchronization, timing circuits, transmission line resonators, traveling-wave amplifiers.

I. INTRODUCTION

Clocking at gigahertz rates requires generators with low skew and low jitter to avoid synchronous timing failures. The notion of a “clocking surface” becomes untenable at gigahertz rates [1], frequently mandating that large VLSI chips are subdivided into multiple clock domains and/or utilize skew-tolerant multiphase circuit design techniques [2].

Techniques such as distributed phase-locked loops (PLLs) [3] and delay-locked loops (DLLs) [4] can control systematic skew to within 20 ps, but are complex, introduce random skew (i.e., jitter), and have area penalties. H-tree distribution systems, while simple, are difficult to balance and can use upwards of 30% of a chip’s total power budget [5]. All these systems are inherently single-phase, induce large amounts of simultaneous switching noise, and can be highly susceptible to this noise.

Researchers have therefore looked to alternative oscillator mechanisms for better phase stability and lower power consumption. Previous transmission-line systems such as salphasic distribution [6], distributed amplifiers [7], and adiabatic LC resonant clocks [8] provide only a sinusoidal or semisinusoidal clock, making fast edge rates difficult to achieve.

This paper introduces the rotary traveling-wave oscillator (RTWO), a differential LC transmission-line oscillator which produces gigahertz-rate multiphase (360°) square waves with low jitter. Extension of the RTWO to rotary oscillator arrays (ROAs) offers a scalable architecture with the potential for low-power low-skew clock generation over an arbitrary chip area without resorting to clock domains. Simulations predict rise and fall times of 20 ps on a 0.25-µm process and a maximum frequency limited only by the f_T of the integrated circuit technology used.

Experiments show that although the RTWO operates differentially, careful attention is required to guard against magnetic field couplings between the clock conductors and other structures if the potential performance of these oscillators is to be realized.

Manuscript received March 20, 2001; revised June 28, 2001. This work was supported by Multigig Ltd., and also supported in part by the National Science Foundation under Award EIA-31332.
J. Wood is with MultiGig, Ltd., Northampton NN8 1RF, U.K. (e-mail: john.wood@multigig.com).
T. C. Edwards is with Engalco, Huntington, YO32 9NY, U.K. (e-mail: enquiries@engalco.com).
S. Lipa is with the Microelectronics Systems Laboratory, North Carolina State University, Raleigh, NC 27695 USA.
Publisher Item Identifier S 0018-9200(01)08220-8.
0018–9200/01$10.00 © 2001 IEEE

II. CONCEPT OF THE ROTARY CLOCK OSCILLATOR

A. Fundamentals and Structures

The basic ROA architecture is shown in Fig. 1. A representative multigigahertz rotary clock layout has 25 interconnected RTWO rings placed onto a 7 × 7 array grid. Each ring consists of a differential line driven by shunt-connected antiparallel inverters distributed around the ring. This arrangement produces a single clock edge in each ring which sweeps around the ring at a frequency dependent on the electrical length of the ring. Pulses are synchronized between rings by hard wiring which forces phase lock.

Fig. 2 illustrates the theory behind the individual RTWO. Fig. 2(a) depicts an open loop of differential transmission line (exhibiting LC characteristics) connected to a battery through an ideal switch. When the switch is closed, a voltage wave begins to travel counterclockwise around the loop. Fig. 2(b) shows a similar loop, with the voltage source replaced by a cross-connection of the inner and outer conductors to cause a signal inversion. If there were no losses, a wave could travel on this ring indefinitely, providing a full clock cycle every other rotation of the ring (the Möbius effect).

In real applications, multiple antiparallel inverter pairs are added to the line to overcome losses and give rotation lock. Rings are simple closed loops and oscillation occurs spontaneously upon any noise event. Unbiased, startup can occur in
either rotational sense, usually in the direction of lowest loss. Deterministic rotation biasing mechanisms are possible, e.g., directional coupler technology or gate displacement [9]. Once a wave becomes established, it takes little power to sustain it, because unlike a ring oscillator, the energy that goes into charging and discharging MOS gate capacitance becomes transmission line energy, which is recirculated in the closed electromagnetic path. This offers potential power savings as losses are not related to CV²f but rather to I²R dissipation in the conductors where R can be reduced, e.g., by adoption of copper metallization.

Fig. 1. Basic rotary clock architecture. The = signs denote points with same phase.

Fig. 2. Idealized theory underlying the RTWO. (a) Open loop of differential conductors to a battery via a switch. (b) Similar loop but with the voltage source replaced by the inner and outer conductors cross-connected.

Fig. 3. Waveforms of line voltage and line current for the 3.4-GHz clock simulation example.

B. Waveforms

Fig. 3 shows simulated waveforms of a 3.4-GHz RTWO taken at an arbitrary position on the ring. The design has the following characteristics for reference:
• Conductors: Width m
• Pitch m
• Ring Length m
• Metallization: 1.75 µm copper
• Loop inductance total nH
• Process: 0.25-µm CMOS
• Nch total width: 2000 µm
• Pch total width: 5000 µm
• Number of inverters: 24 pairs.

Very large distributed transistor widths give substantial capacitive loading to the lines, thus lowering velocity to give a reasonably low clock rate from a compact oscillator structure. In application, up to 75% of this capacitance can come from load capacitance, reducing the size of the drive transistors accordingly.

The upper traces of Fig. 3 show the simulated voltage waveforms on the differential line at points labeled A0, B0. The lower traces show the current in the conductors to be 200 mA, while the supply current is simulated at 84 mA with 4.5 mA of ripple. This clearly illustrates that energy is recycled by the basic operation of the RTWO. Just driving the 34 pF of capacitance present would require 275 mA at this frequency (from I = CVf).

C. Phase Locking

Interconnected rings, as in Fig. 1(a), will run in lockstep, ensuring that the relative phases at all points of an ROA are known. It is possible to use a large array of interconnected rings to distribute a clock signal over a large die area with low clock skew. For example, referring to Fig. 1(a), all the points marked with the equals sign have the same relative phase as that arbitrarily marked as 0°. At any point along the loop, the two signal conductors have waveforms 180° out of phase (two-phase
nonoverlapping clock). A full 360° is measured along the complete closed path of the loop. In principle, an arbitrary number of clock phases can be extracted. Phase advances or retards depend on the direction of rotation, and Fig. 4 shows the current–voltage relationships for clockwise and counterclockwise rotation.

Fig. 4. Voltage, current, and phase relationships versus rotation direction (Poynting’s vector).

Fig. 5. Three-dimensional view of the structure. The two differential lines are shown, with current flow arrows (main and charge/boost) and encircling H-fields. CMOS transistors are also shown complete with supply voltages (V and V ) and both p- and n-channels.
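Phase extraction along the ring can be pictured with a small helper. The sketch below is an idealization, not the authors' design flow: it assumes the wave travels at uniform velocity, so phase grows linearly with distance along the complete closed path, and it normalizes that path to a length of 1. The function name `tap_phase_deg` and its parameters are hypothetical.

```python
def tap_phase_deg(position, clockwise=True, complementary=False):
    """Phase in degrees (0 <= phase < 360) at a tap on an idealized ring.

    position: fraction (0..1) of the complete closed path travelled from the
              point arbitrarily marked 0 degrees.
    clockwise: rotation direction; reversing it flips the sign of the phase.
    complementary: take the signal from the other conductor of the pair,
              which is 180 degrees out of phase at the same location.
    """
    phase = 360.0 * position
    if not clockwise:
        phase = -phase
    if complementary:
        phase += 180.0
    return phase % 360.0


# Four evenly spaced taps give a four-phase (quadrature) clock.
quadrature = [tap_phase_deg(k / 4) for k in range(4)]
print(quadrature)
print(tap_phase_deg(0.25, clockwise=False))
```

Choosing more taps, or using both conductors at each tap, gives finer phase steps under the same uniform-velocity assumption.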

D. Network Rules

Although the square-ring shape is convenient to show diagrammatically, it is only one example of a more general network solution which requires ROAs to conform closely to the following rules.
1) Signal inversion must occur on all (or most) closed paths.
2) Impedance should match at all junctions.
3) Signals should arrive simultaneously at junctions.

From 1) above, any odd number of crossovers is allowed on the differential path, and regular crossovers forming a braided or “twisted pair” effect can dramatically reduce the unwanted coupling to wires running alongside the differential line.

The differential lines would typically be fabricated on the top metal layer of a CMOS chip where the reverse-scaling trend of VLSI interconnect offers increasingly high performance [10].

E. Fields and Currents

Fig. 5 illustrates a three-dimensional section of the ring structure connected to a pair of CMOS inverters expanded to show the four individual transistors. The main current flow in the differential conductors is shown by solid arrows, the magnetic field surrounding these conductors by dashed loops, and the capacitance charge/signal-boost current flowing through the transistors by dashed lines.

An important feature of differential lines is the existence of a well-defined “go” and “return” path which gives predictable inductance characteristics in contrast to the uncertain return-current path for single-ended clock distribution [11].

Capacitance arises mainly from the transistor gate and depletion capacitance and interconnect capacitance does not dominate.

indicates intrinsic gate resistance, i.e., the ohmic path through which the gate charge flows. The term implies a parasitic gate term, but in reality, most of this resistance is in the series circuit of the channel under the gate electrode. This is shared by the D-S channel, as illustrated by the triangular region (shown with transistors operating in the pinchoff region).

Fig. 6. Expanded view of short sections of the transmission line, including three sets of back-to-back inverters as a wavefront passes.

F. Coherent Amplification, Rotation Locking

Fig. 6 is an expanded view of a short section of transmission line with three sets of back-to-back inverters shown. It is assumed that startup is complete and the rotating wave is sweeping left to right. For this analysis, we view the inverter pairs as discrete latch elements.

Each latch switches in turn as the incident signal, traveling on the low impedance transmission line, overrides the ON resistance of the latch and its previous state. This “clash” of states occurs only at the rotating wavefront and therefore only one region is in this cross-conduction condition at any one time. The transmission-line impedance is of the order of 10 Ω and the differential on-resistance of the inverters is in the 100-Ω–1-kΩ range, depending on how finely they are distributed throughout the structure.

Once switched, each latch contributes for the remainder of the half cycle, adding to the forward-going signal. Coherent buildup of switching events occurs in this forward direction only. An equal amount of energy is launched in the reverse direction, but the latches in that direction cannot be switched further into the state to which they have already switched. The reverse-traveling components simply reduce the amount of drive required from those latches.

Importantly, it is the nonlinear latching action which is responsible for the self-locking of direction (a highly linear amplifier has no such directionality).

To clarify the above statements, Fig. 7 demonstrates how a large CMOS latch responds to an imposed differential signal. The curve trace shows a central differential-amplification region bounded by two absorptive ohmic regions (shaded) corresponding to the two latched states. Except at the wavefront location where amplification takes place, the ring structures will be terminated ohmically to the supplies.

The four-transistor “full-bridge” circuit minimizes supply current ripple to the cross-conduction period.

Fig. 7. DC transfer characteristic of two back-to-back inverters to an imposed differential signal.

G. Frequency and Impedance Relations

In simulation models (and indeed as fabricated), the RTWO transmission line is built up from multiple RLC segments, and therefore, these primary line constants must be identified.

Fig. 8(a) is the basic RF macromodel of a short length (SegLen) of RTWO line with all significant RF components and parasitics annotated (as per Fig. 5). Suffixes identify per-unit-length perlen, lumped lump and total (or loop) values. There are N segments connected together, plus a crossover, to produce a closed ring of length RingLen.

Fig. 8. Development of the rotary clock model. (a) Complete RF circuit. (b) Capacitance circuit.

Fig. 8(b) is a capacitive equivalent circuit for the transistor and load capacitances. AC0 indicates an ac ground point ( and ).

The differential lumped capacitance of one such segment is given approximately by

(1)

where
interconnect capacitance for the line AB;
gate overlap and Miller-effect feedback capacitance;
total channel capacitance;
drain depletion capacitance to bulk (substrate);
load capacitance added to a line.

(Note that the is used to convert the in-parallel “to ground” values into in-series differential values of capacitance.)

is usually a small part of total capacitance and accurate formulas are available [12] if needed.

To calculate the per-unit-length differential inductance, i.e., accounting for mutual coupling, we use [13], expressed below.

(2)

where
conductor separation;
conductor width;
conductor thickness.

The phase velocity is given by

$v_p = 1/\sqrt{L_{perlen} C_{perlen}}$, where $C_{perlen} = C_{lump}/\mathrm{SegLen}$ (3)

For heavily loaded RTWO structures, $v_p$ can be as low as 0.03 of $c$ (where $c$ is the free-space velocity, i.e., $3 \times 10^{8}$ m/s).

The clock frequency is given approximately by

$f \approx v_p / (2 \cdot \mathrm{RingLen})$ (4)

(The 2 factor arises from the pulse requiring two complete laps for a single cycle.)

Differential characteristic impedance is given by

$Z_0 = \sqrt{L_{perlen}/C_{perlen}}$ (5)

Transmission line characteristics dominate over RC characteristics when [14]

(6)

H. Bandwidth and Power Consumption

Seen from an RF perspective, Fig. 8(a) shows the RTWO to be two push–pull distributed amplifiers folded on top of each other. Distributed amplifiers exhibit very wide bandwidth because parasitic capacitances are “neutralized” by becoming part

TABLE I
CHANGES OF CHARACTERISTICS WITH N

Fig. 9. A four-port junction of two RTWO rings carrying anticlockwise signals, with a noncoincident signal arrival time.

of the transmission-line impedance [15]. Performance is limited by the carrier transit time of the MOSFETs [16], not by the traditional digital inverter propagation time, which is not applicable where gates and drains are driven cooperatively by an imposed low-impedance signal, and where the load capacitance is hidden in the transmission line.

Operation of the RTWO is largely adiabatic when the voltage drop required to charge the capacitances is developed mainly across the inductance:

(7)

and when the intrinsic gate resistance is low relative to the reactance of the gate capacitance.

(8)

RTWO rise and fall times are controllable by setting the cutoff frequency of the transmission lines.

(9)

Edges become faster and cross-conduction losses are reduced when the structure is more distributed.

Table I lists characteristic changes with $N$, with the remaining parameters held constant.

The most significant power loss mechanism for the RTWO is power dissipated in the interconnect, given by

(10)

Most of the remaining losses in Table I are attributed to cross-conduction and parasitic gate-resistance losses. Intrinsic gate resistance is a real loss mechanism for gigahertz signals, and RTWO rise/fall times can be doubled by this phenomenon. In newer CMOS processes, it improves with shorter channel length.

III. MORE DETAILED CONSIDERATIONS

A. Skew Control

Interconnected RTWO loops offer the potential to control skew in spite of relatively large open-loop time-of-flight mismatches. Functionally, phase averaging occurs by pulse combination at the junction of multiple transmission lines. For a four-port junction, the normal operating mode will see two pulses arriving at the junction simultaneously. These two sources will feed two output ports, and signal flow will be unimpeded by reflections if impedance is matched. This amounts to a situation similar to that described in [17], [18], although for ROAs, the mechanism is LC transmission-line energy combination, not ohmic combination of CMOS inverter outputs.

Where there exists a time-of-flight mismatch, one pulse arrives at the junction before the other. Fig. 9(a) depicts the operation of a four-port junction between two interwired but velocity-mismatched RTWO loops. Each of these rings has been divided into numbered segments (each as in Fig. 8). Four rings are wired together (similar to Fig. 16, shown later). Only the junction between two of the rings is considered here, the second of which has a higher open-loop operating frequency.
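As a rough numerical illustration of the phase-velocity, clock-frequency, and impedance relations (3)–(5) of Section II-G, the sketch below evaluates them in their standard lossless-line forms for a hypothetical ring. The per-unit-length inductance and capacitance values and all function names are assumptions chosen only for illustration; they are not taken from the test chip.

```python
import math

C0 = 3.0e8  # free-space velocity in m/s, as quoted in the text


def phase_velocity(l_perlen, c_perlen):
    """Phase velocity (m/s) from per-unit-length L (H/m) and C (F/m)."""
    return 1.0 / math.sqrt(l_perlen * c_perlen)


def clock_frequency(v_p, ring_len):
    """Clock frequency (Hz): the pulse needs two laps of the ring per cycle."""
    return v_p / (2.0 * ring_len)


def differential_impedance(l_perlen, c_perlen):
    """Differential characteristic impedance (ohms) of the lossless line."""
    return math.sqrt(l_perlen / c_perlen)


def frequency_from_totals(l_total, c_total):
    """Same frequency written in terms of total loop inductance and capacitance."""
    return 1.0 / (2.0 * math.sqrt(l_total * c_total))


# Hypothetical, illustrative values (assumptions, not data from the paper).
L_PERLEN = 0.5e-6   # H/m
C_PERLEN = 4.5e-9   # F/m, i.e., a line heavily loaded by transistor capacitance
RING_LEN = 3200e-6  # m

v_p = phase_velocity(L_PERLEN, C_PERLEN)
f_clk = clock_frequency(v_p, RING_LEN)
z0 = differential_impedance(L_PERLEN, C_PERLEN)
f_alt = frequency_from_totals(L_PERLEN * RING_LEN, C_PERLEN * RING_LEN)

print(f"v_p = {v_p / C0:.3f} c, f = {f_clk / 1e9:.2f} GHz, Z0 = {z0:.1f} ohm")
```

With these assumed values the two frequency expressions agree, which is simply the statement that frequency depends only on the total loop inductance and capacitance.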

Fig. 10. Waveforms corresponding with Fig. 9.

Fig. 11. Segment of chip layout showing 90° routing beneath clock lines and a tap to clock (CLK, $\overline{\mathrm{CLK}}$) loads.

From simulation, two pulse-combination effects appear to be present, the simplest of which is the impedance match effect where the first signal to arrive at a junction must try to drive three transmission lines. If all ports have equal impedance, the junction can only reach a quarter of the full signal value and a reflection occurs, driving an inverted signal back down the incident port [Fig. 9(b)]. The initially detrimental effects on signal fidelity arising from this reflection are overcome when the other pulse arrives, whereupon the pulses combine and branch into the output ports, as shown in Fig. 9(c).

The second pulse combination effect is believed to be due to nonlinear MOSFET drain capacitance, which can modulate the velocity of the line. Reflections can drive the MOSFETs from the ohmic state into the low-capacitance pinchoff region, locally increasing velocity.

Quantitative Results From Simulation: Fig. 10 presents the results of a SPICE simulation of the above situation with an extreme condition of velocity mismatch. A 50% variation of oxide thickness is modeled across a small 2.4 × 2.4 mm chip having four interconnected rings. Thick oxide (lower capacitance) devices are on the right side of the chip, giving a 22.5% phase velocity increase relative to the left side.

Looking at these results with reference to Fig. 9 reveals that the first pulse arrives from the faster right-hand ring, passes point A, and begins its rise time. Within this rise time, the leading edge reaches the nearby junction, where negative reflections bounce back to momentarily prevent A passing through the 1.5-V level.

The second pulse arrives from the slower left-hand ring, reaching point B later. It then combines with the first pulse at the junction to branch into the two output ports without further reflections.

Subsequently, the signals have reached points A and B and are essentially coincident; forward progress of the waves in the two rings is now synchronized.

The phase-locking phenomenon occurs at every junction of the array (not just the junction considered here) and twice per oscillation cycle, which accounts for the smaller than expected initial skew seen between the rings.

Simulations of typical arrays show that lockup is achieved within a few nanoseconds from powerup after signals settle into the lowest-energy state of coherent mesh.

B. Coupling Issues Related to Layout

The induced magnetic fields from the rotary clock structures can be strong. This is because $dI/dt$ is relatively high (square waves). The magnetic coupling coefficient, however, depends on the angle between source and victim and falls to zero when the angle becomes 90°.

Fig. 11 illustrates a 90° layout technique to minimize inductive coupling problems. The top metal M5 (running left to right) is used to create the differential RTWO, while orthogonal M4 is used as a routing resource for busses into and out of areas bounded by the clock transmission line.

For capacitive coupling, fast rise and fall times imply high displacement currents and a potentially aggressive noise source. Differential transmission lines tend to mitigate such effects [19], and in Fig. 11, the total capacitive coupling area between each of the transmission-line conductors and any M4 conductor is balanced. If the clock source were ideally differential, no net charge would be coupled to the M4 wires. For the RTWO, distributed inverters force the waveforms to be substantially differential and nonoverlapping, keeping glitches below the sensitivity of a typical gate.

For the five-metal test chip (Section V), a 45% utilization of M4 was used for the 90° routing pattern immediately underneath the RTWO rings. This coverage allows the M4 to act as both a routing resource and as an electrostatic shield similar to [20], preventing electrostatic coupling to signal lines further below. Magnetic fields are not attenuated much by this configuration, because the spaces between the thin perpendicular M4 lines break up the circulating currents which could repel a magnetic field. Substrate magnetic fields [21] are, therefore, to be expected.
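The quoted 22.5% velocity increase for a 50% thicker oxide is consistent with a simple scaling argument, sketched below. It assumes (this is an assumption, not something stated for the simulation) that the line capacitance is dominated by gate-oxide capacitance, which scales inversely with oxide thickness, while the line inductance is unchanged. The function name is hypothetical.

```python
import math


def velocity_ratio(oxide_thickness_ratio):
    """Ratio of phase velocities for a given ratio of oxide thicknesses.

    Assumes C is proportional to 1/t_ox and v_p is proportional to 1/sqrt(L*C)
    with L fixed, so v_p scales as sqrt(t_ox).
    """
    return math.sqrt(oxide_thickness_ratio)


# 50% thicker oxide on one side of the chip.
increase = velocity_ratio(1.5) - 1.0
print(f"{100 * increase:.1f}%")  # prints "22.5%"
```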

Coupling to co-parallel (0°) victim conductors is potentially much more problematic (discussed later in Section IV-C).

C. Tapoff Issues and Stub Loadings

It is possible to “tap into” the ROA structure (Fig. 11) anywhere along its length and extract a locally two-phase signal with known phase relationship to the rest of the network. This signal can then be routed via a fast differential transmission line to other circuits and will generally represent a capacitive stub on the RTWO ring.

For minimum signal distortion, the round-trip time-of-flight (forward and backward along the stub) must be much less than the rise time and fall time of the clock waveform:

(11)

When the above condition is met, the capacitance can be taken as being effectively lumped on the main RTWO ring at the tap point for the purposes of predicting oscillator frequency and ring impedance.

Although not immediately apparent, this condition is achievable in practice due to three factors. The first factor is that the tap line velocity is relatively fast for SiO2 dielectric. It is approximately $0.5c$, while the main RTWO oscillator ring might be operating at a far lower velocity. The second factor is that the tap length only has to be long enough to reach within a single RTWO ring. The third factor is that it requires two signal rotations on the RTWO to complete a clock cycle. These three factors work together to make the RTWO rings physically small compared to the expected speed-of-light dimensions. The distances to be spanned by the fast tap wires are therefore short enough that transmission-line effects on these lines are unimportant, certainly at the clock fundamental frequency and even at higher harmonics.

This can be illustrated by reference to a specific 3.4-GHz RTWO, 3200 µm long with 20-ps rise/fall times. Within one of these rise or fall periods, a stub transmission line with velocity $0.5c$ is able to communicate a signal over a distance of 3 mm. For a stub length of 400 µm (to reach the center of the ring), this equates to 3.75 round-trip times along the stub.

Fig. 12. Signal at either end of a 2-pF total tap loading line.

Fig. 12 shows simulated waveforms with 2 pF of total to-ground capacitance at the end of one such stub. Reflected energy gives rise to the ringing which is evident with this level of capacitance. The line resistance of the stubs must be low to maintain reflective energy conservation.

The ratiometric factors outlined above between ring length, frequency, rise/fall time, and stub lengths are expected to hold as ROAs are scaled to higher frequencies and smaller ring lengths without requiring special stub tuning measures.

Capacitive Loading Limits: Substantial total-chip capacitive loading can be tolerated by the RTWO relative to conventionally resonant systems [8], [22], [23]. However, the loading effects of interconnect, active, and stub capacitances cannot be increased without limit. The consequential lowering of line impedance increases circulating currents until losses become a concern. Eventually, the impedance becomes so low relative to the loop resistance that the relation (6) cannot be maintained, whereupon oscillation ceases altogether.

D. Frequency/Impedance Adjustment

Rewriting (4) in the form below shows that frequency is set only by the total inductance and capacitance of the RTWO loop.

$$f_{osc} \approx \frac{1}{2\sqrt{L_{total}\,C_{total}}} \qquad (12)$$

Total loop inductance $L_{total}$ is proportional to RingLen and varies strongly as a function of the width and pitch of the top metal differential conductors. This allows a coarse frequency selection through the top-metal mask definition. Unit-to-unit inductance variation is expected to be small because of the good lithographic reproduction of the relatively large clock conductors and the weak sensitivity of inductance to metal thickness variations.

Total capacitance $C_{total}$ for the RTWO is the sum of all lumped capacitances connected to the loop (1). $C_{total}$ tends to be dominated by gate-oxide capacitance from the drive FETs and the clock load FETs. This capacitance is inversely proportional to gate-oxide thickness, which on a modern CMOS SiO2 process is controlled to approximately 5% variation over extended wafer lots [24]. Drain depletion capacitances exist on bulk CMOS where the active transistors connect to the ring.

During the VLSI layout phase, a CAD tool (expected release: Q1 2002) can target a fixed operating frequency. The tool will be able to correct impedance discontinuities caused by lumped load capacitance by the addition of dummy “padding” capacitance elsewhere around the loop, and postcompensate an overly capacitive-loaded clock network by reducing the differential inductances through pitch reduction, hence restoring velocity and thus frequency. Alternatively, at the expense of using more metallization, a new layout with more numerous, shorter length rings could be used. The tool will need to simultaneously solve impedance matching issues [refer to Section II-A, (5)]. By manipulation of both inductance and capacitance simultaneously, it is possible to control velocity and impedance independently, as shown diagrammatically in Fig. 13. For example, velocity can be reduced by increasing both inductance and capacitance by the same factor to cancel the effect on impedance. These adjustments can support arbitrary branch-and-combine networks (at least in theory).

Post fabrication, adding together the sources of variation and given that frequency depends on the square roots of $L_{total}$ and $C_{total}$, a 5% initial tolerance of operating frequency between parts is expected.
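The stub-timing example of Section III-C (a 3.4-GHz, 3200-µm ring with 20-ps edges and a 400-µm stub) can be checked with a few lines of arithmetic. In the sketch below, the stub velocity of half the free-space velocity is inferred from the quoted 3-mm reach in 20 ps, and the ring velocity is inferred from the two-laps-per-cycle relation between frequency and ring length; both are inferences for illustration, and the function names are hypothetical.

```python
C0 = 3.0e8  # free-space velocity, m/s


def reach_within_edge(stub_velocity, edge_time):
    """Distance (m) a signal travels along the stub during one rise or fall time."""
    return stub_velocity * edge_time


def round_trips_within_edge(stub_velocity, edge_time, stub_length):
    """Number of stub round trips (out and back) that fit inside one edge."""
    return reach_within_edge(stub_velocity, edge_time) / (2.0 * stub_length)


def ring_velocity(frequency, ring_len):
    """Phase velocity implied by f = v / (2 * RingLen)."""
    return 2.0 * ring_len * frequency


EDGE_TIME = 20e-12      # s, rise/fall time from the text
STUB_LENGTH = 400e-6    # m, stub reaching the center of the ring
STUB_VELOCITY = 0.5 * C0  # inferred: 3 mm in 20 ps

reach = reach_within_edge(STUB_VELOCITY, EDGE_TIME)
trips = round_trips_within_edge(STUB_VELOCITY, EDGE_TIME, STUB_LENGTH)
v_ring = ring_velocity(3.4e9, 3200e-6)

print(f"reach = {reach * 1e3:.2f} mm, round trips = {trips:.2f}")
print(f"ring velocity = {v_ring / C0:.3f} c")
```

The result reproduces the 3-mm reach and the 3.75 round trips quoted in the text, and shows the main ring running at well under a tenth of the free-space velocity.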

Fig. 13. Differential line with varying trace separations and capacitive inverter loadings indicating the effects of altering several parameters.

Matching within a die should be better, but temperature gradients and transistor size variations as they affect capacitance will lead to phase velocity changes requiring correction by the Skew Control mechanism (described in Section III-A).

Temperature can alter frequency through variation of inductance and capacitance. Inductance variation is assumed to be negligible compared to capacitance variation and is not considered. Gate-oxide thickness variation could potentially affect capacitance, but for SiO2 dielectric, with properties similar to quartz, this can be ignored. More significant are temperature variations of drain depletion capacitance and of transistor characteristics.

To tune an ROA clock to an exact reference frequency, allowing limited “speed-binning” and reduced internal phase mismatches, closed-loop control of distributed switched capacitors [9] or varactors [25] is envisaged.

E. Active Compensation for Interconnect Losses

Resistive interconnect losses make it difficult to communicate high-frequency clock signals over a large chip without waveshape distortion and attenuation, which impacts on the practicality of reflective energy conservation schemes [6], [22], [23]. The skin effect loss mechanism has been evident in clock tree conductors for some time [26] and is frequency dependent. High-speed H-trees tend to use hierarchical buffers within the trees to maintain amplitude and edge rates.

Active compensation of VLSI differential transmission lines to overcome clock attenuation was shown by Bußmann and Langmann [27] to be applicable to sine-wave signals. Shunt-connected negative impedance convertors (NICs) were used with linear compensation to prevent oscillations.

The distributed inverters used within RTWOs afford active compensation for transmission-line losses, raising the apparent $Q$ of the resonant rings and helping to maintain a uniformly high clock amplitude around the structure.

F. Logic Styles

Two-phase latched logic [28] is the style most compatible with RTWO. It is highly skew tolerant and through dataflow-aware placement [27] offers the potential to exploit the full 360° of clock phase to reduce clock-related surging [29], which in future systems could exceed 500 A [30]. Conventional single-phase D-latch designs can be driven where timing improvements through skew scheduling [31] might be possible. A locally four-phase system to support domino logic [2] could be implemented by wrapping two loops of RTWO line around the region to clock. Unfortunately, all of these techniques are beyond the capability of current logic synthesis tools.

IV. SIMULATED PERFORMANCE

A. Approach

To enable rapid “what-if” evaluation of potential RTWO structures, a simulation/visualization program known as Rotary Explorer [32] has been developed. Rotary Explorer is GUI driven and parametrically creates a SPICE deck of macromodels linking to FASTHENRY subcircuits [33] for multipole magnetic analysis of skin, proximity, and LR coupling effects in the time domain. MOSFETs are modeled using the BSIM3v3 nonquasi-static model with an external resistor added to model the intrinsic gate resistance (Fig. 8). The BSIM4 model [34], which properly accounts for this resistance as a D-S channel component, was not available.

With the Rotary Explorer program, it is possible to simulate RTWO rings independently or as interlocked arrays. The effects of tap loads, oxide thickness variations, and magnetically induced “victim” noise can be evaluated.

As a visualization aid, Rotary Explorer gives a “live” display of color-coded SPICE voltages projected onto a scaled image of the ROA structure being simulated. This aids in the intuitive understanding of reflections and how the structure achieves a steady-state phase-locked operation.

B. Results

Two very important performance metrics for any oscillator are its sensitivity to changes in temperature and supply voltage. Simulations of these effects on a nominally 3.34-GHz rotary clock resulted in the data given in Tables II and III.

Supply Induced Jitter: Following on from the above and in light of the RTWO’s time-of-flight oscillation mechanism, it is inferred that such voltage sensitivity will also apply to phase modulation versus voltage, i.e., jitter, at least at low supply-noise frequencies. For a single RTWO ring, the power-supply induced jitter will be related to the supply voltage deviation and the power-supply rejection ratio (PSRR) by

(13)

where the supply term, because of the distributed nature of the oscillator, is the mean supply voltage deviation as experienced along the path of an edge as it travels two complete rotations. To improve PSRR, plans are in place to add voltage-dependent capacitance to the structure to give first-order compensation.

From simulations, we see that jitter reduces for multiple ring structures due to averaging effects.

C. Coupling II: Simulated Coupling

The Rotary Explorer program makes it easy to simulate coupled noise between an RTWO ring and a user-defined victim trace (drawn with the aid of a mouse). Simulated results are shown in Table IV for a 3.4-GHz RTWO configured to have 20-ps rise and fall times, and with geometry as shown in Fig. 14.

Peak coupling magnitude occurs at 60-µm victim length. A trace longer than this will see a coupling cancellation effect that approaches zero for each pitch of the braiding it traverses.

Fig. 15 illustrates a notably strong coupled signal waveform at a given victim distance, with no loading on the victim

trace and one end connected to ground. Note the more sensitive noise scale.

TABLE II
VARIATIONS WITH TEMPERATURE

TABLE III
VARIATIONS WITH DC SUPPLY VOLTAGE V

TABLE IV
INDUCED NOISE AS A FUNCTION OF VICTIM DISTANCE AND LENGTH

Fig. 14. Crossover traces, a visualization output from the Rotary Explorer tool.

Fig. 15. Example of notably strong coupled signal waveform.

The absolute maximum coupling occurs if victim distance is allowed to go to zero. In this case, mutual coupling between aggressor and victim is 100% with no cancellation effects from the other differential trace. As a numerical example, it follows that a 2.5-V signal with a rise time of 20 ps on a transmission line with a velocity of roughly $0.07c$ has the 2.5-V gradient over 430 µm of length (Fig. 4 illustrates the concept). Over the 60-µm length discussed above, this equates to 348 mV. Slower edge rates, faster transmission lines, and lower supply voltages reduce this figure proportionally.

Long-range inductive noise coupling from the differential transmission line is expected to be small, since (from a distance) the ‘go’ and ‘return’ currents are equal and opposite.

Potential problems exist in short-range magnetic coupling to wiring in the vicinity of the clock lines. Inductance is lowered by coupling to any highly conductive structure in which eddy currents can flow to decrease and distort the inducing field. Couplings to less conductive circuits such as the substrate give a loss mechanism which can be modeled as a shunt term in the transmission-line equations. LC resonance in the small-scale coupled structures is unlikely because of the high resonant frequencies. All of the coupling mechanisms mentioned are edge-rate dependent, and this can limit the achievable rise and fall times of the RTWO by attenuating the high-frequency signal components.

Full RLC layout extraction is essential in the neighborhood of the clock lines if routing is allowed in these areas. An alternative proposal under investigation is to predefine a VLSI structure combining clock and power distribution into the same grid to give consistent characteristics and shielding.
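The worst-case coupling figure in Section IV-C (348 mV over a 60-µm victim) follows from treating the clock edge as a linear voltage ramp spread along the line. The sketch below reproduces that arithmetic and the stated proportional scaling; the linear-ramp model and the function names are assumptions made for illustration only.

```python
def edge_length(velocity, rise_time):
    """Physical length (m) over which the full voltage swing is spread."""
    return velocity * rise_time


def voltage_across(victim_length, swing, edge_len):
    """Voltage difference (V) spanned by a victim, assuming a linear ramp
    and 100% coupling (victim distance going to zero)."""
    return swing * victim_length / edge_len


SWING = 2.5        # V
RISE_TIME = 20e-12  # s
EDGE_LEN = 430e-6   # m, as quoted in the text
VICTIM = 60e-6      # m

velocity = EDGE_LEN / RISE_TIME  # line velocity implied by the quoted figures
v_nominal = voltage_across(VICTIM, SWING, edge_length(velocity, RISE_TIME))
v_slow_edge = voltage_across(VICTIM, SWING, edge_length(velocity, 2 * RISE_TIME))

print(f"nominal: {v_nominal * 1e3:.1f} mV, doubled rise time: {v_slow_edge * 1e3:.1f} mV")
```

Doubling the rise time halves the coupled voltage, as does doubling the line velocity or halving the supply swing.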

V. SOME EXPERIMENTAL RESULTS

Fig. 16. Die photograph of a prototype chip.

Fig. 16 shows a die photograph of a prototype built using a 0.25-µm 2.5-V CMOS process with 1-µm Al/Cu top metal M5. The conductors are relatively wide in order to minimize resistive losses of the rather thin M5. The available top-metal area consumed by the transmission lines was 15%. A general feature of the RTWO and ROA is that power can be reduced by increasing the metal area devoted to clock generation. The simple substitution of copper metallization could halve the width of the lines for the same power consumption.

The prototype features a large ring independent of four interconnected smaller rings. The 12 000-µm outer ring uses 60-µm conductors on a 120-µm pitch, with 128 62.5-µm/25-µm inverter pairs distributed along its length.

Fig. 17. Measurement versus simulation waveforms for the large 965-MHz ring.

For the large ring, simulations predicted a clock frequency of approximately 925 MHz. Measured versus simulated performance is shown in Fig. 17. The oscillation frequency was 965 MHz. Jitter was measured at 5.5 ps rms using a Tektronix 11801A oscilloscope with an SD-26 sampling head.

The slower than simulated rise time is believed to be due to the large extrinsic gate electrode resistance on the Pch FETs. At design time, the importance of this parameter was overlooked. Transistors are now laid out according to RF design rules with the gate driven from both sides of the device.

Fig. 18. Clock frequency versus supply voltage for the large ring and supply current versus supply voltage for the entire chip with all five rings.

Fig. 18 shows that the oscillation frequency versus supply voltage is quite flat over a large range. We calculate from the measured slope that PSRR is approximately 34 dB for oscillators fabricated on this process. The oscillator was seen to be functional down to 0.8-V supply voltage, although 1.1 V was required to initiate startup.

The test chip incorporates 15 pF of on-chip decoupling capacitance per ring. No off-chip decoupling was required. Effectively, the equivalent of ten single-ended lines each having 10-Ω impedance were active, but simultaneous switching surges are low because of the distributed switching times of the inverters.

The quad of inner rings each have the following characteristics:

• Conductors: Width m
• Pitch m
• Ring Length m.

Total channel widths are 2000 µm for the Nch FET and 5000 µm for the Pch FET spread over 40 pairs of inverters.

Fig. 19. Measured output on one of the 3.42-GHz rings.

Fig. 19 shows the measured waveform from one of the 3.4-GHz rings. The oscillation frequency is 3.38 GHz versus a simulated frequency of 3.42 GHz. However, the waveshape is disappointingly distorted, the amplitude is low, and even-mode artifacts are visible.

Investigation of the fault identified a ‘co-parallel’ (0°) inductive coupling problem between the clock signal lines and the supply traces running directly beneath on M3 for the complete loop length. Only when a complete FASTHENRY

analysis was performed including these power traces was it apparent that induced current loops (circulating through the decoupling capacitors) were strongly attenuating the rotary signal. In this condition, the latching action (Fig. 7) does not fully develop and the rings support linear amplification of noise signals, hence the problematic multimode action. (This effect was much less severe on the large 965-MHz ring because the supply lines were much closer to the magnetically neutral center line of the transmission line.) The problem can be mitigated by use of braided transmission lines (as detailed in Section IV-C).

Analysis of the test chip showed that 90° coupling between M5 and the orthogonal thin M4 lines is not a significant problem, making it possible to route power and signals between regions bounded by the rotary clock structures.

VI. CONCLUSION AND FURTHER WORK PLANNED

This paper has described the rotary traveling-wave oscillator (RTWO) and its potential application to gigahertz-rate VLSI clocking. The oscillator is unique for a resonant-style LC-based oscillator in that it produces square waves directly and can be hardwired to form rotary oscillator arrays (ROAs). Being LC-based, the oscillator is stable and jitter is low.

The formulas presented here give practical adiabatic oscillator designs suitable for VLSI fabrication. The structure and operation of the RTWO is fundamentally simple and amenable to analysis. We find that agreement between simulation and measurement is good.

We need to demonstrate skew control (believed to be inherent) to fully establish that the simulated performance of multiring ROAs is realizable, and to measure susceptibility to induced high-frequency noise. Further work is planned to establish firm mathematical/analytical foundations for the prediction of both jitter and skew and to determine exact stability criteria for arrayed oscillators. Currently, a test chip using braided transmission line design to minimize coupling and incorporating varactors to control frequency is awaiting packaging and test.

Looking to the future, our simulations predict that the oscillator scales well. On a more modern 0.18-µm copper process, 10.5-GHz square-wave oscillator/distributors should be realizable consuming less than 32 mA per ring using slimmer 10-µm conductors. From simulation, the RTWO also appears to be viable on SOI processes.

ACKNOWLEDGMENT

The authors would like to thank P. Franzon and M. Steer, both of North Carolina State University, for their assistance, and the Raunds and British public library service.

REFERENCES

[1] E. G. Friedman, High Performance Clock Distribution Networks. Boston, MA: Kluwer, 1997.
[2] D. Harris, Skew Tolerant Circuit Design. San Mateo, CA: Morgan Kaufmann, 2000.
[3] G. A. Pratt and J. Nguyen, “Distributed synchronous clocking,” IEEE Trans. Parallel Distributed Syst., vol. 6, pp. 314–328, Mar. 1995.
[4] S. Tam, S. Rusu, U. N. Desai, R. Kim, J. Zhang, and I. Young, “Clock generation and distribution for the first IA-64 microprocessor,” IEEE J. Solid-State Circuits, vol. 35, pp. 1545–1552, Nov. 2000.
[5] C. J. Anderson et al., “Physical design of a fourth-generation POWER GHz microprocessor,” in ISSCC 2001 Dig. Tech. Papers, Feb. 2001, pp. 232–233.
[6] V. L. Chi, “Salphasic distribution of clock signals for synchronous systems,” IEEE Trans. Comput., vol. 43, pp. 597–602, May 1994.
[7] B. Kleveland et al., “Monolithic CMOS distributed amplifier and oscillator,” in ISSCC Dig. Tech. Papers, Feb. 1999, pp. 70–71.
[8] W. Athas, N. Tzartzanis, L. J. Svensson, L. Peterson, H. Li, X. Jiang, P. Wang, and W.-C. Liu, “AC-1: A clock-powered microprocessor,” in Proc. Int. Symp. Low-Power Electronics and Design, Aug. 1997. [Online]. Available: http://www.isi.edu/acmos/people/nestoras/papers/97-08.MontereyAC1.ps
[9] J. Wood. PCT/GB00/00175. MultiGig Ltd. [Online]. Available: http://www.delphion.com/cgi-bin/viewpat.cmd/WO00044093A1
[10] B. Kleveland, T. H. Lee, and S. S. Wong, “50-GHz interconnect design in standard silicon technology,” presented at the IEEE MTT-S Int. Microwave Symp., Baltimore, MD, June 1998. [Online]. Available: http://smirc.stanford.edu/papers/mtts98p-bendik.pdf
[11] B. Kleveland, X. Qi, L. Madden, R. W. Dutton, and S. S. Wong, “Line inductance extraction and modeling in a real chip with power grid,” presented at the IEEE IEDM Conf., Washington, D.C., Dec. 1999. [Online]. Available: http://gloworm.stanford.edu/tcad/pubs/device/iedm.pdf
[12] N. Delorme et al., “Inductance and capacitance analytic formulas for VLSI interconnect,” Electron. Lett., vol. 32, no. 11, May 23, 1996.
[13] C. S. Walker, Capacitance, Inductance and Crosstalk Analysis. Norwood, MA: Artech, 1990, p. 95.
[14] A. Deutsch et al., “Modeling and characterization of long on-chip interconnections for high-performance microprocessors,” IBM J. Res. Develop., vol. 39, no. 5, pp. 547–567, Sept. 1995, p. 549.
[15] J. B. Beyer et al., “MESFET distributed amplifier design guidelines,” IEEE Trans. Microwave Theory Tech., vol. MTT-32, pp. 268–275, Mar. 1984.
[16] Y. Tsividis, Operation and Modeling of the MOS Transistor, 2nd ed. New York: McGraw-Hill, 1999, pp. 339–340.
[17] H. Larsson, “Distributed synchronous clocking using connected ring oscillators,” Master’s thesis, Computer Systems Engineering Centre for Computer System Architecture, Halmstad Univ., Halmstad, Sweden, Jan. 1997. [Online]. Available: http://www.hh.se/ide/ccaweb/publications/97/distclock/9705.ps
[18] L. Hall, M. Clements, W. Liu, and G. Bilbro, “Clock distribution using cooperative ring oscillators,” in Proc. IEEE 17th Conf. Advanced Research in VLSI (ARVLSI’97), 1997. [Online]. Available: http://www.computer.org/proceedings/arvlsi/7913/79130062abs.htm
[19] T. C. Edwards and M. B. Steer, Foundations of Interconnect and Microstrip Design. Chichester, U.K.: Wiley, 2000, ch. 6, sec. 6.11.
[20] C. P. Yue and S. S. Wong, “On-chip spiral inductors with patterned ground shields for Si-based RF ICs,” IEEE J. Solid-State Circuits, vol. 33, pp. 743–752, May 1998.
[21] C. P. Yue and S. S. Wong, “A study on substrate effects of silicon-based RF passive components,” in MTT-S Int. Microwave Symp. Dig., June 1999, pp. 1625–1628.
[22] M. E. Becker and T. F. Knight, Jr., “Transmission line clock driver,” presented at the 1999 IEEE Int. Conf. Computer Design. [Online]. Available: http://www.computer.org/proceedings/iccd/0406/04060489abs.htm
[23] P. Zarkesh-Ha and J. D. Meindl, “Asymptotically zero power dissipation Gigahertz clock distribution networks,” IEEE Electrical Performance and Electronic Packaging, pp. 57–60, Oct. 1999.
[24] K. Bernstein, K. Carrig, C. M. Durham, and P. A. Hansen, High Speed CMOS Design Styles. Norwood, MA: Kluwer, 1998, p. 22.
[25] T. Soorapanth, C. P. Yue, D. Shaeffer, T. H. Lee, and S. S. Wong, “Analysis and optimization of accumulation-mode varactor for RF ICs,” presented at the Symp. VLSI Circuits, Honolulu, HI, June 11–13, 1998. [Online]. Available: http://smirc.stanford.edu/papers/VLSI98p-chet.pdf
[26] H. B. Bakoglu, J. T. Walker, and J. D. Meindl, “A symmetric clock-distribution tree and optimized high speed interconnections for reduced clock skew in ULSI and WSI circuits,” in IEEE Int. Conf. Computer Design, Oct. 1986, pp. 118–122.
[27] M. Bußmann and U. Langmann, “Active compensation of interconnect losses for multi-GHz clock distribution networks,” IEEE Trans. Circuits and Syst. II, vol. 39, pp. 790–798, Nov. 1992.
[28] M. C. Papaefthymiou and K. H. Randall, “Edge-triggering vs. two-phase level-clocking,” presented at the 1993 Symp. Research on Integrated Systems, Mar. 1993. [Online]. Available: http://www.eecs.umich.edu/~marios/papers/sis93.ps
WOOD et al.: ROTARY TRAVELING-WAVE OSCILLATOR ARRAYS 1665

[29] L. Benni et al., “Clock skew optimization for peak current reduction,” J. VLSI Signal Processing, vol. 16, pp. 117–130, 1997.
[30] International Semiconductor Roadmap for Semiconductors (1999). [Online]. Available: http://public.itrs.net/files/1999_SIA_Roadmap/Design.pdf
[31] I. S. Kourtev and E. G. Friedman, Timing Optimization Through Clock Skew Scheduling. Boston, MA: Kluwer, 2000.
[32] MultiGig, Ltd. Rotary Explorer. [Online]. Available: http://www.multigig.com/software.htm
[33] M. Kamon, M. J. Tsuk, and J. K. White, “FASTHENRY: A multipole-accelerated 3-D inductance extraction program,” IEEE Trans. Microwave Theory Tech., vol. 429, pp. 1750–1758, Sept. 1994.
[34] BSIM Research Group. (2000–2001) The BSIM4 Short-Channel Transistor Model. Univ. of California at Berkeley. [Online]. Available: http://www-device.eecs.berkeley.edu/~bsim3/bsim4.html

Terence C. Edwards (M’89) received the M.Phil. degree in microwaves.
He is the Executive Director of Engalco, a consultancy firm based in the U.K., mainly specializing in signal transmission technologies and the global RF and microwave industry. He researches and takes responsibility for regular releases of Microwaves North America, published 1995, 1998, and 2001. He has authored several publications (including papers published in the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES), has led management seminars on fiber optics, presented a paper on mobile technologies at the IMAPS Microelectronics Symposium, Philadelphia, PA, October 1997, and has written several articles and books. These include (jointly with Prof. Michael Steer) one recently on MICs (New York: Wiley) and on gigahertz and terahertz technologies (Norwood, MA: Artech, 2000). He is on the editorial advisory board for the International Journal of Communication Systems. He regularly consults for both national and overseas companies and is on the prestigious IEE (London) President’s List of Consultants.
Mr. Edwards is a Fellow of the Institution of Electrical Engineers (IEE), U.K.

John Wood is the Engineering Director of MultiGig, Ltd., a U.K. technology startup specializing in multigigahertz circuit design I.P.
Previously, he has worked as a consultant design engineer on multidomain design projects in mechanical, power electronics, infrared optics, and software development roles. He holds a number of patents which have been licensed for manufacture in the fields of infrared plastic welding and high-speed digital signaling. His technical interests include all areas of engineering design, but particularly electromagnetics, VLSI circuit design, and high-speed analog techniques.

Steve Lipa (S’00) received the B.S. degree in electrical engineering from the University of Virginia, Charlottesville, in 1980, and the M.S. degree in electrical engineering from North Carolina State University, Raleigh, in 1993. He is currently working toward the Ph.D. degree in electrical engineering at North Carolina State University.
He is currently a Research Assistant and Laboratory Manager with the Microelectronics Systems Laboratory at North Carolina State University. He has ten years of experience as an Integrated Circuit Design Engineer, primarily in the design of high-speed digital logic circuits. His current research is in the area of high-speed clock distribution.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 1, JANUARY 1999 97

A 1.6-GHz Dual Modulus Prescaler Using the Extended True-Single-Phase-Clock CMOS Circuit Technique (E-TSPC)
J. Navarro Soares, Jr., and W. A. M. Van Noije

Abstract—The implementation of a dual-modulus prescaler (divide by 128/129) using an extension of the true-single-phase-clock (TSPC) technique, the extended TSPC (E-TSPC), is presented. The E-TSPC [1], [2] consists of a set of composition rules for single-phase-clock circuits employing static, dynamic, latch, data-precharged, and NMOS-like CMOS blocks. The composition rules, as well as the CMOS blocks, are described and discussed. The experimental results of the complete dual-modulus prescaler, implemented in a 0.8 µm CMOS process, show a maximum 1.59 GHz operation rate at 5 V with 12.8 mW power consumption. They are compared with the results from other recent implementations, showing that the proposed E-TSPC circuit can reach high speed with both smaller area and lower power consumption.

Index Terms—CMOS digital, high-speed circuits, prescalers, single-phase-clock design.

Manuscript received February 16, 1998; revised May 25, 1998. This work was supported in part by FAPESP and CNPq, Brazil.
The authors are with the LSI/PEE, Escola Politécnica, University of São Paulo, São Paulo, S.P. 05508-900 Brazil (e-mail: navarro@lsi.usp.br; noije@lsi.usp.br).
Publisher Item Identifier S 0018-9200(99)00410-2.

I. INTRODUCTION

FOR MORE than 15 years, CMOS has been the main technology for very-large-scale integration (VLSI) system design. From the beginning until now, several CMOS clock policies have been proposed. The pseudo-two-phase logic was one of the earliest techniques [3]. Later on, two-phase logic structures were proposed. The domino technique [4] successfully combined two-phase and dynamic CMOS circuits. With the NORA technique [5], [6], an extensive no-race approach for two-phase and dynamic circuits was developed. A single-phase-clock policy was introduced in [7] [the true single-phase-clock (TSPC)]. This technique was subsequently advanced by [8]–[10].

Single-phase-clock policies are superior to the others due to the simplification of the clock distribution. They reduce the wiring costs and the number of clock-signal requirements (no problems with phase overlapping, for instance). Consequently, higher frequencies and simpler designs can be achieved.

Introduced by [1] and [2], the extended true-single-phase-clock CMOS circuit technique (E-TSPC), an extension of the TSPC, consists of composition rules for single-phase circuits using static, dynamic, latch, data-precharged, and NMOS-like blocks. The composition rules enlarge the block-connection possibilities and avoid races; additionally, NMOS-like blocks enhance the technique for high-speed operations.

The design of a dual-modulus prescaler (divide by 128/129) with the E-TSPC in a standard 0.8 µm CMOS process (0.7 µm effective channel length) is presented. The purpose of the prescaler implementation is to evaluate the potential of the E-TSPC technique.

This paper is organized as follows. In Section II, the principal features of the E-TSPC technique, blocks and design rules, are presented. In Section III, some different dual-modulus implementations are analyzed. Experimental results and comparisons are reported in Section IV, and the principal conclusions are drawn in Section V.

II. E-TSPC CIRCUIT BLOCKS AND COMPOSITION RULES

A. Basic CMOS Blocks

An E-TSPC circuit may use any of the following blocks: CMOS static block, n-dynamic block [Fig. 1(a)], p-dynamic block [Fig. 1(c)], n-latch block [Fig. 1(e)], p-latch block [Fig. 1(g)], and high (PH) and low (PL) data-precharged blocks (Fig. 2). In Fig. 1, the clocked transistors of the n- and p-latches are placed close to the power rail, following the suggestion of [11]. This configuration can attain a higher speed but suffers charge-sharing problems. Clocked transistors close to either the power rail or the block output are admissible latch configurations.

In data-precharged blocks [10], some input signals, called precharging inputs or pc-inputs, control the output precharge (see Fig. 2). If all PH block pc-inputs are high, or if all PL block pc-inputs are low, then the PH or PL block is precharged. In this case, the PH block output goes low, and the PL block output goes high. In Fig. 2, the CMOS static block that executes the logic function is drawn, along with all equivalent PH and PL blocks [Fig. 2(b) and (c)]. The pc-inputs of each block are also indicated. The PH and PL blocks that have the output precharged when the clock is low will be called n-Dp blocks; similarly, the PH and PL blocks that have the output precharged when the clock is high will be called p-Dp blocks.

B. Composition Rules

First, the definition of data chains, fundamental to the design rules, is given.

Definition: An n-data chain is any noncyclic signal propagation path:
1) containing at least one n-latch, one n-dynamic, or one n-Dp block;
2) starting in a circuit external input, or in the output of a p-latch, p-dynamic, or p-Dp block; when this output is followed by static blocks in the normal data flow, the data chain starts in the output of the last static block;
0018–9200/99$10.00 © 1999 IEEE

Fig. 1. Construction blocks of the E-TSPC circuit technique: (a) n-dynamic and (b) NMOS-like n-dynamic blocks; (c) p-dynamic and (d) NMOS-like p-dynamic blocks; (e) n-latch and (f) NMOS-like n-latch blocks; and (g) p-latch and (h) NMOS-like p-latch blocks.

Fig. 2. Transformation from (a) a static block into data-precharged blocks: (b) PH blocks and (c) PL blocks.

3) going through static, n-dynamic, n-Dp, or n-latch blocks;
4) regardless of the number and ordering of the blocks defined above;
5) finishing in a circuit external output, or in the input of the first p-latch, p-dynamic, or p-Dp block.

For the p-data chains, an equivalent definition applies, replacing n with p and vice versa.

When clock is high, n-data chains are in evaluation phase; otherwise, they are in holding phase. P-data chains evaluate when clock is low.
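The block-membership part of this definition and the clock phases can be sketched in a few lines of Python. The block labels ("static", "n-latch", and so on) and the function names are illustrative and do not come from the paper; the sketch checks only conditions 1) and 3) for the interior of a chain, not where the chain starts or finishes.

```python
# Illustrative labels only; conditions 1) and 3) of the definition are checked,
# the start and finish conditions 2) and 5) are not modeled.
N_CLOCKED = {"n-latch", "n-dynamic", "n-Dp"}
P_CLOCKED = {"p-latch", "p-dynamic", "p-Dp"}


def chain_body_ok(blocks, kind="n"):
    """True if the block sequence may form the interior of a `kind`-data chain."""
    clocked = N_CLOCKED if kind == "n" else P_CLOCKED
    allowed = clocked | {"static"}
    has_clocked = any(b in clocked for b in blocks)   # condition 1)
    only_allowed = all(b in allowed for b in blocks)  # condition 3)
    return has_clocked and only_allowed


def chain_phase(kind, clock_high):
    """n-data chains evaluate while the clock is high, p-data chains while it is low."""
    evaluating = clock_high if kind == "n" else not clock_high
    return "evaluation" if evaluating else "holding"
```

For instance, `chain_body_ok(["static", "n-dynamic", "n-latch"])` holds, while a path made only of static blocks does not qualify as a data chain.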

Fig. 3. Example of n-data chains. The blocks mentioned in the text are named and hatched in the figure.

In Fig. 3, part of a circuit schematic is depicted with seven complete n-data chains. Some examples are the data chain starting at input and going through blocks , , , and ; the data chain starting at and going through , , , , and ; and the data chain starting at and going through , , , and .

Five of the six E-TSPC composition rules are now listed. Their purpose is to ensure the observance of some constraints during the evaluation and holding phases. To simplify the rule statements, the symbol will be used to denote n or p in nouns like -data chain, -dynamic block, etc.

Composition Rule ( ): The -data chain input should be an input of a dynamic block, an input of a latch, or a nonpc-input of a Dp block.

Composition Rule ( ): A -latch must not drive, directly or through static blocks, a -dynamic or a -Dp block.

Composition Rule ( ): The number of inversions between:
• any two adjacent dynamic blocks must be odd¹;
• any two adjacent Dp-blocks of the same type (PH and PH or PL and PL) must be odd;
• any two adjacent Dp-blocks of complementary types must be even;
• a PH (PL) block and an adjacent n- (p)-dynamic (or vice versa) in an n- (p)-data chain must be even;
• a PL (PH) block and an adjacent n- (p)-dynamic (or vice versa) in an n- (p)-data chain must be odd.
(Two blocks are called adjacent if there are only static blocks between them.)

¹ Throughout the rules, zero inversions are considered even.

Composition Rule ( ): Consider the last dynamic block in the -data chain (when it exists). The number of inversions (due to any block) from this dynamic block up to at least one -latch must be even.

Fig. 4. Two TSPC D-flip-flops connected in series.

Composition Rule ( ): The -data chain must have one of the following two configurations:
• at least one dynamic block and one latch;
• at least two latches and an even number of inversions (latches or static blocks) between them.

It is worth noting that these five composition rules are very similar to the five rules proposed in the NORA technique [6]. In a circuit where all data chains obey the five rules, it can be proved that (six theorems presented in [1] and [2]):
a) all data-precharged blocks are precharged during the holding phase of the data chains to which they belong;
b) the dynamic and the data-precharged blocks are not incorrectly discharged during the evaluation phase;
c) the output of the data-chain last latch is steady during the holding phase of the data chain.

C. Exception Rule

Although the above-described rules are necessary to avoid race problems, typical TSPC systems do not follow some of them. The most common exception is found in connecting two D-flip-flops (D-FF’s), as shown in Fig. 4. In such a configuration, the p-data chains consist of only one p-latch block, namely, or ( violation). In consequence, the p-latch output may change during its holding time. An example of a faulty sequence follows: consider an initial state in which the signals clock, input, and output a are low, and both blocks and are evaluating. At the end of the evaluation period, the outputs and are high. Subsequently, when the clock goes high, the other blocks will evaluate. Suppose that works properly, holding its former value (high). In this case, the node goes low, output a goes high, and goes low. As a result, the transistor is cut off, and the final value of node will depend on the circuit delays.

Commonly, the delay between nodes and is long enough to ensure that is fully discharged through transistors and ; in this case, the second D-FF works properly.

A simple exception rule is added to cover the utilization of the well-established TSPC D-FF’s (Fig. 4).

Exception Rule ( ): Configurations similar to that of Fig. 4, where rules and are not obeyed, are accepted if enough delay exists.

The data chains where is applied, to the detriment of and , do not have a latch with steady output at the holding phase. Since the correct operation of the circuit will depend on the block delays, the exception rule should be used with caution.

Considering the connection rules presented in former works [7]–[10], our six proposed rules differ in the following aspects.
a) The “nonlatched domino logic,” a timing strategy considered in [10], is not accepted in our proposal.
b) The proposed rules permit a more flexible usage of both data-precharge blocks, due to the distinction between pc and nonpc-inputs, and static logic blocks (static logic is allowed between dynamic and latch blocks). In Fig. 2, where no rule violations occur, several connections not allowed by former work rules are provided, for instance, the connection between blocks and , between and , between and , etc.

TABLE I
CONDITIONS FOR CORRECT OPERATION OF THE NMOS-LIKE BLOCKS

Fig. 5. Schematic of the dual-modulus prescaler (divide by 128/129).

D. NMOS-Like Logic Extension

When high speed is also a requirement, restrictions on the use of p-dynamic and p-latch blocks should be imposed. These blocks have at least two p-transistors in series, which may considerably reduce the maximum speed. In such applications, the p-data chains are limited to one block, and most logic operations are handled with n-data chains with limited logic depth. Thus, deep pipelines will be necessary to implement complex and fast logic designs.

NMOS-like dynamic and latch blocks can be used to minimize this difficulty and also to increase the n-data chain speed. They are ratioed logic blocks, where the n-transistor section and the p-transistor section may conduct simultaneously. A similar technique was used in [12], but restricted to D-FF’s.

In Fig. 1, the NMOS-like versions of the dynamic and latch blocks are drawn. To assure a correct operation, these blocks should satisfy the constraints summarized in Table I. The transistor section that must impose the output value, when both sections are conducting, is drawn with bold lines in Fig. 1.

The NMOS-like blocks are faster due to the reduced number of transistors in series, but, unfortunately, they consume more power. In consequence, they should be used only in critical data chains, where the desirable speed has not been reached. Since the connection characteristics do not depend on whether it is a conventional or an NMOS-like block, the composition rules ( – and ) are valid and necessary for both; as a result, NMOS-like blocks and conventional blocks can replace one another, and the judicious selection of NMOS-like blocks is made easy.

Summarizing, the static blocks, the n/p-dynamic, the n/p-latch, the PH/PL data-precharged, the NMOS-like blocks, and the composition rules – and compose the E-TSPC technique.
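The inversion-parity rule of Section II-B (the composition rule on the number of inversions between adjacent blocks) lends itself to a small lookup. The following is a minimal sketch, not the authors' tooling: the labels "dyn", "PH", and "PL" and both function names are hypothetical, and "dyn" stands for the dynamic block of the same polarity as the data chain.

```python
def required_parity(chain, first, second):
    """Parity ('odd' or 'even') required between two adjacent blocks.

    chain: 'n' or 'p' (type of data chain); first, second: 'dyn', 'PH', or 'PL'.
    """
    if chain not in ("n", "p"):
        raise ValueError("chain must be 'n' or 'p'")
    pair = {first, second}
    if not pair <= {"dyn", "PH", "PL"}:
        raise ValueError("blocks must be 'dyn', 'PH', or 'PL'")
    if len(pair) == 1:
        # two dynamic blocks, or two Dp blocks of the same type
        return "odd"
    if pair == {"PH", "PL"}:
        # Dp blocks of complementary types
        return "even"
    # one Dp block next to a dynamic block:
    # PH in an n-data chain (PL in a p-data chain) requires an even count
    dp = (pair - {"dyn"}).pop()
    even_dp = "PH" if chain == "n" else "PL"
    return "even" if dp == even_dp else "odd"


def parity_ok(chain, first, second, inversions):
    """Zero inversions count as even, as stated in the footnote to the rule."""
    actual = "even" if inversions % 2 == 0 else "odd"
    return actual == required_parity(chain, first, second)
```

A checker of this kind covers only one of the composition rules; the remaining rules depend on chain inputs and latch positions and would need a fuller circuit description.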

Fig. 6. Transistor schematic of the divide-by-4/5 counter DG4. The transistor width or, when the length is different from 0.8 µm, the transistor width/length, in µm, is also indicated in the figure.

TABLE II
MAXIMUM SPEED AND POWER-CONSUMPTION RESULTS FOR THE FOUR DESIGNED DIVIDE-BY-4/5 COUNTERS (SPICE SIMULATIONS, SLOW PARAMETERS, AND VDD = 5 V)

Design    Speed (GHz)    Power (µW/MHz)
DG1       0.98           3.27
DG2       1.28           4.45
DG3       1.39           4.85
DG4       1.67           5.62
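The relative differences implied by Table II can be recomputed directly from the tabulated values. The snippet below is only an arithmetic check; the dictionary and function names are illustrative, and the numbers are copied from the table.

```python
# (speed in GHz, power per MHz) as listed in Table II
TABLE_II = {
    "DG1": (0.98, 3.27),
    "DG2": (1.28, 4.45),
    "DG3": (1.39, 4.85),
    "DG4": (1.67, 5.62),
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


speed_dg1_to_dg4 = percent_change(TABLE_II["DG1"][0], TABLE_II["DG4"][0])
speed_dg3_to_dg4 = percent_change(TABLE_II["DG3"][0], TABLE_II["DG4"][0])
power_dg1_to_dg4 = percent_change(TABLE_II["DG1"][1], TABLE_II["DG4"][1])

print(f"speed DG1 -> DG4: {speed_dg1_to_dg4:.1f}%")
print(f"speed DG3 -> DG4: {speed_dg3_to_dg4:.1f}%")
print(f"power DG1 -> DG4: {power_dg1_to_dg4:.1f}%")
```

The three values come out at roughly 70%, 20%, and 72%, which matches the percentages quoted in the discussion of the four approaches.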

III. DUAL-MODULUS DESIGN

Dual-modulus prescalers, circuits with applications in frequency synthesis systems, have been frequently used to compare different high-speed implementations [12] and [13], our current goal. A high-speed dual-modulus prescaler (divide by 128/129) was designed using a standard 0.8 µm CMOS bulk process.

The schematic of the dual-modulus prescaler is depicted in Fig. 5. The circuit inside the cross-hatched box, composed of three D-FF’s and two logic gates, forms a divide-by-4/5 counter. The div32 signal selects whether it counts up to four (div32 high) or up to five (div32 low). The five D-FF’s at the bottom of the figure form a divide-by-32 counter. The fractional division ratio of the prescaler, 128 or 129, is selected according to the signal.

Four different approaches were applied to draw a layout of the divide-by-4/5 counter, which is the critical high-speed part of the prescaler. The approaches are:
DG1) design with conventional rise edge-triggered TSPC D-FF (Fig. 4);
DG2) design with rise edge-triggered D-FF, and further optimization applying the E-TSPC technique;
DG3) design with a modified fall edge-triggered D-FF [12];
DG4) design with fall edge-triggered D-FF, and further optimization applying the E-TSPC technique.

In Fig. 6, the transistor schematic of the DG4 approach, with transistor dimensions, is depicted. The three cross-hatched boxes mark the D-FF’s; the first D-FF (left) has a buffered output.

Fig. 7. Photograph of the prescaler test chip.

The maximum speed and the power consumption for each design are shown in Table II. These results were obtained with SPICE simulations from the extracted netlists of the layouts for slow parameters, room temperature, and power supply at 5 V. The comparison of the results exhibits some advantages of the E-TSPC technique. From the DG1 to the DG4 approach, the speed improvement is higher than 70%, and from DG3 to DG4 it is 20%. On the other hand, the power consumption increases 72% from DG1 to DG4. As DG4 uses only NMOS-like blocks, the latter result is not surprising, and confirms that these blocks should be restricted to critical circuit parts. Since the composition rules favor the replacement of conventional blocks with NMOS-like ones and vice versa, E-TSPC circuits can reach high speed and keep the power consumption low.

To better evaluate the above results, the following notes should be taken into account:
• all approaches use small transistor sizes, usually minimum sizes (as indicated in Fig. 6);
• the Fig. 5 divide-by-4/5 counter schema was slightly modified for each design to conform with its structure characteristics;
• the NOR configuration of Fig. 6 is similar to an NMOS logic, but the load is now a PMOS transistor. It is faster than the CMOS static NOR and is used in three of the approaches;

• and blocks, Fig. 6, drive the clock signal to the divide-by-32 counter. All four designs have a similar configuration.

IV. EXPERIMENTAL RESULTS

The full prescaler circuit, occupying a 0.0126 mm² area, was formed with the counter DG4. The D-FF’s of the divide-by-32 asynchronous counter were built with conventional rise edge-triggered TSPC D-FF (Fig. 4). The clock signal from the divide-by-4/5 counter, Fig. 6, is inverted before being sent to the divide-by-32 counter. This expedient allows a longer time interval for preparation of the signal div32.

The prescaler test chip, whose photograph is shown in Fig. 7, was mounted on an alumina substrate with the chip-on-board technique. A coplanar radio-frequency probe was used to feed the only high-speed signal of the prescaler, the clock input.

Fig. 8. Measured results for the prescaler maximum frequency (fmax), left axis (3), and current consumption at fmax, right axis (o), as a function of the power supply.

In Fig. 8, the measured maximum frequency and current consumption as a function of the power supply are shown. Since the pulse generator used has a maximum excursion of 3 V, the real maximum frequencies of the circuit are expected to be slightly higher than the measured results for power supply above 3 V.

TABLE III
AREA, SPEED, AND POWER-CONSUMPTION RESULTS FOR FOUR DIFFERENT PRESCALERS

Performance results of this work, of two recently published prescalers using TSPC D-FF’s, and of a new prescaler architecture are summarized in Table III. In [13], the prescaler is implemented with rise edge-triggered TSPC D-FF’s, which were size optimized to reach maximum speed; in consequence, not only the circuit speed but also the area and power consumption are high. Fall edge-triggered TSPC D-FF’s with small-sized transistors and with some NMOS-like blocks are used in [12]. The resulting circuit has a small area and a low power consumption but a reduced maximum operation rate. Our implementation, with the E-TSPC technique and small-sized transistors, provides the smallest area and the lowest power consumption; the speed, in addition, is comparable to [13] and [14].

V. CONCLUSIONS

A complete high-speed dual-modulus prescaler (divide by 128/129) was developed in a 0.8 µm CMOS process. The measured circuit attained 1.59 GHz and 8.0 µW/MHz power consumption with 5 V power supply. It can be advantageously compared with other implementations in terms of area and power consumption; in terms of speed, it matches the fastest TSPC prescaler. The studies done during the design reveal that, to take full advantage of the TSPC technique, every possible configuration should be considered. The E-TSPC, being an extension of TSPC, permits exploring a larger number of solutions and, in consequence, finding the best configuration. The dual-modulus prescaler results exhibit some significant improvements produced by the E-TSPC.

REFERENCES

[1] J. Navarro and W. Van Noije, “E-TSPC: Extended true single-phase clock CMOS circuit technique,” in VLSI: Integrated Systems on Silicon, IFIP International Conference on VLSI, R. Reis and L. Claesen, Eds. London, U.K.: Chapman & Hall, 1997, pp. 165–176.
[2] J. Navarro and W. Van Noije, “E-TSPC: Extended true single-phase-clock CMOS circuit technique for high speed applications,” SBMICRO, J. Solid-State Devices Circuits, vol. 5, pp. 21–26, July 1997.
[3] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, 1st ed. Reading, MA: Addison-Wesley, 1985.
[4] R. H. Krambeck, C. M. Lee, and H.-F. S. Law, “High-speed compact circuits with CMOS,” IEEE J. Solid-State Circuits, vol. SC-17, pp. 614–619, June 1982.
[5] N. F. Gonçalves and H. J. De Man, “NORA: A racefree dynamic CMOS technique for pipelined logic structures,” IEEE J. Solid-State Circuits, vol. SC-18, pp. 261–266, June 1983.
[6] N. F. Gonçalves, “NORA: A racefree CMOS technique for register transfer systems,” Ph.D. dissertation, Katholieke Universiteit Leuven, Leuven, Belgium, 1984.
[7] Y. Ji-ren, I. Karlsson, and C. Svensson, “A true single-phase-clock dynamic CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. SC-22, pp. 899–901, Oct. 1987.
[8] J. Yuan and C. Svensson, “High speed CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. 24, pp. 62–70, Feb. 1989.
[9] M. Afghahi and C. Svensson, “A unified single-phase clocking schema for VLSI systems,” IEEE J. Solid-State Circuits, vol. 25, pp. 225–235, Feb. 1990.
[10] P. Larsson, “Skew safety and logic flexibility in a true single phase clocked system,” in Proc. IEEE ISCAS, Seattle, WA, May 1995, pp. 941–944.
[11] Q. Huang, “Speed optimization of edge-triggered nine-transistor D-flip-flop for gigahertz single-phase clocks,” in Proc. IEEE ISCAS, Chicago, IL, May 1993, pp. 2118–2121.
[12] B. Chang, J. Park, and W. Kim, “A 1.2 GHz CMOS dual-modulus prescaler using new dynamic D-type flip-flops,” IEEE J. Solid-State Circuits, vol. 31, pp. 749–752, May 1996.
[13] Q. Huang and R. Rogenmoser, “Speed optimization of edge-triggered CMOS circuits for gigahertz single-phase clocks,” IEEE J. Solid-State Circuits, vol. 31, pp. 456–465, Mar. 1996.
[14] J. Craninckx and M. S. J. Steyaert, “A 1.75-GHz/3-V dual-modulus divide-by-128/129 prescaler in 0.7-µm CMOS,” IEEE J. Solid-State Circuits, vol. 31, pp. 890–897, July 1996.
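The way the divide-by-4/5 counter and the divide-by-32 counter of the prescaler paper above combine into a 128/129 division can be illustrated with a behavioral sketch. It assumes that, when the ratio 129 is selected, div32 goes low for exactly one of the 32 states of the divide-by-32 counter; this is the usual dual-modulus arrangement but is inferred here rather than stated in the text. The function name and the `select_129` flag are hypothetical.

```python
def prescaler_input_cycles(select_129):
    """Input clock cycles consumed per output cycle of the prescaler (behavioral)."""
    total = 0
    for state in range(32):  # one full cycle of the divide-by-32 counter
        # Assumption: div32 is low in a single state, and only when 129 is selected.
        div32_high = not (select_129 and state == 0)
        # The divide-by-4/5 counter counts to four when div32 is high, to five when low.
        total += 4 if div32_high else 5
    return total


print(prescaler_input_cycles(False))  # 32 * 4
print(prescaler_input_cycles(True))   # 31 * 4 + 5
```

Under that assumption the two ratios follow from 32 × 4 = 128 and 31 × 4 + 5 = 129.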
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002 835

A CMOS Monolithic ΔΣ-Controlled Fractional-N Frequency Synthesizer for DCS-1800
Bram De Muer, Student Member, IEEE, and Michel S. J. Steyaert, Senior Member, IEEE

Abstract—A monolithic 1.8-GHz ΔΣ-controlled fractional-N phase-locked loop (PLL) frequency synthesizer is implemented in a standard 0.25-µm CMOS technology. The monolithic fourth-order type-II PLL integrates the digital synthesizer part together with a fully integrated LC VCO, a high-speed prescaler, and a 35-kHz dual-path loop filter on a die of only 2 × 2 mm². To investigate the influence of the ΔΣ modulator on the synthesizer’s spectral purity, a fast nonlinear analysis method is developed and experimentally verified. Nonlinear mixing in the phase-frequency detector (PFD) is identified as the main source of spectral pollution in ΔΣ fractional-N synthesizers. The design of the zero-dead zone PFD and the dual charge pump is optimized toward linearity and spurious suppression. The frequency synthesizer consumes 35 mA from a single 2-V power supply. The measured phase noise is as low as −120 dBc/Hz at 600 kHz and −139 dBc/Hz at 3 MHz. The measured fractional spur level is less than −100 dBc, even for fractional frequencies close to integer multiples of the reference frequency, thereby satisfying the DCS-1800 spectral purity constraints.

Index Terms—CMOS RF integrated circuits, ΔΣ modulator, fractional-N frequency synthesis, phase-locked loop, phase noise.

Manuscript received November 5, 2001; revised January 31, 2002.
The authors are with the Katholieke Universiteit Leuven, Department Elektrotechniek, ESAT-MICAS, B-3001 Heverlee, Belgium (e-mail: bram.demuer@esat.kuleuven.ac.be).
Publisher Item Identifier S 0018-9200(02)05856-0.

I. INTRODUCTION

THE END of the 20th century was characterized by the unrivaled growth of the telecommunication industry. The main cause was the introduction of digital signal processing in wireless communications, driven by the development of high-performance low-cost CMOS technologies for VLSI. However, the implementation of the RF analog front end remains the bottleneck. This is reflected in the large effort put into monolithic CMOS integration of RF circuits both by academics and industry [1]–[3].

The goal of this work is the monolithic integration in standard CMOS technology of a frequency synthesizer to enable the full integration of a transceiver front end in CMOS, including a low-IF receiver and a direct upconversion transmitter [1]. To achieve a high degree of integratability and fast settling under low-noise constraints, a ΔΣ fractional-N synthesizer topology has been chosen [4] (Fig. 1). ΔΣ fractional-N synthesis circumvents the severe speed–spectral purity–resolution tradeoff of the classic phase-locked loop (PLL) synthesizer, by providing synthesis of fractional multiples of the reference frequency. Spurious tones that emerge from the fractional division are whitened and noise shaped by the ΔΣ action and ultimately filtered by the loop filter. To prevent degradation of the spectral purity by digital noise coupling, the ΔΣ modulator is scheduled for integration on the digital baseband signal processing IC of the full transceiver system.

Fig. 1. Principle of ΔΣ fractional-N synthesis.

The paper describes the design of a monolithic 1.8-GHz ΔΣ-controlled fractional-N PLL frequency synthesizer. In Section II, the influence of ΔΣ noise on PLL bandwidth requirements is theoretically analyzed for multistage noise shaping (MASH) and multibit single-loop ΔΣ modulators. Next, a fast nonlinear analysis method is presented, which predicts possible degradation of the PLL spectral purity by in-band noise leakage and re-emerging of spurious tones. The nonlinearities in the phase-frequency detector (PFD) charge pumps are identified as the main trouble spots. The fourth-order type-II PLL building-block design is discussed in Section IV, focusing on integrated filter and voltage-controlled oscillator (VCO) design and on the realization of a linear phase error-to-charge-pump current conversion. In Section V, the experimental results of the ΔΣ fractional-N synthesizer prototype are presented and compared to the simulations, showing good correspondence.

II. THE ΔΣ FRACTIONAL-N SYNTHESIZER

A. Introduction

A block diagram of a ΔΣ fractional-N synthesizer is shown in Fig. 1. The ΔΣ modulator output controls the instantaneous division modulus of the prescaler, such that the mean division modulus is , with the number of bits of the ΔΣ modulator and the input word. The corresponding phase changes at the prescaler output are quantized, leading to possible spurious tones and quantization noise. By selecting higher order ΔΣ modulators, the spurious energy is whitened and shaped to high-frequency noise, which can be removed by the low-pass loop filter. As a result, for a given frequency resolution, an arbitrary high can be chosen, by assigning the proper number
0018-9200/02$17.00 © 2002 IEEE

Fig. 2. Third-order multibit single-loop ΔΣ modulator. The internal modulator accuracy is 16 bit. From the five output bits, only four are used for stability reasons.

of bits to the ΔΣ modulator. The loop bandwidth is not restricted by the reference spur suppression, resulting in faster settling and higher integratability. Additionally, the division modulus is decreased by a factor (with the minimum number of bits for the frequency resolution, i.e., 7.02 in this case), so that noise of the PLL blocks, except for the VCO, is less amplified.
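How a digital modulator steers the instantaneous modulus so that the mean division becomes fractional can be shown with a deliberately simplified sketch. It uses a first-order accumulator whose carry output switches the modulus between N and N + 1, not the third-order modulators discussed in the paper, and the symbols N (integer modulus), n (input word), and k (accumulator width in bits) are assumptions introduced for illustration, since the corresponding symbols are missing from the text.

```python
def mean_modulus(N, n, k):
    """Mean division modulus over 2**k reference cycles of a first-order accumulator.

    N: integer modulus, n: input word (0 <= n < 2**k), k: accumulator width in bits.
    All three names are illustrative assumptions, not the paper's notation.
    """
    period = 1 << k
    acc = 0
    total = 0
    for _ in range(period):
        acc += n
        carry = acc >> k      # 1 when the accumulator overflows, else 0
        acc &= period - 1     # keep the k least-significant bits
        total += N + carry    # divide by N + 1 on overflow, by N otherwise
    return total / period


print(mean_modulus(67, 1 << 15, 16))  # half-scale input word
```

Over a full period of 2**k cycles the accumulator overflows exactly n times, so the mean equals N + n / 2**k. A first-order scheme like this produces strong periodic tones, which is the motivation the text gives for selecting higher order modulators.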

B. The ΔΣ Modulators
The influence of both third-order MASH and multibit single-loop ΔΣ modulators on the spectral purity of the fractional-N synthesizer is investigated. Since the order of the integrated PLL loop filter is three, the order of the ΔΣ modulators must also be three or higher to ensure that the ΔΣ noise has at least a 20-dB/dec rolloff at intermediate offset frequencies, causing no degradation of the output phase noise. Both modulators have an internal accuracy of 16 bit, and 1-LSB dithering is applied to further randomize any spurious energy. The dithering sequence is third-order noise shaped to avoid an increased noise floor.

The MASH or cascade 1-1-1 modulator is chosen because it is easy to integrate in CMOS and is unconditionally stable. The noise transfer function (NTF) of the MASH modulator is $(1 - z^{-1})^3$ and contains three poles at the origin of the $z$ plane. The result is harsh LF noise shaping and substantial HF noise. In the time domain, this is reflected in the intensive prescaler modulus switching. To synthesize the example output frequency, all moduli between 64 and 71 are employed.

The multibit single-loop ΔΣ modulator is shown in Fig. 2. For ease of integration, the feedforward and feedback coefficients are a power of 2. Only four output bits are needed to control the prescaler moduli, but five output bits are used, to avoid overlap of the intended input operating range and the unstable input regions. The NTF of the presented modulator is given in (1) and contains only one pole at the origin of the $z$ plane and two low-Q Butterworth poles, with a passband gain of 3.2.

(1)

Although the single-loop modulator is more complex than the MASH modulator, it offers a higher flexibility in terms of noise shaping. The HF quantization noise of the modulator can be spread out by proper pole positioning. As a result, the prescaler modulus switching is less intense. Only the moduli between 66 and 69 are needed to synthesize the same frequency. The reduced HF switching has advantageous effects on noise coupling and sensitivity to PLL nonlinearities, as will be discussed in Section III.

Fig. 3. Maximum PLL bandwidth versus the reference frequency for different ΔΣ modulator orders, for the type-II fourth-order PLL. The dashed curve is for the third-order single-loop modulator. The targeted phase-noise specification is −136 dBc/Hz at 3 MHz for DCS-1800.

C. Theoretical Analysis
To theoretically model the impact of ΔΣ control on the spectral purity of the synthesizer, a linear-time-invariant (LTI) PLL model is employed, with the ΔΣ quantization noise as an additive noise source at the prescaler output. The prescaler with ΔΣ control can be looked upon as a digital-to-phase (D/P) converter. Every reference cycle, the prescaler subtracts an integer multiple of $2\pi$ rad from its input signal, with the integer determined by the ΔΣ modulator output. The resulting quantization noise on the division modulus, and thus output phase, is approximated by uniformly distributed white noise [5]. The quantization noise power is $\Delta^2/12$, with the quantization step $\Delta$ set, for both modulators, by the modulus range and the number of significant output bits. The phase noise contribution of the ΔΣ modulator at the output of the synthesizer is found in (2) [6], which involves the closed-loop transfer function of the fourth-order type-II PLL.

(2)

Since the main advantage of ΔΣ fractional-N synthesizers is the decoupling of the reference frequency and the PLL bandwidth, the influence of the ΔΣ noise on the bandwidth
DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800 837

requirement is examined. To comply with the most stringent DCS-1800 phase-noise specification, i.e., −133 dBc/Hz at 3 MHz offset [7], the target phase noise at 3 MHz is set to −136 dBc/Hz. In Fig. 3, the maximum PLL bandwidth is plotted versus the reference frequency for different MASH modulator orders. The dashed line is the maximum bandwidth for the single-loop multibit ΔΣ modulator of Section II-B. For a reference frequency of 26 MHz, not much is gained from increasing the modulator order. For a high bandwidth and thus a fast PLL, the reference frequency and/or the modulator order should be increased, leading to an increased power consumption and circuit complexity. The maximum bandwidth is 87 kHz for the third-order MASH modulator and 62 kHz for the single-loop multibit modulator.

Apart from the out-of-band phase-noise constraint, the integrated in-band phase noise, determining the rms phase error of the PLL, is of importance. To be sure that the ΔΣ modulator does not corrupt the rms phase error, the dynamic range of the modulator must be higher than the dynamic range of the PLL [8]. The integrated in-band frequency noise follows from the noise bandwidth of the PLL and the in-band phase noise in dBc/Hz. The maximum bandwidth of the PLL is calculated in (3) [8].

(3)

The maximum PLL bandwidth is plotted versus the reference frequency of the PLL for different MASH modulator orders in Fig. 4. For the single-loop multibit modulators (dashed curve), the actual maximum bandwidth can be calculated to be 25% smaller than in (3), due to the Butterworth poles. In the case of a third-order modulator, a 1.5° rms phase error (to keep the overall rms phase error within 2°), and a reference frequency of 26 MHz, the maximum bandwidth is 810 and 614 kHz for the MASH and the single-loop modulator, respectively. Obviously, the constraint posed on the ΔΣ modulator noise due to in-band noise contributions is much less severe than the constraint due to the out-of-band phase noise at 3 MHz.

Fig. 4. Maximum PLL bandwidth versus the reference frequency for different ΔΣ modulator orders, for an rms phase error below 1.5°. The dashed curve is for the third-order single-loop modulator.

III. FAST NONLINEAR ANALYSIS METHOD

The theoretical analysis suggested that applying ΔΣ control to the prescaler would not cause any problems for the spectral purity of the PLL. Practice, however, proves this wrong. A fast nonlinear analysis method is developed which can take into account the nonlinearity of the PLL building blocks. The analysis method is at the same time sufficiently fast to sweep simulations over different degrees of nonlinearities and operating points, and is capable of performing sufficiently long transient simulations to get accurate fast Fourier transforms (FFTs) of the phase variable. The fractional operation of the PLL is simulated in discrete time and in open loop under locked conditions to avoid drift of the phase error. To further speed up the simulation, the building blocks are represented by high-level models with parameters to model any nonlinear behavior or mismatch in critical transistors. The simulations are performed in Matlab [9].

To find the phase error generated by the ΔΣ modulation of the division modulus, the variation of the number of RF pulses at the output of the divider is monitored. Every reference cycle, the number of RF pulses at the divider output is determined by the number of pulses swallowed by the ΔΣ control, as given in (4).

(4)

The resulting quantized phase changes are compared with the phase that would be expected if the loop were in lock, i.e., the phase corresponding to the fractional part of the division modulus. The result is the instantaneous accumulated phase error:

(5)

The phase error is converted to current pulses in the charge pump. The phase-error-to-charge-pump-current conversion is modeled to contain any PFD nonlinearity. Mismatch in the up and down current sources, resulting in gain mismatch for positive and negative phase errors, is modeled by a gain-mismatch parameter. The occurrence of a dead zone is modeled by a dead-zone parameter, as given in (6).

(6)

By taking an FFT of the current pulses, the current noise spectrum is obtained. The current noise spectrum is modeled as a phase-noise source which is subjected to its corresponding closed-loop transfer function, obtained from the LTI PLL model. This means that the filter is modeled by its linear transfer function, which includes parasitic gain and pole position changes. The nonlinear conversion from voltage to frequency/phase in the VCO is modeled by the variation of the VCO gain, when changing the operating point of the PLL.

The analysis tool enables the evaluation and comparison of the effect of MASH and single-loop ΔΣ noise on the PLL. This analysis is performed with the following nonlinearities: a 0.1% dead zone and a gain mismatch of 2%. The internal accuracy of both modulators is 16 bit. The reference frequency
is 26 MHz and the fractional division number is 67.92. The output frequency is 1.76592 GHz, i.e., 2.08 MHz offset from an integer multiple of $f_{ref}$. In Fig. 5(a) and (b), the time-domain phase error is plotted for both modulators. Note that the fractional-N PLL frequency synthesizer can hardly be called a phase-locked loop, since the loop is never in lock! Due to the shaping of the HF noise in the single-loop modulator, the instantaneous phase error is smaller than for a MASH modulator. This has two important consequences. First, the on-time of the charge pumps is smaller for the single-loop modulator, making it less sensitive to noise coupling from the substrate and the power supply. Second, the sensitivity to the nonlinear phase-error-to-current conversion in terms of noise leakage is reduced.

To be able to examine the effect of nonlinearities in the frequency domain, the FFTs of the charge-pump current pulses are plotted in Fig. 5(c) and (d). A noise floor appears in the output spectrum as well as spurious tones, although the ΔΣ output is perfectly randomized and dithered. Due to the nonlinear mixing in the PFD charge pump, ΔΣ noise at $f_{ref}/2$ folds back to lower offset frequencies, similar to the effect of a nonlinear DAC in a multibit ΔΣ ADC. Since the noise at $f_{ref}/2$ is much lower for the single-loop modulator, its noise leakage due to the nonlinear mixing in the PFD is also lower. In the time domain, this effect corresponds to the smaller phase excursions. The difference in phase error between MASH and single-loop modulators is reflected in a lower noise floor, i.e., a 10-dB difference. In addition, previously unnoticed spurious tones appear in the output spectrum.

Fig. 5. Simulation results. The phase error for (a) the MASH modulator and (b) the single-loop multibit modulator. The FFT of the current pulses $I_{CP}[i]$ for (c) the MASH modulator and (d) the single-loop multibit modulator.

Fig. 6 shows the ΔΣ noise of both modulators as it appears at the PLL output for an ideal (dotted) and a nonlinear (solid) phase-error-to-current conversion. The results of the ideal case closely match the theoretical results of Section II-C (solid light gray). Due to nonlinearity, the simulated output spectrum of the integer-N PLL (the dash-dotted line) is seriously deteriorated by ΔΣ noise in the PLL noise bandwidth, increasing the rms phase error. Especially, the MASH converter is critical in terms of in-band noise due to the higher phase error [see Fig. 5(a)], despite the inherently lower LF noise of the MASH modulator. Note that the simulations are performed without taking into account noise coupling through the substrate or power-supply lines. As a consequence, the actual spurious performance of the ΔΣ fractional-N PLL could be worse than simulated. The presented simulation results are for a division modulus of 67.92, close to an integer multiple of $f_{ref}$. When analyzing division moduli in between integer multiples of $f_{ref}$, noise leakage is still observed, but the spurious tones are well below the phase noise.
The explanation for the re-emerging of spurious tones is that the modulator is unable to sufficiently decorrelate the successive output samples. To quantify the correlation in the modulator output, the discrete time autocorrelation estimate is calculated and plotted for both modulators for inputs close to an integer value (see Fig. 7). The autocorrelation calculations show correlation, although 1-LSB noise-shaped dithering is applied. The autocorrelation of the single-loop ΔΣ modulator shows large correlation peaks, explaining the higher spurious tones in the output phase-noise spectrum of the PLL. With the autocorrelation estimate, the necessary internal accuracy of the ΔΣ modulators is found to be at least 13 bit for MASH and 16 bit for single-loop ΔΣ modulators to sufficiently decorrelate the modulator output for inputs close to integers. A second possible source of tones is the downconversion of tones which are inherently present around $f_{ref}/2$ [5], by the nonlinear mixing in the PFD. This effect can be worsened by substrate and power-supply coupling with signals at $f_{ref}/2$.

Fig. 6. Simulation results. The ΔΣ noise at the output of the PLL for (a) the MASH modulator and (b) the single-loop multibit modulator. The results are plotted for an ideal PFD (dotted), which closely corresponds to the theoretical results (solid light gray), and for a nonlinear PFD (solid). They are compared to the simulated integer PLL phase noise (the dash-dotted line).

Fig. 7. Discrete time autocorrelation estimate of the modulator outputs for (a) the MASH modulator and (b) the single-loop multibit modulator.

IV. PLL BUILDING-BLOCK CIRCUIT DESIGN

A. The Fourth-Order Type-II PLL
A fourth-order type-II PLL is integrated, including a 4-bit prescaler, a zero-dead-zone PFD, a dual charge pump, and a 3-step equalizer, together with an on-chip LC-tank VCO and a third-order dual-path 35-kHz low-pass loop filter (see Fig. 8). The equalizer performs a 3-step piecewise equalization of the loop gain, by keeping the product of the VCO gain and the charge-pump current constant. To prevent switching between different equalization states, the state transitions exhibit hysteresis.

B. The 4-Bit Prescaler
The first high-speed division of the prescaler is done with two differential single-transistor-clocked (DSTC) logic n-latches [10], forming a differential dynamic D-flip-flop. The flip-flop operates with rail-to-rail internal signals to minimize the residual prescaler phase noise [11] to levels insignificant to
the overall phase-noise performance. The 16-modulus division (64–79) is implemented with the phase-switching topology [12]. The division moduli are generated by switching between the 90°-spaced output phases of the second D-flip-flop. When the 90° spacing is not ideal, spurs appear at 1/4, 2/4, and 3/4 of the PLL reference frequency. It takes careful layout and circuit design to equalize the delays of the different quadrature paths, such that these spurious tones are suppressed to negligible levels.

Fig. 8. Fully integrated fourth-order type-II phase-locked loop.

C. The Voltage-Controlled Oscillator
The LC VCO with on-chip inductor combines a 30% tuning range at only 2 V and an excellent phase-noise performance over a large frequency range. To minimize the VCO phase noise, a simulator-optimizer program has been developed which searches the optimal inductor geometry for a given technology. The resulting hollow octagonal balanced inductor has a Q as high as 9 with an inductance of 2.86 nH, for a standard 0.25-μm CMOS technology with only two metal layers (0.6 and 1.0 μm) [13].

The VCO is implemented as a single differential pMOS-only topology, leading to an enhanced tuning range, without increasing the power consumption and the VCO gain $K_{VCO}$ [13]. For the frequency range of interest, $K_{VCO}$ is between 100 and 200 MHz/V, explaining the need for equalization of the loop gain. The VCO output is buffered from the prescaler input to prevent kickback noise from entering the tank. The measured phase noise is as low as −127.5 dBc/Hz at 600 kHz and −142.5 dBc/Hz at 3 MHz for a carrier frequency of 1.82 GHz.

D. The 35-kHz Dual-Path Loop Filter
To achieve full integration, a dual-path filter topology has been implemented (Fig. 8). Two filter paths, one active integration and one passive low-pass filter, are added with a multiplication factor in the dual charge pumps. The addition realizes the low-frequency zero needed for loop stability in a type-II PLL, without adding the actual capacitor [12]. The total number of capacitors is the same as in a classical fourth-order type-II PLL, but for the same phase noise the integrated capacitance is more than 5 times smaller. Due to the rather high VCO gain, the integrated capacitance is still 1.4 nF to be able to comply with the DCS-1800 phase-noise requirements. An extra pole is added at 210 kHz to ensure enough suppression at higher offset frequencies. A filter optimization model is developed, determining all pole and zero positions and the capacitance–resistance tradeoff to obtain low noise and high integratability [14]. The results of the optimization at 1765.92 MHz are listed in Table I. The total phase noise is without the ΔΣ noise. The MASH and single-loop (SL) ΔΣ noise contributions result from the nonlinear analysis. As

TABLE I
SUMMARY OF THE LOOP PROPERTIES AND PERFORMANCE OF THE FOURTH-ORDER TYPE-II PLL
seen in Section II-C, the loop bandwidth needs to be smaller than 62 kHz for ΔΣ noise suppression. However, to ensure sufficient suppression of the low-frequency fractional spurious tones for inputs close to the integers, the bandwidth is designed to be 35 kHz. Despite the rather low loop bandwidth for a fractional-N synthesizer, a settling time of less than 293 μs for a 104-MHz step is simulated.

E. The Phase-Error-to-Charge-Pump-Current Conversion
The nonlinear analysis of Section III identified nonlinearity of the phase-error-to-current conversion as the main cause of ΔΣ noise leakage and spurious tones. Therefore, the PFD and charge-pump circuits are carefully optimized toward spurious suppression as such and toward a highly linear phase-error detection for ΔΣ spurious suppression.

First, the reference spur generation by the PFD charge-pump circuit is carefully minimized. The integration in the first path of the loop filter is done actively to keep the charge-pump output at a fixed level (see Fig. 8). Additionally, the charge-pump current is designed to be at least an order of magnitude larger than the fixed parasitic charge injection of the switch transistors. The current switches are implemented with pMOS and nMOS transistors to compensate charge injection. Finally, a timing control scheme [Fig. 9(a)] is developed to control the charge-pump switches. The up and down control pulses of the PFD are converted to synchronized control signals to drive both the output current branch and the dummy current branch of the charge pump [Fig. 9(b)]. Fig. 9(a) shows the dummy and output control signals. The dummy control is delayed versus the output control by modifying the thresholds of the second inverter string (indicated by high and low) such that the current always flows, preventing hard on/off switching of the current sources. To equalize rise and fall times and force a perfect $\pi$ rad relation between nMOS and pMOS control signals, latches at the outputs of both inverter strings are implemented. Capacitors at the control outputs slow the rising and falling edges to prevent large charge injections by fast switching.

Fig. 9. (a) Timing control circuit and signals to control the dummy and the output current branch of the charge pump. (b) Charge-pump circuit with (at the left) the dummy current branch, denoted by the suffix d, and the output branch.
Fig. 10. IC microphotograph and the measurement setup in which it is embedded.

To linearize the phase-error-to-current conversion, the phase detection is performed by a zero-dead-zone PFD [15], to prevent a hard nonlinearity around 0 phase error. Due to the delay added in the PFD, both the up and down current sources are on for small or zero phase errors, enabling the PFD to react to very small phase errors. The on-time fraction of the charge pump due to the delay is less than 10%. This value is a tradeoff between dead-zone prevention and sensitivity to noise coupling when the charge pumps are on. To further minimize digital noise coupling, the sampling in the PFD and the computational events in the ΔΣ modulator and prescaler are offset in phase. Consequently, the phase-error decision making is done in a relatively quiet environment. To make sure that the gains for positive and negative phase-error detection are equal, the current source transistors are oversized to ensure sufficient matching. As a side effect, the current source noise, which can seriously affect the in-band noise, is decreased. Additionally, the timing control of Fig. 9(a) provides synchronization between the two filter paths and the switches of the charge pumps themselves, thereby ensuring equal positive and negative phase-error detection gain. HSPICE simulations of the PFD charge-pump circuit are performed and show no dead zone and no gain mismatch with ideal transistor matching.

V. EXPERIMENTAL RESULTS

Fig. 10 shows the IC microphotograph and the measurement setup in which it is embedded. The ΔΣ fractional measurements are performed by controlling the PLL divider moduli with an HP80000 data generator, which generates the 4-bit control word. The 4-bit output bit stream is generated using Matlab. This provides a flexible way to test different kinds of ΔΣ modulators, without the need for redesigns. All presented measurements are performed with a 26-MHz reference frequency and at 1.76592 GHz, i.e., for a fractional division by 67.92 for comparison with the simulated results. The input to the ΔΣ modulators is a 16-bit word, resulting in a frequency resolution of around 400 Hz. The power-supply voltage is only 2 V. Fig. 11 shows the output spectrum of the ΔΣ fractional-N PLL over a span of 55 MHz. The reference spurs are well below −75 dBc, due to the careful charge-pump timing control.

Fig. 11. Measured output spectrum of the ΔΣ fractional-N PLL at 1.76592 GHz. All spurious tones are well below −75 dBc/Hz.

To measure the fractional performance of the frequency synthesizer, the Matlab data is stored in the data generator memory. Unfortunately, the maximum memory capacity is only 128 kbit, leading to large spurious tones at the output at low offset frequencies. These large tones corrupt the gain calibration, which is performed by the phase-noise measurement system every offset frequency decade, such that accurate measurements of the phase noise at offsets smaller than 10 kHz are not feasible. The measured phase noise of the PLL with the MASH modulator and the single-loop multibit modulator is presented in Figs. 12 and 13. Small spurs are present at 2.08 MHz as predicted by the simulations in Fig. 6. The spur level is well below −100 dBc, due to careful PFD charge-pump design. The phase noise at 600 kHz is lower than −120 dBc/Hz.

Fig. 12. Phase-noise measurement with the ΔΣ single-loop multibit converter at 1.76592 GHz compared to the phase noise at integer division (light).

In Fig. 12, the measured phase noise of the PLL with a multibit single-loop modulator (dark) is compared to the phase noise at integer division (light). Noise at lower offsets originates from the ΔΣ modulator due to noise folding in the PFD, as predicted by the simulations. As a result, the rms phase error is increased from 1.7° to 3°. Note that the phase noise of the PLL at integer divisions is as low as −124 dBc/Hz at 600 kHz, which is only 0.3 dB higher than predicted by the PLL simulations (see Table I). The measured results for fractional division are much noisier than predicted by simulation. The phase noise at offset frequencies close to 10 kHz is increased due to the limited memory of the data generator. The noise at higher offset frequencies is corrupted by noise coupling from the data generator. As can be seen in Fig. 10, the ΔΣ-control bonding wires, which conduct rail-to-rail, very noisy control pulses, are close to the LC tank and the bonding wires of the VCO power supply. Without proper shielding, the VCO phase noise is seriously degraded by this noise coupling.

Fig. 13. Phase-noise measurement with the MASH converter at 1.76592 GHz compared to the simulated ΔΣ noise at the output of the PLL (dashed), and with the simulated PLL output without ΔΣ control (dash-dotted).

In Fig. 13, the measured ΔΣ noise and the ΔΣ noise as simulated in Section III (dashed) are compared. The dash-dotted line is the simulated phase noise of the PLL without ΔΣ control. The simulated noise leakage closely matches the measured results, except at very low offsets due to the limited memory. The phase noise at high offsets is increased versus the simulated PLL results due to noise coupling. Second-order tones are larger in measurements, since the models in the simulator do not include second-order effects and noise coupling. Tones at 520 kHz are believed to come from subharmonic tones present in the ΔΣ modulator output [5], which are amplified by mixing through noise coupling. When comparing the results for the MASH and the single-loop modulator, the measured results are less pronounced than the simulated results (see Fig. 6). The measured phase noise for the single-loop modulator is, however, a few decibels lower than for the MASH modulator. Note that all measurements are performed for frequencies close to integer multiples of $f_{ref}$.

The measured settling time of the PLL is 226 μs for a 104-MHz frequency step. The power consumption of the PLL is 70 mW from a 2-V power supply. The fully integrated low-phase-noise VCO is responsible for almost 66% of the total power consumption. The IC area is 2 × 2 mm², including bonding pads and bypass capacitors. Table II shows the measured specifications compared to the DCS-1800 specifications [1]. The specifications of the IC prototype comply with DCS-1800; only the rms phase error is degraded, due to the limited resolution of the measurement setup.

TABLE II
SUMMARY OF MEASURED SPECIFICATIONS COMPARED TO THE DCS-1800 SPECIFICATIONS

VI. CONCLUSION
A monolithic 1.8-GHz ΔΣ-controlled fractional-N PLL frequency synthesizer is implemented in a standard 0.25-μm CMOS technology. The monolithic fourth-order type-II PLL
integrates the digital synthesizer part together with a fully integrated LC VCO, a high-speed prescaler, and a 35-kHz dual-path loop filter on a die of only 2 × 2 mm². To investigate the influence of the ΔΣ modulator on the synthesizer’s spectral purity, a fast nonlinear analysis method is developed, showing good correspondence with measurements, in contrast to the results of the theoretical analysis. Nonlinear mixing in the phase-frequency detector and the VCO is identified as the main source of spectral pollution in ΔΣ fractional-N synthesizers. MASH and single-loop multibit ΔΣ modulators are compared for use in fractional-N synthesis. Although the MASH is stable and easy to integrate, the single-loop ΔΣ modulator presents a better solution, showing less sensitivity to noise leakage and noise coupling and providing more flexibility. The measured phase noise is lower than −120 dBc/Hz at 600 kHz and −139 dBc/Hz at 3 MHz. The measured fractional spur level is lower than −100 dBc, satisfying the DCS-1800 spectral purity requirements. All measurements are performed for frequencies close to integer multiples of the reference frequency, where the synthesizer is most sensitive to spurious tones.

REFERENCES

[1] M. S. J. Steyaert, J. Janssens, B. De Muer, M. Borremans, and N. Itoh, “A 2-V CMOS cellular transceiver front-end,” IEEE J. Solid-State Circuits, vol. 35, pp. 1895–1907, Dec. 2000.
[2] T. Cho, E. Dukatz, M. Mack, D. Macnally, M. Marringa, S. Mehta, C. Nilson, L. Plouvier, and S. Rabii, “A single-chip CMOS direct-conversion transceiver for 900-MHz spread-spectrum digital cordless phones,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, Feb. 1999, pp. 228–229.
[3] A. Rofougaran, G. Chang, J. J. Rael, J. Y.-C. Chang, M. Rofougaran, P. J. Chang, M. Djafari, J. Min, E. W. Roth, A. A. Abidi, and H. Samueli, “A single-chip 900-MHz spread-spectrum wireless transceiver in 1-μm CMOS, Part II: Receiver design,” IEEE J. Solid-State Circuits, vol. 33, pp. 547–555, Apr. 1998.
[4] M. Copeland, T. Riley, and T. Kwasniewski, “Delta–sigma modulation in fractional-N frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, pp. 553–559, May 1993.
[5] S. R. Norsworthy, R. Schreier, and G. C. Temes, Delta–Sigma Data Converters: Theory, Design and Simulation. New York: IEEE Press, 1997.
[6] B. Miller and R. Conley, “A multiple modulator fractional divider,” IEEE Trans. Instrum. Meas., vol. 40, pp. 578–583, June 1991.
[7] “Digital cellular communication system (Phase 2+); Radio transmission and reception,” Eur. Telecommun. Standards Inst., ETSI 300 190 (GSM 05.05 version 5.4.1), 1997.
[8] W. Rhee, B.-S. Song, and A. Ali, “A 1.1-GHz CMOS fractional-N frequency synthesizer with a 3-b third-order ΔΣ modulator,” IEEE J. Solid-State Circuits, vol. 35, pp. 1453–1460, Oct. 2000.
[9] The Mathworks Inc., Matlab User’s Guide, Version 5. Englewood Cliffs, NJ: Prentice Hall, 1997.
[10] J. Yuan and C. Svensson, “New single-clock CMOS latches and flip-flops with improved speed and power savings,” IEEE J. Solid-State Circuits, vol. 32, pp. 62–69, Jan. 1997.
[11] B. De Muer and M. S. J. Steyaert, “A single-ended 1.5-GHz 8/9 dual-modulus prescaler in 0.7-μm CMOS with low phase-noise and high input sensitivity,” in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC), The Hague, Sept. 1998, pp. 256–259.
[12] J. Craninckx and M. S. J. Steyaert, “Low-phase-noise fully integrated CMOS frequency synthesizers,” Ph.D. dissertation, Katholieke Univ. Leuven, Belgium, 1997.
[13] B. De Muer, M. Borremans, N. Itoh, and M. S. J. Steyaert, “A 1.8-GHz highly tunable low-phase-noise CMOS VCO,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), Orlando, FL, May 2000, pp. 585–588.
[14] B. De Muer and M. S. J. Steyaert, “Fully integrated CMOS frequency synthesizers for wireless communications,” in Analog Circuit Design, W. Sansen, J. H. Huijsing, and R. J. van de Plassche, Eds. Norwell, MA: Kluwer, 2000, pp. 287–323.
[15] F. M. Gardner, Phaselock Techniques. New York: Wiley, 1979.

Bram De Muer (S’00) was born in Sint-Amandsberg, Belgium, in 1973. He received the M.Sc. degree in electrical engineering in 1996 from the Katholieke Universiteit Leuven, Belgium, where he is currently working toward the Ph.D. degree on high-frequency low-noise integrated frequency synthesizers at the ESAT-MICAS laboratories.
He has been a Research Assistant with ESAT-MICAS laboratories since 1996. His research is focused on integrated low-phase-noise VCOs with on-chip planar inductors and high-speed prescaler design, leading to fully integrated ΔΣ fractional-N synthesizers in CMOS technology.

Michel S. J. Steyaert (S’85–A’89–SM’92) was born in Aalst, Belgium, in 1959. He received the M.S. degree in electrical-mechanical engineering and the Ph.D. degree in electronics from the Katholieke Universiteit Leuven (K.U. Leuven), Heverlee, Belgium, in 1983 and 1987, respectively.
From 1983 to 1986, he obtained an IWONL fellowship (Belgian National Foundation for Industrial Research) which allowed him to work as a Research Assistant at the Laboratory ESAT at K.U. Leuven. In 1987, he was responsible for several industrial projects in the field of analog micropower circuits at the Laboratory ESAT as an IWONL Project Researcher. In 1988, he was a Visiting Assistant Professor at the University of California, Los Angeles. In 1989, he was appointed by the National Fund of Scientific Research (Belgium) as a Research Associate, in 1992 as a Senior Research Associate, and in 1996 as a Research Director at the Laboratory ESAT, K.U. Leuven. Between 1989 and 1996, he was also a part-time Associate Professor and since 1997 an Associate Professor at the K.U. Leuven. His current research interests are in high-performance and high-frequency analog integrated circuits for telecommunication systems and analog signal processing.
Dr. Steyaert received the 1990 European Solid-State Circuits Conference Best Paper Award, the 1995 and 1997 ISSCC Evening Session Award, the 1999 IEEE Circuit and Systems Society Guillemin–Cauer Award, and the 1991 NFWO Alcatel-Bell-Telephone award for innovative work in integrated circuits for telecommunications.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 7, JULY 2000 1039

A Family of Low-Power Truly Modular Programmable Dividers in Standard 0.35-μm CMOS Technology
Cicero S. Vaucher, Igor Ferencic, Matthias Locher, Sebastian Sedvallson, Urs Voegeli, and Zhenhua Wang

Abstract—A truly modular and power-scalable architecture for low-power programmable frequency dividers is presented. The architecture was used in the realization of a family of low-power fully programmable divider circuits, which consists of a 17-bit UHF divider, an 18-bit L-band divider, and a 12-bit reference divider. Key circuits of the architecture are 2/3 divider cells, which share the same logic and the same circuit implementation. The current consumption of each cell can be determined with a simple power optimization procedure. The implementation of the 2/3 divider cells is presented, the power optimization procedure is described, and the input amplifiers are briefly discussed. The circuits were processed in a standard 0.35-μm bulk CMOS technology, and work with a nominal supply voltage of 2.2 V. The power efficiency of the UHF divider is 0.77 GHz/mW, and of the L-band divider, 0.57 GHz/mW. The measured input sensitivity is 10 mVrms for the UHF divider, and 20 mVrms for the L-band divider.

Index Terms—CMOS integrated circuits, current-mode logic, frequency synthesizers, phase-locked loop, programmable frequency counter, programmable frequency divider.

Manuscript received November 16, 1999; revised January 24, 2000.
C. S. Vaucher is with the Philips Research Laboratories, 5656AA Eindhoven, The Netherlands.
I. Ferencic, M. Locher, S. Sedvallson, U. Voegeli, and Z. Wang are with Philips Semiconductors Zurich, 8045 Zurich, Switzerland.
Publisher Item Identifier S 0018-9200(00)03878-6.
0018–9200/00$10.00 © 2000 IEEE

I. INTRODUCTION

THE feasibility of RF functions implemented in CMOS technology has been demonstrated by, among others, the work presented in [1]–[3]. They show that the scaling of CMOS technologies to deep submicron has made CMOS a technological option for the low-gigahertz frequency range. However, for CMOS to become a commercial option for RF building blocks, it must comply with all trends of the consumer market: miniaturization, low cost, high reliability, and long battery lifetime. Bulk CMOS technologies presently available satisfy the low cost and reliability trends by standard design practice. Complying with miniaturization and long battery lifetime, on the other hand, demands CMOS building blocks with low power dissipation and good electromagnetic compatibility (EMC) characteristics. A critical RF function in this context is the frequency synthesizer, more particularly the programmable frequency divider. The divider consists of logic gates which operate at (or close to) the highest RF frequency. Due to the divider’s complexity, high operation frequency normally leads to high power dissipation.

Other crucial aspects of the present-day consumer electronics industry are the short time available for the introduction of new products in the market, and the short product lifetime. On top of that, the lifespan of a given CMOS technology is also short, due to the aggressive scaling of minimum feature sizes. Short time-to-market demands architectures providing easy optimization of power dissipation, fast design time, and simple layout work. High reusability, in turn, requires an architecture which provides easy adaptation of the input frequency range and of the maximum and minimum division ratios of existing designs.

The choice of the divider architecture is therefore essential for achieving low power dissipation, high design flexibility, and high reusability of existing building blocks. A modular architecture complies with these requirements, as shall be demonstrated in this paper. The focus of the paper is first on the truly modular architecture and on the implementation of the circuits. Then the power optimization procedure and the design of the input amplifier are briefly discussed. Finally, a collection of measured data and the conclusions are presented.

II. PROGRAMMABLE DIVIDER ARCHITECTURES

A. Architecture Based on a Dual-Modulus Prescaler

Fig. 1 depicts the divider architecture based on a dual-modulus prescaler [4], [5]. The design of the dual-modulus prescaler itself has been extensively treated in the literature [4], [6]–[10]. On the other hand, the architecture of Fig. 1 has some undesirable characteristics. One readily notices the lack of modularity of the concept: besides the dual-modulus prescaler, the architecture requires two additional counters for the generation of a given division ratio. The programmable counters (which are, in fact, fully programmable dividers, albeit not operating at the full RF frequency) represent a substantial load at the output of the dual-modulus prescaler, so that power dissipation is increased. Besides, the additional design and layout effort required for the programmable counters increases the time-to-market of new products. These properties led us to conclude that the dual-modulus-based architecture is not an interesting option for the realization of building blocks with high reusability, high flexibility, and short design time.

Fig. 1. Fully programmable divider based on a dual-modulus prescaler.
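As background to the architecture of Fig. 1, the sketch below illustrates how a dual-modulus prescaler and its two additional counters are commonly combined in so-called pulse-swallow operation. The symbols P (prescaler modulus), M (program counter), and A (swallow counter), as well as the relation N = P·M + A, come from the standard textbook description and are assumptions here; they are not taken from this paper.

```python
def pulse_swallow_cycle(P: int, M: int, A: int) -> int:
    """Count the input cycles in one output period of a pulse-swallow divider.

    Assumed textbook model (not from the paper): the prescaler divides by
    P + 1 for the first A prescaler output pulses, then by P for the
    remaining M - A pulses, after which both counters reload.
    """
    if P < 1 or M < 1:
        raise ValueError("P and M must be positive")
    if not 0 <= A <= M:
        raise ValueError("swallow count A must satisfy 0 <= A <= M")
    input_cycles = 0
    for prescaler_pulse in range(M):
        modulus = P + 1 if prescaler_pulse < A else P
        input_cycles += modulus
    return input_cycles


if __name__ == "__main__":
    P, M = 8, 10  # hypothetical example values
    for A in range(4):
        n = pulse_swallow_cycle(P, M, A)
        # The counted ratio agrees with the closed form N = P*M + A.
        assert n == P * M + A
        print(f"P={P}, M={M}, A={A} -> N={n}")
```

The model makes the modularity argument concrete: two separate counters (M and A) are needed in addition to the prescaler, and each of them is itself a programmable divider.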
Fig. 2. Programmable prescaler. (a) Basic architecture. (b) With extended division range.

B. Programmable Prescaler Architectures

The “basic” programmable prescaler architecture is depicted in Fig. 2(a). The modular structure consists of a chain of 2/3 divider cells connected like a ripple counter [11]. The structure of Fig. 2(a) is characterized by the absence of long delay loops, as feedback lines are only present between adjacent cells. This “local feedback” enables simple optimization of power dissipation. Another advantage is that the topology of the different cells in the prescaler is the same, which facilitates layout work. The architecture of Fig. 2(a) resembles the one presented in [12], which is also based on 2/3 divider cells. Yet there are two fundamental differences. First, in [12] all cells operate at the same (high) current level. Second, the architecture of [12] relies on a common strobe signal shared by all cells. This leads to high power dissipation, because of high requirements on the slope of the strobe signal, in combination with the high load presented by all cells in parallel.

The programmable prescaler operates as follows. Once in a division period, the last cell on the chain generates the signal $mod_{n-1}$. This signal then propagates “up” the chain, being reclocked by each cell along the way. An active mod signal enables a cell to divide by 3 (once in a division cycle), provided that its programming input $p$ is set to 1. Division by 3 adds one extra period of each cell’s input signal to the period of the output signal. Hence, a chain of $n$ 2/3 cells provides an output signal with a period of

$$T_{out} = 2^{n} T_{in} + 2^{n-1} T_{in}\, p_{n-1} + 2^{n-2} T_{in}\, p_{n-2} + \cdots + 2\, T_{in}\, p_{1} + T_{in}\, p_{0} = \left(2^{n} + 2^{n-1} p_{n-1} + 2^{n-2} p_{n-2} + \cdots + 2 p_{1} + p_{0}\right) T_{in}. \qquad (1)$$

In (1), $T_{in}$ is the period of the input signal $F_{in}$, and $p_0, \ldots, p_{n-1}$ are the binary programming values of the cells 1 to $n$, respectively. The equation shows that all integer division ratios ranging from $2^{n}$ (if all $p_k = 0$) to $2^{n+1} - 1$ (if all $p_k = 1$) can be realized. The division range is thus rather limited, amounting to roughly a factor of two between maximum and minimum division ratios.¹ The division range can be extended by combining the prescaler with a set-reset counter [13]. In that case, however, the resulting architecture is no longer modular.

The divider implementation presented in Fig. 2(b) extends the division range of the basic prescaler, whilst maintaining the modularity of the basic architecture [14]. The operation of the new architecture is based on the direct relation between the performed division ratio and the bus-programmed division word $p_n, \ldots, p_0$. Let us introduce the concept of effective length $n'$ of the chain. It is the number of divider cells that are effectively influencing the division cycle. Deliberately setting the mod input of a certain 2/3 cell to the active level overrules the influence of all cells to the right of that cell. The divider chain behaves as if it has been shortened. The required effective length $n'$ corresponds to the index of the most significant (and active) bit of the programmed division word. Only a few extra OR gates are required to adapt $n'$ to the programmed division word, as depicted on the right side of Fig. 2.

With the additional logic the division range becomes:
• minimum division ratio: $2^{n'_{min}}$;
• maximum division ratio: $2^{n+1} - 1$.
We see that the minimum and maximum division ratios can be set independently, by choice of $n'_{min}$ and $n$, respectively. Subsequent changes in an optimized design can be realized with low risk. A somewhat similar technique, applied to an asynchronous programmable counter, is described in [9].

¹In principle, it is also possible to divide by $3^{n}$, but the gap between this value and the continuous division range makes it useless in standard synthesizer applications.

III. TRULY MODULAR PROGRAMMABLE DIVIDERS FAMILY

A. Realized Circuits

The modular architecture of Fig. 2(b) was applied in the realization of a family of fully programmable frequency dividers.
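The division-ratio relation of (1) and the extended range of Fig. 2(b), on which this family is built, can be checked numerically. The following is a minimal behavioral sketch, not the authors' implementation: it counts input periods cell by cell (each cell doubles the count and swallows one extra period when its programming bit is 1) and compares the result with the closed form. The function names and the parameter `n_min` (standing for the minimum effective length) are hypothetical.

```python
import itertools
from typing import Sequence


def basic_ratio(p: Sequence[int]) -> int:
    """Division ratio of a chain of n 2/3 cells; p[k] is the bit p_k.

    Works from the last cell back to the first: every cell needs two of
    its input periods per output period, plus one swallowed period per
    division cycle when its programming bit is 1.
    """
    periods = 1  # one output period of the last cell
    for bit in reversed(p):
        periods = 2 * periods + bit
    return periods


def closed_form(p: Sequence[int]) -> int:
    """Closed form of (1): 2^n + sum of p_k * 2^k."""
    n = len(p)
    return 2 ** n + sum(bit << k for k, bit in enumerate(p))


def extended_ratio(word: int, n: int, n_min: int) -> int:
    """Extended-range divider: the effective chain length is the index of
    the most significant active bit of the programmed word p_n ... p_0."""
    n_eff = word.bit_length() - 1
    if not n_min <= n_eff <= n:
        raise ValueError("programmed word outside the division range")
    lower_bits = [(word >> k) & 1 for k in range(n_eff)]
    return basic_ratio(lower_bits)


if __name__ == "__main__":
    # Basic chain: every bit pattern matches the closed form of (1).
    for bits in itertools.product([0, 1], repeat=4):
        assert basic_ratio(bits) == closed_form(bits)
    # Extended chain: the division ratio equals the programmed word itself.
    n, n_min = 5, 2
    ratios = [extended_ratio(w, n, n_min) for w in range(2 ** n_min, 2 ** (n + 1))]
    print(min(ratios), max(ratios))
```

Under these assumptions the extended divider realizes every integer from $2^{n'_{min}}$ to $2^{n+1} - 1$, and the performed division ratio is simply the value of the programmed word.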
Three circuits were implemented: an 18-bit L-band divider, a 17-bit UHF divider, and a 12-bit reference divider. The architecture and the division range of the dividers are presented in Fig. 3. The L-band divider was used as the basis for the UHF and for the reference divider. The UHF divider consists of the same circuitry as the L-band divider, except for the first 2/3 cell, which was removed. The reference divider is simply the L-band divider stripped of its six high-frequency cells.

Fig. 3. Family of truly modular programmable dividers, and corresponding division range of the different implementations.

B. Logic Implementation of the 2/3 Divider Cells

A 2/3 divider cell comprises two functional blocks, as depicted in Fig. 4. The prescaler logic block divides, upon control by the end-of-cycle logic, the frequency of the input signal either by 2 or by 3, and outputs the divided clock signal to the next cell in the chain. The end-of-cycle logic determines the instantaneous division ratio of the cell, based on the state of the $mod_{in}$ and $p$ signals. The $mod_{in}$ signal becomes active once in a division cycle. At that moment, the state of the $p$ input is checked, and if $p = 1$, the end-of-cycle logic forces the prescaler to swallow one extra period of the input signal. In other words, the cell divides by 3. If $p = 0$, the cell stays in division-by-2 mode. Regardless of the state of the $p$ input, the end-of-cycle logic reclocks the $mod_{in}$ signal, and outputs it to the preceding cell in the chain ($mod_{out}$ signal).

Fig. 4. Functional blocks and logical implementation of a 2/3 divider cell.

C. Circuit Implementation of the 2/3 Divider Cells

The use of standard rail-to-rail CMOS logic techniques makes the integration of digital functions with sensitive RF signal processing blocks difficult, due to the generation of large supply and substrate disturbances during logic transitions. Source coupled logic (SCL), often referred to as MOS current mode logic (MCML), has better EMC properties, because of the constant supply current and differential voltage switching operation [8]. Besides, SCL has lower power dissipation than rail-to-rail logic for (very) high input frequencies [15].

The logic functions of the 2/3 cells are implemented with the SCL structure presented in Fig. 5. The logic tree combines an AND gate with a latch function. Three AND_latch circuits are used to implement Dlatch1, Dlatch3, Dlatch4 and the AND gates of the 2/3 cells (see Fig. 4). Therefore, six logic functions are achieved, at the expense of three tail currents only. Dlatch2 is implemented as a “normal” D latch (without the differential pair connected to the b–bn inputs).

Fig. 5. SCL implementation of an AND gate combined with a latch function.

The nominal voltage swing is set to 500 mV in the high-frequency (and high-current) cells, and to 300 mV in the low-current cells. The voltage swing is generated by the tail current, set by the current source, and by the load resistances.

Fig. 6. Transient simulation of optimized L-band divider.

TABLE I. SCALING OF CURRENTS IN THE 2/3 DIVIDER CELLS
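The relation between tail current, load resistance, and voltage swing stated above can be illustrated with a short calculation. This is a sketch under stated assumptions: the stage is taken to be fully switched, so that the whole tail current flows through one load and the swing is simply I × R. The 500-mV and 300-mV swings are from the text; the list of tail currents and the threshold separating "high-current" from "low-current" cells are hypothetical placeholders, since the values of Table I are not reproduced here.

```python
HIGH_CURRENT_SWING_V = 0.5  # nominal swing in the high-current cells (from the text)
LOW_CURRENT_SWING_V = 0.3   # nominal swing in the low-current cells (from the text)


def swing_for_cell(tail_current_a: float, low_current_threshold_a: float) -> float:
    """Pick the nominal swing; the threshold is a hypothetical parameter."""
    if tail_current_a < low_current_threshold_a:
        return LOW_CURRENT_SWING_V
    return HIGH_CURRENT_SWING_V


def load_resistance(swing_v: float, tail_current_a: float) -> float:
    """R = V_swing / I_tail for a fully switched current-steering stage."""
    if tail_current_a <= 0:
        raise ValueError("tail current must be positive")
    return swing_v / tail_current_a


if __name__ == "__main__":
    # Hypothetical tail currents along the chain, in microamperes.
    tail_currents_ua = [160.0, 80.0, 40.0, 20.0, 10.0, 5.0]
    threshold_a = 20e-6  # hypothetical boundary of the low-current cells
    for cell, i_ua in enumerate(tail_currents_ua, start=1):
        i_a = i_ua * 1e-6
        v = swing_for_cell(i_a, threshold_a)
        r_kohm = load_resistance(v, i_a) / 1e3
        print(f"cell {cell}: I = {i_ua:6.1f} uA, swing = {v:.1f} V, R = {r_kohm:6.1f} kOhm")
```

The calculation shows why the load resistances must grow as the tail currents are scaled down along the chain if the logic swing is to be maintained.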

D. Power Dissipation Optimization

The absence of long delay loops in the architecture of Fig. 2 enables fast and reliable optimization of power dissipation, since simulation runs may be done for clusters of two cells each time.

The critical point in the operation of the programmable prescaler is the divide-by-3 action [11]. There is a maximum delay between the mod and the clock signals in a given cell that still allows properly timed division by 3. This maximum delay is determined by the period of the cell’s input signal. The input frequency for each cell is scaled down by the previous one. As a consequence, the maximum allowed delay increases as one moves “down” the chain. As the delay in a cell is inversely proportional to the cell’s current consumption (which is a property of current mode logic circuits), the currents in the cells may be scaled down as well.

The results of a transient simulation with the optimized high-frequency cells of the L-band divider are presented in Fig. 6. The influence of current consumption on the slope of the digital signals (and hence on the time delay) is clearly observed. Table I presents the tail current and the resistance values of the optimized divider cells. Layout optimization took about three iteration cycles. Transient simulations, including extracted parasitics, showed that layout parasitics caused a decrease of about 30% in the highest operation frequency, when compared to the original simulations.

E. Input Amplifiers

The input amplifier provides the required amplification of the voltage-controlled oscillator (VCO) signal to “digital” levels, determined by the sensitivity specifications and by the divider circuitry. High input sensitivity enables the divider to be directly coupled to a wide range of VCOs, without the need for external (discrete) buffers. In addition, the input amplifier performs other important functions, which are listed below.
• It provides reverse isolation, to prevent the divider activity from “kicking back” and disturbing the VCO.
• It provides single-ended to differential conversion of the (very often) single-ended VCO signal.
• It enables the VCO to be ac coupled to the divider function, and provides a signal to the first divider cell with the proper dc level.

The required amplification of the UHF amplifier, set by sensitivity requirements (−20 dBm), has been split into two differential stages. Each differential pair operates with 50-μA nominal current, and has load resistances of 14 kΩ. The L-band input amplifier is a scaled version of the UHF input amplifier. The tail currents were doubled, and the drain resistances were halved. The nominal low-frequency small-signal gain of the UHF amplifier is 26 dB; the gain of the L-band amplifier is 23 dB. Negative feedback from the output node to the input was implemented with 50-kΩ resistances. The feedback provides dc biasing to the first stage, and allows ac coupling of the VCO signal to the first differential pair.

IV. MEASUREMENTS

The control currents for the UHF and L-band dividers can be set externally, through input pins. The input amplifier current is controlled by one input current and the 2/3 divider cell currents by another. The curves presented in this section were obtained with the nominal supply voltage of 2.2 V, except where otherwise noted.

A. Input Sensitivity and Maximum Operation Frequency

Fig. 7 presents sensitivity curves of the UHF divider, for different settings of the divider control current. The nominal value of this current is 10 μA. Fig. 8 shows sensitivity curves of the L-band divider, for different current settings in the divider and input amplifier. Like the UHF divider, the L-band divider is highly sensitive over a large frequency range. The circuits were driven by a differential input signal, carried over printed circuit board (PCB) strip-lines with a characteristic impedance of 50 Ω. The strip-lines were terminated with discrete resistances of 50 Ω to ground, which were placed close to the input leads of the input amplifiers. The maximum operation frequencies of the UHF and L-band dividers, as a function of the current consumption, are plotted in Fig. 9. The effect of the input amplifiers’ current consumption on the input sensitivities is displayed in Fig. 10. Setting the UHF amplifier current to 230 μA yields a sensitivity value in excess of 10 mVrms.

Fig. 7. Sensitivity of the UHF divider, for different divider current settings. Division ratio = 511, nominal current is I = 10 μA.

Fig. 8. Sensitivity curves of the L-band divider, for a few divider and amplifier current settings. Division ratio = 1023.

Fig. 9. Maximum operation frequency of the UHF and L-band dividers, as function of divider current consumption (excluding input amplifiers).

Fig. 10. Minimum input level for correct division of the frequency dividers, as function of the input amplifiers consumption.

The influence of the supply voltage on the maximum operating frequency was found to be small (5% for a supply voltage decreased from 2.2 V down to 1.8 V). It is interesting to mention that
MCML circuits have been demonstrated to operate with supply voltages as low as 1.2 V [15], without significant loss of speed.

B. Phase Noise

The phase noise of the UHF and reference dividers was measured with a dedicated phase noise measurement system. We used coherent demodulation techniques (phase-locked loop configuration), and employed a low-noise 10-MHz signal source during the evaluation of the circuits. To facilitate the measurements, we implemented signal taps on the output of certain cells on the divider chain. The UHF divider was provided with a tap on the output of its sixth cell; the reference divider had the output of the first cell tapped.

Fig. 11 presents the phase noise of the UHF divider, with nominal settings for the supply current. The straight lines represent measured phase noise of the reference divider. We see a dependency of the noise floor of the reference divider on the level of the 20-MHz input signal. For the UHF divider, however, no dependency of the noise floor on the level of the input signal at 640 MHz was observed. An increase of 25% in current led to a change in noise floor from −122 dBc/Hz (nominal bias, I = 10 μA) to −124 dBc/Hz (with I = 12.5 μA). The noise floor of the reference divider went from −127.5 dBc/Hz down to −130 dBc/Hz, with increased bias. Fig. 11 shows that the high-frequency cells of the UHF divider (see Fig. 3) contribute significantly to the phase noise, especially in the “1/f region.” An increase of noise of about 15 dB is observed, compared to the noise of the single reference divider’s cell.

Fig. 11. Phase noise of the reference and UHF divider, measured at 10 MHz. Dashed line: reference divider (I = 10 μA, F = 20 MHz, −10 dBm). Solid line: reference divider (I = 10 μA, F = 20 MHz, −20 dBm).

C. Power Efficiency

Fig. 12 presents the power efficiency of the UHF and L-band dividers, in comparison to recently published data on low-power dividers and tuning systems. Power efficiency is defined here as the ratio of the divider’s maximum operation frequency to its power dissipation, with dimensions of GHz/mW. The authors have found that (most of) the dividers presented in the literature do not include an input amplifier. Therefore, only the current consumption of the “core” divider circuits is taken in the calculations. This leads to a fair comparison of the available data.

Fig. 12. Comparison of power efficiency (GHz/mW).

Refs. [2] and [8] describe prescalers implemented in bulk CMOS technology. Reference [16] proposes a new synthesizer architecture, where the divider is “powered-down” after lock has been achieved. Ref. [4] describes a fully programmable divider implemented in an ultrathin-film 0.25-μm CMOS/SIMOX process. The CMOS/SIMOX divider power efficiency is about 30% higher than the L-band divider’s. Our divider, however, is implemented in a standard 0.35-μm bulk technology. The power efficiency of a bipolar dual-modulus prescaler [6] is included as well, for technology benchmarking. Its power efficiency is similar to the power efficiency of the CMOS/SIMOX divider. The fully programmable dividers described here demonstrate that architectural choices and optimization procedures can take standard 0.35-μm CMOS to performance levels comparable to more expensive technologies, such as bipolar and CMOS/SIMOX processes.

V. CONCLUSION

This paper presented a truly modular and power-scalable architecture for low-power fully programmable frequency dividers. The flexibility and reusability properties of the architecture were demonstrated with the realization of a family of programmable divider circuits, consisting of the UHF divider (17 bits), the L-band divider (18 bits), and the reference divider (12 bits). The UHF and reference dividers were implemented by simple removal of divider cells from the L-band circuitry. The implementation of the 2/3 divider cells was presented, and the power dissipation optimization procedure was described. To cope with EMC considerations, the dividers were implemented in CMOS SCL (current mode logic). The circuits were processed in a standard 0.35-μm bulk CMOS technology, and operate with a nominal supply voltage of 2.2 V. The power efficiency of the UHF divider is 0.77 GHz/mW, and of the L-band divider, 0.57 GHz/mW. The measured input sensitivity, including the input amplifiers, is 10 mVrms for the UHF divider, and 20 mVrms for the L-band divider.

ACKNOWLEDGMENT

The authors wish to thank G. van Veenendaal, of PS-Systems Laboratory, Eindhoven, The Netherlands, for evaluation work done on the programmable dividers. Many thanks go to D. Kasperkovitz and J. de Haas for the support provided during the project.

REFERENCES

[1] S. Wu and B. Razavi, “A 900 MHz/1.8 GHz CMOS receiver for dual-band applications,” IEEE J. Solid-State Circuits, vol. 33, pp. 2178–2185, Dec. 1998.
[2] J. Craninckx and M. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065, Dec. 1998.
[3] Q. Huang et al., “GSM transceiver front-end circuits in 0.25-μm CMOS,” IEEE J. Solid-State Circuits, vol. 34, pp. 292–303, Mar. 1999.
[4] Y. Kado et al., “An ultralow power CMOS/SIMOX programmable counter LSI,” IEEE J. Solid-State Circuits, vol. 32, pp. 1582–1587, Oct. 1997.
[5] U. L. Rohde, RF and Microwave Digital Frequency Synthesizers. New York, NY: Wiley, 1997.
[6] T. Seneff et al., “A sub-1 mA 1.6 GHz silicon bipolar dual modulus prescaler,” IEEE J. Solid-State Circuits, vol. 29, pp. 1206–1211, Oct. 1994.
[7] J. Craninckx and M. Steyaert, “A 1.75 GHz/3 V dual-modulus divide-by-128/129 prescaler in 0.7 μm CMOS,” IEEE J. Solid-State Circuits, vol. 31, pp. 890–897, July 1996.
[8] F. Piazza and Q. Huang, “A low power CMOS dual modulus prescaler for frequency synthesizers,” IEICE Trans. Electron., vol. E80-C, pp. 314–319, Feb. 1997.
[9] P. Larsson, “High-speed architecture for a programmable frequency divider and a dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 31, pp. 744–748, May 1996.
[10] J. Navarro Soares, Jr. and W. A. M. Van Noije, “A 1.6 GHz dual-modulus prescaler using the extended true-single-phase-clock CMOS circuit technique (E-TSPC),” IEEE J. Solid-State Circuits, vol. 34, pp. 97–102, Jan. 1999.
[11] C. S. Vaucher and D. Kasperkovitz, “A wide-band tuning system for fully integrated satellite receivers,” IEEE J. Solid-State Circuits, vol. 33, pp. 987–998, July 1998.
[12] N. H. Sheng et al., “A high-speed multimodulus HBT prescaler for frequency synthesizer applications,” IEEE J. Solid-State Circuits, vol. 26, pp. 1362–1367, Oct. 1991.
[13] C. S. Vaucher, “An adaptive PLL tuning system architecture combining high spectral purity and fast settling time,” IEEE J. Solid-State Circuits, vol. 35, pp. 490–502, Apr. 2000.
[14] C. S. Vaucher and Z. Wang, “A low-power truly modular 1.8 GHz programmable divider in standard CMOS technology,” in Proc. 25th Eur. Solid-State Circuits Conf., Sept. 1999, pp. 406–409.
[15] M. Mizuno et al., “A GHz MOS adaptive pipeline technique using MOS current-mode logic,” IEEE J. Solid-State Circuits, vol. 31, pp. 784–791, June 1996.
[16] A. R. Shahani et al., “Low-power dividerless frequency synthesis using aperture phase detection,” IEEE J. Solid-State Circuits, vol. 33, pp. 2232–2239, Dec. 1998.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001 761

A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector
Jafar Savoj, Student Member, IEEE, and Behzad Razavi, Member, IEEE

Abstract—A 10-Gb/s phase-locked clock and data recovery circuit incorporates an interpolating voltage-controlled oscillator and a half-rate phase detector. The phase detector provides a linear characteristic while retiming and demultiplexing the data with no systematic phase offset. Fabricated in a 0.18-μm CMOS technology in an area of 1.1 × 0.9 mm², the circuit exhibits an RMS jitter of 1 ps, a peak-to-peak jitter of 14.5 ps in the recovered clock, and a bit-error rate of 1.28 × 10⁻⁶, with random data input of length $2^{23} - 1$. The power dissipation is 72 mW from a 2.5-V supply.

Index Terms—Clock recovery, half-rate CDR, optical communication, oscillators, phase detectors, PLLs.

Manuscript received August 21, 2000; revised December 1, 2000. This work was supported in part by the Semiconductor Research Corporation and in part by Cypress Semiconductor.
The authors are with the Electrical Engineering Department, University of California, Los Angeles, CA 90095-1594 USA (e-mail: razavi@icsl.ucla.edu).
Publisher Item Identifier S 0018-9200(01)03020-7.
0018–9200/01$10.00 © 2001 IEEE

I. INTRODUCTION

WITH THE exponential growth of the number of Internet nodes, the volume of the data transported by its backbone continues to rise rapidly. The load of the global Internet backbone is expected to be as high as 11 Tb/s by the year 2005, indicating that the required bandwidth must increase by a factor of 50 to 100 every seven years.

Among the available transmission media, optical fibers have the highest bandwidth with the lowest loss, serving as an attractive solution for the Internet backbone. However, the electronic interface proves to be the bottleneck in designing high-speed optical systems. In order to push the speed of operation beyond the capabilities of the fabrication processes, a number of transceivers can be fabricated on the same chip. The input and output signals can be carried either over a bundle of fibers, or on a single fiber that uses wave-division multiplexing. In this scenario, both the power dissipation and the complexity of each transceiver become critical. While stand-alone building blocks of optical transceivers have been built in GaAs and silicon bipolar technologies [1], [2], full integration of many transceivers makes it desirable to use CMOS technology.

This paper describes the design of the first 10-Gb/s CMOS clock and data recovery (CDR) circuit. A linear phase detector (PD) is introduced that compares the phase of the incoming data with that of a half-rate clock. The CDR circuit also incorporates a three-stage interpolating ring oscillator to achieve a wide tuning range. Fabricated in a 0.18-μm CMOS technology, the circuit achieves an RMS jitter of 1 ps with a pseudorandom sequence of $2^{23} - 1$ while dissipating 72 mW from a 2.5-V supply.

The next section of the paper presents the CDR architecture and design issues. Section III deals with the design of the building blocks. Section IV describes the experimental results.

II. ARCHITECTURE

The choice of the CDR architecture is primarily determined by the speed and supply voltage limitations of the technology as well as the power dissipation and jitter requirements of the system.

In a generic CDR circuit, shown in Fig. 1, the phase detector compares the phase of the incoming data to the phase of the clock generated by the voltage-controlled oscillator (VCO), producing an error that is proportional to the phase difference between its two inputs. The error is then applied to a charge pump and a low-pass filter so as to generate the oscillator control voltage. The clock signal also drives a decision circuit, thereby retiming the data and reducing its jitter.

If attempted in a 0.18-μm CMOS technology, the architecture of Fig. 1 poses severe difficulties for 10-Gb/s operation. Although exploiting aggressive device scaling, the CMOS process used in this work provides marginal performance for such speeds. For example, even simple digital latches or three-stage ring oscillators fail to operate reliably at these rates. These issues make it desirable to employ a “half-rate” CDR architecture, where the VCO runs at a frequency equal to half of the input data rate. The concept of the half-rate clock has been used in [2]–[5]. However, [2] and [3] incorporate a bang–bang phase detector, possibly creating a large ripple on the control line of the oscillator and hence high jitter. The circuit reported in [4] inherently has a smaller output jitter as a result of using a linear phase detector, but it fails to operate at speeds above 6 Gb/s in 0.18-μm CMOS technology. The circuit of [5] benefits from a new linear phase detection scheme, but it may not operate properly with certain data patterns.

Another critical issue in the architecture of Fig. 1 relates to the inherently unequal propagation delays for the two inputs of the phase detector: most phase detectors that operate properly with random data (e.g., a D flip-flop) are asymmetric with respect to the data and clock inputs, thereby introducing a systematic skew between the two in phase-lock condition. Since it is difficult to replicate this skew in the decision circuit, the generic CDR architecture suffers from a limited phase margin, unless the raw speed of the technology is much higher than the data rate.

The problem of the skew demands that phase detection and data regeneration occur in the same circuit such that the clock still samples the data at the midpoint of each bit even in the
presence of a finite skew. For example, the Hogge PD [6] automatically sets the clock phase to the optimum point in the data eye (but it fails to operate properly with a half-rate clock).

Fig. 1. Generic CDR architecture.

The above considerations lead to the CDR architecture shown in Fig. 2. Here, a half-rate PD produces an error proportional to the phase difference between the 10-Gb/s data stream and the 5-GHz output of the VCO. Furthermore, the PD automatically retimes and demultiplexes the data, generating two 5-Gb/s sequences $D_A$ and $D_B$. Although the focus of this work is point-to-point communications, a full-rate retimed output, $D_{out}$, is also generated to provide flexibility in testing and exercise the ultimate speed of the technology. The VCO has both fine and coarse control lines, the latter allowing inclusion of a frequency-locked loop in future implementations.

Fig. 2. Half-rate CDR architecture.

In this work, a new approach to performing linear phase detection using a half-rate clock is described. Owing to its simplicity, this technique achieves both high speed and low power dissipation while minimizing the ripple on the oscillator control voltage.

It is interesting to note that half-rate architectures do suffer from one drawback: the deviation of the clock duty cycle from 50% translates to bimodal jitter. As depicted in Fig. 3, since both clock edges sample the data waveform, the clock duty cycle distortion pushes both edges away from the midpoint of the bits.

Fig. 3. Effect of clock duty cycle distortion.

Another important aspect of CDR design is the leakage of data transitions to the oscillator. In Fig. 2, such leakage arises from: 1) capacitive feedthrough from $D_{in}$ to $CK$ in the PD; 2) capacitive feedthrough from $D_A$ and $D_B$ to $CK$ through the multiplexer; and 3) coupling of $D_{out}$ to the oscillator through the substrate. To minimize these effects, the VCO is followed by an isolation buffer and all of the building blocks incorporate fully differential topologies.

III. BUILDING BLOCKS

A. VCO

The design of the VCO directly impacts the jitter performance and the reproducibility of the CDR circuit. While LC topologies achieve a potentially lower jitter, their limited tuning range makes it difficult to obtain a target frequency without design and fabrication iterations. Since the circuit reported here was our first design in 0.18-μm technology, a ring oscillator was chosen so as to provide a tuning range wide enough to encompass process and temperature variations.

A three-stage differential ring oscillator [Fig. 4(a)] driving a buffer operates no faster than 7 GHz in 0.18-μm CMOS technology. The half-rate CDR architecture overcomes this limitation, requiring a frequency of only 5 GHz.

As shown in Fig. 4(b), each stage consists of a fast and a slow path whose outputs are summed together. By steering the current between the fast and the slow paths, the amount of delay achieved through each stage and hence the VCO frequency can be adjusted. All three stages in the ring are loaded by identical buffers to achieve equal rise and fall times and thus improve the jitter performance. Fig. 4(c) shows the transistor implementation of each delay stage. The fast and slow paths are formed as differential circuits sharing their output nodes. The tuning is achieved by reducing the tail current of one and increasing that of the other differentially. Since the low supply voltage makes it difficult to stack differential pairs under – and
Typical duty cycle correction techniques used at lower speeds – , the current variation is performed through mirror ar-
are difficult to apply here as they suffer from significant dy- rangements driven by pMOS differential pairs. Fig. 5 depicts
namic mismatches themselves. Thus, special attention is paid the small-signal gain and phase response of each delay stage.
to symmetry in the layout to minimize bimodal jitter. While providing a phase shift of 60 , each stage achieves a gain
SAVOJ AND RAZAVI: CLOCK AND DATA RECOVERY CIRCUIT 763

Fig. 5. Small-signal gain and phase response of each delay stage.

Fig. 4. (a) Three-stage ring oscillator. (b) Implementation of each stage. (c)
Transistor-level schematic.

of 5.5 dB at 5 GHz, yielding robust oscillation at the target frequency.
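
The oscillation condition quoted above can be tallied in a few lines. The following is a minimal sketch, assuming an idealized ring of three identical stages in which the remaining 180° of loop phase comes from the net inversion around the differential ring; the function name and the idealization are illustrative assumptions, not taken from the paper.

```python
def ring_startup_check(n_stages=3, stage_phase_deg=60.0, stage_gain_db=5.5):
    """Tally loop phase and loop gain for an idealized differential ring.

    Assumption (not from the paper): identical stages, with a net inversion
    around the ring contributing the remaining 180 degrees of phase.
    """
    loop_phase_deg = n_stages * stage_phase_deg + 180.0
    loop_gain_db = n_stages * stage_gain_db
    loop_gain_linear = 10.0 ** (loop_gain_db / 20.0)
    return loop_phase_deg, loop_gain_db, loop_gain_linear


phase, gain_db, gain_lin = ring_startup_check()
print(f"loop phase = {phase:.0f} deg, loop gain = {gain_db:.1f} dB ({gain_lin:.2f}x)")
```

Under these assumptions the loop phase closes at 360° while the small-signal loop gain is well above unity, which is one way to read the statement that the per-stage gain yields robust oscillation.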
A critical drawback of supply scaling in deep-submicron
technologies is the inevitable increase in the VCO gain for
a given tuning range. To alleviate this difficulty, the control
of the VCO is split between a coarse input and a fine input.
The partitioning of the control allows more than one order of
magnitude reduction in the VCO sensitivity. The idea is that the
fine control is established by the phase detector and the coarse
control is a provision for adding a frequency detection loop.
The coarse control is provided externally in this prototype.
The fine control exhibits a gain of 150 MHz/V and the coarse
control, 2.5 GHz/V (Fig. 6). The tuning range is 2.7 GHz
( 54%).
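
To make the benefit of gain partitioning concrete, the sketch below compares the frequency disturbance produced by the same small voltage on each control line. The two gains are the values quoted above; the 1-mV disturbance and the helper name are hypothetical, chosen only for illustration.

```python
FINE_GAIN_HZ_PER_V = 150e6    # fine-control gain quoted in the text
COARSE_GAIN_HZ_PER_V = 2.5e9  # coarse-control gain quoted in the text


def frequency_shift_hz(gain_hz_per_v, disturbance_v):
    """Frequency deviation caused by a small voltage on a control line."""
    return gain_hz_per_v * disturbance_v


disturbance_v = 1e-3  # hypothetical 1 mV of ripple or noise
fine_shift = frequency_shift_hz(FINE_GAIN_HZ_PER_V, disturbance_v)
coarse_shift = frequency_shift_hz(COARSE_GAIN_HZ_PER_V, disturbance_v)
sensitivity_ratio = COARSE_GAIN_HZ_PER_V / FINE_GAIN_HZ_PER_V

print(f"fine line:   {fine_shift / 1e3:.0f} kHz per mV")
print(f"coarse line: {coarse_shift / 1e6:.1f} MHz per mV")
print(f"sensitivity ratio: {sensitivity_ratio:.1f}")
```

The ratio of the two gains is about 16.7, consistent with the statement that the partitioning reduces the sensitivity seen by the phase-locking path by more than one order of magnitude.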

Fig. 6. VCO gain partitioning. (a) Fine control. (b) Coarse control.

B. Phase Detector
Phase detectors generally appear in two different forms. Nonlinear PDs coarsely quantize the phase error, producing only a positive or negative value at their output. Linear PDs, on the other hand, generate a linearly proportional output that drops to zero when the loop is locked.
Compared to nonlinear PDs, linear PDs result in less charge pump activity, smaller ripple on the oscillator control line, and hence lower jitter. In a linear PD, such as that described in [6], the phase error is obtained by taking the difference between the width of two pulses, both of which are generated whenever a data transition occurs. The width of one of the pulses is linearly proportional to the phase difference between the clock and the data, whereas the width of the other is constant. By using a differential error signal, pattern dependency of phase error is cancelled because both pulses are present only when a data transition occurs.

Fig. 7. (a) Phase detector. (b) Operation of the circuit.

For linear phase comparison between data and a half-rate clock, each transition of the data must produce an “error” pulse whose width is equal to the phase difference. Furthermore, to avoid a dead zone in the characteristics, a “reference” pulse must be generated whose area is subtracted from that of the error pulse, thus creating a net value that falls to zero in lock.
The above observations lead to the PD topology shown in Fig. 7(a). The circuit consists of four latches and two XOR gates. The data is applied to the inputs of two sets of cascaded latches, each cascade constituting a flipflop that retimes the data. Since the flipflops are driven by a half-rate clock, the two output sequences and are the demultiplexed waveforms of the original input sequence if the clock samples the data in the middle of the bit period.
The operation of the PD can be described using the waveforms depicted in Fig. 7(b). The basic unit employed in the circuit is a latch whose output carries information about the zero crossings of both the data and the clock. The output of each latch tracks its input for half a clock period and holds the value for the other half, yielding the waveforms shown in Fig. 7(b) for points and . The two waveforms differ because their corresponding latches operate on opposite clock edges. Produced as , the Error signal is equal to ZERO for the portion of time that identical bits of and overlap, and equal to the XOR of two consecutive bits for the rest. In other words, Error is equal to ONE only if a data transition has occurred.
It may seem that the Error signal uniquely represents the phase difference, but that would be true only if the data were periodic. The random nature of the data and the periodic behavior of the clock in fact make the average value of Error pattern dependent. For this reason, a reference signal must also be generated whose average conveys this dependence. The two waveforms and contain the samples of the data at the rising and falling edges of the clock. Thus, contains pulses as wide as half the clock period for every data transition, serving as the reference signal.
While the two XOR operations provide both the Error and the Reference pulses for every data transition, the pulses in Error are only half as wide as those in Reference. This means that the amplitude of Error must be scaled up by a factor of two with respect to Reference so that the difference between their averages drops to zero when clock transitions are in the middle of the data eye. The phase error with respect to this point is then linearly proportional to the difference between the two averages.
In order to generate a full-rate output, the demultiplexed sequences are combined by a multiplexer that operates on the half-rate clock as well. This output can also be used for testing purposes in order to obtain the overall bit-error rate (BER) of the receiver.

Fig. 8. Symmetric XOR gate.

It is important to note that the XOR gates in Fig. 7 must be symmetric with respect to their two differential inputs. Otherwise, differences in propagation delays result in systematic phase offsets. Each of the XOR gates is implemented as shown in Fig. 8 [7]. The circuit avoids stacking stages while providing perfect symmetry between the two inputs. The output is single-ended but the single-ended Error and Reference signals produced by the two XOR gates in the phase detector are sensed with respect to each other, thus acting as a differential drive for the charge pump. The operation of the XOR circuit is as follows. If the two logical inputs are not equal, then one of the input transistors on the left and one of the input transistors on the right turn on, thus turning off. If the two inputs are identical, one of the tail currents flows through . Since the average current produced by the Error XOR gate is half of that generated by the Reference XOR gate, transistor is scaled differently, making the average output voltages equal for zero phase difference. Channel length modulation of transistor reduces the precision of current scaling between the two XOR gates. This effect can be avoided by increasing the length of the device.
The gain of the PD is determined by the value of the resistor and the tail current sources ( ). The voltage is generated on chip in order to track the variations over temperature and

process. This voltage equals the output common-mode level of the latches preceding the XOR gate. It is generated using a differential pair that is a replica of the preamplifier section of the latch. Current source raises the common-mode level of the differential signal formed by the Error and Reference signals, making compatible with the input of the charge pump.

Fig. 9. Determination of PD gain.

It is instructive to plot the input/output characteristic of the PD to ensure linearity and absence of dead zone. This is accomplished by obtaining the average values of Error and Reference while the circuit operates at maximum speed. Fig. 9 shows the simulated behavior as the phase difference varies from zero to one bit period. The Reference average exhibits a notch where the clock samples the metastable points of the data waveform. The Error and Reference signals cross at a phase difference approximately 55 ps from the metastable point, indicating that the systematic offset between the data and the clock is very small. The linear characteristic of the phase detector results in minimal charge pump activity and small ripple on the control line in the locked condition.
The choice of the logic family used for the XOR gates and the latches is determined by the speed and switching noise considerations. While rail-to-rail CMOS logic achieves relatively high speeds, it requires amplifying the data swings generated by the stage preceding the CDR circuit (typically a limiting amplifier). Furthermore, CMOS logic produces enormous switching noise in the substrate and on the supplies, disturbing the oscillator considerably. For these reasons, the building blocks employ current-steering logic. The phase detector incorporates an input buffer with on-chip resistive matching.

C. Charge Pump and Loop Filter
Fig. 10 shows the implementation of the differential charge pump. The common-mode feedback (CMFB) circuit senses the output CM level by and , providing correction through and . Both the matching and channel-length modulation of – in Fig. 10 impact the residual phase error in locked condition. Thus, their lengths and widths are relatively large to minimize these effects.

Fig. 10. Charge pump and loop filter.

The design of the loop filter is based on a linear time-invariant model of the loop and is performed in continuous time domain. The loop is in general a nonlinear time-variant system and can only be assumed linear if the phase error is small. The time-invariant analysis is valid if the averaging behavior of the loop rather than its single-cycle performance is of interest, i.e., the loop can be analyzed by continuous-time approximation if the loop bandwidth is small. Under this condition, the state of the CDR changes by only a small amount on each cycle of the input signal.
A low-pass jitter transfer function with a given bandwidth and a maximum gain in the passband is specified for a SONET system. The closed-loop transfer function of the CDR has a zero at a frequency lower than the first closed-loop pole. This results in jitter peaking that can never be eliminated. But the peaking can be reduced to negligible levels by overdamping the loop.
As derived in [8], the closed-loop unity-gain bandwidth is approximated as
(1)
where and are the gains of the VCO and PD, respectively, and denotes the conversion gain of the charge pump. Equation (1) can be used to determine the value of . The amount of the jitter peaking in the closed-loop transfer function can be approximated as
(2)
Equation (2) yields the required value of . In order to obtain greater suppression of high-frequency jitter, a second capacitor is added in parallel with the series combination of and . These components are added externally to achieve flexibility in defining the closed-loop characteristics of the circuit.
Another advantage of linear PDs over their bang–bang counterparts is that their jitter transfer characteristic is independent of the jitter amplitude. It should also be mentioned that if the CDR is followed by a demultiplexer, the tight specifications for jitter peaking need not be satisfied because such specifications are defined for cascaded regenerators handling full-rate data.
Fig. 11 depicts the simulated behavior of the CDR circuit at the transistor level. The voltage across the filter is initialized to

Fig. 11. Lock acquisition.

Fig. 13. (a) Spectrum of the recovered clock. (b) Recovered clock in the time
domain.

Fig. 12. Chip photograph.

a value relatively close to its value in phase lock. The loop goes
through a transition of 350 ns before it locks. The ripple on the
control line in phase lock is approximately 1 mV.
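
The half-rate linear phase detector described in Section III-B can be explored with a small behavioral model. The following is a minimal sketch, assuming ideal zero-delay latches and XOR gates on a discrete time grid; the names `x1`, `x2`, `y1`, `y2`, and `tau_steps` are hypothetical labels for the first-latch outputs, the retimed outputs, and the clock-edge position within a bit, and are not the paper's signal names.

```python
import random


def half_rate_pd_averages(bits, tau_steps, steps_per_bit=40):
    """Return (mean Error, mean Reference) for an idealized half-rate linear PD.

    tau_steps is the delay from the start of a bit to the next clock edge,
    in simulation steps (1 .. steps_per_bit - 1); steps_per_bit // 2 places
    the clock edges in the middle of the bits.
    """
    x1 = x2 = y1 = y2 = 0  # latch outputs, all starting at zero
    err_sum = ref_sum = 0
    total = len(bits) * steps_per_bit
    for n in range(total):
        d = bits[n // steps_per_bit]
        # Half-rate clock: period of two bit times, high for one bit time.
        ck_high = ((n - tau_steps) % (2 * steps_per_bit)) < steps_per_bit
        if ck_high:
            x1 = d   # first latch of path 1 is transparent
            y2 = x2  # second latch of path 2 passes the held sample
        else:
            x2 = d   # first latch of path 2 is transparent
            y1 = x1  # second latch of path 1 passes the held sample
        err_sum += x1 ^ x2  # Error: XOR of the two first-latch outputs
        ref_sum += y1 ^ y2  # Reference: XOR of the two retimed outputs
    return err_sum / total, ref_sum / total


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(2000)]
for tau in (10, 20, 30):
    err, ref = half_rate_pd_averages(bits, tau)
    print(f"tau = {tau:2d} steps: 2*Error - Reference = {2 * err - ref:+.3f}")
```

In this idealized model each data transition produces an Error pulse whose width equals the edge position within the bit and a Reference pulse one bit time wide, so the weighted difference `2*Error - Reference` crosses zero when the clock edges sit mid-bit and changes sign on either side, mirroring the factor-of-two scaling described in the text.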

IV. EXPERIMENTAL RESULTS


The CDR circuit has been fabricated in a 0.18-µm CMOS process. Fig. 12 shows a photograph of the chip, which occupies an area of mm . Electrostatic discharge (ESD) protection diodes are included for all pads except the high-speed lines. Nonetheless, since all of these lines have a 50-Ω termination to , they exhibit some tolerance to ESD. The circuit is tested in a chip-on-board assembly. In this prototype, the width of the poly resistors was not sufficient to guarantee the nominal sheet resistance. As a result, the fabricated resistor values deviated from their nominal value by 30%, and the VCO center frequency was proportionally lower than the simulated value at the nominal supply voltage (1.8 V). The supply was increased to 2.5 V to achieve reliable operation at 10 Gb/s. While such a high supply voltage creates hot-carrier effects in rail-to-rail CMOS circuits, it is less detrimental in this design because no transistor in the circuit experiences a gate–source or drain–source voltage of more than 1 V. This issue is nonetheless resolved in a second design [9] by proper choice of resistor dimensions. The circuit is brought close to lock with the aid of the VCO coarse control before phase locking takes over.
Fig. 13(a) shows the spectrum of the clock in response to a 10-Gb/s data sequence of length . The effect of the noise shaping of the loop can be observed in this spectrum. The phase noise at 1-MHz offset is approximately equal to −106 dBc/Hz. Fig. 13(b) depicts the recovered clock in the time domain. The time-domain measurements using an oscilloscope overestimate the jitter, requiring specialized equipment, e.g., the Anritsu MP1777 jitter analyzer. The jitter performance of the CDR circuit is characterized by this analyzer. A random sequence of length produces 14.5 ps of peak-to-peak and 1 ps of RMS jitter on the clock signal. These values are reduced to 4.4 and 0.6 ps, respectively, for a random sequence of length .

Fig. 14. Measured jitter transfer characteristic.

The measured jitter transfer characteristic of the CDR is shown in Fig. 14. The jitter peaking is 1.48 dB and the 3-dB bandwidth is 15 MHz. The loop bandwidth can be reduced to

the SONET specifications, but the jitter analyzer must then generate large jitter, which drives the loop out of lock. The loop bandwidth can be reduced to the SONET specifications if a means of frequency detection is added to the loop [9]. The circuit is then much less susceptible to loss of lock due to the jitter generated by the analyzer.

Fig. 15. (a) Recovered demultiplexed data. (b) Recovered full-rate data.

Fig. 15 depicts the retimed data. The demultiplexed data outputs are shown in Fig. 15(a). The difference between the waveforms results from systematic differences between the bond wires and traces on the test board. Fig. 15(b) depicts the full-rate output. Using this output, the BER of the system can be measured. With a random sequence of , the BER is smaller than . However, a random sequence of results in a BER of . This BER can be reduced if the bandwidth of the output buffer driving the 10-Gb/s data is increased. Furthermore, if the value of the linear resistors is adjusted to its nominal value, the increased operating speed of the back-end multiplexer results in an improved BER [9].
The CDR circuit exhibits a capture range of 6 MHz and a tracking range of 177 MHz. The total power consumed by the circuit excluding the output buffers is 72 mW from a 2.5-V supply. The VCO, the PD, and the clock and data buffers consume 20.7, 33.2, and 18.1 mW, respectively.

V. CONCLUSION
CMOS technology holds great promise for optical communication circuits. The raw speed resulting from aggressive scaling along with high levels of integration provide a high performance at low cost. A 10-Gb/s clock and data recovery circuit designed in 0.18-µm CMOS technology performs phase locking, data regeneration, and demultiplexing with 1 ps of RMS jitter.

REFERENCES
[1] Y. M. Greshishchev and P. Schvan, “SiGe clock and data recovery IC with linear type PLL for 10-Gb/s SONET application,” in Proc. 1999 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1999, pp. 169–172.
[2] M. Wurzer et al., “40-Gb/s integrated clock and data recovery circuit in a silicon bipolar technology,” in Proc. 1998 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1998, pp. 136–139.
[3] M. Rau et al., “Clock/data recovery PLL using half-frequency clock,” IEEE J. Solid-State Circuits, vol. 32, pp. 1156–1159, July 1997.
[4] K. Nakamura et al., “A 6 Gb/s CMOS phase detecting DEMUX module using half-frequency clock,” in Dig. Symp. VLSI Circuits, June 1998, pp. 196–197.
[5] E. Mullner, “A 20-Gb/s parallel phase detector and demultiplexer circuit in a production silicon bipolar technology with f_T = 25 GHz,” in Proc. 1996 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1996, pp. 43–45.
[6] C. Hogge, “A self-correcting clock recovery circuit,” J. Lightwave Technol., vol. LT-3, pp. 1312–1314, Dec. 1985.
[7] B. Razavi, Y. Ota, and R. G. Swartz, “Design techniques for low-voltage high-speed digital bipolar circuits,” IEEE J. Solid-State Circuits, vol. 29, pp. 332–339, Mar. 1994.
[8] L. M. De Vito, “A versatile clock recovery architecture and monolithic implementation,” in Monolithic Phase-Locked Loops and Clock Recovery Circuits, Theory and Design, B. Razavi, Ed. New York: IEEE Press, 1996.
[9] J. Savoj and B. Razavi, “A 10-Gb/s CMOS clock and data recovery circuit with frequency detection,” in Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2001, pp. 78–79.

Jafar Savoj (S’98) was born in Tehran, Iran, in 1974. He received the B.S.E.E. degree from Sharif University of Technology, Tehran, in 1996 and the M.S.E.E. degree from the University of California, Los Angeles (UCLA), in 1998. He is currently working toward the Ph.D. degree at UCLA.
He spent the summer of 1998 with Integrated Sensor Solutions, San Jose, CA, working on the design of high-precision interfaces for sensor applications. During the summer of 1999, he was with NewPort Communications, Irvine, CA, developing CMOS clock and data recovery circuits for the SONET OC-192 standard.
Mr. Savoj received the IEEE Solid-State Circuits Society Predoctoral Fellowship for 2000–2001, and the Beatrice Winner Award for Editorial Excellence at the 2001 ISSCC. He is also a recipient of the Design Contest Award of the 2001 Design Automation Conference.
ISSCC 2001 / SESSION 5 / GIGABIT OPTICAL COMMUNICATIONS I / 5.3

5.3 A 10Gb/s CMOS Clock and Data Recovery Circuit with Frequency Detection

Jafar Savoj, Behzad Razavi

Electrical Engineering Department, University of California, Los Angeles, CA

Clock and data recovery (CDR) circuits operating in the 10Gb/s range have become attractive for the optical fiber backbone of the Internet. While CDR circuits operating at 10Gb/s have been designed in bipolar technologies, cost and integration issues make it desirable to implement these circuits in standard CMOS processes. This 10Gb/s CDR circuit is realized in 0.18µm CMOS technology. Architecture and circuit techniques circumvent the speed limitations of the devices. In contrast to previous work [1], this design incorporates an LC oscillator to reduce the jitter as well as a phase/frequency detector to achieve a wide capture range.

Shown in Figure 5.3.1, the CDR consists of a phase/frequency detector (PFD), a voltage-controlled oscillator (VCO), a charge pump, and a low-pass filter (LPF). The PFD compares the phase and frequency of the input data to that of a half-rate clock, providing two binary error signals for phase and frequency. The PFD is designed so that, in addition to providing information about the phase error, it retimes the data as well. Consequently, the CDR exhibits no systematic offset, i.e., inherent skews between clock and data edges due to their unidentical paths through the loop do not degrade the quality of detection. The VCO provides four differential half-quadrature phases over the full tuning range. All building blocks are fully differential.

Since the half-rate frequency detector requires clock phases that are integer multiples of 45°, the 5GHz VCO is designed as a ring structure consisting of four LC-tuned stages [Figure 5.3.2a]. If the dc feedback around the ring is positive, all stages operate in-phase at the resonance frequency defined by the LC tanks. On the other hand, if the dc feedback is negative, the frequency shifts by a small amount so as to allow each stage to contribute 45° of phase.

The oscillator topology has two advantages over resistive-load ring oscillators. First, owing to the phase slope (Q) provided by the resonant loads, it exhibits less phase noise. Second, its frequency of oscillation is only a weak function of the number of stages, generating multiple phases with no speed penalty. By comparison, a four-stage resistive-load ring operates at a lower frequency.

Figure 5.3.2b shows the implementation of each stage. The loads are formed using on-chip spiral inductors and MOS varactors. Resistor R1 provides a shift in the output common-mode level, allowing both positive and negative voltages across the varactors and thus maximizing the tuning range. Modeling each tank by a parallel network, the required 45° phase shift slightly detunes the circuit. The oscillation frequency is given by $\omega_0 = (LC)^{-0.5}(1 - 1/Q_0)^{0.5}$, where $Q_0$ denotes the Q of each stage at resonance.

The phase detector (PD) is derived from the data transition tracking loop described in Reference 2. In this PD, in-phase and quadrature phases of a half-rate clock signal sample the data in two double-edge-triggered flipflops (DETFFs). Figure 5.3.3 shows the implementation of the PD. Two latches operating on opposite clock phases and a multiplexer form a DETFF that samples the data using both the positive and negative transitions of a half-rate clock. The two signals V1 and V2 are therefore the in-phase and quadrature samples of data, respectively, and one is used to route the other or its complement.

The phase detector operates at high speeds because it uses a half-rate clock. Since in the locked condition, the rising and falling edges of the quadrature clock coincide with data transitions, the in-phase clock transitions sample the data at its optimum point with no systematic offset, generating a full-rate output stream. Also, since the phase-error signal is reevaluated only at data transitions, it incurs little ripple. Note that the output is independent of the data transition density, resulting in reduction of pattern-dependent jitter.

With the small CDR loop bandwidths specified by optical standards, circuits employing only phase detection suffer from an extremely narrow capture range, e.g., about 1% of the center frequency. For this reason, a means of frequency detection is necessary to guarantee lock to random data. As with other phase detectors, the half-rate PD of Figure 5.3.3 generates a beat frequency equal to the difference between the data rate and twice the VCO frequency. However, it does not provide knowledge of the polarity of this difference. Figure 5.3.4 depicts the half-rate phase and frequency detector introduced in this work. A second PD is added and driven by phases that are 45° away from those in the first PD. The circuit operates as follows. (1) If the clock is slow, VPD1 leads VPD2; therefore, if VPD2 is sampled by the rising and falling edges of VPD1, the results are negative and positive, respectively. (2) If the clock is fast, VPD1 lags VPD2. Therefore, if VPD2 is sampled by the rising and falling edges of VPD1, the results are the reverse of the previous case.

The output buffer delivering the 10Gb/s retimed data with high current levels requires a bandwidth of more than 7 GHz. As shown in Figure 5.3.5, the buffer stage employs inductive peaking [3]. The value of the spiral inductors is chosen so as to avoid ripple in the passband. Since the quality factor of the inductors is not critical here, the spiral structures have a linewidth of only 4µm to achieve a high self-resonance frequency.

The CDR circuit is fabricated in a 0.18µm CMOS technology. The circuit is tested in a chip-on-board assembly while operating with a 1.8V supply. The phase noise of the clock in response to a 9.95328Gb/s data sequence of length $2^{23}-1$ at 1MHz offset is approximately equal to -107dBc/Hz. Figure 5.3.6a depicts the recovered clock and data. A pseudo-random sequence of length $2^{23}-1$ produces 9.9ps of peak-to-peak and 0.8ps rms jitter on the clock signal. The jitter characteristics are measured by the Anritsu MP1777 jitter analyzer. The measured jitter transfer characteristic of the CDR is shown in Figure 5.3.6b. The jitter peaking is 0.04dB and the 3dB bandwidth is 5.2MHz. Despite the small loop bandwidth, the frequency detector provides a capture range of 1.43GHz, obviating the need for external references. The total power consumed by the circuit excluding the output buffers is 91mW from a 1.8V supply. Figure 5.3.7 shows a micrograph of the chip, which occupies 1.75x1.55mm².

Acknowledgments:
The authors thank NewPort Communications for fabrication and test support. This work was supported by SRC and Cypress Semiconductor.

References:
[1] J. Savoj and B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery Circuit,” Dig. of Symposium on VLSI Circuits, pp. 136-139, June 2000.
[2] A. W. Buchwald, Design of Integrated Fiber-Optic Receivers Using Heterojunction Bipolar Transistors, Ph.D. Thesis, University of California, Los Angeles, Jan. 1993.
[3] J. Savoj and B. Razavi, “A CMOS Interface Circuit for Detection of 1.2-Gb/s RZ Data,” ISSCC Digest of Technical Papers, pp. 278-279, Feb. 1999.

• 2001 IEEE International Solid-State Circuits Conference 0-7803-6608-5 ©2001 IEEE


ISSCC 2001 / February 5, 2001 / Salon 9 / 2:30 PM

Figure 5.3.1: CDR architecture.

Figure 5.3.2: (a) Four-stage LC-tuned ring oscillator, (b) implementation of one stage.
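
The detuning of the four-stage LC ring can be illustrated numerically. The sketch below evaluates the oscillation-frequency expression quoted in the text, normalized to the tank resonance, and compares it with the frequency at which an ideal parallel RLC tank shows a 45° phase shift. The parallel-RLC impedance model, the choice of operation below resonance, and the sample quality factors are assumptions made for illustration; the paper does not give L, C, or Q values.

```python
import math


def source_formula_ratio(q0):
    """omega_osc / omega_res from the expression quoted in the text."""
    return math.sqrt(1.0 - 1.0 / q0)


def parallel_rlc_ratio(q0):
    """omega / omega_res at which an ideal parallel RLC tank has +45 deg phase.

    Assumption: Z = R / (1 + j*q0*(x - 1/x)) with x = omega/omega_res, so a
    +45 degree phase requires q0*(x - 1/x) = -1, i.e. x**2 + x/q0 - 1 = 0.
    """
    return -1.0 / (2.0 * q0) + math.sqrt(1.0 + 1.0 / (4.0 * q0 * q0))


def tank_phase_deg(q0, x):
    """Phase of the ideal parallel RLC impedance at normalized frequency x."""
    return -math.degrees(math.atan(q0 * (x - 1.0 / x)))


for q0 in (5.0, 10.0):  # hypothetical quality factors
    quoted = source_formula_ratio(q0)
    ideal = parallel_rlc_ratio(q0)
    print(f"Q0 = {q0:4.1f}: quoted = {quoted:.4f}, ideal tank = {ideal:.4f}, "
          f"phase at ideal = {tank_phase_deg(q0, ideal):.1f} deg")
```

Under these assumptions the two results agree to first order in 1/Q0 (both are close to 1 - 1/(2*Q0)), which supports the statement that the required phase shift detunes each stage only slightly.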

Figure 5.3.3: Phase detector.

Figure 5.3.4: Phase and frequency detector.
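
The polarity rule used by the frequency detector, in which VPD2 is sampled on the rising and falling edges of VPD1, can be sketched as follows. This is an illustrative model only: it assumes both phase-detector outputs are ideal sinusoids at the beat frequency, and the 90° lead or lag used in the example is a hypothetical value, since the text does not state the phase offset between VPD1 and VPD2.

```python
import math


def sample_vpd2_at_vpd1_edges(vpd1_lead_deg):
    """Signs of VPD2 at the rising and falling zero crossings of VPD1.

    Assumption: VPD1 = sin(theta) and VPD2 = sin(theta - lead), where a
    positive lead means VPD1 leads VPD2.
    """
    lead = math.radians(vpd1_lead_deg)
    at_rising = math.sin(0.0 - lead)       # VPD1 rises through zero at theta = 0
    at_falling = math.sin(math.pi - lead)  # VPD1 falls through zero at theta = pi
    rising_sign = 1 if at_rising > 0 else -1
    falling_sign = 1 if at_falling > 0 else -1
    return rising_sign, falling_sign


def clock_state(vpd1_lead_deg):
    """Map the sampled polarities to the two cases listed in the text."""
    samples = sample_vpd2_at_vpd1_edges(vpd1_lead_deg)
    if samples == (-1, 1):
        return "slow"  # case (1): negative on rising edge, positive on falling
    if samples == (1, -1):
        return "fast"  # case (2): the reverse
    return "undetermined"


print(clock_state(90.0))   # VPD1 leads VPD2
print(clock_state(-90.0))  # VPD1 lags VPD2
```

The point of the sketch is that the pair of sampled polarities, rather than the beat frequency alone, carries the sign of the frequency error.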

Figure 5.3.5: Output buffer.

Figure 5.3.7: Die micrograph.



Figure 5.3.6: (a) Recovered data and clock, (b) measured jitter transfer
characteristics.
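
Jitter peaking in a measured jitter transfer characteristic such as this one is closely tied to loop damping. The following is a minimal sketch, assuming the textbook second-order loop transfer function with one zero, H(s) = (2ζωn·s + ωn²)/(s² + 2ζωn·s + ωn²); this form, the function name, and the sample damping factors are assumptions for illustration and are not taken from the paper.

```python
import math


def jitter_peaking_db(zeta, points=4000):
    """Peak of |H(jw)| in dB for the assumed second-order jitter transfer.

    Frequency is normalized as u = w / wn and swept on a log grid from
    0.01 to 100.
    """
    peak_sq = 0.0
    for i in range(points):
        u = 10.0 ** (-2.0 + 4.0 * i / (points - 1))
        num = 1.0 + (2.0 * zeta * u) ** 2
        den = (1.0 - u * u) ** 2 + (2.0 * zeta * u) ** 2
        peak_sq = max(peak_sq, num / den)
    return 10.0 * math.log10(peak_sq)


for zeta in (0.7, 1.0, 2.0, 5.0):  # hypothetical damping factors
    print(f"zeta = {zeta:3.1f}: peaking = {jitter_peaking_db(zeta):.3f} dB")
```

In this model the peaking shrinks as the loop is overdamped but never reaches exactly 0 dB, because the closed-loop zero lies below the first pole; a few hundredths of a dB corresponds to a heavily overdamped loop.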



1320 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 9, SEPTEMBER 1999

A 40-Gb/s Integrated Clock and Data Recovery Circuit in a 50-GHz Silicon Bipolar Technology
Martin Wurzer, Josef Böck, Herbert Knapp, Wolfgang Zirwas, Fritz Schumann, and Alfred Felder

Abstract—Clock and data recovery (CDR) circuits are key electronic components in future optical broadband communication systems. In this paper, we present a 40-Gb/s integrated CDR circuit applying a phase-locked loop technique. The IC has been fabricated in a 50-GHz f_T self-aligned double-polysilicon bipolar technology using only production-like process steps. The achieved data rate is a record value for silicon and comparable with the best results for this type of circuit realized in SiGe and III–V technologies.
Index Terms—Bipolar digital integrated circuits, clocks, data communication, high-speed integrated circuits, phase-locked loops, synchronization.

Fig. 1. Block diagram of the fiber-optic link.


I. INTRODUCTION

T HE demands for new services and increased flexibility


have accelerated the development of telecommunication
transport networks, which has resulted in the synchronous op-
difficult to integrate, and narrow pulses require a high . The
major advantages of the second approach are that the phase
between the extracted clock and the received data is locked and
tical networks (SONET)/synchronous digital hierarchy (SDH) that it can be implemented as a monolithic integrated circuit.
standards. Key elements of such high-capacity networks are The goal of our work was to implement a cost-effective
fiber-optic communication links. Time-division multiplexing (TDM) systems operating at 10 Gb/s are now under development using advanced silicon bipolar production technologies to fabricate all high-speed IC’s. Next generations with SONET/SDH are expected to operate at data rates of 40 Gb/s [1]. To enable such large-capacity optical transmission systems to be put into practical use, very high-speed monolithic IC’s are required as key components.
It has been shown that basic digital functions like MUX and DMUX for 40-Gb/s optical-fiber TDM systems can be realized in silicon bipolar technology [2]. But clock and data recovery circuits in a silicon technology have so far only been demonstrated for 20 Gb/s [3]. With more sophisticated SiGe or III–V technologies, 40-Gb/s operation has been achieved [4]–[7]. Some of these solutions are hybrid.
All these realizations are based on either high-Q filters or phase-locked loops (PLL’s). The advantage of the first concept is the easy implementation. The disadvantages are that temperature and frequency variation of filter group delay makes sampling time difficult to control, the high-Q filter is

and reliable clock and data recovery circuit for 40 Gb/s in a production-near silicon bipolar technology. Therefore, an approach based on a PLL technique has been selected and will be described in this paper in more detail.

II. FIBER-OPTIC LINK
The described circuit has been developed for use in 40-Gb/s TDM fiber-optic links. A block diagram of such a link is shown in Fig. 1. The time-division multiplexer collects several data channels into a single high-speed data stream. An external modulator converts the data from electrical to optical signals by modulating the light of a semiconductor laser diode (E/O block). The O/E conversion on the receiving side is performed by a photodiode followed by a transimpedance amplifier. This bitstream is fed into the clock and data recovery unit. Its task is to synchronize the local oscillator to the phase of the incoming data and to retime the data. In contrast to 10-Gb/s systems, the decision function is now performed by a demultiplexer. This requires a DMUX with excellent retiming capability combined with a high input sensitivity [8].
Manuscript received January 11, 1999; revised March 23, 1999.
M. Wurzer is with Corporate Technology, Microelectronics, Siemens AG, Munich 81730 Germany and the Institut für Nachrichtentechnik und Hochfrequenztechnik, Technische Universität Wien, Austria (e-mail: Martin.Wurzer@mchp.siemens.de).
J. Böck, H. Knapp, F. Schumann, and A. Felder are with Corporate Technology, Microelectronics, Siemens AG, Munich 81730 Germany.
W. Zirwas is with Information and Communication Networks, Siemens AG, Munich 81379 Germany.
Publisher Item Identifier S 0018-9200(99)06493-8.
0018–9200/99$10.00 © 1999 IEEE

III. ARCHITECTURE OF THE CLOCK AND DATA RECOVERY CIRCUIT
Fig. 2(a) shows the concept used for the CDR of the fiber-optic link (Fig. 1) in more detail. The main processing blocks are the demultiplexer consisting of two master–slave D-flip-flops (DFF1, DFF2) in parallel and an additional master–slave D-flip-flop (DFF3), which forms the phase detector together
WURZER et al.: 40-Gb/s INTEGRATED CLOCK AND DATA RECOVERY CIRCUIT 1321

(a) (b)
Fig. 2. CDR circuit: (a) block diagram and (b) timing diagram.

Fig. 3. Circuit diagram of the master–slave D-flip-flop.

with DFF2 and the XOR gate. All these functions are integrated in a single chip. The fixed 90° phase shifter, voltage-controlled oscillator (VCO), and loop filter have been realized externally with commercially available components.
Fig. 2(b) shows the timing diagram. The incoming 40-Gb/s data signal is applied to flip-flops DFF1, DFF2, and DFF3. DFF1 is toggled by CLK, DFF2 by the inverted CLK, and DFF3 by the 90° delayed clock signal. This results in the sampling of the input in the vicinity of midbit and each following potential transition. If a transition is present, the phase relationship of the data and the clock can be deduced to be early or late. If the midbit clock CLK is too early, DFF3 samples the same bit; if it is too late, DFF3 samples the following bit. Under locked conditions, DFF3 samples at the edge of the data eye. The XOR compares the output samples of DFF2 and DFF3. The result is fed to the loop filter. The output signal of the loop filter serves as the control signal of the VCO.
The advantages of this concept are that all components operate at half the data rate and that the input is demultiplexed at the same time. The disadvantage is that the input signal has to drive three DFF’s in parallel.

IV. CIRCUIT AND DESIGN PRINCIPLES
The circuit is designed for the single supply voltage of 5 V. The circuit principles used are seen in the circuit blocks of a master–slave D-flip-flop (MS-DFF), shown in Fig. 3. For details, see [9]. The well-proven E²CL (emitter–emitter coupled logic) is used with emitter followers at the inputs and current switches at the outputs. The series gating between clock and data signals enables differential operation with low voltage swings ( mV p-p), resulting in an increase in speed and a reduction of power consumption. Furthermore, differential operation reduces time jitter and crosstalk and offers good common-mode suppression compared to single-mode operation [10]. Cascaded emitter followers are used for level shifting and impedance transformation between the various current switches. Multiple emitter followers improve the decoupling capability and increase the collector-base voltage of the current-switch transistors, allowing for smaller transistors and thus lower collector-base capacitances [10]. On-chip matching resistors (50 Ω) at all data inputs are used in order to reduce jitter introduced by reflections [2], [11].
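The early/late decision described in Section III can be illustrated with a short behavioural sketch. It is an idealised model rather than the authors' circuit: the function name `xor_phase_detector`, the boolean `clock_early` flag, and the choice that DFF2 samples every second bit are assumptions made here for illustration only.

```python
def xor_phase_detector(bits, clock_early):
    """Return XOR(DFF2, DFF3) for each bit sampled by DFF2.

    bits        : sequence of 0/1 data bits at the full data rate
    clock_early : True if the midbit clock is too early, False if too late
    (Both arguments are hypothetical modelling choices, not from the paper.)
    """
    out = []
    # DFF2 is modelled as sampling every second bit (half-rate clock).
    for i in range(1, len(bits) - 1, 2):
        dff2 = bits[i]
        # Early clock: DFF3 still sees the same bit as DFF2.
        # Late clock: DFF3 already sees the following bit.
        dff3 = bits[i] if clock_early else bits[i + 1]
        out.append(dff2 ^ dff3)
    return out


data = [0, 1, 0, 1, 1, 0]
early = xor_phase_detector(data, clock_early=True)
late = xor_phase_detector(data, clock_early=False)
```

In this model the XOR output stays at 0 for an early clock and goes to 1 for a late clock whenever a data transition follows the sampled bit, so the average XOR level seen by the loop filter rises when the clock is late and falls when it is early.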
1322 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 9, SEPTEMBER 1999

Fig. 5. Schematic cross section of a transistor.

Fig. 4. Chip micrograph (chip size: 0.9 × 0.9 mm²).

TABLE I
DEVICE PARAMETERS (JUNCTION CAPACITANCES ARE ZERO-BIAS VALUES)

Meeting the required speed rather than low power consumption was the main aim of this design. All transistor sizes are individually optimized with respect to the function of the transistor in the circuit. Extra attention was given to the on-chip wiring. The lines on the chips were classified as “critical” or “uncritical.” For example, the lines driven by emitter followers are critical because they support ringing, while the lines driven by current switches are uncritical [10]. The critical lines are then shortened at the cost of the uncritical ones. The longer signal lines are realized as microstrip lines (with the lowest metallization layer as a ground plane), mainly to improve simulation accuracy. This leads to the layout shown in Fig. 4.

V. CHIP TECHNOLOGY
The circuit has been fabricated in a self-aligned double-polysilicon bipolar technology [12]. The fabrication starts with buried layer formation. A 1-µm epitaxial layer is grown to compromise between high transit frequency and low external collector-base capacitance. The isolation consists of a channel stopper implantation combined with LOCOS field oxide. The active base is formed by 5-keV BF₂ implantation. This low implantation energy in combination with optimized annealing conditions allows for very steep base profiles. This results in a narrow base width of about 50 nm and thus enables a high transit frequency. A selectively implanted collector improves the current-carrying capability of the transistors to the high optimum collector current density of about 2 mA/µm². To minimize narrow emitter effects, an in situ doped emitter-polysilicon layer is used [13]. This prevents a reduction of cutoff frequency even for 0.5-µm design rules. A three-level metallization completes the process.
Fig. 5 shows a schematic cross section of a transistor. Except for epitaxy, only process steps of a 0.5-µm CMOS production environment are necessary. The maximum transit frequency of the transistors is GHz at V and mA/µm². Table I summarizes typical parameters for transistors with effective emitter size of µm². The minimum gate delay for an ECL differentially operating ring oscillator with output voltage swing of 800 mV p-p is measured to be 15.4 ps. This value is achieved for a current per gate of 1.6 mA (see Fig. 6).

Fig. 6. Measured ECL gate delay versus current per gate with an 800-mV differential voltage swing.

VI. MOUNTING AND MEASUREMENT SETUP
For measurements, the clock and data recovery IC has been mounted on a 15-mil ceramic substrate using conventional bonding techniques. Special care has been taken to minimize the length of the bond wires by positioning the surface of the chip on the same level as the signal, ground, and supply lines of the mounting substrate. Due to differential operation, a pair of lines for each clock and data signal is needed to connect the chip with the environment. Therefore, a corresponding number of connectors are necessary. The minimum distance between them determines the minimum size of the test fixture. To avoid additional delay lines, the length of the lines for the signals , and , , ,

Fig. 9. Eye diagram of the 20-Gb/s data signal at the output D2 of the 1 : 2
demultiplexer.

Fig. 7. Photograph of the package (package size: 70 × 70 mm²).

Fig. 10. (a) Transmitter clock and (b) recovered clock.

Fig. 8. Eye diagram of the 40-Gb/s input data signal Din .

respectively, have to be the same. To achieve a compact layout of these lines, coupled microstrip lines are used. At the input Din, grounded coplanar lines are applied, which show lower dispersion than microstrip lines. The realized test fixture is shown in Fig. 7. It measures 70 × 70 mm².

Fig. 11. Jitter histogram of the recovered clock.

Random pulse pattern generators for driving the circuit at the required data rate of 40 Gb/s are not commercially available. A pulse generator has therefore been built from basic high-speed IC’s [2], [14]. Four 10-Gb/s pseudorandom bit sequences (sequence length 2ⁿ − 1) have been multiplexed to a 40-Gb/s nonreturn-to-zero signal.
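The test-signal generation described above (four pseudorandom tributaries bit-interleaved into one stream at four times the rate) can be sketched in a few lines. This is only an illustration under stated assumptions: the short PRBS7 polynomial x^7 + x^6 + 1 is chosen for brevity and is not claimed to be the sequence used in the measurements, and the tributary offsets (0, 32, 64, 96) and the helper names `prbs7` and `mux4` are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Return n_bits of a PRBS7 sequence (x^7 + x^6 + 1, period 2**7 - 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(n_bits):
        out.append((state >> 6) & 1)                  # output the MSB
        feedback = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the two taps
        state = ((state << 1) | feedback) & 0x7F      # shift in the feedback
    return out


def mux4(tributaries):
    """Bit-interleave four equally long tributaries into one 4x-rate stream."""
    if len(tributaries) != 4:
        raise ValueError("expected exactly four tributaries")
    return [bit for group in zip(*tributaries) for bit in group]


# Four tributaries cut from the same PRBS at different (hypothetical) offsets.
base = prbs7(127 + 96)
tribs = [base[off:off + 127] for off in (0, 32, 64, 96)]
line = mux4(tribs)  # 508 bits at four times the tributary rate
```

Taking every fourth bit of `line` recovers one tributary, which is the operation a 1:4 demultiplexer performs on the receive side.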

VII. EXPERIMENTAL RESULTS
The clock and data recovery IC operates at the single supply voltage of 5 V and consumes 1.6 W. It should be mentioned that no additional cooling was applied. Fig. 8 shows the 40-Gb/s input signal to the CDR circuit. To demonstrate the input sensitivity of the circuit, the eye opening is artificially reduced. In Fig. 9, an eye diagram of the well regenerated and demultiplexed data signal is shown. Fig. 10 shows in (a) the 20-GHz transmitter clock and in (b) the recovered clock. The jitter histogram of the extracted clock in the time domain is displayed in Fig. 11. The measured rms time jitter as observed on the sampling oscilloscope is about 0.8 ps. The measured signal spectra of the VCO are plotted in Fig. 12. The dashed line represents the free-running VCO and the solid line the VCO phase-locked to the 40-Gb/s data signal shown in Fig. 8. The peak is about 35 dB above the floor caused by the statistics of the data.

Fig. 12. VCO spectra.

VIII. CONCLUSION
An integrated clock and data recovery circuit operating up to 40 Gb/s has been realized in a 0.5-µm/50-GHz silicon bipolar technology using only production-like process steps. This data rate is the highest reported value for this type of circuit in a silicon technology. This demonstrates that all

digital functions necessary for a 40-Gb/s transmission system are feasible with silicon bipolar production technologies.

REFERENCES

[1] K. Hagimoto, Y. Miyamoto, T. Kataoka, H. Ichino, and O. Nakajima, “Twenty-Gbit/s signal transmission using simple high-sensitivity optical receiver,” in OFC’92 Tech. Dig., Feb. 1992, p. 48.
[2] A. Felder, M. Möller, J. Popp, J. Böck, and H.-M. Rein, “46 Gb/s DEMUX, 50 Gb/s MUX, and 30 GHz static frequency divider in silicon bipolar technology,” IEEE J. Solid-State Circuits, vol. 31, pp. 481–486, Apr. 1996.
[3] W. Bogner, U. Fischer, E. Gottwald, and E. Müllner, “20 Gbit/s TDM nonrepeatered transmission over 198 km DSF using Si-bipolar IC for demultiplexing and clock recovery,” in Proc. ECOC, Sept. 1996, paper TuD.3.4.
[4] W. Bogner, E. Gottwald, A. Schöpflin, and C.-J. Weiske, “40 Gbit/s unrepeatered optical transmission over 148 km by electrical time division multiplexing and demultiplexing,” Electron. Lett., vol. 33, no. 25, pp. 2136–2137, Dec. 1997.
[5] R. Yu, R. Pierson, P. Zampardi, K. Runge, A. Campana, D. Meeker, K. C. Wang, A. Petersen, and J. Bowers, “Packaged clock recovery integrated circuits for 40 GBit/s optical communication links,” in GaAs IC Symp. Tech. Dig., Nov. 1996, pp. 129–132.
[6] M. Mokhtari, T. Swahn, R. H. Walden, W. E. Stanchina, M. Kardos, T. Juhola, G. Schuppener, H. Tenhunen, and T. Lewin, “InP-HBT chip-set for 40-Gb/s fiber optical communication systems operational at 3 V,” IEEE J. Solid-State Circuits, vol. 32, pp. 1371–1383, Sept. 1997.
[7] M. Lang, Z.-G. Wang, Z. Lao, M. Schlechtweg, A. Thiede, M. Rieger-Motzer, M. Sedler, W. Bronner, G. Kaufel, K. Köhler, A. Hülsmann, and B. Raynor, “20–40 Gb/s 0.2-µm GaAs HEMT chip set for optical data receiver,” IEEE J. Solid-State Circuits, vol. 32, pp. 1384–1393, Sept. 1997.
[8] A. Felder, M. Möller, M. Wurzer, M. Rest, T. F. Meister, and H.-M. Rein, “60 Gbit/s regenerating demultiplexer in SiGe bipolar technology,” Electron. Lett., vol. 33, no. 23, pp. 1984–1986, Nov. 1997.
[9] J. Hauenschild, A. Felder, M. Kerber, H.-M. Rein, and L. Schmidt, “A 22 Gb/s decision circuit and a 32 Gb/s regenerating demultiplexer IC fabricated in silicon bipolar technology,” in Proc. IEEE BCTM’92, Sept. 1992, pp. 151–154.
[10] H.-M. Rein and M. Möller, “Design considerations for very-high-speed Si-bipolar IC’s operating up to 50 Gb/s,” IEEE J. Solid-State Circuits, vol. 31, pp. 1076–1090, Aug. 1996.
[11] J. Hauenschild and H.-M. Rein, “Influence of transmission-line interconnections between Gbit/s IC’s on time jitter and instabilities,” IEEE J. Solid-State Circuits, vol. 25, pp. 763–766, June 1990.
[12] J. Böck, A. Felder, T. F. Meister, M. Franosch, K. Aufinger, M. Wurzer, R. Schreiter, S. Boguth, and L. Treitinger, “A 50 GHz implanted base silicon bipolar technology with 35 GHz static frequency divider,” in Symp. VLSI Technology Tech. Dig., June 1996, pp. 108–109.
[13] J. Böck, M. Franosch, H. Schäfer, H. v. Philipsborn, and J. Popp, “In-situ doped emitter-polysilicon for 0.5 µm silicon bipolar technology,” in Proc. ESSDERC’95, The Hague, the Netherlands, Sept. 1995, pp. 421–424.
[14] M. Möller, H.-M. Rein, A. Felder, and T. F. Meister, “60 Gbit/s time-division multiplexer in SiGe-bipolar technology with special regard to mounting and measuring technique,” Electron. Lett., vol. 33, no. 8, pp. 679–680, Apr. 1997.

Martin Wurzer was born in Innsbruck, Austria, in 1966. He received the Diplomingenieur degree in electrical engineering from the Technical University Vienna, Austria, in 1994, where he is currently pursuing the Ph.D. degree.
He joined Corporate Research and Development, Siemens AG, Munich, Germany, in 1994, where he has been engaged in the development of digital high-speed silicon bipolar IC’s for future optical communication systems in the gigabit-per-second range.

Josef Böck was born in Straubing, Germany, in 1968. He received the diploma degree in physics and the Ph.D. degree from University of Regensburg, Germany, in 1994 and 1997, respectively.
He joined Corporate Research and Development, Siemens AG, Munich, Germany, in 1993, where he first investigated narrow emitter effects in deep submicrometer silicon bipolar devices. His work on technology development and process integration for high-speed silicon bipolar transistors resulted in the SIEGET 45 microwave-transistor family. Currently, he is working on process development for Si and SiGe bipolar technologies.

Herbert Knapp was born in Salzburg, Austria, in 1964. He received the Diplomingenieur degree in electrical engineering from Technical University Vienna, Austria, in 1997.
He joined Corporate Research and Development, Siemens AG, Munich, Germany, in 1993, where he has been involved in the design of integrated circuits for wireless communications. His current research interests include the design of high-speed and low-power microwave circuits.

Wolfgang Zirwas received the Diplomingenieur degree in electrical engineering from Technical University Munich, Germany.
He joined Siemens AG, Munich, in 1987. First, he worked in the field of high-bit-rate fiber-optic communication systems. Later, he focused his work on broad-band access technologies (xDSL, HFC) for both residential and business users. He is now working in the field of broad-band wireless systems.

Fritz Schumann received the Diplomingenieur degree in electrical engineering from Technical University Berlin, Germany, in 1981.
Subsequently, he worked in the field of RF and microwave hybrid circuit and system design for telecommunication and radar applications. In 1992, he joined the silicon bipolar IC design group, Corporate Research and Development, Siemens AG, Munich, Germany. Since then, he has realized IC’s for wireless and fiber-optic communication systems up to 60 Gb/s.

Alfred Felder was born in Bruneck, South Tyrol, Italy, in 1963. He received the Diplomingenieur and Ph.D. degrees in electrical engineering from the Technical University Vienna, Austria, in 1989 and 1993, respectively.
He joined Corporate Research and Development, Siemens AG, Munich, Germany, in 1989, where he has been engaged in the development of analog and digital high-speed silicon bipolar IC’s for future optical communication systems in the gigabit-per-second range. From 1996 to 1998, he was Manager of the Technology Department of Siemens K.K. The department is the liaison office of the Corporate Technology of Siemens AG in Japan, responsible for the cooperation with Japanese companies in research. Since 1998, he has been heading the business operation Signal Processing & Control within the Siemens Semiconductor Group in Japan and has been responsible for marketing of microcontrollers and digital signal processors.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 12, DECEMBER 2001 1937

A Fully Integrated 40-Gb/s Clock and Data Recovery IC With 1:4 DEMUX in SiGe Technology
Mario Reinhold, Claus Dorschky, Eduard Rose, Rajasekhar Pullela, Peter Mayer, Frank Kunz,
Yves Baeyens, Member, IEEE, Thomas Link, and John-Paul Mattia

Abstract: In this paper, a fully integrated 40-Gb/s clock and data recovery (CDR) IC with additional 1:4 demultiplexer (DEMUX) functionality is presented. The IC is implemented in a state-of-the-art production SiGe process. Its phase-locked-loop-based architecture with bang-bang-type phase detector (PD) provides maximum robustness. To the authors’ best knowledge, it is the first 40-Gb/s CDR IC fabricated in a SiGe heterojunction bipolar technology (HBT). The measurement results demonstrate an input sensitivity of 42-mV single-ended data input swing at a bit-error rate (BER) of 10⁻¹⁰. As demonstrated in optical transmission experiments with the IC embedded in a 40-Gb/s link, the CDR/DEMUX shows complete functionality as a single-chip-receiver IC. A BER of 10⁻¹⁰ requires an optical signal-to-noise ratio of 23.3 dB.

Index Terms: Bang-bang, BER, CDR, clock and data recovery, demultiplexer, DEMUX, dynamic frequency divider, jitter generation, jitter tolerance, limiting amplifier, OSNR, phase detector, phase-locked loop, PLL, SiGe, VCO.

I. INTRODUCTION
TODAY’S commercially available highest capacity optical transmission systems are based on multiple 10-Gb/s time-division multiplexing (TDM) channels. These systems are expected to be insufficient to meet the rapidly increasing demands for higher bandwidth in the foreseeable future.
The economically achievable transmission capacity of these wavelength-division multiplexing (WDM) systems is currently limited to 1.6 Tb/s, assuming 160 parallel 10-Gb/s TDM channels in the C- and L-band at a channel spacing of 50 GHz. This corresponds to a spectral efficiency of 0.2 (b/s)/Hz. By increasing the channel bit rate to 40 Gb/s per TDM channel, the fiber capacity can be better utilized. With the spectral efficiency increased to 0.4 (b/s)/Hz, the total transmission capability is 3.2 Tb/s, assuming 80 parallel 40-Gb/s channels and 100-GHz channel spacing.
Additionally, 40-Gb/s TDM will become more cost effective, as the number of optical ports is reduced by a factor of 4 compared to 10-Gb/s TDM, resulting in fewer price-determining optical components, smaller system footprint, and reduced maintenance costs.
Regarding next-generation 40-Gb/s TDM links, the clock and data recovery (CDR) IC is a key electronic component, which strongly determines the overall transmission performance. 40-Gb/s TDM designs must be architecturally robust and manufacturable to compete with 10-Gb/s TDM systems. Accordingly, a fully integrated phase-locked loop (PLL)-based approach with self-aligning bang-bang phase detector (PD) is employed in this work. The IC is fabricated in a production state-of-the-art SiGe heterojunction bipolar technology (HBT) which provides advantages with respect to the achievable level of integration, yield, cost-effectiveness, and process stability compared to III-V process technologies.

II. CLOCK AND DATA RECOVERY EMBEDDED IN THE OPTICAL LINK
The 40-Gb/s TDM optical link employs a 4:1 multiplexing scheme, as shown in the block diagram in Fig. 1. At the receiver, the incoming optical signal is first amplified by an optical preamplifier (OA), converted into electrical pulses by the photo diode, and then directly feeds the CDR/DEMUX. Data recovery is accomplished by the first 1:2 DEMUX. In the PLL-based clock recovery approach presented here, the PD output forces the receive-side voltage-controlled oscillator (VCO) to track the phase of the incoming data signal. The combination of the PD and 1:2 DEMUX function allows the use of a 20-GHz half-bit-rate clock. This half-bit-rate architecture is explained in more detail in Section IV.
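The capacity and spectral-efficiency figures quoted in Section I follow from simple arithmetic, reproduced here as a quick check. The helper name `wdm_capacity` is an assumption made for this sketch; the channel counts, bit rates, and spacings are the ones stated in the text.

```python
def wdm_capacity(channels, bit_rate_gbps, spacing_ghz):
    """Return (total capacity in Tb/s, spectral efficiency in (b/s)/Hz,
    occupied optical bandwidth in THz) for a uniform WDM grid."""
    total_tbps = channels * bit_rate_gbps / 1000.0
    spectral_efficiency = bit_rate_gbps / spacing_ghz
    occupied_thz = channels * spacing_ghz / 1000.0
    return total_tbps, spectral_efficiency, occupied_thz


today = wdm_capacity(channels=160, bit_rate_gbps=10, spacing_ghz=50)
future = wdm_capacity(channels=80, bit_rate_gbps=40, spacing_ghz=100)
```

Both configurations occupy the same 8 THz of optical bandwidth, so the step from 1.6 Tb/s to 3.2 Tb/s comes entirely from the doubled spectral efficiency.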
This paper focuses on the CDR/DEMUX IC. However, the remaining basic functions such as the 4:1 multiplexer (MUX) and the driver IC have also been realized in this work program in SiGe HBT and GaAs high-electron-mobility transistor (HEMT) technology, respectively.

III. PROCESS TECHNOLOGY
The CDR/DEMUX IC presented in this work was designed in a state-of-the-art SiGe HBT with 72-GHz f_T and 74-GHz f_max [1]. SiGe HBT provides superiority for a high level of integration compared to III-V technologies. The process features four metal layers in total including a thick metal layer on top.

Manuscript received March 26, 2001; revised July 15, 2001.
M. Reinhold, C. Dorschky, E. Rose, and F. Kunz were with Lucent Technologies, Optical Networking Group, D-90411 Nürnberg, Germany. They are now with CoreOptics GmbH, D-90411 Nürnberg, Germany (e-mail: mario@coreoptics.com or mario_reinhold@gmx.de).
R. Pullela was with Lucent Technologies, Bell Labs, Murray Hill, NJ. He is now with Gtran Inc., Westlake Village, CA 91362 USA.
P. Mayer and T. Link are with Lucent Technologies, Optical Networking Group, D-90411 Nürnberg, Germany.
Y. Baeyens is with Lucent Technologies, Bell Labs, Murray Hill, NJ 07974 USA.
J.-P. Mattia was with Lucent Technologies, Bell Labs, Murray Hill, NJ 07974 USA. He is now with Big Bear Networks, Sunnyvale, CA 94086 USA.
Publisher Item Identifier S 0018-9200(01)09325-8.
0018–9200/01$10.00 © 2001 IEEE

Fig. 1. Clock and data recovery IC embedded in the optical link.

Fig. 2. CDR/DEMUX architecture.

Small-scale integrated analog and digital building blocks implemented in this process have been demonstrated for 40-Gb/s operation [2], [3].

IV. CDR/DEMUX ARCHITECTURE
The half-bit-rate architecture of the CDR/DEMUX IC (Fig. 2) is based on the concept reported in [4] and has already demonstrated its functionality for 10-Gb/s applications.
The nonlinear bang-bang PD described in [5] was modified for interlaced operation. The combination of the PD and the 1:2 DEMUX functions allows the use of a half-bit-rate clock running at 20 GHz.
The three upper eye diagrams in Fig. 3 illustrate the basic principle of a common bang-bang PD. For the basic realization, three samples of the incoming data signal are necessary. In locked condition, two of the samples capture two consecutive bits while the third captures the data transition between them, as indicated in the first eye diagram.
In this implementation, two modifications to the common bang-bang PD are made. First, the 1:2 DEMUX and phase-
REINHOLD et al.: FULLY INTEGRATED 40-Gb/s CLOCK AND DATA RECOVERY IC 1939

detection function are combined so that a half-bit-rate clock is used. However, for sampling in the bit transition, a four-phase clock is essential. Second, unlike in previous approaches [6], six samples are generated in order to process every single data transition, as indicated in the fourth eye diagram in Fig. 3. This increases the maximum PD gain, reducing the jitter generation compared to a single-edge PD.

Fig. 3. PD principle.

The PD output signals are derived from logical operations on the six samples after their synchronization with the clock phase C0.
In this design, an accurate four-phase clock at 20 GHz is generated from the 40-GHz VCO output using a symmetrically loaded 2:1 frequency divider.
Additionally, a limiting amplifier is implemented to improve the input sensitivity. It feeds the 40-Gb/s input data to the four parallel latch chains generating the six samples.
To process the two 1:2 demultiplexed 20-Gb/s signals with commercially available 10-Gb/s DEMUX ICs, an additional 2:4 demultiplexer is included, which produces the 10-Gb/s output signals D00, D01, D10, D11, and the 5-GHz output clock C5G.
The PLL filter has a parallel proportional (P) integrating (I) structure (PI filter). The high-speed P filter aligns the phase relation between clock and data. Thus, each digital PD output pulse generates a dynamic frequency step in the VCO frequency. Integration of the PD output by the low-bandwidth I filter controls the static VCO frequency. Since the P filter and the I filter work at different speeds, both paths are decoupled from each other, resulting in a minimum lead time of the high-speed phase-control loop.
Two variants of the CDR/DEMUX exist. The higher integration variant of the CDR/DEMUX (CDR/DEMUX with VCO) also contains an on-chip 40-GHz VCO, the proportional filter of the PLL, and a frequency detector (FD) working as a low-speed frequency acquisition aid, whereas in the lower integration variant (CDR/DEMUX without VCO), these parts are external to the IC.

V. BUILDING BLOCKS
The most challenging building blocks of the CDR/DEMUX are the 40-GHz 2:1 frequency divider, the 40-GHz VCO, the limiting data input amplifier, and the transition sampling latches, including the four-phase 20-GHz clock tree.

A. 2:1 Dynamic Frequency Divider
Since the PD requires a 20-GHz four-phase clock, those clock signals can be most accurately generated by dividing a 40-GHz signal with a 2:1 frequency divider, resulting in differential 0° and 90° phase-shifted clock signals. Studies published earlier [7] using a similar technology show a maximum operating frequency of 42 GHz with a standard static frequency divider. To achieve a higher performance margin, a dynamic frequency divider similar to [8] was employed. As indicated in the circuit diagram (Fig. 4), the divided clock signal is stored by parasitic capacitances, which results in a higher operating frequency compared to a static frequency divider. As a drawback of the dynamic approach, the operating frequency exhibits a lower limit.

B. On-Chip 40-GHz VCO
The internal 40-GHz VCO is based on a differential Colpitts topology using microstrip transmission lines instead of a spiral inductor (Fig. 5). At an oscillation frequency of 40 GHz, a grounded microstrip line can be modeled quite accurately compared to other forms of inductors.
The microstrip transmission lines exhibit an inductive input impedance, since for the odd mode they see a virtual ground termination. Each line is physically implemented with the signal line on the upper thick metallization layer and a shielding ground plane on the lowest metal layer. This yields maximum inductive input impedance per line length combined with minimum resistive losses, which optimizes the quality factor Q.
As the PLL employs a high-speed P filter and a low-speed I filter in parallel, the VCO has two separate frequency tuning inputs. These two tuning inputs feed two ac-coupled reverse-biased varactor diodes controlling the VCO frequency. The varactors have minimum size, resulting in a maximum VCO frequency modulation bandwidth as required by the high-bandwidth bang-bang PD architecture.
It should be noted that the optimization of the free-running VCO phase noise is not the major design goal. Since the bang-bang PLL architecture provides very high bandwidth with

Fig. 4. Circuit diagram of the 2:1 dynamic frequency divider.

respect to loop gain, the VCO phase noise is suppressed by its open loop gain when the PLL is in lock.

Fig. 5. Circuit diagram of the 40-GHz on-chip VCO.

C. Limiting Data Amplifier
The limiting data input amplifier (Fig. 6) employs cascaded chains of emitter followers (EF), transimpedance stages (TIS), and transadmittance stages (TAS) in accordance with the concept of impedance mismatch [9].
Layout aspects strongly influence the circuit performance; in particular, the performance should not be degraded by signal interconnects. Since signal lines can be distinguished into critical and uncritical lines [9], long transmission lines are arranged between current interfaces consisting of a TAS and its load, which is either represented by an active TIS or by passive resistors. For this reason, the limiting data amplifier is implemented, both in schematic and layout, in the form of three separate amplifier blocks with a TIS–TAS interface.
As the data signal has to be split into four latch chains (Fig. 2) and longer lines cannot be avoided, four TIS2 stages are driven in parallel by one TAS2.

D. Clock Distribution and Latch
Fig. 7 shows the circuit diagram of the latch. The latch structure can be subdivided into the latch core and the local clock input stage.
The steepness of the PD curve is strongly determined by the metastability and the clock phase margin (CPM) of the latch, as the transition-sampling latches sample the bit transition when the PLL is in lock. Two high current-biased emitter followers in the data path provide a high CPM and a small metastability region.
For optimal clock distribution, a local clock input stage consisting of termination resistors and two emitter followers is included in each latch cell, since a total clock line length of several millimeters cannot be avoided. For this reason, current interfaces between each clock buffer and the latches are employed. The concept of the clock distribution is illustrated in Fig. 8. The two clock buffers are based on open-collector transadmittance stages. As stated before, a local clock input stage consisting of termination resistors is included in every latch cell. Each clock buffer is loaded with ten latches.
Impedance matching can be easily achieved by designing the line impedance according to the number of loading latches. The line impedance can be locally adapted with respect to signal splits, as shown in Fig. 8. If the line impedances are chosen suitably relative to the number of loading latches, the clock amplitude can be increased. This is due to the inductive peaking effect of the transmission line, as indicated in Fig. 9.
The layout of the latch, which is shown as part of Fig. 10, employs orthogonal data and clock inputs. This implementation minimizes line length on the high-speed data path by running a data channel directly through the cascaded latch cells (Fig. 10). In addition, clock channels run beside the cells to simplify the clock-tree routing.
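One plausible reading of the matching rule in Section V-D is that the termination resistors of all latches hanging on a clock line segment act in parallel, so the matched line impedance scales inversely with the number of loading latches. The sketch below only illustrates that reading; the helper name `matched_line_impedance` and the 500-Ω termination value are hypothetical and do not come from the source.

```python
def matched_line_impedance(termination_ohm, n_latches):
    """Impedance of n equal termination resistors in parallel.

    Assumption (not from the paper): the local termination resistors of all
    latches on one line segment appear in parallel to the driving line.
    """
    if n_latches < 1:
        raise ValueError("at least one loading latch is required")
    return termination_ohm / n_latches


R_TERM = 500.0  # hypothetical per-latch termination resistance in ohms

trunk = matched_line_impedance(R_TERM, 10)   # line feeding all ten latches
branch = matched_line_impedance(R_TERM, 5)   # each branch after a 1:2 split
```

Under this assumption a line feeding ten latches is matched at 50 Ω, and each branch after a symmetric split needs twice that impedance, which is the kind of local adaptation at signal splits that the text mentions.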

Fig. 6. Circuit diagram of the 40-Gb/s limiting data amplifier.

Fig. 7. Circuit diagram of the latch.

VI. PHYSICAL REALIZATION AND MEASUREMENT RESULTS
The CDR/DEMUX with VCO dissipates 5.4 W in total. All high-speed blocks operate at 5.5-V supply voltage and the 2:4 DEMUX at 4.2-V supply voltage. The whole die (Fig. 11) occupies an area of 3005 m .
Closed-loop PLL measurements are performed with mounted ICs using single-ended 4-b interleaved OC192 SONET signals with pseudorandom bit sequence (PRBS) payload for electrical back to back and PRBS payload for the optical transmission experiments. For these measurements, the CDR/DEMUX and the external components are mounted on a duroid substrate ( ) using standard wire bonding, as shown in Fig. 12. The IC is placed into a cavity, reducing the bondwire length. The substrate is attached onto a grounded brass box. The 40-Gb/s input data is fed single-ended into the high-speed box via a V-connector.

A. 2:1 Dynamic Frequency Divider
Fig. 13 illustrates the input sensitivity of the dynamic frequency divider in single-ended operation. A maximum operating frequency of more than 44.5 GHz is observed giving sufficient margin with respect to temperature and process variations. Since the self-oscillating frequency of the divider is roughly 41 GHz divided by 2, the circuit is very sensitive in the frequency range centered around the nominal operating frequency of 40 GHz. The dynamic principle of the frequency divider results in a minimum operating frequency of roughly 33 GHz at an input power of 1 dBm.

B. On-Chip 40-GHz VCO
A similarly centered behavior can be measured for the on-chip VCO represented by the tuning characteristic, as shown in Fig. 14. The VCO tuning range expands from 37.7 to 41.2 GHz.

C. Bang-Bang PD
The measured PD transfer function is illustrated in Fig. 15. The curve shows a fairly steep slope in the lock-in point, which implies good sampling capability at data transition and a small metastability region, corresponding to the observed high CPM of 240°.

D. Jitter Generation and Jitter Tolerance
The rms jitter of the recovered clock is measured using a sampling oscilloscope. By excluding the trigger jitter of the test equipment ( ps) from the original measurement result of less than 1.2 ps, the overall CDR rms jitter generation can be calculated to be approximately 0.7 ps.
For SONET and SDH systems, jitter tolerance masks are defined. As standardization is not finalized for 40-Gb/s systems yet, the tolerance mask is extrapolated from the 10-Gb/s specification. The measured jitter tolerance curve, given in Fig. 16, exhibits sufficient margin relative to the extrapolated BELLCORE mask [10]. For jitter frequencies higher than the corner frequency MHz of the PLL, the jitter tolerance is determined by the CPM of the first latches and the PLL has no influence. For jitter frequencies lower than MHz, the jitter tolerance decreases with 20 dB per decade. Large amounts of jitter can be tolerated at low frequencies, since the filter provides high gain in this frequency range.
1942 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 12, DECEMBER 2001

Fig. 8. Block diagram of the 20-GHz quadrature clock distribution.

Fig. 9. Rough odd-mode simplification of the clock distribution.

Fig. 10. Layout of the entire clock tree and enlargement of the latch.
E. Electrical Sensitivity (BER)
The bit-error rate (BER) curve as a measure of the overall electrical CDR performance is given in Fig. 17. For the CDR/DEMUX with external VCO, a very high sensitivity of 28-mV single-ended voltage swing at is measured. Due to minor modifications of the limiting amplifier resulting in lower bandwidth, the CDR/DEMUX with internal VCO provides slightly less sensitivity. For the same BER, a 42-mV single-ended voltage swing is necessary. A contribution of the internal VCO to the performance degradation can be ruled out, since a variant with external VCO and modified limiting amplifier showed similar performance degradation. Such high input sensitivity allows a direct connection of the photodiode to the CDR/DEMUX as the optical measurement results demonstrate.

F. Performance in System Application
The performance of the CDR/DEMUX embedded in an optical fiber link (refer to Fig. 1) can be characterized by the optical signal-to-noise ratio (OSNR) measurement, as shown in Fig. 18. This is due to the fact that in the given configuration the sensitivity is limited by the noise of the OA. A 50-Ω terminated photodiode is directly connected to the CDR/DEMUX without any electrical amplifier in between, so that the CDR/DEMUX works as a single-chip-receiver IC. An externally modulated signal is transmitted over an 80-km optical-fiber link. To achieve a BER of 10 , a minimum OSNR of 23.3 dB is required.

Fig. 11. CDR/DEMUX micrograph (CDR/DEMUX with VCO).

Fig. 12. CDR/DEMUX test fixture with mounted CDR/DEMUX.

Fig. 13. Input sensitivity of the 2:1 dynamic frequency divider.

Fig. 14. Tuning curve of the 40-GHz on-chip VCO.

Fig. 15. Measured PD transfer function.

Fig. 16. Jitter tolerance measurement result.

Fig. 17. BER measurement.

Fig. 18. OSNR measurement.

VII. CONCLUSION
In this paper, the implementation of a fully integrated CDR/DEMUX for 40-Gb/s TDM application in a state-of-the-art SiGe HBT process has been demonstrated. Key success factors in the design are, first, the robust half-bit-rate architecture with bang-bang PD, and second, its implementation by partitioning the whole chip into key building blocks interconnected via robust current interfaces. Therefore, schematic design and layout have to be done concurrently.
This concept has been applied throughout the IC, but mainly in the 40-Gb/s data and the 20-GHz clock distribution. Optical system measurements show the feasibility of the CDR/DEMUX as a single-chip-receiver IC.

REFERENCES
[1] T. F. Meister et al., “SiGe base bipolar technology with 74-GHz f_max and 11-ps gate delay,” Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 739–742, Dec. 1995.
[2] J. Müllrich, T. F. Meister, M. Rest, W. Bogner, A. Schöpflin, and H.-M. Rein, “40-Gbit/s transimpedance amplifier in SiGe bipolar technology for the receiver in optical-fiber links,” Electron. Lett., vol. 34, pp. 452–453, 1998.
[3] A. Felder, M. Möller, M. Wurzer, M. Rest, T. F. Meister, and H.-M. Rein, “60-Gbit/s regenerating demultiplexer in SiGe bipolar technology,” Electron. Lett., vol. 33, pp. 1984–1985, 1997.
[4] J. Hauenschild et al., “A plastic packaged 10-Gb/s BiCMOS clock and data recovery 1:4-demultiplexer with external VCO,” IEEE J. Solid-State Circuits, vol. 31, pp. 2056–2059, Dec. 1996.
[5] J. D. H. Alexander, “Clock recovery from random binary signals,” Electron. Lett., vol. 11, pp. 541–542, Oct. 1975.
[6] M. Wurzer et al., “A 40-Gb/s integrated clock and data recovery circuit in a 50-GHz f_T silicon bipolar technology,” IEEE J. Solid-State Circuits, vol. 34, pp. 1320–1324, Sept. 1999.
[7] M. Wurzer et al., “42-GHz static frequency divider in a Si/SiGe bipolar technology,” in IEEE ISSCC Dig. Tech. Papers, Feb. 1997, pp. 123–123.
[8] Z. Lao et al., “55-GHz dynamic frequency divider IC,” Electron. Lett., vol. 34, no. 20, pp. 1973–1974, 1998.
[9] H.-M. Rein et al., “Design considerations for very-high-speed Si-Bipolar ICs operating up to 50-Gb/s,” IEEE J. Solid-State Circuits, vol. 31, pp. 1076–1090, Aug. 1996.
[10] “SONET OC-192 Transport System Generic Criteria,” Bellcore, GR-1377-CORE, Dec. 1998.

Mario Reinhold was born in Mülheim/Ruhr, Germany, in 1972. He received the Diplom-Ingenieur degree in electrical engineering from the Ruhr-University Bochum, Germany, in 1998.
He joined Lucent Technologies, Optical Networking Group, Nürnberg, Germany, in 1998, where his activities focused on the development of various analog and digital high-speed bipolar ICs for 40-Gb/s and advanced 10-Gb/s fiber-optic communication systems. Since 2001, he has been with CoreOptics Inc., Nürnberg, Germany, working on a next-generation 40-Gb/s chipset.

Claus Dorschky received the Dipl.-Ing. degree in electrical engineering from Friedrich Alexander University, Erlangen, Germany, in 1986.
He has been working in the development department for high-speed optical transmission systems at Philips Kommunikation Industries (later Lucent Technologies), Nürnberg, Germany, for 14 years. His research interests include design and integration of analog and mixed-signal full custom ICs for 10- and 40-Gb/s as well as integration of optical receivers and transmitters into single wavelength and DWDM transmission systems at those bitrates. In early 2001, he cofounded CoreOptics Inc., Nürnberg, Germany.

Eduard Rose was born in Kischinjow, Moldova, in 1973. He received the Dipl.-Ing. degree in electrical engineering from Ruhr-University Bochum, Germany, in 1998.
He joined Lucent Technologies, Optical Networking Group, Nürnberg, Germany, in 1999, where he started developing different analog and digital high-speed bipolar ICs for SDH/Sonet systems. He is currently with CoreOptics Inc., Nürnberg, working on a second-generation chipset for a 40-Gb/s optical link system.

Rajasekhar Pullela received the B.Tech. degree in electrical and communications engineering from the Indian Institute of Technology, Madras, India, in 1993. From 1993 to 1998, he worked as a graduate student researcher at the University of California, Santa Barbara. During this period, he received M.S. and Ph.D. degrees in electrical engineering, studying device physics and high-speed circuit design.
During 1998–2000, he worked as a Member of Technical Staff at Bell Laboratories, Lucent Technologies, Murray Hill, NJ, designing high-speed ICs for fiber-optic communication systems. Since 2000, he has been with Gtran, Inc., Newbury Park, CA.

Peter Mayer was born in Germany on July 11, 1964. He received the Dipl.-Ing. degree in electrical engineering from Friedrich Alexander University, Erlangen, Germany, in 1989.
In 1989, he joined Philips Kommunikation Industries, Nürnberg, Germany, and has been working on 622-Mb/s optical interface circuits. In 1998, he started developing clock and data recovery ICs for 10- and 40-Gb/s applications. He is currently a Technical Manager at Lucent Technologies GmbH, Nürnberg, where he is responsible for high-speed optical/electrical module design and integration for optical transmission systems.

Frank Kunz was born in Bad Sobernheim, Germany, in 1970. He received the Dipl.Ing. degree in electrical engineering from the Ruhr-University Bochum, Germany, in 1998.
Until February 2001, he was with Lucent Technologies GmbH, Nürnberg, Germany, developing high-speed bipolar ICs for advanced 10-Gb/s optical communication links. He is currently with CoreOptics Inc., Nürnberg, and is working on a second-generation chipset for a 40-Gb/s optical link system.

Yves Baeyens (S’89–M’96) received the M.S. and Ph.D. degrees in electrical engineering from the Catholic University, Leuven, Belgium, in 1991 and 1997, respectively. His Ph.D. research was performed in cooperation with IMEC, Leuven, and treated the design and optimization of coplanar InP-based dual-gate HEMT amplifiers, operating up to W-band.
After a year and a half stay as a Visiting Scientist at the Fraunhofer Institute for Applied Physics, Freiburg, Germany, he is currently a Technical Manager in the High-Speed Electronics Research Department of Lucent Technologies, Bell Laboratories, Murray Hill, NJ. His research interests include the design of mixed analog–digital circuits for ultrahigh-speed lightwave and millimeter-wave applications.

John-Paul Mattia received the B.S., M.S., E.E., and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge.
He began working in high-speed electronics at MIT Lincoln Laboratory in 1989. In 1996, he joined Texas Instruments Inc. in the DSP R&D organization. From 1997 to 2000, he worked in the High-Speed Electronics Group of Lucent Bell Labs, designing and testing circuits for lightwave communication systems. Since July 2000, he has been at Big Bear Networks, Sunnyvale, CA, where he is Chief Technical Officer of Electronics.

Thomas Link was born in 1968 in Nürnberg,


Germany. He received the Dipl.-Ing. (FH) degree
in electrical engineering from Georg Simon Ohm
Fachhochschule, Nürnberg, Germany, in 1991.
In 1991, he joined Philips Kommunikation Indus-
trie AG (now Lucent Technologies), Nürnberg. He is
Member of Technical Staff in the high-speed ASIC
group and designed various high-speed ASICs, cir-
cuit packs, and firmware for SDH/Sonet systems.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 12, DECEMBER 2000 1949

A Fully Integrated SiGe Receiver IC for 10-Gb/s Data Rate
Yuriy M. Greshishchev, Member, IEEE, Peter Schvan, Member, IEEE, Jonathan L. Showell, Member, IEEE,
Mu-Liang Xu, Member, IEEE, Jugnu J. Ojha, Member, IEEE, and Jonathan E. Rogers, Student Member, IEEE

Abstract—A silicon germanium (SiGe) receiver IC is presented here which integrates most of the 10-Gb/s SONET receiver functions. The receiver combines an automatic gain control and clock and data recovery circuit (CDR) with a binary-type phase-locked loop, 1 : 8 demultiplexer, and a 2^7 − 1 pseudorandom bit sequence generator for self-testing. This work demonstrates a higher level of integration compared to other silicon designs as well as a CDR with SONET-compliant jitter characteristics. The receiver has a die size of 4.5 × 4.5 mm² and consumes 4.5 W from 5 V.
Index Terms—Clock and data recovery (CDR), jitter generation, jitter tolerance, jitter transfer, phase detector, phase-locked loop (PLL), SONET, VCO.

Fig. 1. 10-Gb/s receiver architecture. Dotted box shows components integrated in the SiGe receiver IC presented in the paper.

Manuscript received April 17, 2000; revised June 29, 2000.
Y. M. Greshishchev and P. Schvan are with Nortel Networks, Ottawa, ON K1Y 4H7, Canada (e-mail: greshy@nortelnetworks.com).
J. L. Showell was with Nortel Networks and is currently with Quake Technologies Inc., Ottawa, ON, Canada.
M.-L. Xu was with Nortel Networks, Ottawa, ON K1Y 4H7, Canada. He is now with Conexant Systems, San Diego, CA.
J. J. Ojha was with Nortel Networks, Ottawa, ON K1Y 4H7, Canada. He is now with Caspian Networks, Palo Alto, CA (e-mail: jojha@caspiannetworks.com).
J. E. Rogers is with The University of Toronto, Toronto, ON M5S 3G4, Canada.
Publisher Item Identifier S 0018-9200(00)09475-0.
0018-9200/00$10.00 © 2000 IEEE

I. INTRODUCTION
A TYPICAL fiber-optic SONET receiver contains pin-diode with transimpedance (TZ) amplifier, wide dynamic range automatic gain control amplifier (AGC), and a clock and data recovery circuit (CDR) with a demultiplexer. Introduction of dense wave-division-multiplexed (DWDM) systems has put a high demand on the receiver production. A high level of 10-Gb/s component integration, as opposed to using a filter-based CDR architecture [1], is required along with self-testing capabilities to reduce receiver cost, module size, and power dissipation. One of the major difficulties in the integration of 10-Gb/s receiver is to achieve jitter characteristics compliant to the SONET requirements, such as Bellcore recommendations for the OC192 system [2]. To the authors’ knowledge, none of the previously reported [3]–[5] 10-Gb/s receiver ICs with the integrated clock and data recovery circuit (CDR) demonstrated all of the SONET compliant jitter characteristics. While sub-picosecond jitter generation was previously confirmed in the SONET CDR [5], another important question is if all of the receiver components can be integrated on a die without sensitivity and jitter performance degradation.
A fully integrated SiGe receiver IC, presented in the paper, combines CDR, AGC, 1 : 8 demultiplexer and 2^7 − 1 pseudorandom bit sequence (PRBS) generator for self-testing, as shown in the dotted-line box in Fig. 1 [6]. Receiver performance mounted into test fixture was verified in a data-recovery mode up to 12.5 Gb/s and in a CDR mode at 9.1 Gb/s (only limited by the VCO maximum oscillation frequency after packaging). The OC192 10-Gb/s SONET-compliant jitter characteristics of the CDR were verified on-wafer with a membrane probe card and with a jitter analyzer from Anritsu. Phase-noise characteristics have also been measured to confirm the CDR’s sub-picosecond rms jitter performance. Measured 10-Gb/s maximum receiver sensitivity de-embedded after losses in the test fixture is 4.5 mV at a bit-error rate (BER) of at the demultiplexer (DEMUX) output.
In Section II, the binary CDR architecture used in the receiver is briefly analyzed as compared to a linear-type CDR and design method to meet SONET jitter requirements is presented. Then in Section III, the full receiver architecture and the building blocks implementation details are discussed. In Section IV, the IC die fabrication features are described. Finally, in Section V, measured results are presented.

II. BINARY CDR IN SONET RECEIVER
A. Binary CDR Versus Linear CDR
The CDR published in [5] uses a linear-type PLL approach [Fig. 2(a)], while the CDR presented here is based on a binary PLL [Fig. 2(b)]. In the binary PLL, a binary Alexander-type [7] phase detector (PD) is used as compared to the Hogge-type PD [8] in a linear-type PLL. Examples of using binary architecture in optical receiver ICs can be found in [9], [10]. Binary PD produces two digital outputs, UP and DOWN, to signal if the data is early or late with respect to the VCO clock. To control the VCO, the binary information is split into two loops as suggested in [11]. The phase-control loop is formed with the UP and DOWN
outputs directly modulating the VCO frequency with a bang-bang frequency step (“bang-bang” is another name for the binary PLL architecture) via bang-bang frequency tune input. The frequency loop of the binary PLL uses binary outputs integrated with charge pump and capacitor to control the second tune input of the VCO. To reduce jitter generation in absence of data transitions (during long zeros or ones), a tri-state charge pump was employed. For the same reason, a tri-state is introduced to the VCO bang-bang control input which is no longer binary, but ternary, where in a tri-state no frequency step is applied.
Table I shows the comparison for two PLL receiver architectures. Binary CDR is less demanding on the “analog” features of an IC technology, and, in principle, has only one critical component, the binary phase detector (BPD), where sub-picosecond time resolution is required. The ring-oscillator-type VCO is recommended to reduce delay in the PLL loop and, therefore, jitter generation in the CDR, as explained later in the paper. The VCO phase noise is less critical in a binary CDR because of the relatively wide PLL bandwidth. In a linear-type PLL, jitter transfer characteristics can be analyzed using linear PLL theory (see, for example, [5]). Binary-type PLL has nonlinear jitter transfer characteristics and its analysis has not been presented in the technical literature. The following subsection describes the binary PLL design method used in the receiver design.

(a) (b)
Fig. 2. Two basic self-aligned CDR architectures. (a) With a linear PLL. (b) With a binary PLL.

B. Binary CDR Jitter Characteristics
In SONET applications three main jitter characteristics are important:
1) Jitter Tolerance;
2) Jitter Transfer;
3) Jitter Generation.
The Jitter Generation and loop stability was first analyzed by R. Walker et al. [11]. This work also suggested loop decomposition into a frequency-control (low-frequency) part and a phase-control (high-frequency) part. In general, binary PLL is similar in system behavior to a double integration delta modulator with prediction [12] acting in the data-phase domain. Based on analogy with the signal frequency response in delta modulator, the phase-jitter transfer function has a nonlinear (slew-rate limited) mechanism for the phase-jitter frequency response. It is a single-pole-like response with the bandwidth inversely proportional to the input jitter amplitude:

(1)

The quantities in (1) are the 3-dB bandwidth of the binary PLL jitter transfer function, the jitter amplitude, the bang-bang frequency step, and the average data transition density factor (maximum for pattern).
The shape of the Jitter Tolerance function can be characterized by means of a jitter-tolerance scale function [5]. The modeled binary PLL jitter tolerance response is shown in Fig. 3 for two frequency steps. For comparison, Bellcore SONET mask is also shown on the same plot with the corresponding unit interval (UI) values on the right side of the graph1. To satisfy the mask, jitter transfer bandwidth defined by (1) should be set above 4 MHz at a jitter amplitude ps and minimum average data transition density, .

1 In a 10-Gb/s system, UI = 100 ps.

Fig. 3. CDR analytical jitter tolerance as compared to SONET mask.

Fig. 4. Trade-off for the frequency step in binary CDR.

The Jitter Generation in a binary CDR is proportional to the frequency step and to the delay in the PLL loop, measured as a number of 100-ps clock periods required to propagate signal from the phase detector output to its input:

(2)

Equations (1) and (2) were used to find a frequency step and a delay acceptable for the SONET applications as
shown in Fig. 4. The minimum value for the frequency step is determined by Jitter Tolerance minimum bandwidth requirements (4 MHz); the maximum value is limited by jitter generation (10 ps is recommended [2]). In the design presented here, ps was assumed. To reduce jitter generation, the loop delay should be minimized. This makes the ring-type VCO preferable in the binary CDR as compared to LC-tank based VCO where tuning delay is larger due to the usually higher Q-factor of the LC-tank.

TABLE I
COMPARISON OF LINEAR AND BINARY TYPE CDR ARCHITECTURES

III. RECEIVER ARCHITECTURE
A. Architecture
The receiver architecture is shown in Fig. 5. It combines an AGC and a binary CDR with a 1 : 8 demultiplexer and a 2^7 − 1 PRBS generator for self-testing. The receiver recovers 10-Gb/s data and a 10-GHz clock, and produces eight demultiplexed 1.25-Gb/s CML data outputs with a 1.25-GHz clock. The PRBS generator allows functional testing of the CDR and subsequent circuits. A PRBS clock (CLK) is required for testing. In the test mode, the PRBS output is enabled to drive the CDR data bus. In the receiver mode, the AGC output is enabled. The recovered 10-Gb/s data and 10-GHz clock appear at the recovered data (DATA REC BUS) and clock (CLK REC BUS) buses, driving a 1 : 8 DEMUX circuit. A clock signal can also be supplied externally (CLKx) for data recovery operation only.

B. AGC
The block diagram of the AGC is shown in Fig. 6. The AGC has total linear gain range from 3 to 20 dB with a maximum input of 1.7 V . The AGC has two variable gain stages implemented with output current steering in a differential pair [13]. Stage has a fixed gain and also provides open collector transmission line interface to drive the CDR data bus. To alleviate conflicting requirements for large data swing and low noise figure at low input amplitudes, the AGC has two gain ranges: 7–7 dB (low gain range) and 7–20 dB (high gain range). Two differential pairs with a “large” and a “small” emitter degeneration resistors are used to switch the gain ranges. AGC-measured S11 is better than 15 dB in a frequency range up to 10 GHz, noise figure 13.5 dB. The ac bandwidth is adjustable in a range of 8–10 GHz.

C. CDR
As compared to original version of binary CDR [7], [11], in the CDR presented here, the data decision and clock recovery processes are split. This allows for independent optimization of data decision threshold (slicing) without affecting clock recovery process. There are four decision channels in the CDR, all driven by the CDR data bus. Channels 1 and 2 are identical decision circuits, as shown in the block diagram of Fig. 7(a). The additional decision channel allows operation with two different input slicing levels. The data decision threshold is set by a differential slicer circuit based on an emitter follower [Fig. 7(b)]. In a long-haul receiver application, a high-performance limiting amplifier [2 in Fig. 7(a)] is required. Note that the AGC stabilizes only a long-time averaged amplitude measured at AGC output with a peak detector (not shown in Figs. 5 and 6). The limiting amplifier stage was designed for 40-dB gain with bandwidth of more than 16 GHz and input AM to output PM conversion less than 1 ps in 20-dB input dynamic range. Two 20-dB gain limiting amplifier stages similar to [14] were employed. The digital sampler is based on a master–slave–master (MSM) flip-flop
configuration [Fig. 7(c)]. It helps to reduce the latching metastability region and to increase the clock phase margin in the decision circuit, as opposed to master–slave D-type flip-flop (DFF). Schematic and layout of the latch was optimized for a minimum latching time constant.

Fig. 5. SiGe receiver architecture.

Fig. 6. AGC block diagram.

Fig. 7. Decision channel. (a) Block diagram: 1–slicer; 2–limiting amplifier; 3–digital sampler. (b) Slicer schematic. (c) Digital sampler schematic.

The BPD is formed with decision channels 3 and 4, and two DFF circuits. The BPD takes three data samples according to the timing diagram in Fig. 5. It generates a binary output with respect to the lead/lag phase of the VCO clock. The BPD uses only one edge of the data transition, and is set to a tri-state condition at the other edge or in the absence of data transitions according to a truth table (Table II). As a result, recovered clock jitter is not affected by asymmetry between the rising and the falling edges or by the incoming data pattern. The BPD phase resolution is critical to the CDR jitter performance. Latch metastability at time sample T0 (see Fig. 5) limits the resolution. This problem was circumvented by using a MSM-based decision circuit preceded by a limiting amplifier, as described above. Phase resolution better than 1 ps was measured in the BPD circuit, as shown in Fig. 8. A phase-delay sweep was arranged with two clock generators: 5.0125 GHz for the data and 10.0000 GHz for the clock. Frequency shift of 12.5 MHz for the data provided binary pulses at the output of the phase detector with 40-ns period corresponding to a 100-ps delay sweep. To analyze the output pulses, a digital scope was synchronized from the phase-detector output using an HP 54118A trigger amplifier. This method allowed measurement of the output transition region with the accuracy of 0.5 ps per one data cycle (200 ps). Measured transition region at the phase detector output contains two data cycles, confirming phase detector time resolution to be less than 1 ps.
The CDR frequency-control loop and the phase-control loop are separated as described in Section II. The bang-bang part of
the PLL controls the recovered clock phase via the input of the VCO. The frequency loop includes the charge pump and an external integration capacitor (pins C1 and C2).

TABLE II
TRUTH TABLE OF BINARY PHASE DETECTOR

Fig. 8. BPD measured time resolution.

Fig. 9. Ring-type VCO block diagram.

Fig. 10. 1 : 8 demultiplexer block diagram.

The VCO is a ring oscillator type with an architecture shown in Fig. 9. A mixer-type delay cell is used to control the oscillation frequency. The mixer cell is split into a fine-tune (for the internal frequency loop) and a coarse-tune (to compensate for process variation). Care was taken to provide symmetrical bang-bang frequency steps with respect to the tri-state. All of the VCO control inputs were implemented with the high-impedance pMOS buffers. A pMOS-based charge pump was employed [5].
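The behavior of the bang-bang phase-control loop described above can be illustrated with a minimal bit-rate simulation. This is a sketch under stated assumptions, not the authors' model and not their equation (2): the phase detector is idealized as a sign detector that goes to tri-state (0) when there is no data transition, each decision reaches the VCO after a whole number of bit periods, and the frequency loop is ignored. The 5-MHz frequency step and all function names are hypothetical.

```python
from typing import List, Optional, Sequence


def simulate_bang_bang(step_ui: float, delay_bits: int, n_bits: int = 400,
                       transitions: Optional[Sequence[bool]] = None,
                       phase0_ui: float = 0.0) -> List[float]:
    """Return the clock-to-data phase error (in UI) after each bit period."""
    pipeline = [0] * delay_bits      # decisions still travelling around the loop
    phase = phase0_ui
    history = []
    for n in range(n_bits):
        has_edge = True if transitions is None else transitions[n % len(transitions)]
        # Ternary detector: +1 or -1 on a data transition, 0 (tri-state) otherwise.
        decision = (1 if phase >= 0 else -1) if has_edge else 0
        pipeline.append(decision)
        applied = pipeline.pop(0)    # decision that reaches the VCO in this bit
        phase -= applied * step_ui   # a frequency step held for one bit shifts the phase
        history.append(phase)
    return history


def peak_to_peak(history: Sequence[float]) -> float:
    tail = history[len(history) // 2:]   # skip the start-up transient
    return max(tail) - min(tail)


UI_S = 100e-12            # one unit interval at 10 Gb/s, as stated in the text
DELTA_F_HZ = 5e6          # hypothetical bang-bang frequency step
step = DELTA_F_HZ * UI_S  # phase moved per bit period, in UI

for delay in (0, 1, 2, 4):
    pp_ui = peak_to_peak(simulate_bang_bang(step, delay))
    print(f"loop delay {delay} bits -> hunting jitter {pp_ui * UI_S * 1e12:.3f} ps p-p")
```

In this simplified model the steady-state hunting jitter scales linearly with the frequency step and grows with the loop delay, which is the qualitative trend the text gives as the reason for preferring a low-delay ring VCO. With no data transitions the tri-state holds the phase unchanged.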

D. 1 : 8 Demultiplexer
The 1 : 8 DEMUX (Fig. 10) is similar in architecture to
the design presented in [15]. Seven 1 : 2 demultiplexer cir-
cuits are cascaded, with each stage optimized for the clock
frequency required. Each 1 : 2 demultiplexer consists of a
master–slave–master flip-flop to capture the lead bit on the pos-
itive edge of the clock and a master–slave flip-flop to capture
the second bit using the negative edge of the clock. Utilizing
an extra latch in the MSM flip-flop ensures that the 1 : 2 data
outputs are aligned for further processing. The frequency of the
incoming CDR clock is divided by two at each demultiplexer
stage with a delay equal to the data delay in the 1 : 2 block.
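The cascade of seven 1 : 2 stages described above can be modeled behaviorally as a binary tree. This is an illustrative sketch only: it ignores the flip-flop timing and output alignment discussed in the text, and the lane ordering it produces is a property of this simple model, not a statement about the chip's pinout. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def demux_1to2(bits: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split a serial stream into its first-phase and second-phase bits."""
    return list(bits[0::2]), list(bits[1::2])


def demux_tree(bits: Sequence[int], levels: int) -> Tuple[List[List[int]], int]:
    """Cascade 1:2 stages; return (output lanes, number of 1:2 blocks used)."""
    if levels == 0:
        return [list(bits)], 0
    first, second = demux_1to2(bits)
    lanes_a, blocks_a = demux_tree(first, levels - 1)
    lanes_b, blocks_b = demux_tree(second, levels - 1)
    return lanes_a + lanes_b, 1 + blocks_a + blocks_b


# Label each serial bit with its arrival index so the lane mapping is visible.
serial = list(range(16))
lanes, n_blocks = demux_tree(serial, levels=3)

print("1:2 blocks used:", n_blocks)
print("first bit on each lane:", [lane[0] for lane in lanes])

# The clock is halved at every level: 10 GHz -> 5 GHz -> 2.5 GHz -> 1.25 GHz.
print("output clock (GHz):", 10.0 / 2**3)
```

Three tree levels use 1 + 2 + 4 = 7 blocks, matching the count in the text, and the halving of the clock at each level gives the 1.25-GHz output clock. In this model the lanes come out in bit-reversed order, so any downstream logic would need to account for the mapping.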
Fig. 11. 2^7 − 1 PRBS generator block diagram.

Fig. 12. SiGe receiver die micrograph.

Fig. 13. SiGe receiver mounted into microwave-style test fixture.

E. Built-in PRBS Generator
The PRBS generator (Fig. 11) was implemented using the standard polynomial equation in a parallel form. The parallel form avoids the necessity of distributing a 10-GHz clock, as would be required if using a shift-register-type PRBS. In an 8-bit parallel form, the clock frequency is reduced to 1.25 GHz. Similar to the AGC, the output of the multiplexer provides open collector transmission line interface to drive the CDR data bus. The disadvantage of a parallel architecture is penalty in the die area and the power consumption. In normal operation (the AGC transmits data to the CDR) the PRBS power supply is turned off, reducing the overall power consumption of the receiver.
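The idea of a parallel-form PRBS generator can be sketched as follows. The polynomial is not legible in the text, so this example assumes the commonly used 2^7 − 1 polynomial x^7 + x^6 + 1; the seed, the output tap, and all names are likewise assumptions. The sketch derives, for each of the 8 output bits and each next-state bit, the XOR combination of current state bits that a serial shift register would produce after 8 shifts, then checks the result against the serial reference.

```python
from typing import List, Tuple


def serial_step(state: int) -> Tuple[int, int]:
    """One shift of a 7-bit LFSR for x^7 + x^6 + 1 (assumed polynomial)."""
    out = (state >> 6) & 1
    feedback = ((state >> 6) ^ (state >> 5)) & 1
    return ((state << 1) | feedback) & 0x7F, out


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def build_parallel_masks(width: int = 8) -> Tuple[List[int], List[int]]:
    """Unroll `width` shifts symbolically.

    Each mask lists which current-state bits are XORed together, which in
    hardware would be one XOR tree per output or next-state bit.
    """
    sym = [1 << i for i in range(7)]   # sym[i]: mask of seed bits feeding state bit i
    out_masks = []
    for _ in range(width):
        out_masks.append(sym[6])
        feedback = sym[6] ^ sym[5]
        sym = [feedback] + sym[:6]
    return out_masks, sym


OUT_MASKS, NEXT_MASKS = build_parallel_masks()


def parallel_step(state: int) -> Tuple[int, List[int]]:
    """Produce 8 output bits and the next state in a single low-rate clock."""
    word = [parity(state & m) for m in OUT_MASKS]
    nxt = 0
    for i, m in enumerate(NEXT_MASKS):
        nxt |= parity(state & m) << i
    return nxt, word


state_s = state_p = 0x02               # arbitrary nonzero seed
serial_bits: List[int] = []
parallel_bits: List[int] = []
for _ in range(127):                   # 127 words of 8 bits = 8 full periods
    state_p, word = parallel_step(state_p)
    parallel_bits.extend(word)
    for _ in range(8):
        state_s, bit = serial_step(state_s)
        serial_bits.append(bit)

print("parallel matches serial:", serial_bits == parallel_bits)
```

Each parallel step stands for one cycle of the low-rate clock, so a following 8 : 1 multiplexer would serialize the word at the line rate. The extra XOR logic per bit is one way to picture the area and power penalty the text mentions.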

F. CDR Simulation
Because of the nonlinear jitter response of a binary CDR, hierarchical numerical analysis was an important part of the receiver IC design. Four levels of PLL analysis were carried out: analytical, behavioral, schematic level, and post-layout extracted circuit with distributed parasitics. The last three levels are HSPICE-based. A behavioral library of linear and digital components was developed. Analytical models of jitter transfer and jitter tolerance are based on simplified binary-PLL theory, as described above.

IV. FABRICATION
The receiver IC was implemented in IBM’s SiGe technology ( GHz, GHz). The microphotograph of the die is shown in Fig. 12. The die size is 4.5 × 4.5 mm². The major circuit building blocks were not only integrated into the receiver, but were implemented as individual IC components and tested. In the receiver IC, the building blocks were physically partitioned with a transmission line circuit and layout isolation interface similar to that presented in [5], [14]. Separate power supply systems with digital and analog grounds were routed.

V. EXPERIMENTAL RESULTS
The IC worked at first implementation with the VCO oscillation frequency 10% lower than simulated. The receiver IC was mounted into a microwave test fixture (Fig. 13) and was tested at 9.1 Gb/s (VCO oscillation frequency limit2) in a CDR mode and up to 12.5 Gb/s in a data-recovery mode or in internal PRBS test mode with an external clock. The carrier substrate size of cm in the test fixture was defined by the perimeter required for mounting I/O connectors in the housing metal box. A large number of required I/O were used for testing purposes. The receiver IC does not require external components, except decoupling and integration capacitors mounted beside the die. The recovered clock and data eye diagrams at 9.1 Gb/s are shown in Fig. 14. The IC input sensitivity is less than 4.5 mV at measured at the 1 : 8 demultiplexer output with AGC gain set to 20 dB (data eye closure in the test fixture was de-embedded). DATA : 8 transition distortion apparent in Fig. 14 is due to long ribbon cable attached to the test fixture demultiplexed outputs. The receiver consumes in a mission mode 4.5 W from 5 V.

2 Maximum oscillation frequency can be easily corrected by removing one delay stage in the VCO design of Fig. 9.

Fig. 14. SiGe receiver eye diagrams measured in CDR mode at 9.1 Gb/s. Input data: 80 mV p PRBS 2 0 1.

The CDR IC performance was fully characterized at 10 Gb/s on-wafer with a probe card. The die micrograph of the CDR is shown in Fig. 15. It consists of an exact copy of the receiver CDR layout plus the output buffers located in the DEMUX partition. In all of the measurements, input data were supplied single-ended while unused differential input was terminated with 50 Ω. The CDR typical eye diagrams measured with 20 mV PRBS data are shown in Fig. 16. The input sensitivity was measured to be 14 mV at as compared to 13.4 mV simulated considering thermal and shot noise in the decision channel.
Phase noise of the recovered clock was measured with an HP 4352B as a power spectrum density (Fig. 17). 10-Gb/s input data were supplied with amplitude of 100 mV and
PRBS pattern. For comparison, phase noise of the free-running VCO and the data-pattern generator was also measured and shown on the same plot. As expected in a high performance CDR, the recovered clock-phase noise follows, with no error, the data-reference clock noise down to the CDR jitter noise floor at −110 dBc/Hz. Similar recovered phase noise was achieved in the CDR design with a linear PLL and LC-type VCO [5]. Numerically integrated phase noise of the recovered clock gives jitter RMS value of 0.78 ps. Phase noise was found to be PRBS pattern independent up to a pattern of .

Fig. 15. SiGe CDR IC die micrograph.

Fig. 16. CDR IC eye diagrams measured in CDR mode at 10 Gb/s.

Fig. 17. Phase-noise comparison of the CDR recovered clock, free running VCO and data pattern generator (BERT) clock.

Fig. 18. CDR jitter transfer.

Fig. 19. CDR jitter tolerance. Performance measured with the clock reference level modulation test marked with symbol .

The OC192 jitter compliant performance (at 9.95328 Gb/s) was verified with a jitter analyzer MX177 701 from Anritsu. Jitter generation (in 80-MHz bandwidth) was measured to be 5.4 ps and 0.8 ps RMS as compared to 10 ps or 1 ps RMS recommended by Bellcore [2]. The RMS jitter is very close to the 0.78-ps value obtained in the phase-noise measurement.
Jitter transfer measurement (Fig. 18) showed, as predicted by modeling, single-pole-like characteristics with no jitter peaking. Jitter tolerance (Fig. 19) has a very wide safety margin for SONET mask with a minimum of 40 ps (15 ps is recommended). The shape of measured CDR jitter tolerance response differs from the modeled in Fig. 3 because of test setup limitations. This is seen from the measured jitter tolerance of the test setup (BERT) with no CDR in the data path (shown in the same plot of Fig. 19). Only in the frequency range of 40 kHz–2 MHz is the measured jitter tolerance determined by CDR performance. In this frequency range, measured and modeled jitter tolerance coincide. The upper frequency range of the jitter tolerance response was also remeasured with a different method, based on the reference voltage (see Fig. 5)
1956 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 12, DECEMBER 2000

modulation with a sine-wave signal. Minimum jitter tolerance of 40 ps was measured. Both jitter transfer and jitter tolerance response were found to be PRBS pattern independent. The IC demonstrated a 60-MHz frequency range of robust PLL locking and operation even at the input signals well below the sensitivity level.

VI. CONCLUSION

A fully integrated SiGe receiver IC is presented, which combines self-aligned CDR with integrated binary PLL, AGC, 1 : 8 demultiplexer, and PRBS generator for self-testing. The receiver, mounted into a test fixture, operates up to 9.1 Gb/s (VCO limit) in a CDR mode and up to 12.5 Gb/s in a data-recovery mode. Maximum die sensitivity is 4.5 mV at measured at 1 : 8 DEMUX output. Receiver die size is mm , and it consumes in a mission mode 4.5 W from 5-V power supply. CDR SONET-compliant jitter characteristics were verified on-wafer. Jitter tolerance well exceeds OC192 Bellcore mask with a minimum of 40 ps p-p. Jitter transfer has a single-pole-like response with no peaking detected. Jitter generation is less than 1 ps RMS and less than 5.5 ps p-p.

ACKNOWLEDGMENT

The authors thank their colleagues S. Szilagyi for the microwave test fixture design, and D. Marchesan and Dr. S. Voinigescu for useful discussions and distributed components modeling. Special thanks to R. Hadaway for his directions and to IBM Corporation for fabrication.

REFERENCES

[1] B. Beggs, “GaAs HBT 10-Gb/s Product,” in 1999 IEEE MTT-S Int. Microwave Symp. Workshop, Anaheim, CA, June 13–19, 1999.
[2] SONET OC-192, “Transport system generic criteria,” Bellcore, GR-1377-CORE, no. 4, Mar. 1998.
[3] R. C. Walker et al., “A 10-Gb/s Si-bipolar Tx/Rx chipset for computer data transmission,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 302–303.
[4] T. Morikawa et al., “A SiGe single-chip 3.3-V receiver IC for 10-Gb/s optical communication systems,” in ISSCC Dig. Tech. Papers, Feb. 1999, pp. 380–381.
[5] Y. Greshishchev and P. Schvan, “SiGe clock and data recovery IC with linear-type PLL for 10-Gb/s SONET application,” in Proc. 1999 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1999, pp. 169–172.
[6] Y. M. Greshishchev, P. Schvan, J. L. Showell, M.-L. Xu, J. J. Ojha, and J. E. Rogers, “A fully integrated SiGe receiver IC for 10-Gb/s data rate,” in ISSCC Dig. Tech. Papers, Feb. 2000, pp. 52–53.
[7] J. D. H. Alexander, “Clock recovery from random binary signals,” Electron. Lett., vol. 11, pp. 541–542, Oct. 1975.
[8] C. R. Hogge, “A self-correcting clock recovery circuit,” J. Lightwave Technology, vol. 3, pp. 1312–1314, Dec. 1985.
[9] J. Hauenschild et al., “A two-chip receiver for short-haul links up to 3.5-Gb/s with PIN-preamp module and CDR-MUX,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 308–309.
[10] J. Hauenschild et al., “A plastic packaged 10-Gb/s biCMOS clock and data recovering 1 : 4-demultiplexer with external VCO,” IEEE J. Solid-State Circuits, vol. 31, pp. 2056–2059, Dec. 1996.
[11] R. C. Walker et al., “A two-chip 1.5-GBd serial link interface,” IEEE J. Solid-State Circuits, vol. 27, pp. 1805–1811, Dec. 1992.
[12] R. Steele, Delta Modulation Systems. New York/Toronto: Wiley, 1975.
[13] M. Soda, T. Suzaki, and T. Morikawa et al., “A Si bipolar chip set for 10-Gb/s optical receiver,” in ISSCC Dig. Tech. Papers, Feb. 1992, pp. 100–101.
[14] Y. Greshishchev and P. Schvan, “60-dB gain 55-dB dynamic range 10-Gb/s SiGe HBT limiting amplifier,” IEEE J. Solid-State Circuits, vol. 34, pp. 1914–1920, Dec. 1999.
[15] L. I. Anderson et al., “Silicon bipolar chipset for SONET/SDH 10-Gb/s fiber-optic communication links,” IEEE J. Solid-State Circuits, vol. 30, pp. 210–218, Mar. 1995.

Yuriy M. Greshishchev (M’95) received the M.S.E.E. degree from Odessa Electrotechnical Institute of Communications, Odessa, Ukraine, in 1974 and the Ph.D. degree in electrical and computer engineering from V.M. Glushkov Institute of Cybernetics, Microelectronics Division, Kyiv, Ukraine, in 1984.
From 1976 to 1994, he worked with research and development organizations and academia on high-speed silicon bipolar and GaAs MESFET ADC and DAC integrated circuits. His Ph.D research was dedicated to the development of folding-type ADCs embedded into TV systems. In 1993, he was a Visiting Scientist at Micronet, Institution Center of University of Toronto, Ontario, Canada. In 1994, he joined the Department of Electrical and Computer Engineering, University of Toronto, where he conducted research on GaAs MESFET linear transmitter design for digital wireless communication. Since 1996, he has been with Nortel Networks, Ottawa, Ontario, where he is responsible for development of highly integrated circuit solutions in emerging technologies for optical communications. He has coauthored two books and numerous technical papers on the area of high-speed communication circuit design, data converters, and statistical modeling.

Peter Schvan (M’89) was born in Budapest, Hungary, in 1952. He received the M.S. degree in physics from Eötvös Loránd University, Budapest, in 1975 and the Ph.D. degree in electrical engineering from Carleton University, Ottawa, Ontario, Canada, in 1985.
In 1985, he joined Nortel Networks, Ottawa, where he started working in the area of BiCMOS and bipolar technology development, yield prediction, device characterization, and modeling. Recently, his work has been extended to the design of multi-gigabit circuits and systems. He is currently Senior Manager of a group responsible for evaluating various high-performance technologies and demonstrating advanced circuit concepts required for fiber optic communication systems. He has authored and co-authored numerous publications.

Jonathan L. Showell (S’90–M’95) received the B.Eng and M.Eng degrees in engineering physics from McMaster University, Hamilton, ON, Canada, in 1990 and 1994, respectively.
He joined Nortel Networks, Ottawa, Canada, in 1994, working on hot carrier injection reliability of CMOS devices. Later he became a member of the Technology Access and Applications Group where his responsibilities included accurate high-frequency analog (up to 110 GHz) and digital (40 Gb/s) measurements and the design of high-speed 10- to 40-Gb/s multiplexer/demultiplexer circuits in SiGe HBT and InP HBT technologies, respectively. Recently, he joined Quake Technologies, Ottawa, Canada, working on the design of chip sets for high-speed datacom applications. His interests include high-speed technologies, circuit design for high-speed communications, and accurate high-frequency measurements.

Mu-Liang Xu (M’00), biography not available at time of publication.
Jugnu J. Ojha (M’00) received the B.Eng. degree from Dalhousie University and the Technical University of Nova Scotia in 1987. He received the M.Sc. and Ph.D. degrees from McMaster University, Hamilton, Ontario, Canada, in 1990 and 1994, respectively. His graduate work involved research in electronic and optoelectronic devices, as well as optoelectronic properties of semiconductors.
He was with Nortel Networks, Ottawa, Ontario, Canada, from 1994 to 2000, where he worked on a wide range of technologies, including design of circuits for 10 and 40 Gb/s optical transmission systems using SiGe and InP HBTs. He also led a program in MEMS technology, with a focus on optical applications, including optical crossconnects. His other activities included next-generation optical network development, as well as research on optical properties of SiGe materials and devices. He recently joined Caspian Networks in Palo Alto, CA, as Senior Advisor in Optical Networking.

Jonathan E. Rogers (S’00) received the B.A.Sc degree in engineering sciences (electrical option) from the University of Toronto, Ontario, Canada, in 1999. He is currently working toward the M.A.Sc in electronics at the University of Toronto. His area of research is clock and data recovery systems in deep sub-micron CMOS.
In May, 1997, he joined Nortel, Ottawa, Ontario, for a 16-month internship, where he performed clock and data recovery system characterization, VCO design, and high-speed measurements on SiGe MMICs under the guidance of Dr. Y. Greshishchev.
1120 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 9, SEPTEMBER 2002

Clock and Data Recovery IC for 40-Gb/s Fiber-Optic Receiver

George Georgiou, Member, IEEE, Yves Baeyens, Member, IEEE, Young-Kai Chen, Fellow, IEEE, Alan H. Gnauck, Senior Member, IEEE, Carsten Gröpper, Peter Paschke, Rajasekhar Pullela, Mario Reinhold, Claus Dorschky, John-Paul Mattia, Timo Winkler von Mohrenfels, and Christoph Schulien

Abstract—The integrated clock and data recovery (CDR) circuit is a key element for broad-band optical communication systems at 40 Gb/s. We report a 40-Gb/s CDR fabricated in indium–phosphide heterojunction bipolar transistor (InP HBT) technology using a robust architecture of a phase-locked loop (PLL) with a digital early–late phase detector. The faster InP HBT technology allows the digital phase detector to operate at the full data rate of 40 Gb/s. This, in turn, reduces the circuit complexity (transistor count) and the voltage-controlled oscillator (VCO) requirements. The IC includes an on-chip LC VCO, on-chip clock dividers to drive an external demultiplexer, and low-frequency PLL control loop and on-chip limiting amplifier buffers for the data and clock I/O. To our knowledge, this is the first demonstration of a mixed-signal IC operating at the clock rate of 40 GHz. We also describe the chip architecture and measurement results.

Index Terms—Clock and data recovery, CDR, fiber-optic communication receiver, InP HBT, limiting amplifier, phase detector, VCO.

Manuscript received January 30, 2002; revised April 30, 2002.
G. Georgiou, Y. Baeyens, and Y.-K. Chen are with Lucent Technologies, Bell Laboratories, Murray Hill, NJ 07974 USA (e-mail: gundsn@lucent.com).
A. H. Gnauck is with Lucent Technologies, Bell Laboratories, Holmdel, NJ 07733 USA.
C. Gröpper and P. Paschke are with Lucent Technologies, Optical Networking Group, 90411 Nürnberg, Germany.
R. Pullela was with Lucent Technologies, Bell Laboratories, Murray Hill, NJ 07974 USA. He is now with Gtran Inc., Thousand Oaks, CA 91362 USA.
M. Reinhold, C. Dorschky, T. W. von Mohrenfels, and C. Schulien were with Lucent Technologies, Optical Networking Group, 90411 Nürnberg, Germany. They are now with Core Optics, 90411 Nürnberg, Germany.
J. P. Mattia was with Lucent Technologies, Bell Laboratories, Murray Hill, NJ 07974 USA. He is now with BigBear Networks, Milpitas, CA 95035 USA.
Publisher Item Identifier 10.1109/JSSC.2002.801186.
0018-9200/02$17.00 © 2002 IEEE

I. INTRODUCTION

CLOCK and data recovery (CDR) is an important function of the transceiver of a high bit-rate lightwave communication system. Since 40-Gb/s systems are nearing commercial deployment, the chosen CDR architecture must have few external components and be insensitive to temperature and component variations. CDRs reported in the literature use either a high quality-factor (Q) external filter-based architecture or a phase-locked loop (PLL) architecture.

While the high-Q filter architecture is easier to implement, it is susceptible to temperature and group delay variations in the filter [1]. Specifically, the temperature drift of the filter bandpass and the temperature drift of the timing within the IC are not correlated. Also, once the clock signal is recovered, additional precise clock phase adjustment is needed to set the decision sampling time to obtain proper phase margin. This phase adjustment is further complicated by packaging issues, specifically that of aligning the recovered clock after the off-chip filter and the decision IC.

In PLL-based CDRs, the clock phase in the decision circuit is automatically synchronized to sample the center of the time slot of each bit. Also, the PLL can be integrated onto a single IC, greatly reducing temperature drift and phase relationship problems.

Previous implementations of digital PLL-based CDRs at 40 Gb/s have employed half bit-rate clocking of CDR demultiplexer (DEMUX) combinations, to reduce the bandwidth required in the buffers and digital gates [2]–[4]. By clocking at 20 GHz to tolerate lower transistor bandwidth, the 2 parallel phase detector requires higher circuit complexity with about twice the transistor count and a precise four-phase voltage-controlled oscillator (VCO).

In this paper, we leverage the InP heterojunction bipolar transistor (HBT) technology operating at the full 40-Gb/s data rate to simplify the CDR architecture.

Fig. 1. Schematic diagram of a lightwave transceiver.

II. CDR ARCHITECTURE

The typical transceiver architecture for a 40-Gb/s lightwave system is shown in Fig. 1. The CDR IC of this work is highlighted in the receiver path.

The core of the PLL-based CDR is the phase detector. The phase detector used here simultaneously recovers both clock and data. The digital early–late phase detector [5], consisting of data flip-flops and combinatory logic gates, is shown in Fig. 2. In the locked state, the – chain samples the center of the incoming
GEORGIOU et al.: CDR IC FOR 40-Gb/s FIBER-OPTIC RECEIVER 1121

Fig. 2. Digital phase detector architecture.

Fig. 3. CDR IC block diagram.

data time slot, while the chain samples the data zero cross-
ings. The combinatory logic block, with inputs from the , ,
and latch chains, determines if the clock is early or late with
respect to the incoming data transition. This logic generates the
UP–DOWN control signal for the VCO.
and are generated by decision circuits (respectively, after
two and four latches). The phase difference between and is
180 (one bit). is generated after three latches with an inverted
clock. If the clock phase is correct, is always in the middle
of and . EXOR ( ) and NAND ( ) gate combinatory logic
converts , , and to the UP–DOWN pulses for controlling
the VCO. The logic equations are as follows.

UP
DOWN

The clock is early (slows down the VCO) if and , UP DOWN . The clock is late (speeds up the VCO) if and , UP DOWN . The clock is correct if . Obviously, data transitions are required for clock recovery.

Fig. 4. CDR IC photograph.
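The early–late decision described above can be illustrated with a small behavioral sketch. It is not the EXOR/NAND gate network of this design, whose equations are not legible in this copy. It is a generic Alexander-style rule [5], and the sample names `a`, `t`, and `b` (previous bit center, data edge, current bit center), the phase-error convention, and the step size are all assumptions made for illustration.

```python
import random

def early_late(a: int, t: int, b: int) -> str:
    """Classify one (previous-center, edge, current-center) sample triple."""
    if a == b:
        return "hold"                      # no data transition: no timing information
    # Edge sample still equals the older bit: the clock sampled before the edge.
    return "early" if t == a else "late"

def sample(bits, time_ui: float) -> int:
    """Ideal sampler: bit k occupies the interval [k, k + 1) in unit intervals."""
    return bits[int(time_ui)]

def track_phase(bits, phase_error_ui: float, step_ui: float) -> float:
    """First-order bang-bang loop. phase_error_ui > 0 means the clock is late.
    The sketch assumes the error magnitude stays below 0.5 UI."""
    for n in range(1, len(bits)):
        a = sample(bits, n - 0.5 + phase_error_ui)   # center of previous bit
        t = sample(bits, n + phase_error_ui)         # nominal data edge
        b = sample(bits, n + 0.5 + phase_error_ui)   # center of current bit
        decision = early_late(a, t, b)
        if decision == "early":
            phase_error_ui += step_ui    # slow the VCO: the clock moves later
        elif decision == "late":
            phase_error_ui -= step_ui    # speed the VCO up: the clock moves earlier
    return phase_error_ui

random.seed(0)
bits = [random.randint(0, 1) for _ in range(2000)]
print(track_phase(bits, phase_error_ui=0.3, step_ui=0.01))
```

In this idealized model the loop walks the phase error toward zero one step per data transition and then dithers within about one step of the edge, which is the characteristic steady state of a bang-bang loop. Runs of identical bits produce only "hold" decisions, which is why data transitions are needed for clock recovery.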
The chip also incorporates 15-dB gain-limiting amplifier buffers at the data I/O, at the divide-by-2 clock output (for the external DEMUX) and at the divide-by-32 clock output (for the external coarse adjustment low-frequency lock loop). The static frequency dividers use similar latches as those in the digital phase detector. Figs. 3 and 4, respectively, show the final chip block diagram and photograph.

The block diagram of Fig. 3 is laid over Fig. 4. To maintain symmetry in the 40-Gb/s data and 40-GHz clock signals, the phase detector layout contains four symmetric rows (as in the block diagram of Fig. 2).

III. CDR DESIGN AND FABRICATION

Differential emitter-coupled logic (ECL) with 400-mV differential voltage swings is used. To simplify the layout process at a high bit rate, standardized digital and analog blocks are designed and used.

All high-frequency inputs are terminated with on-chip 50-Ω resistors. Propagation delay of the 40-GHz clock to the , , and latches and to the divider chain is a critical issue. Matching propagation delay is achieved by symmetric layout and symmetric loading (for example, the dummy latch in the chain of Fig. 2).

Coplanar transmission lines with controlled impedance are used for longer lines ( ) to reduce reflections and to improve timing accuracy. Lines driven by emitter followers are kept short to avoid reflections due to impedance mismatch.

A series gate approach is used for the clock and data signals. To improve high-frequency performance, a transadmittance (TAS) and transimpedance (TIS) combination [6], connected by coupled coplanar transmission lines, is used for clock and data amplification and distribution. TAS–TIS buffer amplifiers have a higher gain-bandwidth product (because of the active TIS load) than does a simple ECL buffer. The buffer amplifier signal splitting and transistor level design are shown in Fig. 5.

The VCO transistor level schematic is shown in Fig. 6. The VCO is based on a differential Colpitts topology using coplanar transmission lines, which can be modeled very accurately at 40 GHz, instead of inductors. The tuning input feeds two reverse-biased varactors realized from the base–collector junctions of open-emitter HBTs. Note that minimum phase noise is not a design parameter since the early–late (or bang–bang) PLL architecture has very high bandwidth with respect to loop gain.
Thus, the VCO phase noise is reduced by the open-loop gain of the locked PLL.

Fig. 5. Buffer amplifier cascaded TAS–TIS architecture and transistor level schematic.

Fig. 6. VCO transistor level schematic.

Fig. 7. Retimed 40-Gb/s data (40-GHz clock, 30 mV/div, 10 ps/div) and phase-detector transfer functions (UP and DOWN, with 40-GHz clock offset + 10 MHz).

The chip is fabricated in an InP HBT technology [7]. Peak transistor fT and fmax are 160 and 135 GHz, respectively. The transistors are conservatively biased at half the collector current for peak fT. The nominal transistor has a 1 µm × 3 µm emitter biased with a collector current of 4 mA. The interconnects use one metal layer for longer wires and one local metal layer for shorter wires. Passive elements are fabricated using tantalum–nitride thin-film resistors and silicon–nitride dielectric capacitors.

A power-supply series decoupling resistor ( ) is used to prevent any spurious low-frequency oscillations that could arise in the packaged IC. The internal circuit is designed to operate between 4 and 5 V. The circuit draws nominally 1 A. The nominal total power dissipation is 5.6 W, 1.6 W across the decoupling series resistor and 4 W internally. The chip area is 1.75 mm × 1.75 mm.

IV. MEASUREMENT RESULTS

Two versions of the CDR chip (with and without VCO) were fabricated. The results of on-wafer measurements on the CDR with external VCO are discussed here. The input is a single-ended 0.5 signal generated by a commercial 40-Gb/s 4 : 1 multiplexer, driven with four independent 10-Gb/s 2 pseudorandom bit streams (PRBS) of a pattern generator. A synthesizer generates the 40-GHz clock signal for this measurement. Fig. 7 shows the retimed 40-Gb/s eye from the CDR IC. (It should be noted that the on-wafer
Fig. 8. Divide-by-2 frequency spectrum of on-chip VCO. VCO designed for 40 GHz but actual at 42 GHz divided to 21 GHz with an output power of −5 dBm.

Fig. 9. Experimental optical link to measure the bit-error rate of 40-Gb/s RZ transmission.

measurement is limited by the equipment. The retimed data


eye is sharper than that of the commercial multiplexer. Also,
the jitter is characteristic of the digital oscilloscope used for
the measurement.) Decision circuit phase margin greater than
180° is measured by changing the clock phase with respect to
incoming data and observing the output data eye. The phase
transfer function (UP and DOWN) at the bottom of Fig. 7 is
measured by introducing a 10-MHz offset between the clock
frequency and data bit rate.
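A short worked calculation shows why a small frequency offset exposes the whole phase-detector transfer function: the clock phase slides slowly through every position of the data eye. The sketch below uses the 40-Gb/s bit rate and 10-MHz offset quoted above; the function name and the returned quantities are illustrative choices, not taken from the measurement setup.

```python
def offset_phase_sweep(bit_rate_hz: float, clock_offset_hz: float):
    """Relative clock/data phase motion caused by a fixed frequency offset.

    Returns (sweep_period_s, slip_ui_per_bit, bits_per_sweep):
      sweep_period_s  - time for the clock to slip one full unit interval (UI)
      slip_ui_per_bit - phase slip accumulated during one bit period, in UI
      bits_per_sweep  - number of data bits transmitted during one full sweep
    """
    sweep_period_s = 1.0 / clock_offset_hz
    slip_ui_per_bit = clock_offset_hz / bit_rate_hz
    bits_per_sweep = bit_rate_hz / clock_offset_hz
    return sweep_period_s, slip_ui_per_bit, bits_per_sweep

period, slip, bits = offset_phase_sweep(bit_rate_hz=40e9, clock_offset_hz=10e6)
print(f"one full sweep every {period * 1e9:.0f} ns")
print(f"phase slip per bit: {slip:.2e} UI")
print(f"bits per sweep: {bits:.0f}")
```

Under these numbers the clock slips one unit interval every 100 ns, so the averaged UP and DOWN outputs trace the detector characteristic at the 10-MHz beat rate, which is far easier to observe than the 40-GHz signals themselves.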
Measurements of the CDR with on-chip LC VCO indicate
that the VCO is capable of driving 40-Gb/s digital gates. How-
ever, the VCO center frequency is higher than simulated, prob-
ably because of capacitance or inductance (transmission-line)
drift during this process run. The VCO operates in the band be-
tween 40.5 and 42.5 GHz. The divider chains operate up to a
frequency of 44 GHz. The output of the divide-by-2 is shown in
Fig. 8 for the VCO tuned to 42 GHz.
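The frequencies quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text (a 40.5 to 42.5 GHz tuning band, a 40-GHz design target, and the divide-by-2 and divide-by-32 outputs); the helper names are hypothetical.

```python
def tuning_summary(f_low_ghz: float, f_high_ghz: float, f_target_ghz: float):
    """Return (center_ghz, fractional_range, shortfall_ghz) for a VCO band."""
    center_ghz = (f_low_ghz + f_high_ghz) / 2.0
    fractional_range = (f_high_ghz - f_low_ghz) / center_ghz
    # How far the target lies below the band; zero if the band reaches it.
    shortfall_ghz = max(0.0, f_low_ghz - f_target_ghz)
    return center_ghz, fractional_range, shortfall_ghz

def divided_output_ghz(f_vco_ghz: float, ratio: int) -> float:
    """Output frequency of an ideal static divider."""
    return f_vco_ghz / ratio

center, frac, shortfall = tuning_summary(40.5, 42.5, 40.0)
print(f"band center {center} GHz, tuning range {frac * 100:.1f} %")
print(f"band starts {shortfall} GHz above the 40-GHz target")
print(f"divide-by-2 at 42 GHz: {divided_output_ghz(42.0, 2)} GHz")
print(f"divide-by-32 at 42 GHz: {divided_output_ghz(42.0, 32)} GHz")
```

The divide-by-2 result agrees with the 21-GHz spectrum of Fig. 8, and the shortfall shows that the measured band does not reach the 40-GHz target, consistent with the center frequency being higher than simulated.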
To evaluate the packaged performance of the CDR chip in an optical transmission system, we used the CDR chip as a single channel 1 : 4 DEMUX by applying 40-Gb/s data and 10-GHz clock. (The packaging used here for 40 Gb/s is relatively simple. The chip is mounted into a cutout in a composite Rogers 4003 on FR-4 board. This chip recess, corresponding approximately to the chip thickness of 8 mil, reduces the length of the bond wires to the 50-Ω coplanar GSSG transmission lines designed on the R4003 substrate. See [8, Fig. 6].)

The experimental optical time-division multiplexing (OTDM) link is shown in Fig. 9. As before, a 4 : 1 MUX is used to multiplex independent 10-Gb/s nonreturn-to-zero (NRZ) 2 PRBS data streams, from a commercial pulse pattern generator. The resulting electrical 40-Gb/s 2 PRBS NRZ signal is converted to an optical 40-Gb/s 2 PRBS RZ signal by a pulse-carving technique with cascaded modulators. Optical power is converted back to the electrical signal with a p-i-n photodetector. The 40-Gb/s RZ eye after the p-i-n is demultiplexed to 10 Gb/s using the CDR chip as a single channel 1 : 4 DEMUX.

Fig. 10 shows the CDR IC performance as a one-channel DEMUX. The demultiplexed 10-Gb/s eye is very open and has very low jitter ( 4 ps limited by the oscilloscope bandwidth). The error probability is also measured as a function of received optical power. The received optical power is −29.5 dBm for the typical system required bit-error rate (BER) of 10 .

Fig. 10. CDR IC used as single channel 1 : 4 DEMUX (40-Gb/s data, 10-GHz clock). Electrical output at 10 Gb/s and BER versus optical power measurement of the optical link of Fig. 9.

V. CONCLUSION

A complex ( 1350 transistors and 1.75-mm square) mixed-signal CDR chip with on-chip VCO, amplifiers, decision circuit, and clock dividers was successfully fabricated with a state-of-the-art InP HBT technology. Fully functional chips at speed were obtained from the first iteration. Data is retimed at 40 Gb/s and a good control signal is made available to the on-chip VCO. The CDR IC is used as a DEMUX to convert an optical 40-Gb/s 2 PRBS RZ signal to an
electrical 10-Gb/s 2 PRBS NRZ signal in an optical link experiment. An optical sensitivity of −29.5 dBm is measured at 10 BER.

REFERENCES

[1] R. Yu, R. Pierson, P. Zampardi, K. Runga, A. Campana, D. Meeker, K. C. Wang, A. Peterson, and J. Bowers, “Packaged clock recovery integrated circuits for 40-Gb/s optical communication links,” in GaAs IC Symp. Tech. Dig., 1996, pp. 129–132.
[2] M. Wurzer, J. Bock, H. Knapp, W. Zirwas, F. Schumann, and A. Felder, “A 40-Gb/s integrated clock and data recovery circuit in a 50-GHz fT silicon bipolar technology,” IEEE J. Solid-State Circuits, vol. 34, pp. 1320–1324, Sept. 1999.
[3] J. Hauenschild, C. Dorschky, T. W. von Mohrenfels, and R. Seitz, “A plastic packaged 10-Gb/s BiCMOS clock and data recovering 1 : 4 demultiplexer with external VCO,” IEEE J. Solid-State Circuits, vol. 31, pp. 2056–2059, Dec. 1996.
[4] M. Reinhold, C. Dorschky, R. Pullela, E. Rose, P. Mayer, P. Paschke, Y. Baeyens, J. P. Mattia, and F. Kunz, “A fully integrated 40-Gb/s clock and data recovery/1 : 4 DEMUX IC in SiGe technology,” IEEE J. Solid-State Circuits, vol. 36, pp. 1937–1945, Dec. 2001.
[5] J. D. H. Alexander, “Clock recovery from random binary signals,” Electron. Lett., vol. 11, pp. 541–542, 1975.
[6] H.-M. Rein, “Design considerations for very high speed Si-bipolar ICs operating up to 50 Gb/s,” IEEE J. Solid-State Circuits, vol. 31, pp. 1076–1090, Aug. 1996.
[7] M. Sokolich, D. Doctor, Y. Brown, A. Kramer, J. Jensen, W. Stanchina, S. Thomas, C. Fields, D. Ahmari, M. Liu, R. Martinez, and J. Duvall, “A low power 52.9-GHz static divider implemented in a manufacturable 180-GHz InAlAs/InGaAs HBT IC technology,” in GaAs IC Symp. Tech. Dig., 1998, pp. 117–120.
[8] G. Georgiou, P. Paschke, R. Kopf, R. Hamm, R. Ryan, A. Tate, J. Burm, C. Schulien, and Y.-K. Chen, “High gain limiting amplifier for 10-Gb/s lightwave receivers,” in Proc. 11th Int. Conf. InP and Related Materials, 1999, pp. 71–74.

George Georgiou (M’92) was born in Greece in 1954. He received the Ph.D. degree in applied physics from Columbia University, New York, NY, in 1980.
He joined AT&T (now Lucent Technologies) Bell Laboratories in 1980 to develop sub-micron X-ray lithography systems. He proceeded into process integration of novel gate and metal structures for sub-micron silicon CMOS. He is currently a Member of Technical Staff with the High-Speed Electronics Research Department of Lucent Technologies, Bell Laboratories, Murray Hill, NJ. His current interest is mixed-signal IC design for high-speed lightwave communications systems using InP and SiGe HBT technologies.

Yves Baeyens (S’87–M’96) received the M.S. and Ph.D. degrees in electrical engineering from the Catholic University, Leuven, Belgium, in 1991 and 1997, respectively. His Ph.D. research was performed in cooperation with IMEC, Leuven, and treated the design and optimization of coplanar InP-based dual-gate HEMT amplifiers operating up to the W-band.
He was a Visiting Scientist with the Fraunhofer Institute for Applied Physics, Freiburg, Germany, for a year and a half, and is currently a Technical Manager in the High-Speed Electronics Research Department of Lucent Technologies, Bell Laboratories, Murray Hill, NJ. His research interests include the design of mixed analog-digital circuits for ultrahigh-speed lightwave and millimeter-wave applications.

Young-Kai Chen (S’78–M’86–SM’94–F’98) received the B.S.E.E. degree from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., the M.S.E.E. degree from Syracuse University, Syracuse, NY, and the Ph.D. degree from Cornell University, Ithaca, NY, in 1988.
From 1980 to 1985, he was a Member of Technical Staff in the Electronics Laboratory of the General Electric Company, Syracuse, responsible for the design of silicon and GaAs MMICs for phase array applications. Since 1988, he has been with Lucent Technologies, Bell Laboratories, Murray Hill, NJ, as a Member of Technical Staff. Since 1994, he has been the Director of the High Speed Electronics Research Department. He is also an Adjunct Associate Professor at Columbia University, New York, NY. His research interest is in high-speed semiconductor devices and circuits for wireless and fiber-optic communications. He has authored more than 90 technical papers and holds nine patents in the field of high-frequency electronic and semiconductor lasers.
Dr. Chen is a member of the American Physical Society and the Optical Society of America.

Alan H. Gnauck (M’98–SM’00) received the B.S. degree in physics and the M.S. degree in electrical engineering from Rutgers University, New Brunswick, NJ, in 1975 and 1986, respectively.
In 1982, he joined AT&T (now Lucent Technologies) Bell Laboratories. He has designed and built multigigabit amplifiers, multiplexers, demultiplexers, and optical receivers, and performed record-breaking optical transmission experiments at single-channel rates from 2 to 40 Gb/s. He has investigated coherent detection, chromatic-dispersion compensation techniques, CATV hybrid fiber-coax architectures, wavelength-division-multiplexed (WDM) systems, and system impacts of fiber nonlinearities. His WDM transmission experiments include the first demonstration of terabit transmission. He is a Technical Committee Member of the Optical Fiber Communications Conference (OFC) 2003. He holds twelve patents in optical fiber communications. His current research interests include the study of WDM systems with single-channel rates of 40 Gb/s.
Dr. Gnauck is an Associate Editor for IEEE PHOTONICS TECHNOLOGY LETTERS.

Carsten Gröpper was born in Münster, Germany, in 1971. He received the Dipl.-Ing. degree in electrical engineering from the Ruhr University, Bochum, Germany, in 1998.
He joined Lucent Technologies, Nürnberg, Germany, in 1998. He is currently with the Optical Networking Group, Lucent, developing high-speed bipolar ICs for 10- and 40-Gb/s optical communication systems.

Peter Paschke was born in Dusseldorf, Germany, on May 21, 1959. He received the M.S. degree in electrical engineering from the Ruhr University, Bochum, Germany, in 1988.
In 1988, he joined Philips Kommunikation Industries, Nürnberg, Germany, as a Full Custom ASIC Designer. He is currently a Technical Manager with Lucent Technologies GmbH, Nürnberg, where he is responsible for the high-speed ASICs. His main focus is analog circuits such as laser drivers and limiting amplifiers for high bit rates up to 40 Gb/s. In the lightwave system area, he has been involved in 2.5-Gb/s receiver design and several research projects for 40 Gb/s.

Rajasekhar Pullela received the B.Tech. degree in electrical and communications engineering from the Indian Institute of Technology, Madras, India, in 1993. From 1993 to 1998, he was a graduate student researcher at the University of California, Santa Barbara. During this period, he received M.S. and Ph.D degrees in electrical engineering, studying device physics and high-speed circuit design.
During 1998–2000, he was a Member of Technical Staff with Lucent Technologies, Bell Laboratories, Murray Hill, NJ, designing high-speed ICs for fiber-optic communication systems. Since 2000, he has been with Gtran, Inc., Newbury Park, CA.
Mario Reinhold was born in Mülheim/Ruhr, Germany, in 1972. He received the Dipl.-Ing. degree in electrical engineering from the Ruhr University, Bochum, Germany, in 1998.
He joined the Optical Networking Group, Lucent Technologies, Nürnberg, Germany, in 1998, where his activities focused on the development of various analog and digital high-speed bipolar ICs for 40-Gb/s and advanced 10-Gb/s fiber-optic communication systems. Since 2001, he has been with CoreOptics Inc., Nürnberg, working on a next-generation 40-Gb/s chipset.

John-Paul Mattia received the B.S., M.S.E.E., and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge.
He began working in high-speed electronics at MIT Lincoln Laboratory in 1989. In 1996, he joined Texas Instruments Incorporated in the DSP R&D organization. From 1997 to 2000, he worked in the High-Speed Electronics Group, Lucent Technologies, Bell Labs, designing and testing circuits for lightwave communication systems. Since July 2000, he has been with Big Bear Networks, Sunnyvale, CA, where he is Chief Technical Officer of Electronics.

Claus Dorschky received the Dipl.-Ing degree in electrical engineering from Friedrich Alexander University, Erlangen, Germany, in 1986.
He was with Phillips Kommunikation Industries (later Lucent Technologies), Nürnberg, Germany, for 14 years, working in the development department for high-speed optical transmission systems. His research interests include design and integration of analog and mixed-signal full custom ICs for 10- and 40-Gb/s as well as integration of optical receivers and transmitters into single-wavelength and DWDM transmission systems at those bit rates. In early 2001, he cofounded CoreOptics Inc., Nürnberg.

Timo Winkler von Mohrenfels, photograph and biography not available at time of publication.

Christoph Schulien, photograph and biography not available at time of publication.
1156 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 7, JULY 1997

Clock/Data Recovery PLL Using Half-Frequency Clock


M. Rau, T. Oberst, R. Lares, A. Rothermel, R. Schweer, and N. Menoux

Manuscript received December 15, 1996; revised February 6, 1997. This work was supported in part by the German Ministry for Education and Research under Contract 01M2880A.
M. Rau was with the University of Ulm, Germany. He is now with Siemens AG, 81359 Munich, Germany.
T. Oberst was with the University of Ulm, Germany. He is now with DASA, D-89077 Ulm, Germany.
R. Lares and A. Rothermel are with the Microelectronics Department, University of Ulm, D-89081 Ulm, Germany.
R. Schweer is with Thomson Multimedia, D-78048 Villingen-Schwenningen, Germany.
N. Menoux is with Thomson, 38240 Meylan, France.
Publisher Item Identifier S 0018-9200(97)04386-2.

Abstract— A clock and data recovery PLL is described for serial nonreturn-to-zero (NRZ) data transmission. The voltage controlled oscillator (VCO) works at half the data rate, which means for a 1-Gb/s data rate, the VCO runs at 500 MHz. A specially designed phase comparator uses a delay-locked loop (DLL) to generate the required sampling clocks to compare clock and data. The VCO can typically be tuned from 350 MHz to 890 MHz and the phase-locked loop (PLL) locks between 720 Mb/s and 1.3 Gb/s. Data recovery is error free up to 1.2 Gb/s with a 9-b pseudorandom data sequence. The core consumes 85 mW (3.3 V) at 1 Gb/s.

Index Terms—Bang-bang control, CMOS digital integrated circuits, data communication, high-speed integrated circuits, phase locked loops, synchronization.

I. INTRODUCTION

Digital signal processing becomes economical in consumer applications. The main requirement there is low cost in mass production. Digital processing and transmission has to be carried out with low power and in cheap IC packages.
Data transmission between different digital signal processing IC's influences significantly the power consumption and the system cost. For video signal transmission in 100 Hz TV sets, typically 16 data lines in parallel are driven with 27 MHz rail-to-rail nonreturn-to-zero (NRZ) data signals. Sharp data transitions are in use to ensure reliable synchronous operation. A power saving alternative could be found in low-swing high-speed serial data transmission in the range of 500 Mb/s or more. However, this kind of high-speed data transmission has to be asynchronous. The most economic solution avoids separate transmission of the clock. In that case, clock recovery from the NRZ data stream is required. In this paper we describe a phase-locked loop (PLL) which is designed to process more than 1 Gb/s data in a 0.5-µm CMOS technology.

II. ARCHITECTURE

The PLL generally consists of three building blocks (Fig. 1):
1) phase comparator, detecting the phase difference between the data and the recovered clock;
2) loop filter, filtering the phase detector output and forming the control signal for the oscillator;
3) voltage controlled oscillator (VCO).

Fig. 1. Classic PLL.

The unusual feature in our design is the phase detector, which uses a delay-locked loop (DLL) to generate multiple sampling clocks. Thus, the VCO can run at only half the data rate, which means that we can detect a 1-Gb/s serial data stream with a 500-MHz VCO. This relieves the timing constraints in the phase detector logic and results in well correlated and data independent control signals. Also, at the lower frequency the VCO tuning range is large enough to compensate all technology parameter variations. With this architecture we could achieve higher data rates.
The block diagram of the circuit is shown in Fig. 2. No external components are required for the PLL. The loop filter capacitor is integrated on chip together with the VCO, the phase comparator, and a charge pump. The data stream is retimed in two flip-flops with the inverted and noninverted clock. Two flip-flops are required because the clock has only half the data rate. These two half-speed data streams are combined in a multiplexer, forming an output stream at the original data rate. A lock-in circuit is realized on chip, because the phase comparator is not frequency sensitive.

III. PHASE COMPARATOR

The PLL adjusts the clock to an incoming data stream. Because of the random nature of data there is not necessarily a data transition at every clock cycle. The loop has to handle sequences of consecutive zeroes or ones in the data stream. The following phase comparator output signal properties are essential.
First, the phase comparator must not give any output signal if there is no data edge. Second, the duration of the control signal pulses at the data transitions is important, especially if there are few of them. In general, for a good loop performance the control signal should be proportional to the phase error. However, for very high operating frequencies, analog signals depend on the data pattern and become highly nonlinear, because they do not settle during the bit duration. It was found by simulation that different phase detectors with analog outputs [1], [2] limit the PLL operating frequency. On the other hand, clock recovery schemes based on sampling techniques [3], [4] result in uniform digital control pulses. They are
0018–9200/97$10.00  1997 IEEE

best suited to support highest possible data rates at a given technology.
The phase comparator used here is an extension of the circuit from [3], modified to work with half the "normal" clock frequency (Fig. 3). The data stream is sampled at four equally spaced timepoints. The logic circuitry driven by the flip-flops generates the up and down control pulses for the VCO according to Fig. 4. Because these control pulses are generated by clocked flip-flops, they are of well defined width. The advantage is that they do not depend on the data pattern. On the other hand, they do not reflect the amount of the phase error, either. The pulse width is constant, even for very small phase errors. This so-called bang-bang operation generates an increased jitter in the locked state. However, the magnitude is much smaller compared to the one introduced by data-dependent and nonlinear analog pulses at high frequencies.
The phase logic evaluates only rising signal edges, in order not to depend on duty cycle variations of the input signal.
There is an issue to be taken care of when dimensioning the flip-flops and the phase logic. The stable operating point of the loop is reached when the signal is sampled exactly at its transition (see Fig. 4). Thus the loop forces the flip-flop to sample the metastable state, which is not allowed in normal flip-flop operation. In this application, however, it is not critical for the operation. If the metastable state is sampled, it does not matter whether it will be interpreted as up or down, because any decision is equally wrong, as we are at the stable operating point, i.e., zero phase error. Only the jitter of the bang-bang operation results. Also, there is an increased short current inside the flip-flops that has to be limited.
For uniform pulses and small jitter, absolutely identical sampling intervals are required. Therefore, a DLL has been implemented to generate four 90° shifted clock phases clk1 to clk4 from the VCO output signal (Fig. 5). The loop compares the phase of the original clock to a clock fed through four adjustable delay elements. The clock signal repeats with a period . A delay element in Fig. 5 can therefore delay by , or as well by . By rearranging the output signals, delay times of are also possible. With a delay element for , it is not possible to compensate for all technology and environment variations. Therefore, it is necessary to select a larger value for the delay, to just be able to deal with all technology parameter variations.

Fig. 2. Clock recovery block diagram.

Fig. 3. Phase comparator.

Fig. 4. Operation of the phase detector: (a) data at sampling time B equals the data at the preceding sampling time A ⇒ data transition is late ⇒ frequency up, (b) data at sampling time B equals the data at the following sampling time A ⇒ data transition is early ⇒ frequency down, (c) data at sampling time A equals the data at the preceding sampling time A ⇒ no data edge, no control signal output.
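The sampling scheme and the decision rule of Fig. 4 can be sketched in a few lines of Python. This is a minimal behavioral sketch, not the authors' circuit: the function names `sampling_instants` and `bang_bang_decision` are hypothetical, the DLL phases are assumed to be ideally spaced by a quarter of the VCO period, and the rising-edge-only evaluation mentioned in the text is left out for brevity.

```python
from typing import List, Optional


def sampling_instants(f_vco_hz: float) -> List[float]:
    """Four sampling instants (in seconds) inside one VCO period.

    Assumes the DLL delivers ideal 90-degree spacing for clk1..clk4.
    """
    period = 1.0 / f_vco_hz
    return [k * period / 4.0 for k in range(4)]


def bang_bang_decision(a_prev: int, b: int, a_next: int) -> Optional[str]:
    """Decision rule of Fig. 4 for two consecutive A samples and the
    B sample taken between them. Returns "up", "down", or None."""
    if a_prev == a_next:
        return None    # (c) no data edge: no control pulse
    if b == a_prev:
        return "up"    # (a) B equals preceding A: transition late, frequency up
    return "down"      # (b) B equals following A: transition early, frequency down


if __name__ == "__main__":
    # 500-MHz VCO: samples every 0.5 ns, i.e. two samples per 1-ns bit at 1 Gb/s.
    print(sampling_instants(500e6))
    print(bang_bang_decision(0, 0, 1), bang_bang_decision(0, 1, 1))
```

Because the output is only one of three states and never a magnitude, the pulse carries no information about the size of the phase error, which is the bang-bang behavior described above.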

Fig. 5. DLL to generate all 90° phase shifted sampling clocks with high accuracy.

Fig. 6. Current mirror charge pump.

Fig. 7. VCO schematic.

Fig. 8. VCO frequency versus control voltage.

The stability of the system containing two coupled loops can be guaranteed for two reasons. First, the DLL is a first-order loop and inherently stable. Second, the time constants of the two loops are two orders of magnitude different.

IV. CHARGE PUMP AND LOOP FILTER

The control pulses drive a current mirror charge pump [6] (Fig. 6) which assures that the charge delivered to the loop filter does not vary with the VCO control voltage. The charge pump allows the realization of an ideal integrator transfer function (pole at $s = 0$) with no additional active amplifier, resulting in a zero-phase error in steady state. A simple RC network shown in Figs. 2 and 6 is used for the low-pass loop filter. The current level of the charge pump and hence the charge delivered at every rising data transition can be set to a small value. This allows the implementation of the loop capacitor on chip.

V. VCO

Both high oscillation frequency and a wide tuning range are required. We choose a ring oscillator design with variable load capacitors (Fig. 7) based on [5]. Duty cycle is not an issue here, because the flip-flops all are triggered with the same edge; the DLL generates the required phase shifts. This circuit can safely cope with all parameter variations. Fig. 8 shows the VCO tuning characteristic.

VI. LOCK-IN CIRCUIT

The bang-bang operation and the data dependent phase detector output signal require a narrow loop bandwidth for a low jitter. This results in a reduced pull-in range of the PLL. Instead of adapting the loop bandwidth during operation we created a lock-in circuit which is active only after power up. For lock-in, a 1010-sequence has to be fed to the circuit. The VCO is swept, starting with the highest frequency. When clock and input frequencies are the same, the sampled data (before the Mux) do not change. An edge-triggered monoflop then stops the frequency sweep and closes the PLL.

VII. LAYOUT

Fig. 9 shows the test chip. A large area is used for the on-chip loop filter capacitor (upper left). A comparable area is required for the ring oscillator, including its load capacitors (lower left). Because the series resistance of those load capacitors is more critical compared to the one in the loop filter, a finer finger structure was chosen. All capacitors have been realized as MOS-transistor gates. No special mask is required. In the top right area are located the lock-in circuit and the DLL with its loop filter, whereas in the lower middle and to the right, buffers and control logic can be seen.

VIII. MEASUREMENT RESULTS

We verified locking of the PLL at data rates from 720 to 1300 Mb/s with pseudorandom sequences up to bit at the data input. However, data recovery is not guaranteed under these conditions because of the clock jitter. Fig. 10 shows the maximum available data rates for different lengths

of the pseudo random sequences for correct data recovery. Measurement period was $10^{11}$ clocks (corresponding to a bit error rate of $10^{-11}$).
At very high data rates, clock and data phase precision has to be better at the input of the retiming flip-flops, because the "eyes" become smaller. The lower required phase jitter corresponds to shorter pseudorandom sequences.
Fig. 11 shows the locked PLL at 1 Gb/s with a $(2^{15} - 1)$-bit length pseudorandom sequence. The clock jitter is about 350 ps, which is caused mainly by the bang-bang operation of the phase comparator. We believe that this behavior can be improved by reducing the uncertain time interval of the sampling flip-flop, i.e., reducing their setup-and-hold times and increasing the clock slope.
All measurements have been done with the IC housed in a standard 16-pin dual in-line ceramic package which shows rather poor high-frequency performance. It was our goal to demonstrate the circuit in a critical environment. Better results could be expected when using packages with shorter leads.
Chip core area is 0.38 mm², power consumption without pad drivers is 85 mW at 1 Gb/s, 0.5-µm CMOS, 3.3 V supply. Only 1/4 of the power consumption is proportional to the clock frequency, 3/4 are constant. The circuit consumes 91 mW at 1.3 Gb/s. The power saved by using only half the conventional clock frequency is partly used to supply the DLL, which needs 21 mW (about 1/4 of total power at 1 Gb/s).
No external components are required, except one reference current, which is not very critical (a 20% variation is allowed).

Fig. 9. Chip micrograph.

Fig. 10. Maximum data rate versus pseudorandom sequence length for error-free receiving during time of measurement (complies with error rate smaller than $1 \times 10^{-11}$).

Fig. 11. VCO clock output and data output eye pattern at 1 Gb/s with a $(2^{15} - 1)$-bit length pseudorandom input sequence.

IX. CONCLUSION

Complete on-chip clock and data recovery at 1 Gb/s is feasible with a standard 0.5-µm CMOS technology. On-chip clock is only 500 MHz in this case. Data are directly demultiplexed one to two in the retiming flip-flops. A multiplexer to regenerate the original data stream was included for measurement purposes only. In applications, serial-to-parallel conversion will normally follow the PLL. In that case, the halved clock frequency is an advantage, because the following blocks can be designed more easily.

ACKNOWLEDGMENT

The authors gratefully acknowledge perfect layout support by Y. A. Savalle and G. Kimmich from TCEC. They thank J. Borel from SGS-Thomson for providing the design kit and acknowledge the fast sample production in the factory.

REFERENCES

[1] T. H. Lee, "A 155-MHz clock recovery delay- and phase-locked loop," IEEE J. Solid-State Circuits, vol. 27, pp. 1736–1746, Dec. 1992.
[2] B. Thompson, "A 300-MHz BiCMOS serial data transceiver," IEEE J. Solid-State Circuits, vol. 29, pp. 185–192, Mar. 1994.
[3] B. Lai and R. C. Walker, "A monolithic 622 Mb/s clock extraction data retiming circuit," in Int. Solid-State Circuits Conf., San Francisco, CA, 1991, vol. 306, pp. 144–145.
[4] A. Pottbaecker, U. Langmann, and H.-U. Schreiber, "A Si bipolar phase and frequency detector IC for clock extraction up to 8 Gb/s," IEEE J. Solid-State Circuits, vol. 27, pp. 1747–1751, Dec. 1992.
[5] M. Bazes, "A novel precision MOS synchronous delay line," IEEE J. Solid-State Circuits, vol. 20, pp. 1265–1271, Dec. 1985.
[6] A. Waizman, "A delay line loop for frequency synthesis of de-skewed clock," in Int. Solid-State Circuits Conf., San Francisco, CA, 1994, pp. 298–299.
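The power figures reported in Section VIII of the paper above can be cross-checked with a two-term model: a constant part plus a part that scales linearly with the clock (and hence data) rate. The sketch below is only an illustration under that assumption; the function name `core_power_mw` and its parameters are hypothetical, and the 85 mW reference and the 1/4 dynamic share are the values quoted in the text.

```python
def core_power_mw(data_rate_gbps: float,
                  p_ref_mw: float = 85.0,
                  ref_rate_gbps: float = 1.0,
                  dynamic_fraction: float = 0.25) -> float:
    """Estimate core power at a given data rate.

    Assumes power = constant part + part proportional to the data rate,
    with `dynamic_fraction` of `p_ref_mw` being rate-dependent at
    `ref_rate_gbps` (values taken from the measurement section).
    """
    p_dynamic_ref = dynamic_fraction * p_ref_mw   # rate-dependent share at 1 Gb/s
    p_static = p_ref_mw - p_dynamic_ref           # constant share
    return p_static + p_dynamic_ref * (data_rate_gbps / ref_rate_gbps)


if __name__ == "__main__":
    print(core_power_mw(1.0))   # reference point, 85 mW
    print(core_power_mw(1.3))   # model estimate at 1.3 Gb/s
```

With these inputs the model gives about 91.4 mW at 1.3 Gb/s, which agrees with the 91 mW stated in the measurement results.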
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 9, SEPTEMBER 2000 1353

SiGe Clock and Data Recovery IC with Linear-Type


PLL for 10-Gb/s SONET Application
Yuriy M. Greshishchev, Member, IEEE, and Peter Schvan, Member, IEEE

Manuscript received December 8, 1999; revised February 2, 2000.
The authors are with Nortel Networks, Ottawa, ON K1Y 4H7, Canada.
Publisher Item Identifier S 0018-9200(00)05928-X.

Abstract—An integrated 10 Gb/s clock and data recovery (CDR) circuit is fabricated using SiGe technology. It consists of a linear-type phase-locked loop (PLL) based on a single-edge version of the Hogge phase detector, a LC-tank voltage-controlled oscillator (VCO) and a tri-state charge pump. A PLL equivalent model and design method to meet SONET jitter requirements are presented. The CDR was tested at 9.529 Gb/s in full operation and up to 13.25 Gb/s in data recovery mode. Sensitivity is 14 mVpp at a bit error rate (BER) = $10^{-9}$. The measured recovered clock jitter is less than 1 ps rms. The IC dissipates 1.5 W with a 5-V power supply.

Index Terms—Charge pump, clock and data recovery (CDR), jitter generation, jitter tolerance, jitter transfer, phase detector, phase-locked loop (PLL), SONET, VCO.

I. INTRODUCTION

In a clock and data recovery (CDR) circuit with integrated phase-locked loop (PLL), the reference clock is extracted from the incoming data stream and is automatically aligned to the center of the data pulse independent of its pattern. Two CDR ICs have been reported operating at 10 Gb/s: one with a linear PLL (LPLL) using a modified Hogge-type phase detector (PD) [1] and the other with a binary PLL using a bang-bang (Alexander)-type PD [2].¹ CDR jitter characteristics critical to SONET optical receiver design are Jitter Transfer (Bandwidth and Jitter Peaking), Jitter Tolerance, and Jitter Generation. In a linear CDR (LCDR), jitter transfer characteristics are independent of the jitter amplitude and can be analytically predicted according to a LPLL theory. This feature can be important for SONET applications, particularly if data is to be retransmitted and jitter transfer must be well controlled.
This paper describes a 10 Gb/s LCDR with less than 1 ps rms pattern-independent jitter generation required for SONET application [3]. This jitter generation represents the best published CDR result. A number of techniques have been used to achieve low jitter. First, a charge pump with tri-state is employed. This is a well known technique frequently used in conjunction with frequency-PD PLLs [4]. This technique is also called switched-filter PLL [5]. Generally, this approach provides a hold mode in the PLL filter in the absence of data transitions and prevents variation of the voltage-controlled oscillator (VCO) frequency (that causes jitter) during a long run of data 0's or 1's. Second, charge-pump and VCO control circuits were designed to provide a high degree of PLL filter isolation, or low charge-pump offset current, in a tri-state. In addition, the charge pump has a high output impedance necessary for high loop gain in a PLL with passive filter. Third, the original Hogge PD was modified to provide a single-edge operation and to extend linear phase range. Fourth, circuit and layout cross-talk isolation techniques similar to those presented in [6] are employed to prevent jitter generation and sensitivity degradation due to a cross-talk. The CDR IC was implemented in IBM's SiGe bipolar process which includes pMOS devices.
Jitter characteristics of a LPLL depend to a large degree on the PLL filter parameters. An LPLL equivalent model and design method to satisfy SONET requirements are presented in this paper. A theoretical jitter tolerance function is introduced based on considerations similar to those presented in [7]. It is shown that in a LPLL, all of the jitter characteristics specified by SONET requirements can be analytically expressed via jitter transfer bandwidth and PLL damping factor . The PLL bandwidth should be above 4 MHz to satisfy OC192 SONET jitter tolerance mask requirement, and the damping factor should be above – to satisfy 0.1-dB jitter peaking.
In Section II the CDR architecture is described with an attention to low jitter operation and the LPLL equivalent model and its design method are considered. Then, in Section III the building-blocks circuit diagrams are discussed. The CDR hierarchical simulation flow is given in Section IV. Finally, the IC implementation details and measured results are presented in Sections V and VI.

¹The linear property of a LPLL is due to a linear phase response of the Hogge-type PD. In publication [1], [4] the Hogge-type PD is called a phase comparator, which is a misleading terminology since the word comparator implies a binary output. The bang-bang (Alexander)-type PD is a true phase comparator.

II. CDR ARCHITECTURE

A. The Architecture

The CDR architecture includes a single-edge Hogge-type PD with a decision circuit, a charge pump, an integrated LC-tank VCO and a passive second-order PLL filter, as shown in Fig. 1. These four components constitute the LPLL. To minimize jitter, a tri-state PD and charge pump are used. The charge pump produces close to zero differential output current, when no data transitions occur. The CDR is fully differential. Maximum differential input voltage of the CDR is 1 V . Two differential threshold slicing inputs and are used to optimize the threshold of the decision and the clock recovery circuits within 80% of the data swing. The recovered clock and data are transmitted to the corresponding outputs and via buffers and cross-talk isolation interface (transmission lines and transmitters ) [6]. The amplitudes

0018–9200/00$10.00 © 2000 IEEE



of and are 1 V differential. The IC can also be used in a data recovery mode only, when the VCO clock is overdriven with an external signal .

Fig. 1. CDR IC architecture.

Fig. 2. Equivalent model of the linear PLL.

Fig. 3. CDR analytical jitter transfer compared to a SONET mask.

B. PLL Equivalent Model for CDR Jitter Characteristics Analyses

Three types of jitter characteristics are important in SONET receiver design: jitter transfer function (bandwidth and jitter peaking), jitter tolerance and jitter generation [8]. The PLL bandwidth was specified to be between 4–10 MHz with less than 0.1-dB jitter peaking and jitter generation below 1 ps rms. The PLL was analytically designed using a continuous time approximation for the equivalent model of Fig. 2 [9]. A damping factor above – is required to provide low jitter peaking. Due to overdamping, the PLL jitter transfer approaches a single-pole low-pass-type response with the following parameters (see Appendix A):

(1)

(2)

(3)

where is 3-dB bandwidth of the PLL jitter transfer function in Hz, is VCO sensitivity in Hz/V, is the loop natural frequency, and is the average data transition density factor (maximum for 0101 pattern, for PRBS pattern). In (3) the charge pump current is doubled, compared to the Gardner's formula [9], since in a Hogge PD a current variation of corresponds to -radians of the data phase. Both the bandwidth and the damping factor are functions of : , . The CDR circuit was designed for . Fig. 3 shows analytically calculated with 4 MHz compared to OC192 SONET mask. The bandwidth does not satisfy SONET requirement (to be below 120 kHz), but the jitter peaking is within the recommended 0.1-dB jitter gain.
The CDR jitter tolerance is a measure of how much peak-to-peak sinusoidal jitter can be added to the incoming data before causing data errors² due to misalignment of the data and the recovered clock. In a CDR with LPLL, jitter tolerance is defined by a jitter transfer function and the PLL slew-rate capabilities [7]. Considering no slew rate limitations in the loop (this is usually the case in a well designed CDR), the frequency response of the jitter tolerance can be described by the following function:

(4)

 determines the shape of the jitter tolerance response. It can be used to compare the performance of the design with SONET jitter tolerance mask if is multiplied by the jitter tolerance at high frequencies . The value of is defined by the CDR circuit design and decision-circuit clock-phase margin. For the SONET OC192 mask it is specified to be more than 15 ps at 4 MHz [8]. Fig. 4 shows analytically calculated for 4 MHz compared to the OC192 SONET mask. The dotted line is the response with asymptotic single-pole jitter transfer function [(A.4) in Appendix A]. The 20-dB/decade slope of the mask in the frequency range of 0.4–4 MHz coincides with the response for 4 MHz. SONET compliant LPLL design must have a bandwidth 4 MHz for minimum data transition density .
Several factors affect jitter generation of the CDR shown in Fig. 1. The recovered clock central frequency value is held as a voltage on the capacitor . An offset current at the charge pump output in tri-state causes a frequency step

(5)

For practical filter parameters and expected maximum time interval with no data transitions, the voltage variation across capacitor is negligible compared to the voltage step . The phase jitter is proportional to the number of consecutive 0's or 1's in the data, as follows:

[ps] [MHz] (6)

²The amount of errors is defined with 1-dB receiver input power penalty.
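The link between damping factor and jitter peaking discussed in Section II-B can be illustrated numerically. The sketch below uses the textbook closed-loop response of a second-order charge-pump PLL, H(s) = (2ζω_n s + ω_n²)/(s² + 2ζω_n s + ω_n²); this is an assumed standard model, not the paper's equations (1)–(3), and the function names are hypothetical. Peaking is found by a simple logarithmic frequency sweep.

```python
import math


def jitter_transfer_mag(f_hz: float, fn_hz: float, zeta: float) -> float:
    """|H(j*2*pi*f)| for the assumed textbook second-order PLL response."""
    s = 1j * 2.0 * math.pi * f_hz
    wn = 2.0 * math.pi * fn_hz
    h = (2.0 * zeta * wn * s + wn ** 2) / (s ** 2 + 2.0 * zeta * wn * s + wn ** 2)
    return abs(h)


def jitter_peaking_db(zeta: float, fn_hz: float = 1.0, points: int = 20001) -> float:
    """Maximum of |H| in dB, swept over 1e-3*fn .. 1e3*fn on a log grid."""
    peak = 0.0
    for i in range(points):
        f = fn_hz * 10.0 ** (-3.0 + 6.0 * i / (points - 1))
        peak = max(peak, jitter_transfer_mag(f, fn_hz, zeta))
    return 20.0 * math.log10(peak)


if __name__ == "__main__":
    for zeta in (1.0, 2.0, 5.0):
        print(zeta, round(jitter_peaking_db(zeta), 3))
```

In this model a critically damped loop (ζ = 1) peaks by more than 1 dB, while a strongly overdamped loop (ζ = 5) stays below 0.1 dB, consistent with the statement that heavy overdamping is needed to meet a 0.1-dB peaking limit and that the response then approaches a single pole.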
GRESHISHCHEV AND SCHVAN: CLOCK AND DATA RECOVERY IC FOR SONET APPLICATION 1355

Fig. 4. CDR analytical jitter tolerance compared to a SONET mask.

Combining (1)–(6) the jitter can be expressed as follows:

[ps] [MHz] (7)

where is the relative current offset at the charge pump output and [MHz] is the PLL bandwidth [MHz] at minimum expected data transition density factor .
Instantaneous voltage drop across the damping resistor creates the well known frequency ripple. Peak-to-peak jitter associated with this ripple can be expressed as

[ps] [MHz] (8)

where is the attenuation due to high order poles in the PLL filter. Lower targeted values of increase jitter generation. The single-edge PD, used in this CDR, reduces DF by a factor of 2 and doubles the jitter amplitude. Because of the required high bandwidth , attention must be paid to the charge-pump offset and to the attenuation to keep jitter generation in the sub-picosecond range.
Jitter can also be generated by the loop static phase error and its pattern dependence. To minimize static error, a charge pump with high output impedance and a VCO with high control input impedance were employed. The VCO phase noise had little impact on the recovered clock jitter because of the high loop gain and wide loop bandwidth achieved.

C. PLL Design Method

The LPLL with previously described model is fully defined by five independent parameters: , , , , and . Filter components and are functions of these parameters and can be found from (1)–(3) (see Appendix B). Parameters and are mostly constrained by circuit implementation. Their initial values do not impact, in the first order, LPLL jitter generation as is seen in (7) and (8). Parameters and are specified at (or accounting for single-edge phase detection): to satisfy jitter peaking and 4 MHz to satisfy jitter tolerance mask.

III. CIRCUIT DESIGN

A. Phase Detector and Decision Circuit

The block diagram in Fig. 5 shows the PD and decision circuit. The data decision circuit is split with the clock recovery circuit to provide independent data threshold optimization. A divide-by-two circuit results in a single-edge operation of the original Hogge-type PD [10]. Therefore the recovered clock jitter is not affected by possible asymmetry between the rising and falling edges of the incoming data. A dummy latch circuit is introduced to compensate for the delay. An additional advantage of the single-edge operation is an extended linear phase range, which is explained in Fig. 5. The output provides a phase-independent reference signal with a constant pulse width of about 70 ps at each positive data transition. The output is the phase difference signal in the form of a variable pulse width of 70 ± 50 ps. Fig. 6 shows SPICE simulation results of the PD circuit phase response. The linear phase range is about 80 ps. In the absence of data transitions both outputs and are at a low level. This is detected as a tri-state by the charge pump.
The front-ends of the decision circuit and the PD contain limiting amplifiers with differential slicing level control at the input. To increase the time resolution of the decision circuit, a master–slave–master structure is employed in the retiming block . The sensitivity of the decision circuit, defined primarily by thermal and shot noise in the input slicing circuitry, is simulated to be 13.4 mV at BER = $10^{-9}$. The simulated latching metastability region is less than 1 ps . This region is determined as a time zone in the clock-delay sweep where the output of the decision circuit is not defined.

Fig. 5. PD and decision circuit.

Fig. 6. PD simulated response.

B. LC-Tank VCO

The block diagram of the LC-tank-based VCO is shown in Fig. 7(a). The 10-GHz oscillator core is a cross-coupled differential circuit [Fig. 7(b)] [11]. The VCO includes a differential

control buffer and a pulse-edge sharpening limiting amplifier to reduce jitter sensitivity to the cross-talk noise. The has an open collector output to drive the transmission line interface with a 50-Ω termination at the far end. The control buffer is designed with pMOS source followers at the input. VCO frequency is tuned with a varactor which is split into two parts: a coarse tune (to compensate for process variation) and a fine tune for frequency control in the loop. VCO phase noise is measured to be less than −80 dBc/Hz at a 100-kHz offset frequency.

Fig. 7. VCO. (a) Block diagram. (b) LC-VCO core.

C. Charge Pump

The charge pump (Fig. 8) employs a well known current-switching technique with the addition of a common-mode feedback amplifier . Care was taken to achieve unconditional stability in the feedback with sufficient gain and with a small value of capacitance in Fig. 8. Small is necessary for low jitter peaking in the PLL. The charge-pump output differential current , as accounted for by the model in Fig. 2. The charge pump is in a tri-state when both differential inputs and are switched into a low (or high) state. A mismatch between charge-pump current sources, , their finite output resistances, and the VCO control input current cause an offset current in the tri-state. Fig. 9 shows a plot of the PLL jitter due to relative offset current in the tri-state as calculated from (7) for . Single-edge phase detection, employed in the CDR, requires half the value compared to the double-edge Hogge-type PD. The top current sources and the feedback amplifier were designed with pMOS transistors. Appropriate matching was achieved by sizing the critical components and using symmetrical layout. To increase the charge-pump output impedance, cascode current sources were employed. The measured offset was less than 0.2%.

Fig. 8. PLL charge pump with a tri-state.

Fig. 9. Jitter generation due to offset current in the charge-pump tri-state.

D. PLL Filter

The PLL filter is split into internal and , and external and components (see Fig. 1). Resistors R' make jitter performance less sensitive to the external parasitics and coupled noise. Capacitor (along with in Fig. 8) performs smoothing of the PD output pulses and introduces the required attenuation used in (8). In the PLL, resistor is limited by maximum voltage drop required for normal circuit operation. Pulse smoothing also relaxes this constraint.

E. Cross-Talk Isolation

Two differential output buffers ( in Fig. 1) provide adjustable differential voltage swing up to 1 V . The buffers are physically separated from the VCO and PD with transmission line interfaces to prevent jitter generation due to cross-talk via substrate and common grounds. The VCO is also separated from the PD with similar transmission line interface. All of the blocks have separate power-supply systems routed according to the isolation and analog–digital ground splitting techniques described in [6]. All of the CDR circuits are fully differential. The 10 Gb/s inputs and outputs are terminated on-chip with 50-Ω resistors.

IV. SIMULATION

Five levels of hierarchical PLL analysis were carried out: analytical, behavioral linear, behavioral mixed-mode, circuit schematic level, and post layout with distributed parasitics. The last four levels are HSPICE-based. A mixed-mode behavioral library of linear and digital components was developed. All levels of simulation give consistent results, with increasing
GRESHISHCHEV AND SCHVAN: CLOCK AND DATA RECOVERY IC FOR SONET APPLICATION 1357

insight into jitter behavior at more detailed levels. Analytical models of jitter transfer and jitter tolerance are based on the second-order linear PLL theory as described in Section II. With the addition of and , the PLL becomes a third-order loop. This was simulated along with on-chip, in-package, and external filter parasitics using HSPICE-based models. AC simulation results for ranging from 1/6 to 1 are shown in Fig. 10(a) and (b). Jitter tolerance response was designed to fit the mask at . For , the is compliant with SONET requirements. Jitter peaking is within the required 0.1-dB value for the simulated range. Sub-picosecond jitter generation was predicted in circuit transient simulation.

Fig. 10. HSPICE simulated jitter characteristics. (a) Jitter tolerance. (b) Jitter transfer.

V. FABRICATION

The CDR circuit was implemented in IBM’s SiGe HBT bipolar process which includes pMOS devices. Detailed device characteristics are given in [12]. Die size is 3 × 3 mm (Fig. 11). Three external RC-filter components are required to complete the CDR design.

Fig. 11. Microphotograph of the SiGe CDR with linear PLL.

Fig. 12. CDR 9.529-Gb/s eye diagrams and the recovered clock. Input data 30 mV , 2 0 1 PRBS pattern.

VI. EXPERIMENTAL RESULTS

The IC performed as simulated, except for the VCO oscillation frequency which was 5% lower than simulated. Measurements were done on-wafer using membrane probes from Cascade Microtech. The PLL was locked by an external sweeping of the VCO frequency using the 10-GHz adjust input (see Fig. 1). The locking range is 25 MHz and the PLL stays locked within a 200-MHz frequency range. Fig. 12 shows recovered clock and CDR eye diagrams at 9.529-Gb/s data rate and 30 mV , PRBS input signal. Measured sensitivity was 14 mV at BER 10 . This value is close to the simulated 13.4 mV (Section III), which indicates that a sufficient level of cross-talk isolation is achieved. The jitter tolerance was measured for jitter amplitudes below 40 ps . This jitter was generated by modulating the clock slicing level (see Fig. 1) with an external signal from dc to 100-MHz frequency range. No data errors were detected associated with this jitter. This demonstrates jitter tolerance of more than 40 ps compared to the 15 ps SONET mask above 4 MHz.

To verify the maximum bit rate of the IC, it was tested at 13.25 Gb/s in a data recovery mode with an external clock (Fig. 13). Sensitivity at 12.5 Gb/s was measured to be 15.5 mV . Data-recovery clock-delay margin of 77 ps at 10 Gb/s was the same as the BER tester delay margin, confirming picosecond timing resolution of the decision circuit. The recovered clock jitter was measured, with a digital oscilloscope, to be 1.85 ps rms versus 1.68 ps rms jitter of the refer-
1358 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 9, SEPTEMBER 2000

ence clock. Therefore jitter generated by the CDR is estimated to be 0.78 ps rms. Phase noise was measured more accurately with a HP4352B phase-noise meter (Fig. 14). Recovered clock phase noise follows, with no error, the data reference clock noise down to the CDR jitter noise floor at −110 dBc/Hz. The noise floor is reached within the bandwidth of the loop (designed to be above 4 MHz). Numerically integrated phase noise of the recovered clock in 80 MHz bandwidth gives a jitter value of 0.77 ps rms. Jitter was found to be independent of the PRBS word length up to . The IC dissipates 1.5 W with a 5-V power supply.

Fig. 13. CDR data recovery eye diagrams at 13.25 Gb/s. Input data 14 mV , 2 0 1 PRBS pattern. The RECCLK waveform is the PRBS generator reference clock translated by CDR.

Fig. 14. Phase-noise comparison of the CDR recovered clock, free-running VCO, and data pattern generator reference clock.

VII. CONCLUSION

In this paper, a low-jitter integrated CDR with a linear-type PLL has been demonstrated. The PLL equivalent model and design method to meet SONET jitter requirements were presented. The IC was implemented in SiGe technology. Sub-picosecond rms jitter with no jitter dependence on data PRBS pattern is achieved. Jitter generation factors in CDR were considered. A single-edge version of the Hogge-type PD and a tri-state charge pump were designed to satisfy jitter requirements. PMOS transistor circuits and cross-talk isolation technique were used to improve CDR jitter performance. In a second-order LPLL a bandwidth of more than 4 MHz and a damping factor of 4–6 at minimum expected data transition density are recommended to satisfy OC192 jitter tolerance and jitter transfer peaking requirements. To satisfy jitter transfer bandwidth ( 120 kHz), additional low-pass filtering of the recovered clock must be performed, for instance, in the PLL of a transmitter circuit.

APPENDIX A

The CDR jitter transfer function is similar by definition to the PLL phase transfer function . For the second-order charge-pump PLL of Fig. 2, the phase transfer function is [9]

(A.1)

This formula can be rewritten as

(A.2)

where is a bandwidth of the asymptotic single-pole low-pass transfer function

(A.3)

The jitter response approaches the low-pass response at or at . The asymptotic jitter tolerance shape function can be defined from (4) and (A.3) as

(A.4)

APPENDIX B

The following PLL filter components are found by solving (1)–(3):

(B.1)

(B.2)

ACKNOWLEDGMENT

The authors thank C. Kelly and P. Popescu for discussions, J. E. Rogers for his contributions to layout design and simulations, M.-L. Xu for help with the output buffer layout, J. Showell for assistance with the measurements, Dr. S. Voinigescu and D. Marchesan for their expertise in SiGe components modeling, and Dr. M. Copeland for advice on VCO phase noise analyses. Special thanks to R. Hadaway for his support and to IBM corporation for fabrication.

REFERENCES

[1] T. Morikawa et al., “A SiGe single-chip 3.3 V receiver IC for 10Gb/s optical communication systems,” in ISSCC Dig. Tech. Papers, Feb. 1999, pp. 380–381.
[2] R. C. Walker et al., “A 10Gb/s Si-bipolar Tx/Rx chipset for computer data transmission,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 302–303.
[3] Y. Greshishchev and P. Schvan, “SiGe clock and data recovery IC with linear-type PLL for 10 Gb/s SONET application,” in Proc. 1999 Bipolar/BiCMOS circuits and Technology Meeting, Sept. 1999, pp. 169–172.
[4] B. Razavi, “Design of monolithic phase-locked loops and clock recovery circuits—A tutorial,” in Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, B. Razavi, Ed. New York, NY: IEEE Press, 1996, pp. 405–420.

[5] K. Kishine, N. Ishihara, K. Takiguchi, and H. Ichino, “A 2.5-Gb/s clock and data recovery IC with tunable jitter characteristics for use in LANs and WANs,” IEEE J. Solid-State Circuits, vol. 34, pp. 805–812, June 1999.
[6] Y. Greshishchev and P. Schvan, “60 dB gain 55 dB dynamic range 10Gb/s SiGe HBT limiting amplifier,” IEEE J. Solid-State Circuits, vol. 34, pp. 1914–1920, Dec. 1999.
[7] L. De Vito, “A versatile clock recovery architecture and monolithic implementation,” in Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, B. Razavi, Ed. New York, NY: IEEE Press, 1996, pp. 405–42.
[8] “SONET OC-192 Transport System Generic Criteria,” Bellcore, GR-1377-CORE, Mar. 1998.
[9] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Commun., vol. COM-28, pp. 1849–1858, Nov. 1980.
[10] C. R. Hogge, “A self-correcting clock recovery circuit,” IEEE J. Lightwave Technol., vol. 3, pp. 1312–1314, Dec. 1985.
[11] B. Jansen, K. Negus, and D. Lee, “Silicon bipolar VCO family for 1.1 to 2.2 GHz with fully-integrated tank and tuning circuits,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 392–393.
[12] J. D. Cressler, “SiGe HBT technology: A new contender for Si-based RF and microwave circuit applications,” IEEE Trans. Microwave Theory Tech., vol. 46, pp. 572–589, May 1998.

Yuriy M. Greshishchev (M’95) received the M.S.E.E. degree from Odessa Electrotechnical Institute of Communications, Odessa, Ukraine, in 1974 and the Ph.D. degree in electrical and computer engineering from V. M. Glushkov Institute of Cybernetics, Kyiv, Ukraine, in 1984.
From 1976 to 1994, he worked with research and development organizations and academia on high-speed ADC and DAC circuit theory and design, primarily in the area of silicon bipolar and GaAs MESFET integrated circuits. His Ph.D. research was dedicated to the development of folding-type video ADCs embedded into TV systems. In 1993, he was a Visiting Scientist at Micronet, Institution Center of University of Toronto, Toronto, ON, Canada. In 1994, he joined the Department of Electrical and Computer Engineering, University of Toronto, where he conducted research on low-voltage GaAs MESFET circuits for digital wireless communication. Since 1996, he has been with Nortel Networks, Ottawa, ON, Canada, where he is responsible for the development of highly integrated circuit solutions in emerging technologies for optical communications. He is the coauthor of two books and more than 40 technical papers in the area of data converters, high-speed circuit design, and statistical modeling.

Peter Schvan (M’89) was born in Budapest, Hungary, in 1952. He received the M.S. degree in physics from Eotvos Lorand University, Budapest, in 1975 and the Ph.D. degree in electrical engineering from Carleton University, Ottawa, ON, Canada, in 1985.
In 1985, he joined Nortel Networks, Ottawa, ON, Canada, where he worked in the area of BiCMOS and bipolar technology development, yield prediction, device characterization, and modeling. Recently, his work has been extended to the design of multigigabit circuits and systems. He is currently Senior Manager of a group responsible for evaluating various high-performance technologies and demonstrating advanced circuit concepts required for fiberoptic communication systems. He is the author or coauthor of numerous publications.
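
The jitter figures quoted in Section VI of the preceding paper can be cross-checked numerically. The sketch below assumes, as is common practice but is not stated explicitly in the text, that the jitter added by the CDR is uncorrelated with the reference-clock jitter, so that rms values combine as a root sum of squares. The function name `added_rms_jitter` is illustrative only.

```python
import math


def added_rms_jitter(total_rms_ps: float, reference_rms_ps: float) -> float:
    """Estimate the rms jitter added by a circuit (in ps).

    Assumes the added jitter is uncorrelated with the reference jitter,
    so total^2 = reference^2 + added^2.
    """
    if total_rms_ps < reference_rms_ps:
        raise ValueError("total jitter cannot be smaller than reference jitter")
    return math.sqrt(total_rms_ps**2 - reference_rms_ps**2)


if __name__ == "__main__":
    # Values quoted in the text: 1.85 ps rms recovered clock,
    # 1.68 ps rms reference clock.
    cdr_jitter_ps = added_rms_jitter(1.85, 1.68)
    print(f"Estimated CDR jitter: {cdr_jitter_ps:.2f} ps rms")
```

With the quoted oscilloscope readings this evaluates to roughly 0.77 ps rms, in line with the 0.78 ps rms estimate and with the 0.77 ps rms obtained from the integrated phase noise.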
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999 1951

A 2–1600-MHz CMOS Clock Recovery PLL with Low-Vdd Capability

Patrik Larsson

Abstract—A general-purpose phase-locked loop (PLL) with programmable bit rates is presented demonstrating that large frequency tuning range, large power supply range, and low jitter can be achieved simultaneously. The clock recovery architecture uses phase selection for automatic initial frequency capture. The large period jitter of conventional phase selection is eliminated through feedback phase selection. Digital control sequencing of the feedback enables accurate phase interpolation without the traditional need of analog circuitry. Circuit techniques enabling low-Vdd operation of a PLL with differential delay stages are presented. Measurements show a PLL frequency range of 1–200 MHz at Vdd = 1.2 V linearly increasing to 2–1600 MHz at Vdd = 2.5 V, achieved in a standard process technology without low threshold voltage devices. Correct operation has been verified down to Vdd = 0.9 V, but the lower limit of differential operation with improved supply-noise rejection is estimated to be 1.1 V.

Index Terms—Frequency locked loops, frequency synthesizers, phase comparators, phase jitter, phase locked loops, phase noise, synchronization.

Manuscript received April 13, 1999; revised June 19, 1999.
The author is with Bell Laboratories, Lucent Technologies, Holmdel, NJ 07733 USA.
Publisher Item Identifier S 0018-9200(99)08963-5.
0018–9200/99$10.00 © 1999 IEEE

I. INTRODUCTION

THE continuing scaling of CMOS process technologies enables a higher degree of integration, reducing cost. This fact, combined with the ever shrinking time to market, indicates that designs based on flexible modules and macrocells have great advantages. In clock recovery applications, flexibility means, for example, programmable bit rates requiring a phase-locked loop (PLL) with robust operation over a wide frequency range. Increased integration also implies that the analog portions of the PLL (mainly the voltage-controlled oscillator [VCO]) should have good power-supply rejection to achieve low jitter in the presence of large supply noise caused by digital circuitry.

Another trend is low-power design using reduced Vdd. This reduces the headroom available for analog design, causing integration problems for mixed-mode circuits [1]. Furthermore, in applications where power consumption is a more critical design goal than compute power, VT is not scaled as aggressively as Vdd to avoid leakage currents in OFF devices, which aggravates the headroom problem. For mixed-mode circuits with significant analog circuitry, dual-VT and/or dual-Vdd processing combined with a dc/dc converter [2] is a viable solution. However, for circuits dominated by digital logic, it is difficult to justify the additional fabrication steps required for these solutions. A common case of the latter mixed-mode design is a large digital circuit incorporating a PLL-based clock generator with low-jitter requirements, which is the most common mixed-mode design today. Digital style PLL’s have been suggested, e.g., [3], but these cannot compete with the supply-noise rejection of differential analog circuitry.

A clock recovery PLL architecture suitable for programmable bit rates is developed in Sections II and III with emphasis on jitter reduction. Sections IV–VI present PLL circuit techniques that use the noise resistant differential pair but avoid other “expensive” (in terms of headroom) analog circuitry, such that low-Vdd operation is enabled in a standard digital CMOS process without the need of low-threshold devices.

II. LOW-JITTER PHASE-SELECTING CLOCK RECOVERY

A basic PLL for clock recovery is shown in Fig. 1(a). In most CMOS implementations, the VCO must have a tuning range covering more than 50% of the target frequency to guarantee high yield over large process variations. This large frequency range requires special techniques for initial frequency locking since there exists no phase detector for nonreturn-to-zero (NRZ) data that operates reliably with large initial frequency offset. Available techniques include frequency sweeping [4], using a replica VCO matched to the clock generating VCO [5], or initially locking the PLL to a reference frequency with a frequency detector before switching to the input data and locking with a phase detector [6].

One common technique requiring no special initialization is shown in Fig. 1(b). This dual-loop PLL can be traced back to [7], which was based on a delay-locked loop (DLL). The multiple-output VCO in Loop A in Fig. 1(b) generates a number of equally spaced clock phases at a frequency of N·fref. This loop can have a large frequency tuning range since it is locked with a phase frequency detector (PFD). Clock recovery is performed by Loop B that generates the recovered clock by selecting the clock phase from Loop A that is best aligned with the incoming data. If there is a frequency offset between N·fref and the incoming data, an appropriate clock can still be generated by changing the Ctrl signal to select a different phase over time.

Frequency initialization is automatically achieved by selecting appropriate N and fref for the expected data rate. Most communication systems have a frequency tolerance of a few hundred parts per million (ppm), eliminating any need for a frequency detector in Loop B. The decoupling of the VCO loop from the data recovery loop enables independent selection of bandwidth in those two loops. This allows a large bandwidth in

Loop A to suppress VCO jitter [4], for example, jitter induced by power-supply noise. At the same time, a low bandwidth can be used in Loop B to reduce jitter transfer. This cannot be achieved by the PLL in Fig. 1(a), which has a single loop with conflicting design goals regarding loop bandwidth.

A disadvantage of a phase-selecting PLL is the phase step that is generated when the Ctrl signal in Fig. 1(b) switches to a new clock phase. This phase switching leads to large cycle-to-cycle jitter (greater than or equal to the phase spacing) that can actually dominate the peak-to-peak jitter. By increasing the number of phases, the phase spacing will be smaller with less jitter. More phases can be generated by having more delay stages in the VCO, but this limits the speed. An alternative is phase interpolation that enables a large number of phases without degrading the VCO speed [8], [9]. However, interpolators add analog circuitry to the design and are prone to mismatch, which in the worst case can lead to nonmonotonic phase spacing.

A proposed remedy for the jitter due to phase steps is shown in Fig. 1(c). Instead of selecting a clock phase feeding the sampling flip-flop and the phase detector, the feedback clock in Loop A is selected from the multiple VCO phases. When Loop B detects a misalignment of the incoming data and the VCO output clock, the Ctrl signal is changed to select a different phase for the feedback clock. This will cause a phase change in the divided clock feeding the PFD such that the charge pump will alter the VCO control voltage stored in the loop filter. Therefore, sudden phase steps generated by the clock recovery logic will be smoothed by the filter of Loop A, causing the VCO clock to slowly drift toward the correct phase with a rate of change determined by the bandwidth of Loop A. In a 622-MHz application with a division ratio of N = 8 and a bandwidth of Loop A equal to one-tenth of fref, it will take approximately 10N = 80 clock cycles to complete a phase switch. The jitter caused by a phase step in the structure in Fig. 1(c) is therefore spread out over 80 clock cycles, significantly reducing the jitter compared to Fig. 1(b). Feedback phase selection has previously been applied to fractional-N frequency synthesizers for other purposes [10], [11].

Fig. 1. Clock recovery PLL’s. (a) Standard, (b) phase selection, and (c) feedback phase selection. ChP+F denotes charge pump + loop filter.

III. AVERAGING PHASE INTERPOLATION

The smoothing effect of the loop filter can also be used for phase interpolation. If the Ctrl signal in Fig. 1(c) alternates between two different clock phases every second cycle of the reference clock, the result will be a VCO clock phase corresponding to the average of the two selected phases. In the test chip, four levels of averaging phase interpolation were implemented by circulating through four clock cycles and in each clock cycle selecting phase φ_k or φ_{k+1} as the feedback clock. A quarter phase interpolation generating φ_{k+1/4} is then achieved by selecting φ_k for three consecutive clock cycles, then selecting φ_{k+1} for the fourth cycle and repeating this sequence.

The architecture in Fig. 1(c) lends itself naturally to combining both averaging phase interpolation and standard current-mode interpolation. A test chip was built in a 0.25-µm, 2.5-V digital CMOS process to evaluate the jitter performance of the phase selection architecture. A block diagram of the implemented VCO and phase control circuitry is shown in Fig. 2. The phase select control code at the input consists of seven bits, of which two are directly fed to a finite state machine (FSM) that generates control signals for realizing the averaging interpolation. The remaining five bits of the control code represent φ_k, from which the code for φ_{k+1} is generated by adding one. The FSM controls Mux1 to select one of the codes representing φ_k and φ_{k+1} in a four clock period repetitive cycle, as described above. The five bits at the output of Mux1 are split into three bits coarse select and two bits fine select. The three coarse bits select two neighboring phases from a four-stage differential VCO having eight evenly spaced output phases and send these two phases to a current-mode interpolator. Mux2/Mux3 in Fig. 2 receive one coarse bit each, and the third coarse bit is used to conditionally invert the output signals. The interpolator is similar to the Type-I circuit in [9] and is controlled by a four-bit thermometer code derived from the two fine select bits. Both the current-mode interpolation and the averaging phase interpolation are programmable in the test chip and can be disabled. The two complementary multiplexers at the output
LARSSON: 2–1600-MHz CMOS CLOCK RECOVERY PLL 1953

of the VCO/interpolator (Mux4/Mux5) allow the chip to be configured for the scheme in either Fig. 1(b) or (c), enabling a performance comparison.

Freezing the 7-bit phase select control code to a fixed phase gives a measured output period jitter¹ of 7.6 ps rms when running the VCO at 500 MHz, as shown in Fig. 3(a). This is the jitter inherent in the VCO and the output buffers. Configuring the chip for the standard phase selecting scheme in Fig. 1(b) with 32 clock phases (4× interpolation) gives the jitter in Fig. 3(b), revealing a long tail in the histogram caused by a frequency offset of 1200 ppm between the incoming data and N·fref. The phase spacing is 2 ns/32 = 62.5 ps such that we can expect a second peak in the histogram 62.5 ps away from the main peak, which is confirmed by the shape of the histogram. As shown by the inset, this peak is slightly off its ideal position and is smeared out due to the nonideal ac behavior of the current-mode interpolator.

¹ Timing uncertainty between two consecutive edges of the generated clock.

Feedback phase selection with 4× current-mode interpolation eliminates the long tail in the histogram, bringing the period jitter down to 7.9 ps rms, as shown in Fig. 3(c). This indicates that the jitter is completely dominated by the VCO jitter, and nearly all of the phase-switching jitter caused by digital clock recovery can be eliminated. Fig. 3(d) shows the period jitter histogram obtained when the current-mode interpolators are disabled and 4× averaging phase interpolation is used instead. Its similarity to the result in Fig. 3(c) proves that the same low jitter and the same number of discrete clock phases (32) can be achieved without the analog interpolation circuitry.

Fig. 2. VCO, phase selector, interpolator, and feedback multiplier.

Fig. 3. 500-MHz period distribution histograms. (a) Clock recovery inactive, (b) standard phase selection, (c) feedback phase selection, and (d) averaging phase interpolation. Measurement conditions were Vdd = 2.5 V, N = 25, fref = 20 MHz, and fdata = 499.4 Mb/s.

Enabling both the current-mode interpolator and the averaging phase interpolation gives a total of 128 selectable clock phases. The graph in Fig. 4 shows the phase shift as a function of the phase select control code when the period of the VCO is 1 ns. The expected phase step is 1 ns/128 = 7.8 ps, whereas the largest measured step is 21 ps, resulting in a differential nonlinearity (DNL) of 1.7 bits. The differential VCO makes the phase curve near-symmetric around the midpoint, suggesting that the integral nonlinearity (INL) of 94 ps is mainly due to delay mismatch in the VCO. The main contributing factor to this mismatch is unbalanced parasitic wiring capacitors that are difficult to match without incurring speed penalty. If Loop B is a first-order loop or a well-damped second-order loop, the feedback in Loop B will automatically select the best fit phase select code, reducing the impact of INL. The maximum phase deviation in a clock recovery application is then the DNL added to the VCO jitter.

Fig. 4. Phase shift versus phase code measured at 1 GHz.

In addition to jitter reduction and phase interpolation, feedback phase selection also has other advantages when combined with other architectures. Using feedback instead of feedforward phase selection reduces circuit complexity, thereby eliminating the need for good matching in an analog-style interpolator [12] and a high-speed parallel sampling structure [13].

IV. VCO

Recently, low-noise VCO’s utilizing high-swing complementary signals have been presented (e.g., [14] and [15]).

Good 1/f noise performance has been shown, but their power-supply noise rejection is inferior to that of the standard analog differential pair since they lack a high-impedance source, making the delay depend on Vdd. Therefore, the analog style differential pair is preferable in applications where power-supply noise is the main source of oscillator jitter. When a differential pair with resistor loads is used as a delay cell in a VCO, the frequency is regulated by changing the tail current as implemented by the control voltage in Fig. 5(a). To achieve a large frequency tuning range, it is desirable that the output swing and common mode do not change significantly with frequency. Often the replica-bias scheme in Fig. 5(a) is employed, which relies on good matching between a replica of the delay stage (devices Mr1–Mr3) and the VCO delay stages to set the VCO output swing from to , giving a known common mode and swing independent of the speed-regulating current. A disadvantage of this technique is that the PMOS load (Mr3) will operate as a current source at low frequencies, introducing high gain in the replica feedback loop. To prevent instability, a large compensation capacitor is required, which introduces another pole in the PLL, leading to more intricate design. Furthermore, the amplifier in the replica bias loop requires additional headroom, thereby prohibiting low-Vdd operation.

Fig. 5. Bias generation and one VCO delay stage of (a) replica bias scheme and (b) diode clamping.

Fig. 5(b) shows a structure that achieves the good power-supply noise rejection of the analog differential pair, at the same time enabling low-Vdd operation. The PMOS diodes are used for clamping the output voltage to a minimum level of giving a fixed common mode and swing without the need for a replica bias circuit. This makes the VCO suitable for a wide range of operating frequencies and supply voltages. To guarantee clamping action, the NMOS tail current must be larger than the current through the controlled PMOS load . Furthermore, a proposed design goal for low oscillator noise suggests that the rise and fall times of the output nodes should be made equal [16]. This is achieved by reflecting half of to each of the controlled PMOS loads by the current mirror formed by devices Md1–Md3. Assuming that Md4 recently turned on, “node a” will be discharged by a current of . At the same time, the complementary output node is pulled to by a current equal to indicating equal rise and fall times. A disadvantage of this oscillator is the additional parasitic capacitance of the diodes, which makes the maximum operating frequency lower than that of the replica bias structure. The additional gate capacitance of the diode loads can be eliminated by using NMOS diodes [17].

Fig. 6. Example VCO waveforms to estimate lower limit on Vdd.

The minimum supply voltage for the VCO is which has been verified by measurements down to Vdd = 0.9 V. However, at this value of Vdd the VCO is no longer differential. An estimate of the minimum Vdd for differential operation can be derived from the simulated VCO waveforms in Fig. 6. The VCO output swings from down to approximately . For differential operation, it is required that both NMOS devices in the differential pair (Md4, Md5) are turned ON at the crossover point of the waveforms. Assuming a drop of over the current source device Md6 leads to a minimum input voltage of . At the lowest limit of Vdd this input voltage is generated by the previous stage in the oscillator, indicating a minimum Vdd of . Measurements determined and to be 0.53 and 0.85 V, respectively, indicating a minimum Vdd of about 1.1 V assuming a of 0.1 V. Note that this is a theoretical number, since the differential operation of the VCO has zero tuning range at this value of Vdd. Good power-supply rejection can also be achieved by the regulated-supply structure in [18]. However, the requirement of a large decoupling capacitor generates contradicting design goals on PLL bandwidth.

V. CHARGE PUMP

A. Bandwidth and Peaking Compensation

To reduce peak-to-peak jitter due to VCO noise, it is advantageous to keep as high a PLL bandwidth as possible. Traditional worst case design would keep the PLL bandwidth and damping factor sufficiently far away from stability limits under all variations of the input reference frequency, the

manufacturing process, and the division ratio in the feedback path N. The concept of self-biasing introduced in [19] simplifies the design by eliminating process variations and the input reference frequency from the stability constraints. However, the PLL bandwidth is still a function of N so that maximum noise suppression can only be achieved for a fixed N. In programmable applications, N can vary by more than an order of magnitude, indicating that the variation in stability constraint can be dominated by N instead of process variations, as shown by the stability limit of a charge-pump PLL [20]

(1)

where fref is the input reference frequency (or effectively the sampling rate of the phase detector), is the VCO gain, is the charge-pump current, and is the loop filter resistance. Other PLL design parameters, such as bandwidth and damping factor, also change with N. Compensating loop parameters for changes in N guarantees that the PLL is always operating with maximum bandwidth and fixed damping factor without endangering stability. This can be done by setting the charge pump current to

(2)

where is a fixed reference current. This is realized by the current multiplier in Fig. 7, which generates the charge-pump current by letting the individual bits of N control binary weighted current sources.

Fig. 7. Current multiplier generating charge-pump biasing voltages Vqbn and Vqpb.

The simulated jitter transfer function of a standard PLL in Fig. 8(a) demonstrates the change of loop parameters as N is altered. The damping factor is intentionally set low to show its dependence on N. The measured jitter transfer function of Loop A in Fig. 8(b) shows the desired independence of N. The slight deviation of the curves is caused by transistor mismatch in the current multiplier.

Fig. 8. Jitter transfer functions for different division ratios. (a) Simulated standard PLL. (b) Measured characteristics of Loop A with intentionally low damping.

B. Charge Sharing

A common problem of many charge pumps is charge sharing. For the charge pump in Fig. 9(a) (Type A), charge sharing is caused by the parasitic capacitance in nodes pcs and ncs [21]. When is active, node pcs is charged to . When deactivating some of the charge stored in node pcs will leak through the current source device. Since the parasitics of nodes ncs and pcs can never be matched, this will lead to a static phase offset, as shown in Fig. 10(a). This is the transfer function of a phase-frequency detector followed by a Type A charge pump. The two transistors Mp and Mn in the Type B charge pump in Fig. 9(b) will remove the charge from the nodes pcs and ncs when Up and Down are deactivated [22]. This leads to a large reduction in the phase offset, as shown in Fig. 10(a).

Fig. 9. (a) Charge-pump suffering from charge sharing (Type A). (b) Charge removal transistors eliminate charge sharing (Type B).

For this application, static phase offset in Loop A is not critical. However, when analyzing the cause of phase offset, a source of increased jitter is revealed. Fig. 10(b) indicates that the leakage from node pcs is larger than that from ncs. When the PLL is locked, the leakage mismatch is compensated for by activating earlier than , giving a phase offset. Since the compensation charge is applied in the early portion of the charge-pump activation time, it will cause voltage ripple on
1956 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999

the loop filter, leading to phase jitter at the VCO output. The charge removal transistors Mn and Mp in Fig. 9(b) eliminate the current tails, resulting in a well-balanced Up and Down activation time such that IUp and IDown cancel each other, reducing the loop filter ripple. A further advantage of the Type B charge pump is reduced 1/f noise in the current source transistors achieved by periodically resetting their to below 0 V [23], [24].

Fig. 10. Characteristics of the Type A and B charge pumps. (a) Transfer function of PFD followed by charge pump. (b) Simulated IUp and IDown when net output charge is zero.

A limitation of the Type B charge pump is a reduced dynamic range of the VCO control voltage ( ). If is less than there will be a current flowing through the Mp device to the output when the Dwn control is inactive. When NMOS devices are used for speed-regulating the VCO, will never drop below , constraining to be less than which can easily be fulfilled. However, the charge pump works only up to an output voltage of limiting the upper tuning range of the VCO. The charge pump in Fig. 9(a) has the same upper voltage limit, however. Mismatch in IUp and IDown is a source of jitter similar to the charge sharing described above. For low jitter, it is essential to have good matching, implying that the devices controlled by should be saturated. Again, this requires

Charge removal can also be done by ac coupling [18], but this requires careful timing of the control signals in the charge pump. The solution to charge sharing in [21] is less suitable for low-Vdd applications due to the common-mode restrictions on the differential amplifier.

VI. LOOP FILTER

The most common PLL loop filter is the simple RC circuit in Fig. 11(a). Common design options for the resistor are poly or the channel resistance of an MOS transistor. For high resistance values, an MOS device is most attractive. However, it has a disadvantage at low Vdd if implemented with the straightforward configuration of Fig. 11(b). For a nominal Vdd of 2.5 V, the effective resistance of the transmission gate is nearly independent of the VCO control voltage ( ). However, the resistance becomes strongly dependent on for low For the resistance goes to infinity for some values of [25], [26]. Exchanging the position of the transmission gate resistor and the MOS capacitor as in Fig. 11(c) will make and the resistance of the NMOS device independent of The resistance still varies with but the variation is much less than for the previous configuration.

Fig. 11. Evolution of loop filter. (a) Ideal model, (b) MOS-only implementation, (c) improved resistor linearity for low-Vdd operation, (d) improved capacitor linearity, and (e) final model where C3 models the well-to-substrate capacitance.

Since the capacitor in Fig. 11(c) is a “floating” capacitor, it must be implemented with a PMOS device. When the VCO control voltage approaches the MOS device is between inversion and depletion, where its capacitance value is voltage dependent, as shown in Fig. 12. By altering the gate and source/drain connections of the PMOS as shown in Fig. 11(d), it will operate in accumulation where the capacitance value is less voltage dependent, as shown for V in Fig. 12. To avoid strong power-supply noise injection, the well must be connected to the same node as source and drain, as shown in Fig. 11(d). The corresponding filter model is
LARSSON: 2–1600-MHz CMOS CLOCK RECOVERY PLL 1957

shown in Fig. 11(e), where C3 is the parasitic well-to-substrate capacitance of the MOS capacitor. This filter has an impedance of

(3)

which is a close approximation to the impedance of the original filter in Fig. 11(a), given as

(4)

when as is common design practice [20].

Fig. 12. Voltage-dependent capacitance of filter in Fig. 11(c) and (d).

VII. PHASE-FREQUENCY DETECTOR

Phase detectors may exhibit a dead zone, resulting in enlarged jitter. A common design technique to avoid a dead zone is to make sure that both Up and Down output signals are fully activated before shutting them both off. This is implemented by generating a reset signal with an AND operation of the Up and Down outputs and introducing a delay before feeding back this signal to reset the phase detector. It is this reset delay that causes the simultaneous IUp and IDown in Fig. 10(b). If the charge sharing in the charge pump is not perfectly cancelled or if there is a mismatch of IUp and IDown, there will always be some current compensation, leading to phase offset and loop filter ripple, as discussed in Section V. A longer reset delay results in a longer period during which the VCO is running at a different frequency due to the compensation current. Therefore, the reset delay should be minimized under the constraint that it has to be longer than the response time of the PFD with some additional design margin to avoid a dead zone.

A PFD with low logic depth is shown in Fig. 13, including details of the Up section. Its operation is easiest to analyze by assuming an initial state of This implies that and that is precharged high. A rising edge on discharges and sets without changing the state of the RS flip-flop. The internal weak feedback in the path will assure that is kept active even if falls. At the next rising edge on V, is activated, which sets This triggers the RS flip-flop to precharge high, which shuts off ; and, at the same time, is deactivated in a similar way.

Fig. 13. Phase-frequency detector used in Loop A with details of Up section.

In summary, a positive edge on sets which is reset by the next positive edge on This behavior is identical to the two classical PFD’s implemented by either four RS flip-flops or two resettable D flip-flops. The precharged gate and the shorter logic depth of this implementation make the delay shorter than for the standard PFD’s. This allows a smaller delay in the reset path for eliminating the dead zone, such that loop filter ripple will be reduced and generate less noise. An additional benefit of low logic depth is a reduction in phase detector jitter caused by power-supply-dependent delays and device noise.

The reset delay of this PFD can be further reduced by letting the signal directly reset the precharged gate at the same time as the RS flip-flop is reset. This technique was not adopted in order to keep a conservative design, guaranteeing operation with no dead zone. Similar precharged gates have previously been used in PFD designs [27]–[29].

VIII. FREQUENCY DIVIDER

To enable high flexibility, the frequency divider in Fig. 2 is a fully programmable ( ) divider. The structure in [30] based on a clock-gated dual-modulus prescaler followed by a counter was chosen to achieve high speed at low supply voltages. The divider was realized in standard static CMOS logic, reaching a maximum operating frequency of 800 MHz in simulations of worst case slow process variation at V and C This exceeded the simulated speed limit of the VCO. The potential startup deadlock in [30] was eliminated by logic that prohibits two consecutive clock pulse removals.

IX. PLL OPERATING RANGE AND JITTER

The maximum operating frequency of the PLL measured at room temperature is plotted in Fig. 14 as function of Vdd. Simulations indicate that the speed is limited by the VCO. A minimum of 0.9 V agrees well with the measured V. At low power-supply voltages, the speed cannot compare with high-end circuits using standard However, the operating frequency range exceeds that of low-voltage circuit implementations [2], [3], [25], [26]. The maximum speed also compares favorably with another low-voltage PLL based on a low-threshold process [18].

With a PLL bandwidth of 2 MHz, the tracking jitter is 5.2 ps rms at 1200 MHz, as shown in Fig. 15(a). This measurement represents the standard deviation of the delay between a

triggering clock edge and a clock edge occurring 320 ns later according to the setup in Fig. 2(e) in [31]. The delay was chosen four times larger than the delay at which the “jitter knee” occurs in Fig. 2(f) in [31] ( ns) to get reliable data.

Fig. 14. Measured maximum PLL operating frequency.

All signals on the chip are periodic with the reference frequency, so when using the frequency divider output as a triggering signal, most of the jitter due to power supply and board noise will cancel in the measurement. Such a setup gave an rms jitter of 2.5 ps at 1200 MHz, as shown in Fig. 15(b). The jitter with respect to an ideal reference would in this case be ps [31]. This proves that device noise in a standard ring oscillator is tolerable for communication standards with very tight jitter tolerances such as SONET OC-48 (2.5 Gb/s), which has a jitter specification of 4 ps rms at a 2-MHz PLL bandwidth. The VCO power consumption was 5 mW at 1200 MHz in simulation of extracted layout.

Fig. 15. A 1.2-GHz tracking jitter histogram. (a) Including power supply and board noise. (b) Supply and board noise cancelled.

The figure of merit κ defined in [31] is estimated from

(5)

where is the PLL bandwidth, is the measured long-term self-referenced tracking jitter, and is the tracking jitter with respect to an ideal reference clock. Table I lists measured and derived as function of operating frequency. The jitter reported here is lower than that in [32] due to an improved measurement setup and more accurate measurement equipment. The κ of this oscillator is better than that reported for bipolar implementations in [31] and a CMOS VCO with a of (as derived from the Slide Supplement of [33]). A for a complete PLL is similar to reported for a stand-alone VCO in [34], suggesting that noise contributions from the other PLL components (charge pump, PFD, frequency divider) are much smaller than the VCO noise. The tracking jitter compares favorably with several other CMOS oscillators (e.g., [8], [33], and [35]–[37]). κ degrades significantly at 1600-MHz operation, indicating the speed limit of the PLL. The κ measurements are also used for estimating the VCO period jitter due to device noise, based on the relation [31]

(6)

where is the VCO period. The derived period jitter agrees to within 10% of the phase noise estimation technique in [38].

The PLL tracking jitter is plotted as a function of frequency for various power-supply voltages in Fig. 16. These measurements were taken with and a fixed PLL bandwidth. Since the loop filter resistance changes with Vdd, the charge-pump reference current in Fig. 7 was adjusted until a 3-dB PLL bandwidth of 2 MHz was measured. A bandwidth of 2

MHz represents a scaling factor of approximately 5000 in (5), as indicated by the right-hand scale in Fig. 16.

TABLE I
PLL JITTER FOR Vdd = 2.5 V MEASURED WITH A PLL BANDWIDTH OF 2 MHz

TABLE II
PLL CHARACTERISTICS

Fig. 16. Tracking jitter as function of Vdd and frequency with a PLL bandwidth of 2 MHz. The right-hand scale represents the figure of merit κ [31].

The VCO frequency is set by the tail current in the differential delay stages and is practically independent of Vdd. Therefore, the power consumption at a fixed VCO frequency drops linearly with Vdd. The graph in Fig. 16 shows that the jitter does not change with Vdd, suggesting that power reduction (by lowering Vdd) can be achieved without jitter penalty. This seems to contradict the common belief that jitter should increase with lower power consumption. However, the critical parameter for low jitter is not power consumption but current consumption, as has previously been theoretically derived for LC oscillators [39], [40]. As shown by these measurements, low-jitter design with a fixed power budget should be based on a minimum Vdd and as large a current as can be tolerated. The measured power consumption at 250 MHz and 2.5 V is 18 mW and is dominated by buffers and the current-mode interpolator. Simulations of extracted layout indicate that the VCO consumes 0.7 mW at 250 MHz and 5 mW at 1200 MHz. The PLL characteristics are summarized in Table II.

X. CONCLUSION

Clock recovery circuits in CMOS processes require special techniques for initial frequency locking. This need is due to the fact that CMOS process variations dictate a larger frequency tuning range than can be covered by existing frequency detectors for NRZ data. An attractive technique for initial frequency locking is phase selection clock recovery, where a multioutput VCO is locked onto a reference clock and a clock recovery loop selects one of the output phases of the VCO. The large period jitter in traditional phase selection clock recovery is eliminated by the feedback phase selection technique presented here. This scheme filters the phase jumps through the PLL loop filter and also enables accurate phase interpolation with digital circuitry only, as opposed to the conventional analog-style phase interpolation.

In applications where the PLL is programmable, important loop characteristics such as bandwidth and damping factor change with the frequency multiplication mode. By making the charge-pump current depend on the division ratio in the feedback divider, a fixed bandwidth and damping factor can be obtained.

Differential analog circuits have superior supply noise rejection compared to digital complementary logic styles and are therefore preferred in an environment with large power-supply noise. However, previous differential PLL implementations have used circuits requiring large headroom, thereby prohibiting low-Vdd operation. Circuit techniques for PLL components are discussed that enable low-Vdd operation in a process technology without low-threshold devices. Correct operation has been verified down to V, but the lower limit for differential operation is estimated to be V in a process with measured V and V.

Measurements show that jitter is independent of Vdd, contradicting the common belief that jitter is strongly correlated to power consumption. At a fixed operating frequency, power reduction is achieved without any penalty in jitter performance by lowering Vdd.

The tracking jitter at 500–1500 MHz was measured to be 2–5 ps rms dominated by device noise. This indicates that a standard ring oscillator can fulfill the jitter specification for a SONET OC-48 receiver.

This paper demonstrates that a clock recovery circuit with programmable bit rates can be realized with a large frequency tuning range. Robust operation and low jitter are achieved over a large range of power-supply voltages, making it ideal for low-power applications and suitable as a reusable macrocell.
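
The jitter figures quoted in Section IX can be cross-checked numerically. Equations (5) and (6) are not legible in this copy, so the sketch below assumes forms in the style of the ring-oscillator jitter analysis cited as [31]: a figure of merit κ = σx·sqrt(4π·fL), a period jitter of κ·sqrt(T), a jitter knee at a delay of 1/(2π·fL), and a long-delay self-referenced jitter that is sqrt(2) times the jitter against an ideal reference. These relations and all function names are assumptions made for illustration. The first two reproduce the "approximately 5000" scaling factor and the 320 ns measurement delay stated in the text for a 2-MHz bandwidth, which is the reason for choosing them.

```python
import math


def knee_delay(f_loop_hz: float) -> float:
    """Delay of the closed-loop 'jitter knee', assumed to be 1 / (2*pi*f_L)."""
    return 1.0 / (2.0 * math.pi * f_loop_hz)


def ideal_reference_jitter(sigma_self_ref: float) -> float:
    """Assumed relation: long-delay self-referenced jitter = sqrt(2) * sigma_x."""
    return sigma_self_ref / math.sqrt(2.0)


def figure_of_merit(sigma_x: float, f_loop_hz: float) -> float:
    """Assumed form of (5): kappa = sigma_x * sqrt(4*pi*f_L), in sqrt(seconds)."""
    return sigma_x * math.sqrt(4.0 * math.pi * f_loop_hz)


def period_jitter(kappa: float, vco_period: float) -> float:
    """Assumed form of (6): sigma_T = kappa * sqrt(T)."""
    return kappa * math.sqrt(vco_period)


if __name__ == "__main__":
    f_loop = 2e6              # PLL bandwidth quoted in the text, Hz
    sigma_self_ref = 2.5e-12  # self-referenced rms jitter at 1200 MHz, s
    f_vco = 1200e6            # VCO frequency of that measurement, Hz

    sigma_x = ideal_reference_jitter(sigma_self_ref)
    kappa = figure_of_merit(sigma_x, f_loop)
    sigma_t = period_jitter(kappa, 1.0 / f_vco)

    print(f"4 x knee delay      : {4 * knee_delay(f_loop) * 1e9:.0f} ns")
    print(f"scaling factor      : {math.sqrt(4 * math.pi * f_loop):.0f}")
    print(f"ideal-ref jitter    : {sigma_x * 1e12:.2f} ps")
    print(f"kappa               : {kappa:.3e} sqrt(s)")
    print(f"period jitter       : {sigma_t * 1e12:.3f} ps")
```

With a 2-MHz bandwidth the sketch gives a measurement delay of about 318 ns and a scaling factor of about 5013, in line with the 320 ns and "approximately 5000" stated in the paper. The remaining printed values depend entirely on the assumed relations and should not be read as the paper's reported results.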

REFERENCES

[1] K. Bult, “Analog broadband communication circuits in pure digital deep sub-micron CMOS,” in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 76–77.
[2] H. Neuteboom, B. M. J. Kup, and M. Janssens, “A DSP-based hearing instrument IC,” IEEE J. Solid-State Circuits, vol. 32, pp. 1790–1806, Nov. 1997.
[3] W. Lee, P. E. Landman, B. Barton, S. Abiko, H. Takahashi, H. Mizuno, S. Muramatsu, K. Tashiro, M. Fusumada, L. Pham, F. Boutaud, E. Ego, G. Gallo, H. Tran, C. Lemonds, A. Shih, M. Nandakumar, R. H. Eklund, and I. C. Chen, “A 1-V programmable DSP for wireless communication,” IEEE J. Solid-State Circuits, vol. 32, pp. 1766–1776, Nov. 1997.
[4] F. M. Gardner, Phase-Lock Techniques. New York: Wiley, 1979.
[5] R. J. Baumert, P. C. Metz, M. E. Pedersen, R. L. Pritchett, and J. A. Young, “A monolithic 50–200 MHz CMOS clock recovery and retiming circuit,” in Proc. IEEE Custom Integrated Circuits Conf., 1989, pp. 14.5.1–4.
[6] K. M. Ware and C. G. Sodini, “A 200-MHz CMOS phase-locked loop with dual phase detectors,” IEEE J. Solid-State Circuits, vol. 24, pp. 1560–1568, Dec. 1989.
[7] J. Sonntag and R. Leonowich, “A monolithic CMOS 10 MHz DPLL for burst-mode data retiming,” in Proc. IEEE Int. Solid-State Circuits Conf., 1990, pp. 194–195.
[8] M. Horowitz, A. Chan, J. Cobrunson, J. Gasbarro, T. Lee, W. Leung, W. Richardson, T. Thrush, and Y. Fujii, “PLL design for a 500 MB/s interface,” in Proc. IEEE Int. Solid-State Circuits Conf., 1993, pp. 160–161.
[9] S. Sidiropoulos and M. A. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[10] S. Kasturia, “A novel fractional divider for improving the switching speed of phase-locked frequency synthesizers,” Bell Labs Tech. Memo., May 1995.
[11] J. G. Maneatis, personal communication, Feb. 1999.
[12] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, “A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 megabytes/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[13] T. H. Hu and P. R. Gray, “A monolithic 480 Mb/s parallel AGC/decision/clock-recovery circuit in 1.2 μm CMOS,” IEEE J. Solid-State Circuits, vol. 28, pp. 1314–1320, 1993.
[14] C.-H. Park and B. Kim, “A low-noise 900 MHz VCO in 0.6 μm CMOS,” IEEE J. Solid-State Circuits, vol. 34, pp. 1586–1591, May 1999.
[15] J. Lee and B. Kim, “A 250 MHz low jitter adaptive bandwidth PLL,” in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 346–347.
[16] A. Hajimiri and T. H. Lee, “A general theory of phase noise in electrical oscillators,” IEEE J. Solid-State Circuits, vol. 33, pp. 179–195, Feb. 1998.
[17] K. Iravani, F. Saleh, D. Lee, P. Fung, P. Ta, and G. Miller, “Clock and data recovery for 1.25 Gb/s Ethernet transceiver in 0.35 μm CMOS,” in Proc. Custom Integrated Circuits Conf., 1999, pp. 261–264.
[18] V. von Kaenel, D. Aebisher, C. Piguet, and E. Dijkstra, “A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation,” in Proc. IEEE Int. Solid-State Circuits Conf., 1996, pp. 132–133.
[19] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.
[20] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Communications, vol. COM-28, pp. 1849–1858, Nov. 1980.
[21] M. Johnson and E. Hudson, “A variable delayline PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. SC-23, pp. 1218–1223, Oct. 1988.
[22] P. Larsson and J.-Y. Lee, “A 400 mW 50–380 MHz CMOS programmable clock recovery circuit,” in Proc. IEEE ASIC Conf. Exhibit, 1995, pp. 271–274.
[23] I. Bloom and Y. Nemirovsky, “1/f noise reduction of metal-oxide-semiconductor transistors by cycling from inversion to accumulation,” Appl. Phys. Lett., vol. 58, no. 15, pp. 1664–1666, Apr. 1991.
[24] S. L. J. Gierkink, E. A. M. Klumperink, T. J. Ikkink, and A. J. M. van Tuijl, “Reduction of intrinsic 1/f device noise in a CMOS ring oscillator,” in Proc. IEEE European Solid-State Circuits Conf., 1998, pp. 272–275.
[25] J. Crols and M. Steyaert, “Switched-opamp: An approach to realize full CMOS switched-capacitor circuits at very low power supply voltages,” IEEE J. Solid-State Circuits, vol. 29, pp. 936–942, Aug. 1994.
[26] A. M. Abo and P. R. Gray, “A 1.5 V, 10-bit, 14 MS/s CMOS pipeline analog-to-digital converter,” IEEE J. Solid-State Circuits, vol. 34, pp. 599–606, May 1999.
[27] S. Kim, K. Lee, Y. Moon, D-K. Jeong, Y. Choi, and H. K. Lim, “A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.
[28] D. W. Boerstler and K. A. Jenkins, “A phase-locked loop clock generator for a 1 GHz microprocessor,” in Proc. IEEE Symp. VLSI Circuits, 1998, pp. 212–213.
[29] H. O. Johansson, “A simple precharged CMOS phase frequency detector,” IEEE J. Solid-State Circuits, vol. 33, pp. 295–298, Feb. 1998.
[30] P. Larsson, “High-speed architecture for a programmable frequency divider and a dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 31, pp. 744–748, May 1996.
[31] J. A. McNeill, “Jitter in ring oscillators,” IEEE J. Solid-State Circuits, vol. 32, pp. 870–879, June 1997.
[32] P. Larsson, “A 2–1600 MHz 1.2–2.5 V CMOS clock-recovery PLL with feedback phase-selection and averaging phase-interpolation for jitter reduction,” in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 356–357.
[33] J. F. Ewen et al., “Single-chip 1062 Mbaud CMOS transceiver for serial data communication,” in Proc. IEEE Int. Solid-State Circuits Conf., 1995, pp. 32–33.
[34] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and phase noise in ring oscillators,” IEEE J. Solid-State Circuits, vol. 34, pp. 790–804, June 1999.
[35] I. Novof, J. Austin, R. Chmela, T. Frank, R. Kelkar, K. Short, D. Strayer, M. Styduhar, and S. Watt, “Fully-integrated CMOS phase-locked loop with 15–240 MHz locking range and 50 ps jitter,” in Proc. IEEE Int. Solid-State Circuits Conf., 1995, pp. 112–113.
[36] Z.-X. Zhang, H. Du, and M. S. Lee, “A 360 MHz 3 V CMOS PLL with 1 V peak-to-peak power supply noise tolerance,” in Proc. IEEE Int. Solid-State Circuits Conf., 1996, pp. 134–135.
[37] I. A. Young, M. F. Mar, and B. Bhushan, “A 0.35 μm CMOS 3–880 MHz PLL N/2 clock multiplier and distribution network with low jitter for microprocessors,” in Proc. IEEE Int. Solid-State Circuits Conf., 1997, pp. 330–331.
[38] A. Demir, A. Mehrotra, and J. Roychowdhury, “Phase noise in oscillators: A unifying theory and numerical methods for characterization,” in Proc. ACM/IEEE Design Automation Conf., June 1998, pp. 26–31.
[39] Q. Huang, “On the exact design of RF oscillators,” in Proc. IEEE Custom Integrated Circuits Conf., 1998, pp. 41–44.
[40] P. Kinget, “Integrated GHz voltage controlled oscillators,” in Proc. Advances in Analog Circuit Design, Nice, France, Mar. 1999.

Patrik Larsson received the Ph.D. degree from Linköping University, Sweden, in 1995. During his Ph.D. research, he investigated the inherent analog properties of digital circuits, such as di/dt noise, clock skew, and clock slew rate. After graduation, he joined Bell Laboratories, where he is currently working on VCO’s and PLL’s for gigabit/second communication. He has also been working on low-power digital filtering for cable modems, equalization, and clock recovery structures while maintaining his interest in di/dt noise.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999 805

A 2.5-Gb/s Clock and Data Recovery IC with Tunable Jitter Characteristics for Use in LAN’s and WAN’s

Keiji Kishine, Member, IEEE, Noboru Ishihara, Member, IEEE, Ken-ichi Takiguchi, and Haruhiko Ichino, Member, IEEE

Abstract: A 2.5-Gb/s monolithic clock and data recovery (CDR) IC using the phase-locked loop (PLL) technique is fabricated using Si bipolar technology. The output jitter characteristics of the CDR can be controlled by the loop-gain design and by using the switched-filter PLL technique. The CDR IC can be used in local-area networks (LAN’s) and in long-haul backbone networks or wide-area networks (WAN’s). Its power consumption is only 0.4 W. For LAN’s, the jitter generation of the CDR when the loop gain is optimized is 1.2 ps (0.003 UI). The jitter characteristics of the CDR optimized for WAN’s meet all three types of STM-16 jitter specifications given in ITU-T Recommendation G.958. This is the first report on a CDR that can be used for both LAN’s and WAN’s. This paper also describes the design method of the jitter characteristics of the CDR for LAN’s and WAN’s.

Index Terms: Clock and data recovery (CDR), IC, jitter suppression, local-area network (LAN), low jitter, phase-locked loop (PLL), transmission receiver, wide-area network (WAN), 2.5 Gb/s.

Manuscript received August 19, 1998; revised February 8, 1999.
K. Kishine and H. Ichino are with NTT Network Innovation Laboratories, Yokosuka, Kanagawa 239-0847 Japan.
N. Ishihara is with NTT Opto-electronics Laboratories, Atsugi-shi, Kanagawa 243-01 Japan.
K. Takiguchi is with NTT Electronics Corp., Atsugi-shi, Kanagawa Pref. 243-0032 Japan.
Publisher Item Identifier S 0018-9200(99)04198-0.
0018–9200/99$10.00 © 1999 IEEE

I. INTRODUCTION

Optical communication systems, which are used in local-area networks (LAN’s) and wide-area networks (WAN’s), are expected to play an important role in realizing the future multimedia society. These systems must be compact, economical to produce, and efficient in terms of power consumption. Given these requirements, researchers have been developing low-power and small-size optical receiver/sender (OR/OS) modules. A clock and data recovery (CDR) circuit is one of the key components of the OR, which must perform retiming, reshaping, and regeneration (3R) operation. To ensure that the receivers have low power consumption and are cost-effective and compact, it is essential to employ a single-chip, adjustment-free CDR IC using the phase-locked loop (PLL) technique without any high-Q components.

A number of approaches have been proposed for developing a CDR IC using the PLL technique [1], [2]. Generally, the jitter specifications for the CDR differ depending on what it is being used for, and jitter suppression is an especially serious problem in CDR-IC design. There are different jitter specifications for the following two applications.

Case 1) LAN’s such as gigabit/second ethernets, fiber channels, and other optical interconnections. They use a single span of a transmission medium.

Case 2) Backbone networks or WAN’s such as synchronous digital hierarchy (SDH) or synchronous optical network. They use line regenerators to transport information over long distances.

For case 1), the CDR must suppress mainly the jitter generated due to noise in the CDR, so-called jitter generation. In case 1), there is no jitter accumulation due to cascaded regenerators. For case 2), however, the ITU-T G.958 recommendation for SDH stipulates other specifications [3]: a) jitter transfer specification, which is the criterion of the suppression of the noise in input signals to line regenerators, and b) jitter tolerance specification.

This paper describes a 2.5-Gb/s CDR that can be used in both cases, which eliminates the need to fabricate two chips with different characteristics. The key design techniques are based on the switched-filter (SF) PLL technique and loop gain adjustment using a gain control amplifier (GCA) circuit. The CDR IC is fabricated using 0.5-μm Si bipolar technology. The loop gain and loop bandwidth can be adjusted using a control signal from outside the chip. For case 1), the rms jitter generation of the CDR can be reduced to only 1.2 ps, and the capture range is 150 MHz. For case 2), the jitter of the CDR meets the jitter specifications of the ITU-T G.958 recommendation. The rms jitter generation is 3.6 ps, and the capture range is 50 MHz. The power consumption of the CDR for both cases is only 0.4 W.

In Section II, the concept of the suppression of jitter in each case is discussed. It is explained that the SF PLL technique can be used in the CDR for both cases. Design details and the configurations of the circuits of the CDR are given in Section III. Section IV discusses the experimental results, which show that the CDR has very good jitter characteristics, and discusses the feasibility of using the CDR for various transmission systems.

II. CONCEPT FOR JITTER-SUPPRESSION DESIGN

A. Jitter Characteristics of CDR Using the PLL Technique

Generally, output jitter of a CDR based on the PLL technique can be caused by two kinds of sources: 1) additive noise that accompanies the input signal [Fig. 1(a)] and 2)

noise generated in the CDR [Fig. 1(b)]. In Fig. 1(b), two cases [noise forward and present for voltage-controlled oscillator (VCO)] have been shown to be equivalent in terms of VCO-phase fluctuation [4]. In addition, because the phase drift of the VCO output due to the random input data is random, Fig. 1(b) gives a rough approximation of the noise due to the input data pattern, with no jitter applied.

Fig. 1. Noise source (a) in input signal and (b) in PLL.

To suppress the jitter caused by additive noise, the CDR should be designed so that the noise bandwidth of the PLL is minimized. The output jitter (the phase deviation [in rad]) of the CDR using the PLL technique is expressed as [4]

(1)

where is the power spectral density of noise, is the input signal amplitude, is the natural angular frequency, is the damping factor, and is the loop gain. When the loop gain becomes larger, the jitter becomes larger, as shown in the Appendix. This means that smaller loop gain causes narrow noise bandwidth, thereby suppressing jitter. It should be noted that smaller loop gain leads to a smaller cutoff frequency of the jitter transfer function of a PLL. On the other hand, in order to suppress the jitter caused by noise generated in the CDR circuit, the operation of the CDR circuit needs to be made stable. This stability can be obtained by reducing the signal fluctuation in the CDR circuit caused by the input of consecutive data bits, device noise, and so on. This is the so-called suppression of jitter generation, which is specified for SDH in ITU-T Recommendation G.958. The output jitter (in rad) in this case (jitter generation) is expressed as [4]

(2)

where is the power spectral density of noise. This equation is derived assuming that the instantaneous frequency deviation of the VCO output is caused by disturbance due to random phase noise. In this equation, when the loop gain becomes larger, the jitter becomes smaller, as shown in the Appendix. In other words, the jitter increases as the loop gain decreases. Larger loop gain can reduce the jitter caused by the noise in the CDR. However, larger loop gain results in a larger cutoff frequency of the jitter transfer function.

Consequently, there is a tradeoff, as shown in Fig. 2, between reducing the jitter shown in Fig. 1(a) (equivalent to reducing the cutoff frequency of the jitter transfer function) and reducing the jitter shown in Fig. 1(b) (jitter generation). When the jitter of Fig. 1(a) is dominant, the loop gain must be controlled to be lower (area 1 in Fig. 2). When the jitter of Fig. 1(b) is dominant, it must be controlled to be higher (area 2 in Fig. 2). The optimum loop gain depends on which type of jitter has a greater effect on the output jitter of the CDR and causes degradation of transmission quality in the system.

Fig. 2. Loop-gain dependence of jitter.

B. Design of the CDR

Given the previous discussion, it is clear that there should be two types of CDR design, one for LAN’s and another for WAN’s.

1) CDR for LAN’s: In the case for LAN’s, the jitter from input signals is small because there is no jitter accumulation through the short and single laser-fiber-receiver span. We can therefore concentrate on reducing the jitter generation, which is caused by the input-signal-pattern dependence of the circuit, the fluctuation of the supply voltage, and device noise in the CDR. As described in Section II-A, this design should not utilize smaller loop gain to lower the cutoff frequency, but instead should utilize larger loop gain to achieve smaller output jitter.

2) CDR for WAN’s: In the case for backbone networks or WAN’s, the regenerator may be cascaded in order to transport information over long distances, causing the jitter to accumulate. Therefore, not only the jitter generation of the CDR has to be taken into consideration but also the jitter transfer characteristics, which is the criterion of suppression of noise in input data signals as given in ITU-T Recommendation G.958. There is a tradeoff between reducing the jitter generation and reducing the cutoff frequency of the jitter transfer function.

a) Jitter transfer: The loop gain of the CDR IC using the PLL technique, on an IC whose jitter transfer specifications meet those of ITU-T Recommendation G.958, must be designed to be lower. The jitter transfer function of the 2.5-Gbit/s PLL using a lag-lead filter can be expressed by substituting the
KISHINE et al.: CDR IC FOR LAN’S AND WAN’S 807

phase transfer function for the jitter transfer function as

(3)

This function is plotted in Fig. 3. The curve is obtained when the time constants of the lag-lead filter described in the Appendix are set so that the natural angular frequency becomes nearly 2 MHz and the damping factor is larger than the value at which the jitter gain peaking is less than 0.1 dB. Fig. 3, in which it is stipulated that the loop gain is lower than 3.2 10 (1/s), indicates that the curve of the smaller loop gain meets the ITU-T (STM-16) specification.

Fig. 3. Jitter transfer function.

b) Jitter generation: As described in Section II-A, as the loop gain becomes small, it becomes more difficult to suppress the jitter generation because of the tradeoff between reducing jitter generation and reducing the cutoff frequency of the jitter transfer function. To solve this problem, we introduce the SF PLL technique. Fig. 4 shows the SF CDR configuration, which we originally proposed as a way to maintain a precise clock signal, thereby achieving tolerance to the input of consecutive data bits. (Our previous work, the 156-Mb/s SF CDR [5], has no GCA in Fig. 4.) The main features of SF circuit operation are that the PC output is transferred to the low-pass filter (LPF) only when data transitions occur (sample mode) and that the LPF output is held constant during consecutive data inputs (hold mode). These features prevent the phase drift of the VCO output during the input of consecutive data bits. We thought that the equivalent high-Q operation of the SF circuit could be utilized to reduce the jitter generation. Fig. 5 shows simulation results of the change in differential output voltage of the loop filter when the input signal changes from a 1/0-repeated bitstream to consecutive data bits (in this case, “0”) at 805 ns. Fig. 5 shows that the filter output of the SF CDR levels out, while that of the CDR without the SF circuit begins to degrade at 805 ns. This means that the operation of the SF circuit is equivalent to that of a loop filter with a larger RC time constant, and the jitter generation due to the input signal pattern would be more suppressed than that of the CDR without an SF circuit. In other words, the SF circuit would provide equivalent high-Q operation and achieve low jitter operation.

We also thought this advantage of the SF circuit could be used to solve the tradeoff problem. Fig. 6 shows the SPICE simulation results of the cutoff-frequency (of the jitter transfer curve) dependence of the jitter generation of the 2.5-Gb/s CDR’s both with the SF circuit and without it. Both curves indicate that the jitter generation decreases as the cutoff frequency increases. Furthermore, the jitter generation of the SF CDR is 70% lower than that of the CDR without the SF circuit. It is noteworthy that the suppression of jitter generation by the SF circuit is marked. In addition, the jitter characteristics of the SF CDR meet the STM-16 specifications (rms jitter generation that is lower than 4 ps, and equivalent jitter transfer specification in which the cutoff frequency at 3-dB jitter gain is lower than about 2.8 MHz), while the CDR circuit without the SF circuit fails to meet the specifications.

In the SPICE simulation, the source of jitter is the instability of the circuit operation of the CDR, with no jitter applied at random input. In the experimental results, the device noise, the fluctuations of supply voltage, and so on are also noise sources. The jitter generation in experiments is therefore larger than that in the simulation results. The simulation results do, however, show the characteristics of jitter generation versus jitter transfer functions.

Given our findings, we conclude that in the design of the CDR IC using the PLL technique, an IC used in backbone networks or WAN’s, the loop gain must be large enough so that the jitter generation meets the ITU-T specs, yet small enough so that suitable jitter transfer characteristics are obtained.

III. CIRCUIT DESIGN

A block diagram of the CDR, including a GCA circuit between the SF and VCO, is shown in Fig. 4 [6]. This CDR can be used in both short- and long-distance transmission systems by adjusting the loop gain through the GCA from outside the chip. The main features of the CDR are 1) an SF circuit for equivalent high-Q operation [5], 2) optimum timing adjustment between extracted clock and input data, and 3) loop gain control on optimization.

A. VCO

In Fig. 7, the circuit configuration of the VCO is shown. The oscillation frequency is controlled by the voltage swing of “VC1, VC2,” which is determined by the feedback signal from the GCA. The free-running frequency is determined by the currents IF1 and IF2, which can be controlled from outside the chip. The free-running frequency can be adjusted from 2.2 to 2.8 GHz. It covers the free-running frequency deviation caused by fluctuations in device performance. Fig. 8 shows the tuning-voltage (feedback-voltage) dependence of the oscillation frequency. The simulation results are in good agreement with the experimental results. The VCO modulation frequency sensitivity is designed to be about 1 GHz/V. The oscillation frequency range by a feedback signal is 200 MHz. The tuning-range diagram is shown in Fig. 9. The total tunable range is sufficiently wide, from 2.0 to 3.0 GHz.
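As a quick consistency check on the VCO figures above, the following minimal sketch combines the free-running adjustment range with the feedback-controlled range. It assumes that the 200-MHz feedback range acts as a symmetric ±200-MHz deviation around the free-running frequency and that the tuning curve is linear at about 1 GHz/V; both are interpretations rather than statements from the measurements, and the function names are illustrative only.

```python
def total_tuning_range(f_free_min_hz, f_free_max_hz, feedback_dev_hz):
    """Overall tunable range when the feedback signal can move the VCO by
    +/- feedback_dev_hz around any free-running setting (assumed symmetric)."""
    return f_free_min_hz - feedback_dev_hz, f_free_max_hz + feedback_dev_hz


def control_swing_v(feedback_dev_hz, kvco_hz_per_v):
    """Control-voltage excursion needed for a given frequency deviation,
    assuming a linear tuning curve with slope kvco_hz_per_v."""
    return feedback_dev_hz / kvco_hz_per_v


if __name__ == "__main__":
    lo, hi = total_tuning_range(2.2e9, 2.8e9, 200e6)
    print(f"total range: {lo / 1e9:.1f} to {hi / 1e9:.1f} GHz")
    print(f"control swing: +/-{control_swing_v(200e6, 1e9):.2f} V")
```

Under these assumptions the free-running range of 2.2 to 2.8 GHz widens to the quoted total of 2.0 to 3.0 GHz, and the feedback path needs a control excursion of roughly ±0.2 V.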
808 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

Fig. 4. SF CDR with GCA.

Fig. 5. Output voltage of low-pass filter (differential output).


Fig. 6. Cutoff-frequency dependence of jitter generation.

B. GCA

A current-bypass GCA circuit is used in the CDR (see Fig. 10). The gain of the GCA, which can be controlled from outside the chip, can be varied from 40 to 0 dB. To lower the jitter generation of the CDR, the gain should be higher. On the other hand, to achieve the lower cutoff frequency of the jitter transfer curve, it must be lower. Therefore, gain should be adjusted according to the jitter specification in each case.

C. Delay Circuit

To reduce jitter generation, the edge-inclined circuit in the 90°-delay block shown in Fig. 4, which includes a capacitor for delay control [5], is replaced by a chain of emitter-coupled logic (ECL) buffer circuits without capacitors, the delay of which can be adjusted from about 100 to 300 ps from outside the chip. The edge-inclined circuit was employed to make the 156-Mbit/s CDR smaller. But the delay needed for the 2.5-Gbit/s CDR is only 200 ps, which is much smaller than that needed for the 156-Mbit/s CDR. Therefore, only a small number of ECL circuits are needed for the 200-ps delay, and the delay circuit itself is relatively small. In addition, the ECL circuit is free of the input-data-pattern dependence that arises from the capacitor response in the edge-inclined circuit. When the ECL delay circuit is used, the simulated jitter due to the input data pattern is about 80% of that when an edge-inclined-delay circuit is used. As a result of using the ECL circuit, the jitter due to the input pattern effect is more suppressed than in the circuit reported in our previous work [5].

D. Loop Filter

The lag-lead filter is used as the loop filter, and the RC time constant is adjusted for each use. An additional capacitor outside the chip is not needed when the CDR is used for short-distance transmission systems. It is, however, needed for long-distance use.

E. Other Considerations

Furthermore, in order to guarantee jitter tolerance, it is important to maintain an optimum timing adjustment between

the extracted clock and the input data. This timing adjustment is attained by allowing the clock to trigger the center of the data period by means of the phase-comparator (PC) output signal generated from the comparison between the phase of the delay flip-flop output and the 90°-delayed phase for the input data. In addition, in order to lower the power consumption, the new 2.5-Gb/s CDR IC uses stacked differential pairs on two levels, which enables its supply voltage to be decreased to 3.0 V (as opposed to 5.2 V in our previous work [5]). Furthermore, current dissipation is optimized in each block to reduce power consumption.

Fig. 7. VCO configuration.
Fig. 8. VCO tuning curve.
Fig. 9. Range of controllable oscillation frequency.
Fig. 10. GCA configuration.
Fig. 11. GCA gain dependence of jitter generation.

IV. EXPERIMENTAL RESULTS

A new chip was fabricated using the 0.5-μm super self-aligned process technology Si bipolar process [7]. It was

mounted in a 7 × 7 mm²-square ceramic package. The CDR IC was evaluated at both high and low loop gain, with the gain adjusted for short- and long-distance transmission systems, respectively. Jitter was measured with a commercial jitter analyzer. The analyzer reports an rms jitter-generation value from which the jitter of the input data has been subtracted.

A. SF CDR for LAN

The internal capacitor in the loop filter is 10 pF, and an external capacitor is not needed in this case. The GCA gain dependence of the jitter generation is shown in Fig. 11. The jitter generation decreases as the gain increases, and the lowest point is when the gain is larger than about 8 dB. The loop gain at this point is 1.2 10 (1/s). The output waveforms when the loop gain is set to that point and the input data rate is 2.48832 Gb/s are shown in Fig. 12. The eye opening of the output data was sufficiently wide, and clock extraction was very precise. The rms jitter generation is 1.2 ps and the capture range is over 150 MHz.

Fig. 12. Output waveforms of the CDR for LAN. (a) Data output. (b) Clock output.
Fig. 13. Output waveforms of the CDR for WAN. (a) Data output. (b) Clock output.

B. SF CDR for WAN

To lower the cutoff frequency of the jitter transfer curve, an external capacitor of 0.1 μF is added to the loop filter. The loop gain is set to about 2 10 (1/s), which is the loop gain when the jitter transfer curve meets the ITU-T jitter transfer specification in Fig. 3.

The output waveforms when the loop gain is set to the point above are shown in Fig. 13. Again, the eye opening of the output data was sufficiently wide, and clock extraction was very precise. The rms jitter generation is 3.6 ps, which is larger than that of the CDR when its loop gain is adjusted for

short-distance transmission systems, but is smaller than the specification of the jitter generation of 4.0 ps (for STM-16; ITU-T Recommendation G.958). The capture range is over 50 MHz. Fig. 14 shows the measured jitter transfer function of the CDR in this case. The curve meets the ITU-T G.958 specification. Fig. 15 shows the jitter tolerance curve when the input jitter magnification is 120% of the ITU-T specification. The squares indicate error-free operation (where the error rate is lower than 10 ). The rms jitter generation, jitter tolerance, and jitter transfer function all meet the jitter specifications in ITU-T G.958.

The relationship between the measured jitter generation (or cutoff frequency) and the loop gain in this experiment is shown in Fig. 16. The darker shaded area marks the region in which the jitter characteristics of the CDR meet the specifications of ITU-T Recommendation G.958. In the area of larger loop gain, the jitter generation becomes small. Fig. 16 shows clearly that, when its loop gain is optimized, the CDR IC is suitable for both LAN’s and WAN’s. The capture range of both types of CDR’s is wide enough to cover the deviation in the free-running frequency due to changes in temperature (ranging from 5 to 90 °C). In addition, the power consumption (including that of the I/O circuit) in both cases is less than 35% of that in the 2.5-Gb/s PLL’s reported previously [1], [2].

Fig. 14. Jitter transfer function.
Fig. 15. Jitter tolerance curve.
Fig. 16. Loop-gain dependence of jitter generation and cutoff frequency of jitter transfer function.

V. CONCLUSION

The design method of the CDR for both LAN’s and WAN’s is presented. A new 2.5-Gb/s SF monolithic CDR IC using the 0.5-μm Si bipolar process has been developed. The CDR IC can be used in the transmission receivers for both LAN’s and WAN’s. The rms jitter generation of the CDR adjusted for LAN’s is 1.2 ps. Furthermore, the jitter characteristics of the CDR for backbone networks or WAN’s meet the specifications for STM-16 given in ITU-T Recommendation G.958. In addition, the power consumption of the CDR is only 0.4 W.

APPENDIX

A. Loop-Gain Dependence of the Jitter Shown in Fig. 1

The jitter due to the noise in the input signal to the PLL is expressed as (1). When the loop filter is a lag-lead type (with a series resistor, a shunt resistor, and a shunt capacitance), the natural angular frequency and

the damping factor are expressed as

where and . The term in (1) therefore becomes

(A1.1)

This shows that this term increases as the loop gain increases. Furthermore, a second quantity in (1) can be expressed as

(A1.2)

This also shows that the second quantity increases. Therefore, the jitter in Fig. 1 increases as the loop gain increases.

B. Loop-Gain Dependence of the Jitter Shown in Fig. 2

The jitter due to the noise in the PLL can be expressed as (2). In (2), the term is

(A2.1)

The term decreases as the loop gain increases. In addition, a second term is expressed as

(A2.2)

The second term also decreases as the loop gain increases. Therefore, the jitter in Fig. 2, expressed in (2), decreases as the loop gain increases.

ACKNOWLEDGMENT

The authors wish to thank H. Yoshimura and K. Sato for their helpful discussions and suggestions.

REFERENCES

[1] H. Ransjin and P. O’Conner, “A PLL-based 2.5b/s GaAs clock and data regenerator IC,” IEEE J. Solid-State Circuits, vol. 26, pp. 1345–1353, Oct. 1991.
[2] R. Walker, C. Stout, and C.-S. Yen, “A 2.488Gb/s Si-bipolar clock and data recovery IC with robust loss of signal detection,” in ISSCC Dig. Tech. Papers, 1997, pp. 246–247.
[3] “Digital Line Systems Based on the Synchronous Digital Hierarchy for Use on Optical Fiber Cables,” CCITT Rec. G.958.
[4] A. Blanchard, Phase-Locked Loops. New York: Wiley, 1976, ch. 8.
[5] N. Ishihara and Y. Akazawa, “A monolithic 156Mb/s clock and data recovery PLL circuit using the sample-and-hold technique,” IEEE J. Solid-State Circuits, vol. 29, pp. 1566–1571, Dec. 1994.
[6] K. Kishine, N. Ishihara, and H. Ichino, “Jitter-suppressed low-power 2.5-Gbit/s clock and data recovery IC without high-Q components,” Electron. Lett., vol. 33, no. 18, pp. 1545–1546, Aug. 1997.
[7] C. Yamaguchi, Y. Kobayashi, M. Miyake, K. Ishii, and H. Ichino, “A 0.5 μm bipolar technology using a new base formation method,” in Proc. BCTM, 1993, pp. 63–66.

Keiji Kishine (M’98) was born in Kyoto, Japan, on October 26, 1964. He received the B.S. and M.S. degrees in engineering science from Kyoto University, Kyoto, in 1990 and 1992, respectively.
In 1992, he joined the Electrical Communication Laboratories, Nippon Telegraph and Telephone Corp. (NTT), Tokyo, Japan. At the NTT System Electronics Laboratories, Kanagawa, Japan, he was engaged in research and design of high-speed, low-power circuits for Gbit/s LSI’s using Si-bipolar transistors with application to optical communication systems. Since 1997, he has worked on research and development of Gbit/s clock and data recovery IC at the Photonic Network Laboratory, NTT Network Innovation Laboratories, Kanagawa, Japan.
Mr. Kishine is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan.

Noboru Ishihara (M’89) was born in Gunma, Japan, on April 27, 1958. He received the B.S. degree in electrical engineering from Gunma University, Gunma, in 1981 and the Dr.Eng. degree from the Tokyo Institute of Technology, Tokyo, Japan, in 1997.
In 1981, he joined the Electrical Communication Laboratory, NTT, Tokyo, where he has been engaged in research and development of analog IC’s for communication use. His recent work is in the area of low-power and high-speed analog IC’s for optical communications.
Mr. Ishihara is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan and the IEEE Microwave Theory and Techniques Society.

Ken-ichi Takiguchi was born in Kanagawa, Japan, on July 21, 1969. He graduated from Tokyo Computer School, Tokyo, Japan, in 1992.
In 1992, he joined NTT Electronics Corp., Kanagawa. He has been engaged in development of high speed IC’s, especially gigabit/second PLL IC’s.

Haruhiko Ichino (M’89) was born in Yamaguchi, Japan, on January 26, 1957. He received the B.S., M.S., and Ph.D. degrees in applied physics from Osaka University, Osaka, Japan, in 1979, 1981, and 1994, respectively.
In 1981, he joined the Electrical Communication Laboratories, Nippon Telegraph and Telephone Corp. (NTT), Tokyo, Japan. He has been engaged in research and development of Gbit/s SSI-MSI’s using bipolar transistors (Si bipolar transistor and AlGaAs/GaAs HBT), with application to Gbit/s optical communication systems and high-frequency satellite communication systems. His work also includes low-power Gbit/s LSI’s for SDH networks and future ATM switching systems; and O-E, E-O converter modules. During this research and development, he also worked on the modeling of a high-speed bipolar transistor, analyzing ECL gate delay and maximum operating speed of GHz flip-flop, and high-speed design methodology based on gate-array and standard-cell approaches. His interests include high-speed packaging and measurement systems. Since 1997, he has worked on research and development of Gbit/s-interface hardware design of photonic transport network systems. Currently, he is a Senior Research Engineer, Supervisor, and Photonic Network Systems Research Group Leader of the Photonic Network Laboratory, NTT Network Innovation Laboratories, Kanagawa, Japan. He was a Visiting Lecturer at Osaka University during 1995–1996.
Dr. Ichino is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan. He was a Secretary of IEICE’s Technical Group on Integrated Circuits and Devices.
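The loop-gain tradeoff argued in Section II and in the Appendix can be explored numerically. The sketch below is not the authors' derivation. It uses the standard textbook model of a second-order PLL with a passive lag-lead filter, F(s) = (1 + s*tau2) / (1 + s*(tau1 + tau2)), and the closed-loop transfer function H(s) = K*F(s) / (s + K*F(s)). The loop gains and time constants are hypothetical values chosen only for illustration, and all function names are assumptions.

```python
import math


def natural_freq_and_damping(k, tau1, tau2):
    """Textbook expressions for a passive lag-lead loop filter:
    omega_n = sqrt(K / (tau1 + tau2)), zeta = (omega_n / 2) * (tau2 + 1 / K)."""
    omega_n = math.sqrt(k / (tau1 + tau2))
    zeta = 0.5 * omega_n * (tau2 + 1.0 / k)
    return omega_n, zeta


def jitter_gain(omega, k, tau1, tau2):
    """Magnitude of the closed-loop jitter transfer function at omega (rad/s)."""
    s = 1j * omega
    f = (1 + s * tau2) / (1 + s * (tau1 + tau2))
    return abs(k * f / (s + k * f))


def cutoff_3db(k, tau1, tau2, w_lo=1.0, w_hi=1e12):
    """Angular frequency where the jitter gain falls to 1/sqrt(2),
    found by bisection on a logarithmic frequency axis."""
    target = 1 / math.sqrt(2)
    for _ in range(200):
        w_mid = math.sqrt(w_lo * w_hi)
        if jitter_gain(w_mid, k, tau1, tau2) > target:
            w_lo = w_mid
        else:
            w_hi = w_mid
    return math.sqrt(w_lo * w_hi)


if __name__ == "__main__":
    tau1, tau2 = 1e-6, 2e-7          # hypothetical filter time constants (s)
    for k in (2e6, 1e7):             # hypothetical loop gains (1/s)
        wn, zeta = natural_freq_and_damping(k, tau1, tau2)
        fc = cutoff_3db(k, tau1, tau2) / (2 * math.pi)
        print(f"K={k:.1e}: fn={wn / (2 * math.pi) / 1e3:.0f} kHz, "
              f"zeta={zeta:.2f}, cutoff={fc / 1e3:.0f} kHz")
```

With these illustrative values the smaller loop gain yields the lower cutoff frequency, which is the direction of the tradeoff described for the jitter transfer specification.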
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 12, DECEMBER 2002 1781

A 10-Gb/s CDR/DEMUX With LC Delay Line VCO in 0.18-μm CMOS
Jonathan E. Rogers, Member, IEEE, and John R. Long, Member, IEEE

Abstract—A monolithic 10-Gb/s clock/data recovery and 1 : 2


demultiplexer are implemented in 0.18-μm CMOS. The quadrature LC delay line oscillator has a tuning range of 125 MHz and a 60-MHz/V sensitivity to power supply pulling. The circuit meets SONET OC-192 jitter specifications with a measured jitter of 8 ps p-p when performing error-free recovery of PRBS 2^31 − 1 data. Clock and data recovery (CDR) is achieved at 10 Gb/s, demonstrating the feasibility of a half-rate early/late PD (with tri-state) based CDR on 0.18-μm CMOS. The 1.9 × 1.5 mm² IC (not including output buffers) consumes 285 mW from a 1.8-V supply.
Index Terms—Bang–bang phase-locked loop, clock and data re-
covery, LC delay line, phase detector, SONET OC-192, voltage-
controlled oscillator.
Manuscript received April 8, 2002; revised June 25, 2002. This work was supported by Micronet, NSERC, and the Nortel Institute at the University of Toronto.
J. E. Rogers was with the RF/MMIC Group, Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada. He is now with Inphi Corporation, Westlake Village, CA 91361 USA (e-mail: jrogers@inphi-corp.com).
J. R. Long was with the RF/MMIC Group, Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada. He is now with the Electronics Research Laboratory, Delft University of Technology, 2628CD Delft, The Netherlands (e-mail: j.r.long@its.tudelft.nl).
Digital Object Identifier 10.1109/JSSC.2002.804337
0018-9200/02$17.00 © 2002 IEEE

I. INTRODUCTION

THE VOLUME of data transported over the telecommunications network increased at a compounded annual rate of 100% from 1995 to 2001 in the U.S. (and since 1997 in Europe), mainly due to increased internet traffic [1]. Contrast this with the historic demand for bandwidth, which grew at an annual rate of between 6% and 10% before the mid-1990’s. The call for technologies, such as interface electronics, which expand the capacity of fiber-based transport links to 10 Gb/s (and beyond) has risen in response to this explosion in data traffic. Systems at 10 Gb/s per channel are currently implemented in either OC-192 or STM-64 formats using the synchronous optical network (SONET OC-192) or European synchronous digital hierarchy (SDH STM-64), respectively.

A typical Gb/s receiver is shown in Fig. 1. It uses a photodiode to convert the incoming nonreturn to zero (NRZ) optical pulses from a single 10-Gb/s fiber channel to a current. These current pulses are then converted by the transimpedance (TZ) amplifier into a voltage. The pulses are low-pass filtered to remove out-of-band noise, thereby improving the received signal-to-noise ratio (SNR). An automatic gain control amplifier (AGC) provides additional amplification while compensating for variations in the received signal power. The clock required to retime the incoming synchronous data stream is recovered from the received data by the clock/data recovery block (CDR), and the received data are regenerated. The regenerated data are then applied to a 1 : N time-division demultiplexer (DEMUX) to separate the 10-Gb/s stream into multiple, lower speed channels. Typical ratios for demultiplexing are 1 : 4, 1 : 8 and 1 : 16. These lower speed channels are then further processed by VLSI CMOS circuitry designed to comply with the SONET or SDH standards.

Fig. 1. Optical receiver architecture.

II. BANG–BANG CDR/DEMUX

This paper describes the implementation of an early/late (bang–bang) phase-locked loop (PLL) based CDR in 0.18-μm CMOS. For monolithic CDR implementations, a PLL implementation is preferred [2], [3] since it eliminates the narrowband (i.e., high-Q) resonators, used in direct extraction of a timing tone by filtering, that are difficult to integrate. Also, the use of CMOS technology is potentially advantageous from both manufacturability and cost perspectives [4], [5].

Linear and early/late (bang–bang) phase detectors (PD) have been used for Gb/s CDR implementations. A linear PD has an average output voltage that is directly proportional to the error between the data and clock phases. This linear relationship allows the loop to be designed using classical control theory. In contrast to a linear loop, the early/late phase detector outputs only signify whether the clock is early or late with respect to the ideal data sampling instant. A loop incorporating an early/late phase detector is nonlinear and has a jitter transfer bandwidth which varies with jitter amplitude. Therefore, characterizing a bang–bang loop requires time-domain numerical simulation, which appears to be a disadvantage. However, guaranteeing the stability of a linear phase-locked loop (PLL) often requires multiple design iterations, lengthening the time to manufacture. Early/late PD-based PLLs are quasi-digital systems and consequently they are more resistant to component and process

variations as well as noise. In addition, an early/late PD possesses intrinsic matching between the sampling and retiming phases, allowing operation at speeds where it is difficult to match the delay of a conventional analog phase detector to that of the retiming latch. It is this combination of robustness and speed that makes the bang–bang PD an excellent choice for an IC-based implementation.

Fig. 2 shows a block diagram of the CDR-PLL implemented in this work. Data is compared to the voltage-controlled oscillator (VCO) clock at the loop input. If the falling clock edge occurs before the data transition (early), the early/late phase detector outputs a voltage of one polarity. If the clock edge falls after the data transition (late), then the phase detector outputs a voltage of the opposite polarity. In the case where there is no data transition during the clock period, the phase detector outputs a “0” voltage. The PD output pulses are then filtered by integral and differential loop filter branches. An important parameter of the early/late phase detector based CDR is the bang–bang frequency step of the VCO,

(1)

which is set by the output voltage of the phase detector, the gain of the proportional branch of the loop, and the tuning gain of the VCO in Hz/V.

Fig. 2. Early/late (bang–bang) phase detector-based CDR.

A. System-Level Design

Characterization of early/late (bang–bang) PLL behavior is described by Walker [2] and analysis of its application to SONET compatible CDR systems is found in the publication by Greshishchev [3].

At the system level, design of the bang–bang CDR is reduced to the selection of two main system parameters. The first is the bang–bang frequency step, which (somewhat inconveniently) simultaneously governs jitter tolerance performance, the related jitter transfer bandwidth, and jitter generation. Optimal jitter generation is realized by setting the bang–bang frequency step to the lowest value which still meets the jitter tolerance specification. A bang–bang frequency step of 3 MHz was selected, which includes some margin so that the CDR exceeds the jitter tolerance specification for OC-192.

The second system parameter is the loop stability factor, defined by Walker [2] as

(2)

in terms of the gain of the integral branch of the loop, the number of bit periods of latency around the loop, and the unit interval (the length of the bit period in seconds).

This stability factor is the ratio of proportional and integral path gains with a correction for the amount of latency around the loop. A stability factor greater than unity ensures stability while the loop is not slew-rate limited in the phase domain. It should be noted that the bandwidth of the jitter transfer function (JTF) for a bang–bang loop is inversely proportional to the input jitter amplitude. A JTF for jitter amplitudes in excess of 1 UI cannot be defined, as the loop will slip cycles when such large jitter inputs are not tracked. To ensure that the loop JTF does not exhibit peaking, the stability factor is made large enough so that the proportional loop dominates during slew limiting (slope overload) at jitter inputs less than 1 UI p-p. Therefore, this design implements a large stability factor ( 1 10 ) using interchangeable (off-chip) loop filter capacitors.

B. CDR Architecture

Fig. 3 shows the CDR/Demux test chip at the block level. Differential 10-Gb/s data are brought on-chip through the DATA pin and its complement, which are broad-band matched to 50 Ω. The half-rate early/late phase detector compares the data falling edge with the phase of the quadrature clock. The phase detector produces no output if a falling edge of the data is not present, otherwise an early or late pulse is produced. The phase information generated by the phase detector travels down two separate paths. The direct path to the oscillator provides high frequency proportional (bang–bang) control for the system. Through this tuning path, the early/late phase detector outputs translate into positive and negative frequency steps from the oscillator center frequency. If early or late pulses are not present, the oscillator center frequency remains unchanged.

The second tuning path leads to the input of the charge pump. The charge pump adds or removes charge from a capacitor for early and late inputs, respectively. When the phase detector result is neither early nor late, the charge pump does not alter the charge on the capacitor (charge pump tri-state). The voltage across this capacitor is used for low-frequency tuning of the VCO, thus completing the integral control loop. An on-chip capacitor serves to reduce glitches caused by the probe or bond inductances between the chip and external circuitry.

When the PLL has settled into lock, the in-phase clock of the oscillator is aligned with the ideal data sampling instant. This allows the 1 : 2 DEMUX to divide the data into two streams, and these signals are buffered off-chip so that the data recovery properties of the chip can be analyzed. Although this is significantly less demultiplexing than in an actual 10-Gb/s application, it is sufficient to demonstrate feasibility. The clock signal is also brought off-chip for evaluation of the system performance.

Fig. 4 shows the half-rate early/late phase detector. The representation of this block with single-ended signal paths is purely to simplify the diagram; the actual phase detector is fully differential. The architecture is similar to Hauenschild’s phase detector/demux [6], which was implemented in a BiCMOS technology. The main difference here results from modifications necessary to accommodate the relatively low transconductance
ROGERS AND LONG: 10-Gb/s CDR/DEMUX WITH LC DELAY LINE VCO IN 0.18- m CMOS 1783

available from 0.18-μm CMOS transistors. In the quadrature sampling paths, where metastability and hysteresis are a concern, extra buffering is used. Also, the input latch of the first flip-flop is increased in size in order to improve performance. All samples enter the logic on the same clock edge, simplifying the early/late logic. The phase detector pulses are retimed at the output of the phase detector in order to remove asymmetries in both amplitude and duration from the output pulses. Note that a drawback to these modifications is unequal loading of the in-phase and quadrature clock lines. Thus, care must be taken to avoid a static offset in the phase detector as this could cause a reduction in the residual jitter tolerance.

Note that 1 : 2 demultiplexed data could be tapped off directly from early/late logic inputs A and B of Fig. 4. However, a separate 1 : 2 demux was used for the testchip to minimize loading of the phase detector latches, at the expense of possible phase alignment errors at the demux and a slight increase (10 mW) in power consumption.

Fig. 3. CDR/DEMUX block diagram.

III. CIRCUIT DESCRIPTION

The transistor and block level design of the 10 Gb/s CDR circuits are described in the following sections. The implementation of the phase detector is examined first, followed by an in-depth description of the LC delay line VCO.

A. Phase Detector

The phase detector logic is implemented in resistively-loaded MOS current mode logic (MCML). This offers several advantages over conventional CMOS and other all-NMOS implementations. First, it has a relatively low output impedance making it suitable for high-speed operation. MCML logic also benefits from reduced logic voltage swing as well as from the elimination of lower mobility PMOS transistors compared to CMOS logic. The MOS equivalent of a bipolar ECL gate is not practical, especially from a 1.8-V supply due to attenuation of the signal by source followers and lack of headroom.

Another benefit of using MCML is reduced switching-related supply noise, due to the relatively constant current drawn from the power supply. For improved supply rejection, the gain stages and output buffers of the ring oscillator are implemented as MCML inverter/buffers. An additional benefit of this simple design is that the clocked phase detector elements can interface with the ring oscillator without level shifting or swing adjustment.

The first goal of this design is to create a buffer which has the widest possible bandwidth, while still having enough gain. A minimum value of approximately 2 for the small-signal gain was chosen, otherwise the gate noise margin becomes unacceptable. Biasing of the circuit so that the large-signal switching speed approaches maximum performance is now considered.

First, an appropriate voltage swing is selected. The voltage swing is made as large as possible, without forcing the switching transistors into the triode region at any time during the cycle. Larger drain-to-gate capacitance of the MOSFET in the triode region limits the switching speed. Note that gain is (first-order) dependent on the voltage swing, but
1784 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 12, DECEMBER 2002

Fig. 4. Early/late (bang–bang) phase detector.

the propagation delay time is not. When the differential pair is


switched so that transistor M1 [see Fig. 5(a)] is carrying all the
bias current, M1 will be in the active region (saturation) as long
as voltage swing (i.e., ) obeys the following con-
dition:

(3)

where is the threshold voltage of the switching transistor,


is the difference between and and is the
minimum drain source voltage for which M1 is still in the active
region. Note that signal swings in excess of force the
transistors into the triode region, which puts an upper limit on
the signal swing that is practical.
Once an upper limit for the voltage swing is defined, then
it is simply a matter of deciding what proportion of and
are used to make up the signal swing ( ).
The maximum bandwidth for the MCML circuit is achieved
by finding the largest ratio of to (from circuit sim-
ulations) for which the small-signal gain condition is still met,
as this minimizes the load resistance . Additionally, it must
be ensured that this minimized swing is still sufficient to fully
switch the differential pair. Swings of approximately 700 mV at
current densities of 50–100 µA/µm of FET width (depending on
the application) are typical of the CML cells used in this work.
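
The sizing steps described above reduce to a few lines of arithmetic. The following is a minimal sketch, assuming the single-ended swing is set by the full tail current flowing through one load resistor (swing = I_tail × R_L). The 700-mV swing and the 50–100 µA/µm current-density range are the figures quoted in the text; the 1-mA tail current and the function name are hypothetical and chosen only for illustration.

```python
def mcml_sizing(swing_v, tail_current_a, density_a_per_um):
    """Return (load resistance in ohms, switching FET width in um).

    Assumes swing = tail current * load resistance, and that the FET
    width follows directly from the chosen current density.
    """
    load_res_ohm = swing_v / tail_current_a
    width_um = tail_current_a / density_a_per_um
    return load_res_ohm, width_um


swing = 0.7          # V, typical swing quoted in the text
tail_current = 1e-3  # A, hypothetical bias current

for density in (50e-6, 100e-6):  # A per um of FET width
    r_load, width = mcml_sizing(swing, tail_current, density)
    print(f"J = {density * 1e6:.0f} uA/um: R_L = {r_load:.0f} ohm, W = {width:.0f} um")
```

A higher current density gives a narrower (lower capacitance) switching device for the same tail current, at the cost of a larger gate overdrive, which is why the text leaves the final ratio to circuit simulation.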
The combinatorial logic required by the phase detector is performed using MCML logic gates consisting of cascoded differential pairs [Fig. 5(b)]. These are preferred over MCML gates consisting of parallel differential pairs since they maintain two distinct logic levels for all possible input combinations despite the low transconductance typical in MOS circuits.

Fig. 5. MCML logic gates. (a) Inverter/buffer. (b) Cascode logic gate (AND).

The latch design (refer to Fig. 6) is critical to the jitter generation performance of the CDR system. Two important nonidealities, metastability and hysteresis, directly translate into degraded jitter generation performance. This degradation occurs above and beyond the jitter generation, which is inherent in the system properties of the early/late phase detector-based PLL.

ROGERS AND LONG: 10-Gb/s CDR/DEMUX WITH LC DELAY LINE VCO IN 0.18-µm CMOS 1785

Fig. 6. MCML latch. (a) Schematic. (b) Physical layout.

Metastability occurs when positive feedback during the latch phase cannot ramp the output from the initial tracked voltage to the voltage required for reliable recognition of the digital state . An approximate value for the minimum time required to latch an initial voltage is [7]

(4)

It should be noted that the term is not much greater than one for a typical MCML latch with resistive loading in a typical 0.18-µm CMOS technology.

Maximizing the tracking bandwidth reduces the pattern-dependent jitter (hysteresis) caused by the phase detector latches. Increasing bandwidth by reducing the gain creates a tradeoff between resistance to metastability and the tracking bandwidth. Tracking bandwidth is improved without increasing the likelihood of metastability by making device sizes in the input latches (e.g., the master latch) significantly larger than the slave. This reduces the relative loading of the master by the slave latch. This technique was used in the latches which sample the incoming full-rate data in the 1 : 2 demux and on the quadrature clock edges. The physical layout of the MCML latch is also shown in Fig. 6.

B. LC Delay Line Oscillator

Fig. 7. LC delay line oscillator.

Fig. 7 shows a block diagram of the two-stage LC delay line VCO. The symmetry of this architecture ensures that precise in-phase (CKI) and quadrature (CKQ) clocks are generated. The 5-GHz center frequency is tunable through external (EX), internal (IN), and high-frequency bang–bang (BB) inputs. The external tuning input is used to control the oscillator’s center frequency in laboratory testing. This input could also be incorporated into a frequency-locked calibration loop to boost the frequency acquisition range of the PLL. The internal tuning port is part of the integral tuning path, while the bang–bang tuning input completes the higher speed proportional tuning path. Circuitry in the oscillator core is fully differential (with the exception of the varactors) in order to reject supply noise. The LC delay line promotes frequency stability with supply, process, and temperature variations, without compromising tuning speed.

The oscillation frequency is determined by the total propagation time through the gain blocks and differential transmission line stages. Load resistors of each MCML buffer match the output to the 75-Ω characteristic impedance of the delay line. A delay line impedance of 75 Ω was chosen as a reasonable compromise between power dissipation in the gain stages and attenuation of the delay lines. The delay through each gain stage is only 7 ps (simulated) due to the low impedance seen at the drains of the switching transistors.

The delay line is a fully symmetric square microstrip spiral [see Fig. 8(a)]. Each delay line (two are required) has an outside dimension of 150 µm, a conductor width of 4 µm, conductor spacing of 2 µm, and consists of top metal (aluminum) directly over the substrate. A relatively wide gap between lines reduces the interwinding capacitance [Co in Fig. 8(b)], allowing a larger tuning capacitance. The effective inductance seen between each pair of input and output ports is approximately 2.5 nH with a parasitic capacitance of 120 fF. The inductance per unit area increases due to mutual coupling between interwound delay lines. This also reduces chip area compared to the alternative implementation requiring two separate delay lines between gain stages. In addition to saving chip area, a differential delay line also improves common-mode rejection of the oscillator. This is

because the inductance seen in common mode is less than when differentially driven, and this places common-mode oscillations outside the bandwidth of the gain stages.

Fig. 8. Balanced LC delay line. (a) Physical layout. (b) GEMCAP2 model.

The delay lines account for 43 ps of delay time each at 5 GHz or 86% of the total delay around the oscillator loop. Concentration of the loop delay in these lines makes the VCO resistant to variations in power supply, temperature and process. Table I compares simulated sensitivities of the LC delay line VCO (including tuning circuitry) with a ring oscillator composed of identical MCML gain stages (without tuning capability) and comparable center frequency. Supply pulling is over an order of magnitude lower for the delay line oscillator, at 45 MHz/V. In addition, sensitivity of the delay line oscillator to process (based on transistor variation only) and temperature variations is substantially less than for a ring oscillator, mainly due to the dominance of inductance over capacitive parasitics in the loop delay. The simulated temperature sensitivity includes back-end metal and substrate resistivity effects, which dominate temperature dependence of the on-chip delay line. In addition, it is important to note that the tuning response time of the delay line oscillator is on the order of the clock period (comparable to a conventional ring oscillator), which is important in the bang–bang PLL application.

TABLE I
SIMULATED VCO SENSITIVITIES

GEMCAP2 [8] is used to derive a SPICE-compatible lumped-element model for the delay line [see Fig. 8(b)] and refine the oscillator design. Each pi-section of this model corresponds to an individual conductor segment of the delay line, with circuit elements representing the self-inductance, frequency-dependent resistances (e.g., skin effect), capacitance to the substrate as well as the capacitance and loss of the substrate itself. Also modeled is the capacitance and mutual magnetic coupling between windings. These elements are then combined to form the multi-segment delay line model which is employed in transient simulations of the oscillator.

The remaining capacitance required for oscillation at 5 GHz is added by inversion-mode NMOS varactors at each gain stage input (a high impedance node). Integral loop tuning range is designed at 32.5 MHz/V, and the (simulated) voltage swing at the clock buffer inputs is 2.5 V differential, which consumes extra power but improves switching speed of the MCML logic.

Fig. 9. NMOS varactor. (a) Varactor structure. (b) Varactor C–V curve.

The varactors were selected based on practical issues related to the fabrication technology [9]. Fig. 9 shows a cross section and the tuning characteristics of the inversion mode NMOS varactor used in the VCO. This varactor was chosen because the four-terminal NMOS transistor model in the IC design kit could be used for circuit simulations without modification.

IV. EXPERIMENTAL RESULTS

Table II summarizes the measurements for the LC delay line oscillator. Phase noise spectral density of the VCO running

TABLE II
MEASURED VCO PERFORMANCE

Fig. 12. CDR clock and data waveforms (10-Gb/s operation).

Fig. 13. Measured CDR jitter tolerance.

TABLE III
CDR SUMMARY

Fig. 10. Oscillator and CDR phase noise performance.

open-loop and phase-locked and the phase noise of the reference source are all plotted in Fig. 10. At a 1-MHz offset from the carrier, the free-running VCO phase noise is −103 dBc/Hz, which falls to −127 dBc/Hz when the CDR is locked to a sinusoidal data input at 2.5 GHz. The reference source phase noise is also shown (−135 dBc/Hz). The difference is primarily due to frequency multiplication between the reference source (2.5 GHz) and locked VCO (5 GHz), which adds a minimum of 6 dB to the phase noise.
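
The 6-dB figure follows from the usual rule that ideal frequency multiplication by N raises phase noise by 20 log10(N) dB. The sketch below is an illustrative check under that assumption, using the 1-MHz-offset values quoted above (with the conventional negative sign for dBc/Hz); the function and variable names are hypothetical.

```python
import math


def multiplication_penalty_db(f_out_hz, f_ref_hz):
    """Minimum phase-noise increase for ideal frequency multiplication."""
    return 20.0 * math.log10(f_out_hz / f_ref_hz)


ref_noise_dbc = -135.0     # reference source at 2.5 GHz
locked_noise_dbc = -127.0  # locked VCO at 5 GHz

penalty_db = multiplication_penalty_db(5e9, 2.5e9)
ideal_floor_dbc = ref_noise_dbc + penalty_db
excess_db = locked_noise_dbc - ideal_floor_dbc

print(f"multiplication penalty: {penalty_db:.2f} dB")
print(f"ideal locked floor:     {ideal_floor_dbc:.2f} dBc/Hz")
print(f"excess over ideal:      {excess_db:.2f} dB")
```

Under this assumption the multiplication accounts for about 6 dB of the 8-dB gap between the reference and the locked VCO, leaving roughly 2 dB contributed by the loop itself.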
The fabricated oscillator has a measured center frequency of 4.45 GHz (10% slower than predicted by simulations). Subsequent measurements of individual component test structures revealed that the frequency shift is caused by unanticipated loss and delay between the oscillator gain stages. Excessive resistive losses in the top metal, inaccuracy in the modeling of parasitic capacitances, and stray inductance between the delay line and gain stages (which is not extracted from the physical layout for simulation) all contribute to this error. The measured capacitance per unit length and resistance per unit length of the delay line are 45% and 28% higher, respectively, than those derived from the same simulation. This result exposes a sensitivity of the circuit to the absolute loss and capacitance in the delay line and, more importantly, sensitivity to variations in these parameters over process. Work is ongoing to analyze the architecture for variations in the properties of the backend metal/dielectric stack and the actual magnitude of these variations. This characterization work is aimed at improving the accuracy of the CAD models thereby allowing better correlation between the simulation and measurement. Nevertheless, it is important to note that oscillators from two separate fabrication runs showed only 0.2% variation in VCO center frequency.

The measured external tuning range and bang–bang frequency step (varied using the BB input) are 125 and 2.5–5 MHz, respectively. Characterization of this varactor using a separate test structure showed 40% less capacitance variation than expected, thus explaining the smaller tuning ranges observed.

A VCO center frequency close to 4.98 GHz is needed in order to conduct full-speed testing of the CDR including bit error rate testing (BERT) at the SONET OC-192 rate (9.953 Gb/s). A noninvasive technique for adjusting the oscillation frequency was

developed so that a design iteration is avoided, thereby allowing the prototype to be fully characterized. The method is shown schematically in Fig. 11, where a metal plate is placed in close proximity to the delay line using a micro-manipulator. Current induced in the metal plate reduces the self-inductance of the delay lines. Since delay between oscillator stages is proportional to line inductance, the frequency increases when the inductance is lowered. However, the plate must be placed within approximately 10 µm of the IC surface ( µm from Fig. 11) to be effective. Conductivity of the metal plate is important (i.e., gold or a similar metal is used), as resistive losses actually increase the signal delay and slow down the oscillator. An unwanted secondary effect is additional interwinding capacitance that results from placing another conductor in close proximity to the delay line, which acts to reduce the oscillation frequency. The inductive effect dominates, however, with the net result that the center frequency is adjustable from 4.45 to 5.5 GHz with negligible effect on phase noise.

Fig. 14. A 10-Gb/s CDR testchip micrograph.

The oscilloscope eye pattern of Fig. 12 is measured in response to 10-Gb/s PRBS input data (2 –1). Error-free data recovery at 10 Gb/s was measured in BER tests with a recovered clock jitter of 1.2 ps rms, or 8 ps p-p. The 5-Gb/s output data eye has larger jitter than the clock due to pattern dependencies that are likely introduced by bandwidth limitations of the 50-Ω output buffers used for testing. Note that a 1 : 8 or 1 : 16 demultiplexer would be used in a typical application, which relaxes the bandwidth requirements for off-chip buffering of the recovered data.

Measured jitter transfer, generation, and tolerance all meet the SONET OC-192 requirements (measured jitter of 8 ps p-p), with the exception of the small residual jitter tolerance. The jitter tolerance exceeds specifications at low jitter frequencies but is very close to the SONET mask at higher frequencies (see Fig. 13). Poor electrical contact from probes to the chip, phase error between the quadrature clocks, and mismatch between the demux and PD latches are likely sources of degradation in the jitter performance at higher frequencies. Poor electrical contact is partly due to wear caused by mechanical scrubbing of the pad by the probe tip. Repeated contacts were needed to trim the oscillation frequency before measuring the jitter tolerance, which caused significant wear of the pad metal and inconsistent electrical contacts. CDR performance is summarized in Table III.

A photomicrograph of the 1.9 × 1.5 mm IC is shown in Fig. 14. The input and output data lines are implemented in 50-Ω microstrip. The pad configuration used was dictated by the RF on-wafer probes used for test. In order to increase the isolation between the oscillator and the data path circuits, power supplies are kept separate. The layout also includes an extensive bottom metal ground plane which provides the reference plane for the microstrips as well as increasing the capacitance from substrate to ground. The IC consumes 285 mW from a 1.8-V supply (not including 50-Ω test output drivers).

ACKNOWLEDGMENT

Circuit fabrication was facilitated by the Canadian Microelectronics Corporation. The authors thank Dr. Y. Greshishchev and Dr. P. Schvan for providing access to test facilities at Nortel Networks’ Ottawa Laboratories.

REFERENCES

[1] M. J. Reizenman, “Optical nets brace for even heavier traffic,” IEEE Spectr., pp. 44–45, Jan. 2001.
[2] R. C. Walker, C. L. Stout, J.-T. Wu, B. Lai, C.-S. Yen, T. Hornak, and P. T. Petruno, “A two-chip 1.5-GBd serial link interface,” IEEE J. Solid-State Circuits, vol. 27, pp. 1805–1811, Dec. 1992.
[3] Y. M. Greshishchev, P. Schvan, M. Xu, J. Showell, J. Ohja, and J. E. Rogers, “A fully integrated SiGe receiver IC for 10-Gb/s data rate,” IEEE J. Solid-State Circuits, vol. 35, pp. 1949–1957, Dec. 2000.
[4] J. Savoj and B. Razavi, “A 10Gb/s CMOS clock and data recovery circuit with frequency detection,” in Proc. ISSCC, San Francisco, CA, Feb. 2001, pp. 78–79.
[5] J. E. Rogers and J. R. Long, “A 10Gb/s CDR/demux with LC delay line VCO in 0.18 µm CMOS,” in Proc. ISSCC, San Francisco, CA, Feb. 2002, pp. 254–255.
[6] J. Hauenschild, C. Dorschky, T. Winkler von Mohrenfels, and R. Seitz, “A plastic packaged 10 Gb/s BiCMOS clock and data recovering 1 : 4-demultiplexer with external VCO,” IEEE J. Solid-State Circuits, vol. 31, pp. 2056–2059, Dec. 1996.
[7] D. A. Johns and K. Martin, Analog Integrated Circuit Design, First ed. New York: Wiley, 1997.
[8] J. R. Long and M. A. Copeland, “The modeling, characterization, and design of monolithic inductors for silicon RFICs,” IEEE J. Solid-State Circuits, vol. 32, pp. 357–367, Mar. 1997.
[9] P. Andreani and S. Mattisson, “On the use of MOS varactors in RF VCOs,” IEEE J. Solid-State Circuits, vol. 35, pp. 905–910, June 2000.

Jonathan E. Rogers (S’00–M’01) received the B.A.Sc. degree in engineering science (electrical option) and the M.A.Sc. degree in electronics from the University of Toronto, Toronto, ON, Canada, in 1991 and 2001, respectively.
During his undergraduate work, he spent 16 months working at Nortel, Ottawa, ON, where he participated in the design and characterization of SiGe MMIC for OC-192 applications. His graduate work focused on the implementation of 10-Gb/s clock and data recovery systems in deep submicrometer CMOS. In October of 2001, he joined Inphi Corporation, Westlake Village, CA, where he is working to develop physical layer solutions for 10- and 40-Gb/s communication systems.

John R. Long (S’77–A’78–M’83) received the B.Sc. degree in electrical engineering from the University of Calgary, Calgary, AB, Canada, in 1984, and the M.Eng. and Ph.D. degrees in electronics engineering from Carleton University, Ottawa, ON, Canada, in 1992 and 1996, respectively.
His current research interests include low-power transceiver circuitry for highly integrated radio applications and electronics design for high-speed data communications systems.
Prof. Long is a member of the ISSCC, IEEE BCTM and ESSCIRC conference technical program committees and is an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS. He received the NSERC Doctoral Prize and Douglas R. Colton and Governor General’s Medals for research excellence and a Best Paper Award from ISSCC 2000.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998 713

A 0.5-µm CMOS 4.0-Gbit/s Serial Link Transceiver with Data Recovery Using Oversampling
Chih-Kong Ken Yang, Student Member, IEEE, Ramin Farjad-Rad, Student Member, IEEE,
and Mark A. Horowitz, Senior Member, IEEE

Abstract—A 4-Gbit/s serial link transceiver is fabricated in a


MOSIS 0.5-µm HPCMOS process. To achieve the high data rate without speed critical logic on chip, the data are multiplexed when transmitted and immediately demultiplexed when received. This parallelism is achieved by using multiple phases tapped from a PLL using the phase spacing to determine the bit time. Using an 8 : 1 multiplexer yields 4 Gbits/s, with an on-chip VCO running at 500 MHz. The internal logic runs at 250 MHz. For robust data recovery, the input is sampled at 3× the bit rate and uses a digital phase-picking logic to recover the data. The digital phase picking can adjust the sample at the clock rate to allow high tracking bandwidth. With a 3.3-V supply, the chip has a measured bit error rate (BER) of <10^−14.
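
The rates quoted in the abstract are tied together by simple arithmetic. The sketch below is an illustrative check, assuming one bit is sent per multiplexer phase in each VCO cycle and that every bit is sampled an equal number of times; the function name and return layout are hypothetical.

```python
from fractions import Fraction


def link_timing(vco_hz, mux_ratio, oversampling):
    """Return (bit rate in bit/s, bit time in ps, sampling phases, phase spacing in ps)."""
    bit_rate = vco_hz * mux_ratio              # one bit per mux phase per VCO cycle
    bit_time_ps = Fraction(10**12, bit_rate)   # exact, avoids float rounding
    phases = mux_ratio * oversampling          # sampling clocks needed per VCO cycle
    spacing_ps = bit_time_ps / oversampling
    return bit_rate, bit_time_ps, phases, spacing_ps


bit_rate, bit_time_ps, phases, spacing_ps = link_timing(500_000_000, 8, 3)
print(f"bit rate      : {bit_rate / 1e9:.1f} Gbit/s")
print(f"bit time      : {float(bit_time_ps):.1f} ps")
print(f"sample phases : {phases}")
print(f"phase spacing : {float(spacing_ps):.1f} ps")
```

With a 500-MHz VCO and an 8 : 1 multiplexer this gives 4 Gbit/s and a 250-ps bit time; 3× oversampling then calls for 24 sampling phases spaced about 83 ps apart.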

Manuscript received September 1997; revised December 3, 1997.
The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305-4070 USA.
Publisher Item Identifier S 0018-9200(98)02225-2.
0018–9200/98$10.00 © 1998 IEEE

I. INTRODUCTION

The increasing demand for data bandwidth in networking has driven the development of high-speed and low-cost serial link technology. Applications such as computer-to-computer or computer-to-peripheral interconnection are requiring gigabit-per-second rates either over short distances in copper or longer distances in fiber. CMOS technology is used increasingly over GaAs or bipolar technologies because of the development toward faster and faster devices. In 0.18-µm CMOS technology, the -channel is expected to equal or exceed that of the standard 0.5-µm GaAs process. While other technologies are limited in the number of transistors due to yield or power, CMOS technology allows implementation of complex digital logic enabling more integration of the back-end processing, lowering the cost. Recent development has shown CMOS capability to achieve Gbit/s data rates [1], [5], [6], [8], [11]. This work pushes NRZ signaling rates to the bandwidth limitations of the process technology and explores the issues involved.

The primary components of a link are the transmitter, the receiver, and the timing recovery circuits. Section II describes the overall architecture of the link. Because many of the circuits in the transmitter and receiver blocks have been previously discussed [1], this paper focuses on the timing recovery technique. Section III evaluates the impact of timing recovery on performance and compares two different timing recovery techniques: phase-locked loops versus oversampled phase picking. This chip implements a phase-picking algorithm that is discussed in Section IV. The measured performance of the entire transceiver chip is presented in Section V. Finally, some conclusions are drawn from these results in Section VI.

Fig. 1. Transmit architecture.

II. ARCHITECTURE

A 0.5-µm CMOS technology is not fast enough to directly generate and receive a 4-Gbit/s stream (since the maximum ring oscillator frequency is <2 GHz). Instead, we use parallelism to reduce the performance requirements of each circuit. The transmitter generates the bit stream by an 8 : 1 multiplexer that multiplexes current pulses directly onto the output channel (Fig. 1). The receiver (Fig. 2) performs a 1 : 8 demultiplexing by sampling with a bank of input samplers. Similar to the transmitter, each sampler is triggered by individual clock phases. Furthermore, clock/data recovery is achieved by a 3× oversampling of each data bit. Thus, the receiver requires a total of 24 clock phases to support both the oversampling and the 1 : 8 demultiplexing. Various techniques exist for generating multiple clock phases [2], [3]. The receive side uses a six-stage ring oscillator ( -PLL) followed by phase interpolators to generate intermediate phases (ick[23 : 0]) between the ring oscillator edges (ck[11 : 0]) [1]. Similar to the -PLL, eight different clock phases tapped from a four-stage ring oscillator ( -PLL) control the transmitter multiplexing.

A timing recovery circuit extracts the clock from the multiple samples per bit by finding the positions of the data transitions. Once the transitions are determined, a decision logic selects the samples furthest from data transitions (phase picking) as the received data byte. This approach is similar to

what is done in UART’s, and was first applied to a high-speed link by Lee et al. in [4].

Fig. 2. Receive architecture.

Fig. 3. Transceiver test-chip block diagram.

Fig. 3 shows the full transceiver test-chip block diagram. Since the sampling clocks are different phases, the sampled results are resynchronized to a global clock. To facilitate the digital design, the on-chip data are further demultiplexed (2 : 1) to 250 MHz. Finally, in order to test the bit-error rate (BER), an on-chip parallel pseudorandom bit sequence (PRBS) encoder and decoder are used for a sequence. Serial data are commonly encoded with 8B10B coding which limits the run length to <5 consecutive zeros or ones. The PRBS sequence is a suitable substitute because it guarantees a maximum run length of 7. The transmitter can be optionally configured to transmit the PRBS sequence, a fixed sequence, or the received data for testing.

III. TIMING RECOVERY

The goal of the timing recovery scheme is to maximize the timing margin: the amount that a sample position can err with the data still properly received. Errors that impact the timing margin can be classified into two sources: static phase error, and jitter (dynamic phase error). Fig. 4 illustrates the timing margin where is the static sampling error, and and are the jitter on the data transition and the sampling clock. Since the sampling position is defined with respect to the data transition, jitter on both the clock and the data additively reduces timing margin. With ideal square pulses, as long as the sum of the magnitudes of the static and dynamic phase error is less than a bit time, the phase error does not impact signal amplitude. However, in a bandwidth-limited system (for this work, due to the process technology), signal amplitude is lower with sampling phase error because the signals have finite slew rates. Correspondingly, this reduces the signal-to-noise ratio (SNR), hence impacting performance.

The amount of SNR degradation can be calculated based on the shape of the signal waveform. For static phase error, the SNR penalty is shown in Fig. 5 for a triangular signal waveform and a sinusoidal signal waveform. When the sample position phase offset is small, the sinusoidal waveform has a lower penalty than a triangular waveform due to the lower signal slew rate near the sample point [footnote 1: this penalty is only applicable to transitions]. For jitter, the SNR penalty is more complex to evaluate since it additionally depends on the statistics of the noise. For example, we can
YANG et al.: 4.0-Gbit/s SERIAL LINK TRANSCEIVER 715

Fig. 4. Timing margin.

Fig. 5. SNR penalty for different phase offsets.

Fig. 6. BER versus SNR with various amounts of phase noise.

Fig. 7. Clock recovery architectures: (a) phase picking block diagram and (b) data/clock recovery architectures.

assume an idealized jitterless system with signal amplitude and additive white Gaussian noise (AWGN) of standard deviation on the signal amplitude. In this system, we can determine the performance (BER) for various SNR [14]:

(1)

This equation is plotted as the lowest dotted line in Fig. 6.
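
The jitterless case can be explored numerically. The sketch below is a hedged illustration, not a reproduction of (1): it assumes the standard textbook result for binary signaling in AWGN, BER = ½ erfc(A / (√2 σ)), with A the signal amplitude and σ the noise standard deviation, and it assumes the SNR is expressed as 20 log10(A/σ). The function name is hypothetical.

```python
import math


def ber_awgn(amplitude, sigma):
    """BER of a jitterless binary link in AWGN (assumed standard form)."""
    return 0.5 * math.erfc(amplitude / (sigma * math.sqrt(2.0)))


for snr_db in (10, 12, 14, 16, 17):
    ratio = 10.0 ** (snr_db / 20.0)  # amplitude-to-noise ratio A/sigma
    print(f"SNR = {snr_db:2d} dB  ->  BER = {ber_awgn(ratio, 1.0):.2e}")
```

Under these assumptions the BER falls steeply with SNR, reaching roughly 10^−12 when the amplitude is about seven times the noise standard deviation.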
If we further assume jitter to be AWGN as well, for a triangular waveform, the phase noise can be translated into amplitude noise using (where the bit time spans ). Since the noise sources are additive, the probability of error can be simply expressed as

(2)

Fig. 6 illustrates the BER versus amplitude SNR for various amounts of phase noise. The SNR penalty, as shown in the figure, increases at higher SNR because the phase noise eventually limits performance, a “BER floor.” For a sinusoidal signal waveform (with a lower slew rate near the sample point), the behavior is similar, except with lower SNR penalty.

The amount of phase error and the jitter depends on the implementation of the clock recovery circuit. Two techniques are commonly used, a phase-locked loop (PLL) and a phase picker. A PLL employs a feedback loop that actively servos the sampling phase of an internal clock source based on the phase of the input [7]. Fig. 7(a) illustrates a common VLSI implementation using an on-chip voltage-controlled oscillator (VCO) as the clock source, and a charge pump following the phase detector to integrate the phase error. A phase picker, as shown in Fig. 7(b), oversamples each bit, and uses the oversampled information to determine the transition position (phase) of the data. Based on the transition information, the best sample is then selected as the data value (UART [10]). Each of the two architectures has a different tradeoff in terms of static phase error and jitter.

The static phase error of a PLL depends mainly on its phase detector design. Ideally, sampling at the middle of the bit window gives the maximum timing margin. However, if the sampler has a setup time, the middle of the effective bit

window is shifted by the setup time. Not compensating this shift causes significant static phase error. This error can be reduced by using the data samplers as the phase detector.2 Additional phase error occurs due to inherent mismatches within the phase detectors and/or charge pump. Furthermore, any phase detector “dead band” (window in which the phase detector does not resolve phase information) limits the phase resolution, increasing the static phase error.

In a phase-picking architecture, the multiple samples per bit are used to find the transitions, effectively behaving as the phase detector. Sampler uncertainty limits the resolution of the transition detection. Sources of this uncertainty are sampler metastability window and data dependence of the sampler setup time. The uncertainty window for the sampler design used is <1/10 the bit time which does not impact performance significantly. More importantly, in this architecture, the phase information is quantized by the oversampling, causing a finite quantization error of 1/2 the phase spacing between samples. For a higher oversampling ratio, this static phase error is less, but it has a significant cost of increasing the number of input samplers, increasing the input capacitance, and hence limiting the input bandwidth. For a 3× oversampling system, the maximum static phase error is 1/6 the bit time.

Fig. 8. Effect of tracking bandwidth on jitter.
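
The transition-finding idea and its quantization limit can be illustrated with a short sketch. This is a simplified majority-vote phase picker written for illustration only, not the chip's decision logic: it assumes ideal, noise-free samples, an oversampling ratio that is odd (3 here), and a fixed sampling phase over the whole record. All function names are hypothetical.

```python
from fractions import Fraction


def pick_phase(samples, oversampling=3):
    """Return the sample slot (0..oversampling-1) farthest from the data transitions."""
    votes = [0] * oversampling
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            # A transition lies between sample i-1 and sample i.
            votes[i % oversampling] += 1
    edge_slot = votes.index(max(votes))  # first slot after the transition
    # Step half a bit (rounded down) past the edge to reach the middle sample.
    return (edge_slot + oversampling // 2) % oversampling


def recover_bits(samples, oversampling=3):
    """Keep one sample per bit, taken at the picked phase."""
    phase = pick_phase(samples, oversampling)
    return samples[phase::oversampling]


def max_static_phase_error(oversampling):
    """Worst-case quantization error as a fraction of the bit time."""
    return Fraction(1, 2 * oversampling)


bits = [1, 0, 1, 1, 0, 0, 1, 0]
stream = [b for b in bits for _ in range(3)]  # ideal 3x oversampled data
samples = stream[1:]                          # receiver alignment is unknown
print(recover_bits(samples))
print(max_static_phase_error(3))
```

Because the picker can only choose among discrete sample slots, the selected sample can sit up to half a phase spacing away from the true bit center, which is the 1/6 bit-time bound stated above for 3× oversampling.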
2 This causes additional difficulties because such phase detectors can only determine if transitions are early or late. The control loop is “bang–bang” control instead of linear control, which is less stable, has inherent dithering, and requires additional frequency acquisition aid. Although a DLL (delay-line based PLL) [8] can be used to eliminate the stability and frequency acquisition problems, the phase spacing, when tapping phases from the buffer stages, is sensitive to the input clock’s duty cycle and amplitude.

In terms of jitter, a PLL tracks the phase of the input data with a tracking bandwidth limited by the stability of the feedback loop. The loop tracking is effectively a high-pass filter that rejects the phase noise of the input at lower frequencies. The noise not tracked appears as data jitter. Furthermore, because the PLL frequency source is an on-chip VCO, supply and substrate noise from on-chip digital switching can introduce additional jitter. The impact of these two sources is formulated for a second-order PLL in the following equation as the first and second terms:

(3)

Constants that determine the loop bandwidth in the equation are depicted in Fig. 7(a) with (V/rad) the gain of the filter, the stabilizing zero in the filter, and (rad-hertz/V) the gain of the VCO. is the noise induced onto the VCO, and is the sensitivity of the VCO to this noise. Thus, the total amount of “effective jitter” depends on the tracking bandwidth of the loop, the amount of supply and substrate noise, and the sensitivity of the loop elements to the noise. Because the feedback loop has a loop delay of at least one clock cycle, the bandwidth of the loop is often chosen to be <1/10 of the oscillation frequency for sufficient phase margin and stability. The delay makes tracking high-frequency phase noise ineffective because, if the phase error from one transition is independent of the phase of the next transition, correction based on the first transition’s phase information could increase the phase error for receiving the next bit.

The impact of different tracking bandwidth on jitter is illustrated in Fig. 8. The single sideband power spectral density (PSD) of an oscillator, such as the VCO of the transmitter, is shown to represent the phase noise in Fig. 8(a). Two hypothetical PLL’s with different bandwidths ( , and ) [footnote 3: the actual shape of the tracking transfer function H(s) varies with implementation] behave as high-pass filters that reject the lower frequency noise. Their transfer functions are overlaid in Fig. 8(a). The resulting phase error is shown in the PSD of Fig. 8(b). Note that this example excludes the additional noise from the phase-tracking circuit [second term of (3)]. The integral of the area beneath the curve is an indication of the amount of jitter [13] [ for (2)]; thus, the phase noise of Circuit I is larger than that of Circuit II. Additionally, if a second-order PLL is not critically damped, the transfer function can exhibit peaking. This peaking accumulates phase noise at its loop bandwidth, increasing the noise.

For a phase picker, the sampling clocks experience similar jitter problems from supply and substrate noise since the phases for the oversampling are also generated from an on-chip VCO. The primary difference is the tracking bandwidth. A phase-picking system is a feedforward architecture (instead of feedback); thus, there are no intrinsic bandwidth limitations. The tracking rate depends on the rate at which new phase decisions are made, which in turn depends on the logic’s cycle time. The importance of this fast tracking is that it can potentially track the accumulation of phase noise by the on-chip multiphase generator (PLL). We delay the data by the time to arrive at a decision so the corrections are applied to the appropriate bit (although with a latency overhead). However, the maximum phase change between two transitions must be
YANG et al.: 4.0-Gbit/s SERIAL LINK TRANSCEIVER 717

less than π, half the bit time, even if the peak-to-peak jitter can be much larger than a bit time. Changes greater than π are indistinguishable from a phase shift in the opposite direction.
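
The ambiguity described above can be illustrated numerically. The following is a minimal sketch and not part of the original design: the function name `observed_step_ps` is hypothetical, and the 250-ps default bit time is an assumption taken from the 4-Gbit/s data rate. It folds a true phase step into the half-bit-time range that a receiver observing only transition positions can resolve.

```python
def observed_step_ps(true_step_ps: float, bit_time_ps: float = 250.0) -> float:
    """Fold a phase step into [-bit_time/2, +bit_time/2).

    A receiver that only sees where transitions fall within the bit
    cannot tell a step of more than half a bit time from a smaller
    step in the opposite direction.
    """
    half = bit_time_ps / 2.0
    return (true_step_ps + half) % bit_time_ps - half


# Illustrative steps between two consecutive transitions (in ps).
for step in (100.0, 150.0, -150.0):
    print(step, "->", observed_step_ps(step))
```

Under these assumptions a 100-ps step is observed unchanged, whereas a 150-ps step (more than half of the 250-ps bit time) is observed as a 100-ps step in the opposite direction.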
Choosing between the two clock recovery systems depends on the system requirements and noise behavior. We chose a phase-picking architecture to explore the usefulness of the higher phase-tracking capability. In such VLSI implementations, supply noise can be significant enough for the peak-to-peak jitter to occupy a large fraction of the bit time, especially since a PLL accumulates jitter. For the 4-Gbit/s link, we chose a low oversampling ratio of 3 to maintain high input bandwidth and to keep the number of clock phases manageable (1 : 8 demultiplexing and 3× oversampling yields 24 phases). With a bit time of 250 ps, the phase-picking scheme4 can track the noise of the on-chip multiphase generator (PLL) from both the transmit and receive sides to keep the total “effective jitter” below the 83-ps quantization spacing. One limitation of the phase-picker tracking is that the maximum rate of the tracking depends on the data transition density. Since the PRBS signal guarantees one transition per byte, the maximum tracking rate of one sample spacing every transition is fast (83 ps/2 ns). Although the tracking rate is high, the maximum static phase error from the quantization is 41 ps (2% of the clock period, 8× bit time), causing an SNR penalty (Fig. 5). Whether or not a 3× oversampled phase-picking approach with higher tracking bandwidth than a PLL can achieve better performance with the larger static phase error depends on the amount of jitter induced by on-chip noise sources. If the lower SNR penalty from the lower jitter compensates for the higher SNR penalty of larger static phase error, phase picking would be the better choice.

IV. PHASE-PICKING ALGORITHM AND IMPLEMENTATION

The details of the phase-picking algorithm are illustrated in Fig. 9. Picking the center sample requires finding and tracking the bit boundaries. The decision logic first detects transitions by an XOR of adjacent samples, indicating the bit boundary to be in one of three possible positions. Fig. 10 shows an example of the boundary detection with a portion of a sampled stream. To find which of the three transition positions is the most likely bit boundary, transitions corresponding to the same bit boundary position are tallied. The position with the largest total determines the bit boundaries.

Fig. 9. Phase-picking algorithm block diagram.

Fig. 10. Example of the phase-picking algorithm.

The decision logic makes a new decision per byte of data. In contrast to a higher order oversampling phase picker, the 3× oversampling limits the change of the selected sample position to one sample position per byte. To guarantee sufficient transitions for averaging any bit-to-bit variations of high-frequency noise (near the bit rate), the tally is across a sliding window of 3 bytes. The transitions are accumulated from the current byte, the previous byte, and the next byte (delaying the data allows the noncausal information) so that the decision is applied to the byte at the middle of the window. As a result of the 3-byte sliding accumulation, the rate of phase change that the algorithm can track is slower than the maximum of 83 ps/2 ns. The algorithm picks the correct sample if the majority of the transition information within the 3-byte window (6 ns) indicates the correct phase. For example, if the input phase has a constant rate of change of <1 sample spacing per 3 ns (corresponding to a frequency difference of 4%), the transition information from >1.5 bytes of the 3-byte window would fall in the same phase quantization. Then the tally and compare would select the correct sample to track the phase change. This indicates a maximum phase-tracking rate of 83 ps/3 ns. The criterion of tracking both Tx- and Rx-PLLs’ accumulation is met because the VCO elements’ supply noise sensitivity is %/% (percent of frequency change per percent of supply noise [1], [3]),5 corresponding to 30 ps/3 ns for a 10% supply step, which is less than the tracking rate. If the phase change is slower than 83 ps/3 ns, the 3-byte accumulation offers some robustness by averaging any uncertainty in the transition detection due to high-frequency bit-to-bit noise. A smaller window of one byte can track phase faster, but has poorer performance without sufficient transitions within that byte to average the bit-to-bit variation. A larger window of 5 bytes (<83 ps/6 ns) would be too slow to track the Tx- and Rx-PLLs’ phase accumulation under reasonable supply noise.

Once the transition position is determined, the middle sample within the bit boundaries is selected as the data.

4 In our system, the oscillator is at 250 MHz so the PLL bandwidth is restricted to <25 MHz. This yields a 10× tracking rate difference between the two systems.
5 Although the maximum phase error accumulation rate is based on the supply sensitivity of the VCO, the peak phase error depends on the loop bandwidth. The Tx-PLL and Rx-PLL generating the multiple clock phases have bandwidths of 15 and 5 MHz, respectively.
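
The tally-and-pick procedure described in this section can be sketched in software. The code below is a behavioral illustration only, not the chip's logic: the names `tally_transitions`, `pick_byte`, and `oversample` are hypothetical, ties are broken toward the lowest index as a simplification, and neither the one-position-per-byte limit nor the bit overflow/underflow handling is modeled. It assumes ideal 3× samples given as lists of 0/1 values, 24 per byte.

```python
from typing import List, Sequence, Tuple

OVERSAMPLE = 3                                   # samples per bit (3x oversampling)
BITS_PER_BYTE = 8
SAMPLES_PER_BYTE = OVERSAMPLE * BITS_PER_BYTE    # 24 samples per byte


def tally_transitions(samples: Sequence[int]) -> List[int]:
    """Count edges per boundary position (index of the sample before the edge, mod 3)."""
    counts = [0] * OVERSAMPLE
    for i in range(len(samples) - 1):
        if samples[i] ^ samples[i + 1]:          # XOR of adjacent samples marks a transition
            counts[i % OVERSAMPLE] += 1
    return counts


def pick_byte(prev_b: Sequence[int], cur_b: Sequence[int], next_b: Sequence[int],
              last_boundary: int) -> Tuple[List[int], int]:
    """Decide the middle byte of a 3-byte window; returns (bits, boundary position)."""
    counts = tally_transitions(list(prev_b) + list(cur_b) + list(next_b))
    if max(counts) == 0:
        boundary = last_boundary                 # no transitions: hold the stored decision
    else:
        boundary = counts.index(max(counts))     # ties go to the lowest index (simplification)
    middle = (boundary + 2) % OVERSAMPLE         # sample midway between two boundaries
    bits = [cur_b[k * OVERSAMPLE + middle] for k in range(BITS_PER_BYTE)]
    return bits, boundary


def oversample(bits: Sequence[int]) -> List[int]:
    """Ideal, noise-free 3x sampling of a bit sequence (illustration helper)."""
    return [b for b in bits for _ in range(OVERSAMPLE)]


byte = [1, 0, 1, 1, 0, 0, 1, 0]
window = oversample(byte)
recovered, boundary = pick_byte(window, window, window, last_boundary=2)
print(recovered == byte, boundary)
```

In this sketch the window is tallied as a whole, which corresponds to accumulating the previous, current, and next byte before deciding the middle one. A hardware implementation would keep per-byte tallies and add them as the window slides.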
718 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998

Fig. 11. Comparison between center picking versus majority voting.

The selection is implemented by multiplexers selecting the appropriate samples based on three select signals. In the case where no transitions are detected, the three select signals use previously stored values to maintain data through the multiplexers.

Fig. 12. Chip micrograph.
The actual algorithm for deciding the received data value from the oversampled information can be designed alternatively while still keeping the advantage of higher tracking bandwidth of a feedforward architecture. Instead of selecting the middle (“phase pick”), a simple alternative implementation is to take a majority vote based on the three sampled values. Fig. 11 shows the performance comparison. Majority voting works well with nonbandwidth-limited signals that have high-frequency noise because it averages the noise over many samples. In a bandwidth-limited system (low-pass filtered by the I/O time constant), it performs worse because at least one of the two nonmiddle samples is required to be valid, and the nonmiddle samples have a much higher probability of error.

Arbitration is required when two transition positions have equal counts. This occurs when two of the sample positions straddle the center of the bit and the third sampler samples at the transition. Picking either of the two straddling the center gives equivalent performance. More complex logic can be implemented by using the previous, current, and next cycles’ comparison results to follow the direction of any phase transition. However, this only improves the performance by less than 1 dB.

If the peak-to-peak phase jitter is larger than one bit time, or if the transmitter and receiver operate at different frequencies, the tracking must allow bit(s) to overflow/underflow. For example, if the SEL[2 : 0] signal changes from 0–0–1 to 1–0–0, the selected sample of the first cycle corresponds to the same bit as the selected sample of the following cycle. This “underflow” condition must be appropriately handled by dropping one of the two samples. Typically, these samples are of the same bit, and thus have the same value. However, in the case where they are different, if phase movement changes directions (the SEL signal returns to 0–0–1) in the following cycle’s decision, dropping the latter one gives a slight performance improvement. Similar to the “underflow” where only 7 bits are received, the opposite transition from 1–0–0 to 0–0–1 causes an “overflow,” requiring an extra bit (9 bits total) to be stored. These conditions are handled by a bitwise FIFO built by shifting the input byte to accommodate the one extra/less bit. If the aggregate shift increases beyond 1 byte, a bytewise FIFO handles the overflow/underflow byte. The limited depth of the FIFO can only handle a finite number of byte overflows. If the application requires handling long streams of data with a slight frequency difference with the local reference clock, the local frequency can be corrected based on the phase information from the decision logic.6

6 This feature is not implemented as part of this test chip.

V. TRANSCEIVER EXPERIMENTAL RESULTS

The transceiver chip was implemented in a 0.5-µm CMOS process offered through MOSIS. The 3 mm × 3 mm die photo is shown in Fig. 12. The chip is packaged in a 52-pin CQFP package supplied by Vitesse Semiconductor, which has internal power planes for controlled impedance. The size of the I/O bond pads is reduced to 70 µm × 70 µm to keep pad capacitance to a minimum because the capacitance would otherwise limit the I/O bandwidth. With an effective impedance at the I/O of 25 Ω (for a doubly terminated 50-Ω line), the total I/O capacitance cannot exceed 4.5 pF for 4-Gbit/s operation without losing 10% of the bit height to the filtering. The 1 : 8 demultiplexing receiver and 8 : 1 multiplexing transmitter designs have capacitances of 2.2 and 1.2 pF, respectively, with 600 fF due to the pad and metal interconnects. An input time constant of ps is estimated from measurements sweeping the reference voltage for a single-ended input pulse. The width of the pulse with a different reference voltage determines the time constant.

Fig. 13. Transmitter data eye.

The performance of the link depends significantly on the I/O circuits. The minimum receivable amplitude of 50 mV was measured by using a fixed data pattern while changing the amplitude. This indicates the worst case input offset in the bank of samplers. The transmitter data eye at 3.0 Gbits/s is shown in Fig. 13 with the output driving a PRBS sequence. The measured data rate is limited by the triggering bandwidth of the oscilloscope. The maximum speed of the transmitter was 4.8 Gbits/s, and was limited by the maximum frequency of the ring oscillator used in the clock generation.
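
The I/O capacitance budget quoted earlier in this section (25 Ω effective impedance, 4.5 pF, about 10% loss of bit height at 4 Gbit/s) can be checked with a short calculation. The sketch below assumes a single-pole RC response settling over one 250-ps bit time; the text does not state which model the authors used, so this is an illustrative assumption, and the names `settled_fraction`, `R_EFF`, `C_MAX`, and `BIT_TIME` are hypothetical.

```python
import math


def settled_fraction(r_ohm: float, c_farad: float, bit_time_s: float) -> float:
    """Fraction of a full step reached after one bit time through a single-pole RC."""
    tau = r_ohm * c_farad
    return 1.0 - math.exp(-bit_time_s / tau)


R_EFF = 25.0          # ohms: doubly terminated 50-ohm line, from the text
C_MAX = 4.5e-12       # farads: capacitance limit quoted in the text
BIT_TIME = 250e-12    # seconds: one bit at 4 Gbit/s

tau_ps = R_EFF * C_MAX * 1e12                       # RC time constant in ps
loss = 1.0 - settled_fraction(R_EFF, C_MAX, BIT_TIME)
print(f"tau = {tau_ps:.1f} ps, bit-height loss = {loss:.1%}")
```

With these numbers the time constant is about 112 ps and the unsettled fraction after one bit time is roughly 11%, which is in line with the 10% figure stated in the text.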
The multiple-phased clock generation (PLL) is crucial to the performance of the link because the phase spacing determines the bit time in the multiplexing/demultiplexing architecture, and the supply sensitivity and loop bandwidth determine the amount of jitter that needs to be tracked. Mismatches can cause one phase to be shifted with respect to the others. In the transmitter, the shift enlarges one bit, but reduces the next. By measuring the spacing between edges, we can evaluate the ability to match the phases tapped from the oscillators and interpolators [3]. The differential nonlinearity (DNL) of the phase spacing is plotted for the transmitter in Fig. 14 at various frequencies. The error is expressed as a percentage of the ideal bit time for all eight phase positions. While transmitting the PRBS pattern and using a trigger frequency of 1/8 the data rate (internal clock rate), these spacings are measured with a 20-GHz bandwidth digital oscilloscope by the width of each of the eight data-eye patterns.7 If we use the data-rate frequency as a trigger instead of using a divided frequency, the data eye of Fig. 13 overlaps all eight of the bits. The overlaid histogram shows that the 333-ps bit time is degraded by 90 ps due to equal contributions from jitter and errors in the transmitter phase spacing.

7 The measurement uncertainty in the DNL is ±2 ps.

Fig. 14. Transmit-side DNL at various frequencies.

The peak-to-peak variation, <±7% of the bit time, indicates very little degradation in bit width due to mismatches. The dominant cause of these bit-width variations is the mismatches of the transistors in the clock generation circuits [12]. The increase in error with decreasing oscillation frequency, shown in Fig. 14, is an indication of these mismatches. The gate overdrive (V_GS − V_T) is less at lower oscillation frequencies, making the phase spacing more sensitive to these mismatches. Fig. 15 shows the measurement of the DNL for four chips. The darker line indicates the average at each phase position. The variation of this average across phase positions potentially indicates some systematic error. However, because the average is over a sample size of only four chips, and the variation of the average is significantly smaller than the variation between chips, the random component is believed to be the dominant source of static phase spacing error.

Although a systematic component of the offset can also be expected from noise at any integer multiple of the oscillator

frequency, it is not apparent in Fig. 15. Normally, noise such as substrate or supply noise at the same frequency as the oscillator would modulate the oscillator, causing a duty-cycle error which spreads the phases in the first half cycle and compresses the phases in the second half cycle. Since most of the digital logic clock on this chip switches at 250 MHz, this effect of the clock buffer switching on the 500 MHz oscillator would cause different phase spacings for two consecutive oscillator cycles. However, Fig. 15 shows that the average phase spacing errors from the second cycle are nearly the same as those from the first cycle, indicating that this coupling is negligible. Also, any systematic components from path mismatches (e.g., capacitive loading errors) are insignificant compared to the random source.

Fig. 15. Transmit-side DNL for four chips.

On the receive side, the DNL of the sample spacing is also measured, as was shown for a 0.8-µm process technology [1] to be <8% of the bit time. Receive clock phase spacing errors reduce the effectiveness of the oversampling by increasing the sample spacing, causing both increased static phase error and larger jitter.

Jitter in the transmitter can be measured by outputting a fixed pattern and measuring the jitter on the data transition. We can also measure the sampling clock jitter by looking at the sampler output while sweeping a clean input transition. The window in which the sampler output is uncertain indicates the jitter with respect to the input. The supply sensitivity can also be measured by the increase in jitter due to induced supply noise with an internal switch that shorts between supply and ground. The sensitivities of the transmit and receive PLL’s are 0.2 and 0.3 ps/mV, respectively, with a similar peak-to-peak quiescent jitter of 45 ps.

The BER testing is performed with two different configurations. The first measurement is by feeding the transmitted output directly back into the input. This yielded a BER of . The second configuration is by placing the chip in a mock optical network (Fig. 16). A bit error rate tester (BERT) is used to generate the data pattern. The pattern is modulated onto a fiber-optic network. The optical power is measured by siphoning 1/10 of the total optical power. The optical signal is received and amplified by an avalanche photodiode (APD) followed by an amplifier. The output of the amplifier is either returned to the BERT for the baseline measurement, or sent into the chip configured in its transceiver mode. Because the BERT and optical amplifiers have a bandwidth limitation at 3 Gbits/s, the experimental results of this configuration are limited in data rate. As shown in Fig. 14, the phase spacing at lower frequencies is worse, so the performance is slightly worse than at 4 Gbits/s.

Fig. 16. BER testing configuration.

The BER versus SNR is plotted with SNR expressed in optical power showing both the baseline and the DUT with a 1.5-dB penalty at BER (Fig. 17). The SNR penalty for not having the selected sample at the middle of the data eye is shown in Fig. 18. Because of the phase spacing errors on the receive side, the penalty shown here is worse than simulated. Since the quiescent jitter of the clock generation is smaller than the sample spacing (<83 ps), the phase tracking is not active. In order to test the effectiveness of the phase picking, voltage steps are induced on the supply, causing 250-ps jitter on both the Tx-PLL and Rx-PLL. While this causes the data eye to collapse, the receiver can still track this jitter and maintain BER . Also, the transceiver is operated with the transmitter and receiver at different frequencies. The chip was able to track a frequency difference of 1 MHz with BER .

Fig. 17. Measured BER versus SNR.

Table I shows some additional performance measurements of the chip. The total power dissipated is 1.5 W, with 1/3 from the clock generation and 1/3 from the receive-side logic. The minimum amplitude that can still maintain BER is 90

mV with an internal eye height of 65 mV. The 24 mV of amplitude noise is primarily due to ringing from the package inductance and on-chip output capacitance at the transmitter.

Fig. 18. Measured BER at various sampling phase.

TABLE I
TEST-CHIP PERFORMANCE

VI. CONCLUSION

Very high data rates are achievable in CMOS technologies by making extensive use of parallelism. Using an 8 : 1 demultiplexing at the input and an 8 : 1 multiplexing output transmitter, we achieved a 4-Gbit/s transceiver while keeping all internal signals <500 MHz in a 0.5-µm process technology. The fundamental limitations of this approach are the I/O capacitance (increased due to the parallelism), the sampler uncertainty, and the phase position accuracy of the multiple clock phases.

Provisions were made in this design to handle very large jitter accumulation of 83 ps/3 ns by a fast phase-picking algorithm. The effectiveness of this architecture critically depends on the jitter characteristics. Although a CMOS PLL can potentially exhibit this large jitter due to supply noise, the measured jitter while operating this transceiver is only 50 ps. This jitter is measured in a realistic noise environment because of the presence of significant digital switching noise from the large digital phase picker that can couple onto the VCO elements. Since the jitter is less than the quantization error, the advantage of the phase picking is only apparent when additional noise is induced. This low accumulated jitter implies that the lower tracking bandwidth of a PLL-based clock recovery circuit can potentially perform equally. The design of such a system is nontrivial, and still has challenges in maintaining small static phase offsets. However, since the phase picking has significant hardware overhead in the extra number of input samplers and large digital processing, a PLL would potentially offer similar performance with lower area and power.

ACKNOWLEDGMENT

The authors would like to thank S. Sidiropoulos, B. Amrutur, K. Falakshahi, Vitesse Semiconductor, Prof. T. Lee, Prof. L. Kazovsky, and their research groups for invaluable discussions and assistance.

REFERENCES

[1] C.-K. Yang and M. Horowitz, “A 0.8 µm CMOS 2.5 Gbps oversampling receiver and transmitter for serial links,” IEEE J. Solid-State Circuits, vol. 31, Dec. 1996.
[2] C. Gray et al., “A sampling technique and its CMOS implementation with 1-Gb/s bandwidth and 25 ps resolution,” IEEE J. Solid-State Circuits, vol. 29, Mar. 1994.
[3] J. Maneatis and M. Horowitz, “Precise delay generation using coupled oscillators,” IEEE J. Solid-State Circuits, vol. 28, pp. 1273–1282, Dec. 1993.
[4] K. Lee et al., “A CMOS serial link for fully duplex data communications,” IEEE J. Solid-State Circuits, vol. 30, pp. 353–364, Apr. 1995.
[5] A. Fiedler et al., “A 1.0625Gb/s transceiver with 2×-oversampling and transmit signal pre-emphasis,” in ISSCC’97 Dig. Tech. Papers, Feb. 1997, pp. 238–239.
[6] A. Widmer et al., “Single-chip 4 × 500 Mbaud CMOS transceiver,” IEEE J. Solid-State Circuits, vol. 31, pp. 2004–2014, Dec. 1996.
[7] F. M. Gardner, Phaselock Techniques, 2nd ed. New York: Wiley, 1979.
[8] W. Dally and J. Poulton, “A tracking clock recovery receiver for 4-Gb/s signaling,” in Hot Interconnect97 Proc., Aug. 1997, p. 157.
[9] S. Sidiropoulos and M. Horowitz, “A semi-digital DLL with unlimited phase shift capability and 0.08–400MHz operating range,” in ISSCC’95 Dig. Tech. Papers, Feb. 1995, pp. 332–333.
[10] J. E. McNamara, Technical Aspects of Data Communication, 2nd ed. Bedford, MA: Digital, 1982.
[11] S. Kim et al., “An 800Mbps multi-channel CMOS serial link with 3× oversampling,” in IEEE 1995 CICC Proc., Feb. 1995, p. 451.
[12] M. J. Pelgrom, “Matching properties of MOS transistors,” IEEE J. Solid-State Circuits, vol. 24, p. 1433, Dec. 1989.
[13] J. A. Crawford, Frequency Synthesizer Design Handbook. Boston, MA: Artech House, 1994.
[14] J. Proakis, Communication Systems Engineering. Englewood Cliffs, NJ: Prentice-Hall, 1994.

Chih-Kong Ken Yang (S’93) received the B.S. and M.S. degrees in electrical engineering from Stanford University, Stanford, CA, in 1992. He is currently pursuing the Ph.D. degree at Stanford University in the area of circuit design for high-speed interfaces. Mr. Yang is a member of Tau Beta Pi and Phi Beta Kappa.

Ramin Farjad-Rad (S’95) was born in Tehran, Iran, in 1971. He received the B.Sc. degree in electrical engineering from Sharif University of Technology, Tehran, in 1993 and the M.Sc. degree in electrical engineering from Stanford University, Stanford, CA, in 1995, where he is currently a Ph.D. candidate in electrical engineering. He worked at SUN Microsystems Laboratories, Mountain View, CA, on a 1.25-Gbit/s serial transceiver for the fiber channel standard during the summer of 1995. Over the summer of 1996, he worked at LSI Logic, Milpitas, CA, where he examined different multi-Gbit/s serial transceiver architectures. Mr. Farjad-Rad holds one U.S. patent, and is also the Bronze Medal Winner of the 20th International Physics Olympiad, Warsaw, Poland.

Mark A. Horowitz (S’77–M’78–SM’95) received the B.S. and M.S. degrees in electrical engineering from MIT in 1978, and the Ph.D. degree from Stanford University, Stanford, CA, in 1984. He is the Yahoo Founders Professor of Electrical Engineering and Computer Science at Stanford. His research area is in digital system design, and he has led a number of processor designs including MIPS-X, one of the first processors to include an on-chip instruction cache, TORCH, a statically scheduled, superscalar processor, and FLASH, a flexible DSM machine. He has also worked on a number of other chip design areas including high-speed memory design, high-bandwidth interfaces, and fast floating point. In 1990, he took a leave from Stanford to help start Rambus Inc., a company designing high-bandwidth memory interface technology. His current research includes multiprocessor design, low-power circuits, memory design, and high-speed links. Dr. Horowitz is the recipient of a 1985 Presidential Young Investigator Award and an IBM Faculty Development Award, as well as the 1993 Best Paper Award from the International Solid-State Circuits Conference.
