
IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 57, NO. 2, APRIL 2010, p. 519

A Sub-Nanosecond Time Interval Detection System Using FPGA Embedded I/O Resources

Louis Arpin, Student Member, IEEE, Mélanie Bergeron, Marc-André Tétrault, Member, IEEE, Roger Lecomte, Member, IEEE, and Réjean Fontaine, Senior Member, IEEE

Abstract: The Time to Digital Converter (TDC) concept is quite useful for obtaining crucial timing information in nuclear radiation detection applications such as PET imaging. The high resolution nature of TDCs makes them sensitive to process and temperature variations. Thus, a calibration procedure must often be performed to improve measurements. Moreover, field programmable gate array (FPGA)-based TDCs exacerbate this problem because the transistor topology is fixed on the fabric to keep costs low. A Sub-Nanosecond Time Interval Detection System, able to overcome process and temperature (PT) variations, was designed and implemented in an FPGA. Unlike other FPGA-based TDCs, this new solution uses embedded PT invariant digital delay lines and deserializers included in I/O ports, along with a stable clock oscillator, resulting in low logic usage. The proposed design consists of oversampling digital signals to enable the creation of absolute timestamps down to 75 ps resolution (31.85 ps RMS). As a proof of concept, this paper reports timing resolution down to 312.5 ps.

Index Terms: ASIC, CMOS, double data rate (DDR), field programmable gate array (FPGA), positron emission tomography (PET), time to digital converter (TDC).

I. INTRODUCTION

High precision timing measurements are mandatory for high energy physics experiments, as well as for positron emission tomography (PET) scanners [1]–[3]. These experiments rely on a precise timing measurement of the occurring events, represented by logic transitions from particle detectors. For example, in PET, timing information, collected together with the energy level of the annihilation photons, is the basis for the creation of an image of the molecular activity inside a living organism [4]. A TDC's high resolution makes it very sensitive to process and temperature (PT) variations. In that regard, its design must be based on simple electronics with high stability.

In various applications, TDCs with resolution as good as 31 ps have already been reported in custom CMOS ASIC designs [5]–[8]. The flexibility of ASICs allows specific adjustments that can make circuits PT invariant. However, their high design complexity and slow turnaround time make them mainly suitable for highly specialized and/or mass market applications.

The flexibility of FPGAs, their low cost compared to ASIC architectures and their short design turnaround time make them a good alternative for implementing TDCs. Many techniques have been proposed so far [9]–[13]. Among them, a 64-cell delay chain was implemented to obtain a resolution as low as 10 ps [9]. Unfortunately, such designs must include a calibration process to correct for the temperature variations that affect the measurements. This process uses a large part of the logic that could be available for more useful purposes. Moreover, the calibration procedure doesn't ensure full system invariance against PT fluctuations. These techniques also require manual placement and routing of the delay chains to ensure uniform propagation delay values between the latches, which makes the design phase much harder.

The proposed TDC architecture is aimed at providing a PT invariant solution that can be easily implemented in an FPGA by using embedded deserializers such as those included in the I/O port circuitry of Xilinx Virtex-4/5 FPGAs.

Manuscript received May 23, 2009; revised August 28, 2009. Current version published April 14, 2010. This work was supported by grants from the Natural Science and Engineering Research Council of Canada (NSERC).
L. Arpin, M.-A. Tétrault, and R. Fontaine are with the Department of Electrical and Computer Engineering, Université de Sherbrooke, QC, Canada (e-mail: louis.arpin@usherbrooke.ca).
M. Bergeron and R. Lecomte are with the Department of Nuclear Medicine and Radiobiology, Université de Sherbrooke, QC, Canada.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNS.2009.2039804

II. THEORY

A. TDC Architectures
A TDC commonly includes a coarse counter incremented at each system clock and a fine counter generated by a more complex architecture. The latter can exploit the time domain [5]–[9], [11], [12] or the phase domain [10], [13] to refine the coarse counter. In the time domain, the fine portion of the TDC is implemented with delay buffers, where delayed versions of the input data or the clock signal are fed into a parallel register. The total delay is set to fit one system clock period. The position of the rising edge in the delay chain allows the system to create a precise timestamp. Such an approach achieves good timing resolution [down to 31 ps (9 ps RMS)] with minimal logic [5]. However, it is not PT invariant, as the buffer delays are very sensitive to such parameters.
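The coarse/fine split described above can be illustrated with a minimal Python sketch. It assumes an idealized delay chain with identical tap delays and a thermometer-coded register; the function names, the 31 ps tap value and the 1 ns clock period are illustrative choices, not details taken from any cited design.

```python
def capture_taps(edge_to_clock_ps, n_taps, tap_delay_ps):
    """Model the parallel register of an ideal tapped delay line.

    Tap i carries the input delayed by (i + 1) tap delays. It reads 1 at the
    clock edge if the rising edge has already propagated through it.
    """
    return [1 if edge_to_clock_ps >= (i + 1) * tap_delay_ps else 0
            for i in range(n_taps)]


def decode_taps(taps, tap_delay_ps):
    """Turn the thermometer code into a fine time (input edge to clock edge)."""
    passed = 0
    for bit in taps:
        if bit != 1:
            break
        passed += 1
    return passed * tap_delay_ps


def timestamp_ps(coarse_count, clock_period_ps, taps, tap_delay_ps):
    """The coarse counter locates the clock edge; the fine time is subtracted."""
    return coarse_count * clock_period_ps - decode_taps(taps, tap_delay_ps)


# Hypothetical values: edge arrives 100 ps before the clock edge, 31 ps taps.
taps = capture_taps(edge_to_clock_ps=100, n_taps=8, tap_delay_ps=31)
print(taps)                              # [1, 1, 1, 0, 0, 0, 0, 0]
print(timestamp_ps(10, 1000, taps, 31))  # 9907
```

In a real device the tap delays drift with process and temperature, which is exactly why the decoded value needs calibration in this kind of architecture.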
To overcome delay line problems, one can think of the Vernier
Delay Line (VDL) technique [6], [7]. In this scheme, both data
and clock are delayed with a slightly different time factor. This
approach is based on the fact that the measurement will be taken
on the time difference of the 2 delayed signals, assuming that
the PT variation will be identical on both lines. After each delay
unit, the data signal is latched in a flip-flop when a rising edge
of the longer delayed clock signal is detected. The goal is to
determine when the clock signal catches up with the latched
version of the data signal in the delay line. At this position in the
delay chain, a timestamp is produced at the desired resolution.

0018-9499/$26.00 © 2010 IEEE


Fig. 1. Deserializer used in a four bit output configuration. The four bits represent the input at four moments, each separated by half a sampling period through
one system clock tick. For this example, the system clock is two times slower than the sampling clock.

Regrettably, the VDL technique requires considerable logical resources to cover a reasonable time interval.
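The Vernier principle, and the reason it consumes many stages, can be sketched as follows. This is a simplified model under stated assumptions: the leading edge travels through the slower line, the trailing edge through the faster one, and all stages are identical. The delay values and the function name are hypothetical.

```python
def vernier_measure(interval_ps, slow_delay_ps, fast_delay_ps, n_stages):
    """Return the measured interval in ps, or None if the line is too short.

    The leading edge goes through the slow line, the trailing edge (launched
    interval_ps later) through the fast line. The first stage at which the
    trailing edge has caught up gives the result in units of the delay
    difference between the two lines.
    """
    step_ps = slow_delay_ps - fast_delay_ps
    if step_ps <= 0:
        raise ValueError("the slow line must have the larger per-stage delay")
    for stage in range(1, n_stages + 1):
        leading_arrival = stage * slow_delay_ps
        trailing_arrival = interval_ps + stage * fast_delay_ps
        if trailing_arrival <= leading_arrival:
            return stage * step_ps
    return None


# Hypothetical 110 ps / 100 ps stages give a 10 ps resolution.
print(vernier_measure(95, 110, 100, n_stages=64))    # 100
print(vernier_measure(5000, 110, 100, n_stages=64))  # None: range is 640 ps
```

With these illustrative numbers, covering a 5 ns interval at 10 ps resolution would take 500 stages, which shows why the range of a Vernier line is expensive in logic.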
In the phase domain [10], [13], two ring oscillators, with
slightly different clock frequencies, have been implemented.
The difference in frequency defines the resolution of the TDC.
A leading edge, asynchronous to the system clock, associated
with data, activates the slow clock. The trailing edge, which
could be a reference clock, is synchronous, and activates the fast
clock. Each oscillator increments a counter, which is latched
when the fast clock catches up with the slow clock. The fine
timing is a function of these counters. Even though this system has good resolution and precision, these operations require significant logical resources and increase the dead time of the system. Moreover, to achieve high timing resolution, the oscillators must operate at high frequency (in the GHz range), which leads to high power consumption.
An analysis of these architectures shows that none of them really addresses the need to be PT invariant and easy to implement in an FPGA while achieving good timing precision and resolution.
B. Concept of the 1-Bit Sampler With Deserializers
The key concept of the proposed TDC consists of sampling a
digital signal with a Double Data Rate (DDR) 1 bit comparator,
linked to a high frequency deserializer which is used to reduce
the system clock frequency. Both structures are embedded in the
I/O units of FPGAs and make a highly compact block able to
sample up to 1100 Mbits per second (MBPS) per I/O pin using
the fastest speed grade device [14]. By using the DDR sampling
feature, the system samples the input signal on both the rising
and the falling edge of the sampling clock, which explains why,
with a 500 MHz sampling clock, a 1 GHz sampling frequency
is achieved (Fig. 2). The sampling clock can be generated by
doubling the clock frequency of the system clock by using FPGA
internal resources and, thus, a 250 MHz clock oscillator can be
used for both logical resources and deserializers.

The length of the output bit stream (generated by the DDR sampling operation) is user selectable and determines the system clock's period in relation to the sampling clock (an example is shown in Fig. 1). The stream can be at most 6 bits long, and the sampling clock period must be an integer factor of the system clock period.
In this design context, the fabric is similar to a delay chain, without the manual placing and routing burden of logic-based TDCs implemented in FPGAs. The maximum delay between bits is defined by the maximum sampling clock the FPGA can tolerate (1 ns for the one used in the experiment) [14]. A close study of the output bit stream of the deserializer allows the timestamp of a rising or falling edge of a given input signal to be extracted. A transition in the output bit stream from 0 to 1 defines a rising edge, and a transition from 1 to 0 defines a falling edge (Fig. 2).
During a system clock period (between two vertical bars in Fig. 2), there are two consecutive sampling clock periods, comprising two rising and two falling edges. During that time, four different instants are sampled because the sampler unit operates in DDR mode. Bits 0 and 2 are sampled at the rising edge of the sampling clock, while bits 1 and 3 are sampled at the falling edge. The output bit stream consists of these four bits, which cover an entire system clock period, allowing smaller time intervals to be detected. An input signal transition creates a transition in the output bit stream, as seen in Fig. 2. A close study of this stream, checked during each system clock period, allows the rising/falling edge present on the input signal to be located. In this case, the transition of the output bit stream signifies that a rising edge occurred after a series of five consecutive bits equal to 0 (LSB being the first, chronologically speaking).
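The edge search described above can be sketched in a few lines of Python. The sketch assumes 4-bit deserializer words with the LSB as the earliest sample and a 1 ns sample period; the function name and the example words are hypothetical and do not reproduce the authors' firmware.

```python
def find_edges(words, width=4, sample_period_ps=1000):
    """Scan successive deserializer words and report edge timestamps.

    Each word holds `width` consecutive samples, LSB first in time. The last
    bit of the previous word is remembered so that an edge falling exactly on
    a word boundary is still detected.
    """
    edges = []
    previous_bit = None
    for word_index, word in enumerate(words):
        for bit_index in range(width):
            bit = (word >> bit_index) & 1
            if previous_bit is not None and bit != previous_bit:
                kind = "rising" if bit == 1 else "falling"
                sample_number = word_index * width + bit_index
                edges.append((kind, sample_number * sample_period_ps))
            previous_bit = bit
    return edges


# Five zeros, then ones (rising edge), then back to zero (falling edge).
words = [0b0000, 0b1110, 0b0011]
print(find_edges(words))  # [('rising', 5000), ('falling', 10000)]
```

The word index plays the role of the coarse counter and the bit index that of the fine counter, so the timestamp is absolute with respect to the start of the acquisition.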
Fig. 2. Output bit stream example of a deserializer used in a four bit output configuration.

Two or more deserializers with delay blocks of different delay values can be used in parallel to increase the timing resolution. Because the deserializer was originally designed as part of a high frequency serial transmitter/receiver, only one pin is wired to it to send/receive data. Thus, the FPGA unfortunately has only one such delay unit per I/O pin. A copy of the input signal must then be routed to other I/O pins if the user wants to run two or more deserializers in parallel.
The added deserializers perform as if they were increasing the sampling clock speed, helping to detect smaller time intervals. This is done by delaying the inputs with delay blocks that are embedded in the I/O pin in front of the deserializer. Each block is composed of a 64-tap wrap-around delay element with a fixed, calibrated tap resolution of 75 ps, as well as a 10 ps jitter [14]. The delay blocks are calibrated by a delay control element inside the FPGA, which ensures stability with respect to PT variations in the system [15].
If the resolution needs to be doubled, two embedded parts
will be used and the input lines will be delayed by a quarter
of the sampling clock period. The resolution of the TDC will
increase, along with the number of deserializers and I/Os used.
In fact, it will only be limited by the lowest delay available in the
delay block of the FPGA, 75 ps. An example of a 4 deserializers
configuration (used for the experiment) is shown in Fig. 3.
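The way parallel deserializers refine the timestamp can be sketched with an idealized model. It assumes that channel k sees the input delayed by exactly k times the sample period divided by the number of channels, and it uses a 1.25 ns sample period with four channels to reproduce the 312.5 ps step; all function names are hypothetical.

```python
def sample_channels(edge_ps, n_channels, n_samples, sample_period_ps):
    """Simulate n_channels samplers, channel k seeing the input delayed by
    k * offset. A sample reads 1 once the rising edge at edge_ps has arrived."""
    offset = sample_period_ps / n_channels
    return [[1 if n * sample_period_ps - k * offset >= edge_ps else 0
             for n in range(n_samples)]
            for k in range(n_channels)]


def interleave(channel_bits, sample_period_ps):
    """Merge the channels into one chronological list of (instant, bit).

    Sample n of channel k shows the input at n * period - k * offset, so the
    most delayed channel comes first within each sample period.
    """
    n_channels = len(channel_bits)
    offset = sample_period_ps / n_channels
    fine = []
    for n in range(len(channel_bits[0])):
        for k in reversed(range(n_channels)):
            fine.append((n * sample_period_ps - k * offset, channel_bits[k][n]))
    return fine


def first_rising_edge(fine):
    """Return the instant of the first 0 to 1 transition, or None."""
    for (_, previous_bit), (instant, bit) in zip(fine, fine[1:]):
        if previous_bit == 0 and bit == 1:
            return instant
    return None


one = sample_channels(2000.0, 1, 4, 1250.0)
four = sample_channels(2000.0, 4, 4, 1250.0)
print(first_rising_edge(interleave(one, 1250.0)))   # 2500.0
print(first_rising_edge(interleave(four, 1250.0)))  # 2187.5
```

With one channel the edge at 2000 ps is only known to within 1.25 ns, while four offset channels narrow the bin to 312.5 ps at the cost of three additional I/O pins.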

Fig. 3. Schematic of a 4-deserializer configuration in Xilinx Virtex-4 FPGA that creates the ISERDES register on channel 1.

III. MATERIALS AND METHODS


A. Validation Setup
The concept of the Sub-Nanosecond Time Interval Detection System was validated for a resolution of 312.5 ps with an FX12 development board, which is manufactured by Digilent and built around a Xilinx Virtex-4 FPGA. A clock of 200 MHz (5 ns) is used to run the firmware and provides the desired coarse resolution. This clock is doubled to 400 MHz using the internal PLL and samples the input signals (2.5 ns period, DDR, i.e., one sample every 1.25 ns). A DG535 digital delay and pulse generator from Stanford Research Systems, which has four digital delay/pulse generator channels, creates the start and stop pulses that activate the TDC. Finally, a custom card converts the TTL signals from the generator into CMOS 2.5 V signals and fans out each of the signals to 4 different I/Os of the Xilinx device, each with a 312.5 ps delay offset, providing the final granularity (Fig. 4).
For testing purposes, when a rising edge is detected in the ISERDES register of a channel (Fig. 3), a timestamp is generated,
and the system waits for the other channel to do the same. When
this happens, the difference between the two timestamps is calculated to create a timing spectrum sent through a serial link to
a computer for a posteriori analysis. The dead time of this delay
measurement is 3 system clock ticks, 15 ns in this case, corresponding to a timestamp processing frequency of 66 MHz.
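The arithmetic behind these figures can be checked with a short Python sketch. The clock period, bin width and dead time come from the text; the timestamp layout (a coarse tick count plus a fine bin index counted from the start of the tick) is an assumption made for illustration.

```python
CLOCK_PERIOD_PS = 5000.0   # 200 MHz system clock (from the text)
BIN_PS = 312.5             # fine bin width (from the text)
DEAD_TIME_TICKS = 3        # dead time in system clock ticks (from the text)
BINS_PER_TICK = int(CLOCK_PERIOD_PS / BIN_PS)


def timestamp_ps(coarse_count, fine_bin):
    """Absolute timestamp from a coarse tick count and a fine bin index.

    The layout (fine bin counted from the start of the tick) is assumed."""
    if not 0 <= fine_bin < BINS_PER_TICK:
        raise ValueError("fine bin out of range")
    return coarse_count * CLOCK_PERIOD_PS + fine_bin * BIN_PS


def interval_ps(start, stop):
    """Time difference between two (coarse_count, fine_bin) pairs."""
    return timestamp_ps(*stop) - timestamp_ps(*start)


dead_time_ns = DEAD_TIME_TICKS * CLOCK_PERIOD_PS / 1000.0
max_rate_mhz = 1000.0 / dead_time_ns

print(BINS_PER_TICK)                   # 16
print(interval_ps((2, 3), (3, 1)))     # 4375.0
print(dead_time_ns)                    # 15.0
print(round(max_rate_mhz, 1))          # 66.7
```

The 16 bins per tick match the 16 bits produced by four deserializers in the four bit output configuration.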
B. Experimental Conditions
On each channel, a set of square waves was generated with a delay sweep, between the channels, from 0 ns to 3.125 ns, by steps of 312.5 ps. In that regard, 200000 events per channel, for each delay value of the sweep, were generated by the DG535 at a frequency of 10 kHz for the TDC characterization. Experiments were conducted at different temperatures (0 °C, 20 °C, and 50 °C) to confirm the stability of the system.

Fig. 4. TDC validation setup verifying delays between two channels.

Fig. 5. Photograph of the setup showing the custom board on the left and the FX12 development board on the right.

A statistical code density test was used to demonstrate that
the timing bins of this 16 bit TDC were well balanced. Random
signals were generated to cover each bit of the register: 200000
events for each of the 16 bins.
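A code density test is commonly reduced to DNL and INL figures as sketched below. The sketch uses the standard definition (DNL as the relative deviation of each bin count from the mean count, INL as the running sum of the DNL); the counts are made-up values for illustration and are not measurements from this work.

```python
def dnl_inl(counts):
    """Compute DNL and INL (in LSB) from a code density histogram.

    With uniformly distributed random hits, every bin should collect the
    same number of counts. DNL is the relative deviation from that ideal
    count and INL is the cumulative sum of the DNL values.
    """
    expected = sum(counts) / len(counts)
    dnl = [count / expected - 1.0 for count in counts]
    inl = []
    running = 0.0
    for value in dnl:
        running += value
        inl.append(running)
    return dnl, inl


# Made-up histogram for four bins (not data from the paper).
counts = [200000, 198000, 202000, 200000]
dnl, inl = dnl_inl(counts)
print([round(v, 3) for v in dnl])
print([round(v, 3) for v in inl])
```

A bin that is too narrow collects fewer counts and yields a negative DNL, and a missing code would show up as a DNL of -1 LSB.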
Also, the TDC was successfully used in a PET application for
timing alignment [17]. A radioactive source was placed at the
center of a LabPET scanner [18] and a PMT-based timing probe
was used to detect beta emission, in coincidence with gamma
events found in the detector ring. A timing resolution of 0.694 ns
was used in the TDC to match the resolution of the PET system.
A timing spectrum was acquired between each channel and the
alignment probe, allowing an offline program to calculate the
time offset between each channel. This timing offset was used
to correct event timestamps and reduce the system's global coincidence FWHM.
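The offset correction can be sketched as follows. The text does not say how the offline program estimates each offset, so this sketch assumes the simplest estimator, the mean of each channel's time differences against the probe; the channel names and values are hypothetical.

```python
def channel_offsets(time_diffs_by_channel):
    """Estimate one timing offset per channel.

    Each entry lists the measured time differences (ns) between a channel
    and the alignment probe. The mean is used here as an assumed estimator.
    """
    return {channel: sum(diffs) / len(diffs)
            for channel, diffs in time_diffs_by_channel.items()}


def align(timestamp_ns, channel, offsets):
    """Correct an event timestamp with the offset of its channel."""
    return timestamp_ns - offsets[channel]


# Hypothetical time differences for two channels (not measured data).
spectra = {"ch0": [1.0, 2.0, 3.0], "ch1": [-1.0, -1.0]}
offsets = channel_offsets(spectra)
print(offsets)                     # {'ch0': 2.0, 'ch1': -1.0}
print(align(10.0, "ch0", offsets)) # 8.0
```

A centroid or a Gaussian fit of each timing spectrum would be a natural alternative if the spectra are noisy or asymmetric.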
IV. RESULTS
Fig. 6 demonstrates a significant timing improvement for the scanner after the timing alignment, where the overall timing resolution improved from 11.1 to 6.6 ns for LYSO-LYSO coincidences, from 13.0 to 10.3 ns for LGSO-LGSO coincidences and from 12.6 to 8.6 ns FWHM for mixed coincidences [17]. LYSO and LGSO, used here to label the types of coincidences, are the two types of crystals used in the LabPET. They are both used on each acquisition channel in a phoswich arrangement and coupled to an avalanche photodiode. As each material has a different decay time, a digital analysis determines where the annihilation photons deposited their energy.

Fig. 6. LabPET scanner's overall timing resolution before and after timing alignment [17]. Much tighter Gaussian curves are obtained after timing alignment, reflecting the improved timing resolution of the scanner.

Fig. 7. Staircase graph of the time differences generated between two channels and measured by the TDC at three different temperatures.

The power consumption for one TDC on a channel has been estimated at 85 mW by the Xilinx platform tool XPower.
A staircase graph, produced from a series of 20-second measurements taken at three different temperatures (0 °C, 20 °C, and 50 °C), shows good linearity over all temperatures (Fig. 7).
The measured differential non-linearity (DNL) is less than 0.3 LSB for each operating temperature (Fig. 8).
The measured integral non-linearity (INL) is less than 0.65 LSB for each operating temperature (Fig. 9).


Fig. 8. DNL histograms for three different temperatures.

Fig. 9. INL histograms for three different temperatures.

Fig. 10. DNL as a function of temperature for three TDC bins.

For three different temperatures, linear regressions were performed on the DNL measurement points at bins 1, 10 and 20, corresponding to delays of [...], 0 ns and 3.125 ns, demonstrating a temperature dependence lower than 0.125 ps/°C (Fig. 10).
The statistical code density test was applied on one channel to confirm that the timing bins are properly balanced. Results of 200000 counts, for each bin, show good measurement consistency (Fig. 11).
V. DISCUSSION
The results obtained from the staircase graph present a monotonic transfer function with no missing time intervals. The DNL and INL histograms are all under 1 LSB and demonstrate the high resolution obtained. It was observed that a temperature change does not degrade the TDC performance (Fig. 7). The temperature dependence was shown to be on the order of 0.1 ps per degree Celsius (Fig. 10). The power consumption of 85 mW per channel is high compared to 1 mW in [5], but power consumption is not the feature that makes the system attractive. All experiments were conducted without calibration procedures for the TDC. This design only required the instantiation of a deserializer module, along with the delay blocks, to avoid the calibration procedure. This simple method makes the proposed architecture highly attractive compared to other methods where special care must be taken in the routing process [9]–[12]. Even though better timing resolution can be obtained with phase-based TDCs [10]–[13], the logic required to build the proposed TDC is minimal because it is already embedded in the I/O unit circuitry. One minor drawback of the proposed solution is the amount of PCB wiring needed to achieve sub-nanosecond resolutions. Many I/Os must then be sacrificed, and the architecture may become a limitation for multichannel applications. Therefore, a compromise must be found between the resolution needed and the number of channels that the system will analyze to get the most out of this TDC concept. To increase the performance of the testing system, a next iteration should use LVDS 2.5 V differential transmission lines instead of CMOS 2.5 V single-ended transmission lines. This would improve the results (Figs. 7–11) by suppressing unwanted noise.

Fig. 11. Statistical code density test on one channel.

VI. CONCLUSION
A Sub-Nanosecond Time Interval Detection System with
312.5 ps timing resolution was designed and tested. It was
clearly shown to be PT invariant according to experimental
results. A low dead time of only 3 clock cycles was achieved
with the test firmware, which allows a high timestamp generation rate (66 MHz in this case). This TDC concept can be quite
useful for time detection applications, with a relatively modest
number of channels, because the use of dedicated external
TDCs can be avoided. The proposed TDC concept was also tested, at 0.694 ns resolution, with a time alignment probe for the timing calibration of a PET scanner [17], confirming the reliability and ease of use of the system.

REFERENCES
[1] M.-A. Tétrault et al., "System architecture of the LabPET small animal PET scanner," IEEE Trans. Nucl. Sci., vol. 55, pp. 2546–2550.
[2] P. Vaska et al., "RatCAP: miniaturized head-mounted PET for conscious rodent brain imaging," IEEE Trans. Nucl. Sci., vol. 51, no. 5, pt. 2, pp. 2718–2722, Oct. 2004.
[3] K. Ziemons et al., "The ClearPET project: Development of a 2nd generation high-performance small animal PET scanner," Nucl. Instrum. Meth. Phys. Res. A, vol. 537, no. 1–2, pp. 307–311, Jan. 2005.
[4] R. Lecomte et al., "Cardiac PET imaging of blood flow, metabolism, and function in normal and infarcted rats," IEEE Trans. Nucl. Sci., vol. 51, no. 3, pt. 2, pp. 696–704, Jun. 2004.
[5] A. S. Yousif and J. W. Haslett, "A fine resolution TDC architecture for next generation PET imaging," IEEE Trans. Nucl. Sci., vol. 54, no. 5, pp. 1574–1582, Oct. 2007.
[6] P. Dudek et al., "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line," IEEE Trans. Solid State Circuits, vol. 35, no. 2, pp. 240–247, Feb. 2000.
[7] A. H. Chan and G. W. Roberts, "A jitter characterization system using a component-invariant Vernier delay line," IEEE Trans. VLSI Syst., vol. 12, no. 1, pp. 79–95, Jan. 2004.
[8] B. K. Swann et al., "A 100-ps time-resolution CMOS time-to-digital converter for positron emission tomography imaging applications," IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 1839–1852, Nov. 2004.
[9] J. Wu and Z. Shi, "The 10-ps wave union TDC: Improving FPGA TDC resolution beyond its cell delay," presented at the NSS-MIC Conf. Rec., Dresden, Germany, 2008.
[10] S. S. Junnarkar et al., "FPGA-based self-calibrating time-to-digital converter for time-of-flight experiments," IEEE Trans. Nucl. Sci., vol. 56, no. 4, pt. 3, pp. 2374–2379, Aug. 2009.
[11] S. S. Junnarkar, "FPGA based front end instrumentation for Mariachi experiment," presented at the NPSS-Real Time Conf., Beijing, China, 2009.
[12] D. K. Xie et al., "Cascading delay line time-to-digital converter with 75 ps resolution and a reduced number of delay cells," Rev. Sci. Instrum., vol. 76, 2005.
[13] M. Z. Straayer et al., "A multi-path gated ring oscillator TDC with first-order noise shaping," IEEE Trans. Solid State Circuits, vol. 44, no. 4, pp. 1089–1098, Apr. 2009.
[14] "Virtex-4 FPGA Platform FPGAs: Complete Datasheet," Xilinx Inc., 2008 [Online]. Available: http://www.xilinx.com
[15] "Virtex-4 FPGA Platform FPGAs: User Guide," Xilinx Inc., 2008 [Online]. Available: http://www.xilinx.com
[16] J. Snow, "Xilinx Application Note 861: Virtex-4 and Virtex-5 FPGA Families: Efficient 8X Oversampling Asynchronous Serial Data Recovery Using IDELAY," 2007 [Online]. Available: http://www.xilinx.com
[17] M. Bergeron et al., "A handy time alignment probe for timing calibration of PET scanners," Nucl. Instrum. Meth. Phys. Res. A, vol. 599, no. 1, pp. 113–117, 2009.
[18] M. Bergeron et al., "Performance evaluation of the LabPET APD-based digital PET scanner," IEEE Trans. Nucl. Sci., vol. 56, no. 1, pp. 10–16, Feb. 2009.
