2, APRIL 2010
I. INTRODUCTION
A. TDC Architectures
A TDC commonly includes a coarse counter incremented at
each system clock and a fine counter generated by a more complex architecture. The latter can exploit the time domain [5]–[9],
[11], [12] or the phase domain [10], [13] to refine the coarse
counter. In the time domain, the fine portion of the TDC is implemented with delay buffers, where delayed versions of the
input data or the clock signal are fed into a parallel register. The
total delay is set to fit one system clock period. The position of the rising edge in the delay chain allows the system to create a precise
timestamp. Such an approach achieves good timing resolution
[down to 31 ps (9 ps)] with minimal logic [5]. However, it
is not PT invariant, as the buffer delays are very sensitive to
such parameters.
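The way a coarse count and a fine tap position combine into one timestamp can be sketched as follows. This is a minimal illustration, not the design of any cited TDC: the function name, the assumption that taps are uniform, and the convention that the fine value is added to the coarse value are all hypothetical.

```python
def timestamp_ps(coarse_count, fine_tap, clock_period_ps, n_taps):
    """Combine a coarse counter value and a fine tap index into a time in ps.

    Assumes (hypothetically) n_taps uniform taps whose total delay spans
    exactly one system clock period, as described in the text.
    """
    if not 0 <= fine_tap < n_taps:
        raise ValueError("fine_tap must lie inside the delay chain")
    tap_delay_ps = clock_period_ps / n_taps
    return coarse_count * clock_period_ps + fine_tap * tap_delay_ps


# Example values only: a 5 ns clock divided into 16 taps gives 312.5 ps per tap.
example = timestamp_ps(coarse_count=3, fine_tap=5, clock_period_ps=5000, n_taps=16)
```

In a real delay chain the taps are not uniform and drift with process and temperature, which is the weakness noted above.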
To overcome these delay line problems, one can use the Vernier
Delay Line (VDL) technique [6], [7]. In this scheme, both data
and clock are delayed, each by a slightly different amount. The
approach relies on measuring
the time difference between the two delayed signals, assuming that
the PT variation is identical on both lines. After each delay
unit, the data signal is latched in a flip-flop when a rising edge
of the longer delayed clock signal is detected. The goal is to
determine when the clock signal catches up with the latched
version of the data signal in the delay line. At this position in the
delay chain, a timestamp is produced at the desired resolution.
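The catch-up principle can be illustrated with a small behavioral model. This sketch assumes the classic Vernier arrangement in which the leading edge travels through the slower line and the trailing edge through the faster one, so that the gap shrinks by the delay difference at every stage; the function name and the numeric values are hypothetical.

```python
def vernier_stage(dt_ps, slow_ps, fast_ps, n_stages):
    """Return the first stage at which the trailing edge has caught up.

    dt_ps    : initial time difference between the two edges
    slow_ps  : per-stage delay of the line carrying the leading edge
    fast_ps  : per-stage delay of the line carrying the trailing edge
    Returns None if the edges never meet within n_stages.
    """
    if slow_ps <= fast_ps:
        raise ValueError("the leading edge must travel through the slower line")
    step_ps = slow_ps - fast_ps  # effective resolution of the Vernier line
    for stage in range(1, n_stages + 1):
        if dt_ps - stage * step_ps <= 0:
            return stage
    return None


# Example values only: a 10 ps delay difference resolves a 95 ps interval at stage 10.
stage = vernier_stage(dt_ps=95, slow_ps=110, fast_ps=100, n_stages=32)
```

The resolution is set by the difference between the two unit delays rather than by either delay alone, which is why matched PT drift on both lines largely cancels.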
Fig. 1. Deserializer used in a four bit output configuration. The four bits represent the input at four moments, each separated by half a sampling period through
one system clock tick. For this example, the system clock is two times slower than the sampling clock.
The length of the output bit stream (generated by the DDR sampling operation) is user selectable and determines the system clock's
period in relation to the sampling clock (an example is shown in
Fig. 1). The stream can be at most 6 bits long, and the sampling clock period must be an integer factor of the system clock
period.
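The relation between the two clocks and the stream length can be written out as a short check. This is a sketch under the stated constraints only (two bits per sampling period in DDR mode, integer clock ratio, at most 6 bits); the function name and the example periods are assumptions.

```python
def bits_per_system_clock(system_period_ps, sampling_period_ps, max_bits=6):
    """Number of DDR samples collected during one system clock period."""
    ratio, remainder = divmod(system_period_ps, sampling_period_ps)
    if ratio < 1 or remainder != 0:
        raise ValueError("sampling period must divide the system clock period")
    n_bits = 2 * ratio  # DDR: one sample on each edge of the sampling clock
    if n_bits > max_bits:
        raise ValueError("stream longer than the deserializer supports")
    return n_bits


# Example values only: a system clock two times slower than the sampling clock
# yields the four bit configuration of Fig. 1.
n_bits = bits_per_system_clock(system_period_ps=5000, sampling_period_ps=2500)
```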
In this design context, the fabric is similar to a delay chain
without the manual placing and routing burden of logic-based
TDCs implemented in FPGAs. The minimum delay between bits
is set by the fastest sampling clock the FPGA can tolerate (1 ns for the device used in the experiment [14]). A close
study of the output bit stream of the deserializer allows for the
extraction of the timestamp of a rising or falling edge for a given
input signal. A transition in the output bit stream from 0 to 1 defines a rising edge, and a transition from 1 to 0 defines a falling
edge (Fig. 2).
During a system clock period (between two vertical bars in
Fig. 2), there are two consecutive sampling clock periods, comprising two rising and two falling edges.
During that time, four different instants are sampled because
the sampler unit operates in DDR mode. Bits 0 and 2 are sampled at the rising edge of the sampling clock, while bits 1 and 3
are sampled at the falling edge. The output bit stream consists of these
four bits, which cover an entire system clock period, allowing
smaller time intervals to be detected. An input signal transition
creates a transition in the output bit stream, as seen in Fig. 2.
Examining this stream during each system clock period reveals the location of the rising or falling edge present on the
input signal. In this case, the transition of the output bit stream
signifies that a rising edge occurred after a series
of five consecutive bits equal to 0 (LSB being the first, chronologically speaking).
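The edge search described above can be sketched in a few lines. This is an illustrative software model, not the firmware: the function name is hypothetical, bits are given as a list with index 0 as the LSB (the earliest sample), and the last bit of the previous word is carried over so that an edge on a word boundary is not missed.

```python
def find_edges(bits, previous_bit=0):
    """Locate transitions in one deserializer word.

    bits         : list of 0/1 samples, index 0 = LSB = earliest sample
    previous_bit : last sample of the preceding word
    Returns a list of (bit_index, "rising" | "falling").
    """
    edges = []
    last = previous_bit
    for index, bit in enumerate(bits):
        if bit != last:
            edges.append((index, "rising" if bit == 1 else "falling"))
        last = bit
    return edges


# Five consecutive zeros followed by ones, spread over two four-bit words.
first_word = [0, 0, 0, 0]
second_word = [0, 1, 1, 1]
edges = find_edges(second_word, previous_bit=first_word[-1])
```

Here `edges` holds a single rising edge at bit index 1 of the second word, that is, after five zero samples counted from the start of the first word.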
Two or more deserializers with delay blocks of different delay
time values can be used in parallel to increase the timing resolution. Because the deserializer was originally a high-frequency serial transmitter/receiver, only one pin is wired to it to send/receive
data. Thus, the FPGA unfortunately has only one such delay unit
per I/O pin. A copy of the input signal to other I/O pins is then
required if the user wants to parallelize two or more deserializers.
Fig. 2. Output bit stream example of a deserializer used in a four bit output configuration.
The added deserializers perform as if they were increasing
the sampling clock speed, helping to detect smaller time intervals. This is done by delaying inputs with delay blocks that are
embedded in the I/O pin in front of the deserializer. Each is
composed of a 64-tap wrap-around delay element with a fixed,
calibrated tap resolution of 75 ps and a 10 ps jitter [14].
They are calibrated by a delay control element inside the FPGA,
which ensures stability with respect to PT variance in the system
[15].
If the resolution needs to be doubled, two deserializers
are used and the input line of the second one is delayed by a quarter
of the sampling clock period. The resolution of the TDC
improves as the number of deserializers and I/Os used increases.
It is ultimately limited only by the smallest delay available in the
delay block of the FPGA, 75 ps. An example of a four-deserializer
configuration (used for the experiment) is shown in Fig. 3.
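How the parallel streams combine can be sketched as a simple interleaving. The model below is an assumption-laden illustration: it supposes that deserializer k sees the input delayed by k fine steps, so that at a given sampling instant the most delayed deserializer holds the earliest point of the original signal. The function names and the 2.5 ns sampling period (chosen so that four deserializers give 312.5 ps bins) are hypothetical.

```python
def fine_bin_ps(sampling_period_ps, n_deserializers):
    """Bin width after interleaving n DDR deserializers."""
    return sampling_period_ps / 2 / n_deserializers


def interleave(words):
    """Merge words from delayed deserializers into one chronological stream.

    words[k] is the word (earliest sample first) from the deserializer whose
    input is delayed by k fine steps.
    """
    length = len(words[0])
    if any(len(word) != length for word in words):
        raise ValueError("all deserializer words must have the same length")
    merged = []
    for sample in range(length):
        # A larger input delay means an earlier instant of the original signal,
        # so the most delayed deserializer comes first within each DDR bin.
        for k in range(len(words) - 1, -1, -1):
            merged.append(words[k][sample])
    return merged


# Hypothetical rising edge falling between the samples of deserializers 2 and 1.
words = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 1]]
stream = interleave(words)        # 16 bins per system clock
edge_bin = stream.index(1)        # first bin that sees the high level
edge_time_ps = edge_bin * fine_bin_ps(2500, len(words))
```

With these example words the edge lands in bin 10 of 16, which no single four-bit deserializer could distinguish from bins 8, 9 or 11.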
Fig. 5. Photograph of the setup showing the custom board on the left and the
FX12 development board on the right.
steps of 312.5 ps. In that regard, 200000 events per channel, for
each delay value of the sweep, were generated by the DG535
at a frequency of 10 kHz for the TDC characterization. Experiments were conducted at different temperatures (0 °C, 20 °C,
and 50 °C) to confirm the stability of the system.
A statistical code density test was used to demonstrate that
the timing bins of this 16 bit TDC were well balanced. Random
signals were generated to cover each bit of the register: 200000
events for each of the 16 bins.
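A code density test reduces to comparing each bin's hit count with the count an ideal, perfectly uniform TDC would collect. The sketch below uses the common definitions (DNL as the relative deviation of each bin from the mean count, INL as the running sum of DNL); the function name and the sample counts are hypothetical and are not measurements from this work.

```python
def dnl_inl(counts):
    """Compute DNL and INL (in LSB) from a code density histogram.

    counts : number of hits recorded in each timing bin for uniformly
             distributed random events.
    """
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("histogram is empty")
    ideal = total / len(counts)          # hits per bin for a perfect TDC
    dnl = [count / ideal - 1.0 for count in counts]
    inl = []
    running = 0.0
    for value in dnl:
        running += value                 # INL is the cumulative sum of DNL
        inl.append(running)
    return dnl, inl


# Made-up histogram for illustration only.
dnl, inl = dnl_inl([100, 110, 90, 100])
```

A bin that is wider than nominal collects more random events and therefore shows a positive DNL; a narrower bin shows a negative one.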
Also, the TDC was successfully used in a PET application for
timing alignment [17]. A radioactive source was placed at the
center of a LabPET scanner [18] and a PMT-based timing probe
was used to detect beta emission, in coincidence with gamma
events found in the detector ring. A timing resolution of 0.694 ns
was used in the TDC to match the resolution of the PET system.
A timing spectrum was acquired between each channel and the
alignment probe, allowing an offline program to calculate the
time offset between each channel. This timing offset was used
to correct event timestamps and reduce the system's global coincidence FWHM.
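The offset correction can be sketched as follows. The estimator used here, the mean of each channel's time differences against the probe, is an assumption made for brevity; the offline program of [17] may instead fit the peak of each timing spectrum. All names and values are hypothetical.

```python
def channel_offsets(differences_by_channel):
    """Estimate one timing offset per channel.

    differences_by_channel : dict mapping a channel id to a list of
    (channel time - probe time) values in ns.
    """
    return {
        channel: sum(values) / len(values)
        for channel, values in differences_by_channel.items()
        if values
    }


def align_timestamp(timestamp_ns, channel, offsets):
    """Remove the channel's estimated offset from an event timestamp."""
    return timestamp_ns - offsets[channel]


# Made-up time differences for two channels, in ns.
offsets = channel_offsets({0: [1.0, 2.0, 3.0], 1: [-1.0, -1.0]})
corrected = align_timestamp(10.0, channel=0, offsets=offsets)
```

Once every channel is referred to the same probe, residual differences between channels reflect timing jitter rather than fixed skew, which is what narrows the coincidence distribution.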
IV. RESULTS
Fig. 6 demonstrates a significant timing improvement for the
scanner after the timing alignment, where the overall timing
resolution improved from 11.1 to 6.6 ns for LYSO-LYSO coincidences, from 13.0 to 10.3 ns for LGSO-LGSO coincidences and
from 12.6 to 8.6 ns FWHM for mixed coincidences [17]. LYSO
and LGSO, used here to label the types of coincidences, are the two
types of crystals used in the LabPET. They are both used on each
acquisition channel in a phoswich arrangement and coupled to
an avalanche photodiode. As each material has a different decay
time, a digital analysis determines where the annihilation photons deposited their energy.
Fig. 6. LabPET scanner's overall timing resolution before and after timing
alignment [17]. Much tighter Gaussian curves are obtained after timing alignment, improving the timing resolution of the scanner.
Fig. 7. Staircase graph of the time differences generated between two channels
and measured by the TDC at three different temperatures.
The power consumption for one TDC on a channel has been
estimated at 85 mW by the Xilinx platform tool XPower.
A staircase graph, produced from a series of 20-second measurements taken at three different temperatures (0 °C, 20 °C,
and 50 °C), shows good linearity at all temperatures (Fig. 7).
The measured differential non-linearity (DNL) is less than 0.3
LSB for each operating temperature (Fig. 8).
The measured integral non-linearity (INL) is less than 0.65
LSB for each operating temperature (Fig. 9).
For three different temperatures, linear regressions were performed for DNL measurement points at bins 1, 10 and 20, corresponding to delays of …, 0 ns and 3.125 ns, demonstrating a
temperature dependence lower than 0.125 ps/°C (Fig. 10).
The statistical code density test was applied on one channel
to confirm that the timing bins are properly balanced. Results of
INL histograms are all under 1 LSB, and demonstrate the high
resolution obtained. It was observed that a temperature change
does not degrade the TDC performance (Fig. 7). The temperature dependence was shown to be on the order of 0.1 ps per degree Celsius (Fig. 10). The power consumption of 85 mW per
channel is high when compared to 1 mW in [5], but power consumption is not the
feature that makes the system attractive. All experiments were
conducted without calibration procedures for the TDC. This design only required instantiating a deserializer module, along
with the delay blocks, to avoid the calibration procedure.
This simplicity makes the proposed architecture highly attractive compared to other methods where special care must be
taken in the routing process [9]–[12]. Even though better timing
resolution can be obtained with phase-based TDCs [10], [13],
the logic required to build the proposed TDC is minimal because it is already embedded in the I/O unit circuitry. One minor
drawback of the proposed solution is the amount of PCB wiring
needed to achieve sub-nanosecond resolutions.
Many I/Os must then be sacrificed and the architecture may become a limitation for multichannel applications. Therefore, a
compromise must be found between the resolution needed and
the number of channels that the system will analyze to get the
most out of this TDC concept. To increase the performance of
the testing system, a next iteration should use LVDS 2.5 V differential transmission lines instead of CMOS 2.5 V single-ended
transmission lines. This would improve the performance (Figs. 7–11) by suppressing unwanted noise.
VI. CONCLUSION
A Sub-Nanosecond Time Interval Detection System with
312.5 ps timing resolution was designed and tested. It was
clearly shown to be PT invariant according to experimental
results. A low dead time of only 3 clock cycles was achieved
with the test firmware, which allows a high timestamp generation rate (66 MHz in this case). This TDC concept can be quite
useful for time detection applications, with a relatively modest
number of channels, because the use of dedicated external
TDCs can be avoided. The proposed TDC concept was tested
with 0.694 ns resolution as a time alignment probe for timing
calibration of a PET scanner [17], confirming the reliability
and ease of use of the system.
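The quoted timestamp rate follows directly from the dead time. The short check below assumes a 200 MHz system clock, a value not stated explicitly here but consistent with 16 bins of 312.5 ps spanning one 5 ns system clock period; the function name is hypothetical.

```python
def max_timestamp_rate_hz(system_clock_hz, dead_time_cycles):
    """Highest sustained timestamp rate for a given dead time in clock cycles."""
    if dead_time_cycles < 1:
        raise ValueError("dead time must be at least one clock cycle")
    return system_clock_hz / dead_time_cycles


# Assumed 200 MHz system clock (16 bins x 312.5 ps = 5 ns period).
system_period_ps = 16 * 312.5
rate_hz = max_timestamp_rate_hz(system_clock_hz=200e6, dead_time_cycles=3)
```

Under that assumption the rate is about 66.7 MHz, in line with the figure given above.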
REFERENCES
[1] M.-A. Tétrault et al., "System architecture of the LabPET small animal PET scanner," IEEE Trans. Nucl. Sci., vol. 55, pp. 2546–2550.
[2] P. Vaska et al., "RatCAP: miniaturized head-mounted PET for conscious rodent brain imaging," IEEE Trans. Nucl. Sci., vol. 51, no. 5, pt.
2, pp. 2718–2722, Oct. 2004.
[3] K. Ziemons et al., "The ClearPET project: Development of a
2nd generation high-performance small animal PET scanner," Nucl.
Instrum. Meth. Phys. Res. A, vol. 537, no. 1–2, pp. 307–311, Jan. 2005.
[4] R. Lecomte et al., "Cardiac PET imaging of blood flow, metabolism,
and function in normal and infarcted rats," IEEE Trans. Nucl. Sci., vol.
51, no. 3, pt. 2, pp. 696–704, Jun. 2004.
[5] A. S. Yousif and J. W. Haslett, "A fine resolution TDC architecture for next
generation PET imaging," IEEE Trans. Nucl. Sci., vol. 54, no. 5, pp.
1574–1582, Oct. 2007.
[6] P. Dudek et al., "A high-resolution CMOS time-to-digital converter
utilizing a Vernier delay line," IEEE Trans. Solid State Circuits, vol.
35, no. 2, pp. 240–247, Feb. 2000.
[7] A. H. Chan and G. W. Roberts, "A jitter characterization system using a
component-invariant Vernier delay line," IEEE Trans. VLSI Syst., vol.
12, no. 1, pp. 79–95, Jan. 2004.
[8] B. K. Swan et al., "A 100-ps time-resolution CMOS time-to-digital
converter for positron emission tomography imaging applications,"
IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 1839–1852, Nov.
2004.
[9] J. Wu and Z. Shi, "The 10-ps wave union TDC: Improving FPGA TDC
resolution beyond its cell delay," presented at the NSS-MIC Conf. Rec.,
Dresden, Germany, 2008.
[10] S. S. Junnarkar et al., "FPGA-based self-calibrating time-to-digital
converter for time-of-flight experiments," IEEE Trans. Nucl. Sci., vol.
56, no. 4, pt. 3, pp. 2374–2379, Aug. 2009.
[11] S. S. Junnarkar, "FPGA based front end instrumentation for Mariachi
experiment," presented at the NPSS-Real Time Conf., Beijing, China,
2009.
[12] D. K. Xie et al., "Cascading delay line time-to-digital converter with 75
ps resolution and a reduced number of delay cells," Rev. Sci. Instrum.,
vol. 76, 2005.
[13] M. Z. Straayer et al., "A multi-path gated ring oscillator TDC with
first-order noise shaping," IEEE Trans. Solid State Circuits, vol. 44,
no. 4, pp. 1089–1098, Apr. 2009.
[14] Virtex-4 FPGA Platform FPGAs: Complete Datasheet, Xilinx Inc., 2008
[Online]. Available: http://www.xilinx.com
[15] Virtex-4 FPGA Platform FPGAs: User Guide, Xilinx Inc., 2008 [Online]. Available: http://www.xilinx.com
[16] J. Snow, "Xilinx Application Note 861: Virtex-4 and Virtex-5
FPGA Families: Efficient 8X Oversampling Asynchronous Serial Data Recovery Using IDELAY," 2007 [Online]. Available:
http://www.xilinx.com
[17] M. Bergeron et al., "A handy time alignment probe for timing calibration of PET scanners," Nucl. Instrum. Meth. Phys. Res. A, vol. 599, no.
1, pp. 113–117, 2009.
[18] M. Bergeron et al., "Performance evaluation of the LabPET APD-based digital PET scanner," IEEE Trans. Nucl. Sci., vol. 56, no. 1, pp.
10–16, Feb. 2009.