# A 9 b, 1.25 ps Resolution Coarse–Fine Time-to-Digital Converter in 90 nm CMOS that Amplifies a Time Residue

Minjae Lee, Student Member, IEEE, and Asad A. Abidi, Fellow, IEEE

Abstract-This paper presents the design of a coarse-fine time-to-digital converter (TDC) that amplifies a time residue to improve time resolution, similar to a coarse-fine analog-to-digital converter (ADC). A new digital circuit has been developed to amplify the time residue with a higher gain (>16) and a larger range (>80 ps) than existing solutions. However, adapting the conventional coarse-fine architecture from ADCs is not an appropriate solution for TDCs: input time cannot be stored, and the gain of a time amplifier (TA) cannot be controlled precisely. This paper proposes a new coarse-fine TDC architecture that uses an array of time amplifiers and two identical fine TDCs to compensate for the variation of the TA gain during the conversion process. The measured DNL and INL are  $\pm 0.8$  LSB and  $\pm 3$  LSB, respectively, with 1 LSB = 1.25 ps, while the standard deviation of the output code for constant inputs remains below 1 LSB across the TDC range. Although the nonlinearity is larger than 1 LSB, using an INL lookup table or better matched delays in the coarse TDC delay chain will improve the linearity further.

*Index Terms*—Coarse–fine architecture, open-loop residue amplification, subrange normalization, time amplifier, time-to-digital converter (TDC).

## I. INTRODUCTION

A TIME-TO-DIGITAL converter (TDC) is similar to an analog-to-digital converter (ADC), except that, instead of quantizing voltage or current, the TDC quantizes the time interval between two rising edges. Originally developed for nuclear experiments to locate single-shot events [1], the TDC is now used in many applications such as laser range finders, space science instruments, and measurement devices. Recently, it has been employed to measure phase in all-digital phase-locked loops (PLLs) [2]. The TDC replaces a charge pump, the only true analog component in conventional PLLs, and allows the output word to drive a digital loop filter. An all-digital PLL brings with it the advantages of programmability and easy calibration. However, just as in any digital replacement of an analog function, the TDC creates quantization noise, which dominates the loop's in-band phase noise in a digital PLL [2]. Because both the resolution and the linearity of the TDC affect system performance in most applications, this paper focuses on a method to improve both simultaneously. To simplify the design, the two inputs, the Start and Stop signals, are assumed to be asynchronous events; their time difference is within the measurement range of 640 ps, and their rates are similarly low.

Fig. 1. TDC. (a) Chain of buffers. (b) Vernier delay line.

The delay chain of buffers [1], [3] and the Vernier delay line [4] are well-known methods to realize a TDC. In the delay chain shown in Fig. 1(a), the rising edge of the Start signal propagates through the chain of buffers; when the rising edge of the Stop signal arrives, a flip-flop samples the output of each buffer and produces a thermometer code that locates the time interval. However, this simple scheme cannot resolve the time interval more finely than a single buffer delay. The Vernier delay line in Fig. 1(b), on the other hand, uses the delay difference between unequal buffers and can therefore resolve more finely, but its area increases linearly with the resolution, and its devices must match more tightly, which leads to high power consumption. Calibrating the time offsets in the buffers and comparators can improve the resolution [5], but the measurement range is limited by the calibration complexity.
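As a minimal behavioral sketch of the two schemes in Fig. 1, the thermometer codes can be modeled with ideal, noise-free buffers. The function names and the delay values in the usage lines (20 ps, 22 ps, 20 ps) are illustrative assumptions, not circuit parameters from this work. The delay chain resolves one buffer delay; the Vernier line resolves the difference between its two buffer delays, at the cost of many more stages for the same interval.

```python
def delay_chain_tdc(t_in_ps: float, t_d_ps: float, n_stages: int) -> list[int]:
    """Thermometer code of a buffer-chain TDC (Fig. 1(a)).

    Tap k (k = 1..n_stages) sees the Start edge delayed by k * t_d_ps;
    flip-flop k samples a 1 if that edge arrived before Stop, i.e.,
    if k * t_d_ps <= t_in_ps (t_in_ps = Stop time - Start time).
    """
    return [1 if k * t_d_ps <= t_in_ps else 0 for k in range(1, n_stages + 1)]


def vernier_tdc(t_in_ps: float, t_slow_ps: float, t_fast_ps: float,
                n_stages: int) -> list[int]:
    """Thermometer code of a Vernier delay line (Fig. 1(b)).

    Start travels through slow buffers and Stop through fast ones, so Stop
    gains (t_slow_ps - t_fast_ps) on Start at every stage. Stage k reads 1
    while Start still leads: k * t_slow_ps <= t_in_ps + k * t_fast_ps.
    """
    return [1 if k * t_slow_ps <= t_in_ps + k * t_fast_ps else 0
            for k in range(1, n_stages + 1)]


def thermometer_to_code(bits: list[int]) -> int:
    """Output code = number of 1s in a bubble-free thermometer code."""
    return sum(bits)


# Hypothetical example: a 45 ps interval.
print(thermometer_to_code(delay_chain_tdc(45.0, 20.0, 8)))     # 2  (20 ps steps)
print(thermometer_to_code(vernier_tdc(45.0, 22.0, 20.0, 32)))  # 22 (2 ps steps)
```

The Vernier line needs 22 stages to cover the same 45 ps that the plain chain covers in 2, which illustrates why its length grows with the resolution.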

We look to the evolution of ADCs, where there is much experience in solving similar problems. A coarse–fine ADC improves resolution by amplifying the residue between the input and the closest coarse level, then quantizing the *amplified residue* again with the same coarse resolution. This brings us to the concept of time amplification: a coarse–fine TDC becomes feasible if a *time amplifier* (TA) can be realized [6].

The previous works that described the TA [7], [8] had limited control over gain and range. In this paper, the operating principle of the TA is revisited to identify possible improvements in gain and range. A new circuit is proposed that controls each independently to achieve both high gain and large range. This circuit enables residue amplification, which makes the coarse-fine architecture attractive for improving the resolution.

Manuscript received August 24, 2007; revised November 3, 2007.

The authors are with the Electrical Engineering Department, University of California, Los Angeles, CA 90095 USA (e-mail: minjae@ee.ucla.edu).

Digital Object Identifier 10.1109/JSSC.2008.917405

Fig. 2. (a) SR latch followed by an XOR. (b) Regeneration process in SR latch. (c) Relationship between the regeneration time and the initial time difference.

Section II describes the operating principle of a TA and its proposed realization. Using the new TA, a coarse–fine TDC architecture and its detailed circuit implementation are elaborated in Section III. Section IV describes the circuit modifications required to compensate for the nonidealities of the TDC and proposes digital calibration techniques that compensate for the offsets and gain mismatch of the amplifiers. Measurement results are discussed in Section V. Finally, Section VI summarizes and concludes this work.

## II. TIME AMPLIFIER

### A. Principle of TA

Arising out of studies of coincidence and metastability, a TA was first reported in [7]. It exploits the variable delay of an SR latch subject to nearly coincident input edges. An SR latch followed by an XOR gate, shown in Fig. 2(a), is one of the main building blocks of the TA. If rising edges are applied to S and R at almost the same time, the latch becomes metastable. After both inputs go high, the initial voltage A(0) developed at the output of the SR latch is proportional to the initial input time difference  $\Delta T_{\rm SR}$, and the positive feedback in the latch eventually forces the output to a binary level, one or zero, as shown in Fig. 2(b). The SR latch output voltage difference A(t) is

$$A(t) = A(0) \cdot e^{t/\tau}, \qquad A(0) = \alpha \cdot \Delta T_{\rm SR} \tag{1}$$

where  $\alpha$  is the proportionality factor,  $\tau = C/g_m$  is the regeneration time constant, C is the output capacitance of a NAND gate, and  $g_m$  represents the transconductance of the NAND gate when it is metastable. When the outputs of the latch split far enough that their difference A(t) reaches the threshold voltage  $V_{TH}$, the XOR gate toggles to 1, indicating that regeneration is complete. It should be noted that a conventional static CMOS XOR gate is not appropriate for this purpose, because its output becomes unstable while both of its inputs hover around half of  $V_{dd}$.

Fig. 3. (a) Concept of a TA. (b) Shifted SR latch delay characteristics. (c) TA characteristic.

As the time difference between the two inputs gets shorter, the SR latch takes longer to regenerate. The regeneration time is a logarithmic function of the initial time difference, as shown in Fig. 2(c). The following even-symmetric logarithmic function is used to build a TA:

$$\Delta T_{\rm OUT} = \tau \cdot \left(\log V_{\rm TH} - \log|\alpha \cdot \Delta T_{\rm SR}|\right) \tag{2}$$
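Equation (2) follows from solving (1) for the time at which A(t) reaches  $V_{\rm TH}$ (the logarithm is natural). A short sketch makes its two key properties concrete: each halving of  $|\Delta T_{\rm SR}|$  adds a fixed  $\tau \ln 2$  to the regeneration time, and the curve is even-symmetric. The parameter values below (τ = 30 ps, V_TH = 0.5 V, α = 1 mV/ps) are hypothetical and chosen only so that  $\alpha|\Delta T_{\rm SR}| < V_{\rm TH}$.

```python
import math


def regeneration_time(dt_sr: float, tau: float, v_th: float, alpha: float) -> float:
    """Latch regeneration time from (1)-(2).

    Solving alpha * |dt_sr| * exp(t / tau) = v_th for t gives
    t = tau * (ln v_th - ln |alpha * dt_sr|), valid while alpha*|dt_sr| < v_th.
    """
    return tau * (math.log(v_th) - math.log(abs(alpha * dt_sr)))


# Hypothetical parameters; times in ps, alpha in V/ps.
tau, v_th, alpha = 30.0, 0.5, 1e-3
for dt in (8.0, 4.0, 2.0, -2.0):
    print(dt, round(regeneration_time(dt, tau, v_th, alpha), 2))
```

The printed delays grow by about 20.8 ps (30 ps × ln 2) per halving, and the entries for +2 ps and -2 ps coincide.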

By adding a time offset  $T_{\text{off}}$  to one of the two inputs of an SR latch, the delay characteristic can be shifted either to the left or to the right. With two SR latches and delays in opposite inputs, as represented in Fig. 3(a), we obtain two opposite shifts, one to the left and the other to the right, as shown in Fig. 3(b). The final delay characteristic in Fig. 3(c) is obtained by subtracting one characteristic curve in Fig. 3(b) from the other; the subtraction is realized by measuring the output time difference. Although the whole characteristic curve is nonmonotonic, the region around zero can be used as a TA. The resulting TA equation, which is the same as in [7], is

$$T_{\text{OUT}} = \tau \cdot \left[\log(T_{\text{off}} + T_{\text{IN}}) - \log(T_{\text{off}} - T_{\text{IN}})\right], \qquad -T_{\text{off}} < T_{\text{IN}} < T_{\text{off}} \tag{3}$$

The small-signal gain is

$$A_T = \frac{2C}{g_m T_{\text{off}}} \tag{4}$$

where  $g_{\rm m}$  is the transconductance of a NAND gate in metastability and C is the capacitance at its output; equivalently,  $A_T = 2\tau/T_{\rm off}$  with  $\tau = C/g_m$. From (3) and (4), both the gain and the linear range are controlled by the time offset  $T_{\rm off}$.

Fig. 4. Conventional implementation of a TA.
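The sketch below evaluates the transfer characteristic (3) and checks it against the small-signal gain (4) with a numerical derivative at  $T_{\rm IN} = 0$. The values τ = 800 ps and  $T_{\rm off}$ = 80 ps are an assumption chosen to give  $A_T = 20$, the gain and offset quoted for the proposed TA in Section II-C; τ itself is inferred from (4), not reported. Function names are hypothetical.

```python
import math


def ta_output(t_in: float, tau: float, t_off: float) -> float:
    """TA transfer characteristic (3); valid only for -t_off < t_in < t_off."""
    if not -t_off < t_in < t_off:
        raise ValueError("input outside the TA range (-T_off, T_off)")
    return tau * (math.log(t_off + t_in) - math.log(t_off - t_in))


def small_signal_gain(tau: float, t_off: float) -> float:
    """Small-signal gain (4): A_T = 2*C/(g_m*T_off) = 2*tau/T_off."""
    return 2.0 * tau / t_off


tau, t_off = 800.0, 80.0   # ps; inferred example values (see lead-in)
h = 1e-3                   # ps; step of a central difference around T_IN = 0
numeric_gain = (ta_output(h, tau, t_off) - ta_output(-h, tau, t_off)) / (2 * h)
print(small_signal_gain(tau, t_off), round(numeric_gain, 6))  # 20.0 20.0

# The large-signal ratio T_OUT/T_IN grows as T_IN approaches T_off.
for t_in in (5.0, 20.0, 40.0, 70.0):
    print(t_in, round(ta_output(t_in, tau, t_off) / t_in, 2))
```

The growing ratio in the second loop is the expansive nonlinearity that motivates keeping the operating range well inside  $\pm T_{\rm off}$.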

### B. Conventional Realization

The  $T_{\text{off}}$  in a TA is analogous to the overdrive voltage in a CMOS voltage amplifier, which controls the gain and the range. One way to realize  $T_{\rm off}$  is to use unbalanced SR latches [7], [8]. In the NAND gates N1 and N2 in Fig. 4, one of the two NFETs is sized 30% larger or smaller than nominal, which generates a time offset. With this method, a gain of 10 is obtained with an input linear range of only  $\pm 6$  ps in 0.18  $\mu$m CMOS [8]. This TA, which sets the gain and the range with device offsets, poses several problems. First, the linear range is too small for the proposed TDC, which requires a range greater than one buffer delay, about 20 ps. Second, the small  $T_{\text{off}}$  makes the TA more sensitive to process, voltage, and temperature (PVT) variation and to the input rise time. Furthermore, according to (4), the gain is inversely proportional to  $T_{\text{off}}$, which also sets the range, so high gain and large range cannot be achieved at the same time with this method.

### C. Proposed TA

Fig. 5. (a) Proposed TA. (b) Inverters and capacitors increase the gain and the range.

Fig. 6. (a) XOR in TA. (b) Simplified circuit from XOR. (c) Final TA circuit.

The TAs used in this design must have a high gain (>16 to use a 4 b fine TDC) and a linear range large enough to cover the propagation delay ( $T_{\rm d} = 20$ ps) of one stage in a delay chain. Fig. 5(a) shows a newly designed TA in which  $T_{\text{off}}$  is generated by two inverters. Realizing both  $T_{\text{off}}$  and  $T_{\text{d}}$  with inverters makes it easy to maintain the required linear range across PVT variation.  $T_{\rm off}$  is designed to be around 80 ps, much larger than  $T_{\rm d}$, for better linearity. Making  $T_{\rm off}$  greater than  $T_{\rm d}$  avoids operation near the peaks of the TA gain curve. Timing uncertainty dominates in that region because, at the peaks, the inputs to one of the SR latches arrive nearly coincidentally despite the large input time, so only a small voltage difference develops initially at the latch outputs. This small signal voltage reduces the signal-to-noise ratio (SNR) during regeneration. Therefore, the useful operating range of the TA is limited to less than  $T_{\text{off}}$. Another consequence of the large  $T_{\text{off}}$  is a lower gain, as depicted in Fig. 5(b). To recover the gain, the ratio  $C/g_m$  of each NAND gate is increased by adding capacitance at the latch outputs and by reducing the  $g_m$  of the transistors that provide positive feedback during regeneration. The result is a new characteristic with a gain of 20 and a useful range of  $\pm 40$  ps, which is adequate for our needs. The capacitors added to boost the gain are implemented with MOSFETs because they take up less space.
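Equation (4) can be inverted to estimate how far  $C/g_m$  must be raised once  $T_{\rm off}$  is enlarged. This is plain arithmetic on (4) using the quoted gain (20) and  $T_{\rm off}$  (about 80 ps); the resulting τ is a model inference, not a value reported for the fabricated circuit, and the function names are hypothetical.

```python
def required_tau(gain: float, t_off: float) -> float:
    """Invert (4): tau = C/g_m = A_T * T_off / 2."""
    return gain * t_off / 2.0


def gain_for(tau: float, t_off: float) -> float:
    """Small-signal gain (4) for a given tau = C/g_m and offset T_off."""
    return 2.0 * tau / t_off


# Proposed TA: gain of 20 with T_off of about 80 ps (Section II-C).
tau_needed = required_tau(20.0, 80.0)
print(tau_needed)  # 800.0 (ps)

# Hypothetical: with only a quarter of that tau, the same 80 ps offset
# would give a quarter of the gain; this is the gain-range tradeoff of (4).
print(gain_for(tau_needed / 4, 80.0))  # 5.0
```

For a fixed τ, the product  $A_T T_{\rm off} = 2\tau$  is constant, so widening the range costs gain unless  $C/g_m$  is increased along with it.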

To complete the TA, a special XOR circuit, shown in Fig. 6(a), is needed after the two latches. During time amplification, the inputs to the XOR do not reach binary levels, so the output of a conventional XOR gate can be unstable. This signal condition requires a circuit that disables the XOR function during that interval. The problem is resolved by using the input of one inverter as the power supply for the other, and *vice versa*. The outputs of the inverters then toggle only when X and Y reach valid logic levels, and they stay low while X and Y hover around half of V<sub>dd</sub>. An OR gate then completes the XOR function. The circuit in Fig. 6(a) is further simplified by eliminating the OR gate, as shown in Fig. 6(b). This simplification is possible because the valid input to the TA lies between  $-T_{\text{off}}$  and  $T_{\text{off}}$  [see Fig. 6(c)], so only one side of the symmetric delay curve in (2) is used. Since the XOR then needs to detect only one of the "10" and "01" patterns, only one of the gated buffers in Fig. 6(b) is necessary. However, a dummy buffer is used to balance the loading on the SR latch, as shown in Fig. 6(c).

Fig. 7. Conceptual diagram of the proposed coarse-fine TDC architecture.

## III. COARSE-FINE TDC CIRCUIT

### A. Proposed Coarse–Fine Architecture

The resolution of conventional TDCs can be increased simply by placing a TA at the input of a delay line, as in [8]. However, such an implementation provides high resolution only over the narrow measurement range in which a single amplifier can operate. It also provides no way to control the gain accurately over PVT variation. This paper therefore focuses on ways to increase the resolution without reducing the measurement range of a TDC, and to calibrate the TA gain digitally so that it can be used in the proposed coarse–fine TDC.

The conceptual diagram of the proposed coarse-fine TDC architecture is shown in Fig. 7. Unlike voltage, input time cannot be stored unless it is transformed into another quantity such as voltage or current, and such a transformation always introduces unwanted distortion and noise. Therefore, to process the input time directly, every possible time residue must be created and amplified separately. Although this requires subtracting coarse levels, in multiples of  $T_{\rm d}$, from the input, these residues are readily available in a buffer chain, shown in the dashed box of Fig. 7. The time difference between the edge at each buffer output, Start[k], and the reference edge, Stop, creates every possible time residue. While the TAs regenerate and settle into a valid state, the coarse TDC (CTDC) determines which residue is the critical one, and the multiplexer then passes that residue to the fine TDC (FTDC). The latency through a TA is about 0.8 ns, which leaves enough time to decode the output of the CTDC. As in a coarse–fine ADC, this amplification greatly relaxes the device-matching requirement in the FTDC.
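An idealized behavioral model of the coarse–fine conversion helps tie the numbers together: a 5-bit CTDC with  $T_{\rm d}$ = 20 ps spans 640 ps, and a 4-bit fine conversion splits each 20 ps subrange into 1.25 ps steps. This is a sketch under loud assumptions: matched buffers, a perfectly linear TA with gain exactly 16 (so no normalization is needed, unlike the real gain of about 20), and no offsets or noise. The function and constant names are hypothetical.

```python
import math

T_D = 20.0        # ps, delay of one CTDC/FTDC buffer stage
COARSE_BITS = 5   # CTDC resolution
FINE_BITS = 4     # fine resolution after normalization


def coarse_fine_tdc(t_in_ps: float, gain: float = 2 ** FINE_BITS) -> int:
    """Idealized coarse-fine conversion of a Start-to-Stop interval.

    The CTDC finds the coarse subrange m, the residue t_in - m*T_D is
    amplified by an ideal linear TA, and the FTDC measures the amplified
    residue with the same T_D step.
    """
    full_scale = 2 ** COARSE_BITS * T_D                 # 640 ps
    if not 0.0 <= t_in_ps < full_scale:
        raise ValueError("input outside the measurement range")
    m = math.floor(t_in_ps / T_D)                        # coarse code (CTDC)
    residue = t_in_ps - m * T_D                          # selected time residue
    n = math.floor(gain * residue / T_D)                 # fine code (FTDC)
    n = min(n, 2 ** FINE_BITS - 1)                       # only matters if gain > 16
    return (m << FINE_BITS) | n                          # 9-bit output word


print(T_D / 2 ** FINE_BITS, 2 ** COARSE_BITS * T_D)      # 1.25 640.0
print(coarse_fine_tdc(25.0), coarse_fine_tdc(639.0))     # 20 511
```

Because the FTDC reuses the 20 ps buffer step, the fine resolution comes entirely from the amplification, which is why the FTDC matching requirement is relaxed.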

### B. Delay Chains

A delay element in a chain can be implemented with either an inverter or a buffer that consists of two cascaded inverters. An inverter gives a smaller propagation delay, but only if its input and output load capacitances are kept small. It has been reported that an inverter chain introduces uneven delay characteristics due to the rise- and fall-time mismatch of the inverter and the asymmetric flip-flop characteristics for low-to-high and high-to-low input transitions [3]. In the proposed coarse–fine architecture, buffers are used in the delay chain instead of inverters to improve linearity, because a buffer does not generate this uneven systematic delay error.

Fig. 8. (a) Detailed TDC architecture. (b) Residue creation.

Another consideration is that sharp signal transitions reduce the sensitivity of the delay to device offsets, which shift the transition point of the delay elements. Sharp transitions are difficult to achieve in a chain of inverters, because enlarging an inverter increases the input and output capacitance of each stage. With buffers, fast transitions are easier to achieve, because the two inverters in each buffer isolate the input capacitance of a stage from its output capacitance.

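Fig. 8(a) shows the detailed circuits of the coarse–fine TDC. Both the CTDC and the FTDC use buffers for the delay chain, resolving 5 and 4 bits, respectively. Each stage in the chain comprises two inverters with a combined delay of  $T_{\rm d} = 20$  ps in 90 nm CMOS, so the 32-stage CTDC spans the 640 ps measurement range, and the 4-bit fine conversion divides each stage into 1.25 ps steps.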

### C. Arbiter

Just as an ADC requires a latching comparator, so does a TDC. Either a flip-flop or an arbiter can serve as the comparator whose output determines the output codes of the TDC. However, conventional flip-flops, such as the sense-amplifier-based flip-flop in [3], have mismatched data and clock propagation paths, which results in a large time offset. This time offset requires a large over-range in the FTDC, which means additional delay taps. To reduce the over-range, the arbiter in Fig. 8(a) is used, because it has equal delays from its two inputs, Start[n] and Stop, to the output [5].



Fig. 9. Residue selection MUX.

### D. Residue Selection

Residue generation is visualized in Fig. 8(b). The rising edge of the Start signal propagates through the chain of buffers, which generate every possible residue: r0, r1, r2, and so on. Each residue is the distance between the rising edge at a buffer output and the Stop reference. Each residue is then amplified and passed to a multiplexer (MUX). For the case shown in Fig. 8(b), the arbiters produce the thermometer code 1, 1, 1, 0. By locating the transition from 1s to 0s, the "10" DET logic identifies the critical amplified residue to be sent to the FTDC for fine conversion. As explained in Section III-A, the combined delay through the arbiter and the transition-detection logic must be smaller than that of a TA so that the residue selection MUX in Fig. 8(a) is activated in time.
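The "10" detection step can be sketched as a function that turns the arbiter thermometer code into one-hot MUX select signals S[k]. Two details are assumptions not stated in the text: a virtual 0 is appended so that an all-ones code selects the last tap, and the code is assumed bubble-free (a bubble such as 1, 0, 1, 0 would raise two selects). The function name is hypothetical.

```python
def detect_transition(thermometer: list[int]) -> list[int]:
    """One-hot select signals S[k] from an arbiter thermometer code.

    S[k] = 1 where bit k is 1 and bit k+1 is 0 (the "10" pattern), i.e.,
    the last tap whose Start edge arrived before Stop. A virtual 0 is
    appended so that an all-ones code selects the last tap.
    """
    padded = list(thermometer) + [0]
    return [1 if padded[k] == 1 and padded[k + 1] == 0 else 0
            for k in range(len(thermometer))]


print(detect_transition([1, 1, 1, 0]))  # [0, 0, 1, 0]: zero-based index 2 is routed
```

In hardware, each S[k] is simply an AND of bit k with the complement of bit k+1, so the detection adds only one gate delay to the arbiter delay, consistent with the requirement that it finish before the TA settles.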

The circuit details of the residue selection MUX are shown in Fig. 9. The MUX consists of an array of nMOS pull-down devices connected to a common pMOS pull-up, forming a ratioed logic circuit. Signal feedthrough from unwanted residues to the output causes timing uncertainty; the selection device stacked on top of each input transistor blocks this feedthrough from the input to the output node. In addition, charge sharing between the output and the drain of the input nMOS after the selection signal S[k] becomes active can introduce timing error. This error is avoided by the weak pull-up transistors MPs, which charge every node high so that the output transitions smoothly.

## IV. DIGITAL CALIBRATION TECHNIQUES AND CIRCUIT MODIFICATIONS FOR NONIDEALITIES

### A. Nonidealities and Their Solutions

The proposed TDC circuit faces problems similar to those of a typical coarse–fine ADC. First, device mismatch in a TA shifts the zero crossing of its gain curve, which appears as a TA offset. This offset can be corrected by calibration as follows. Inside the TB shown in Fig. 10, the same Stop signal is applied to both inputs of the TA, creating a zero input, so the amplified offset is observed at the output. The amplified offset is quantized by an FTDC and stored in a table, and the stored value then corrects the FTDC output during normal operation. This offset sampling is performed sequentially during startup by shifting the control pulse at the falling edge of the reference clock. Using the negative clock edge ensures that the control signal is stable before the input rising edges arrive.
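The offset calibration amounts to a per-TA lookup table filled at startup and subtracted during normal operation. A minimal sketch, assuming signed FTDC codes measured relative to the nominal zero point and a callable that stands in for the zero-input measurement of TA k; the names are hypothetical, and any averaging of repeated startup measurements is omitted because the text does not describe it.

```python
from typing import Callable


def calibrate_offsets(measure_zero_input: Callable[[int], int],
                      n_tas: int) -> list[int]:
    """Startup pass: apply Stop to both inputs of TA k, record the FTDC code."""
    return [measure_zero_input(k) for k in range(n_tas)]


def correct_fine_code(raw_code: int, ta_index: int, offset_table: list[int]) -> int:
    """Normal operation: subtract the stored offset of the selected TA."""
    return raw_code - offset_table[ta_index]


# Hypothetical zero-input FTDC codes for four TAs.
table = calibrate_offsets(lambda k: [2, -1, 0, 3][k], 4)
print(table, correct_fine_code(10, 0, table), correct_fine_code(10, 1, table))
```

Because the offset is measured through the same TA and FTDC path used during conversion, it is stored in amplified form and can be subtracted directly from the raw fine code.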



Fig. 10. Modified architecture to correct offsets.

The offsets can also drive the FTDC into over-range. Therefore, the number of stages in both directions is doubled in order to capture both the over-range and the under-range caused by the offsets in the arbiter and TA, as shown in Fig. 10. In the end, the FTDC quantizes to 6 bits.

Charge kickback from the arbiter and the TA also causes problems: it injects charge into the delay chain and the shared Stop line, which results in timing error. To prevent this, a buffer that strongly drives both the arbiter and the TA is added inside the multiplexer circuit. Separate buffers at the inputs of the TA and the arbiter would isolate them better from each other, but would add power and area.

The last problem arises because multiple TAs and buffers will not match perfectly. In any coarse–fine ADC, the amplifier gain must be accurate enough that the amplified residue aligns with the full scale of the fine ADC to within one LSB. However, the gains of the TAs vary across the array due to device mismatch. Although the gain variation with PVT in (4) is reduced because  $g_m$  and  $T_{off}$  compensate each other [7], simulation shows that the gain still varies by about  $\pm 10\%$  between the slow and fast corners, which is intolerable. This variation causes under-range or over-range in each subrange, degrading linearity. Fig. 11 illustrates the effect of over-range and under-range due to TA gain error. Mismatch in the CTDC buffer delays ( $T_d$ ) causes the same problem. Therefore, both must be corrected or well controlled.

Unlike a voltage amplifier, a TA cannot easily achieve gain accuracy with feedback. Open-loop residue amplification is an alternative and has recently been reported in a pipelined ADC [13], where a statistics-based digital calibration technique corrects the gain error and nonlinearity of the residue amplifier. In the proposed TDC, however, such calibration is complicated by the large number of TAs, and a time difference is also difficult to manipulate to create the random sequence used in the ADC. Therefore, a different approach is used, as described in the following section.

### B. TA Gain Estimation Technique

There is an alternative to precisely controlling the gains. If the maximum amplified residue in a subrange, which is mostly determined by the TA gain, is continuously monitored, the FTDC output can easily be mapped onto the ideal 4-bit range. This method improves the differential nonlinearity (DNL). In practice, the maximum amplified residue is also affected by  $T_{\rm d}$  mismatch, as shown in Fig. 11, and if that mismatch is not corrected, it leaves residual integral nonlinearity (INL). Therefore, good matching is also required in the CTDC buffer chain.

Fig. 11. Over-range or under-range caused by gain error and  $T_{d}$  error.

Fig. 12. (a) Timing diagram showing the case that the Stop edge occurs in the nth subrange. (b) Digital subrange normalization.

Fig. 13. Nonideal effects on subrange normalization. (a) Comparison of gain mismatch effect between averaged and nonaveraged  $S_{\rm EST}$  (illustrated for the case of  $A_{\rm T,n} > A_{\rm T,n+1}$ ). (b) Gain nonlinearity is suppressed by using non-averaged  $S_{\rm EST}$ .

The proposed TA gain estimation, which measures the maximum amplified residue, requires an architectural modification that adds a second FTDC (FTDC2). The timing diagram in Fig. 12(a) shows the case in which the Stop edge lies in the nth subrange. This subrange is bounded by two edges: the nth tap in the delay chain and the next tap.  $X_n$  is the residue between the nth tap and the Stop edge, while  $X_{n+1}$  is the residue between the Stop edge and the next tap.  $A_{T,n}X_n$  and  $A_{T,n+1}X_{n+1}$  are the two amplified residues, each quantized by its own FTDC. Regardless of where the Stop edge lies in the subrange, the sum of  $X_n$  and  $X_{n+1}$  always equals one tap delay,  $T_d$, and both residues always fall within the linear range of the two adjacent TAs. These amplifiers are connected to the *n*th tap and the next tap in the delay chain, and their gains are  $A_{T,n}$  and  $A_{T,n+1}$, respectively. If the two amplified residues are added together, then, assuming equal gains  $A_{T,n} = A_{T,n+1} = A_T$, the result equals one amplified tap delay

$$A_{\rm T,n}X_{\rm n} + A_{\rm T,n+1}X_{\rm n+1} = A_{\rm T}T_{\rm d} \tag{5}$$

The digital outputs of the two FTDCs are summed to get the estimate of the TA gain in the nth subrange

$$S_{\text{EST}} = Q[A_{\text{T},n}X_n] + Q[A_{\text{T},n+1}X_{n+1}] + 1 \tag{6}$$

where  $Q[\cdot]$  is the quantization function performed in the FTDCs. The correction term 1 is added for better estimation because the quantization in each FTDC is a floor function, which on average truncates its input by half an FTDC LSB, so the two truncations together lose about one LSB. The inverse of this sum,  $S_{\rm EST}$, is used to normalize the output of the first FTDC (FTDC1). This subrange normalization requires a digital adder, a digital divider, and a digital multiplier.
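A minimal sketch of the subrange normalization: both FTDCs are modeled as floor quantizers with the 20 ps buffer step, (6) forms  $S_{\rm EST}$, and FTDC1's code is rescaled onto the ideal 16-step subrange. The integer floor division, the clipping to 15, and all names are assumptions for illustration; the paper states only that an adder, a divider, and a multiplier are used. The TA gains in the usage lines are hypothetical.

```python
import math

FINE_STEPS = 16  # ideal 4-bit subrange


def ftdc(amplified_ps: float, t_d: float = 20.0) -> int:
    """FTDC modeled as a floor quantizer with step t_d (the Q[.] of (6))."""
    return math.floor(amplified_ps / t_d)


def normalize_subrange(m: int, q1: int, q2: int) -> int:
    """Map FTDC1's code q1 onto the ideal 16-step subrange m using (6).

    S_EST = q1 + q2 + 1 estimates the amplified tap delay A_T*T_d in FTDC
    LSBs, and q1 is rescaled by 16/S_EST (floor division assumed).
    """
    s_est = q1 + q2 + 1
    fine = (FINE_STEPS * q1) // s_est
    return m * FINE_STEPS + min(fine, FINE_STEPS - 1)


t_d, x_n = 20.0, 5.3                 # ps; Stop lies 5.3 ps after tap n
a_n, a_n1 = 20.0, 20.0               # hypothetical TA gains
q1 = ftdc(a_n * x_n, t_d)            # FTDC1: amplified X_n
q2 = ftdc(a_n1 * (t_d - x_n), t_d)   # FTDC2: amplified X_{n+1} = T_d - X_n
print(q1, q2, normalize_subrange(3, q1, q2))  # 5 14 52
```

Here the result equals the ideal code 3 · 16 + floor(5.3/1.25); in general, the floors in q1, q2, and the division can make the normalized code differ from the ideal by one LSB, which is the extra quantization noise discussed in the following subsection.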

### C. Tradeoffs in Choosing $S_{\text{EST}}$ or Averaged $S_{\text{EST}}$

The TA gain estimate  $S_{\rm EST}$, obtained by adding two quantized values, raises the quantization noise in the final output code because of its own quantization error. Averaging  $S_{\rm EST}$  over multiple samples reduces this effect only if the gain mismatch between  $A_{\rm T,n}$  and  $A_{\rm T,n+1}$  is negligible; for a large gain mismatch, averaging degrades the DNL. The effect of gain mismatch between two adjacent TAs is illustrated in Fig. 13(a) for the case of  $A_{\rm T,n} > A_{\rm T,n+1}$. If  $A_{\rm T,n}$  differs from  $A_{\rm T,n+1}$, the averaged  $S_{\rm EST}$  does not represent the correct normalization factor across the chosen subrange. Large gain variations degrade the DNL by generating missing codes ( $A_{\rm T,n} < A_{\rm T,n+1}$ ) or nonmonotonic codes ( $A_{\rm T,n} > A_{\rm T,n+1}$ ) at the edge of the subrange, as shown in Fig. 13(a).

Fig. 14. Chip micrograph.

Fig. 15. Code distribution for constant input (LSB = 1.25 ps, 1 M hits). (a) 0.4 LSB (rms) at 87. (b) 0.64 LSB (rms) at 451.

To minimize the DNL, this circuit uses the nonaveraged  $S_{\rm EST}$, which is not constant across the subrange in which the input falls. Although this introduces a slight curvature in the transfer curve after normalization, it reduces the DNL error at the edges of the subrange. To mitigate the effect of quantization on the estimate, the TA gains are designed to be greater than 16.
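The tradeoff can be reproduced with an unquantized model of one subrange. Under these assumptions (linear TAs, no quantization, and a long moving average replaced by the mean of  $S_{\rm EST}$  over a uniformly distributed  $X_n$ ), the nonaveraged estimate maps the subrange edge exactly onto 16 LSB regardless of mismatch, while the averaged one overshoots. The gains 22 and 18 are a hypothetical ±10% spread matching the corner variation quoted in Section IV-A; the function name is hypothetical.

```python
def normalized_fine(x_n: float, a_n: float, a_n1: float,
                    t_d: float = 20.0, averaged: bool = False) -> float:
    """Unquantized normalized fine output (in LSB) within one subrange.

    Nonaveraged: S_EST is the per-sample sum A_n*X_n + A_{n+1}*(T_d - X_n).
    Averaged: S_EST is replaced by its mean over uniformly distributed X_n,
    (A_n + A_{n+1}) * T_d / 2.
    """
    amp_n = a_n * x_n
    if averaged:
        s_est = 0.5 * (a_n + a_n1) * t_d
    else:
        s_est = amp_n + a_n1 * (t_d - x_n)
    return 16.0 * amp_n / s_est


a_n, a_n1 = 22.0, 18.0  # hypothetical gains with A_T,n > A_T,n+1
for label, avg in (("nonaveraged", False), ("averaged", True)):
    edge = normalized_fine(20.0, a_n, a_n1, averaged=avg)  # X_n -> T_d
    mid = normalized_fine(10.0, a_n, a_n1, averaged=avg)
    print(f"{label}: edge={edge:.2f} mid={mid:.2f}")
# nonaveraged: edge=16.00 mid=8.80
# averaged: edge=17.60 mid=8.80
```

With the averaged estimate, subrange n extends to 17.6 LSB and overlaps the next subrange, producing the nonmonotonic codes described for  $A_{\rm T,n} > A_{\rm T,n+1}$; the nonaveraged estimate keeps the edges aligned and moves the error into mid-subrange curvature (8.8 LSB instead of 8 at 10 ps).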

Another benefit of the nonaveraged  $S_{\rm EST}$  is that it also suppresses TA gain nonlinearity. As seen in Fig. 3(c), the TA gain curve is odd-symmetric. The TA gain nonlinearity appears in the FTDC output, as shown in Fig. 13(b), because the two FTDCs quantize residues amplified separately through the positive portion of one TA's gain curve and the negative portion of the next one. The sum of the two amplified residues,  $S_{\rm EST}$, therefore reflects the nonlinearity, and the inverse of  $S_{\rm EST}$  acts as a boosting factor that compensates for the nonlinearity through the normalization, as shown in Fig. 13(b).
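The suppression can be checked numerically by letting both adjacent TAs follow the compressive-free but nonlinear curve (3) and comparing two normalizations of FTDC1's output across one subrange: a fixed small-signal gain versus the nonaveraged  $S_{\rm EST}$. This sketch assumes identical TAs, no quantization, that FTDC2 measures the magnitude of the second TA's output, and τ = 800 ps inferred from (4) with  $A_T$ = 20 and  $T_{\rm off}$ = 80 ps; none of these values is a measured circuit parameter.

```python
import math

TAU, T_OFF, T_D = 800.0, 80.0, 20.0  # ps; tau inferred from (4), see lead-in


def ta_mag(x: float) -> float:
    """|T_OUT| from (3) for a residue of magnitude x (odd-symmetric curve)."""
    return TAU * math.log((T_OFF + x) / (T_OFF - x))


def fine_fixed_gain(x_n: float) -> float:
    """Normalize with the small-signal gain 2*TAU/T_OFF (no S_EST)."""
    return 16.0 * ta_mag(x_n) / (2.0 * TAU / T_OFF * T_D)


def fine_nonaveraged(x_n: float) -> float:
    """Normalize with S_EST = |T_OUT(X_n)| + |T_OUT(X_{n+1})|, X_{n+1} = T_D - X_n."""
    return 16.0 * ta_mag(x_n) / (ta_mag(x_n) + ta_mag(T_D - x_n))


xs = [T_D * k / 200 for k in range(201)]
ideal = [16.0 * x / T_D for x in xs]
err_fixed = max(abs(fine_fixed_gain(x) - i) for x, i in zip(xs, ideal))
err_norm = max(abs(fine_nonaveraged(x) - i) for x, i in zip(xs, ideal))
print(f"max error, fixed gain:  {err_fixed:.3f} LSB")
print(f"max error, nonaveraged: {err_norm:.3f} LSB")
```

In this model the fixed-gain error peaks at the subrange edge (about a third of an LSB), whereas the nonaveraged normalization pins both edges and the midpoint, leaving only a few hundredths of an LSB in between.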

Furthermore, using the nonaveraged  $S_{\rm EST}$  reduces the effect of correlated timing jitter added in the TAs by supply and substrate coupling. If the noise induced in two adjacent TAs is correlated, which is a fair assumption given the shared supply and substrate, the nonaveraged  $S_{\rm EST}$  carries the noise added to both TAs through the summation of the two FTDC outputs. The normalization therefore compensates for and reduces the noise contribution of the first TA, in the same way that it reduces the TA nonlinearity.

## V. EXPERIMENTAL RESULTS

The chip micrograph of the TDC is shown in Fig. 14. The TDC is implemented with standard devices in a 90 nm CMOS process from STMicroelectronics. The active core occupies  $0.6 \text{ mm}^2$  and consumes 3 mA from a 1 V supply at a 10 MHz input rate. The digital block is synthesized to operate at up to 66 MHz, and its gate count is approximately 6 k. Because the digital calibration block limits the maximum operating speed, timing optimization or pipelining of this block can increase the speed further.

To measure the noise contribution of the TDC, the same signal is applied to the Start and Stop inputs. Unequal loading on the two inputs creates different input slopes, and adjusting the input offset voltage shifts the toggling point of an input buffer, which generates a time difference after buffering. This setup eliminates the input jitter contribution from the measurement because both TDC inputs come from the same signal source. Fig. 15 shows the output code distribution for a constant input repeated one million times. The standard deviation stays below 1 LSB at both ends of the TDC range. It increases for larger output codes because timing jitter accumulates as the signal edge propagates through more stages in the chain.

The benefit of the proposed architecture over a simple buffer chain can be shown by the following calculation. In the coarse–fine TDC, assuming that each stage in the chain contributes an equal amount of jitter  $\sigma_{\text{buf}}$, the total root-mean-square (rms) timing jitter  $\sigma_{\text{Tot}}$  can be written as

$$\sigma_{\text{Tot}}(m,n) = \sqrt{m\sigma_{\text{buf}}^2 + \frac{\sigma_{\text{TA}}^2 + n\sigma_{\text{buf}}^2}{A_{\rm T}^2}} \tag{7}$$

where *m* is the CTDC output code, *n* is the FTDC output code,  $A_{\rm T}$  is the TA gain,  $\sigma_{\text{buf}}$  is the timing jitter of one delay element, and  $\sigma_{\rm TA}$  is the timing jitter of the TA. Equation (7) shows that the timing jitter contribution of the FTDC chain is greatly reduced by the amplification. The jitter contributions of the buffer and the TA,  $\sigma_{\text{buf}} = 130$  fs and  $\sigma_{\rm TA} = 1.8$  ps, are obtained from the measurement data (0.4 LSB at code 87 and 0.64 LSB at code 451) with  $A_{\rm T} = 20$.
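The buffer jitter can be extracted from the two measured points by writing (7) twice and subtracting, which cancels the  $\sigma_{\rm TA}$  term; this sketch therefore recovers only  $\sigma_{\rm buf}$. It assumes that each output code splits as  $16m + n$ (87 → m = 5, n = 7; 451 → m = 28, n = 3) and uses 1 LSB = 1.25 ps and  $A_{\rm T} = 20$; the function name is hypothetical.

```python
import math

LSB_PS = 1.25
A_T = 20.0


def solve_sigma_buf(code1: int, sd1_lsb: float, code2: int, sd2_lsb: float,
                    fine_steps: int = 16) -> float:
    """Estimate sigma_buf (ps) from two single-shot measurements via (7).

    Assumes code = 16*m + n. Subtracting the two instances of (7) cancels
    sigma_TA:  sd2^2 - sd1^2 = sigma_buf^2 * [(m2 - m1) + (n2 - n1)/A_T^2]
    """
    m1, n1 = divmod(code1, fine_steps)
    m2, n2 = divmod(code2, fine_steps)
    var_diff = (sd2_lsb * LSB_PS) ** 2 - (sd1_lsb * LSB_PS) ** 2  # ps^2
    return math.sqrt(var_diff / ((m2 - m1) + (n2 - n1) / A_T ** 2))


sigma_buf_ps = solve_sigma_buf(87, 0.4, 451, 0.64)
print(f"{sigma_buf_ps * 1e3:.0f} fs")  # 130 fs
```

The fine-code term  $(n_2 - n_1)/A_{\rm T}^2$  changes the result by well under 1 fs, which is the quantitative form of the statement that amplification suppresses the FTDC jitter.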

These jitter figures can be compared with those of a chain of buffers or a Vernier delay line. If this design were implemented simply as a chain of buffers, the total output rms timing jitter would be

$$\sigma_{\rm Tot}(m) = \sqrt{m\sigma_{\rm buf}^2} \tag{8}$$

where *m* is the output code. Because passing through 364 stages (from code 87 to code 451) degrades the deviation from 0.4 LSB to 0.64 LSB, solving  $364\sigma_{\text{buf}}^2 = 0.64^2 - 0.4^2$  gives  $\sigma_{\text{buf}} = 0.026$  LSB (= 32.5 fs). This calculation indicates that, to match the performance of the coarse–fine TDC, a plain buffer chain would require 512 delay stages, each contributing less than 32.5 fs of jitter, which leads to large power consumption and makes such low jitter difficult to manage.
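The buffer-chain comparison is a one-line calculation from (8); the sketch below just reproduces it with the measured values quoted above.

```python
import math

LSB_PS = 1.25
stages = 451 - 87                                        # 364 stages between the two codes
sigma_lsb = math.sqrt((0.64 ** 2 - 0.4 ** 2) / stages)   # per-stage jitter from (8)
print(stages, round(sigma_lsb, 3), round(sigma_lsb * LSB_PS * 1e3, 1))  # 364 0.026 32.7
```

The unrounded value is about 32.7 fs; the 32.5 fs in the text follows from first rounding to 0.026 LSB. Either way, it is roughly a quarter of the 130 fs per-stage jitter extracted for the coarse–fine TDC.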

To measure the linearity, two 10 MHz inputs whose frequencies differ by 0.02 Hz are applied to generate a ramp input. DNL and INL are obtained from code density analysis with over two million hits. Fig. 16(a) and (b) show the results using a moving average of  $S_{\rm EST}$  over 128 samples and using the nonaveraged  $S_{\rm EST}$, respectively. In Fig. 16(a), the DNL and INL are slightly worse at the edge of each subrange because of the gain mismatch effect on the averaged  $S_{\rm EST}$  and the TA nonlinearity.
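A sketch of the code density (histogram) method, assuming one sample per period of the 10 MHz input and that the end codes have already been excluded from the histogram. With a frequency offset  $\Delta f$, the Start-to-Stop interval slides by about  $\Delta f / f^2$  per period, so every code of an ideal TDC should collect the same number of hits. INL is taken here as the running sum of DNL, one common convention; the function name and the four-bin histogram are hypothetical.

```python
def code_density_dnl_inl(hist: list[int]) -> tuple[list[float], list[float]]:
    """DNL/INL (in LSB) from a code-density histogram of a uniform ramp.

    DNL[k] = hits[k] / mean_hits - 1; INL is the running sum of DNL.
    """
    mean_hits = sum(hist) / len(hist)
    dnl = [h / mean_hits - 1.0 for h in hist]
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl


# Ramp step: two 10 MHz sources offset by 0.02 Hz slide by delta_f / f^2 per period.
f, delta_f = 10e6, 0.02
step_s = delta_f / f ** 2                 # about 0.2 fs per sample
hits_per_code = 1.25e-12 / step_s         # expected hits in each 1.25 ps code
print(step_s, round(hits_per_code))

dnl, inl = code_density_dnl_inl([100, 100, 50, 150])  # hypothetical histogram
print(dnl, inl)  # [0.0, 0.0, -0.5, 0.5] [0.0, 0.0, -0.5, 0.0]
```

About 6250 hits per code across 512 codes is consistent with a record of more than two million hits.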



Fig. 16. (a) DNL and INL using averaged  $S_{\rm EST}$. (b) DNL and INL using nonaveraged  $S_{\rm EST}$.

Fig. 16(b) shows that the nonaveraged  $S_{\rm EST}$  gives the better result: the DNL is within  $\pm 0.8$  LSB, and the INL stays below  $\pm 3$  LSB. Within each subrange, the INL curve is nearly linear, and the INL error peaks at the subrange edges. Therefore, nonuniformity in the CTDC delay chain due to device mismatch dominates the INL. The INL will improve with better matched delays, or it may be calibrated with an INL lookup table (LUT). For the INL LUT method in particular, the table size can be reduced to the number of stages in the CTDC chain, because the INL error within a subrange can be estimated by interpolating the CTDC INL errors.
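A sketch of the reduced INL LUT: one entry per CTDC subrange edge, with linear interpolation inside each subrange, as suggested by the nearly linear per-subrange INL in Fig. 16(b). How the last subrange is handled is not specified in the text, so this sketch simply holds its edge value. The table contents and the function name are hypothetical.

```python
def inl_correct(code: int, edge_inl: list[float], fine_steps: int = 16) -> float:
    """Correct a code using a reduced INL table and linear interpolation.

    edge_inl[m] is the INL (LSB) measured at the start of CTDC subrange m.
    Inside a subrange the INL is interpolated between edge_inl[m] and
    edge_inl[m + 1]; the last subrange holds its edge value (assumption).
    """
    m, n = divmod(code, fine_steps)
    lo = edge_inl[m]
    hi = edge_inl[m + 1] if m + 1 < len(edge_inl) else lo
    inl = lo + (hi - lo) * n / fine_steps
    return code - inl


table = [0.0, 2.0, 2.0, -1.0]  # hypothetical INL at four subrange edges
print(inl_correct(8, table), inl_correct(24, table), inl_correct(48, table))  # 7.0 22.0 49.0
```

The table grows with the number of CTDC stages (32 here) rather than with the 512 output codes, which is the saving described above.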

The resolution may change because the buffer delay in the CTDC chain varies with process, voltage, and temperature (PVT). If this design is used in a larger system such as a digital PLL or a measurement instrument, the delay must be controlled, or estimated and compensated, against this variation. A delay-locked loop (DLL) can control the delay in the CTDC chain [1], [4], [9], [10], [12]; alternatively, period estimation in the digital PLL [3] makes it possible to compensate the variation during operation. Because the first method requires an analog charge pump and the DLL activity adds more timing jitter to the output code [4], the second approach, used in the digital PLL, is preferred for future implementations.

To verify the resolution of 1.25 ps, the same measurement data used in the linearity test are reused. With the known input frequency and frequency offset, the output codes describe a ramp in time, and the slope of the best-fit line to this ramp confirms that the circuit resolves 1.25 ps. This result agrees with simulation and is expected, because the buffer delay of the CTDC is about 20 ps in 90 nm CMOS and the TA gain is greater than 20.

#### VI. CONCLUSION

We have presented the first realization of a coarse–fine TDC that amplifies a time residue. Understanding the principle behind the TA circuit gives control over its gain and linear range. The new TA is used to amplify a time residue in the proposed coarse–fine TDC. Unlike conventional coarse–fine ADCs, the proposed TDC uses an array of TAs and two identical FTDCs to realize the coarse–fine architecture. These modifications eliminate the need to store time information or to control the TA gains precisely. TA offset cancellation and digital subrange normalization use the outputs of the two FTDCs to compensate for TA gain variation and offset. The proposed coarse–fine architecture achieves 9 b with 1.25 ps resolution without using any analog circuitry. The resolution of a TDC is often expressed as its rms single-shot precision, defined as the standard deviation of repeated measurements of the time interval between a pair of Start and Stop edges [9], [11], [14]. The rms single-shot resolution of this TDC degrades toward 1 LSB as the output code increases; in applications that allow it, averaging multiple measurements improves the resolution further. The linearity of the proposed architecture is limited by delay mismatch in the CTDC chain. Although the measured INL is greater than 1 LSB, it can be improved with an INL LUT or better matched elements in the CTDC delay chain. Although this prototype was originally intended for a digital PLL, it is expected to find many other uses in today's digital-intensive signal processing.

#### ACKNOWLEDGMENT

The authors would like to acknowledge ST Microelectronics for fabrication, Y. S. Park for providing useful tips on digital placement and routing, and S. Dumont and K. Torki for setting up tools related to digital synthesis.

#### REFERENCES

- [1] Y. Arai and T. Baba, "A CMOS time to digital converter VLSI for high-energy physics," in *VLSI Circuits Symp. Dig.*, Aug. 1988, pp. 121–122.
- [2] R. B. Staszewski, D. Leipold, C.-M. Hung, and P. T. Balsara, "TDC-based frequency synthesizer for wireless applications," in *Proc. IEEE Radio Frequency Integrated Circuits (RFIC) Symp.*, Jun. 2004, pp. 215–218.
- [3] R. B. Staszewski et al., "1.3V 20ps time-to-digital converters for frequency synthesis in 90-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 53, no. 3, pp. 220–224, Mar. 2006.
- [4] P. Dudek, S. Szczepanski, and J. V. Hatfield, "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line," *IEEE J. Solid-State Circuits*, vol. 35, no. 2, pp. 240–247, Feb. 2000.
- [5] K. Nose *et al.*, "A 1ps-resolution jitter-measurement macro using interpolated jitter oversampling," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 520–521.
- [6] M. Lee and A. A. Abidi, "A 9b, 1.25 ps resolution coarse-fine time-to-digital converter in 90 nm CMOS that amplifies time residue," in *VLSI Circuits Symp. Dig.*, Jun. 2007, pp. 168–169.
- [7] A. M. Abas *et al.*, "Time difference amplifier," *Electron. Lett.*, vol. 38, no. 23, pp. 1437–1438, Nov. 2002.
- [8] A. M. Abas, G. Russell, and D. J. Kinniment, "Design of sub 10-picoseconds on-chip time measurement circuit," in *Proc. Design Automation Test Europe Conf.*, 2004, vol. 2, pp. 804–809.
- [9] J. P. Jansson *et al.*, "A CMOS time-to-digital converter with better than 10 ps single-shot precision," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, pp. 1286–1296, Jun. 2006.
- [10] C. S. Hwang, P. Chen, and H. W. Tsao, "A high-precision time-to-digital converter using a two-level conversion scheme," *IEEE Trans. Nucl. Sci.*, vol. 51, no. 4, pp. 1349–1352, Aug. 2004.
- [11] S. Tisa *et al.*, "Monolithic time-to-digital converter with 20 ps resolution," in *Proc. ESSCIRC*, Sep. 2003, pp. 465–468.
- [12] K. Karadamoglou *et al.*, "An 11-bit high-resolution and adjustable-range CMOS time-to-digital converter for space science instruments," *IEEE J. Solid-State Circuits*, vol. 39, no. 1, pp. 214–222, Jan. 2004.
- [13] B. Murmann and B. E. Boser, "A 12-bit 75-MS/s pipelined ADC using open-loop residue amplification," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2040–2050, Dec. 2003.
- [14] "Time interval averaging," Hewlett Packard Inc., Application note 162–1.



**Minjae Lee** (S'03) received the B.Sc. and M.S. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 1998 and 2000, respectively. He is currently working toward the Ph.D. degree in the field of analog and mixed mode circuit design at the Electrical Engineering Department, University of California, Los Angeles.

He was a Consultant with GCT Semiconductor, Inc., and Silicon Image Inc., designing analog circuits for wireless communication and digital signal processing blocks for Gigabit Ethernet. He joined Silicon Image Inc. in 2001, developing the Serial ATA product.



Asad A. Abidi (S'75–M'80–SM'95–F'96) received the B.Sc. (with honors) degree from the Imperial College, London, U.K., in 1976, and the M.S. and Ph.D. degrees in electrical engineering from the University of California at Berkeley in 1978 and 1981, respectively.

From 1981 to 1984, he was with Bell Laboratories, Murray Hill, NJ, as a Member of Technical Staff at the Advanced LSI Development Laboratory. Since 1985, he has been with the Electrical Engineering Department, University of California at Los Angeles (UCLA), where he is a Professor. He was a Visiting Faculty Researcher at Hewlett Packard Laboratories in 1989. His research interests are in CMOS RF design, high-speed analog integrated circuit design, data conversion, and other techniques of analog signal processing.

Dr. Abidi was the Program Secretary for the IEEE International Solid-State Circuits Conference (ISSCC) from 1984 to 1990 and the General Chairman of the Symposium on VLSI Circuits in 1992. He was the Secretary of the IEEE Solid-State Circuits Council from 1990 to 1991. From 1992 to 1995, he was the Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS. He was the recipient of an IEEE Millennium Medal, the 1988 TRW Award for Innovative Teaching, and the 1997 IEEE Donald G. Fink Award and is corecipient of the Best Paper Award at the 1995 European Solid-State Circuits Conference, the Jack Kilby Best Student Paper Award at the 1996 ISSCC, the Jack Raper Award for Outstanding Technology Directions Paper at the 1997 ISSCC, the Design Contest Award at the 1998 Design Automation Conference, an Honorable Mention at the 2000 Design Automation Conference, and the 2001 ISLPED Low Power Design Contest Award. He was named one of the top ten contributors to the ISSCC. He has been elected to the National Academy of Engineering, the highest professional lifetime distinction accorded to American engineers.