5.1 An 8Gb/s Transceiver with 3x-Oversampling 2-Threshold Eye-Tracking CDR Circuit for -36.8dB-loss Backplane

K. Fukuda¹, H. Yamashita¹, F. Yuki¹, M. Yagyu¹, R. Nemoto¹, T. Takemoto¹, T. Saito¹, N. Chugo¹, K. Yamamoto², H. Kanai², A. Hayashi²

¹Hitachi, Tokyo, Japan; ²Hitachi, Kanagawa, Japan

IT systems such as servers and routers need high-speed, low-power, area-efficient chip-to-chip interconnections through backplane boards. These interconnections must overcome signal degradation due to the large insertion loss of low-cost boards. In this work, a 90nm CMOS 8Gb/s transceiver is developed. A TX 5-tap FFE, an RX analog equalizer, and a 2-tap DFE combined with a 2-threshold eye-tracking CDR achieve a BER of less than 10⁻¹² through a 160cm backplane board with -36.8dB loss at 4GHz, at a transceiver power consumption of 232mW (transmission efficiency of 1.2Gb/s×dB/mW).
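The transmission-efficiency figure quoted above can be checked with a short calculation. This is a minimal sketch that assumes the figure of merit is defined as bit rate times loss magnitude divided by power; the text does not spell the definition out, and the function name is hypothetical.

```python
def transmission_efficiency(bit_rate_gbps, loss_db, power_mw):
    """Assumed figure of merit: (data rate x channel loss) per unit power."""
    return bit_rate_gbps * loss_db / power_mw

# Values taken from the text: 8Gb/s, 36.8dB loss magnitude, 232mW.
fom = transmission_efficiency(8.0, 36.8, 232.0)
print(f"{fom:.2f} Gb/s x dB / mW")
```

Under this assumed definition the result is a little under 1.3, in line with the roughly 1.2 quoted in the text.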

A block diagram of the half-rate transmitter with a 5-tap FFE is shown in Fig. 5.1.1. A 5b data stream from a shift register is inverted every other bit and fed into output drivers that consist of 25 CMOS inverters in parallel. FFE tap coefficients are determined by the number of inverters that each bit drives. The output impedance is adjusted to 50Ω by variable resistors consisting of pass transistors.
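The way inverter counts set the tap weights can be illustrated with a small behavioral model. It is only a sketch: the inverter counts, the main-cursor position, and the function name are hypothetical, and "inverted every other bit" is interpreted here as alternating tap polarity around the main cursor.

```python
def ffe_output(bits, inverter_counts, main_index=1, total_inverters=25):
    """Normalized TX output level per bit for an inverter-weighted FFE.

    bits            : sequence of 0/1 data bits, oldest first
    inverter_counts : inverters driven by each tap (tap 0 sees the newest bit)
    main_index      : which tap is the main cursor (an assumption)
    """
    assert sum(inverter_counts) == total_inverters
    symbols = [1 if b else -1 for b in bits]
    out = []
    for n in range(len(symbols)):
        acc = 0
        for k, count in enumerate(inverter_counts):
            if n - k < 0:
                continue  # shift register not yet filled
            # Every other tap is inverted relative to the main cursor.
            sign = 1 if (k - main_index) % 2 == 0 else -1
            acc += sign * count * symbols[n - k]
        out.append(acc / total_inverters)
    return out

# Hypothetical split of the 25 inverters over the 5 taps.
counts = [2, 17, 4, 1, 1]
steady = ffe_output([1] * 6, counts)[-1]              # long run of ones
toggling = ffe_output([1, 0, 1, 0, 1, 0], counts)[-1]  # alternating pattern
```

With these illustrative counts, a long run of identical bits is de-emphasized while an alternating pattern uses the full swing, which is the intended high-pass behavior of the FFE.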

A block diagram of the half-rate receiver is shown in Fig. 5.1.2. The VGA has a gain range of 12dB. The variable analog equalizer, which uses a capacitive feedback amplifier, provides up to 12dB of gain peaking at 4GHz relative to the gain at 1GHz. Two cascode transistors and a latch form the half-rate 2-tap DFE, which is combined with a 3x-oversampling 2-threshold eye-tracking CDR circuit, the main component described in this paper. The 2-tap DFE is achieved by the loop-unrolling technique. The input signal is divided into two signals, DH and DL, and fed into 8 latches for data determination and clock recovery.
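The loop-unrolling idea can be sketched behaviorally for a single tap: both possible decisions are made in parallel with two thresholds, and the previous bit selects the valid one. This is a generic illustration, not the circuit of Fig. 5.1.2; the mapping of the two thresholds onto the DH and DL paths is an assumption, the second tap is not modelled, and all names are hypothetical.

```python
def loop_unrolled_dfe(samples, h1, first_prev_bit=0):
    """One-tap speculative (loop-unrolled) DFE on sampled amplitudes.

    samples        : received amplitudes, one per bit
    h1             : first post-cursor ISI magnitude (sets the two thresholds)
    first_prev_bit : assumed value of the bit preceding the first sample
    """
    bits = []
    prev = first_prev_bit
    for x in samples:
        d_high = 1 if x > +h1 else 0  # valid if the previous bit was 1
        d_low = 1 if x > -h1 else 0   # valid if the previous bit was 0
        prev = d_high if prev == 1 else d_low
        bits.append(prev)
    return bits

# Toy channel: each symbol plus 1.2x the previous symbol (heavy ISI).
tx_bits = [1, 0, 1, 1, 0, 0, 1]
sym = [1 if b else -1 for b in tx_bits]
rx = [s + 1.2 * p for s, p in zip(sym, [-1] + sym[:-1])]
recovered = loop_unrolled_dfe(rx, h1=1.2)
```

In this toy channel a single slicer at zero would misread several bits, whereas the two-threshold selection recovers the transmitted pattern.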

The clock recovery (CR) method is a significant issue for receivers with DFE. Conventional CR circuits operate on a non-DFE equalized signal [1]. However, particularly for loop-unrolling DFE, this approach increases circuit complexity and power because an additional amplifier for the non-DFE signal is needed. To make matters worse, large jitter in a non-DFE equalized signal causes large jitter in the recovered clock.

Performing CR on a post-DFE signal is preferable, because its edges exhibit lower jitter. However, a conventional 2x-oversampling Alexander phase detector (PD) with data and edge sampling cannot maintain the data sampling at the center of the eye, because the time difference between the edge and the eye center of the post-DFE signal is not equal to 0.5UI. To eliminate this defect, data sampling and edge sampling must operate on different thresholds, where the time difference between these two kinds of sampling is exactly 0.5UI. This means two additional thresholds are needed, increasing circuit complexity and power [2].
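For reference, the decision rule of a conventional 2x-oversampling Alexander (bang-bang) PD can be sketched as follows. The sign convention and function name are assumptions made for illustration.

```python
def alexander_pd(d_prev, edge, d_curr):
    """Early/late decision from two data samples and the edge sample between them.

    Returns +1 if the clock is early, -1 if it is late, 0 if there is
    no transition and therefore no timing information.
    """
    if d_prev == d_curr:
        return 0
    if edge == d_prev:
        return +1  # edge sample still shows the old bit: clock is early
    return -1      # edge sample already shows the new bit: clock is late
```

This rule locks the edge sample to the zero crossing and implicitly places the data sample 0.5UI away, which is the assumption that fails for the post-DFE signal described above.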

The eye-tracking PD proposed in [3] is extended to a DFE receiver to solve the aforementioned issues. The latch-point alignment of the proposed half-rate 3x-oversampling 2-threshold eye-tracking PD is shown within the eye diagram in Fig. 5.1.3, together with an overview flowchart of the CDR logic. A half-rate receiver with a loop-unrolling 2-tap DFE needs 4 latches (HC0, HC1, LC0, and LC1) for data determination, where the prefixes H and L denote the threshold at which the latch operates, and the suffixes 0 and 1 denote the rising and falling edges of the half-rate clock, respectively. For CR, there are 4 guard latches (HP0, HR0, LF1, and LR1), as shown in Fig. 5.1.3. When the outputs of the data-determination latches and the guard latches differ, which means that the data-determination latches have come too close to an edge of the signal waveform, the CDR logic shifts the clock phase position of a phase interpolator (PI) to keep those latches away from the edge. As a result, the data-determination latches automatically remain in the center of the horizontal eye opening regardless of the eye shape.
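The guard-latch comparison described above can be sketched as a simple vote. This is an illustrative model only: it does not reproduce the flowchart of Fig. 5.1.3, the handling of the case where both guards disagree is an assumption, and the function and argument names are hypothetical.

```python
def eye_tracking_vote(front_guard, center, rear_guard):
    """Phase vote from one data latch and its two neighboring guard latches.

    front_guard : guard latch sampled before the data latch
    center      : data-determination latch output
    rear_guard  : guard latch sampled after the data latch
    Returns +1 to move the sampling phase later, -1 to move it earlier,
    0 to hold.
    """
    front_hit = front_guard != center  # an edge lies just before the data latch
    rear_hit = rear_guard != center    # an edge lies just after the data latch
    if front_hit and not rear_hit:
        return +1
    if rear_hit and not front_hit:
        return -1
    return 0  # all agree, or both disagree (assumed: no update)
```

When all three latches agree, no edge lies inside the guarded window and the phase is left alone, so the loop reacts only to the tails of the jitter distribution.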

Furthermore, the phase-detection performance of the eye-tracking PD is better than that of the conventional Alexander PD, particularly for a high-loss transmission line with large data-dependent jitter. The Alexander PD, which focuses on the center of jitter distribution, suffers adverse effects from the asymmetric shape of data-dependent jitter. On the other hand, the eye-tracking PD, which focuses on the tail of the jitter distribution (where symmetrically-shaped random jitter is dominant), averts these negative effects.

The PI has 64 steps within 2UI (1/32 UI per step). Three phase clocks (I, J, and K) required for the eye-tracking CDR are generated by 2 delay buffers and are adjusted to ~0.3UI away from each other.
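The PI resolution and clock spacing translate into time as follows. This is a small sketch using the figures from the text; the wrap-around behavior of the PI position and the helper name are assumptions.

```python
BIT_RATE_HZ = 8e9
UI_PS = 1e12 / BIT_RATE_HZ          # one unit interval in picoseconds
PI_STEPS = 64                       # PI positions over its full range
PI_RANGE_UI = 2                     # full range covers 2UI (half-rate clock period)

step_ui = PI_RANGE_UI / PI_STEPS    # 1/32 UI per step
step_ps = step_ui * UI_PS           # step size in picoseconds
spacing_ps = 0.3 * UI_PS            # approximate I/J/K clock spacing

def advance(position, vote):
    """Move the PI position by one vote (+1/-1/0), wrapping over the 2UI range."""
    return (position + vote) % PI_STEPS

print(f"UI = {UI_PS} ps, PI step = {step_ps} ps, I/J/K spacing ~ {spacing_ps:.1f} ps")
```

At 8Gb/s one PI step is a little under 4ps, so the roughly 0.3UI spacing between the three clocks corresponds to about ten PI steps.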

Another significant issue for receiver performance is reducing the setup/hold time of the latches. The proposed peaking CML latch is shown in Fig. 5.1.4. To shorten the setup/hold time, it is important to improve the high-frequency characteristics of the first-stage amplifier when the latch is in the through state. To this end, capacitive source degeneration with two tail current sources is applied. However, this solution has a problem. When the latch is in the hold state, nodes n1 and n2 become floating, so their voltages drift slowly with the charge stored on the capacitance. This undesirable drift disturbs a smooth transition of the latch back to the through state. To suppress this disturbance, n1 and n2 are connected through high resistances R1 and R2 to n3, which is activated in the hold state, to discharge the capacitance slowly. The circuit-level simulation result for the relationship between setup/hold time and propagation delay of the proposed latch is also shown in Fig. 5.1.4. The measured setup/hold time is 5.3ps for a 125ps pulse with a power consumption of 1mW.

A link experiment is carried out between two chips placed on two daughter boards installed onto a 160cm FR4 board with connectors, as shown in Fig. 5.1.5. The total insertion loss is -36.8dB at 4GHz from TX chip output to RX chip input, excluding packages. The transceiver achieves a BER of less than 10⁻¹² when supplied with a reference clock that has 1.57ps rms jitter from the PLL. The output jitter of the TX chip is 1.63ps rms and the recovered clock jitter of the RX chip is 2.27ps rms.

The bathtub curve measured from the post-DFE signal and the distribution of the clock phase position of the PI measured in the link experiment are shown in Fig. 5.1.6. The RX horizontal eye opening of the post-DFE signal is 13ps (0.11UI) at a BER of 10⁻¹², which includes input jitter, reference-clock jitter, and the setup/hold time of the latches. The distribution of the PI position ranges across 6 positions (0.16UI) at a probability of 10⁻¹². Note that the actual BER is the product of the bathtub curve and the PI position probability.
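One way to read the note on the actual BER is as a probability-weighted sum over PI positions. The sketch below assumes that interpretation; the function name is hypothetical and the numbers are placeholders chosen for illustration, not measured values.

```python
def effective_ber(pi_probability, ber_at_position):
    """Combine a bathtub curve with a PI position distribution.

    pi_probability  : {pi_position: probability of the PI sitting there}
    ber_at_position : {pi_position: BER read from the bathtub curve there}
    Assumes the overall BER is the sum of per-position products.
    """
    return sum(p * ber_at_position[pos] for pos, p in pi_probability.items())

# Placeholder values for illustration only (not from the measurement).
toy_probability = {0: 0.5, 1: 0.5}
toy_bathtub = {0: 1e-12, 1: 3e-12}
toy_result = effective_ber(toy_probability, toy_bathtub)
```

Positions that the PI rarely visits contribute little even where the bathtub curve is poor, which is why the two distributions must be read together.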

A micrograph of the chip, fabricated in an 8-metal 90nm CMOS process, is shown in Fig. 5.1.7. There are 4 pairs of TX and RX sharing one on-chip PLL. One TX occupies 0.183mm² and consumes 101mW, while one RX occupies 0.148mm² and consumes 131mW, both powered by a 1.2V supply.

References:

Figure 5.1.1: Transmitter block diagram.

Figure 5.1.2: Receiver block diagram.

Figure 5.1.3: Latch-point alignment of the 2-threshold eye-tracking PD within eye diagram and overview flowchart of the CDR logic.

Figure 5.1.4: Circuit diagram of the CML latch and simulated setup/hold property.

Figure 5.1.5: Experimental setting and measured eye diagrams.

Figure 5.1.6: Measurement result with 160-cm FR4 board.

Figure 5.1.7: Chip micrograph.