#### ECEN620: Network Theory Broadband Circuit Design Fall 2023

#### Lecture 12: CDRs



Sam Palermo Analog & Mixed-Signal Center Texas A&M University

#### Announcements

#### Project Preliminary Report (HW5) due Nov 16

### Agenda

- CDR overview
- CDR phase detectors
- Analog & digital CDRs
- Dual-loop CDRs
- Phase Interpolators
- CDR jitter properties

#### High-Speed Electrical Link System



## Clock and Data Recovery



- A clock and data recovery system (CDR) produces the clocks to sample incoming data
- The clock(s) must have an effective frequency equal to the incoming data rate
  - 10GHz for 10Gb/s data rate
  - OR, multiple clocks spaced at 100ps
  - Additional clocks may be used for phase detection
- Sampling clocks should have sufficient timing margin to achieve the desired bit-error-rate (BER)
- CDR should exhibit small effective jitter

# Embedded Clocking (CDR)



- Clock frequency and optimum phase position are extracted from incoming data
- Phase detection continuously running
- Jitter tracking limited by CDR bandwidth
  - With technology scaling, CDRs can be built with higher bandwidths, so the jitter-tracking advantages of source-synchronous systems are diminished
- Possible CDR implementations
  - Stand-alone PLL
  - "Dual-loop" architecture with a PLL or DLL and phase interpolators (PI)
  - Phase-rotator PLL

#### **CDR Phase Detectors**



- A primary difference between CDRs and PLLs is that the incoming data signal is not periodic like the incoming reference clock of a PLL
- A CDR phase detector must operate properly with missing transition edges in the input data sequence

### **CDR Phase Detectors**

- CDR phase detectors compare the phase of the input data with that of the recovered clock sampling this data, and provide information to adjust the sampling clocks' phase
- Phase detectors can be linear or non-linear
- Linear phase detectors provide both **sign and magnitude** information regarding the sampling phase error
  - Hogge
- Non-linear phase detectors provide only sign information regarding the sampling phase error
  - Alexander or 2x-Oversampled or Bang-Bang
  - Oversampling (>2)
  - Baud-Rate

### Hogge Phase Detector



- With a data transition and assuming a full-rate clock
  - The late signal produces a signal whose pulse width is proportional to the phase difference between the incoming data and the sampling clock
  - A Tb/2 reference signal is produced with a Tb/2 delay
- The ideal lock point is in the middle of a bit period, i.e. a  $\pi$  or Tb/2 phase shift between clock and the data transition
- If the clock is sampling early, the late signal will be shorter than Tb/2 and vice-versa

#### Hogge Phase Detector



## Hogge Phase Detector Nonidealities



- Flip-Flop Clk-to-Q delay widens Late pulse, but doesn't impact Tb/2 reference pulse
- CDR will **lock with a phase shift** (early sample clock) to equalize Tb/2 reference and Late pulse widths
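
The pulse-width behavior described above can be sketched with a small behavioral model. This is a minimal illustration, not a circuit simulation: the function names and the example timing values are assumptions, and the model only captures the Late pulse (data transition to flip-flop output update) and the fixed Tb/2 reference pulse.

```python
def hogge_pulse_widths(t_sample, t_bit, t_clk_to_q=0.0):
    """Return (late_width, ref_width) for one data transition.

    t_sample:   delay from the data transition to the sampling clock edge
    t_clk_to_q: flip-flop clock-to-Q delay, which widens only the Late pulse
    """
    late_width = t_sample + t_clk_to_q  # XOR(Din, Q1): transition until Q1 updates
    ref_width = t_bit / 2.0             # XOR(Q1, Q2): fixed half-bit reference
    return late_width, ref_width


def hogge_lock_point(t_bit, t_clk_to_q=0.0):
    """Sampling delay at which the Late and reference widths are equal."""
    return t_bit / 2.0 - t_clk_to_q


t_bit = 100e-12  # assumed 10 Gb/s bit period
late, ref = hogge_pulse_widths(40e-12, t_bit)
print("early clock gives shorter Late pulse:", late < ref)
print("lock point with 10 ps clk-to-Q:", hogge_lock_point(t_bit, 10e-12))
```

With zero clock-to-Q delay the loop settles at Tb/2; any clock-to-Q delay moves the lock point earlier by the same amount, which is the phase shift a dummy delay element is meant to cancel.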

# Hogge Phase Detector Nonidealities



- CDR phase shift compensated with a dummy delay element
- Other issues:
  - Need extremely high-speed XOR gates
  - Phase skew between Tb/2 reference and Late signals induces a "triwave" disturbance (ripple) on the control voltage

#### PLL-Based CDR with a Hogge PD



- XOR outputs can directly drive the charge pump
- Need a relatively high-speed charge pump

# Hogge PD Triwave on Vctrl



- Under nominal lock conditions, the control voltage integrates up and down with each transition
- Periodic disturbance produces data-dependent jitter (DDJ), as the triangular pulse exhibits a nonzero net area
- Since the data transition activity is random, a low frequency noise source is created that is not attenuated by the PLL dynamics

# Modified Hogge PD



- Two additional latches and XOR gates are added
- The first flip-flop, latch, and 2 XORs are identical to the original Hogge
- The second 2 latches and XORs produce an inverted version of the original triwave, which can drive a second parallel charge pump to produce a nominally zero net area waveform

#### Alexander (2x-Oversampled) Phase Detector

- Most commonly used CDR phase detector
- Non-linear (Binary) "Bang-Bang" PD
  - Only provides sign information of phase error (not magnitude)
- Phase detector uses 2 data samples and one "edge" sample
- A data transition is necessary: $D_n \oplus D_{n+1}$
- If the "edge" sample is the same as the second bit (or different from the first), then the clock is sampling "late": $E_n \oplus D_n$
- If the "edge" sample is the same as the first bit (or different from the second), then the clock is sampling "early": $E_n \oplus D_{n+1}$
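
The early/late decision logic can be written out directly from the XOR expressions above. The function name and the +1/-1/0 output encoding are assumptions made for illustration.

```python
def alexander_pd(d_n, e_n, d_np1):
    """Return +1 for late, -1 for early, 0 when no decision is available.

    d_n, d_np1: consecutive data samples (0 or 1)
    e_n:        edge sample taken between them (0 or 1)
    """
    late = e_n ^ d_n      # edge sample differs from the first bit
    early = e_n ^ d_np1   # edge sample differs from the second bit
    # Without a data transition, late == early and the result is 0.
    return late - early


for bits in [(0, 1, 1), (0, 0, 1), (1, 1, 1), (1, 0, 1)]:
    print(bits, alexander_pd(*bits))
```

The last pattern (1, 0, 1) has no data transition but an inconsistent edge sample; in this sketch both flags assert and cancel, so it produces no net update.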





# Alexander Phase Detector Characteristic (No Noise)



- Phase detector only outputs phase error sign information in the form of a late OR early pulse whose width doesn't vary
- Phase detector gain is ideally infinite at zero phase error
  - Finite gain will be present with noise, clock jitter, sampler metastability, ISI

# Alexander Phase Detector Characteristic (With Noise)

- Total transfer characteristic is the convolution of the ideal PD transfer characteristic and the noise PDF
- Noise linearizes the phase detector over a phase region corresponding to the peak-to-peak jitter

$$K_{PD} \approx \frac{2}{J_{PP}} (TD)$$

- TD is the transition density: no transitions, no information
  - A value of 0.5 can be assumed for random data
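
As a rough numerical illustration of the linearized gain, the sketch below evaluates $K_{PD}$ and the average detector output. It assumes, purely for simplicity, that the jitter is uniformly distributed over $J_{PP}$, in which case the averaged characteristic is exactly linear inside the jitter window and saturates outside it. The function names and the unit choice (phase and jitter in UI) are assumptions.

```python
def bbpd_gain(j_pp, transition_density=0.5):
    """Linearized bang-bang PD gain, K_PD ~ 2 * TD / J_PP."""
    return 2.0 * transition_density / j_pp


def bbpd_average_output(phase_error, j_pp, transition_density=0.5):
    """Mean PD output, assuming jitter uniformly distributed over j_pp."""
    linear = 2.0 * phase_error / j_pp
    saturated = max(-1.0, min(1.0, linear))  # outside the window only the sign survives
    return transition_density * saturated


j_pp = 0.2  # assumed peak-to-peak jitter in UI
print("K_PD:", bbpd_gain(j_pp))
for phi in (-0.2, -0.05, 0.0, 0.05, 0.2):
    print(phi, bbpd_average_output(phi, j_pp))
```

A Gaussian jitter PDF would round the corners of this characteristic but give a similar slope near zero phase error.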



### **Oversampling Phase Detectors**



- Multiple clock phases are used to sample incoming data bits
- PD can have multiple output levels
  - Can detect rate of phase change for frequency acquisition

### **Baud-Rate Phase Detectors**

- A baud-rate phase detector requires only one sample clock per symbol (bit)
- The Mueller-Muller phase detector is commonly used
- It attempts to equalize the amplitudes of the samples taken before and after a pulse



The MM-PD measures the effective $h_1 - h_{-1}$, which can be computed as

$$h_1 - h_{-1} = E[y_k \cdot d_{k-1}] - E[y_k \cdot d_{k+1}]$$

- If this is positive, then the effective post-cursor ISI is too high and we are sampling too early
- If this is negative, then the effective pre-cursor ISI is too high and we are sampling too late
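
The correlation above can be checked numerically with random data. The sketch below assumes a three-tap pulse response (one pre-cursor, the main cursor, one post-cursor), binary ±1 symbols, and no noise; the function name, tap values, and symbol count are all illustrative assumptions.

```python
import random


def mm_timing_estimate(h_pre, h_main, h_post, n_symbols=50000, seed=1):
    """Estimate h_1 - h_-1 as E[y_k * d_(k-1)] - E[y_k * d_(k+1)]."""
    rng = random.Random(seed)
    d = [rng.choice((-1, 1)) for _ in range(n_symbols)]
    post_sum = 0.0
    pre_sum = 0.0
    count = 0
    for k in range(1, n_symbols - 1):
        # Received sample: main cursor plus pre- and post-cursor ISI
        y_k = h_pre * d[k + 1] + h_main * d[k] + h_post * d[k - 1]
        post_sum += y_k * d[k - 1]  # correlates out h_1
        pre_sum += y_k * d[k + 1]   # correlates out h_-1
        count += 1
    return post_sum / count - pre_sum / count


estimate = mm_timing_estimate(h_pre=0.1, h_main=1.0, h_post=0.3)
print("estimated h1 - h-1:", estimate)
print("sampling too early" if estimate > 0 else "sampling too late")
```

Swapping the pre- and post-cursor values flips the sign of the estimate, matching the early/late interpretation in the bullets.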





- Comparing the current sample versus the desired reference level (e<sub>n</sub>) and correlating that with the appropriate data sample (d<sub>n</sub>) gives pre/post-cursor information
- This requires additional error samplers w/ |VREF| thresholds
- e<sub>n</sub> gives d<sub>n-1</sub> post-cursor (h<sub>1</sub>) information
- e<sub>n-1</sub> gives d<sub>n</sub> pre-cursor (h<sub>-1</sub>) information



#### [Spagna ISSCC 2010]



- Simplified MM-PD only considers transition patterns
- If consecutive error samples are different, phase error polarity is given by e<sub>j</sub>

#### Analog PLL-based CDR



### Analog PLL-based CDR



- CDR "bandwidth" will vary with input phase variation amplitude with a non-linear phase detector
- Final performance verification should be done with a time-domain non-linear model

# 56Gb/s PAM4 Analog PLL-based CDR

- Quarter-rate architecture
- 3 data samplers for PAM4 detection
- 1 edge sampler for CDR and DFE adaptation
- 1 error sampler for threshold adaptation



#### [Roshan-Zamir JSSC 2019]

# 56Gb/s PAM4 Analog PLL-based CDR

- PLL-based CDR to reduce power consumption
- Bang-bang phase detector works on symmetric PAM4 transitions to minimize detection errors
- Parallel charge pumps minimize logic and loop delay

[Figure: CDR block diagram with PAM4 BBPD early/late outputs, charge pumps, loop filter, 14 GHz LC-VCO, CML divider, phase calibration, and CML-to-CMOS clock generators for the data and 2X-oversampling edge clocks]

# 56Gb/s PAM4 Analog PLL-based CDR

- LC-VCO w/ additional source LC filter improves phase noise
- 8-phase quarter-rate clock
  - CML divider
  - 2X oversampling clock



#### [Roshan-Zamir JSSC 2019]

Fig. 16. Measured PAM4 jitter tolerance (BER =  $10^{-9}$ ) operating over Channel 2.

#### **Digital PLL-based CDR**



#### Digital PLL-based CDR



**Open-Loop Gain:** 

$$L(z^{-1}) = \left(\frac{K_{PD} K_{V} K_{DPC}}{1-z^{-1}}\right) \left(\mathrm{phug} + \frac{\mathrm{frug}}{1-z^{-1}}\right) z^{-N_{EL}}$$

$$\frac{\Phi_{samp}}{\Phi_{in}} = \frac{L(e^{-j\omega})}{1 + L(e^{-j\omega})}$$
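
The open-loop gain and the resulting jitter transfer can be evaluated numerically by substituting $z^{-1} = e^{-j\omega}$, with $\omega$ in radians per loop update. The sketch below is a direct transcription of the two expressions; the function names and every numeric loop parameter are assumed placeholder values, not figures from the cited design.

```python
import cmath
import math


def open_loop_gain(omega, k_pd, k_v, k_dpc, phug, frug, n_el):
    """L(z^-1) evaluated at z^-1 = exp(-j*omega)."""
    z_inv = cmath.exp(-1j * omega)
    integrator = 1.0 / (1.0 - z_inv)
    loop_filter = phug + frug * integrator  # proportional + integral paths
    return k_pd * k_v * k_dpc * integrator * loop_filter * z_inv ** n_el


def jitter_transfer(omega, **loop):
    """Phi_samp / Phi_in = L / (1 + L)."""
    gain = open_loop_gain(omega, **loop)
    return gain / (1.0 + gain)


# Assumed example values for illustration only
loop = dict(k_pd=1.0, k_v=1.0, k_dpc=1.0, phug=0.01, frug=1e-5, n_el=4)
for omega in (1e-4, 1e-3, 1e-2, 1e-1, math.pi):
    mag_db = 20 * math.log10(abs(jitter_transfer(omega, **loop)))
    print(f"omega = {omega:.4g} rad/sample, |H| = {mag_db:.2f} dB")
```

Increasing `n_el` (loop latency) in this model adds phase lag to the open-loop gain, which tends to increase peaking in the closed-loop response.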

#### [Sonntag JSSC 2006]

#### **Digital PLL-based CDR**



# 52Gb/s PAM4 Digital PI-based CDR

[Kiran JSSC 2019]



- Baud-rate digital CDR with Mueller-Muller phase detector
- MM-PD is placed directly after the ADC to minimize loop latency
- Proportional and integral loop filter produces control signals to adjust the phase of 4 differential phase interpolators (PIs) that provide the input T/H sample clocks

### Single-Loop CDR Issues



- Phase detectors have limited frequency acquisition range
  - Results in long lock times or not locking at all
  - Can potentially lock to harmonics of correct clock frequency
- VCO frequency range variation with process, voltage, and temperature can exceed PLL lock range if only a phase detector is employed

#### Phase and Frequency Tracking Loops



- Frequency tracking loop operates during startup or loss of phase lock
  - Ideally should be mostly off in normal operation
- Frequency loop bandwidth typically much smaller than phase loop bandwidth to prevent loop interaction

#### **Frequency Detector**



- Uses double-edge-triggered input flip-flops, with the Data signal sampling 2 quadrature clocks
- The Q output then samples the I output
- For fast clocks relative to the data,  $X_A$  will go high first and the output flip-flop will give a high value
- For slow clocks relative to the data,  $X_{B}$  will go high first and the output flip-flop will give a low value

#### Frequency Detector Transfer Characteristic

#### [Razavi]



#### **Small frequency offsets**



- With large frequency offsets, the frequency detector output is unreliable
- Capture range ~<15% frequency offset





# Analog Dual-Loop CDR w/ Two VCOs

- Frequency synthesis loop with replica VCO provides a "coarse" control voltage to set phase tracking loop frequency
- Frequency loop can be a global PLL shared by multiple channels
- Issues
  - VCO matching
  - VCO pulling
  - Distributing voltage long distances



[Hsieh]

# Analog Dual-Loop CDR w/ One VCO

- Frequency loop operates during startup or loss of phase lock
  - Ideally should be mostly off in normal operation
- Input reference clock simplifies frequency loop design
- Care must be taken when switching between loops to avoid disturbing the VCO control voltage and losing frequency lock



# Phase Interpolator (PI) Based CDR

- Frequency synthesis loop produces multiple clock phases used by the phase interpolators
- Phase interpolator mixes between input phases to produce a fine sampling phase
  - Ex: Quadrature 90° PI inputs with 5-bit resolution provide sampling phases spaced by 90°/(2<sup>5</sup>-1)=2.9°
- Digital phase tracking loop offers advantages in robustness, area, and flexibility to easily reprogram loop parameters
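
The phase-step arithmetic in the example above can be reproduced for other resolutions. The function names and the 5 GHz clock used for the time-step conversion are assumptions; the step formula follows the slide's span/(2^bits - 1) convention.

```python
def pi_phase_step_deg(span_deg, bits):
    """Phase step for an interpolator covering span_deg with 2**bits codes."""
    return span_deg / (2 ** bits - 1)


def pi_time_step(span_deg, bits, clock_hz):
    """Same step expressed in seconds for a given clock frequency."""
    return pi_phase_step_deg(span_deg, bits) / 360.0 / clock_hz


print("step [deg]:", round(pi_phase_step_deg(90.0, 5), 2))
print("step [s] at an assumed 5 GHz clock:", pi_time_step(90.0, 5, 5e9))
```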



[Hsieh]

# Phase Interpolator (PI) Based CDR

- Frequency synthesis loop can be a global PLL
- Can be difficult to distribute multiple phases long distance
  - Need to preserve phase spacing
  - Clock distribution power increases with phase number
  - If CDR needs more than 4 phases consider local phase generation



### **DLL Local Phase Generation**

- Only differential clock is distributed from global PLL
- Delay-Locked Loop (DLL) locally generates the multiple clock phases for the phase interpolators
  - DLL can be per-channel or shared by a small number (4) of channels
- Same architecture can be used in a forwarded-clock system
  - Replace frequency synthesis PLL with forwarded-clock signals



## Phase Rotator PLL

- Phase interpolators can be expensive in terms of power and area
- Phase rotator PLL places one interpolator in the PLL feedback to adjust all VCO output phases simultaneously
- Now frequency synthesis and phase recovery loops are coupled
  - Need PLL bandwidth greater than phase loop
    - Useful in filtering VCO noise



# Phase Interpolators

- Phase interpolators realize digital-to-phase conversion (DPC)
- Produce an output clock that is a weighted sum of two input clock phases
- Common circuit structures
  - Tail current summation interpolation
  - Voltage-mode interpolation
- Interpolator code mapping techniques
  - Sinusoidal
  - Linear



#### Sinusoidal Phase Interpolation



- An arbitrary phase shift can be generated with a linear summation of the I/Q clock signals, where  $a_1 = \cos(\phi)$  and  $a_2 = \sin(\phi)$ , so that  $a_1^2 + a_2^2 = 1$

#### Sinusoidal vs Linear Phase Interpolation



- It can be difficult to generate a circuit that implements sinusoidal weighting  $a_1^2 + a_2^2 = 1$
- In practice, a linear weighting is often used

$$a_1 + a_2 = 1$$
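
To see what linear weighting costs relative to sinusoidal weighting, the sketch below computes the phase and amplitude of $a_1\cos(\omega t) + a_2\sin(\omega t)$ for both mappings across one quadrant. It assumes ideal sinusoidal I/Q inputs, and the function names and the normalized code `x` in [0, 1] are illustrative choices.

```python
import math


def interpolate(a1, a2):
    """Phase (degrees) and amplitude of a1*cos(wt) + a2*sin(wt)."""
    return math.degrees(math.atan2(a2, a1)), math.hypot(a1, a2)


def linear_weights(x):
    """Linear mapping: a1 + a2 = 1, with x in [0, 1]."""
    return 1.0 - x, x


def sinusoidal_weights(x):
    """Sinusoidal mapping: a1^2 + a2^2 = 1, with x in [0, 1]."""
    phi = x * math.pi / 2.0
    return math.cos(phi), math.sin(phi)


def max_linear_phase_error_deg(steps=1000):
    """Worst-case deviation of the linear mapping from the ideal 90*x degrees."""
    worst = 0.0
    for i in range(steps + 1):
        x = i / steps
        phase, _ = interpolate(*linear_weights(x))
        worst = max(worst, abs(phase - 90.0 * x))
    return worst


print("linear, mid-code (phase, amplitude):", interpolate(*linear_weights(0.5)))
print("sinusoidal, mid-code (phase, amplitude):", interpolate(*sinusoidal_weights(0.5)))
print("max linear phase error [deg]:", max_linear_phase_error_deg())
```

Under these assumptions the linear mapping hits the correct phase at the quadrant ends and midpoint, but its amplitude dips mid-quadrant and the phase deviates by a few degrees in between.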



#### Phase Interpolator Model




- Interpolation linearity is a function of the phase spacing,  $\Delta t$ , to output time constant, RC, ratio
- Important that interpolator output time constant is not too small (fast) for phase mixing quality





## Tail-Current Summation PI



- Control of I/Q polarity allows for full 360° phase rotation with phase step determined by resolution of weighting DAC
- For linearity over a wide frequency range, important to control either input or output time constant (slew rate)

#### Voltage-Mode Summation PI

#### [Joshi VLSI Symp 2009]



- For linearity over a wide frequency range, important to control either input or output time constant (slew rate)

### Agenda

- CDR overview
- CDR phase detectors
- Analog & digital CDRs
- Dual-loop CDRs
- Phase Interpolators
- CDR jitter properties
  - Jitter transfer
  - Jitter generation
  - Jitter tolerance

#### **CDR** Jitter Model



#### Jitter Transfer



- Jitter transfer is how much input jitter "transfers" to the output
  - If the PLL has any peaking in the phase transfer function, this jitter can actually be amplified

$$\frac{\phi_{out}}{\phi_{in}} = \frac{s \cdot K_P \cdot K_{PD} \cdot K_{VCO} + K_i \cdot K_{PD} \cdot K_{VCO}}{s^2 + s \cdot K_P \cdot K_{PD} \cdot K_{VCO} + K_i \cdot K_{PD} \cdot K_{VCO}}$$
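
The peaking mentioned above can be quantified by sweeping the second-order jitter transfer function, with each loop gain term entering once: numerator $sK_PK_{PD}K_{VCO} + K_iK_{PD}K_{VCO}$ over denominator $s^2 + sK_PK_{PD}K_{VCO} + K_iK_{PD}K_{VCO}$. The function names and all gain values below are assumptions chosen to give $\omega_n = 10^7$ rad/s and $\zeta = 1$.

```python
import math


def jitter_transfer(omega, k_p, k_i, k_pd, k_vco):
    """phi_out / phi_in of the proportional-integral loop at s = j*omega."""
    s = 1j * omega
    k = k_pd * k_vco
    return (s * k_p * k + k_i * k) / (s * s + s * k_p * k + k_i * k)


def peaking_db(k_p, k_i, k_pd, k_vco, points=2000):
    """Maximum of |H| in dB over two decades either side of omega_n."""
    omega_n = math.sqrt(k_i * k_pd * k_vco)
    peak = 0.0
    for i in range(points + 1):
        omega = omega_n * 10 ** (-2.0 + 4.0 * i / points)
        peak = max(peak, abs(jitter_transfer(omega, k_p, k_i, k_pd, k_vco)))
    return 20.0 * math.log10(peak)


# Assumed example gains: K_P*K = 2e7 and K_i*K = 1e14
gains = dict(k_p=0.02, k_i=1e5, k_pd=1.0, k_vco=1e9)
print("|H| at omega_n:", abs(jitter_transfer(1e7, **gains)))
print("peaking [dB]:", peaking_db(**gains))
```

Because of the zero in the numerator, this loop form shows some peaking even with $\zeta = 1$; raising the damping reduces it but does not remove it entirely.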

#### Jitter Transfer Measurement



#### **Jitter Transfer Specification**



| Data Rate  | f<sub>c</sub> [kHz] | P [dB] |
|------------|---------------------|--------|
| 155 Mb/s   | 130                 | 0.1    |
| 622 Mb/s   | 500                 | 0.1    |
| 2.488 Gb/s | 2000                | 0.1    |

This specification is intended to control jitter peaking in long repeater chains

#### Jitter Generation



- Jitter generation is how much jitter the CDR "generates"
  - Assumed to be dominated by VCO
- Assumes jitter-free serial data input

VCO Phase Noise: 
$$H_{n_{VCO}}(s) = \frac{\phi_{out}}{\phi_{n_{VCO}}} = \frac{s^2}{s^2 + \left(\frac{K_{Loop}}{N}\right)RCs + \frac{K_{Loop}}{N}} = \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}$$

For CDR, N should be 1

#### Jitter Generation



- SONET specification:
  - rms output jitter  $\leq 0.01$  UI

#### Open-Loop VCO Jitter – Self-Referenced



- Measure distribution of clock threshold crossings
- Plot  $\sigma$  as a function of delay  $\Delta T$

#### Open-Loop VCO Jitter – Self-Referenced



- Jitter  $\sigma_{\Delta T}$  is proportional to  $\sqrt{\Delta T}$ :  $\sigma_{\Delta T} = \kappa\sqrt{\Delta T}$
- $\kappa$  is the VCO time-domain figure of merit

#### VCO in Closed-Loop PLL Jitter – Self-Referenced vs Ref-Clk Referenced



- The PLL limits  $\sigma_{\Delta T}$  for delays longer than the loop time constant  $\tau_L$

$$\tau_L = \frac{1}{2\pi f_L}, \quad \sigma_{\Delta T} = \kappa \sqrt{\frac{1}{2\pi f_L}}$$

- If we refer the jitter to the reference (or transmit) clock,  $\sigma_x$ , the correlation between the clocks reduces the jitter sigma

$$\sigma_x = \frac{\sigma_{\Delta T}}{\sqrt{2}} = \kappa \sqrt{\frac{1}{4\pi f_L}}$$
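
A quick numeric evaluation of these two expressions is shown below. The function name and the example values of $\kappa$ and $f_L$ are assumptions for illustration only.

```python
import math


def closed_loop_jitter(kappa, f_loop_hz):
    """Return (self-referenced sigma, reference-clock-referenced sigma) in seconds."""
    tau_loop = 1.0 / (2.0 * math.pi * f_loop_hz)  # loop time constant
    sigma_self = kappa * math.sqrt(tau_loop)      # kappa * sqrt(1 / (2*pi*f_L))
    sigma_ref = sigma_self / math.sqrt(2.0)       # kappa * sqrt(1 / (4*pi*f_L))
    return sigma_self, sigma_ref


# Assumed example: kappa = 1e-8 sqrt(s), loop bandwidth f_L = 1 MHz
sigma_self, sigma_ref = closed_loop_jitter(1e-8, 1e6)
print("self-referenced sigma [s]:", sigma_self)
print("ref-clock-referenced sigma [s]:", sigma_ref)
```

Since both sigmas scale as $1/\sqrt{f_L}$, quadrupling the loop bandwidth in this model halves the VCO-induced jitter.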

#### Ref Clk-Referenced vs Self-Referenced



- Depending on how you measure jitter generation, you will get a different number, with the self-referenced sigma being a factor of  $\sqrt{2}$  higher

#### Jitter Tolerance

- How much sinusoidal jitter can the CDR "tolerate" and still achieve a given BER?

[Sheikholeslami]

[Figure: CDR loop model in which the phase error  $\phi_e = \phi_{in} - \phi_{out}$  drives  $K_{pd}$ ,  $H_{LPF}(s)$ , and  $K_{VCO}/s$  to produce  $\phi_{out}$]

#### Maximum tolerable $\phi_e$

$$\phi_e(s) = \left(1 - \frac{\phi_{out}(s)}{\phi_{in}(s)}\right) \phi_{n.in}(s) \le \frac{\text{Timing Margin}}{2}$$

Since jitter tolerance is often specified in units of peak-to-peak jitter amplitude  $(UI_{pp})$ :

$$JTOL(s) = 2\phi_{n.in}(s) = \frac{TM}{\left(1 - \frac{\phi_{out}(s)}{\phi_{in}(s)}\right)}$$
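
The shape of the jitter tolerance curve follows from evaluating this expression over frequency. The sketch below assumes a second-order loop written in terms of $\zeta$ and $\omega_n$, takes the magnitude of $1 - \phi_{out}/\phi_{in}$, and uses an assumed timing margin of 0.5 UI; the function name and all numeric values are illustrative.

```python
def jitter_tolerance(omega, timing_margin_ui, zeta, omega_n):
    """JTOL in UIpp at angular frequency omega: TM / |1 - H(j*omega)|."""
    s = 1j * omega
    denom = s * s + 2.0 * zeta * omega_n * s + omega_n ** 2
    h = (2.0 * zeta * omega_n * s + omega_n ** 2) / denom  # jitter transfer
    return timing_margin_ui / abs(1.0 - h)


# Assumed loop: omega_n = 1e7 rad/s, zeta = 1, timing margin = 0.5 UI
for omega in (1e5, 1e6, 1e7, 1e8, 1e10):
    jtol = jitter_tolerance(omega, 0.5, 1.0, 1e7)
    print(f"omega = {omega:.0e} rad/s, JTOL = {jtol:.3g} UIpp")
```

Well inside the loop bandwidth the CDR tracks the jitter and the tolerance is large, while far above the bandwidth it flattens to the timing margin, which is why a higher loop bandwidth improves jitter tolerance.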



### Jitter Tolerance Measurement



**Differential PRBS Data** 

- While jitter tolerance testing quantifies the tolerance to sinusoidal jitter, often "stressed eyes" are used that have additional random and deterministic jitter to emulate realistic operating conditions
- Random and sinusoidal jitter are added by modulating the BERT clock
- Deterministic jitter is added by passing the data through the channel
- For a given frequency, the sinusoidal jitter amplitude is increased until the BER degrades to the maximum acceptable level (10<sup>-12</sup>)

#### Jitter Tolerance Measurement



#### **CDR Take-Away Points**

- CDRs extract the proper clock frequency and phase position to sample the incoming data symbols
- Specialized phase detectors suited for random data symbols are required
- Dual-loop CDRs are often used to both optimize jitter performance and provide robust frequency acquisition
- Jitter tolerance is an important CDR metric that is improved with increased loop bandwidth

### Next Time

- Broadband amplifiers
  - Transimpedance amplifiers
  - Limiting amplifiers