ECEN720: High-Speed Links
Circuits and Systems
Spring 2017

Lecture 6: RX Circuits

Sam Palermo
Analog & Mixed-Signal Center
Texas A&M University
Announcements

• Lab 4 Report and Prelab 5 due Mar. 6

• Sampler and comparator papers are posted on the website
Agenda

• RX Circuits
  • RX parameters
  • RX static amplifiers
  • Clocked comparators
    • Circuits
    • Characterization techniques
  • Integrating receivers
  • RX sensitivity
    • Offset correction
  • Demultiplexing receivers
High-Speed Electrical Link System

Diagram showing the components of a high-speed electrical link system:
- Serializer
- TX
- TX clk
- PLL
- Channel
- RX
- RX clk
- Deserializer
- RX data
- Timing Recovery

Timing diagrams for TX data, TX clk, and RX clk.
Receiver Parameters

- RX sensitivity, offsets in voltage and time domain, and aperture time are important parameters.
- Minimum eye width is determined by aperture time plus peak-to-peak timing jitter.
- Minimum eye height is determined by sensitivity plus peak-to-peak voltage offset.
RX Block Diagram

- RX must sample the signal with high timing precision and resolve input data to logic levels with high sensitivity
- Input pre-amp can improve signal gain and improve input referred noise
  - Can also be used for equalization, offset correction, and to fix the sampler common-mode
  - Must provide gain at high-bandwidth corresponding to full data rate
- Comparator can be implemented with static amplifiers or clocked regenerative amplifiers
  - Clocked regenerative amplifiers are more power efficient for high gain
- Decoder used for advanced modulation (PAM4, Duo-binary)
RX Static Amplifiers – Single-Ended Inverter

- CMOS inverter is one of the simplest RX pre-amplifier structures
- Termination voltage, $V_{TT}$, should be placed near inverter trip-point
- Issues:
  - Limited gain (<20)
  - High PVT variation results in large input referred offset
  - Single-ended operation makes it both sensitive to supply noise and a source of supply noise
**RX Static Differential Amplifiers**

- Differential input amplifiers often used as input stage in high performance serial links
  - Rejects common-mode noise
  - Sets input common-mode for the following comparator
- Input stage type (n or p) often set by termination scheme
- High gain-bandwidth product necessary to amplify full data rate signal
- Offset correction and equalization can be merged into the input amplifier

**Equations**

\[ A_v = g_{m1}(R_L \parallel r_{o1}) \approx g_{m1}R_L \]

\[ A_v = \frac{g_{m1}}{g_{m3} + g_{o3} + g_{o4} + g_{o1}} \approx \frac{g_{m1}}{g_{m3}} \]
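
As a quick numerical check of the two gain expressions above, the sketch below compares the exact and approximate forms. All small-signal values are hypothetical and chosen only for illustration; the function names are not from the lecture.

```python
def parallel(r_a, r_b):
    """Resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)

def gain_resistive_load(gm1, r_load, ro1):
    """A_v = gm1 * (R_L || r_o1)."""
    return gm1 * parallel(r_load, ro1)

def gain_diode_load(gm1, gm3, go1, go3, go4):
    """A_v = gm1 / (gm3 + go3 + go4 + go1)."""
    return gm1 / (gm3 + go3 + go4 + go1)

# Hypothetical small-signal values (assumptions, not from the slides).
gm1 = 2e-3     # S
r_load = 1e3   # ohm
ro1 = 20e3     # ohm
gm3 = 0.5e-3   # S
go = 0.05e-3   # S, reused for go1, go3 and go4

print(f"resistive load: exact {gain_resistive_load(gm1, r_load, ro1):.2f}, "
      f"approx {gm1 * r_load:.2f}")
print(f"diode load:     exact {gain_diode_load(gm1, gm3, go, go, go):.2f}, "
      f"approx {gm1 / gm3:.2f}")
```

With these numbers the approximations overestimate the gain, and the gap widens as the output conductances grow relative to the load.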
RX Clocked Comparators

- Also called regenerative amplifier, sense-amplifier, flip-flop, latch
- Samples the continuous input at clock edges and resolves the differential to a binary 0 or 1

[J. Kim]
Important Comparator Characteristics

- Offset and hysteresis
- Sampling aperture, timing resolution, uncertainty window
- Regeneration gain, voltage sensitivity, metastability
- Random decision errors, input-referred noise
Dynamic Comparator Circuits

- To form a flip-flop
  - After strong-arm latch, cascade an R-S latch
  - After CML latch, cascade another CML latch
- Strong-Arm flip-flop has the advantage of no static power dissipation and full CMOS output levels
StrongARM Latch Operation
[J. Kim TCAS1 2009]

- 4 operating phases: reset, sampling, regeneration, and decision
• Sampling phase starts when clk goes high, \( t_0 \), and ends when PMOS transistors turn on, \( t_1 \)
• M1 pair discharges \( X/X' \)
• M2 pair discharges out+/-

\[
\begin{aligned}
\frac{v_{out}(s)}{v_{in}(s)} &= \frac{g_{m1}g_{m2}}{sC_{out}C_x \left( s + \frac{g_{m2}(C_{out} - C_x)}{C_{out}C_x} \right)} \\
&\approx \frac{g_{m1}g_{m2}}{s^2 C_{out}C_x} = \frac{1}{s^2 \tau_{s1} \tau_{s2}}, \\
\text{where } \tau_{s1} &= \frac{C_x}{g_{m1}}, \quad \tau_{s2} = \frac{C_{out}}{g_{m2}}
\end{aligned}
\]
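
As a rough time-domain reading of the double-integrator approximation, assume the differential input \( v_{in} \) is constant over the sampling phase (an assumption made here for illustration). The output then grows quadratically, and a sampling-phase gain, written here with the hypothetical symbol \( G_S \), follows from evaluating at \( t_1 \):

\[
v_{out}(t) \approx v_{in} \frac{(t - t_0)^2}{2 \tau_{s1} \tau_{s2}}, \qquad G_S = \frac{v_{out}(t_1)}{v_{in}} \approx \frac{(t_1 - t_0)^2}{2 \tau_{s1} \tau_{s2}}
\]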
Regeneration phase starts when PMOS transistors turn on, \( t_1 \), until decision time, \( t_2 \)

Assume M1 is in linear region and circuit no longer sensitive to \( v_{\text{in}} \)

Cross-coupled inverters amplify signals via positive-feedback:

\[
G_R = \exp\left(\frac{t_2 - t_1}{\tau_R}\right)
\]

\[
\tau_R = \frac{C_{out}}{(g_{m2,r} + g_{m3,r})}
\]
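
The exponential gain above implies a regeneration time that grows only logarithmically with the required gain. A minimal sketch, assuming a hypothetical \( \tau_R \) of 10 ps and a 1 mV signal at the start of regeneration (both values are illustrative, not from the slides):

```python
import math

def regeneration_gain(t_regen, tau_r):
    """G_R = exp((t2 - t1) / tau_R)."""
    return math.exp(t_regen / tau_r)

def time_to_resolve(v_start, v_target, tau_r):
    """Regeneration time needed to grow v_start to v_target."""
    return tau_r * math.log(v_target / v_start)

tau_r = 10e-12                                 # assumed regeneration time constant, s
t_regen = time_to_resolve(1e-3, 1.0, tau_r)    # 1 mV -> 1 V
print(f"regeneration time: {t_regen * 1e12:.1f} ps")
print(f"gain over that time: {regeneration_gain(t_regen, tau_r):.0f}")
```

Each additional factor of 10 in required gain costs about 2.3 \( \tau_R \) of regeneration time.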
StrongARM Latch Operation – Diff. Output

[J. Kim TCAS1 2009]
Conventional RS Latch

- RS latch holds output data during latch pre-charge phase.

- In a conventional RS latch, the rising output transitions first, followed by the falling output.
Optimized RS Latch

- Optimizing RS latch for symmetric pull-up and pull-down paths allows for considerable speed-up.

- During evaluation, large driver transistors are activated to change output data and the keeper path is disabled.

- During pre-charge, large driver transistors are tri-stated and small keeper cross-coupled inverter activated to hold data.

Figure panels: evaluation mode (clock high), driver branches active; hold/precharge mode (clock low), keeper branches active.

Delay Improvement w/ Optimized RS Latch

- Strong-Arm flip-flop delay improves by close to a factor of two

- Has better delay performance than other advanced flip-flop topologies
Sampler Analysis

- Sampler analysis provides insight into comparator operation

\[ v_{\text{sample}} = \int_{-\infty}^{\infty} v_{\text{in}}(\tau)h(\tau)d\tau \]

Switch can be modeled as a device which determines a weighted average over time of the input signal

- The weighting function is called the sampling function
Sampling Function Properties

- Sampling function should (ideally) integrate to 1
  \[ \int_{-\infty}^{\infty} h(\tau) d\tau = 1 \]

- Ideal sampling function is a delta function
  - Sampled value is only a function of exact sampling time

Ideal \( h(\tau) = \delta(\tau) \), taking the sampling instant as the time origin

\[ v_{sample} = \int_{-\infty}^{\infty} v_{in}(\tau) h(\tau) d\tau \]
Sampling Function Example

- Practical sampling function will weight the input signal near the nominal sampling time

\[ v_{sample} = \int_{-\infty}^{\infty} v_{in}(\tau) h(\tau) d\tau \]

Practical \( h(\tau) \)
Sampler Frequency Response

- Fourier transform of the sampling function yields the sampler frequency response
- Sampler bandwidth is a function of sample clock transition time

\[ h(\tau) \]

\[ F.T.\{h(-\tau)\} \]
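
To make the frequency-response view concrete, the sketch below numerically evaluates the Fourier transform of \( h(-\tau) \) for an assumed Gaussian sampling function with a 10 ps standard deviation. The Gaussian shape and its width are illustrative assumptions, not the measured sampling function of any circuit in these slides.

```python
import cmath
import math

def sampler_response(times, h, freq):
    """Fourier transform of h(-tau) at freq, by direct numerical integration.

    Integral of h(-tau) exp(-j 2 pi f tau) d tau
    equals the integral of h(u) exp(+j 2 pi f u) du.
    """
    dt = times[1] - times[0]
    return sum(v * cmath.exp(2j * math.pi * freq * t)
               for t, v in zip(times, h)) * dt

sigma = 10e-12  # assumed width of the sampling function, s
times = [(i - 800) * 0.1e-12 for i in range(1601)]  # -80 ps .. +80 ps
h = [math.exp(-0.5 * (t / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
     for t in times]

for f_ghz in (0.0, 5.0, 13.25, 30.0):
    mag = abs(sampler_response(times, h, f_ghz * 1e9))
    print(f"{f_ghz:6.2f} GHz: |H| = {mag:.4f}")
```

A narrower sampling function (faster sample clock transition) pushes the roll-off to higher frequency.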
Sampler Aperture Time

- Aperture time is defined as the width of the sampling function (SF) peak in which a certain percentage (80%) of the sensitivity is confined.

\[ w_{80} = t_{90} - t_{10} \]

\[ 0.1 = \int_{-\infty}^{t_{10}} h(\tau) d\tau \]

\[ 0.9 = \int_{-\infty}^{t_{90}} h(\tau) d\tau \]
Clocked Comparator LTV Model

- Comparator can be viewed as a noisy nonlinear filter followed by an ideal sampler and slicer (comparator).
- Small-signal comparator response can be modeled with an ISF \( \Gamma(\tau) = h(t_{obs}, \tau) \)

\[ v_k = v_o(kT + t_{obs}), \quad n_k = n_o(kT + t_{obs}) \]

[J. Kim]
Clocked Comparator ISF

- Comparator ISF is a subset of a time-varying impulse response \( h(t, \tau) \) for LTV systems:
  \[
y(t) = \int_{-\infty}^{\infty} h(t, \tau) \cdot x(\tau) d\tau
  \]

  - \( h(t, \tau) \): system response at \( t \) to a unit impulse arriving at \( \tau \)
  - For LTI systems, \( h(t, \tau) = h(t-\tau) \) (above integral is convolution)

- ISF \( \Gamma(\tau) = h(t_{obs}, \tau) \)
  - For comparators, \( t_{obs} \) is before (ideally when) decision is made
  - Output voltage of comparator
    \[
    v_o(t_{obs}) = \int_{-\infty}^{\infty} v_i(\tau) \cdot \Gamma(\tau) d\tau
    \]
  - Comparator decision
    \[
    D_k = \text{sgn}(v_k) = \text{sgn}(v_o(t_{obs} + kT)) = \text{sgn}\left(\int_{-\infty}^{\infty} v_i(\tau) \cdot \Gamma(\tau) d\tau\right)
    \]
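
A discrete-time sketch of the decision integral, assuming a rectangular 20 ps ISF and an input with a short glitch (both are made-up test data, and the function names are hypothetical). It shows how the ISF-weighted average, rather than the instantaneous input, sets the decision.

```python
def weighted_sample(v_in, isf, dt):
    """Discrete approximation of the integral of v_i(tau) * Gamma(tau) d tau."""
    return sum(v * g for v, g in zip(v_in, isf)) * dt

def decide(v_in, isf, dt):
    """Comparator decision sgn(v_o(t_obs)); an exact zero is mapped to +1 here."""
    return 1 if weighted_sample(v_in, isf, dt) >= 0 else -1

dt_ps = 1.0
times_ps = [float(t) for t in range(-20, 21)]

# Assumed ISF: rectangular, 20 ps wide, unit area.
isf = [1.0 / 20.0 if -10 <= t < 10 else 0.0 for t in times_ps]

# Assumed input: +50 mV with a 2 ps glitch down to -100 mV.
v_in_mv = [-100.0 if t in (0.0, 1.0) else 50.0 for t in times_ps]

v_obs_mv = weighted_sample(v_in_mv, isf, dt_ps)
print(f"v_o(t_obs) = {v_obs_mv:.1f} mV, decision = {decide(v_in_mv, isf, dt_ps)}")
```

An ideal delta-function sampler landing on the glitch would see -100 mV, while this finite-aperture comparator still resolves a positive value.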
Clocked Comparator ISF

- ISF is defined with respect to $t_{\text{obs}}$, or the decision time

- The comparator provides the most gain during the sampling phase

[J. Kim]
Clocked Comparator ISF

- ISF shows sampling aperture or timing resolution
- In frequency domain, it shows sampling gain and bandwidth

$$\text{ISF } \Gamma(\tau)$$

$$\text{F.T. } \{ \Gamma(-\tau) \}$$

[J. Kim]
Characterizing Comparator ISF

1. Find metastable \( V_{MS}(\tau) = V_{os}(t \to \infty, \tau) \) such that \( V(out+) = V(out-) \)

2. Measure \( V_{MS} \) for varying \( \tau \)

3. Derive ISF

\[
\text{SSF}_{\text{norm}}(\tau) = \frac{V_{MS}(\tau) - V_L}{V_H - V_L}
\]

\[
\text{ISF}_{\text{norm}}(\tau) = \frac{d}{d\tau} \text{SSF}_{\text{norm}}(\tau)
\]

[Jeeradit VLSI 2008]
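
Step 3 can be sketched as a normalization followed by a finite-difference derivative. The \( V_{MS} \) data below is synthetic (a linear ramp between assumed step levels \( V_L \) = 0 mV and \( V_H \) = 10 mV), standing in for measured values; the function name is hypothetical.

```python
def isf_from_vms(taus, v_ms, v_low, v_high):
    """Normalize V_MS(tau) to the SSF, then differentiate to get the ISF.

    Returns the midpoints of the tau grid and the ISF estimate at each midpoint.
    """
    ssf = [(v - v_low) / (v_high - v_low) for v in v_ms]
    mid_taus = []
    isf = []
    for i in range(len(taus) - 1):
        d_tau = taus[i + 1] - taus[i]
        mid_taus.append(0.5 * (taus[i] + taus[i + 1]))
        isf.append((ssf[i + 1] - ssf[i]) / d_tau)
    return mid_taus, isf

# Synthetic stand-in for measured data (assumption, not lab results).
taus_ps = [float(t) for t in range(0, 41)]
v_low_mv, v_high_mv = 0.0, 10.0
v_ms_mv = [v_low_mv + (v_high_mv - v_low_mv) * min(max((t - 10.0) / 20.0, 0.0), 1.0)
           for t in taus_ps]

mid_taus_ps, isf_per_ps = isf_from_vms(taus_ps, v_ms_mv, v_low_mv, v_high_mv)
area = sum(isf_per_ps) * 1.0  # 1 ps grid spacing
print(f"ISF area = {area:.3f}, peak = {max(isf_per_ps):.3f} per ps")
```

Because the SSF is normalized to run from 0 to 1, the derived ISF integrates to 1. Real measurements are noisy, so some smoothing before differentiation is usually needed.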
Comparator ISF Measurement Setup

**StrongARM Comparator**

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Sim w/o Channel</td>
<td>23</td>
<td>14.9</td>
<td>67.6</td>
</tr>
<tr>
<td>Sim w Channel</td>
<td>300</td>
<td>1.4</td>
<td>56.6</td>
</tr>
<tr>
<td>Lab</td>
<td>280</td>
<td>1.4</td>
<td>N/A</td>
</tr>
</tbody>
</table>

**CML Comparator**

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Sim w/o Channel</td>
<td>50</td>
<td>6.8</td>
<td>88.8</td>
</tr>
<tr>
<td>Sim w Channel</td>
<td>300</td>
<td>1.4</td>
<td>58.0</td>
</tr>
<tr>
<td>Lab</td>
<td>280</td>
<td>1.4</td>
<td>N/A</td>
</tr>
</tbody>
</table>

Note: the aperture time is defined as the width that contains 80% of the sensitivity similar to [1]

**Strong-Arm Latch**

**CML Latch**

[Jeeradit VLSI 2008]

[Toifl]
Comparison of SA & CML Comparator (1)

[Jeeradit VLSI 2008]

- CML latch has higher sampling gain with small input pair
- StrongARM latch has higher sampling bandwidth
  - For CML latch increasing input pair also directly increases output capacitance
  - For SA latch increasing input pair results in transconductance increasing faster than capacitance
Comparison of SA & CML Comparator (2)

- Sampling time of SA latch varies with VDD, while CML isn’t affected much

[Jeeradit VLSI 2008]
Low-Voltage SA – Schinkel ISSCC 2007

• Does require clk & clk_b
  • How sensitive is it to skew?

Advantages:
• Less stacking
• Wide tail for fast latching
• More isolation between input and output
• Small tail → input stage in weak inversion → less offset from latch

90nm CMOS simulations. $\Delta V_{\text{in}}=50\text{mV}$.
Circuits designed for equal offset $\sigma_{\text{os}}=10\text{mV}$ at $V_{\text{cm}}=1.1\text{V}$
Low-Voltage SA – Goll TCAS2 2009

• Similar stacking to conventional SA latch
• However, now P0 and P1 are initially on during evaluation which speeds up operation at lower voltages
• Does require clk & clk_b
  • How sensitive is it to skew?

Figure: delay of OUT-OUT [ns] versus comparator supply voltage $V_{Co}$ [V], for the conventional comparator and the comparator with modified latch.
Integrating RX & High-Frequency Noise

- A small aperture time is desired in most receiver samplers
- However, high-frequency noise can degrade performance at sampling time
  - Can be an issue in single-ended systems with excessive LdI/dt switching noise
- Integrating the input signal over a sampling interval reduces the high-frequency noise impact
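
One way to see the noise benefit: an ideal integrator that averages over a window \( T \) attenuates a sinusoidal disturbance at frequency \( f \) by \( |\sin(\pi f T)/(\pi f T)| \). The sketch below evaluates this for an assumed 100 ps window; the window length and noise frequencies are illustrative.

```python
import math

def integration_attenuation(f_noise, t_int):
    """Magnitude response of an ideal boxcar average of length t_int."""
    x = math.pi * f_noise * t_int
    if x == 0.0:
        return 1.0
    return abs(math.sin(x) / x)

t_int = 100e-12  # assumed integration window, s
for f_ghz in (1.0, 5.0, 10.0, 25.0):
    att = integration_attenuation(f_ghz * 1e9, t_int)
    print(f"{f_ghz:5.1f} GHz noise: relative amplitude {att:.3f}")
```

Low-frequency content passes nearly unchanged, while noise well above \( 1/T \) is strongly reduced, with nulls at multiples of \( 1/T \).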
Integrating Amplifier

- Differential input voltage converted to a differential current that is integrated on the sense nodes’ capacitance
Windowed Integration

- Windowing integration time can minimize transition noise and maximize integration of valid data

[Zerbe JSSC 2001]
RX Sensitivity

- RX sensitivity is a function of the input referred noise, offset, and minimum latch resolution voltage

\[ v_S^{pp} = 2 v_n^{rms} \sqrt{SNR} + v_{min} + v_{offset} \]

- Gaussian (unbounded) input referred noise comes from input amplifiers, comparators, and termination
  - A minimum signal-to-noise ratio (SNR) is required for a given bit-error-rate (BER)

    For BER = \( 10^{-12} \) (\( \sqrt{SNR} = 7 \))

- Minimum latch resolution voltage comes from hysteresis, finite regeneration gain, and bounded noise sources

  Typical \( v_{min} < 5mV \)

- Input offset is due to circuit mismatch (primarily \( V_{th} \) mismatch) & is most significant component if uncorrected
RX Sensitivity & Offset Correction

- RX sensitivity is a function of the input referred noise, offset, and min latch resolution voltage

\[
v_S^{pp} = 2v_n^{rms} \sqrt{SNR} + v_{min} + v_{offset}^*
\]

Typical values: \( v_n^{rms} = 1mV_{rms} \), \( v_{min} + v_{offset}^* < 6mV \)

For BER = \( 10^{-12} \) (\( \sqrt{SNR} = 7 \)) \( \Rightarrow v_S^{pp} = 20mV_{pp} \)

- Circuitry is required to reduce input offset from a potentially large uncorrected value (>50mV) to near 1mV
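
The sensitivity budget can be checked numerically. The sketch below assumes Gaussian noise, so BER = 0.5 erfc(\( \sqrt{SNR}/\sqrt{2} \)), and splits the 6 mV typical bound into 5 mV of \( v_{min} \) and 1 mV of corrected offset; that split and the function names are assumptions for illustration.

```python
import math

def ber_from_sqrt_snr(sqrt_snr):
    """BER for Gaussian noise with the decision threshold sqrt_snr sigmas away."""
    return 0.5 * math.erfc(sqrt_snr / math.sqrt(2.0))

def rx_sensitivity_mvpp(vn_rms_mv, sqrt_snr, v_min_mv, v_offset_mv):
    """v_S^pp = 2 * v_n^rms * sqrt(SNR) + v_min + v_offset, all in mV."""
    return 2.0 * vn_rms_mv * sqrt_snr + v_min_mv + v_offset_mv

print(f"BER at sqrt(SNR) = 7: {ber_from_sqrt_snr(7.0):.2e}")
print(f"corrected offset (1 mV):    {rx_sensitivity_mvpp(1.0, 7.0, 5.0, 1.0):.0f} mVpp")
print(f"uncorrected offset (50 mV): {rx_sensitivity_mvpp(1.0, 7.0, 5.0, 50.0):.0f} mVpp")
```

With an uncorrected 50 mV offset, the offset term dominates the budget, which is why correction circuitry pays off.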
Input Referred Offset

- The input referred offset is primarily a function of $V_{th}$ mismatch and a weaker function of $\beta$ (mobility) mismatch

$$\sigma_{V_t} = \frac{A_{V_t}}{\sqrt{WL}}, \quad \sigma_{\Delta\beta/\beta} = \frac{A_\beta}{\sqrt{WL}}$$

- To reduce input offset 2x, we need to increase area 4x
  - Not practical due to excessive area and power consumption
  - Offset correction necessary to efficiently achieve good sensitivity

- Ideally the offset “A” coefficients are given by the design kit and Monte Carlo is performed to extract offset sigma

- If not, here are some common values:
  - $A_{Vt} = 1mV\mu m$ per nm of $t_{ox}$
    - For our default 90nm technology, $t_{ox}=2.8nm \rightarrow A_{Vt} \sim 2.8mV\mu m$
  - $A_\beta$ is generally near 2% $\mu m$
Offset Correction Range & Resolution

- Generally circuits are designed to handle a minimum variation range of ±3σ for 99.7% yield
- Example: Input differential transistors W=4μm, L=150nm

\[ \sigma_{V_t} = \frac{A_{V_t}}{\sqrt{WL}} = \frac{2.8mV \mu m}{\sqrt{4 \mu m \cdot 150nm}} = 3.6mV \]

\[ \sigma_{\Delta \beta / \beta} = \frac{A_{\beta}}{\sqrt{WL}} = \frac{2\% \mu m}{\sqrt{4 \mu m \cdot 150nm}} = 2.6\% \]

- If we assume (optimistically) that the input offset is dominated by the input pair \( V_t \) mismatch alone, the offset correction circuitry needs a range of about ±11mV (±3σ)
- To cancel within 1mV, we need an offset cancellation resolution of 5 bits, resulting in a worst-case offset of

\[ 1\text{LSB} = \frac{\text{Offset Correction Range}}{2^{\text{Resolution}} - 1} = \frac{22mV}{2^5 - 1} = 0.71mV \]
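
The range and resolution arithmetic can be reproduced with a short script. It uses the example values above (W = 4μm, L = 150nm, \( A_{Vt} \) = 2.8mV·μm) and rounds the ±3σ range to the nearest millivolt; the function names are hypothetical.

```python
import math

def sigma_mismatch(a_coeff, w_um, l_um):
    """Pelgrom-style mismatch sigma: A / sqrt(W * L), with W and L in um."""
    return a_coeff / math.sqrt(w_um * l_um)

def bits_for_resolution(range_mv, max_lsb_mv):
    """Smallest bit count whose LSB, range / (2^bits - 1), meets the target."""
    bits = 1
    while range_mv / (2 ** bits - 1) > max_lsb_mv:
        bits += 1
    return bits

sigma_vt_mv = sigma_mismatch(2.8, 4.0, 0.15)   # A_Vt in mV*um
sigma_beta_pct = sigma_mismatch(2.0, 4.0, 0.15)  # A_beta in %*um
range_mv = round(2 * 3 * sigma_vt_mv)          # +/-3 sigma, rounded to 1 mV
bits = bits_for_resolution(range_mv, 1.0)
lsb_mv = range_mv / (2 ** bits - 1)

print(f"sigma_Vt = {sigma_vt_mv:.2f} mV, sigma_beta = {sigma_beta_pct:.2f} %")
print(f"range = {range_mv} mV, bits = {bits}, LSB = {lsb_mv:.2f} mV")
```

Four bits would leave an LSB above 1mV for this range, so five bits is the smallest resolution that meets the target.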
Current-Mode Offset Correction Example

- Differential current injected into input amplifier load to induce an input-referred offset that can cancel the inherent amplifier offset
  - Can be made with extended range to perform link margining

- Passing a constant amount of total offset current for all the offset settings allows for constant output common-mode level

- Offset correction performed both at input amplifier and in individual receiver segments of the 2-way interleaved architecture

\[ I_{\text{offset}} = I_{\text{offset}_n} + I_{\text{offset}_p} \]
Capacitive Offset Correction Example

- A capacitive imbalance in the sense-amplifier internal nodes induces an input-referred offset.
- Internal nodes are pre-charged to allow more integration time, which increases the offset range.
- Additional capacitance does increase sense-amp aperture time.
- Offset is trimmed by shorting inputs to a common-mode voltage and adjusting settings until an even distribution of “1”s and “0”s is observed.
- Offset correction settings can be sensitive to input common-mode.
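
The trimming procedure can be sketched as a behavioral simulation: with the inputs shorted, each correction code is tried and the one giving the most even mix of 1s and 0s is kept. The comparator model (a fixed offset plus Gaussian noise), the 22 mV range, 5-bit resolution, and all function names are assumptions for illustration, not the circuit from the slides.

```python
import random

def fraction_of_ones(residual_mv, noise_rms_mv, n_samples, rng):
    """Fraction of '1' decisions with shorted inputs and a given residual offset."""
    ones = sum(1 for _ in range(n_samples)
               if residual_mv + rng.gauss(0.0, noise_rms_mv) > 0.0)
    return ones / n_samples

def trim_offset(offset_mv, range_mv=22.0, bits=5, noise_rms_mv=1.0,
                n_samples=2000, seed=0):
    """Sweep all correction codes; keep the one closest to 50% ones."""
    rng = random.Random(seed)
    n_codes = 2 ** bits
    lsb_mv = range_mv / (n_codes - 1)
    best_code, best_error = None, None
    for code in range(n_codes):
        correction_mv = -range_mv / 2 + code * lsb_mv
        frac = fraction_of_ones(offset_mv + correction_mv,
                                noise_rms_mv, n_samples, rng)
        error = abs(frac - 0.5)
        if best_error is None or error < best_error:
            best_code, best_error = code, error
    residual_mv = offset_mv - range_mv / 2 + best_code * lsb_mv
    return best_code, residual_mv

code, residual_mv = trim_offset(7.3)  # hypothetical 7.3 mV comparator offset
print(f"selected code {code}, residual offset {residual_mv:+.2f} mV")
```

The input-referred noise is what makes this work: it dithers the decision so that the ratio of 1s to 0s varies smoothly with residual offset instead of switching abruptly.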
Demultiplexing RX

- Demultiplexing allows for lower clock frequency relative to data rate
- Gives extra regeneration and pre-charge time in comparators
- Need precise phase spacing, but not as sensitive to duty-cycle as TX multiplexing
1:4 Demultiplexing RX Example

- Increased demultiplexing allows for higher data rate at the cost of increased input or pre-amp load capacitance
- A higher demultiplexing factor is more sensitive to phase offsets (measured in degrees)
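
The phase sensitivity can be quantified under the assumption that a 1:N demultiplexing receiver uses N clock phases at 1/N of the bit rate, so one clock period spans N unit intervals (UI). A given phase error in degrees then consumes more of the bit period as N grows. The function names are hypothetical.

```python
def phase_spacing_deg(n_phases):
    """Nominal spacing between adjacent sampling clock phases."""
    return 360.0 / n_phases

def phase_error_ui(error_deg, demux_factor):
    """Timing error in UI, assuming the clock period equals demux_factor UI."""
    return error_deg / 360.0 * demux_factor

for n in (2, 4, 8):
    print(f"1:{n} demux: phase spacing {phase_spacing_deg(n):5.1f} deg, "
          f"1 deg error = {phase_error_ui(1.0, n):.4f} UI")
```

Doubling the demultiplexing factor doubles the timing error, in UI, caused by the same phase error in degrees.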
Next Time

• Equalization theory and circuits
  • Equalization overview
  • Equalization implementations
    • TX FIR
    • RX FIR
    • RX CTLE
    • RX DFE
  • Setting coefficients
  • Equalization effectiveness
  • Alternate/future approaches