#### ADAPTIVE DATA-TRANSITION DECISION FEEDBACK EQUALIZERS FOR HIGH-SPEED SERIAL LINKS

by

#### Yue Li

Master of Science, University of Macau, Macau, China, 2015

Bachelor of Science, University of Macau, Macau, China, 2012

> A dissertation presented to Ryerson University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the program of Electrical and Computer Engineering

Toronto, Ontario, Canada, 2020

© Yue Li, 2020

# AUTHOR'S DECLARATION FOR ELECTRONIC SUBMISSION OF A DISSERTATION

I hereby declare that I am the sole author of this dissertation. This is a true copy of the dissertation, including any required final revisions, as accepted by my examiners.

I authorize Ryerson University to lend this dissertation to other institutions or individuals for the purpose of scholarly research.

I further authorize Ryerson University to reproduce this dissertation by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.

I understand that my dissertation may be made electronically available to the public.

#### Abstract

Yue Li

Adaptive Data-Transition Decision Feedback Equalizers for High-Speed Serial Links

Doctor of Philosophy, Electrical and Computer Engineering

Ryerson University, 2020

This dissertation investigates adaptive decision feedback equalizers for high-speed serial data links.

An adaptive data-transition decision feedback equalizer (DT-DFE) was developed. The DT-DFE boosts the eye opening of the high-frequency components of data without attenuating their low-frequency counterparts. Its reference voltages were obtained in a training phase by transmitting consecutive 1s and 0s and measuring the output of the continuous-time linear equalizer with a pair of successive approximation register analog-to-digital converters. The DT-DFE uses loop unrolling to detect data transitions, activate tap tuning, launch the DFE, and meet timing constraints. Its performance and its advantages over the commonly used data-state DFE (DS-DFE) were validated using schematic-level simulation results of 5 Gbps backplane links.

A new adaptive DT-DFE with edge-emphasis (EE) taps and raised references was developed. Loop unrolling was extended to the DT-DFE with EE-taps. The reference voltages were raised beyond those set by the low-frequency components of data to increase the vertical eye opening. Clock and data recovery was performed using 4x oversampling. The DT-DFE was validated using schematic-level simulation results of 10 Gbps backplane links.

A pre-skewed bi-directional gated delay line (BDGDL) bang-bang frequency difference-to-digital converter and a BDGDL integrating frequency difference-to-digital converter (iFDDC) were proposed for clock and data recovery. Both frequency difference detectors feature all-digital realization, low power consumption, and high-speed operation. The built-in integration of iFDDC results in a zero static frequency error and the first-order noise-shaping of the quantization errors of the BDGDL and digitally-controlled oscillators. Their effectiveness was validated using schematic-level simulation results of 5-GHz frequency-locked loops.

All systems validating the proposed adaptive DFE and frequency-difference detectors were designed in TSMC's 65 nm CMOS technology and analyzed using Spectre from Cadence Design Systems.
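As a rough illustration of the training-phase measurement mentioned above, the following Python sketch models how an ideal successive approximation register ADC could digitize a settled equalizer output level. The 8-bit resolution follows the SAR ADCs named in the List of Figures; the function names, the 1 V full-scale range, and the ideal comparator and DAC are assumptions made only for this sketch, not the circuit described in the dissertation.

```python
def sar_convert(v_in, v_full_scale=1.0, n_bits=8):
    """Ideal SAR conversion: binary search from the MSB down to the LSB.

    v_full_scale and the ideal DAC/comparator are assumptions of this sketch.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)                      # tentatively set this bit
        v_dac = trial * v_full_scale / (1 << n_bits)   # ideal DAC output
        if v_in >= v_dac:                              # comparator decision
            code = trial                               # keep the bit
    return code


def code_to_voltage(code, v_full_scale=1.0, n_bits=8):
    """Map a digital code back to the voltage it represents."""
    return code * v_full_scale / (1 << n_bits)


if __name__ == "__main__":
    # Hypothetical settled level observed while consecutive 1s are sent.
    level = 0.3
    code = sar_convert(level)
    print(code, code_to_voltage(code))
```

In this idealized model the recovered voltage differs from the input by less than one least-significant bit, which is the resolution limit of any reference voltage obtained this way.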

## Acknowledgements

I would first like to express my deepest gratitude to my supervisor, Professor Fei Yuan, for his guidance, support, and kindness through every stage of my doctoral study. I have been extremely lucky to have the opportunity to be his student and assistant. I remember the countless hours we spent discussing in his office when I first started this work. Whatever questions and requests I brought, he showed nothing but technical expertise and great personality. I have learned numerous things from Fei, and without his help this work would not have been possible.

I would also like to thank Professor Andy G. Ye, Professor Gul N. Khan, and Professor Vadim Geurkov for being the committee members of my candidacy exam and thesis defense. I would like to thank Professor Guangjun Liu from Aerospace Engineering, Ryerson University, and Professor Wai Tung Ng from the University of Toronto for kindly joining the defense committee. Thanks to all the committee members for reviewing my thesis and providing constructive comments.

I would like to thank all my colleagues and friends, Young Jun Park, Durand Jarrett-Amor, Matthew Dolan, Jean-Claude Clarke, Parth Patel, and Rashed Siddiqui, from the Integrated Circuits and Systems (ICS) Group for their kind help and wonderful friendship. I would like to thank Ryerson University and the Department of Electrical, Computer and Biomedical Engineering for funding my research and study. I also want to thank all the lovely students in the courses I helped teach for their cooperation and appreciation; they made conducting the lab sessions pleasant and rewarding.

I would especially like to thank my parents for encouraging and supporting me to pursue doctoral studies overseas. I thank them for their unconditional love and belief in me. Finally, I would like to thank my wife, Xiaoshi Zhang. I did not feel any frustration or disappointment in the last four years, thanks to the support of my motivated and brilliant partner. I thank her for bringing delight to every single second of my life, and I want to dedicate this thesis to her.

# Table of Contents

- Abstract
- List of Figures
- List of Tables
- List of Abbreviations
- 1 Introduction
  - 1.1 Serial Links
  - 1.2 Wire Channels
  - 1.3 Channel Equalization
  - 1.4 Motivation
  - 1.5 Objectives
  - 1.6 Thesis Organization
  - 1.7 Contributions
  - 1.8 Summary
- 2 Data-Transition DFE
  - 2.1 DS-DFE Limitations
  - 2.2 DT-DFE
    - 2.2.1 Architecture
    - 2.2.2 Data-Transition Unit
    - 2.2.3 Max/Min Vertical Eye-Openings
    - 2.2.4 Sign<sup>3</sup>-LMS
    - 2.2.5 Tap Addition/Subtraction
  - 2.3 Discussions
    - 2.3.1 Tap Tuning
    - 2.3.2 DFE Activation
    - 2.3.3 DS-DFE versus DT-DFE
    - 2.3.4 Error Propagation
    - 2.3.5 Loop-Unrolling
    - 2.3.6 Hardware Cost
  - 2.4 Simulation Results
  - 2.5 Summary
- 3 Edge-Emphasis Data-Transition DFE with Raised References in Oversampling CDR
  - 3.1 Common drawbacks of DS-DFE and DT-DFE
  - 3.2 Edge-Emphasis Taps
  - 3.3 DT-DFE with EE-taps
    - 3.3.1 Architecture
    - 3.3.2 Loop unrolling for $S_n$
    - 3.3.3 Loop unrolling for DT-DFE with EE-tap
    - 3.3.4 Raised reference voltages
    - 3.3.5 Determination of EE-tap
  - 3.4 Clock and Data Recovery
  - 3.5 Simulation Results
    - 3.5.1 5 Gbps Serial Link
    - 3.5.2 10 Gbps Serial Link with 4x-oversampling CDR
  - 3.6 Summary
- 4 All-Digital Frequency-Locking using Bi-Directional Gated Delay Line Frequency Difference-to-Digital Converters
  - 4.1 Frequency Difference Detection
  - 4.2 BDGDL Bang-Bang Frequency Difference Detector
    - 4.2.1 BDGDL
    - 4.2.2 Pre-Skewed BDGDL
    - 4.2.3 Frequency Detection
    - 4.2.4 Frequency Error
    - 4.2.5 Simulation Results
  - 4.3 Integrating Frequency Difference-to-Digital Converter
    - 4.3.1 iFDDC
    - 4.3.2 All-Digital FLL with iFDDC
    - 4.3.3 Simulation Results
  - 4.4 Summary
- 5 Conclusions and Future Work
  - 5.1 Conclusions
  - 5.2 Future Work
- Bibliography

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | Dependence of data rate on the minimum channel length of MOS transistors. |
| 1.2 | A typical structure of serial links. |
| 1.3 | A backplane example. |
| 1.4 | Skin effect. |
| 1.5 | Channel loss. Left: Time-domain response. Right: Frequency-domain response. |
| 1.6 | Feed-forward equalizer. Top: Block diagram. Bottom: Spectral behavior ($f_{DR}$: data rate frequency). |
| 1.7 | CTLE. Left: Schematic. Right: Spectral behavior. |
| 1.8 | DFE block diagrams. Top: Regular. Bottom: Loop-unrolling. |
| 1.9 | Comparison between FFE and DFE. |
| 1.10 | DFE with $L$ taps. Left: Conventional DS-DFE. Right: IIR-DFE. |
| 1.11 | Equalization with different types of DFEs. Left: Conventional DS-DFE. Right: IIR-DFE. |
| 2.1 | Data-state DFE. |
| 2.2 | Architecture of sign<sup>3</sup>-LMS DT-DFE (highlighted part). |
| 2.3 | Data transition unit. |
| 2.4 | Data transition signal $s_n$. If data=0, since the tap increases $v_A$ in path A and decreases $v_B$ in path B, the output of slicer A is selected while that of slicer B is discarded. If data=1, since the tap decreases $v_A$ in path A and increases $v_B$ in path B, the output of slicer A is discarded while that of slicer B is selected. |
| 2.5 | DT-DFE with sign<sup>3</sup>-LMS. The output of the charge pumps is fed to a loop filter to smooth tap tuning. |
| 2.6 | Tap addition/subtraction. |
| 2.7 | DFE activation schemes. |
| 2.8 | Simulated eye diagram equalized using DS-DFE with one tap. Tap strength: 0.7 mA, 1.0 mA, and 2.0 mA. Data rate: 5 Gbps. Channel characteristics: depicted in Fig. 1.5. |
| 2.9 | Simulated eye diagram equalized using DT-DFE with one constant tap. Tap strength: 0.7 mA, 1.0 mA, and 2.0 mA. Data rate: 5 Gbps. Channel characteristics: depicted in Fig. 1.5. |
| 2.10 | STF of both DFEs with different normalized taps. |
| 2.11 | ETF of both DFEs with different normalized taps. |
| 2.12 | Error propagation. (a) Tap tuning perspective. (b) DFE perspective. Error bits are highlighted. |
| 2.13 | Adaptation process of taps $v_A^+$ and $v_B^+$ of proposed DT-DFE. |
| 2.14 | Left: Waveform of $v_A^+$. Right: Waveform of $v_B^+$. |
| 2.15 | Voltages of paths A, B, C, and equalized data (solid line). |
| 2.16 | Eye diagrams. Left: DS-DFE. Right: DT-DFE. |
| 3.1 | Comparison of DS-DFE, DT-DFE with constant taps, DT-DFE with EE-taps, and DT-DFE with EE-taps and raised references. |
| 3.2 | Top: Voltages of unequalized data. Bottom: Currents of EE-taps. (a) EE-taps for 010 transitions. (b) EE-taps for 101 transitions. |
| 3.3 | Schematic of edge-emphasis taps for 010 transitions. The polarity in each tap indicates the polarity of the tap. |
| 3.4 | 4-stage digitally controlled differential ring oscillator. |
| 3.5 | Gating signals for edge-emphasis taps. |
| 3.6 | Implementation of tap 1. |
| 3.7 | Implementation of taps 2 and 3. The implementation of the current sources is detailed in Fig. 3.6. |
| 3.8 | DT-DFE with an EE-tap when a data transition is present. When no data transition is present, DT-DFE with a constant tap is performed. |
| 3.9 | Selection of equalized data. The polarity box on the waveform indicates the polarity of the signal. The check box at the top-left corner of each sub-figure indicates whether the output of the slicer is selected ($\checkmark$) or discarded ($\times$). |
| 3.10 | Detection of the occurrence of $1 \rightarrow 0$ and $0 \rightarrow 1$ data transitions. |
| 3.11 | Determination of reference voltages $V_{ref+}$ and $V_{ref-}$ using 8-bit SAR ADCs. |
| 3.12 | Circuit that implements S<sup>3</sup>-LMS algorithms to obtain $c_{DT+}$ and $c_{DT-}$. |
| 3.13 | Circuit that implements S<sup>4</sup>-LMS algorithms to obtain $c_{EE}$. |
| 3.14 | Top-left: Edge generator. Top-right: Edge accumulator. Bottom: Phase mapping between the phase of incoming data and that of EE-tap gating signals. Legends: $0^\circ, 45^\circ, \ldots$: the phase of incoming data. $\hat{0}^{\circ}, \hat{45}^{\circ}, \ldots$: the phase of EE-tap gating signals. |
| 3.15 | Time and frequency responses of the backplane channel. The input for the time-domain response is a 200 mV pulse of 100 ps pulse width measured at the near end of the channel. |
| 3.16 | DT-DFE with an EE-tap. The EE-tap consists of 7 taps of different strength. |
| 3.17 | Comparison of DS-DFE, DT-DFE with a constant tap, and DT-DFE with an EE-tap. The light-color traces in (a,b,c) are unequalized data. |
| 3.18 | Eye diagrams of unequalized and equalized data with DS-DFE, DT-DFE with a constant tap, and DT-DFE with an EE-tap. |
| 3.19 | Eye diagrams of unequalized data and equalized data with 3 different settings. |
| 3.20 | Output voltage of the charge pumps tuning $c_{DT+}$, $c_{DT-}$, and $c_{EE}$. Top: Adaptation without raised references. Bottom: Adaptation with 20 mV raised references. |
| 3.21 | Waveforms of transmitted, received, and equalized data. |
| 3.22 | Waveforms of transmitted, received, and recovered data. |
| 4.1 | Left: Bi-directional gated delay cell (BDGDC). Right: Bi-directional gated delay line. Legends: RS: right-shift. LS: left-shift. |
| 4.2 | Left: Pre-skewing in a delay line. Right: Pre-skewed BDGDL. Pre-skewed inverters are highlighted. |
| 4.3 | Dependence of per-stage-delay (left) and power (right) on the number of pre-skewed inverters. |
| 4.4 | Schematic of BDGDL frequency detector. |
| 4.5 | Timing diagram of BDGDL frequency detector. |
| 4.6 | Frequency relationship implied by per-stage pulse width detection results. |
| 4.7 | Schematic of a K-bit redundant successive approximation register. |
| 4.8 | Schematic of the DCO and varactor. |
| 4.9 | Block diagram of proposed FLL. |
| 4.10 | Frequency detector dead zone reflects on FD and DCO transfer characteristics, and the maximum locking error under different circumstances. |
| 4.11 | Simulation results of frequency locking process. |
| 4.12 | Locked frequency with intentionally added phase shift. |
| 4.13 | Breakdown of power consumption by key blocks. |
| 4.14 | Timing diagram of REF and DCO. |
| 4.15 | Schematic of pulse width extractor. |
| 4.16 | Timing diagram of gating signal generation and FCW adaptation. |
| 4.17 | Bi-directional gated delay line with read-out DFFs. |
| 4.18 | (a) Frequency-locked loop with an iFDDC. (b) Block diagram of the core of frequency-locked loop with an iFDDC. |
| 4.19 | Quantization error of BDGDL (a) and DCO (b). The center frequency of the DCO is set to $f^* = 1/T^*$ at which FCW=0011 with 16 0s and 16 1s. |
| 4.20 | Locking process of ADFLL in nominal process conditions (TT) and process corners (FF, FS, SF, and SS). |
| 4.21 | Locking process of ADFLL at process corners with tuned DCO. |
| 4.22 | Impact of per-stage-delay of BDGDL on normalized steady state frequency error of ADFLL. |
| 4.23 | Impact of per-stage-delay of BDGDL on locking process of ADFLL. |

# List of Tables

| 2.1 | Data transition                                        | 32 |
|-----|--------------------------------------------------------|----|
| 2.2 | Hardware comparison of DS-DFE and DT-DFE               | 36 |
| 3.1 | Performance comparison of DFEs with different settings | 63 |
| 3.2 | Performance comparison normalized to CTLE output       | 64 |
| 3.3 | Performance comparison of DT-DFE with EE-tap           | 66 |
| 3.4 | Performance comparison normalized to CTLE output       | 66 |

# List of Abbreviations

| ADC    | Analog-to-digital converter                           |
|--------|-------------------------------------------------------|
| ADFLL  | All-digital frequency-locked loop                     |
| BDGDL  | Bi-directional gated delay line                       |
| BOS    | Blind oversampling                                    |
| CDR    | Clock and data recovery                               |
| CP     | Charge pump                                           |
| CTLE   | Continuous-time linear equalizer                      |
| DAC    | Digital-to-analog converter                           |
| DCO    | Digitally controlled oscillator                       |
| DFE    | Decision feedback equalizer                           |
| DS-DFE | Data-state DFE                                        |
| DRV    | Driver                                                |
| DT-DFE | Data-transition DFE                                   |
| DTU    | Data-transition unit                                  |
| EE     | Edge emphasis                                         |
| FCW    | Frequency control word                                |
| FFE    | Feedforward equalizer                                 |
| FIR    | Finite impulse response                               |
| FVC    | Frequency-to-voltage converter                        |
| IC     | Integrated circuit                                    |
| iFDDC  | Integrating frequency difference-to-digital converter |
| IIR    | Infinite impulse response                             |
| ISI    | Inter-symbol interference                             |
| LMS    | Least mean square                                     |
| PCB    | Printed circuit board                                 |
| PLL    | Phase-locked loop                                     |
| PRBS   | Pseudo-random bit sequence                            |
| RX     | Receiver                                              |
| SAR    | Successive approximation register                     |
| SBR    | Single bit response                                   |
| TDC    | Time-to-digital converter                             |
| TX     | Transmitter                                           |
| UI     | Unit interval                                         |

### Chapter 1

### Introduction

This chapter examines the characteristics of wire channels and existing equalization techniques. Section 1.1 gives an overview of Gb/s serial links and the challenges encountered in their design. The characteristics of wire channels are studied briefly in Section 1.2. The three most widely used channel equalization techniques for combating the imperfections of wire channels are introduced in Section 1.3, with an emphasis on decision feedback equalization (DFE). Section 1.4 presents the motivation of this research and outlines why the research focuses on DFE. Section 1.5 highlights the objectives of the research work. The organization of this thesis is detailed in Section 1.6. Research publications originating from this research are given in Section 1.7. The chapter is concluded in Section 1.8.

#### 1.1 Serial Links

The explosive growth of data processed by integrated circuits (ICs) demands that data be transmitted over wire channels (interconnects, vias, connectors, package pins, printed circuit boards (PCBs), and coaxial cables) at multiple gigabits per second (Gbps). Although increasing the number of wire channels directly improves the total data bandwidth, a large number of parallel channels not only increases the cost of routing; the overall data rate is also affected by clock and data skews caused by the mismatch of the channels. As a result, parallel links are only attractive for short-range data communications such as multi-processor systems, processor-to-memory interfaces, and network switches. Unlike parallel links, serial links transmit data and clock using a single wire channel, typically a pair of differential wires to minimize electromagnetic interference with neighboring devices. The elimination of a dedicated channel for clock transmission removes the difficulties associated with clock skew. The use of only a single wire channel also eliminates the bottleneck associated with data skew. Moreover, it greatly reduces the cost associated with routing. As a result, serial links are attractive in applications such as block-to-block (on-chip), chip-to-chip, chassis-to-chassis, and computer-to-computer links where the distance over which data are transmitted is large and the number of channels available is small.

Although the maximum transit frequency of MOS transistors has well exceeded 100 GHz, the data rate of serial links is much lower, as evident in Fig. 1.1. The low data rate is mainly due to inter-symbol interference (ISI) arising from channel imperfections. In wire channels, ISI is mainly caused by skin effect and dielectric loss, which lead to frequency-dependent attenuation, or dispersion. Pulse widening caused by reflections and interference from crosstalk also introduce additional ISI [1, 2, 3, 4, 5]. ISI manifests itself as pre-cursors and post-cursors, with the latter typically dominant [6, 7].

Figure 1.1: Dependence of data rate on the minimum channel length of MOS transistors.

#### 1.2 Wire Channels



Figure 1.2: A typical structure of serial links.

Fig. 1.2 shows the structure of a typical serial link, which consists of a transmitter (TX), a wire channel, and a receiver (RX). A feed-forward equalizer (FFE), also known as pre-emphasis, resides in the transmitter, while a continuous-time linear equalizer (CTLE) and a decision feedback equalizer (DFE) reside in the receiver. The multiplexer (MUX) in the TX multiplexes low-rate data on the parallel buses into a high-rate data stream and feeds it to the channel. The RX recovers the digital bits from the received data stream with a clock and data recovery (CDR) block and a slicer. The demultiplexer (DMUX) converts the received serial data stream back to low-speed parallel data. The equalizers combat the impact of channel imperfections, while the CDR block adjusts the RX clock frequency and phase to guarantee correct sampling by the slicer.



Figure 1.3: A backplane example.

Fig. 1.3 is a typical example of a backplane data communication environment. Data and clock are transmitted simultaneously via the same wire channel with timing information embedded in data as transitions. At Gb/s data rates, channels suffer from two major frequency-dependent losses.



Figure 1.4: Skin Effect.

The first loss mechanism is resistive loss, which is induced by the skin effect of the conductor. At low frequencies, the current flows evenly through the cross section of the conductor. At high frequencies, due to the rapid alternation of the current direction either inside the wire or in an adjacent wire, the current tends to flow only near the conductor surface because of the induced magnetic field, as illustrated in Fig. 1.4.

Skin depth, which is the depth at which the current density decays to $1/e$ of that at the surface, is given by

$$\delta = \sqrt{\frac{1}{\sigma \pi f \mu}},\tag{1.1}$$

where $\delta$ is the skin depth, $f$ is the frequency, $\mu$ is the permeability, and $\sigma$ is the conductivity of the material. Since the skin depth is inversely proportional to $\sqrt{f}$, the resistive channel loss increases proportionally with $\sqrt{f}$.
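As a rough numerical illustration of Eq. (1.1), the sketch below evaluates the skin depth at a few frequencies. The copper conductivity and the use of the free-space permeability are assumptions made for this example only and are not taken from the dissertation.

```python
import math

MU_0 = 4 * math.pi * 1e-7   # permeability of free space in H/m (assumed for the conductor)
SIGMA_CU = 5.8e7            # assumed conductivity of copper in S/m

def skin_depth(f_hz, sigma=SIGMA_CU, mu=MU_0):
    """Skin depth in metres, evaluated from Eq. (1.1)."""
    return math.sqrt(1.0 / (sigma * math.pi * f_hz * mu))

for f in (1e8, 1e9, 5e9, 1e10):
    print(f"{f / 1e9:5.1f} GHz: skin depth = {skin_depth(f) * 1e6:.2f} um")
```

Under these assumptions the skin depth is roughly 2 um at 1 GHz, and quadrupling the frequency halves it, which is the $1/\sqrt{f}$ dependence stated above.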



Figure 1.5: Channel loss. Left: time-domain response. Right: frequency-domain response.

The second channel loss mechanism is dielectric loss. Dielectric dissipation is induced by the polarization of the molecules in dielectric materials and is illustrated in Fig. 1.4. When an alternating electric field is applied, molecules in the dielectric material rotate so as to align with either the external or the internal electric field. The molecules rub against each other during this movement and convert some of the electric energy into heat. The loss is directly related to the speed at which the field direction rotates. Dielectric loss $L_D$ can be expressed as

$$L_D = \frac{4.34}{c}\, 2\pi f \cdot \tan(\sigma) \sqrt{\varepsilon_{r,eff}},\tag{1.2}$$

where $\tan(\sigma)$ is the loss tangent of the dielectric material. Equation (1.2) shows that $L_D \propto f$, so dielectric loss surpasses resistive loss at high frequencies. Both of these channel losses increase with frequency. Channel loss normally falls in the range of 20 to 30 dB at the baud rate of 10+ Gbps serial links [8, 9, 10]. Fig. 1.5 shows the response of a lossy channel in both the time and frequency domains. Received data lose more energy at high frequencies and, in the time domain, leak energy to adjacent symbols, giving rise to ISI. Among all the channel impairments, ISI caused by bandwidth limitation dominates in high data-rate applications. Therefore, limited bandwidth is the main impairment that channel equalization techniques are developed to combat.
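To make the linear frequency dependence of Eq. (1.2) concrete, the following sketch evaluates the expression for an illustrative dielectric. It assumes that $c$ is the speed of light in m/s, so that the result is in dB per metre; the loss tangent of 0.02 and effective permittivity of 4.0 are hypothetical values chosen for the example.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s (assumed meaning of c in Eq. (1.2))

def dielectric_loss_db_per_m(f_hz, loss_tangent, eps_r_eff):
    """Dielectric loss from Eq. (1.2), interpreted here as dB per metre."""
    return 4.34 / C_LIGHT * 2 * math.pi * f_hz * loss_tangent * math.sqrt(eps_r_eff)

# Hypothetical board material: loss tangent 0.02, effective permittivity 4.0
for f in (1e9, 5e9, 1e10):
    loss = dielectric_loss_db_per_m(f, loss_tangent=0.02, eps_r_eff=4.0)
    print(f"{f / 1e9:5.1f} GHz: {loss:.1f} dB/m")
```

With these numbers the loss is about 18 dB/m at 5 GHz and doubles at 10 GHz, in contrast to the $\sqrt{f}$ growth of the resistive loss.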

#### 1.3 Channel Equalization

ISI manifests itself as pre-cursors and post-cursors and can be minimized via channel equalization at either the near end or the far end of the channel. Pre-cursors and post-cursors can be removed by either boosting the high-frequency components [6, 7] or attenuating the low-frequency components of data symbols prior to their transmission [11, 12]. This is known as pre-emphasis. The former increases crosstalk, as crosstalk intensifies at high frequencies, whereas the latter reduces the power of the transmitted symbols, as the power of non-return-to-zero data is largely concentrated at half baud-rate frequency. Near-end channel equalization is often implemented using finite impulse response (FIR) filters that introduce zeros to offset the effect of the poles of the channel, as shown in Fig. 1.6 [13, 6, 7]. By adjusting the tap weights, a relatively flat equalized channel response can be obtained. Since the characteristics of the channel are not known prior to data transmission, the optimal tap coefficients of pre-emphasis FIR filters can only be obtained if a back channel exists. This constraint severely undermines the robustness of pre-emphasis channel equalization. Another limitation of FFE is that the equalizer attenuates the low-frequency components of data to achieve channel equalization instead of amplifying the high-frequency components. The purpose is to minimize crosstalk, but data equalized with FFE experience a loss of signal power as a result. The last limitation of pre-emphasis channel equalization is its inability to remove ISI caused by reflection and crosstalk, as this ISI manifests itself as crests and troughs rather than uniformly sloped attenuation. ISI caused by reflection and crosstalk is typically significant when channels contain multiple vias, connectors, and branches (highly reflective channels).
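The low-frequency attenuation of an FIR pre-emphasis filter can be seen in a minimal sketch such as the one below. The two tap weights are hypothetical, chosen so that their absolute values sum to one (a common peak-swing constraint that is assumed here, not stated in the text), and the frequency is normalized to the data-rate frequency $f_{DR}$.

```python
import cmath
import math

def fir_gain_db(taps, f_norm):
    """Magnitude in dB of a symbol-spaced FIR filter at f_norm = f / f_DR."""
    h = sum(w * cmath.exp(-2j * math.pi * f_norm * k) for k, w in enumerate(taps))
    return 20 * math.log10(abs(h))

# Hypothetical main-cursor and post-cursor tap weights, |w0| + |w1| = 1
taps = [0.75, -0.25]

dc_gain = fir_gain_db(taps, 0.0)       # gain seen by long runs of identical bits
nyquist_gain = fir_gain_db(taps, 0.5)  # gain at half the data-rate frequency

print(f"gain at DC:       {dc_gain:6.2f} dB")
print(f"gain at f_DR / 2: {nyquist_gain:6.2f} dB")
```

In this sketch the filter leaves the component at $f_{DR}/2$ untouched and attenuates the low-frequency content by about 6 dB, so the flattening of the overall response is paid for with reduced signal power.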



Figure 1.6: Feed-forward equalizer. Top: Block diagram. Bottom: Spectral behavior ($f_{DR}$: data-rate frequency).

Far-end channel equalization, also known as post-equalization, combats ISI by either amplifying the high-frequency components of received data symbols in the analog domain or removing post-cursors in the digital domain prior to clock and data recovery (CDR). Compared with near-end equalization, post-equalization offers the ability to combat ISI caused by reflection and crosstalk, and is therefore the most widely used and effective channel equalization technique. Post-equalization can be performed using a linear equalizer, a nonlinear equalizer, or both. Linear post-equalization boosts the high-frequency components of received symbols with a continuous-time linear equalizer (CTLE), as shown in Fig. 1.7 [12, 14]. The CTLE provides zeros to cancel out the poles of the channel so that the equalized channel exhibits an all-pass transfer characteristic. The RC load and source degeneration circuit provides one zero and two poles, which are given by

$$\omega_z = \frac{1}{R_S C_S},\tag{1.3a}$$

$$\omega_{p1} = \frac{1}{R_L C_L},\tag{1.3b}$$

$$\omega_{p2} = \frac{1 + 0.5(g_m + g_{mb})R_S}{R_S C_S}.\tag{1.3c}$$



Figure 1.7: CTLE. Left: Schematic. Right: Spectral behavior

Since the maximum rising slope of this single-zero CTLE is +20 dB/dec, the product of the peaking gain and the second pole (bandwidth), which denotes the peaking factor, satisfies

$$A_0 \frac{\omega_{p1}}{\omega_z} \omega_{p2} \approx \frac{g_m}{C_L}.\tag{1.4}$$

This means that the performance of the CTLE in terms of peaking gain and bandwidth is limited by the cut-off frequency of the technology. Also, due to the high bandwidth and linearity requirements, the CTLE tends to be power hungry. In addition, the difficulty of designing a CTLE with higher-order zero peaking limits its development. In current high-speed serial links, the CTLE serves as a standard front end of the RX merely to guarantee the minimal vertical eye-opening of received data, leaving the rest of the work to decision feedback equalization (DFE).
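The zero and pole locations of Eqs. (1.3a) to (1.3c) can be evaluated with a short sketch. All component values below are hypothetical and serve only to show how the ratio $\omega_{p2}/\omega_z = 1 + 0.5(g_m + g_{mb})R_S$, which follows directly from (1.3a) and (1.3c), sets the span over which the +20 dB/dec slope can act.

```python
import math

def ctle_corners(r_s, c_s, r_l, c_l, g_m, g_mb=0.0):
    """Zero and pole angular frequencies (rad/s) from Eqs. (1.3a)-(1.3c)."""
    w_z = 1.0 / (r_s * c_s)
    w_p1 = 1.0 / (r_l * c_l)
    w_p2 = (1.0 + 0.5 * (g_m + g_mb) * r_s) / (r_s * c_s)
    return w_z, w_p1, w_p2

# Hypothetical values: R_S = 400 ohm, C_S = 200 fF, R_L = 200 ohm, C_L = 50 fF, g_m = 10 mS
w_z, w_p1, w_p2 = ctle_corners(r_s=400.0, c_s=200e-15, r_l=200.0, c_l=50e-15, g_m=10e-3)

for name, w in (("zero", w_z), ("pole p2", w_p2), ("pole p1", w_p1)):
    print(f"{name:8s}: {w / (2 * math.pi) / 1e9:6.2f} GHz")
print(f"w_p2 / w_z = {w_p2 / w_z:.2f} ({20 * math.log10(w_p2 / w_z):.1f} dB)")
```

For these values the zero sits near 2 GHz and the source-degeneration pole near 6 GHz, a ratio of 3 (about 9.5 dB). Raising the ratio requires a larger $g_m R_S$ product, which ties the achievable boost to the transistor and the power budget.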



Figure 1.8: DFE block diagrams. Top: Regular. Bottom: Loop-unrolling.

Unlike CTLE, nonlinear post-equalization mitigates the effect of channel loss, reflection, and crosstalk by removing the post-cursors of the received symbol in the digital domain. The most widely used nonlinear equalizer is the DFE. A single-tap DFE is shown in Fig. 1.8. It directly removes the post-cursor ISI of the previous symbol from the current symbol. Another way to view it is that the DFE adjusts the slicer threshold depending on the previous data state. Since DFE does not amplify received data symbols, it does not aggravate crosstalk with neighboring devices. Moreover, since the number, the weight, and the order of the taps of DFE can be adjusted in accordance with the characteristics of the channel to be equalized, DFE is most effective not only in eliminating ISI caused by the finite bandwidth of the channel but also in eliminating ISI caused by reflection and crosstalk. Since DFE only utilizes past decisions, it has no effect on pre-cursors.
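A minimal behavioral sketch of a single-tap DFE is given below. The channel model (a main cursor of 1.0 and a single post-cursor of 0.4), the bit pattern, and the function names are assumptions made for illustration; the tap is simply set equal to the post-cursor rather than adapted.

```python
def sgn(x):
    return 1 if x >= 0 else -1

def channel(bits, h0=1.0, h1=0.4):
    """Hypothetical channel: main cursor h0 plus one post-cursor h1."""
    out, prev = [], 0
    for d in bits:
        out.append(h0 * d + h1 * prev)
        prev = d
    return out

def dfe_1tap(x, c):
    """Subtract the estimated post-cursor of the previous decision, then slice."""
    y, decisions, prev = [], [], 0
    for xn in x:
        yn = xn - c * prev
        prev = sgn(yn)
        y.append(yn)
        decisions.append(prev)
    return y, decisions

bits = [1, -1, 1, 1, -1, -1, -1, 1, -1, 1]
x = channel(bits)
y, decisions = dfe_1tap(x, c=0.4)

print("received :", [round(v, 2) for v in x])
print("equalized:", [round(v, 2) for v in y])
```

In this sketch the samples that follow a transition grow from a magnitude of 0.6 to 1.0, while the samples that follow an identical bit shrink from 1.4 to 1.0.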

In a regular DFE, the multiplication and addition/subtraction need to be completed in one unit interval (UI), which is a significant challenge at high data rates. To relax this timing constraint, DFE with a loop-unrolling structure was developed by moving the multiplication and addition/subtraction out of the feedback loop [15, 16]. The final decision is selected from two slicers according to the previously recovered data [17, 18, 19, 20]. With loop unrolling, the one-UI timing constraint applies only to the slicer. The complexity of the loop-unrolling DFE increases exponentially with the number of loop-unrolled taps. Loop-unrolling has been used for the first 2 taps in [21] to cope with a 28 Gbps data rate and for the first 3 taps in [22, 23] to cope with a 30 Gbps data rate. Fig. 1.9 plots the transfer characteristics of FFE and DFE. It shows that DFE boosts the high-frequency components to achieve channel equalization, whereas FFE attenuates the low-frequency components. To sum up, DFE has more potential for performance improvement because it is a post-channel digital equalizer.

Figure 1.9: Comparison between FFE and DFE.
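The equivalence between a direct single-tap DFE and its loop-unrolled form can be checked with a small behavioral sketch. The sample values, the tap weight, and the initial decision of +1 are assumptions; timing, which is the actual motivation for loop unrolling, is not modeled.

```python
def sgn(x):
    return 1 if x >= 0 else -1

def dfe_direct(x, c):
    """Feedback subtraction inside the loop: slice x_n - c * D_{n-1}."""
    prev, out = 1, []
    for xn in x:
        prev = sgn(xn - c * prev)
        out.append(prev)
    return out

def dfe_unrolled(x, c):
    """Two slicers with thresholds +c and -c; the previous decision selects one."""
    prev, out = 1, []
    for xn in x:
        d_if_prev_high = sgn(xn - c)  # valid if the previous decision was +1
        d_if_prev_low = sgn(xn + c)   # valid if the previous decision was -1
        prev = d_if_prev_high if prev == 1 else d_if_prev_low
        out.append(prev)
    return out

samples = [0.6, -0.6, 1.4, 0.6, -1.4, -0.6, 0.2, -0.3, 0.5, -1.4]
print(dfe_direct(samples, 0.4))
print(dfe_unrolled(samples, 0.4))
```

Both functions return the same decisions. In the unrolled form only the selection depends on the previous decision, and each additional unrolled tap doubles the number of slicers, which reflects the exponential growth in complexity noted above.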

DFEs that use infinite impulse response (IIR) filtered feedback signals are also widely adopted to combat the long ISI tail induced by severe channel loss [8, 24, 25, 26, 27]. A channel whose loss is dominated by skin effect has a longer impulse-response tail than a channel whose loss is dominated by dielectric loss, even when both have the same channel loss at the baud rate. Fig. 1.10 shows the block diagrams of the conventional DS-DFE and the IIR-DFE. The key difference between the two DFEs is the generation of the feedback signal. DS-DFE multiplies the delayed slicer outputs by the tap coefficients to construct a series of discrete-time feedback signals that cancel ISI. IIR-DFE feeds the slicer output to RC networks with different time constants to construct a series of continuous-time feedback signals that cancel ISI. Fig. 1.11 shows the time-domain behavior of both DFEs using a 2-tap example. The advantages and limitations of each DFE are evident. Conventional DS-DFE has a strong feedback signal that improves the slope of the equalized data effectively. Since the active duration of each DS-DFE tap is constant at one unit interval (UI), DS-DFE taps with programmable delays, known as floating or roving taps, have been used to remove reflections in the channel [1, 2, 3]. However, the complexity and power consumption of DS-DFE increase significantly when equalizing channels with a long ISI tail. IIR-DFE, on the other hand, is much better suited to equalizing channels with a long ISI tail. However, as seen in Fig. 1.11 (right), it is less effective in improving data edges because the slope of the tap is degraded by the RC network. To resolve this trade-off, hybrid DS- and IIR-DFEs have been developed that improve the data slope at the main post-cursor with a single DS-DFE tap and use IIR-DFE taps for the long tail of the remaining post-cursors [28, 29].

Figure 1.10: DFE with L taps. Left: Conventional DS-DFE. Right: IIR-DFE.

Figure 1.11: Equalization with different types of DFEs. Left: Conventional DS-DFE. Right: IIR-DFE.

The DFE in this dissertation is based on the conventional discrete-time feedback tap because not all post-cursor ISI is undesired. Only the post-cursors that appear near data transitions reduce the vertical eye-opening and therefore need to be removed. Post-cursors that fall on consecutive identical data improve the vertical eye-opening and should not be removed. The details are addressed in the following chapter.

#### 1.4 Motivation

Most reported DFEs fall into the category of data-state (DS) DFE, where taps are chosen to maximize the vertical eye-opening of equalized data. DS-DFE searches for optimal taps using a sign<sup>2</sup> LMS algorithm that adaptively minimizes the error between the vertical eye-opening of desired and equalized data. DFE, which either adds taps to or subtracts taps from incoming data depending upon the decision of the slicer, is performed on every data symbol. When consecutive 1s are present, DS-DFE successively subtracts taps from the incoming data, reducing vertical eye-opening. Similarly, when consecutive 0s are present, DS-DFE successively adds taps to the incoming data, resulting in reduced vertical eye-opening as well. The successive addition of taps to or subtraction of taps from incoming data not only reduces vertical eye-opening, it also alters the maximum and minimum eye-open boundaries. The eye-open boundaries are also used as the references with respect to which the equalized data are compared. As a result, the references of DS-DFE are tap-dependent and are not known a priori. They need to be obtained iteratively using DFE-like algorithms [30]. To overcome the aforementioned drawbacks of DS-DFE, DFE should not be performed when consecutive 1s or 0s are encountered, so that the obtained vertical eye-opening is retained. The reference voltages, with respect to which the error of the equalized data is minimized, should also be independent of taps so that they can be obtained in a preprocessing phase prior to DFE.

The effectiveness of DS-DFE is increasingly challenged by a shrinking aperture window at high data rates, arising from data-dependent and data-independent data jitter, with the former typically dominant. Data-edge (DE) DFE, where taps are chosen to minimize data jitter so as to maximize the aperture window, becomes increasingly attractive at high data rates [31, 32]. DE-DFE possesses a number of intrinsic characteristics, including the reduced lock time of the receiver phase-locked loop (PLL) and relaxed constraints on the slew rate of the CTLE. Most reported DE-DFEs trade performance for design simplicity by utilizing only the sign rather than the value of data jitter. The absence of a constraint on vertical eye-opening in DE-DFE results in reduced vertical eye-opening. To overcome this, DS-DFE and DE-DFE are often deployed simultaneously [33, 34].

A common drawback of reported DS-DFE, DE-DFE, and DT-DFE is that tap strength remains constant over the entire data eye. However, the impact of channel imperfection is most severe at the edges of the data eye, where the high-frequency components of data concentrate, and least severe at the center of the data eye, where the low-frequency components of data reside. As a result, although equalized eyes have good vertical eye-opening at the center of data eyes, eye-opening drops rapidly when moving away from the center, undermining the reduction of data jitter from channel equalization. As a consequence, stringent constraints must be imposed on both the slew rate of the output of the CTLE and the jitter of the receiver PLL.

#### 1.5 Objectives

The objectives of this research were to develop and implement novel adaptive DT-DFEs capable of overcoming the intrinsic drawbacks of DS-DFE. The DT-DFEs should be able to maximize the eye-opening, both vertically and horizontally, of the high-frequency components of received data without attenuating the low-frequency components of the data when consecutive 1s and 0s are encountered, so as to retain the obtained eye-opening. The performance of the DT-DFEs needs to be validated quantitatively by comparison with the performance of the same data links equalized using conventional DS-DFE.

#### 1.6 Thesis Organization

Chapter 1 provides an overview of Gb/s serial links and the challenges encountered. It briefly examines various channel equalization techniques with an emphasis on DFE, which is the core of the dissertation. The chapter also highlights the motivation of this study. Chapter 2 examines the intrinsic drawbacks of conventional data-state DFE. It introduces data-transition DFE that overcomes the drawbacks of DS-DFE. The characteristics of DT-DFE are studied in both the time and frequency domains. To sharpen transition edges so as to maximize the sampling window, DT-DFE with edge-emphasis taps is developed. The effectiveness of the proposed DT-DFE and DT-DFE with edge-emphasis taps is validated using simulation results. Chapter 3 introduces a DT-DFE with EE-taps and raised reference voltages capable of both sharpening the transition edges and enlarging the vertical eye-opening of equalized data. The effectiveness of the proposed DFE is assessed using simulation results. Chapter 4 introduces two power-efficient all-digital frequency difference-to-digital converters and their applications in Gb/s clock and data recovery. The first utilizes two bi-directional gated delay lines (BDGDLs) to perform bang-bang frequency difference detection, while the second uses a single BDGDL to perform integrating frequency difference-to-digital conversion. Finally, the dissertation is summarized in Chapter 5 with a look at future work that builds upon the dissertation.

#### 1.7 Contributions

This dissertation makes the following original contributions:

• An adaptive DT-DFE with a sign<sup>3</sup> least-mean-square (LMS) algorithm is developed. The proposed DT-DFE boosts the high-frequency components of data without attenuating their low-frequency components, overcoming the intrinsic drawback of DS-DFE, which achieves the same level of eye-opening at the expense of attenuated low-frequency components of data. The maximum and minimum vertical eye-openings of DS-DFE are tap-dependent, whereas those of DT-DFE are tap-independent and can be obtained by conveying consecutive 1s and 0s to the channel in a training phase prior to DFE and measuring the resultant output of the CTLE using successive approximation register analog-to-digital converters (SAR ADCs). The proposed DT-DFE utilizes a loop-unrolling approach both to detect the occurrence of data transitions, so as to activate tap tuning and DFE, and to combat timing constraints. The effectiveness of the proposed DFE has been validated using simulation results.

#### **Publication**:

- Y. Li and F. Yuan, "Sign3-LMS data-transition decision feedback equalizer," IET Circuits Devices Systems, vol. 13, no. 7, pp. 998-1006, Nov. 2019.
- Y. Li and F. Yuan, "Data-transition decision feedback equalizer with S3-LMS adaptation algorithm," Proc. IEEE Int'l NEWCAS Conf., pp. 221-224, Montreal, Que., Canada, June 2018.
- Y. Li and F. Yuan, "Adaptive data-transition decision feedback equalizer for serial links," Proc. IEEE Mid-West Symp. Circuits and Systems, pp. 1609-1612, Boston, USA, 2017.
- Y. Li and F. Yuan, "Data-transition adaptive decision feedback equalizer for 2/4 PAM serial links," Proc. IEEE Mid-West Symp. Circuits and Systems, pp. 531-534, Boston, USA, 2017.

• An adaptive DT-DFE with edge-emphasis taps and raised references was developed. The proposed DT-DFE improves both transition edge slopes and vertical eye-opening simultaneously. Loop-unrolling, normally used to combat the timing constraints of data-state DFE, is further developed for DT-DFE with EE-taps. The reference voltages with respect to which the error signals of DFE are defined are raised beyond that set by the low-frequency components of data to increase vertical eye-opening. Clock and data recovery is performed using blind 4x oversampling and phase-picking. The effectiveness of the proposed DFE has been validated using simulation results of a 10 Gb/s serial link.

#### **Publication**:

- Y. Li and F. Yuan, "Adaptive edge-emphasis data-transition DFE with raised references in oversampling CDR," IEEE Trans. Circuits and Systems I - Regular Papers. Submitted in August 2019 (invited).
- Y. Li and F. Yuan, "Data-transition decision feedback equalizer with edge-emphasis taps and raised references," IEEE NEWCAS, Munich, Germany, June 2019.
- Y. Li and F. Yuan, "Data-transition decision feedback equalizer with edge-emphasis taps," presented at IEEE Int'l Symp. on Circuits and Systems, Florence, Italy, as a "breaking news" paper, May 2018.

• A pre-skewed BDGDL bang-bang frequency detector (BBFD) is developed and applied to frequency-locking of 10 Gb/s serial links. Bang-bang frequency detection is performed using a pair of BDGDLs that digitize the logic-1 pulses of the receiver oscillator and a reference clock. A redundant successive approximation register (SAR) driven by the output of the BBFD is used to generate the frequency control word (FCW) of the digitally controlled oscillator (DCO) of the receiver. A frequency detection decision can be made in only 4 cycles of the reference clock. The effectiveness of the proposed BBFD has been validated using simulation results of a 5 GHz frequency-locked loop.

<u>Publication</u>: Y. Li and F. Yuan, "A pre-skewed bi-directional gated delay line bang-bang frequency detector with applications in 10 Gbps serial link frequency-locking," IEEE Mid-West Symp. Circuits Syst., pp. 263-266, Dallas, TX, Aug. 2019.

• An all-digital low-power high-speed integrating frequency difference-to-digital converter (iFDDC) is proposed. The iFDDC utilizes a BDGDL to detect the difference between the frequencies of two 50% duty-cycle square waves. The iFDDC possesses attractive characteristics including all-digital realization, low power consumption, high-speed operation, and built-in integration. The built-in integration of the iFDDC results in zero static frequency error and first-order noise-shaping when the iFDDC resides in a frequency-locked loop. When the quantization errors of the BDGDL and DCO are accounted for, the static frequency error of the FLL is proportional to the quantization errors of the BDGDL and DCO. The effectiveness of the iFDDC has been validated in a 5-GHz frequency-locked loop designed in a TSMC 65 nm 1.2 V CMOS technology.

#### 1.8 Summary

An overview of Gb/s serial links was presented and the challenges encountered in the design of Gb/s serial links were examined. The characteristics of wire channels were studied briefly with an emphasis on the loss of wire channels at high data rates. Various channel equalization techniques were briefly examined with an emphasis on DFE. The motivations of this study were provided. The organization of the dissertation was detailed and the original contributions of the dissertation, including research publications, were provided.

## Chapter 2

### **Data-Transition DFE**

Data-state (DS) DFE, widely used to combat the impact of the imperfections of wire channels in serial links, suffers from an intrinsic drawback: vertical eye-opening shrinks when consecutive 1s or 0s are encountered in the data to be equalized. To overcome this fundamental drawback of DS-DFE, this chapter proposes a new adaptive data-transition (DT) DFE that performs DFE only when a data transition is present in the data to be equalized, so as to retain vertical eye-opening. Section 2.1 examines the fundamental limitations of DS-DFE. Section 2.2 presents the architecture, algorithm, and implementation of the proposed DT-DFE. Section 2.3 investigates the intrinsic differences between DS-DFE and DT-DFE, error propagation in DS-DFE and DT-DFE, loop-unrolling, and the hardware cost of DS-DFE and DT-DFE. In Section 2.4, the effectiveness of the proposed DT-DFE is compared with that of DS-DFE with the aid of two 5 Gbps backplane serial links. The chapter is concluded in Section 2.5.

#### 2.1 DS-DFE Limitations

Figure 2.1: Data-state DFE.

Fig. 2.1 shows the configuration of a conventional DS-DFE with 2PAM signaling. The difference between equalized data symbols $y_{DT,n}$ and desired data symbols $D_n$, denoted by $e_n$, is used to guide the search for optimal tap coefficients $c_l$, $l = 1, 2, ..., L$, from which the post-cursors of the data are estimated. The estimated post-cursors of past data are then subtracted from the current data symbol to remove their impact. Optimal tap coefficients are obtained iteratively by minimizing the power of $e_n$ using the following sign-sign (SS) LMS algorithm

$$c_{l,n+1} = c_{l,n} + h \operatorname{sgn}[e_n] \operatorname{sgn}[D_{n-l}], \tag{2.1}$$

where $h$ is the step size used to adjust $c_l$, $e_n$ is the difference between the equalized and desired data symbols at the current sample $n$, and $D_n$ is the output of the slicer. The sign function is defined as $\operatorname{sgn}[x] = 1$ if $x \ge 0$ and $-1$ otherwise. The output of the slicer, which is the logic state of the equalized data, is used in (2.1). Therefore, the equalized data with DS-DFE is given by

$$y_{DT,n} = x_n - \sum_{l=1}^{L} c_l D_{n-l}.\tag{2.2}$$

Existing SS-LMS data-state DFE suffers from the following fundamental drawback: if the output of the slicer is logic-1, i.e., data[n]=1, DS-DFE subtracts DFE taps from the current data symbol so as to remove the post-cursors. This is desirable only if the next data is 0. If the next data is also 1, the DS-DFE operation actually reduces rather than increases vertical eye-opening. To address this issue, we propose a DT-DFE that uses an S<sup>3</sup>-LMS algorithm to search for optimal tap coefficients utilizing the state transition rather than the state of data. No DFE operation takes place if a data transition is absent.

#### 2.2 DT-DFE



#### 2.2.1 Architecture

Figure 2.2: Architecture of sign<sup>3</sup>-LMS DT-DFE (highlighted part).

Fig. 2.2 shows the architecture of the proposed sign<sup>3</sup>-LMS DT-DFE. The output of the PRBS generator is fed to a current-mode transmitter that conveys a constant current to the channel. The channel is terminated at both ends to minimize reflection. The output of the CTLE is fed to SAR ADCs where the maximum and minimum vertical eye-openings, denoted by $V_{max}$ and $V_{min}$, respectively, are obtained in an 11-cycle training phase preceding DFE. Two slicers are employed to compare the equalized data with $V_{max}$ and $V_{min}$ so as to determine the sign of the difference (error) between the vertical eye-opening of desired and equalized data. The equalized data is also fed to a data-transition unit (DTU) where the occurrence of two consecutive data transitions of opposite polarities is detected. The output of the DTU is fed to the control logic block of the charge pumps (CPs) that tune the taps. The CP control logic determines whether the charge pumps should be activated and, if so, the direction of tuning. Optimal taps are obtained using a sign<sup>3</sup>-LMS algorithm evolved from conventional sign<sup>2</sup>-LMS algorithms. Although only one tap is used in the example presented, the scheme is readily extended to multiple taps. Data equalized with the proposed DT-DFE can be written as

$$y_{DT,n} = x_n - \sum_{l=1}^{L} c_l D_{n-l} s_n, \qquad (2.3)$$

where $s_n$ is the data transition signal. $s_n = 1$ denotes that a data transition is present from time instant n-1 to n, and $s_n = 0$ means that no data transition occurs. Data is therefore equalized only when $s_n = 1$, i.e., DFE is activated only when a data transition is present.
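To make the difference between (2.2) and (2.3) concrete, the following minimal Python sketch applies both expressions to a short sample sequence. It is an illustration only: the function name `equalize`, the sample values, and the use of ideal (error-free) decisions $D_n$ are assumptions made for this example and are not part of the circuit described here.

```python
def equalize(x, decisions, taps, transition_gated):
    """Apply (2.2) (DS-DFE) or (2.3) (DT-DFE) sample by sample.

    x           : CTLE output samples x_n (hypothetical values)
    decisions   : ideal slicer decisions D_n in {+1, -1} (assumed error-free)
    taps        : tap coefficients [c_1, ..., c_L]
    """
    y = []
    for n, x_n in enumerate(x):
        # s_n = 1 when the decision changes from cycle n-1 to cycle n
        s_n = 1 if n >= 1 and decisions[n] != decisions[n - 1] else 0
        # Post-cursor estimate: sum over l of c_l * D_{n-l}
        feedback = sum(
            c_l * decisions[n - l]
            for l, c_l in enumerate(taps, start=1)
            if n - l >= 0
        )
        gate = s_n if transition_gated else 1
        y.append(x_n - feedback * gate)
    return y


x = [0.5, 1.0, 1.0, -0.5]   # a run of 1s followed by a 1 -> 0 transition
d = [1, 1, 1, -1]
y_ds = equalize(x, d, [0.25], transition_gated=False)
y_dt = equalize(x, d, [0.25], transition_gated=True)
print(y_ds)  # DS-DFE lowers the level during the run of 1s
print(y_dt)  # DT-DFE leaves the run untouched and acts only on the transition
```

In this toy sequence both schemes produce the same value at the transition, while only DS-DFE reduces the amplitude during the run of consecutive 1s.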

### 2.2.2 Data-Transition Unit

In order to determine whether a data transition occurs in the *n*th cycle, both $D_{n-1}$ and $D_n$, the previous and current decisions of the slicer, respectively, are needed. The former is readily available while the latter is not at the time of equalization. This difficulty can be overcome using the loop-unrolling approach commonly used to combat timing constraints of DFE [15]. Fig. 2.3 shows the simplified schematic of the DTU that utilizes loop-unrolling. It has three paths, namely paths A, B, and C. Path A performs DFE with the assumption $D_{n-1} = +1$ and $s_n = 1$. Tap $c_{1,A}$, implemented using a current source, drains a current from the non-inverting terminal of the comparator such that $v_A$ decreases. This is equivalent to subtracting the tap from the incoming data. Path B performs DFE with the assumption $D_{n-1} = -1$ and $s_n = 1$. Tap $c_{1,B}$ injects a constant current into the non-inverting terminal of the comparator such that $v_B$ increases. This is equivalent to adding the tap to the incoming data. Path C performs no DFE and is selected only when no data transition is detected $(s_n = 0)$. The outputs of slicers A and B are also sampled by DFFs at the falling edge of CLK to generate the data transition signal $s_n$. To illustrate how $s_n$ is generated, we consider two cases: (i) Case 1 (data=0): In path A, since $v_A$ decreases and data=0, the decision of slicer A is selected. In path B, since $v_B$ increases and data=0, the decision of slicer B is discarded. Fig. 2.4 provides a graphical explanation of why the decision of slicer A should be kept while that of slicer B should be discarded. (ii) Case 2 (data=1): In path A, since $v_A$ decreases and data=1, the decision of slicer A is discarded. In path B, since $v_B$ increases and data=1, the decision of slicer B is selected. The preceding analysis shows that in order to detect $0\rightarrow 1$ transitions, $A_2 \oplus B_1$ should be used. Similarly, to detect $1\rightarrow 0$ transitions, $A_1 \oplus B_2$ should be used. We therefore have $s_n = (A_2 \oplus B_1) + (A_1 \oplus B_2)$. The pull-down NMOS transistor gated by $\overline{A_1 + A_2 + \overline{B_1} + \overline{B_2}}$ is needed to discard the data transition indication when there is a definitive data sampling error. The generation of $s_n$ during CLK=0 allows the multiplexing operations of loop-unrolling to be performed during CLK=1.

Figure 2.3: Data transition unit.

Figure 2.4: Data transition signal $s_n$. If data=0, since the tap increases $v_A$ in path A and decreases $v_B$ in path B, the output of slicer A is selected while that of slicer B is discarded. If data=1, since the tap decreases $v_A$ in path A and increases $v_B$ in path B, the output of slicer A is discarded while that of slicer B is selected.

### 2.2.3 Max/Min Vertical Eye-Openings

When consecutive 1s or 0s are encountered, DT-DFE disables DFE to retain the obtained vertical eye-opening. The maximum and minimum vertical eye-openings can therefore be obtained by conveying a train of 1s and 0s to the channel and measuring the corresponding output voltage of the CTLE. Since, for a given channel, the maximum and minimum voltages of the output of the CTLE are constant in this case, they can be obtained using SAR ADCs, with $V_{max}$ and $V_{min}$ taken as the output voltages of the digital-to-analog converters (DACs) of the ADCs. Once $V_{max}$ and $V_{min}$ are obtained, the SAR ADCs are disabled, thereby retaining the content of the SARs. In this work, two 8-bit charge-redistribution SAR ADCs with redundant SARs are used to find $V_{max}$ and $V_{min}$ in an 11-cycle training phase preceding DFE. Although redundant SARs are used [35], one can also use a non-redundant SAR to lower cost [36].
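The successive-approximation search behind this training phase can be sketched behaviourally as follows. This is a simplified, non-redundant 8-bit binary search (8 cycles), not the redundant 11-cycle SAR used in this work; the function name `sar_search`, the full-scale reference `v_ref`, and the input level are hypothetical values chosen for illustration.

```python
def sar_search(v_in, v_ref=1.0, bits=8):
    """Non-redundant SAR binary search (illustrative, one bit per cycle).

    Returns the final SAR code and the corresponding DAC voltage, which
    plays the role of V_max (or V_min) once the search has converged.
    """
    code = 0
    for bit in reversed(range(bits)):          # MSB first
        trial = code | (1 << bit)              # tentatively set this bit
        v_dac = v_ref * trial / (1 << bits)    # ideal DAC output for the trial code
        if v_in >= v_dac:                      # comparator decision
            code = trial                       # keep the bit
    return code, v_ref * code / (1 << bits)


# Hypothetical steady-state CTLE output for a train of 1s
code, v_max = sar_search(0.6)
print(code, v_max)
```

After the last cycle the register content is simply held, which corresponds to disabling the SAR ADC and retaining the reference voltage for the subsequent DFE phase.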

### 2.2.4 Sign<sup>3</sup>-LMS

Considering a single tap for simplicity, conventional DS-DFE uses the sign<sup>2</sup>-LMS algorithm given in (2.4) to search for the optimal tap

$$c_{1,n+1} = c_{1,n} + h \operatorname{sgn}[e_n] \operatorname{sgn}[D_{n-1}], \qquad (2.4)$$

where h is the step size used to adjust $c_1$, $e_n$ is the error between the desired and equalized data, $D_{n-1}$ is the past decision of the slicer, and n indicates the *n*th iteration cycle. To eliminate the intrinsic drawback of DS-DFE detailed earlier, the proposed DT-DFE uses the sign<sup>3</sup>-LMS algorithm given in (2.5) to tune the tap

$$c_{1,n+1} = c_{1,n} + h \operatorname{sgn}[e_n] \operatorname{sgn}[D_{n-1}] \operatorname{sgn}[s_n s_{n-1}], \qquad (2.5)$$

where $s_n$ is the data transition signal.



Figure 2.5: DT-DFE with sign<sup>3</sup>-LMS. The output of the charge pumps is fed to a loop filter to smooth tap tuning.

The difference between the tap adaptation of DT-DFE and that of DS-DFE is the additional factor $\operatorname{sgn}[s_n s_{n-1}]$, which acts as a gate: the tap is updated only when two consecutive state transitions of opposite polarities are detected, i.e., $s_n s_{n-1} = 1$. Fig. 2.5 shows the simplified schematic of the adaptation scheme of DT-DFE. Equalized data in paths A and B are compared with $V_{min}$ and $V_{max}$, respectively. As the output voltage of the slicers is not full swing, DFFs are employed at the output of the slicers to restore voltage swing. In order to synchronize with $s_n s_{n-1}$, additional DFFs are used to generate $e_{A,n-1}$ and $e_{B,n-1}$. The signals that activate the charge pumps are given by $\text{En}_A = s_n s_{n-1} \overline{D}_{n-1}$ and $\text{En}_B = s_n s_{n-1} D_{n-1}$. The charge pumps are activated only when two consecutive data transitions of opposite polarities are encountered, i.e., $s_n s_{n-1}=1$. The sign of $D_{n-1}$ determines whether tap $c_{1,A}$ or $c_{1,B}$ is to be updated. The direction of the tuning of the charge pumps is set by the polarity of the error signal.
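A behavioural sketch of one sign<sup>3</sup>-LMS iteration is given below. It is a software abstraction of (2.5), not a model of the charge-pump circuit: the third factor is treated as a gate that freezes the tap whenever $s_n s_{n-1} \neq 1$, following the description above, and the function name `sign3_lms_step` and the numerical values are assumptions made for illustration.

```python
def sgn(x):
    """Sign function as defined for (2.1): +1 if x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1


def sign3_lms_step(c1, e_n, d_prev, s_n, s_prev, h):
    """One tap update following (2.5), with s_n * s_{n-1} used as a gate."""
    if s_n * s_prev != 1:
        return c1                      # no 010 / 101 pattern: tap is held
    return c1 + h * sgn(e_n) * sgn(d_prev)


h = 0.125                              # hypothetical step size
updated = sign3_lms_step(0.5, e_n=0.1, d_prev=1, s_n=1, s_prev=1, h=h)
held = sign3_lms_step(0.5, e_n=0.1, d_prev=1, s_n=1, s_prev=0, h=h)
print(updated, held)
```

With the gate removed, the same function reduces to the sign<sup>2</sup>-LMS update of (2.4).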

### 2.2.5 Tap Addition/Subtraction

The simplified schematic of the tap generator that performs the addition/subtraction of the tap to/from the output of the CTLE is shown in Fig. 2.6. Since both the CTLE and the slicer are differential, the tap generator is also differentially configured. It is seen that the tap generator is activated only when $s_n = 1$. Further, when $D_{n-1} = 1$, C+ will be charged by the head current while C- will be discharged by the tail current, resulting in a smaller equalized signal $v_e = v_e^+ - v_e^-$. This is equivalent to subtracting the tap from the incoming data. Similarly, when $D_{n-1} = 0$, $v_e$ will increase.

# 2.3 Discussions

### 2.3.1 Tap Tuning

The tap is tuned when two consecutive data transitions of opposite polarities occur, i.e., $s_n s_{n-1} = 1$. DFE operation, i.e., the addition/subtraction of the tap to/from the output of the CTLE, is performed when a data transition is detected, i.e., $s_n = 1$. Since data with two consecutive transitions contain more high-frequency components than those with a single data transition, tap adjustment dictated by the high-frequency components of data allows the tap to be optimized for the more stringent equalization constraints of these components, so as to obtain a better eye-opening. Since the tap is optimized for the high-frequency components of data, it exceeds the strength necessary for dealing with the low-frequency components of data. As a result, the channel is equalized for the high-frequency components of data but attenuated for the low-frequency components of data.

Figure 2.6: Tap addition/subtraction.

### 2.3.2 DFE Activation

Two options are available to determine when to activate DFE: activate DFE when a single data transition is detected, or when two consecutive data transitions of opposite polarities are present. Let us examine them in detail. (i) Option 1 (DFE is activated on a single data transition): Consider the data sequence in Fig. 2.7(a). One post-cursor needs to be removed. At t=2, the $0\rightarrow 1$ transition at t=1 has been detected and DFE is activated at t=3. It removes the post-cursor at t=3. Similarly, at t=6, the $0\rightarrow 1$ transition at t=5 has been detected and DFE is activated at t=6. The DFE operation performed at t=3 lowers vertical eye-opening while that performed at t=6 sharpens the falling edge of the eye, improving data jitter. (ii) Option 2 (DFE is activated on two consecutive data transitions of opposite polarities): Consider the same data sequence, shown in Fig. 2.7(b). Since two consecutive data transitions of opposite polarities do not exist, no DFE is performed. As a result, the vertical eye-opening is larger than that of Option 1 but the falling edge has a smaller slope, worsening data jitter. It is evident that both options have their pros and cons. In this work, Option 1 is used.



Figure 2.7: DFE activation schemes.

### 2.3.3 DS-DFE versus DT-DFE

In this section, we analyze the intrinsic differences between DS-DFE and DT-DFE. Let x(t) be the output of the CTLE. The equalized data of DS-DFE and that of DT-DFE in *n*th cycle, denoted by  $y_{DS,n}$  and  $y_{DT,n}$ , respectively, are given by

$$y_{DS,n} = x_n - c_1 D_{n-1}, \qquad (2.6a)$$

$$y_{DT,n} = x_n - c_1 D_{n-1} s_n, \qquad (2.6b)$$

where $0 < c_1 < 1$ is the tap. It is seen from (2.6a) that when consecutive 1s are present in data, DS-DFE will successively subtract the tap from data even though no DFE is preferred, as ISI in this case actually increases rather than reduces vertical eye-opening. Similarly, when consecutive 0s are present, DS-DFE will successively add the tap to data even though no DFE is preferred in this case either. Therefore, DS-DFE increases the vertical eye-opening of the high-frequency components of data at the expense of a reduced vertical eye-opening of the low-frequency components of data. Increasing tap strength, though improving the vertical eye-opening of the high-frequency components of data, lowers that of the low-frequency components, as evidenced in Fig. 2.8. As the tap strength increases, the vertical eye-opening that DS-DFE can achieve is upper-bounded by the low-frequency rather than the high-frequency components of data. Although modulation is typically performed on data prior to transmission to give the modulated data the DC-balanced characteristic needed to prevent the drift of the DC operating point of the CTLE, consecutive 1s and 0s still exist in data and the vertical eye-opening of equalized data is data-dependent. As a result, the reference voltages of DS-DFE are data-dependent, unknown a priori, and can only be obtained using a DFE-like algorithm [30].

Figure 2.8: Simulated eye diagram equalized using DS-DFE with one tap. Tap strength: 0.7 mA, 1.0 mA, and 2.0 mA. Data rate: 5 Gbps. Channel characteristics: depicted in Fig. 1.5.



Figure 2.9: Simulated eye diagram equalized using DT-DFE with one constant tap. The strength of the tap : 0.7 mA, 1.0 mA, and 2.0 mA . Data rate : 5 Gbps. Channel characteristics : Depicted in Fig. 1.5.

The drawback of DS-DFE can be eliminated if DFE is activated only when a data transition is present and disabled when two consecutive 1s or 0s are encountered, as in (2.6b). Since DFE is disabled when two consecutive 1s or 0s are encountered, the vertical eye-opening of the low-frequency components of data is least affected by DFE, as evidenced in Fig. 2.9. The reference voltages in this case can be obtained in a preprocessing phase where only consecutive 1s and 0s are transmitted. Since the high-frequency components of data are attenuated the most, the optimal value of tap $c_1$ should be chosen such that the high-frequency components of data are compensated the most. As data with two consecutive transitions of opposite polarities (010 and 101) carry more high-frequency components, tap $c_1$ is obtained iteratively using the sign<sup>3</sup>-LMS algorithm given in (2.5) such that the vertical eye-opening of data with two consecutive transitions of opposite polarities is maximized.

An examination in the frequency domain further supports the preceding findings. Let the difference between the desired and equalized data in the *n*th cycle be $e_n = D_n - y_n$. Substituting $D_{n-1} = y_{DS,n-1} + e_{n-1}$ into (2.6a), we have

$$y_{DS,n} = x_n - c_1 \left( y_{DS,n-1} + e_{n-1} \right). \qquad (2.7)$$

Since $s_n = 1$ if $D_n$ and $D_{n-1}$ differ and 0 otherwise, it is seen in Table 2.1 that $D_{n-1}s_n = (D_{n-1} - D_n)/2$ holds. Eq. (2.6b) can therefore be written as

$$y_{DT,n} = x_n + c'_1 \left( y_{DT,n} + e_n - y_{DT,n-1} - e_{n-1} \right), \qquad (2.8)$$

where $c'_1 = c_1/2$. The z-transforms of $y_{DS,n}$ and $y_{DT,n}$ are given by

$$Y_{DS}(z) = \frac{X(z)}{1 + c_1 z^{-1}} - \frac{c_1 z^{-1}}{1 + c_1 z^{-1}} E(z), \qquad (2.9a)$$

$$Y_{DT}(z) = \frac{X(z)}{1 - c_1'(1 - z^{-1})} + \frac{c_1'(1 - z^{-1})}{1 - c_1'(1 - z^{-1})}E(z). \qquad (2.9b)$$

The coefficients of $X(z)$ and $E(z)$ in (2.9a) are the signal transfer function (STF) and error transfer function (ETF) of DS-DFE and are denoted by STF<sub>DS</sub> and ETF<sub>DS</sub>. Similarly, the coefficients of $X(z)$ and $E(z)$ in (2.9b) are the STF and ETF of DT-DFE and are denoted by STF<sub>DT</sub> and ETF<sub>DT</sub>. The frequency range of interest is $0 \le \omega \le \omega_s$ where $\omega_s = 2\pi/(2T_s) = \pi/T_s$ is the Nyquist frequency (half the baud rate) and $T_s$ is the symbol time. For the low-frequency components of data, $z^{-1} = e^{-j\omega T_s} \approx 1 - j\omega T_s$ holds. We have STF<sub>DS</sub>( $j\omega$ ) $\approx 1/(1 + c_1)$ and STF<sub>DT</sub>( $j\omega$ ) $\approx 1$. The low-frequency components of data are attenuated in DS-DFE but not in DT-DFE: DT-DFE has a unity signal gain while the signal gain of DS-DFE is tap-dependent. The larger the tap, the more severe the attenuation and the smaller the vertical eye-opening. At $\omega_s$, where $z^{-1} = -1$, we have STF<sub>DS</sub> = $1/(1 - c_1)$ and STF<sub>DT</sub> = $1/(1 - 2c'_1) = 1/(1 - c_1)$, revealing that the vertical eye-opening of DS-DFE and that of DT-DFE are the same for the high-frequency components of data.

Table 2.1: Data transition.

| $D_{n-1}$ | $D_n$ | $s_n$ | $D_{n-1}s_n$ | $D_{n-1} - D_n$ |
|-----------|-------|-------|--------------|-----------------|
| 1         | 1     | 0     | 0            | 0               |
| 1         | -1    | 1     | 1            | 2               |
| -1        | 1     | 1     | -1           | -2              |
| -1        | -1    | 0     | 0            | 0               |

For the low-frequency components of data, we have $\text{ETF}_{DS}(j\omega) \approx -c_1/(1+c_1)$ and $\text{ETF}_{DT}(j\omega) \approx (j\omega T_s)c_1'$. The ETF of DS-DFE is non-zero and constant while that of DT-DFE is proportional to frequency and exhibits a first-order error-shaping profile. For the high-frequency components of data, in particular at $\omega_s$, $\text{ETF}_{DS}(j\omega) \approx c_1/(1-c_1)$ and $\text{ETF}_{DT}(j\omega) \approx c_1/(1-c_1)$, i.e., the ETF of DS-DFE and that of DT-DFE are the same.
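The limiting values quoted above can be checked numerically by evaluating the transfer functions in (2.9) on the unit circle. The sketch below is only a sanity check; the function name `transfer_functions` and the normalized tap value 0.5 are assumptions made for illustration.

```python
import cmath
import math


def transfer_functions(c1, omega_ts):
    """Evaluate STF and ETF of (2.9a) and (2.9b) at z = exp(j * omega * T_s)."""
    z_inv = cmath.exp(-1j * omega_ts)
    c1p = c1 / 2                               # c_1' = c_1 / 2
    stf_ds = 1 / (1 + c1 * z_inv)
    etf_ds = -c1 * z_inv / (1 + c1 * z_inv)
    den_dt = 1 - c1p * (1 - z_inv)
    stf_dt = 1 / den_dt
    etf_dt = c1p * (1 - z_inv) / den_dt
    return stf_ds, etf_ds, stf_dt, etf_dt


c1 = 0.5                                       # hypothetical normalized tap
dc = transfer_functions(c1, 0.0)               # omega = 0
nyq = transfer_functions(c1, math.pi)          # omega = omega_s = pi / T_s

print("DC:     ", [round(abs(v), 4) for v in dc])
print("Nyquist:", [round(abs(v), 4) for v in nyq])
```

For this tap value the DC magnitudes are $1/(1+c_1)$ and $c_1/(1+c_1)$ for DS-DFE against 1 and 0 for DT-DFE, while all four quantities coincide pairwise at $\omega_s$.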



Figure 2.10: STF of both DFEs with different normalized taps.

Fig. 2.10 compares $\text{STF}_{DS}$ and $\text{STF}_{DT}$. It is seen that at low frequencies, $\text{STF}_{DT}$ is larger than $\text{STF}_{DS}$. At low frequencies, $\text{STF}_{DS}$ drops as the tap increases while $\text{STF}_{DT}$ is independent of the tap. Since most of the energy of NRZ (non-return-to-zero) data is concentrated at low frequencies, DT-DFE outperforms DS-DFE by preserving more of the data energy and hence a larger vertical eye-opening. Fig. 2.11 compares $\text{ETF}_{DS}$ and $\text{ETF}_{DT}$. It is seen that $\text{ETF}_{DT}$ is much smaller at low frequencies and exhibits first-order noise-shaping. The reduced ETF of DT-DFE at low frequencies allows equalized data to be closer to the desired value, thereby yielding a larger vertical eye-opening. In addition, both DFEs can be considered as IIR filters with the transfer functions in (2.9). Since the normalized tap strength is always smaller than 1, all the poles of the DFE transfer functions fall within the unit circle. Therefore, each DFE can be considered as a stable feedback system with causal input. However, if the nonlinearity of the slicer is taken into consideration, erroneous decisions from the slicer can cause tap adaptation to diverge due to error propagation.

Figure 2.11: ETF of both DFEs with different normalized taps.

### 2.3.4 Error Propagation

Error propagation is inherent to DFE. Techniques were developed to minimize the slicer error [37, 38]. Without considering the effect of slicer kick-back [39, 40], we compare error propagation in DS-DFE and DT-DFE from (i) a tap tuning perspective and (ii) a DFE perspective.

(i) Error propagation from a tap tuning perspective: It is seen in (2.4) that an erroneous decision of the slicer in DS-DFE will directly affect the tuning of the tap. Consider $D_{n-1} = +1$. The tap is subtracted from the *n*th incoming data to remove the post-cursor of $D_{n-1}$ at the *n*th sampling instant. Suppose that the slicer makes an erroneous decision such that $D_{n-1} = -1$. The tap in this case will be added to rather than subtracted from the *n*th incoming data. As a result, not only will the post-cursor of $D_{n-1}$ at the *n*th sampling instant not be removed, it will in fact be added to the incoming data, and the subsequent decision of the slicer will likely be incorrect. Although DT-DFE is also subject to error propagation, it is seen from (2.5) that an erroneous decision of the slicer will only affect the tap if it results in two consecutive data transitions of opposite polarities. To illustrate this, consider 3 consecutive decisions of the slicer without and with errors shown in Fig. 2.12(a). It is seen that in the cases of 1 error bit or 2 error bits, tap tuning takes place at different locations as compared with those without error. In the case of 3 error bits, tap tuning takes place at the same locations as those without error.

(ii) Error propagation from a DFE perspective: When the slicer makes an erroneous decision, it will immediately impact DS-DFE as DFE is performed on every data. Since DFE is performed only on a data transition in DT-DFE, an erroneous decision of the slicer will activate DFE only if such an erroneous decision results in a state transition. Consider 2 consecutive decisions of the slicer without and with errors shown in Fig. 2.12(b). It is seen that in the case of 1 error bit, DFE takes place at different locations as compared with those without error. In the case of 2 error bits, DFE takes place at the same locations as those without error. These observations show that DT-DFE is less sensitive to slicer error.
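The pattern-level argument above can be reproduced by exhaustive enumeration. The sketch below counts, for every decision pattern and every placement of k erroneous bits, how often the tap-tuning events ($s_n s_{n-1} = 1$) or the DFE events ($s_n = 1$) differ from the error-free case. The function names are hypothetical, and the code only counts events; it does not model the analog loop.

```python
from itertools import combinations, product


def transitions(bits):
    """s_n for each adjacent pair of decisions: 1 if they differ, else 0."""
    return tuple(int(a != b) for a, b in zip(bits, bits[1:]))


def tuning_events(bits):
    """s_n * s_{n-1}: 1 where two consecutive transitions occur (010 or 101)."""
    s = transitions(bits)
    return tuple(a * b for a, b in zip(s, s[1:]))


def count_changed(n_bits, event_fn):
    """For k = 1..n_bits error bits, count (pattern, error) pairs whose events change."""
    changed = {}
    for k in range(1, n_bits + 1):
        count = 0
        for bits in product((0, 1), repeat=n_bits):
            for err_pos in combinations(range(n_bits), k):
                corrupted = tuple(b ^ (i in err_pos) for i, b in enumerate(bits))
                if event_fn(corrupted) != event_fn(bits):
                    count += 1
        changed[k] = count
    return changed


tuning_changed = count_changed(3, tuning_events)   # 3 consecutive decisions
dfe_changed = count_changed(2, transitions)        # 2 consecutive decisions
print(tuning_changed, dfe_changed)
```

Because inverting every bit leaves all transitions unchanged, the counts for the all-bits-in-error case are zero in both views, which matches the 3-error-bit and 2-error-bit observations above.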

| No err. | 1 err. bit | 2 err. bits | 3 err. bits |
|---------|------------|-------------|-------------|
| 000 | **1**00 0**1**0 00**1** | **11**0 **1**0**1** 0**11** | **111** |
| 001 | **1**01 0**1**1 00**0** | **11**1 **1**0**0** 0**10** | **110** |
| 010 | **1**10 0**0**0 01**1** | **10**0 **1**1**1** 0**01** | **101** |
| 011 | **1**11 0**0**1 01**0** | **10**1 **1**1**0** 0**00** | **100** |
| 100 | **0**00 1**1**0 10**1** | **01**0 **0**0**1** 1**11** | **011** |
| 101 | **0**01 1**1**1 10**0** | **01**1 **0**0**0** 1**10** | **010** |
| 110 | **0**10 1**0**0 11**1** | **00**0 **0**1**1** 1**01** | **001** |
| 111 | **0**11 1**0**1 11**0** | **00**1 **0**1**0** 1**00** | **000** |

(a)

| No err. | 1 err. bit | 2 err. bits |
|---------|------------|-------------|
| 00 | **1**0 0**1** | **11** |
| 01 | **1**1 0**0** | **10** |
| 10 | **0**0 1**1** | **01** |
| 11 | **0**1 1**0** | **00** |

(b)

Figure 2.12: Error propagation. (a) Tap tuning perspective. (b) DFE perspective. Error bits are highlighted.

### 2.3.5 Loop-Unrolling

The proposed DT-DFE utilizes a loop-unrolling approach to accommodate data transitions. Its functionality is two-fold : (i) It determines the occurrence of 0-to-1 or 1-to-0 transitions and performs DT-DFE when such a transition has occurred. (ii) It performs loop-unrolling in a similar way as conventional loop-unrolling so as to combat timing constraints of DFE. The former is the core of DT-DFE whereas the latter is a by-product of the DT-DFE.
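The two roles of loop-unrolling can be illustrated with a purely behavioural model of a single cycle. This sketch is an assumption-laden abstraction rather than the circuit of Fig. 2.3: it uses an ideal sign slicer with a zero threshold, a single tap shared by paths A and B, and a hypothetical function name `dt_dfe_unrolled`; the DFF timing and the error-discarding pull-down transistor are not modelled.

```python
def sgn(x):
    """Ideal slicer: +1 if x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1


def dt_dfe_unrolled(x_n, d_prev, c1):
    """Evaluate paths A, B, and C in parallel and select one of them.

    Returns (D_n, s_n, y_n): the decision, the transition flag, and the
    equalized value taken from the selected path.
    """
    v_a = x_n - c1          # path A: assumes D_{n-1} = +1 and a transition
    v_b = x_n + c1          # path B: assumes D_{n-1} = -1 and a transition
    v_c = x_n               # path C: no DFE
    if d_prev == 1:
        s_n = 1 if sgn(v_a) == -1 else 0     # a 1 -> 0 transition was resolved
        y_n = v_a if s_n else v_c
    else:
        s_n = 1 if sgn(v_b) == 1 else 0      # a 0 -> 1 transition was resolved
        y_n = v_b if s_n else v_c
    return sgn(y_n), s_n, y_n


print(dt_dfe_unrolled(0.75, d_prev=1, c1=0.25))    # consecutive 1s: path C
print(dt_dfe_unrolled(-0.25, d_prev=1, c1=0.25))   # 1 -> 0: path A
print(dt_dfe_unrolled(0.25, d_prev=-1, c1=0.25))   # 0 -> 1: path B
```

Since all three candidate values are available before the previous decision is known, the selection reduces to a multiplexing step, which is what relaxes the feedback timing.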

### 2.3.6 Hardware Cost

Table 2.2 compares the hardware cost of DS-DFE with loop-unrolling and that of the proposed DT-DFE. Since $V_{max}$ and $V_{min}$ of DS-DFE are tap-dependent, they are not known a priori and can only be obtained using DFE-like algorithms, resulting in extra power and silicon consumption [30, 41]. $V_{max}$ and $V_{min}$ of DT-DFE are tap-independent and are obtained using SAR ADCs in a training phase prior to DFE operation. Aside from $V_{max}$ and $V_{min}$, the hardware overhead of DT-DFE also consists of path C, the logic for generating $s_n$, and that for switching off the charge pumps when two consecutive data transitions of opposite polarities are absent. This minor hardware overhead is well justified by the performance improvement obtained from DT-DFE, to be detailed in the next section.

| Description | DS-DFE | DT-DFE |
|-------------|--------|--------|
| $V_{max}$ & $V_{min}$ | DFE-like blocks | SAR ADCs |
| Loop-unrolling | Paths A & B | Paths A, B, & C |
| DT detect | None | $s_n$-logic |
| CP logic | None | Yes |

Table 2.2: Hardware comparison of DS-DFE and DT-DFE.

### 2.4 Simulation Results

The proposed DT-DFE was designed in a TSMC 65 nm CMOS technology and analyzed using Spectre with BSIM4 device models. For the purpose of comparison, a DS-DFE with loop-unrolling was also designed. The channel to be equalized is a backplane channel consisting of two identical sections, each with 3 backplane traces and 2 connection points, with an overall channel length of 66 cm and a 30 dB loss at 100 Gbps [42]. Both channels are tested at a 5 Gbps data rate. At the 2.5 GHz Nyquist frequency, the channel loss is approximately 13 dB, as depicted in Fig. 1.5.

Fig. 2.13 shows the adaptation process. Adaptation completes in approximately 100 ns. Fig. 2.14 plots $v_A^+$ and $v_B^+$ of DT-DFE. It is seen that the voltages stabilize in approximately 100 ns, which is the adaptation time of DT-DFE. Fig. 2.15 plots $v_A$, $v_B$, $v_C$, and equalized data (solid line) with DS-DFE and DT-DFE. Note that the equalized data is selected among $v_A$, $v_B$, and $v_C$, depending upon data transition. It is seen that for the high-frequency components of data, both DS-DFE and DT-DFE select equalized data from path B in order to maximize vertical eye-opening. For the low-frequency components of data, DS-DFE selects equalized data from path A, thereby lowering vertical eye-opening, while DT-DFE selects those from path C, preserving the obtained vertical eye-opening.

Figure 2.13: Adaptation process of taps $v_A^+$ and $v_B^+$ of proposed DT-DFE.

Since the equalized data are obtained from  $v_A$ ,  $v_B$ , and  $v_C$ , depending upon data transition, to generate the eye-diagram of the equalized data,  $v_A$ ,  $v_B$ , and  $v_C$  are recorded, along with  $s_n$ , and processed in MATLAB. Fig. 2.16 compares the eye diagram of equalized data using DS-DFE and DT-DFE. It is seen that the eye-opening of the equalized data with DT-DFE is approximately 1.54 times that with DS-DFE. The minor hardware overhead detailed in Table 2.2 is well justified by the performance improvement.

# 2.5 Summary

An adaptive DT-DFE with sign<sup>3</sup>-LMS was proposed. It was shown that DS-DFE suffers from an intrinsic drawback of shrunk vertical eye-opening when consecutive 1s or 0s are encountered whereas DT-DFE performs DFE only when data transitions are present, thereby preserving the obtained vertical eye-opening. We further showed that DT-DFE boosts the high-frequency components of data without attenuating their low-frequency components whereas DS-DFE achieves the same at the cost of attenuated low-frequency components of data. Moreover, we showed that the maximum and minimum vertical eye-openings of DS-DFE are tap-dependent and can only be obtained using DFE-like algorithms whereas those of DT-DFE are tap-independent. They can be obtained by conveying consecutive 1s and 0s to the channel and measuring the resultant output of the CTLE using SAR ADCs. The proposed DT-DFE utilizes a loop-unrolling-like approach both to detect data transitions, so as to activate tap tuning and DFE, and to combat timing constraints. The hardware cost of DS-DFE and that of DT-DFE were compared. The effectiveness of the DT-DFE was compared with DS-DFE using two 5 Gbps backplane serial links. Simulation results demonstrated that the vertical eye-opening of the equalized data with DT-DFE is approximately 1.54 times that with DS-DFE.

Figure 2.14: Left: Waveform of $v_A^+$. Right: Waveform of $v_B^+$.

Figure 2.15: Voltages of paths A, B, C, and equalized data (solid line).

Figure 2.16: Eye diagrams. Left: DS-DFE. Right: DT-DFE.

# Chapter 3

# Adaptive Edge-Emphasis Data-Transition DFE with Raised References in Oversampling CDR

This chapter presents a DT-DFE with edge-emphasis (EE) taps that provides enhanced DFE compensation at the edges of data eyes and adequate DFE compensation at the center of data eyes so as to both sharpen the edges and maximize the vertical opening of equalized data [43]. It allows the vertical eye-opening of equalized data to exceed the upper and lower bounds of the output of the CTLE when consecutive 1s and 0s are transmitted so as to increase vertical eye-opening. Section 3.1 investigates the common drawbacks of DS-DFE and DT-DFE with constant taps. Section 3.2 introduces edge-emphasis taps and their implementation. Adaptive DT-DFE with EE-taps is developed in Section 3.3, including loop-unrolling for determining the transition indicator and performing DT-DFE with EE-taps. The principle of raising the reference voltages so as to increase vertical eye-opening is also provided. Section 3.4 presents 4x blind oversampling clock and data recovery. Simulation results validating the proposed DT-DFE with EE-taps are presented in Section 3.5. The chapter is concluded in Section 3.6.

# 3.1 Common drawbacks of DS-DFE and DT-DFE



Figure 3.1: Comparison of DS-DFE, DT-DFE with constant taps, DT-DFE with EE taps, and DT-DFE with EE-taps and raised references.

Fig. 3.1 details the distinct characteristics of DS-DFE and DT-DFE. It is seen that both DFEs suffer from the common drawback that the tap is constant over the entire duration of a data symbol, even though the impact of channel imperfection is most severe at the edges of data eyes, where the high-frequency components of data concentrate, and least severe at the center of data eyes, where the low-frequency components of data reside. As a result, although equalized eyes have good vertical eye-opening at the center of data eyes, eye-opening deteriorates rapidly when moving away from the center. Consequently, the data jitter suppression obtained from channel equalization is rather limited. As minimizing data jitter becomes increasingly critical at high data rates, the strength of taps should be set as per the profile of received data in order to sharpen the transition edges of equalized data and thereby minimize its jitter.

# 3.2 Edge-Emphasis Taps



Figure 3.2: Top : Voltages of unequalized data. Bottom : Currents of EE-taps. (a) EE-taps for 010 transitions. (b) EE-taps for 101 transitions.

Figure 3.3: Schematic of edge-emphasis taps for 010 transitions. The polarity in each tap indicates the polarity of the tap.

Fig. 3.2 shows the profile of the EE-taps. Each EE-tap consists of multiple taps of reduced duration and variable strength, with strong taps at the edges of the data eye and adequate taps at the center of the data eye, so as to minimize data jitter and maximize vertical eye-opening simultaneously. The duration of each data symbol is partitioned into 4 intervals of identical duration $\tau = T_{sym}/4=25$ ps where $T_{sym}=100$ ps is the symbol time. Consider the voltage of the non-inverting terminal of the slicer. Tap 1 injects a large current into the non-inverting node of the slicer to increase the slope of the voltage. Taps 2 and 3 continue to inject currents into the node to further improve vertical eye-opening. To avoid the difficulty encountered in generating current pulses of narrow pulse width, the durations of $I_1$, $I_2$, and $I_3$ are set to 50 ps, 75 ps, and 100 ps, respectively. The strengths of taps 1, 2, and 3, denoted by $c_{EE,1}$, $c_{EE,2}$, and $c_{EE,3}$, respectively, are given by

$$\begin{aligned} c_{EE,1} &= I_1 + I_2 + I_3, \\ c_{EE,2} &= I_2 + I_3, \\ c_{EE,3} &= I_3. \end{aligned} \qquad (3.1)$$

Equalized data, denoted by  $y_{DT,EE,n}$ , can be written as

$$y_{DT,EE,n} = x_n - \left\{ c_{EE,1} \left[ u(t) - u(t - 2\tau) \right] + c_{EE,2} \left[ u(t - 2\tau) - u(t - 3\tau) \right] + c_{EE,3} \left[ u(t - 3\tau) - u(t - 4\tau) \right] \right\} D_{n-1} s_n, \qquad (3.2)$$

where $u(t)$ is the unit step function and $\tau=25$ ps. Note that the *n*th data cycle is $[0,t_4]$ rather than $[t_1,t_5]$. Eq. (3.2) is valid for both $0\rightarrow 1$ and $1\rightarrow 0$ transitions. For example, for a $0\rightarrow 1$ transition, since $D_{n-1}=-1$, the EE-tap is added to $x_n$ as a current injected to the summing node. Similarly, for a $1\rightarrow 0$ transition, since $D_{n-1}=+1$, the EE-tap is subtracted from $x_n$ as a current drained from the summing node.
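The staircase profile implied by (3.1) and (3.2) can be sketched as a piecewise function of time within one symbol. The code below also checks that the staircase equals the superposition of three current pulses of 50 ps, 75 ps, and 100 ps, as described in Section 3.2. The current values, the function names, and the treatment of the CTLE output as a value `x_t` sampled at time t are illustrative assumptions.

```python
def ee_tap_strength(t_ps, i1, i2, i3, tau_ps=25.0):
    """Bracketed term of (3.2): c_EE,1 on [0, 2*tau), c_EE,2 on [2*tau, 3*tau),
    c_EE,3 on [3*tau, 4*tau), and zero outside the symbol."""
    if t_ps < 0 or t_ps >= 4 * tau_ps:
        return 0.0
    if t_ps < 2 * tau_ps:
        return i1 + i2 + i3      # c_EE,1
    if t_ps < 3 * tau_ps:
        return i2 + i3           # c_EE,2
    return i3                    # c_EE,3


def sum_of_pulses(t_ps, i1, i2, i3):
    """Same profile built from three pulses that all start at t = 0."""
    pulses = ((i1, 50.0), (i2, 75.0), (i3, 100.0))
    return sum(i for i, width in pulses if 0 <= t_ps < width)


def equalize_ee(x_t, t_ps, d_prev, s_n, i1, i2, i3):
    """Eq. (3.2) for a CTLE output value x_t taken at time t_ps in the symbol."""
    return x_t - ee_tap_strength(t_ps, i1, i2, i3) * d_prev * s_n


I1, I2, I3 = 0.5, 0.25, 0.25     # hypothetical currents (arbitrary units)
profile = [ee_tap_strength(t, I1, I2, I3) for t in (10, 60, 80, 100)]
print(profile)
```

Generating the narrow, strong edge tap as the overlap of wider pulses is what avoids the need for a standalone 25 ps current pulse.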

Tap currents are obtained using the current-steering tap generator shown in Fig. 3.3, with their strength set by the tail/head currents. Because current-steering only steers the direction of the current rather than switching the current on and off, the timing constraints of the tap are satisfied.



Figure 3.4: 4-stage digitally controlled differential ring oscillator.

The gating signals that steer the currents of the tap generator are generated from the outputs of a 4-stage digitally controlled differential ring oscillator with its schematic shown in Fig. 3.4. Buffers are added at the output nodes of the oscillator to allow the oscillator to have adequate driving ability. The gating signals of the tap generator are shown in Fig. 3.5 and are given by  $\phi_1 = \phi_{0^\circ} \cdot \phi_{270^\circ}, \phi_2 = \phi_{0^\circ} \cdot \phi_{315^\circ}, \phi_3 = \phi_{0^\circ}, \phi_4 = \phi_{90^\circ} \cdot \phi_{180^\circ}, \phi_5 = \phi_{135^\circ} \cdot \phi_{180^\circ},$ and  $\phi_6 = \phi_{180^\circ}$ .
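As a rough check of the gating expressions above, the following sketch models each oscillator output as an ideal 50%-duty square wave and measures how long each gating signal stays high. It assumes an oscillator period of 200 ps (so that one 45° phase step equals $\tau = 25$ ps); this period is inferred from the stated tap durations and is not given explicitly in the text. All names are hypothetical.

```python
PERIOD_PS = 200.0  # assumed oscillator period: 360 deg corresponds to 2 * T_sym
STEP_PS = 1.0      # time resolution of the numerical check

def phase(theta_deg, t_ps, period=PERIOD_PS):
    """Ideal square wave that is high on [theta, theta + 180 deg)."""
    angle = (360.0 * t_ps / period - theta_deg) % 360.0
    return angle < 180.0

# Gating signals from the text: each is the AND of the listed phases.
GATES = {
    "phi1": (0, 270),
    "phi2": (0, 315),
    "phi3": (0,),
    "phi4": (90, 180),
    "phi5": (135, 180),
    "phi6": (180,),
}

def gate(name, t_ps):
    """Logic level of a gating signal at time t."""
    return all(phase(theta, t_ps) for theta in GATES[name])

def high_time_ps(name):
    """Total time per oscillator period during which the gating signal is high."""
    n = int(PERIOD_PS / STEP_PS)
    return STEP_PS * sum(gate(name, k * STEP_PS) for k in range(n))

durations = {name: high_time_ps(name) for name in GATES}
print(durations)
```

Under this assumption, $\phi_1$, $\phi_2$, and $\phi_3$ are high for 50 ps, 75 ps, and 100 ps in the first half of the period, and $\phi_4$, $\phi_5$, and $\phi_6$ repeat the same pattern in the second half.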

Let us use tap 1 shown in Fig. 3.6(a) to illustrate the implementation of the tap. The gating signals of the tap are  $\phi_1 = \phi_{0^\circ} \cdot \phi_{270^\circ}$  and  $\phi_4 = \phi_{90^\circ} \cdot \phi_{180^\circ}$ . As the top and bottom switches are realized using PMOS and



Figure 3.5: Gating signals for edge-emphasis taps.

NMOS transistors, respectively, the gating signals of the PMOS transistors need to be negated as compared with those in Fig. 3.6(a), as shown in Fig. 3.6(b). The gating signals of the NMOS transistors in Fig. 3.6(b) are the same as those in Fig. 3.6(a). In order to obtain the correct logic for the gating signals, the tables given in Fig. 3.6(d) and Fig. 3.6(e) are utilized. The resultant circuit that generates the gating signals is shown in Fig. 3.6(c). The AND and OR operations are realized using "wired AND" and "wired OR", respectively, in order to meet timing constraints.

Following the same approach, one can show that the gating signals of the remaining sub-taps are given by  $\phi_2 = \phi_{0^{\circ}} \cdot \phi_{315^{\circ}}, \overline{\phi_2} = \phi_{135^{\circ}} + \phi_{180^{\circ}}, \phi_3 = \phi_{0^{\circ}}, \overline{\phi_3} = \phi_{180^{\circ}}, \phi_4 = \phi_{90^{\circ}} \cdot \phi_{180^{\circ}}, \overline{\phi_4} = \phi_{0^{\circ}} + \phi_{270^{\circ}}, \phi_5 = \phi_{135^{\circ}} \cdot \phi_{180^{\circ}}, \overline{\phi_5} = \phi_{0^{\circ}} + \phi_{315^{\circ}}, \phi_6 = \phi_{180^{\circ}}, \overline{\phi_6} = \phi_{0^{\circ}}.$  Fig. 3.7 provides the simplified schematics of taps 2 and 3. The head and tail currents of the taps are provided by the master current source I via current mirrors.
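The complemented gating signals listed above follow from De Morgan's law together with the fact that, in a differential ring oscillator, the inverse of a phase is the phase 180° away. The sketch below derives the OR forms from the AND forms under that assumption; the helper names are hypothetical.

```python
def complement(theta_deg):
    """Inverse of a differential oscillator phase: the phase 180 deg away."""
    return (theta_deg + 180) % 360

# AND-form gating signals from the text: name -> phases that are ANDed.
AND_FORM = {
    "phi1": (0, 270),
    "phi2": (0, 315),
    "phi3": (0,),
    "phi4": (90, 180),
    "phi5": (135, 180),
    "phi6": (180,),
}

def or_form(and_phases):
    """De Morgan: NOT(a AND b) = NOT(a) OR NOT(b), returned as sorted phases."""
    return tuple(sorted(complement(p) for p in and_phases))

# Phases that are ORed to form each complemented gating signal.
OR_FORM = {name: or_form(phases) for name, phases in AND_FORM.items()}
print(OR_FORM)
```

For example, the entry for `phi2` evaluates to the phases 135° and 180°, matching $\overline{\phi_2} = \phi_{135^\circ} + \phi_{180^\circ}$.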



Figure 3.6: Implementation of tap 1.



Figure 3.7: Implementation of taps 2 and 3. The implementation of the current sources is detailed in Fig. 3.6.

## 3.3 DT-DFE with EE-taps

### 3.3.1 Architecture

It is seen in (2.5) that to compute $c_n$, $S_n$ is needed. The determination of $S_n$, however, requires the knowledge of $D_n$, which is not available at the time when DFE is performed. Since $D_n$ has only two possible values, 0 or 1, the difficulty associated with the unavailability of $D_n$ can be resolved using loop-unrolling, which performs DFE for both $D_n=1$ and $D_n=0$ blindly first and selects the appropriate equalized result once $D_n$ becomes available. Since DT-DFE can be performed only after $S_n$ becomes available, loop-unrolling is utilized both in determining $S_n$ and in performing DT-DFE with an EE-tap. Fig. 3.8 shows the simplified schematic of the proposed DT-DFE with loop-unrolling. Paths A and B perform DT-DFE with an EE-tap when a data transition is present; specifically, the result of path A is selected when data are 010101... while that of path B is selected when data are 101010... Paths C and D perform DT-DFE with a constant tap when no data transition is present; specifically, the result of path C is selected when data are consecutive 1s while that of path D is selected when data are consecutive 0s. Because no prior knowledge of $D_n$ is available at the time of executing DFE, incoming data are processed by all four paths. One can use either one CTLE followed by four gain stages or four CTLEs for this. The former is generally preferred for two reasons. First, it presents a small input capacitance at the far end of the channel. Second, the gain of the CTLE at low frequencies is typically not large enough, so an additional gain stage following the CTLE is often needed anyway in order to minimize the possibility of an erroneous decision by the slicer. Using four CTLEs overloads the channel, resulting in a small signal at the input of the CTLEs and, subsequently, a small output of the CTLEs as well. As a result, an additional gain stage following the CTLEs is needed.

### 3.3.2 Loop unrolling for $S_n$

Let us focus on paths C and D first, as the transition indicator $S_n$ generated by them is needed in paths A and B. Assume $D_{n-1}=1$ and a "1" is transmitted, i.e. $D_n D_{n-1}=11$. Path C performs $v_c^+ = v_{o,C}^+ + c_{DT+}$ and $v_c^- = v_{o,C}^- - c_{DT+}$. Similarly, path D performs $v_D^+ = v_{o,D}^+ - c_{DT-}$ and $v_D^- = v_{o,D}^- + c_{DT-}$, where $c_{DT+}$ and $c_{DT-}$ are taps whose value is constant over the entire symbol interval. This differs from the EE-taps in paths A and B whose value varies



Figure 3.8: DT-DFE with an EE-tap when a data transition is present. When no data transition is present, DT-DFE with a constant tap is performed.

with the profile of the data. The reason for this is that paths C and D are for data without transitions while paths A and B are for data with transitions. Since no data transition exists, $S_n=0$ and $\overline{S}_n=1$ (we will show how $S_n$ is determined shortly). DFF 6 samples $D_{n-1}$ and outputs $X_n=1$. The output of path C is selected by multiplexer 2. To validate whether multiplexer 2 makes the correct selection, we make use of Fig. 3.9. The solid lines represent the signals before equalization while the dashed lines represent the equalized signals. Since $D_n=1$ is transmitted, in path C, $v_c^+$ is increased by $c_{DT+}$ from $v_{o,C}^+$ and $v_c^-$ is reduced by $c_{DT+}$ from $v_{o,C}^-$, while in path D, $v_D^+$ is lowered by $c_{DT-}$ from $v_{o,D}^+$ and $v_D^-$ is increased by $c_{DT-}$ from $v_{o,D}^-$. Since the transmitted data is "1", the decision of slicer C, whose input $v_c^+ - v_c^-$ is larger than that of slicer D, is selected. Similarly, one can show that if the transmitted data is "0", the decision of path D should be selected.



Figure 3.9: Selection of equalized data. The polarity box on the waveform indicates the polarity of the signal. The check box at the top-left corner of each sub-figure indicates whether the output of the slicer is selected ( $\sqrt{}$ ) or discarded ( $\times$ ).

Let us now show how $S_n$ is generated. The circuit generating $S_n$ and its operational principle are shown in Fig. 3.8(b). $S_n$ is generated during CLK=0 such that no clock cycle is wasted. Focus on paths C and D. As mentioned earlier, DFE is performed in both paths C and D on every incoming data. Again assume "1" is transmitted. The waveforms of $v_{o,C}^+$, $v_{o,C}^-$, $v_{o,D}^+$, and $v_{o,D}^-$ are sketched in the left sub-figures of Fig. 3.10(a). After the DFE operation, the waveforms of the results, i.e. $v_c^+$, $v_c^-$, $v_D^+$, and $v_D^-$, are also sketched. It is seen that since the received data is "1", the output of the slicer in path C, which is "1", is selected as $v_c^+ - v_c^- > v_D^+ - v_D^-$. If the next transmitted bit is "0", a data transition will exist. The waveforms of $v_{o,C}^+$, $v_{o,C}^-$, $v_{o,D}^+$, and $v_{o,D}^-$ corresponding to "0" are sketched in the right sub-figures of Fig. 3.10(a). After the DFE operation, the waveforms of the results, i.e. $v_c^+$, $v_c^-$, $v_D^+$, and $v_D^-$, are also sketched. It is seen that since the data is "0", the output of the slicer in path D is selected as $v_D^+ - v_D^- > v_c^+ - v_c^-$. To flag the occurrence of the 1 $\rightarrow$ 0 transition, an XOR2 gate that takes the inputs from D<sub>1</sub> and C<sub>2</sub> is used. Similarly, one can show that in order to flag the occurrence of a 0 $\rightarrow$ 1 transition, an XOR2 gate that takes the inputs from C<sub>1</sub> and D<sub>2</sub> is used.
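At the behavioral level, the transition indicator amounts to an XOR of two consecutive decisions. The sketch below is a bit-level abstraction only, under the assumption that $S_n = D_n \oplus D_{n-1}$; it does not model the slicer outputs C<sub>1</sub>, C<sub>2</sub>, D<sub>1</sub>, and D<sub>2</sub> of Fig. 3.10, and the function name is hypothetical.

```python
def transition_indicator(bits):
    """Return (S_n, kind) for each pair of consecutive decisions.

    S_n is 1 when D_n differs from D_{n-1} and 0 otherwise.
    kind is "0->1", "1->0", or "none".
    """
    flags = []
    for prev, cur in zip(bits, bits[1:]):
        s_n = prev ^ cur
        if s_n == 0:
            flags.append((0, "none"))
        elif cur == 1:
            flags.append((1, "0->1"))
        else:
            flags.append((1, "1->0"))
    return flags

result = transition_indicator([1, 1, 0, 0, 1])
print(result)
```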



Figure 3.10: Detection of the occurrence of  $1 \rightarrow 0$  and  $0 \rightarrow 1$  data transitions.

### 3.3.3 Loop unrolling for DT-DFE with EE-tap

Let us now examine paths A and B. It was shown in (2.6) that DFE will be performed only when a data transition is encountered. Since the value of  $S_n$ is unknown at the time of the execution of DFE, loop-unrolling is used. As shown in Figs. 3.2 and 3.3, path A performs DT-DFE with EE-tap for data 010 whereas path B performs DT-DFE with EE-tap for data 101. Since the output of the gain stage is differential, DFE with EE-tap is performed on both the non-inverting and inverting outputs of the gain stage with opposite polarities, as shown in Fig. 3.8(a). The results of paths A and B are selected by multiplexer 1 with its multiplexing signal  $Y_n$  generated by the circuit in Fig. 3.8(b). Let us consider the following cases :

- (i) $S_n=0$ (no data transition). If $Y_{n-1}=0$, $Y_n=0$ will follow. As long as no data transition is present in data, $Y_n$ will remain 0. Multiplexer 3 in this case will select the result from either path C or path D regardless of the result of paths A and B. Whether the result of path C or that of path D will be selected is determined by $X_n$. If $D_{n-1}=0$, i.e. consecutive 0s as $S_n=0$, the result of path D will be selected. Otherwise, the result of path C (consecutive 1s) will be selected. Paths C and D are therefore for consecutive 1s and 0s, respectively. The tap in paths C and D need not be edge-emphasized due to the absence of data transitions.
- (ii) $S_n=1$ (data transitions exist). Multiplexer 3 in this case selects the output from path A or path B. Whether the result of path A or that of path B is selected is determined by $Y_n$. Since an EE-tap spans two UIs, for a rising edge, there are only two possibilities, namely 010 and 011. If data are 010, path A will perform the desired equalization on both the rising and falling edges. If data are 011, although path A will perform equalization on a rising edge first and then on a falling edge, the result of the DFE performed on the falling edge will not be selected as path C will be selected in this case for the consecutive 1s. Similarly, for a falling edge, there are only two possibilities as well, namely 101 or 100. If data are 101, path B will perform the desired equalization on the falling and rising edges. If data are 100, although path B will perform equalization on a falling edge first and then on a rising edge, the result of the DFE performed for the rising edge will not be selected as path D will be selected for the consecutive 0s in this case. One might ask why, since there is only one data transition, DFE is performed on both a rising and a falling edge. The reason for performing DFE on two consecutive transition edges of opposite polarities is to prevent the voltage at the inputs of the slicers from drifting in one direction, which would otherwise be caused by the loop-unrolling operation as the polarity of the unrolled taps remains unchanged.
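The two cases above can be summarized as a small selection function. This is a behavioral sketch only: it assumes that $Y_n=1$ selects path A and $Y_n=0$ selects path B (the polarity is not stated in the text and is inferred from the gating by $Y_{n-1}$ and $\overline{Y}_{n-1}$ in the adaptation of the EE-tap), and the function name is hypothetical.

```python
def select_path(s_n, x_n, y_n):
    """Return the path whose equalized result is taken for the current symbol.

    s_n: transition indicator (1 = transition present)
    x_n: sampled previous decision D_{n-1} (1 or 0)
    y_n: path A/B select (assumed: 1 = path A, 0 = path B)
    """
    if s_n == 0:
        # No transition: constant-tap paths, chosen by the previous decision.
        return "C" if x_n == 1 else "D"
    # Transition present: edge-emphasis paths, chosen by y_n.
    return "A" if y_n == 1 else "B"

for args in [(0, 1, 0), (0, 0, 0), (1, 0, 1), (1, 1, 0)]:
    print(args, select_path(*args))
```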

Since the reference voltages of DT-DFE, $V_{ref+}$ and $V_{ref-}$, are data independent and are set by the low-frequency components of data, they are the output voltages of the gain stage when consecutive 1s and 0s are transmitted in a preprocessing phase prior to data transmission. Two successive approximation register analog-to-digital converters (SAR ADCs) are used to capture $V_{ref+}$ and $V_{ref-}$, as shown in Fig. 3.11. Since the CTLE is differentially configured and transmitted data 1 and 0 are represented by the direction of the output current of the transmitter, $v_{o+}$ and $v_{o-}$ of the gain stage are $V_{ref+}$ and $V_{ref-}$. Two SAR ADCs can therefore be employed to obtain $V_{ref+}$ and $V_{ref-}$ simultaneously. Each SAR ADC consists of a voltage comparator, a SAR, and a charge-redistribution binary-weighted capacitor-array digital-to-analog converter (DAC). To obtain $V_{ref+}$, the corresponding SAR ADC ramps up the output of the DAC such that $V_{ref+} \approx v_{in+}$, where $v_{in+}$ is the voltage of the non-inverting terminal of the comparator. The error between $v_{in+}$ and $V_{ref+}$ is set by the resolution of the DAC and the sensitivity of the comparator. $V_{ref-}$ is obtained in a similar way by digitizing $v_{in-}$.
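The successive-approximation search used to capture a reference voltage can be sketched as follows. This is a generic, idealized SAR model (ideal comparator and DAC), not the circuit of Fig. 3.11; the 1.2 V full-scale value and the function name are assumptions made for illustration.

```python
def sar_convert(v_in, v_fs=1.2, bits=8):
    """Idealized SAR conversion: binary search from the MSB down to the LSB.

    v_fs is an assumed full-scale voltage. Returns the output code and the
    DAC voltage that approximates v_in.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)            # tentatively set this bit
        v_dac = v_fs * trial / (1 << bits)   # ideal binary-weighted DAC output
        if v_in >= v_dac:                    # comparator decision
            code = trial                     # keep the bit
    return code, v_fs * code / (1 << bits)

code, v_dac = sar_convert(0.8)
print(code, v_dac)
```

In this idealized model the residual error is bounded by one LSB, i.e. $1.2/2^8 \approx 4.7$ mV for the assumed full scale, which illustrates why the error is set by the resolution of the DAC.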



Figure 3.11: Determination of reference voltages  $V_{ref+}$  and  $V_{ref-}$  using 8-bit SAR ADCs.

### 3.3.4 Raised reference voltages

Since $V_{ref+}$ and $V_{ref-}$ are data-independent, they can be purposely set higher/lower than the output voltage of the gain stage when consecutive 1s and 0s are conveyed to the channel so as to obtain a larger vertical eye-opening. Raised reference voltages $\hat{V}_{ref+}$ and $\hat{V}_{ref-}$ can be obtained by adjusting the output voltage of the gain stage when consecutive 1s and 0s are transmitted. If incoming data are a long train of consecutive 1s and 0s, the corresponding vertical eye-opening will be $\hat{V}_{ref+}$ and $\hat{V}_{ref-}$, respectively. In reality, consecutive 1s or 0s are surrounded by data with transitions and the number of consecutive 1s and 0s is rather small. For example, if data preceding consecutive 1s are data with transitions, these high-frequency data will be attenuated by channel impairments, resulting in a small vertical eye-opening. When the consecutive 1s arrive, the eye-opening at the time of their arrival will be smaller than the desired $\hat{V}_{ref+}$ and $\hat{V}_{ref-}$. Since the number of consecutive 1s is typically not large, the vertical eye-opening cannot reach $\hat{V}_{ref+}$ and $\hat{V}_{ref-}$ at the end of consecutive 1s and 0s. To increase the vertical eye-opening of low-frequency data, DFE can be employed to force the equalized data to approach the raised reference voltages by adding tap $c_{DT+}$ to and subtracting tap $c_{DT-}$ from the non-inverting and inverting output nodes of the gain stage. The value of the taps is determined by minimizing the difference between the raised reference voltages and the equalized data when a data transition is absent, using the following sign<sup>3</sup>-LMS algorithms

$$c_{DT+,n+1} = c_{DT+,n} - \Delta_{tap} \operatorname{sign}[e_n] \operatorname{sign}[D_{n-1}] \operatorname{sign}[\overline{S}_n], \qquad (3.3)$$

$$c_{DT-,n+1} = c_{DT-,n} - \Delta_{tap} \operatorname{sign}[e_n] \operatorname{sign}[\overline{D}_{n-1}] \operatorname{sign}[\overline{S}_n]. \qquad (3.4)$$

Fig. 3.12 shows the implementation of (3.3) and (3.4). The slicers determine the sign of the difference between the raised reference voltages and the outputs of the gain stage. Two DFFs following the slicers are used for synchronization. Enabling signals $\text{EN}_C = D_{n-1}\overline{S}_n$ and $\text{EN}_D = \overline{D}_{n-1}\overline{S}_n$ activate the charge pumps, while the polarities of the errors $\text{sign}[e_{C,n-1}]$ and $\text{sign}[e_{D,n-1}]$ determine whether the charge pumps should source a current to or sink a current from the load. Resistor R in series with $C_1$ is used to smooth the tuning of the tap while $C_2 \ll C_1$ is used to filter out high-frequency disturbances at the output of the charge pumps.
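A discrete-time sketch of the updates (3.3) and (3.4) is given below. It treats $D_{n-1}$ and $\overline{S}_n$ as enables, in the spirit of $\text{EN}_C$ and $\text{EN}_D$, and it assumes the error is defined as the equalized voltage minus the raised reference, so that a tap grows while the equalized voltage is below the reference. That sign convention, the step size, and the toy voltages are assumptions for illustration, not values from this work.

```python
def sign(x):
    """Two-level sign as produced by a slicer: +1 for x >= 0, else -1."""
    return 1 if x >= 0 else -1

def update_dt_taps(c_plus, c_minus, e_n, d_prev, s_n, delta=1e-3):
    """One step of (3.3)/(3.4) with d_prev and s_n in {0, 1}.

    Taps are updated only when no transition is present (s_n == 0):
    c_plus when the previous decision is 1, c_minus when it is 0.
    """
    if s_n == 0:
        if d_prev == 1:
            c_plus -= delta * sign(e_n)
        else:
            c_minus -= delta * sign(e_n)
    return c_plus, c_minus

# Toy run for consecutive 1s: equalized voltage = v_base + c_plus (assumed model).
v_base, v_ref_hat = 0.30, 0.32
c_plus, c_minus = 0.0, 0.0
for _ in range(100):
    e_n = (v_base + c_plus) - v_ref_hat  # assumed error convention
    c_plus, c_minus = update_dt_taps(c_plus, c_minus, e_n, d_prev=1, s_n=0)
print(c_plus, c_minus)
```

In this toy run the tap climbs in steps of $\Delta_{tap}$ until the equalized voltage reaches the raised reference and then dithers around that value, which is the usual steady-state behavior of sign-based LMS loops.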

The amount by which the reference voltages are raised should not exceed the vertical eye-opening of the received data, as this would cause the tap adaptation to diverge. Consider a high $\hat{V}_{ref+}$ or a large $|\hat{V}_{ref+} - V_{ref+}|$ as an example. When consecutive 1s are present in data, the polarity of the error signal $e_C$ in Fig. 3.12 will be 1 all the time, forcing $c_{DT+}$ to rise continuously. Since path C is selected in this case and $c_{DT+}$ is added to incoming data as a current injected to the non-inverting output node of the gain stage, the voltage of the node



Figure 3.12: Circuit that implements S<sup>3</sup>-LMS algorithms to obtain  $c_{DT+}$  and  $c_{DT-}$ .

will rise continuously. It will eventually split the two differential inputs of the downstream slicer and cause erroneous decisions. A similar analysis can be done for a low $\hat{V}_{ref+}$ as well.

### 3.3.5 Determination of EE-tap

Since the result of path A or that of B will be chosen when data are 010101... and 101010..., respectively, the EE-tap of the DFE in paths A and B is determined by transmitting data of two consecutive transitions of opposite directions and obtained using the following sign<sup>4</sup>-LMS algorithm that minimizes the power of the difference between the desired and equalized data measured at the center of data eyes

$$\begin{aligned} c_{EE,n+1} = c_{EE,n} &- \Delta_{tap} \operatorname{sign}[e_{A,n}] \operatorname{sign}[D_{n-1}] \operatorname{sign}[S_n S_{n-1}] \operatorname{sign}[Y_{n-1}] \\ &- \Delta_{tap} \operatorname{sign}[e_{B,n}] \operatorname{sign}[D_{n-1}] \operatorname{sign}[S_n S_{n-1}] \operatorname{sign}[\overline{Y}_{n-1}]. \end{aligned} \qquad (3.5)$$

Since the EE-tap $c_{EE}$ is made of three taps per data transition with their strength set by currents $I_1$, $I_2$, and $I_3$, two options are at our disposal to construct the EE-tap: (i) Option 1: Three LMS algorithms, one for each of the three taps. In this case, three DFE engines are needed, each with its own error signal measured at appropriate time instants. (ii) Option 2: One LMS algorithm with a set of fixed weighting factors for the three taps and the error signal measured at the center of data eyes. In this case, only one DFE engine is needed. One difficulty encountered in deploying option 1 is that the strengths of taps 1, 2, and 3 are given by $I_1 + I_2 + I_3$, $I_2 + I_3$, and $I_3$, respectively, so the taps cannot be tuned independently: adjusting $I_3$ affects all three taps and adjusting $I_2$ affects taps 1 and 2. The basis for option 2 is that since the vertical eye-opening of equalized data typically follows a sinusoidal profile, the desired weight of each tap can be pre-assigned. Advantages of this approach are both reduced cost and freedom in assigning an appropriate weight for each tap. In this work, option 2 was chosen.

Fig. 3.13 shows the circuit that implements (3.5). Since paths A and B use the same tap, the adaptation of $c_{EE}$ can be performed using both $v_{o,A}$ and $v_{o,B}$ or only one of them. In this study, both signals are used for updating $c_{EE}$ so as to achieve fast convergence. $v_{o,A}$ and $v_{o,B}$ are weighted equally, and signals $Y_{n-1}$ and $\overline{Y}_{n-1}$ are used to separate the two adaptation processes. Let us use $v_{o,A}$ as an example. Assume $v_{o,A}^+=1$. Since $c_{EE}$ is updated when two consecutive data transitions of opposite directions are present, i.e. $S_nS_{n-1}=1$, we have $D_{n-1}=0$. In order to select charge pump $\operatorname{CP}_{EE,A+}$, $D_{n-1}$ should be used to select $\operatorname{CP}_{EE,A+}$ and de-select $\operatorname{CP}_{EE,A-}$, i.e. $\operatorname{EN}_{EE,A+} = S_nS_{n-1}D_{n-1}$. Similarly if $v_{o,A}^+=0$, $D_{n-1}=0$ will follow. $\overline{D}_{n-1}$ should be used to select $\operatorname{CP}_{EE,A-}$ and de-select $\operatorname{CP}_{EE,A+}$. As a result, $\operatorname{EN}_{EE,A-} = S_nS_{n-1}\overline{D}_{n-1}$. Since $c_{EE}$ is used not only for both paths A and B but also for both "1" and "0", charge pumps $\operatorname{CP}_{EE,A+}$ and $\operatorname{CP}_{EE,A-}$ should drive the same low-pass filter whose output voltage is $c_{EE}$, as shown in Fig. 3.13. For the same reason, charge pumps $\operatorname{CP}_{EE,B+}$ and $\operatorname{CP}_{EE,B-}$ drive the same low-pass filter.



Figure 3.13: Circuit that implements S<sup>4</sup>-LMS algorithms to obtain  $c_{EE}$ .

## 3.4 Clock and Data Recovery

Equalized data are used to recover both clock and data. Two approaches, namely phase-tracking and phase-picking, are at our disposal for clock and data recovery (CDR) [44, 45]. The former uses a phase-locked loop that locks to the edges of equalized data to synchronize a locally generated clock with the incoming data, whereas the latter locates the edges of equalized data via oversampling. Phase-tracking offers the advantage of resilience to the impact of PVT uncertainty but suffers from a large latency, and is not particularly attractive when the data rate is high. Phase-picking, on the other hand, features



Figure 3.14: Top-left : Edge generator. Top-right : Edge accumulator. Bottom : Phase mapping between the phase of incoming data and that of EE-tap gating signals. Legends :  $0^{\circ}$ ,  $45^{\circ}$ ,... : The phase of incoming data.  $\hat{0}^{\circ}$ ,  $\hat{45}^{\circ}$ ,... : The phase of EE-tap gating signals.

rapid edge location attributable to its open-loop operation but suffers from a finite phase error and is sensitive to the impact of PVT uncertainty. In this study, a blind oversampling phase-picking CDR approach similar to that given in [46] is used. Each data symbol is sampled 4 times with a phase resolution of 45°. The edge generator, with its implementation shown in Fig. 3.14, locates the edges of data. In order to make the selector less sensitive to erroneous decisions of the edge generator caused by disturbances coupled into the samplers, edge accumulators implemented using counters and functioning as low-pass filters are employed. Since the CDR locates the rising edge of the data while the gating signals in Fig. 3.5 have a specific phase arrangement, as shown in the figure, a phase selector that selects appropriate outputs of the digitally controlled oscillator (DCO) as the gating signals of the EE-tap is needed. Once the edge of data is located, the clock is recovered. The clock signal that is $90^{\circ}$ away from the located edge will be used to recover data. In this study, edge information will be updated only when two consecutive edge locations are identical (count=2). Such an arrangement is important as it minimizes the impact of any erroneous decision of the samplers.
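The edge location and accumulation steps can be sketched behaviorally as follows. The sketch assumes 8 DCO phases spaced 45° apart, so that 90° corresponds to two phase steps, and it models the accumulator as a simple agreement counter; the class and function names are hypothetical and the code does not reproduce the circuits of Fig. 3.14.

```python
N_PHASES = 8              # assumed: 8 DCO phases at 45 deg spacing
QUARTER = N_PHASES // 4   # 90 deg expressed in phase steps

def find_edge(samples):
    """Index of the first sample that differs from its predecessor, or None."""
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            return i
    return None

class EdgeAccumulator:
    """Updates the held edge only after `required` identical consecutive detections."""

    def __init__(self, required=2):
        self.required = required
        self.edge = None
        self._candidate = None
        self._count = 0

    def update(self, detected):
        if detected is None:
            return self.edge
        if detected == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = detected, 1
        if self._count >= self.required:
            self.edge = detected
        return self.edge

def data_phase(edge):
    """Phase index used to sample data: 90 deg after the located edge."""
    return (edge + QUARTER) % N_PHASES

acc = EdgeAccumulator()
history = [acc.update(e) for e in (1, 3, 1, 1, 2, 1)]
print(history, data_phase(acc.edge))
```

In this run the isolated detections 3 and 2 never reach a count of 2, so they do not disturb the held edge, which illustrates the low-pass behavior attributed to the edge accumulators.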

## 3.5 Simulation Results

The proposed DT-DFE with EE-taps was designed in an IBM 65 nm 1.2 V CMOS technology and analyzed using Spectre with BSIM 4.4 device models. The channel under test is a backplane channel that has a loss of approximately 24 dB at 5 GHz [42]. Fig. 3.15 shows the time and frequency responses of the channel.



Figure 3.15: Time and frequency responses of the backplane channel. The input for the time-domain response is a 200 mV pulse of 100 ps pulse width measured at the near end of the channel.

### 3.5.1 5 Gbps Serial Link

To verify the effectiveness of the proposed DT-DFE with EE-taps, a 5 Gbps link with a backplane channel whose characteristics are depicted in Fig. 3.15 is equalized with a DT-DFE with an EE-tap. The structure of the EE-tap is shown in Fig. 3.16. The EE-tap is made of 3 taps before the data transition, 3 taps after the data transition, and a large tap at the data transition. Note the polarity of the taps. A large tap is needed at the edge of data eyes in order to generate sharp transitions in the equalized voltages $v_e^+$ and $v_e^-$. No tap is placed at the center of data eyes purposely, as in this example we want to explore the impact of the EE-tap and the consequence of the absence of taps at the center of data eyes. The duration of the taps of the EE-tap is set to $T_{sym}/8$ and the strengths of the taps are set manually with a sinusoidal profile for reasons explained earlier. The gating signals are from ideal clocks. No attempt was made to raise the reference voltages. For the purpose of comparison, DS-DFE and DT-DFE with a constant tap are also used to equalize the channel.



Figure 3.16: DT-DFE with an EE-tap. The EE-tap consists of 7 taps of different strength.

The simulated results are given in Fig. 3.17. It is seen in Fig. 3.17(d,e,f) that the tap of DS-DFE is activated in every data eye while that of DT-DFE is only activated when a data transition is present. Both taps are constant over their respective time interval. The duration of the DS-DFE is set by



Figure 3.17: Comparison of DS-DFE, DT-DFE with a constant tap, and DT-DFE with an EE-tap. The light-color traces in (a,b,c) are unequalized data.

the number of consecutive 1s and 0s while that of the DT-DFE is only one symbol time. The unequalized and equalized data with DS-DFE are compared in Fig. 3.17(c). It is seen that DS-DFE improves the vertical eye-opening of high-frequency data at the expense of reduced vertical eye-opening of low-frequency data. The improvement in transition slopes is rather marginal. The unequalized and equalized data with DT-DFE of a constant tap are compared in Fig. 3.17(b). It is seen that since DT-DFE is activated when a transition is present in data (high-frequency data), the vertical eye-opening is comparable to that with DS-DFE. No reduction in the vertical eye-opening of low-frequency data is observed. Both DS-DFE and DT-DFE with a constant tap marginally improve transition slopes for reasons detailed earlier. The unequalized and equalized data with DT-DFE of the EE-tap are compared in Fig. 3.17(a). It is seen that DT-DFE with the EE-tap sharply improves transition slope. It also improves the vertical eye-opening of high-frequency



Figure 3.18: Eye diagrams of unequalized and equalized data with DS-DFE, DT-DFE with a constant tap, and DT-DFE with an EE-tap.

data without sacrificing the vertical eye-opening of low-frequency data.

Table 3.1: Performance comparison of DFEs with different settings.

|                 | CTLE | DS-DFE (Constant tap) | DT-DFE (Constant tap) | DT-DFE (EE-tap) |
|-----------------|------|-----------------------|-----------------------|-----------------|
| V-opening [mV]  | 120  | 182 (+51.7%)          | 241 (+32.4%)          | 193 (-19.9%)    |
| H-opening [ps]  | 158  | 171 (+8.2%)           | 176 (+2.9%)           | 191 (+8.5%)     |
| Jitter [ps]     | 42.5 | 28.3 (-33.4%)         | 23.6 (-16.6%)         | 9.58 (-59.4%)   |
| Slope [mV/ps]   | 1.08 | 1.36 (+25.9%)         | 1.78 (+30.9%)         | 4.00 (+124.7%)  |

Fig. 3.18 compares the eye diagram of equalized data with DS-DFE, DT-DFE of a constant tap, and DT-DFE of an EE-tap. A quantitative comparison of them is provided in Tables 3.1 and 3.2. It is seen from Table 3.1 that DT-DFE with the EE-tap improves horizontal eye opening (H-opening)

|           | CTLE | DS-DFE (Constant tap) | DT-DFE (Constant tap) | DT-DFE (EE-tap) |
|-----------|------|-----------------------|-----------------------|-----------------|
| V-opening | 1    | 1.52                  | 2.01                  | 1.61            |
| H-opening | 1    | 1.08                  | 1.11                  | 1.21            |
| Jitter    | 1    | 0.67                  | 0.56                  | 0.26            |
| Slope     | 1    | 1.26                  | 1.65                  | 3.70            |

Table 3.2: Performance comparison normalized to CTLE output.

by 8.5% over DT-DFE with the constant tap. The vertical eye opening is reduced by 19.9% due to the absence of taps at the center of data eyes. This observation reveals the importance of having taps at the center of data eyes in order to improve vertical eye-opening. DT-DFE with the EE-tap improves edge slope by 124.7%. Table 3.2 tabulates the performance improvement obtained from DT-DFE normalized to that of CTLE. It is seen that the transition slope of the data equalized by the proposed DT-DFE with the EE-tap is 3.7 times that of the CTLE. The data jitter of the data equalized by the proposed DT-DFE with the EE-tap is 0.26 times that of the CTLE.
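The percentages in parentheses in Table 3.1 appear to be computed relative to the column immediately to the left rather than relative to the CTLE baseline. The sketch below reproduces them from the tabulated values under that reading; the variable and function names are hypothetical.

```python
# Values copied from Table 3.1, ordered as:
# CTLE, DS-DFE (constant tap), DT-DFE (constant tap), DT-DFE (EE-tap).
TABLE_3_1 = {
    "V-opening [mV]": [120, 182, 241, 193],
    "H-opening [ps]": [158, 171, 176, 191],
    "Jitter [ps]": [42.5, 28.3, 23.6, 9.58],
    "Slope [mV/ps]": [1.08, 1.36, 1.78, 4.00],
}

def step_changes(row):
    """Percentage change of each entry relative to the entry to its left."""
    return [round(100.0 * (b - a) / a, 1) for a, b in zip(row, row[1:])]

changes = {metric: step_changes(row) for metric, row in TABLE_3_1.items()}
for metric, values in changes.items():
    print(metric, values)
```

Read this way, the 8.5% improvement in H-opening and the 19.9% reduction in V-opening quoted above are both measured against DT-DFE with the constant tap.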

### 3.5.2 10 Gbps Serial Link with 4x-oversampling CDR

The second example is a 10 Gbps serial link with the same channel and 4x-oversampling CDR. The channel is equalized using the proposed DT-DFE with the EE-tap. The structure of the EE-tap was detailed in Figs. 3.2 and 3.3 and its operation is depicted in Fig. 3.8.

Fig. 3.19 (top-left) plots the eye diagram of unequalized data. Fig. 3.19 (top-right) shows the eye diagram of equalized data with no raised references. Fig. 3.19 (bottom-left) shows the eye diagram of equalized data with 20 mV raised reference voltages and phase selection in CDR switched off. Fig. 3.19 (bottom-right) shows the eye diagram of equalized data with 20 mV raised reference voltages and phase selection in CDR switched on. It is seen that the low-frequency components of the eye diagrams of Fig. 3.19 (bottom) are 20 mV above and below those of Fig. 3.19 (top-right).



Figure 3.19: Eye diagrams of unequalized data and equalized data with 3 different settings.

A quantitative comparison of the eye diagrams is provided in Tables 3.3 and 3.4. It is seen in Table 3.3 that DT-DFE with the EE-tap improves vertical eye-opening by 146% without raising the reference voltages. When $V_{ref+}$ and $V_{ref-}$ are increased and decreased by 20 mV, respectively, vertical eye-opening is further increased by 60%. Also observed is that DT-DFE with the EE-tap improves data jitter by 59% without raising the reference voltages. When $V_{ref+}$ and $V_{ref-}$ are increased and decreased by 20 mV, respectively, without phase-picking, data jitter is further improved by 13.7%. Once phase-picking is performed, jitter is further improved by 42%. Table 3.4 tabulates the performance improvement obtained from DT-DFE with the EE-tap normalized to CTLE. It is seen that with a 20 mV increase/decrease of the reference voltages, vertical eye-opening is increased by 3.9 times, data jitter is reduced by 4.76 times, and the slope of transition edges is improved by 2.83 times.

|                 | CTLE | DT-DFE (EE-tap) | DT-DFE (EE-tap, 20 mV raised refs, w/o phase-picking) | DT-DFE (EE-tap, 20 mV raised refs, w/ phase-picking) |
|-----------------|------|-----------------|-------------------------------------------------------|------------------------------------------------------|
| V-opening [mV]  | 42.3 | 104 (+146%)     | 166 (+60%)                                            | 165 (0%)                                             |
| Jitter [ps]     | 24.8 | 10.2 (-59%)     | 8.8 (-13.7%)                                          | 5.1 (-42%)                                           |
| Slope [mV/ps]   | 0.53 | 1.04 (+96.2%)   | 1.38 (+32.7%)                                         | 1.50 (+8.7%)                                         |

Table 3.3: Performance comparison of DT-DFE with EE-tap.

Table 3.4: Performance comparison normalized to CTLE output.

|           | CTLE | DT-DFE (EE-tap) | DT-DFE (EE-tap, 20 mV raised refs, w/o phase-picking) | DT-DFE (EE-tap, 20 mV raised refs, w/ phase-picking) |
|-----------|------|-----------------|-------------------------------------------------------|------------------------------------------------------|
| V-opening | 1    | 1.46            | 3.92                                                  | 3.90                                                 |
| Jitter    | 1    | 0.41            | 0.35                                                  | 0.21                                                 |
| Slope     | 1    | 1.04            | 2.60                                                  | 2.83                                                 |



Figure 3.20: Output voltage of the charge pumps tuning  $c_{DT+}$ ,  $c_{DT-}$ , and  $c_{EE}$ . Top : Adaptation without raised references. Bottom : Adaptation with 20mV raised references.

Fig. 3.20 shows the adaptation process of $c_{DT+}$, $c_{DT-}$, and $c_{EE}$ in terms of the output voltages of the charge pumps that tune $c_{DT+}$, $c_{DT-}$, and $c_{EE}$, respectively. It is seen that adaptation completes in approximately 10 ns.

Fig. 3.21 plots the waveform of the transmitted data that are the inputs of



Figure 3.21: Waveforms of transmitted, received, and equalized data.

the transmitter and that of the output of the CTLE. The impact of channel impairment is evident. The waveforms of equalized data in paths A, B, C, and D are also plotted. As mentioned earlier, paths A, B, C, and D operate on each incoming data and the one that is selected after the equalization operation is chosen by $X_n$, $Y_n$, and $S_n$. In order to construct the eye diagram of equalized data, the equalized data of paths A, B, C, and D and their selection are processed in Matlab and the result is plotted at the bottom of



Figure 3.22: Waveforms of transmitted, received, and recovered data.

Fig. 3.21. It is seen that the logic selects the correct equalized data.

Fig. 3.22 plots the waveform of recovered data along with transmitted and equalized data for reference. Note that a delay exists between the equalized data and recovered data, simply due to the propagation delay of the logic gates for CDR. It is seen that the decision selection logic selects the correct decision.

### 3.6 Summary

An adaptive DT-DFE with edge-emphasis taps and raised references was proposed. The distinct characteristics and common drawbacks of DS-DFE and DT-DFE were investigated. This was followed by an in-depth investigation of DT-DFE with EE-taps, which is capable of improving both the transition edge slope and the vertical eye-opening of equalized data simultaneously. Loop-unrolling, normally used to combat the timing constraints of data-state DFE, was further developed to both obtain the data transition indicator and execute DT-DFE with EE-taps. The data-independent characteristics of the taps of DT-DFE reveal that the vertical eye-opening of equalized data with DT-DFE is set by the low-frequency components rather than the high-frequency components of data. This distinct characteristic of DT-DFE allows the reference voltages, with respect to which the error signals of DFE are defined, to be raised to values beyond those set by the low-frequency components of data. As a result, the vertical eye-opening of equalized data is increased. Clock and data recovery was performed using blind 4x oversampling and phase-picking. The effectiveness of the proposed DT-DFE with EE-taps was validated using the simulation results of a 10 Gbps backplane serial link with a channel loss of 24 dB at 5 GHz. The link was designed in a TSMC 65 nm 1.2 V CMOS technology. Simulation results showed that with 20 mV raised reference voltages, DT-DFE with EE-tap increases the vertical eye-opening of unequalized data by 3.9 times, lowers data jitter by 4.86 times, and increases transition slope by 2.83 times.

# Chapter 4

# All-Digital Frequency-Locking using Bi-Directional Gated Delay Line Frequency Difference-to-Digital Converters

Frequency-locking plays a critical role in clock and data recovery (CDR) of serial data links. As data received at the receiver end are not periodic but rather random, to ensure phase-locking, the receiver oscillator needs to oscillate at the baud-rate frequency of the incoming data, which is achieved via a frequency-locked loop (FLL) locked to a reference clock at the baud-rate frequency [47]. In this chapter, two low-power all-digital frequency difference-to-digital converters suitable for clock and data recovery of serial links are presented. In this work, the proposed ADFLLs provide the frequency-locked gating signals for the EE-DT-DFE introduced in the preceding chapter. Section 4.1 briefly reviews techniques for frequency difference detection. Section 4.2 introduces a pre-skewed bi-directional gated delay line (BDGDL) bang-bang frequency difference-to-digital converter. The effectiveness of the proposed bang-bang frequency difference detector is validated using the simulation results of a 5-GHz frequency-locked loop. Section 4.3 introduces a BDGDL-based integrating frequency difference-to-digital converter (iFDDC). The iFDDC can detect and digitize the frequency difference between two GHz signals at the same time with ultra-low power consumption. The effectiveness of the proposed iFDDC is validated using the simulation results of a 5-GHz frequency-locked loop. The chapter is summarized in Section 4.4.

### 4.1 Frequency Difference Detection

The difference between the frequencies of two square waves can be detected using a dual-DFF phase/frequency detector. These frequency difference detectors, however, cannot operate at high frequencies due to the excessive latency of the feedback reset operation. The dual frequency-to-voltage conversion scheme, initially proposed in [48] and improved in [49], achieves frequency difference detection by first performing frequency-to-voltage conversion on both the oscillator whose frequency is to be adjusted and the reference clock whose frequency is constant. The difference between the outputs of the frequency-to-voltage converters (FVCs) is then sensed by a high-gain error amplifier whose output adjusts the frequency of the oscillator. Although robust and effective, this approach suffers from a steady state frequency error arising from the mismatch between the FVCs and the offset voltage of the error amplifier. In addition, the finite bandwidth of the error amplifier makes it not particularly suitable for high-speed applications. Counter-based digital frequency detectors are widely popular in all-digital FLLs due to their simplicity and reliable operation [50]. These frequency detectors, however, suffer from an intrinsic speed bottleneck set by the counters. High-speed frequency difference detectors, despite their critical importance to Gb/s serial links, are not available.

# 4.2 BDGDL Bang-Bang Frequency Difference Detector

#### 4.2.1 BDGDL

The BDGDL is shown in Fig. 4.1 [51]. Digital signals will propagate rightwards if gating signal $T_{in} > 0$ or leftwards otherwise. The left-most node is connected to $V_{DD}$. The charge of the load capacitor at the end of the $k$th sampling cycle is retained and carried over in its entirety to the next sampling cycle, thereby allowing gating pulses to be accumulated via charge accumulation. The lower bound of the gating signal is the minimum amount of time from the instant at which a gating signal is applied to a gated transistor to the instant at which the channel current of the transistor starts to flow. Simulation results show that the minimum gating signal is in the range of a few picoseconds and is therefore not a concern for most applications.



Figure 4.1: Left: bi-directional gated delay cell (BDGDC). Right: bi-directional gated delay line. Legends : RS - Right-shift. LS - left-shift.

#### 4.2.2 Pre-Skewed BDGDL

The per-stage-delay of the BDGDL is lower-bounded by technology. FO1 and FO4 of a typical 65 nm 1 V CMOS technology are 9 ps and 16 ps, respectively.

To lower the per-stage-delay, pre-skewing, which is effective in increasing the frequency of ring oscillators by pre-charging/pre-discharging the load capacitor of the current stage using the outputs of earlier stages, is used, as shown in Fig. 4.2 [52]. Fig. 4.3 shows the dependence of per-stage-delay on the number of pre-skewed inverters per stage. It is seen that the reduction in per-stage-delay diminishes nonlinearly as the number of pre-skewed inverters increases. The power consumption, on the other hand, rises linearly with the number of pre-skewed inverters. In this work, one pre-skewed inverter per inverter, as shown in Fig. 4.2, is used so as to strike a balance between speed and power/silicon consumption.



Figure 4.2: Left: pre-skewing in a delay line. Right: pre-skewed BDGDL. Pre-skewed inverters are highlighted.

#### 4.2.3 Frequency Detection

Fig. 4.4 shows the simplified schematic of the proposed bang-bang frequency detector. Two identical M-stage BDGDLs are used to digitize a logic-1 DCO pulse and a logic-1 reference pulse, with the pulses to be digitized serving as the gating signals of the BDGDLs. The resultant digital outputs $P_M...P_1$ and $R_M...R_1$ are the digital representations of the respective pulse widths. With a 50% duty cycle, they also represent the frequency of the DCO and that



Figure 4.3: Dependence of per-stage-delay (left) and power (right) on the number of pre-skewed inverters.



Figure 4.4: Schematic of BDGDL frequency detector.

of the reference. $P_M...P_1$ and $R_M...R_1$ are stored by resettable positive-edge-triggered DFFs ($Q_M...Q_1$ and $S_M...S_1$). The BDGDLs perform right-shift when their gating signal is 1 and left-shift when it is 0. Since the DFFs are positive-edge-triggered, their output will be set only when the output of the gated stage goes from 0 to 1. It will remain unchanged when the output of the gated stage goes from 1 to 0. As a result, the left-shift operation when the gating signal is 0 does not affect the output of the DFFs. Rather, it resets the BDGDLs thereby preparing them for the next measurement cycle, as shown



Figure 4.5: Timing diagram of BDGDL frequency detector.

in Fig. 4.5. The DFFs are reset only after SAR iterations are completed. The reset command is therefore the same as the triggering signal of the SAR, which is synchronized with the reference clock. The left-shifting strength of the BDGDL is made stronger than the right-shifting strength to guarantee that all signals on the delay line are cleared during the negative pulses of REF and DCO. The duty cycle of both the DCO and the reference must be 50%. The frequency range of the DCO is given by $0.5f_{REF} < f_{DCO} < 2f_{REF}$. Therefore, a complete pulse of the DCO can be guaranteed in 3 cycles of the reference clock. With an additional clock cycle for DFFs and logic operations, 4 cycles of the reference clock are needed to complete frequency detection. The SAR logic need not be faster than $f_{REF}/4$.

Let the per-stage-delay of the BDGDLs be $\tau$. The outputs of the BDGDLs, which are the digitized versions of the pulse width of the DCO and that of the reference clock, can be expressed as $\lfloor \frac{0.5T_{DCO}}{\tau} \rfloor$ and $\lfloor \frac{0.5T_{REF}}{\tau} \rfloor$, respectively, where $\lfloor \cdot \rfloor$ denotes the floor function. We therefore have

$$\sum_{i=1}^{M} Q_i = \left\lfloor \frac{1}{2\tau f_{REF}} \right\rfloor,\tag{4.1}$$

$$\sum_{i=1}^{M} S_i = \left\lfloor \frac{1}{2\tau f_{DCO}} \right\rfloor.\tag{4.2}$$

Also,  $M > \left\lfloor \frac{1}{2\tau f_{REF}} \right\rfloor$  is required if  $T_{DCO} > T_{REF}$ .
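As a rough numerical illustration of (4.1) and (4.2), the following sketch computes the digitized logic-1 pulse width of a 50% duty-cycle clock. The function name, the saturation of the count at M stages, and the 6.25 GHz DCO frequency are assumptions made for illustration only; $\tau$ = 10 ps, M = 15, and $f_{REF}$ = 5 GHz are the values used in Section 4.2.5.

```python
import math

def digitized_pulse_width(freq_hz, tau_s, stages):
    """Count of BDGDL stages traversed during one logic-1 pulse (50% duty cycle)."""
    half_period = 0.5 / freq_hz
    # Small guard so that exact multiples of tau are not lost to float round-off.
    count = math.floor(half_period / tau_s + 1e-9)
    # Assumption: the count saturates once all M stages have been traversed.
    return min(count, stages)

TAU = 10e-12   # per-stage-delay
M = 15         # number of stages

ref_count = digitized_pulse_width(5e9, TAU, M)     # floor(100 ps / 10 ps)
dco_count = digitized_pulse_width(6.25e9, TAU, M)  # floor(80 ps / 10 ps)
print(ref_count, dco_count)
```

A DCO running faster than the reference yields the smaller count, which is the condition the per-stage comparison that follows has to detect.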

| Q <sub>i</sub> | S <sub>i</sub> | C <sub>i</sub> | Freq. comparison |
|----------------|----------------|----------------|------------------|
| 1              | 1              | 1              | Same frequency   |
| 1              | 0              | 0              | DCO faster       |
| 0              | 1              | 1              | DCO slower       |
| 0              | 0              | 1              | Same frequency   |

Figure 4.6: Frequency relationship implied by per-stage pulse width detection results.

Fig. 4.6 shows the truth table for single-bit comparison. It can be derived that $C_i = \overline{Q_i \cdot \overline{S_i}}$. Any 0-bit in $C_M \dots C_1$ indicates that the pulse width of the DCO is smaller. In this case, COMP=0, the SAR assumes 0 for the bit of the current iteration, and the DCO slows down. If the DCO is slower than or equal to the reference clock, the pulse width of the DCO will be greater than or equal to that of the reference clock. All the bits of $C_i$ will be 1 and COMP=1. The SAR in this case will assume 1 for the bit of the current iteration and the DCO speeds up.
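The per-stage comparison can be sketched in a few lines. The sketch assumes, following (4.1) and (4.2), that $Q_i$ holds the digitized reference pulse and $S_i$ the digitized DCO pulse, and that both are thermometer codes; the helper names are hypothetical.

```python
def thermometer(count, stages):
    """Thermometer code with `count` ones, stage 1 first."""
    return [1 if i < count else 0 for i in range(stages)]

def comp(q_bits, s_bits):
    """COMP = AND over all C_i, where C_i = NOT(Q_i AND NOT S_i)."""
    c_bits = [1 - (q & (1 - s)) for q, s in zip(q_bits, s_bits)]
    return int(all(c_bits))

M = 15
q = thermometer(10, M)                # reference pulse spans 10 stages
print(comp(q, thermometer(8, M)))     # DCO pulse shorter, i.e. DCO faster
print(comp(q, thermometer(10, M)))    # equal pulse widths
print(comp(q, thermometer(12, M)))    # DCO pulse longer, i.e. DCO slower
```

Only the first case contains a stage with $Q_i = 1$ and $S_i = 0$, so it is the only one for which COMP is 0.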

Fig. 4.7 shows the simplified schematic of the redundant SAR logic used in this work. Fig. 4.8 shows the schematic of the 4-stage DCO and the digitally controlled varactor (DCV). A simple driving-strength coarse-controlled DCV is used since this work focuses on frequency detection and locking rather than on the DCO.

Let  $T_{COMP}$  be the time needed to complete frequency detection and adjustment in each iteration. Since 4 cycles of the reference clock are needed



Figure 4.7: Schematic of a K-bit redundant successive approximation register.



Figure 4.8: Schematic of the DCO and varactor.

for frequency detection,  $T_{COMP}$  has the minimum value of  $4/f_{REF}$ . The minimum lock time is therefore given by

$$K \cdot T_{COMP} = 4KT_{REF}.\tag{4.3}$$

The actual lock time is larger than  $4KT_{REF}$  due to the time required by the SAR logic and the response time of the DCO.
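The interaction between the bang-bang decision and the SAR can be illustrated with a behavioral sketch. It is not the redundant SAR of Fig. 4.7: it assumes a conventional MSB-first binary search, a linear DCO tuning curve in which a larger code means a higher frequency, and ideal floor quantization of the pulse widths. The numerical values are those of Section 4.2.5, and all function names are hypothetical.

```python
import math

def pulse_count(freq_hz, tau_s):
    """Digitized logic-1 pulse width, floor(0.5 * T / tau)."""
    return math.floor(0.5 / freq_hz / tau_s + 1e-9)

def sar_lock(f_ref, f_min, f_step, k_bits, tau_s, cycles_per_iter=4):
    """Conventional MSB-first SAR search driven by the bang-bang decision."""
    code = 0
    for bit in reversed(range(k_bits)):
        trial = code | (1 << bit)
        f_dco = f_min + trial * f_step   # assumed linear DCO tuning curve
        # COMP = 1 when the DCO pulse is at least as wide as the reference pulse.
        comp = pulse_count(f_dco, tau_s) >= pulse_count(f_ref, tau_s)
        if comp:
            code = trial                 # keep the bit, so the DCO speeds up
    lock_time = k_bits * cycles_per_iter / f_ref
    return code, f_min + code * f_step, lock_time

K = 6
F_MIN, F_MAX = 3.95e9, 6.44e9
code, f_lock, t_lock = sar_lock(5e9, F_MIN, (F_MAX - F_MIN) / 2**K, K, 10e-12)
print(code, f_lock / 1e9, t_lock * 1e9)
```

With 4 reference cycles per iteration the sketch returns the minimum lock time of (4.3); passing `cycles_per_iter=8` models a more conservative allocation.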

#### 4.2.4 Frequency Error

Fig. 4.9 shows the block diagram of the ADFLL. Bang-bang frequency detection is realized using two BDGDLs and generic logic. The output of the bang-bang frequency detector is fed to an up/down successive approximation register (SAR) whose output is the frequency control word (FCW) of the digitally controlled oscillator (DCO). For a K-bit DCO, a total of K SAR operations are performed to adjust the FCW of the DCO.



Figure 4.9: Block diagram of proposed FLL.

A frequency error $f_E = f_{LOCK} - f_{REF}$ exists at the end of SAR iterations. This frequency error arises from (i) the non-zero per-stage-delay of the BDGDLs, which generates the quantization error of the BDGDLs, and (ii) the non-zero frequency tuning step of the DCO. If the frequency tuning range of the DCO is given by $\Delta f$, with a K-bit FCW, the quantization error of the DCO is given by $\frac{\Delta f}{2^K}$, assuming $2^K + 1$ points or $2^K$ segments are included in the tuning range. The upper and lower bounds of the frequency error caused by the non-zero per-stage-delay of the BDGDLs, denoted by $f_e^+$ and $f_e^-$, respectively, satisfy

$$f_{REF} + f_e^+ = \frac{1}{2\left(\frac{1}{2f_{REF}} - \tau\right)}\tag{4.4}$$

$$f_{REF} - f_e^{-} = \frac{1}{2\left(\frac{1}{2f_{REF}} + \tau\right)}\tag{4.5}$$

from which we obtain

$$f_e^+ = \frac{2\tau f_{REF}}{1 - 2\tau f_{REF}} f_{REF},\tag{4.6}$$

$$f_{e}^{-} = \frac{2\tau f_{REF}}{1 + 2\tau f_{REF}} f_{REF}.\tag{4.7}$$
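The intermediate step from (4.4) to (4.6) can be written out as follows; the same manipulation with $+\tau$ in place of $-\tau$ leads from (4.5) to (4.7).

$$f_{REF} + f_e^+ = \frac{1}{\frac{1}{f_{REF}} - 2\tau} = \frac{f_{REF}}{1 - 2\tau f_{REF}} \quad\Longrightarrow\quad f_e^+ = f_{REF}\left(\frac{1}{1 - 2\tau f_{REF}} - 1\right) = \frac{2\tau f_{REF}}{1 - 2\tau f_{REF}} f_{REF}.$$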



Figure 4.10: Frequency detector dead zone reflects on FD and DCO transfer characteristics, and the maximum locking error under different circumstances.

The frequency detector thus has a dead-zone from $-f_e^-$ to $+f_e^+$ around every DCO tuning point $f_{DCO}$, as shown in Fig. 4.10. Frequency comparison cannot be executed correctly if $f_{REF}$ falls inside the dead-zone. The detector will then provide wrong decisions to the SAR logic, and the DCO is thereby driven in the wrong direction. The SAR logic will therefore work on correcting the mistake in every subsequent iteration. The DCO will be locked to the tuning point adjacent and closest to the one where the error is generated. Fig. 4.10 shows the maximum error under three different circumstances. For all the conditions, the maximum frequency error can be bounded identically as

$$f_E \le \frac{\Delta f}{2^K} + f_e^+ \tag{4.8}$$

For the FD design in this work, COMP=1 for all the conditions except when the DCO is detected as slower than REF. The maximum error therefore appears when the DCO falls inside the dead-zone but is actually slower than REF. The bound is reduced to

$$f_E \le \frac{\Delta f}{2^K} + f_e^-,\tag{4.9}$$

as $f_e^+ > f_e^-$. This shows that the maximum frequency error is the sum of the quantization errors of the FD and the DCO. This result can be used to guide the design of the proposed FLL. For instance, if the minimum per-stage delay is limited by a specific technology, the DCO quantization error can be made much smaller than the FD quantization error to minimize the overall frequency error, i.e., $\frac{\Delta f}{2^K} \ll f_e^-$ or $\frac{\Delta f}{2^K} \leq \frac{1}{10} \cdot f_e^-$. In this case, we obtain

$$K \ge \log_2 \frac{10\Delta f(1 + 2\tau f_{REF})}{2\tau f_{REF}^2},\tag{4.10}$$

which shows the minimum DCO bit number to minimize frequency error when  $\tau$  is given.

#### 4.2.5 Simulation Results

The ADFLL was designed in TSMC's 65 nm 1.2 V CMOS technology and analyzed using Spectre with BSIM4 device models. The ADFLL was designed for a 10 Gbps data rate with $f_{REF}$ = 5 GHz (baud-rate frequency). The pre-skewed BDGDLs have a per-stage-delay of $\tau$ = 10 ps. Therefore, M = 15 stages are used to cover the complete logic-1 pulse of the reference. The DCO tuning range is 3.95~6.44 GHz. According to (4.10), the minimum number of bits of the DCO is calculated as $K \geq 5.78$; therefore, a 6-bit DCO was implemented. The frequency quantization error of the DCO is 38.9 MHz, which is 0.78% of $f_{REF}$. The FLL is directly locked to 5 GHz. 8 clock cycles are allocated for each SAR iteration instead of 4 to ensure that the FD makes a correct decision. The lock time is therefore given by $8 \times K = 48$ cycles or 9.6 ns. The frequency error is dominated by the quantization error of the BDGDLs. The maximum error is given by $f_e^-=455$ MHz, which is 9.1% of $f_{REF}$.
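The design figures quoted above can be reproduced from (4.6), (4.7), and (4.10) with a short calculation. This is only an arithmetic check using the stated parameters; the variable names are chosen here for illustration.

```python
import math

f_ref = 5e9                    # reference frequency
tau = 10e-12                   # per-stage-delay of the pre-skewed BDGDL
f_lo, f_hi = 3.95e9, 6.44e9    # DCO tuning range
delta_f = f_hi - f_lo

x = 2 * tau * f_ref                    # dimensionless factor 2 * tau * f_REF
f_e_plus = x / (1 - x) * f_ref         # upper dead-zone bound, (4.6)
f_e_minus = x / (1 + x) * f_ref        # lower dead-zone bound, (4.7)

k_min = math.log2(10 * delta_f * (1 + x) / (2 * tau * f_ref**2))  # (4.10)
k_bits = math.ceil(k_min)
dco_step = delta_f / 2**k_bits         # DCO quantization error
lock_time = 8 * k_bits / f_ref         # 8 reference cycles per SAR iteration

print(round(k_min, 2), k_bits)
print(round(dco_step / 1e6, 1), "MHz,", round(100 * dco_step / f_ref, 2), "%")
print(round(f_e_minus / 1e6), "MHz,", round(100 * f_e_minus / f_ref, 1), "%")
print(round(lock_time * 1e9, 1), "ns")
```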



Figure 4.11: Simulation results of frequency locking process.



Figure 4.12: Locked frequency with intentionally added phase shift.

| Block  | Power Consumption | Share  |
|--------|-------------------|--------|
| FD     | 6.12 mW           | 6.88%  |
| SAR    | 130.2 uW          | 0.15%  |
| DCO    | 78.79 mW          | 88.63% |
| Buffer | 3.86 mW           | 4.34%  |

Figure 4.13: Breakdown of power consumption by key blocks.

Simulation results are shown in Figs. 4.11 and 4.12. To investigate whether the FLL is sensitive to phase, a phase shift in the range from 0° to 360° is introduced. Since the unit interval (UI) of a 10 Gbps data stream is 100 ps, 20 different points are tested with a 10 ps step size to cover 2 UI. Fig. 4.11 shows the overlapped frequency locking processes of all 20 samples. The frequencies lock to 3 different values due to different mistakes made during the SAR iterations. Fig. 4.12 shows the locked frequencies of these 20 samples. It is seen that only one case produces a 3.2% frequency error. All other phase conditions yield a frequency error of less than 1.6%. All frequency locking completes within 10 ns. The frequency error is within the maximum error calculated above. Fig. 4.13 shows the breakdown of the power consumption of the implemented FLL by key blocks, among which the DCO consumes most of the power due to its high speed.

# 4.3 Integrating Frequency Difference-to-Digital Converter

#### 4.3.1 iFDDC

The detection of the frequency difference between a reference REF and a DCO is obtained by extracting a logic-1 pulse of REF and that of DCO and comparing their widths. To ensure a one-to-one mapping between the width of the logic-1 pulse and frequency, the duty cycle of REF and that of DCO

need to be 50%. If the duty cycle of REF or DCO is not 50%, a 50% duty-cycle signal can be generated from REF or DCO using DFFs and generic logic gates. Consider Fig. 4.14. Since $\Delta T = T_{REF} - T_{DCO} = 2 (\Delta t_1 - \Delta t_3)$, we have

$$\Delta f = \frac{1}{T_{REF}} - \frac{1}{T_{DCO}} = -\frac{T_{REF} - T_{DCO}}{T_{REF}T_{DCO}}.\tag{4.11}$$

If the difference between  $T_{REF}$  and  $T_{DCO}$  is small, which is usually the case in practice,  $T_{REF}T_{DCO} \approx T_{REF}^2$  will hold. As a result, Eq.(4.11) will become

$$\Delta f \approx K_{TFV} \left( \Delta t_1 - \Delta t_3 \right). \tag{4.12}$$

where  $K_{TFV} = -2/T_{REF}^2$  is the time-to-frequency gain<sup>1</sup>. Eq.(4.12) shows that frequency difference between REF and DCO can be obtained by measuring  $\Delta t_1 - \Delta t_3$ .
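The quality of the small-difference approximation can be checked numerically. The sketch below compares (4.11) with (4.12) for a hypothetical DCO period that is 1% shorter than that of a 5 GHz reference; the function names and the chosen periods are assumptions for illustration.

```python
def delta_f_exact(t_ref, t_dco):
    """Frequency difference as defined in (4.11)."""
    return 1.0 / t_ref - 1.0 / t_dco

def delta_f_approx(t_ref, t_dco):
    """Small-difference approximation of (4.12)."""
    k_tfv = -2.0 / t_ref**2
    dt1_minus_dt3 = (t_ref - t_dco) / 2.0   # from Delta T = 2 (dt1 - dt3)
    return k_tfv * dt1_minus_dt3

T_REF = 200e-12   # 5 GHz reference
T_DCO = 198e-12   # hypothetical DCO period, 1% shorter
exact = delta_f_exact(T_REF, T_DCO)
approx = delta_f_approx(T_REF, T_DCO)
print(exact / 1e6, approx / 1e6, abs(approx / exact - 1.0))
```

For this 1% period difference the approximation deviates from the exact value by about 1%, which suggests that the relative error of (4.12) scales with $\Delta T / T_{REF}$.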



Figure 4.14: Timing diagram of REF and DCO.

The schematic of the pulse extractor that extracts a logic-1 pulse of REF and that of DCO is shown in Fig. 4.15 with its timing diagram shown in Fig. 4.16. The rising edge and falling edge of REF and DCO are sensed using rising-edge-triggered DFFs and falling-edge-triggered DFFs, respectively. Two additional falling-edge-triggered DFFs are used to capture the

<sup>&</sup>lt;sup>1</sup>If  $\Delta T$  is not sufficiently small,  $\Delta f \approx K_{TFV} \left(1 - \frac{\Delta T}{T_{REF}}\right) (\Delta t_1 - \Delta t_3).$ 

latest falling edges of REF and DCO to enable the read-out signal. The read-out signal is delayed by $\tau_{Clear}$ to generate a clear signal that resets the DFFs so as to prepare for the next measurement cycle. Both read and clear signals have pulse width $\tau_{Clear}$, which can be used to control how frequently the extractor senses the pulse width of the input signal.



Figure 4.15: Schematic of pulse width extractor.

Right-shift (RS) and left-shift (LS) signals generated by the pulse width extractor are used to gate the BDGDL, whose schematic is shown in Fig. 4.17 together with the read-out circuit that reads the logic state of the gated stages. The BDGDL has 32 identical stages, corresponding to the FCW (frequency control word) of the DCO. Each gated stage consists of an RS cell and an LS cell that are identical. The arrangement of the gating signals of the RS cell and LS cell differs from that in [53]. Such an arrangement is important as it ensures that when both RS and LS are present, i.e. RS=1/LS=1, the gated stage will be disabled, thereby retaining its logic state and the voltage of its load capacitor. When RS=1/LS=0, the load capacitor will be discharged via the NMOS transistor gated by RS and "1" in the BDGDL will move leftwards. Similarly, when RS=0/LS=1, the load capacitor will be charged via the PMOS transistor gated by LS and "1" in the BDGDL will



Figure 4.16: Timing diagram of gating signal generation and FCW adaptation.

move rightwards. It is evident from the preceding analysis that the BDGDL only responds to the non-overlapping parts of REF and DCO in Fig. 4.16.



Figure 4.17: Bi-directional gated delay line with read-out DFFs.

It can be seen from Fig. 4.16 that the width of LS is the amount of the time that the rising edge of REF leads that of DCO while that of RS is the amount of the time that the falling edge of REF leads that of DCO. If  $T_{REF} = T_{DCO}[k], \Delta t_1[k] = \Delta t_3[k]$  will follow regardless of the value of  $\Delta t_2[k]$ . If  $\Delta t_1[k] \neq \Delta t_3[k], \Delta f[k] = K_{TFV} (\Delta t_1[k] - \Delta t_3[k])$ . For k = 1, 2, 3, ...,

$$\sum_{k=1} \Delta f[k] = K_{TFV} \sum_{k=1} \left( \Delta t_1[k] - \Delta t_3[k] \right)\tag{4.13}$$

will be performed by the BDGDL in the form of the charge accumulation of the load capacitor. The proposed iFDDC therefore performs the integration of frequency difference. As long as there is a frequency difference, the output of the iFDDC will vary.  $\sum_{k=1} \Delta f[k]$  is digitized by the BDGDL.
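The running sum of (4.13) can be illustrated numerically. The sketch below accumulates the per-cycle frequency difference for a made-up sequence of $\Delta t_1[k]$ and $\Delta t_3[k]$ values; it models only the arithmetic of the integration, not the charge accumulation or the quantization of the BDGDL, and the function name is hypothetical.

```python
from itertools import accumulate

def integrated_frequency_difference(dt1, dt3, t_ref):
    """Running sum of K_TFV * (dt1[k] - dt3[k]) over measurement cycles."""
    k_tfv = -2.0 / t_ref**2
    per_cycle = [k_tfv * (a - b) for a, b in zip(dt1, dt3)]
    return list(accumulate(per_cycle))

T_REF = 200e-12                   # 5 GHz reference
dt1 = [2e-12, 2e-12, 2e-12]       # hypothetical pulse-edge offsets
dt3 = [1e-12, 1e-12, 1e-12]
running = integrated_frequency_difference(dt1, dt3, T_REF)
print([round(value / 1e6, 3) for value in running])
```

As long as $\Delta t_1[k] \neq \Delta t_3[k]$, the running sum keeps moving in one direction, which is the behavior the text attributes to the output of the iFDDC.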

#### 4.3.2 All-Digital FLL with iFDDC

In this section, we make use of the proposed iFDDC in the frequency-locked loop shown in Fig. 4.18(a). The pulse width extractors (PW blocks) extract the logic-1 pulse of REF and that of the DCO. The highlighted block depicts the BDGDL, which performs three distinct functions: (i) it is a *period difference detector* that detects the difference between the period of REF, $T_{REF}$, and that of the DCO, $T_{DCO}$; (ii) it is a *time integrator* that integrates the difference between the period of REF and that of the DCO, $T_e = T_{REF} - T_{DCO}$; and (iii) it is a time-to-digital converter (TDC) that converts the output of the time integrator to a thermometer code. The output of the BDGDL is the frequency control word (FCW) of the DCO. The schematic of the DCO is shown in Fig. 4.8. Frequency tuning is achieved by adjusting the charging current of the capacitors at the output nodes. The frequency of the oscillator is set to 5 GHz when $D_{32}...D_2D_1 = \underbrace{1...1}_{16} \underbrace{0...0}_{16}$ such that its frequency can be tuned digitally upwards or downwards in 16 equal steps.

Fig. 4.18(b) shows the block diagram of the core portion of the FLL with the nonlinearity of the BDGDL and DCO not accounted for. $K_{DL}$ is the



Figure 4.18: (a) Frequency-locked loop with an iFDDC. (b) Block diagram of the core of frequency-locked loop with an iFDDC.

time-to-digital conversion gain of the BDGDL, $K_{DCO}$ is the digital-to-time gain of the DCO, and $Q_{DL}$ and $Q_{DCO}$ are the quantization errors of the BDGDL and DCO, respectively. It is important to note that $Q_{DL}$ and $Q_{DCO}$ are time quantization errors. It is assumed that the per-stage-delay $\tau_{DL}$ of the BDGDL is significantly smaller than $T_e = T_{REF} - T_{DCO}$ such that $T_e$ can be measured with adequate accuracy. Should the per-stage-delay not be sufficiently small, one can use a pre-skewed BDGDL to lower it [53, 54]. The output $T_{DCO}$ is given by

$$T_{DCO} = \frac{K_{DCO}K_{DL}}{D}T_{REF} + \frac{sK_{DCO}}{D}Q_{DL} + \frac{s}{D}Q_{DCO},$$

where $D = s + K_{DCO}K_{DL}$. It is seen that the system is first-order and stable. Also observed is that the signal transfer function is low-pass whereas the error transfer functions are high-pass, more specifically first-order noise-shaping, ensuring that the signal passes through the loop without attenuation while the quantization errors that fall into the loop bandwidth are minimized. The error is given by

$$T_e(s) = \frac{s}{D} \left( T_{REF} - K_{DCO} Q_{DL} - Q_{DCO} \right).\tag{4.14}$$

Let us first concentrate on the signal and neglect the quantization errors. For a step frequency disturbance of amplitude $\Delta T_{REF}$, i.e., $T_{REF}(t) = \Delta T_{REF}u(t)$ where $u(t)$ is the unit-step function, we have $T_{REF}(s) = \frac{\Delta T_{REF}}{s}$. As a result, we have from (4.14): $T_e(s) = \frac{\Delta T_{REF}}{D}$. The steady state error is obtained from the final value theorem: $\lim_{t\to\infty} T_e(t) = \lim_{s\to 0} sT_e(s) = 0$. Since $T_e(t) = T_{REF}(t) - T_{DCO}(t)$, we have $\lim_{t\to\infty} T_{REF}(t) = \lim_{t\to\infty} T_{DCO}(t)$. As a result, $\lim_{t\to\infty} \Delta f = 0$. The steady state frequency error of the FLL is therefore zero.
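The zero steady state error for a step input can also be observed in a time-domain sketch of the linearized loop. The code below uses forward-Euler integration with normalized, hypothetical gains ($K_{DL} = K_{DCO} = 1$) and ignores quantization, so it illustrates the first-order model only, not the simulated circuit.

```python
def step_response_error(k_dl, k_dco, step, dt, n_steps):
    """Forward-Euler simulation of T_e = T_REF - T_DCO for a step in T_REF."""
    x = 0.0                       # integrator state (BDGDL output)
    t_e = step
    for _ in range(n_steps):
        t_dco = k_dco * x         # DCO period deviation set by the FCW
        t_e = step - t_dco        # period error seen by the BDGDL
        x += k_dl * t_e * dt      # time integration performed by the BDGDL
    return t_e

# 2000 steps of 0.01 correspond to 20 loop time constants for unity loop gain.
final_error = step_response_error(k_dl=1.0, k_dco=1.0, step=1.0, dt=0.01, n_steps=2000)
print(final_error)
```

The error decays as $e^{-K_{DCO}K_{DL}t}$, consistent with the single pole at $s = -K_{DCO}K_{DL}$ in $D$.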



Figure 4.19: Quantization error of BDGDL (a) and DCO (b). The center frequency of the DCO is set to  $f^* = 1/T^*$  at which FCW=0...01...1 with 16 0s and 16 1s.

Let us now focus on quantization errors. Since  $Q_{DL}$  and  $Q_{DCO}$  are nonzero, their impact on the steady state frequency error needs to be accounted for. Examine the impact of  $Q_{DL}$  first. The quantization error of the BDGDL is a ramping function, as shown in Fig. 4.19(a) where  $T_{max}$  is the amount of the gating time of the gated stage at which its output voltage just crosses the threshold voltage of the gated stages. The gain of the gated stage is given by  $\frac{\tau_{DL}}{T_{max}}$  and can be adjusted by varying either the load capacitance or current of the gated stage. Since  $Q_{DL}(t) = -\frac{\tau_{DL}}{T_{max}}t$  where t is gating time, we have  $Q_{DL}(s) = -\frac{\tau_{DL}}{T_{max}}\frac{1}{s^2}$ . The steady state error is obtained from

$$\lim_{t \to \infty} T_e(t) = -\lim_{s \to 0} s\left(\frac{sK_{DCO}}{D}\frac{\tau_{DL}}{T_{max}}\frac{1}{s^2}\right) = -\frac{\tau_{DL}}{K_{DL}T_{max}}.$$

The steady state error is directly proportional to  $\tau_{DL}$ . Let us now examine the impact of  $Q_{DCO}$ . Similar to  $Q_{DL}$ ,  $Q_{DCO}$  is also a ramping function, as shown in Fig. 4.19(b) where  $\Delta T_{DCO}$  is period adjustment step by FCW. Since the input of the DCO is the output of the BDGDL and the period of the DCO is adjusted every time FCW changes, the gain of the DCO is given by  $\frac{\Delta T_{DCO}}{\tau_{DL}}$ . Since  $Q_{DCO}(t) = -\frac{\Delta T_{DCO}}{\tau_{DL}}t$ , we have  $Q_{DCO}(s) = -\frac{\Delta T_{DCO}}{\tau_{DL}}\frac{1}{s^2}$ . The steady state error is obtained from

$$\lim_{t \to \infty} T_e(t) = -\lim_{s \to 0} s\left(\frac{s}{D} \frac{\Delta T_{DCO}}{\tau_{DL}} \frac{1}{s^2}\right) = -\frac{\Delta T_{DCO}}{K_{DL} K_{DCO} \tau_{DL}}.$$

It is seen that the steady state error is directly proportional to  $\Delta T_{DCO}$ .
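The proportionality between the steady state error and the ramp slope can be checked with the same kind of time-domain sketch. The code below drives the linearized loop with a ramp $Q_{DL}(t) = -at$, where $a$ stands for $\tau_{DL}/T_{max}$, and compares the settled error magnitude with $a/K_{DL}$. The gains and the slope are hypothetical normalized values, and only the magnitude is compared since the sign depends on the sign convention adopted for the ramp.

```python
def ramp_error(k_dl, k_dco, slope, dt, n_steps):
    """Forward-Euler simulation of the loop error for a ramp Q_DL(t) = -slope * t."""
    x = 0.0                            # integrator state (BDGDL output)
    t_e = 0.0
    for n in range(n_steps):
        q_dl = -slope * n * dt         # ramp quantization error of the BDGDL
        t_dco = k_dco * (x + q_dl)     # Q_DL is injected ahead of the DCO gain
        t_e = 0.0 - t_dco              # T_REF held at zero (deviation variables)
        x += k_dl * t_e * dt
    return t_e

K_DL, K_DCO, SLOPE = 2.0, 0.5, 0.3
settled = ramp_error(K_DL, K_DCO, SLOPE, dt=0.01, n_steps=3000)
print(abs(settled), SLOPE / K_DL)
```

Doubling `SLOPE` doubles the settled error, while changing `K_DCO` alone leaves it unchanged, in line with the expression $\frac{\tau_{DL}}{K_{DL}T_{max}}$.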

#### 4.3.3 Simulation Results

The FLL is designed in TSMC's 65 nm 1.2 V CMOS technology and analyzed using Spectre with BSIM4 device models. The input is a 5 GHz 50% duty cycle square wave. The BDGDL has 32 stages with a per-stage-delay of 30 ps. The DCO frequency tuning range is $3.95 \sim 5.95$ GHz. Fig. 4.20 shows the locking process of the FLL. The FLL establishes lock in 17 ns (85 cycles) in TT with a normalized steady state error of less than 0.2%. It locks to frequencies other than 5 GHz at process corners. The reason for this is the large variation of the frequency of the DCO. Fig. 4.21 shows the locking process of the FLL when the center frequency of the DCO at process corners is properly adjusted. Fig. 4.22 shows the impact of the per-stage-delay of the BDGDL on the normalized steady state frequency error of the FLL. It is seen that the steady state error rises when the per-stage-delay increases, echoing the theoretical results given earlier. Fig. 4.23 shows the locking process of the FLL with different per-stage-delays. The smaller the per-stage-delay, the smaller the frequency error in the lock state. The fluctuation of the DCO is due to the finite resolution of the BDGDL and DCO. The power consumption of the iFDDC, DCO, and buffers is 0.474 mW (0.93%), 47.98 mW (94.3%), and 2.43 mW (4.77%), respectively.



Figure 4.20: Locking process of ADFLL in nominal process conditions (TT) and process corners (FF, FS, SF, and SS).



Figure 4.21: Locking process of ADFLL at process corners with tuned DCO.



Figure 4.22: Impact of per-stage-delay of BDGDL on normalized steady state frequency error of ADFLL.



Figure 4.23: Impact of per-stage-delay of BDGDL on locking process of ADFLL.

### 4.4 Summary

A bi-directional gated delay line bang-bang frequency difference detector was proposed. The proposed frequency difference detector detects frequency difference by digitizing the logic-1 pulse of the oscillator and that of a reference clock. Up/down SAR logic driven by the output of the frequency detector was used to generate the frequency control word of the DCO. Unlike counter-based frequency detection that requires a large number of cycles in order to ensure accuracy, the proposed frequency detector only requires 4 cycles to complete frequency difference detection. Factors that contribute to the frequency error of the ADFLL were investigated. The ADFLL was tested using a 5 GHz reference clock. Simulation results show that the ADFLL achieves frequency lock in less than 10 ns with a maximum frequency error of 3.2%.

An all-digital low-power high-speed iFDDC was proposed with its effectiveness validated using a 5-GHz frequency-locked loop. The key attributes of the iFDDC, including all-digital implementation, low power consumption, high-speed operation, and built-in integration, are attributed to the BDGDL. The built-in integration of the iFDDC results in zero steady state frequency error and first-order noise-shaping when the iFDDC is used in a frequency-locked loop. When the quantization errors of the BDGDL and DCO are accounted for, the steady state frequency error of the ADFLL is proportional to the quantization errors of the BDGDL and DCO. Simulation results agree well with theoretical results.

# Chapter 5

# **Conclusions and Future Work**

## 5.1 Conclusions

An adaptive DT-DFE with a sign<sup>3</sup> least-mean-square tap adaptation algorithm was developed. Commonly used DS-DFE suffers from the drawback of reduced vertical eye-opening when consecutive 1s or 0s are encountered. The proposed DT-DFE performs DFE only when a data transition is detected, thereby retaining the obtained vertical eye-opening when consecutive 1s or 0s are present in data. It boosts the eye-opening of the high-frequency components of data without attenuating the low-frequency components of data, whereas DS-DFE boosts the eye-opening of the high-frequency components of data at the expense of attenuated low-frequency components of data. The reference voltages of DS-DFE are tap-dependent whereas those of DT-DFE are tap-independent and are obtained by conveying consecutive 1s and 0s to the channel in a training phase prior to the commencement of DFE and measuring the corresponding output of the continuous-time linear equalizer using a pair of successive approximation register analog-to-digital converters. The proposed DT-DFE utilizes loop unrolling to detect the occurrence of data transitions so as to activate tap-tuning, launch DFE, and combat timing constraints. The performance of the proposed DT-DFE was compared with that of DS-DFE using two 5 Gbps backplane serial links designed in a TSMC 65 nm CMOS technology. Simulation results demonstrate that the eye-opening of the link with DT-DFE is 1.54 times that with DS-DFE. The vertical eye-opening of the link with DT-DFE that activates tap adaptation on two consecutive state transitions of opposite polarities is 1.2 times that of the link that activates tap adaptation on a single state transition. The proposed DT-DFE is less sensitive to process uncertainty, whereas DS-DFE is prone to process uncertainty with severely deteriorating performance.

A new adaptive DT-DFE with EE-taps and raised references was developed. The characteristics and common drawbacks of data-state DFE and data-transition DFE were investigated. This was followed by an in-depth investigation of DT-DFE with EE-taps and raised references, which is capable of improving both transition edge slope and vertical eye-opening. Loop unrolling, normally used to combat the timing constraints of data-state DFE, was further developed for DT-DFE with EE-taps. The reference voltages with respect to which the error signals of DFE are defined were raised beyond those set by the low-frequency components of data to increase vertical eye-opening. Clock and data recovery was performed using blind 4x oversampling and phase-picking. The effectiveness of the proposed DT-DFE with EE-taps was validated using the simulation results of a 10 Gbps backplane serial link with a channel loss of 24 dB at 5 GHz. The link was designed in a TSMC 65 nm 1.2 V CMOS technology and analyzed using Spectre from Cadence Design Systems with BSIM4.4 device models. Simulation results showed that, with the reference voltages raised by 20 mV, DT-DFE with EE-taps increased vertical eye-opening by 3.9 times, lowered data jitter by 4.86 times, and increased transition slope by 2.83 times.
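Blind 4x oversampling with phase-picking can be sketched generically as follows. The phase-picking logic of the proposed link is not described in this chapter, so the sketch assumes a simple edge-histogram rule: find the sample position at which most data edges occur and sample half a bit period away from it. The function names `pick_phase` and `recover` are hypothetical.

```python
from collections import Counter


def pick_phase(samples, osr=4):
    """Pick the sampling phase (0..osr-1) farthest from the dominant edge."""
    edges = Counter()
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            # An edge lies between sample i-1 and sample i.
            edges[i % osr] += 1
    if not edges:
        return 0  # no transitions observed, any phase is equivalent
    edge_phase = edges.most_common(1)[0][0]
    return (edge_phase + osr // 2) % osr


def recover(samples, osr=4):
    """Decimate the oversampled stream at the picked phase."""
    phase = pick_phase(samples, osr)
    return [samples[i] for i in range(phase, len(samples), osr)]


if __name__ == "__main__":
    bits = [1, 0, 0, 1, 1, 1, 0, 1]
    # Oversample by 4 and drop 3 samples to emulate an unknown phase offset.
    stream = [b for b in bits for _ in range(4)][3:]
    print(pick_phase(stream), recover(stream))
```

Because the stream starts with a partial bit, the recovered sequence begins at the second transmitted bit. A practical receiver would also have to handle jitter and frequency offset, which this sketch ignores.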

A pre-skewed BDGDL bang-bang frequency detector (BBFD) with applications in frequency-locking of 10 Gb/s serial links was developed. Bang-bang frequency detection is performed using a pair of BDGDLs that digitize the logic-1 pulses of the receiver oscillator and a reference clock. A redundant SAR driven by the output of the BBFD was used to generate the FCW of the DCO of the receiver. A frequency detection decision can be made in only 4 cycles of the reference clock. The frequency error of the ADFLL utilizing the proposed BBFD was analyzed. The ADFLL was designed in a TSMC 65 nm 1.2 V CMOS technology and tested with a 5 GHz reference clock. Simulation results showed that the ADFLL achieved frequency lock in less than 10 ns, with the maximum frequency error in the lock state within the frequency error boundaries of the BBFD.
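The interaction between a bang-bang frequency detector and a SAR can be sketched as a binary search over the FCW. The sketch assumes an ideal one-bit detector, a linear DCO, and a conventional non-redundant SAR, whereas the design summarized above uses a redundant SAR. The DCO range, step size, and word length are hypothetical values chosen for illustration.

```python
def bang_bang_fd(f_dco_hz, f_ref_hz):
    """Ideal one-bit detector: True when the DCO is faster than the reference."""
    return f_dco_hz > f_ref_hz


def dco_frequency(fcw, f_min_hz, f_step_hz):
    """Assumed linear DCO: frequency grows by f_step_hz per FCW code."""
    return f_min_hz + fcw * f_step_hz


def sar_lock(f_ref_hz, f_min_hz, f_step_hz, n_bits):
    """Binary-search the FCW, one detector decision per bit, MSB first."""
    fcw = 0
    for k in reversed(range(n_bits)):
        trial = fcw | (1 << k)
        # Keep the trial bit unless it makes the DCO faster than the reference.
        if not bang_bang_fd(dco_frequency(trial, f_min_hz, f_step_hz), f_ref_hz):
            fcw = trial
    return fcw, dco_frequency(fcw, f_min_hz, f_step_hz)


if __name__ == "__main__":
    fcw, f_dco = sar_lock(5_003_000_000, 4_500_000_000, 5_000_000, 8)
    print(fcw, f_dco)
```

With one decision per bit, the search finishes in `n_bits` detector decisions and leaves a residual error smaller than one DCO step. Given the 4 reference cycles per decision stated above, each decision takes 0.8 ns at a 5 GHz reference.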

An all-digital low-power high-speed integrating frequency difference-to-digital converter (iFDDC) was developed. The iFDDC utilizes a BDGDL to detect the difference between the frequencies of two 50% duty-cycle square waves. The iFDDC possesses attractive characteristics including all-digital realization, low power consumption, high-speed operation, and built-in integration. The built-in integration of the iFDDC results in zero static frequency error and first-order noise-shaping when the iFDDC resides in a frequency-locked loop (FLL). When the quantization errors of the BDGDL and DCO are accounted for, the static frequency error of the FLL is proportional to these quantization errors. To validate the effectiveness of the iFDDC, a 5-GHz frequency-locked loop with the iFDDC was designed in a TSMC 65 nm 1.2 V CMOS technology and analyzed using Spectre from Cadence Design Systems with BSIM4 device models. Good agreement between simulation and theoretical results was obtained. The iFDDC consumes only 474 $\mu$W while operating at 5 GHz.
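The effect of built-in integration on the static frequency error can be illustrated with a first-order discrete-time loop. This is a simplified behavioral sketch rather than the model analyzed in Chapter 4: it assumes an ideal frequency-difference detector followed by an accumulator, models only DCO quantization (not BDGDL quantization), and uses a hypothetical loop gain and DCO step.

```python
def simulate_integrating_fll(f_ref, f_start, dco_step, gain=0.5, n_iter=400):
    """First-order FLL with an integrating detector and an optional quantized DCO.

    Returns the last frequency error and the mean error over the second
    half of the run. Set dco_step to 0 for an ideal (unquantized) DCO.
    """
    acc = 0.0          # integrator state (accumulated frequency error)
    f_dco = f_start
    errors = []
    for _ in range(n_iter):
        err = f_ref - f_dco
        errors.append(err)
        acc += err                     # built-in integration
        ctrl = gain * acc
        if dco_step:
            # Quantize the control to the nearest DCO step.
            ctrl = dco_step * round(ctrl / dco_step)
        f_dco = f_start + ctrl
    tail = errors[n_iter // 2:]
    return errors[-1], sum(tail) / len(tail)


if __name__ == "__main__":
    print(simulate_integrating_fll(5.0e9, 4.9e9, dco_step=0.0))
    print(simulate_integrating_fll(5.0e9 + 1.3e6, 4.9e9, dco_step=5e6))
```

With an ideal DCO the error decays to zero. With a quantized DCO the output dithers between adjacent codes, so the instantaneous error stays within one DCO step while its average tends to zero, which is the behavior expected of first-order noise-shaping.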

## 5.2 Future Work

In this dissertation, the drawback of the conventional DS-DFE was studied and addressed with the proposed DT-DFE technique. EE-taps were also developed to further improve the data edge. Two power-efficient ADFLLs were presented to work with the 4x blind oversampling phase-picking clock recovery in the proposed EE-DT-DFE structure. Further work on the presented designs can be carried out in the areas of verification, theoretical exploration, and design refinement. These are summarized below:

- Measurement verification: All designs have so far been validated with simulation results only. A natural next step is to validate the EE-DT-DFE and the iFDDC with measurement results. Measurements can reveal the improvement in the jitter tolerance of the link, which is essential for EE-DT-DFE. They will also further validate the function of the blind oversampling mechanism, which is embedded to pick the correct phase gating signal of the EE-taps.
- Theoretical exploration of the EE-tap: Unlike a conventional DS-DFE tap, the activation time of an EE-tap is not an integer multiple of the UI. It would be attractive to develop a mathematical behavioral model of the EE-DT-DFE and analyze it in the z- or s-domain. Its characteristics can be further explored in terms of signal transfer, particularly on the data edge (high-frequency components), and error shaping.
- PLL design with the iFDDC: The iFDDC in this work provides frequency difference detection, integration, and time-to-digital conversion. If the time information fed to the BDGDL is the phase of the REF and DCO rather than their pulse widths, the iFDDC can function as an integrating phase difference-to-digital converter (iPDDC). The main challenge is that a signal path that behaves like the resistor in a charge-pump-based PLL is needed for a PLL built on the iPDDC to be stable. With this problem solved, a power-efficient all-digital PLL can be implemented with a single BDGDL, which can be even more attractive in communication applications.
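
The role of the resistor mentioned in the last item can be seen from the textbook continuous-time model of a charge-pump PLL. The symbols below (charge-pump current $I_p$, loop-filter resistor $R$ and capacitor $C$, oscillator gain $K_{VCO}$) belong to that generic model and are not taken from the designs in this dissertation. The open-loop transfer function is

$$
H_{OL}(s) = \frac{I_p}{2\pi}\left(R + \frac{1}{sC}\right)\frac{K_{VCO}}{s} = \frac{I_p K_{VCO}}{2\pi C}\cdot\frac{1 + sRC}{s^{2}}
$$

With $R = 0$ the loop has two poles at the origin and no zero, so the open-loop phase is $-180^{\circ}$ at all frequencies and the phase margin is zero. The resistor adds a zero at $\omega_z = 1/(RC)$, which gives a phase margin of approximately $\arctan(\omega_c R C)$ at the crossover frequency $\omega_c$ when higher-order poles are neglected. An iPDDC-based PLL would therefore need an equivalent proportional path in parallel with its integrating path.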

# Bibliography

- P. Aziz, H. Kimura, A. Malipatil, and S. Kotagiri, "A class of downsampled floating tap DFE architectures with application to serial links," in *Proc. IEEE Int'l Symp. Circuits Syst.*, 2012, pp. 325–328.
- [2] F. Zhong and et al., "A 1.0625-to-14.025 Gb/s multimedia transceiver with full-rate source-series-terminated transmit driver and floating-tap decision-feedback equalizer in 40 nm CMOS," in *IEEE Int'l Solid-State Circuit Conf. Dig. Tech. Papers*, 2011, pp. 348–349.
- [3] —, "A 1.0625 14.025 Gb/s multi-media transceiver with fullrate source-series-terminated transmit driver and floating-tap decisionfeedback equalizer in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 3126–3139, Dec. 2011.
- [4] T. Carusone, "Introduction to digital I/O: constraining I/O power consumption in high-performance systems," *IEEE Solid-State Circuits Magazine*, vol. 7, no. 4, pp. 14–22, Nov. 2015.
- [5] J. Bulzacchelli, "Equalization for electrical links: current design techniques and future directions," *IEEE Solid-State Circuits Magazine*, vol. 7, no. 4, pp. 23–31, Nov. 2015.
- [6] Z. Zhao, J. Wang, S. Li, and J. Chen, "A 2.5-Gb/s 0.13 μm CMOS current mode logic transceiver with pre-emphasis and equalization," in *Proc. Int'l conf. ASIC*, Oct. 2007, pp. 368 –371.

- S. Kao and S. Liu, "A 20-Gb/s transmitter with adaptive pre-emphasis in 65-nm CMOS technology," *IEEE Trans. Circuits Syst. II.*, vol. 57, no. 5, pp. 319 –323, May 2010.
- [8] B. Kim and et al., "A 10-Gb/s compact low-power serial I/O with DFE-IIR equalization in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3526–3538, Dec. 2009.
- [9] J.Savoj and et al., "Design of high-speed wireline transceiver for backplane communications in 28nm CMOS," in *Proc. IEEE Custom Inte*grated Circuits Conf., 2012, pp. 1–4.
- [10] Y. Song, H. Yang, H. Li, P. Chiang, and S. Palermo, "An 8-16 gb/s, 0.65-1.05 pj/b, voltage-mode transmitter with analog impedance modulation equalization and sub-3 ns power-state transitioning," *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2631–2643, Nov 2014.
- [11] A. Fiedler, R. Mactaggart, J. Welch, and S. Krishnan, "A 1.0625 Gbps transceiver with 2x-oversampling and transmit signal pre-emphasis," in *IEEE Int'l Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 1997, pp. 238–239.
- [12] J. Dally and J. Poulton, "Transmitter equalization for 4-Gbps signaling," *IEEE Micro*, vol. 17, no. 1, pp. 48–56, Jan./Feb 1997.
- [13] J. Jaussi and et al., "8-Gb/s source-synchronous I/O link with adaptive receiver equalization, offset cancellation, and clock de-skew," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 80 – 88, Jan. 2005.
- [14] M. Lee, W. Dally, R. Farjad-Rad, H. Ng, R. Senthinathan, J. Edmondson, and J. Poulton, "CMOS high-speed I/Os - present and future," pp. 454–461, Oct. 2003.
- [15] J. Winters and S. Kasturia, "Adaptive nonlinear cancellation for highspeed fiber-optic systems," J. of Lightwave Tech., vol. 10, no. 7, pp. 971–977, July 1992.

- [16] J. Ren and et al., "Precursor ISI reduction in high-speed I/O," in Symp. VLSI Circuits Dig. Tech. Papers, 2007, pp. 134–135.
- [17] K. Krishna, D. Yokoyama-Martin, A. Caffee, C. Jones, M. Loikkanen, J. Parker, R. Segelken, J. Sonntag, J. Stonick, S. Titus, D. Weinlader, and S. Wolfer, "A multi-giga-bit backplane transceiver core in 0.13-μm CMOS with a power-efficient equalization architecture," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2658–2666, Dec. 2005.
- [18] T. Beukema, M. Sorna, K. Selander, S. Zier, B. Ji, P. Murfet, J. Mason, W. Rhee, H. Ainspan, B. Parker, and M. Beakes, "A 6.4-Gb/s CMOS SerDes core with feed-forward and decision-feedback equalization," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2633 2645, Dec. 2005.
- [19] B. Leibowitz and et al., "A 7.5Gb/s 10-tap DFE receiver with first tap partial response, spectrally gated adaptation, and 2nd-order datafiltered CDR," in *IEEE Int'l Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2007, pp. 228 –599.
- [20] J. Bulzacchelli and et al., "A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90 nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2885–2900, Dec. 2006.
- [21] —, "A 28-Gb/s 4-tap FFE/15-tap DFE serial link transceiver in 32nm SOI CMOS technology," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 3232–3248, Dec. 2012.
- [22] T. Toifl and et al., "A 2.6 mW/Gb/s 12.5 Gbps RX with 8-tap switchedcapacitor DFE in 32 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 897–910, April 2012.
- [23] —, "A 3.1 mW/Gb/s 30 Gbps quadrater-rate triple-speculation 15tap SC-DFE RX data path in 32 nm CMOS," in Symp. VLSI Circuits Dig. Tech. Papers, 2012, pp. 102–103.

- [24] Y.-C. Huang and S.-I. Liu, "A 6gb/s receiver with 32.7db adaptive dfeiir equalization," in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), IEEE International,, Feb 2011, pp. 356–358.
- [25] N. Sitthimahachaikul, J. Keane, and P. Hurst, "An adaptive dfe using an iir feedback equalizer for 100base-tx ethernet," in *Circuits and Sys*tems (NEWCAS). The 2nd Annual IEEE Northeast Workshop on,, June 2004, pp. 173–176.
- [26] B. Leibowitz and et al., "A 0.28pj/b 2gb/s/ch transceiver in 90nm cmos for 10mm on-chip interconnects," in *IEEE Int'l Solid-State Cir*cuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2007, pp. 414–612.
- [27] E. M. amd D. Schinkel, E. Klumperink, E. van Tuijl, and B. Nauta, "Power efficient gigabit communication over capacitively driven rclimited on-chip interconnects," *IEEE J. Solid-State Circuits*, vol. 45, no. 2, pp. 447–457, Feb. 2010.
- [28] S. Shahramian and A. Chan Carusone, "A 0.41 pJ/Bit 10 Gb/s hybrid 2 IIR and 1 discrete-time DFE tap in 28 nm-LP CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 7, pp. 1722–1735, July 2015.
- [29] S. Shahramian, B. Dehlaghi, and A. Chan Carusone, "Edge-based adaptation for a 1 iir + 1 discrete-time tap dfe converging in 5 μs," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 3192–3203, Nov 2016.
- [30] V. Stojanovic and et al., "Autonomous dual-mode (PAM2/4) serial link transceiver with adaptive equalization and data recovery," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 1012–1026, April 2005.
- [31] F. Gerfers and et al., "A 0.2-2 Gb/s 6x OSR receiver using a digitally self-adaptive equalizer," *IEEE J. Solid-State Circuits*, vol. 43, no. 6, pp. 1436–1448, June 2008.

- [32] K. Wong, E. Chen, and C. Yang, "Edge and data adaptive equalization of serial-link transceivers," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2157–2169, Sept. 2008.
- [33] S. Kim and et al., "A 6.4-Gb/s voltage-mode near-ground receiver with a one-tap data and edge DFE," *IEEE Trans. Circuits Syst. II*, vol. 61, no. 6, pp. 438–442, June 2014.
- [34] J. Lee, K. Park, K. Lee, and D. Jeong, "A 2.44-pj/b 1.62–10-Gb/s receiver for next generation video interface equalizing 23-dB loss with adaptive 2-tap data DFE and 1-tap edge DFE," *IEEE Trans. Circuits Syst. II*, vol. 65, no. 10, pp. 1295–1299, Oct 2018.
- [35] A. Anderson, "Optimum control logic for successive approximation analog-to-digital converters," *JPL Technical Report*, vol. TR32-1526, pp. 168–176, 1972.
- [36] A. Rossi and G. Fucili, "4-bit non-redundant SARs," *Electronics Letters*, vol. 32, no. 12, pp. 1055–1057, June 1996.
- [37] S. Sidiropoulos and M. Horowitz, "A 700 Mb/s/pin CMOS signaling interface using current integrating receivers," *IEEE J. Solid-State Circuits*, vol. 32, no. 5, pp. 681–690, May 1997.
- [38] T. Wang and F. Yuan, "A new current-mode incremental signaling scheme with applications to Gb/s parallel links," *IEEE Trans. Circuits Syst. I.*, vol. 54, no. 2, pp. 255–267, Feb. 2007.
- [39] T. Carusone, D. Johns, and K. Martin, Analog integrated circuit design, 2nd ed. New York: John Wiley and Sons, 2012.
- [40] A. Fayed and M. Ismail, "A low-voltage, low-power CMOS analog adaptive equalizer for UTP-5 cables," *IEEE Trans. Circuits Syst. I.*, vol. 55, no. 2, pp. 480–495, March 2008.

- [41] M. Dolan and F. Yuan, "An adaptive edge decision feedback equalizer with 4 PAM signalling," in *Proc. IEEE Mid-West Symp. Circuits Syst.*, 2017, pp. 535–538.
- [42] P. Patel and B. Barnett, "Experimental test fixture s-parameters 100 gb/s backplane study group," in IBM Corporation. http://www.ieee802.org/3/100GCU/public/channel.html, 2011.
- [43] Y. Li and F. Yuan, "Data-transition decision feedback equalizer with edge-emphasis taps and raised references," in *IEEE NEW Circuits Syst.* (NEWCAS), June 2019.
- [44] F. Yuan, CMOS current-mode circuits for data communications. New York: Springer, 2007.
- [45] B. J. Shastri and D. V. Plant, "5/10-gb/s burst-mode clock and data recovery based on semiblind oversampling for pons: Theoretical and experimental," *IEEE J. Selected Topics in Quantum Electronics*, vol. 16, no. 5, pp. 1298–1320, Sept/Oct 2010.
- [46] S. Shekhar, R. Inti, J. Jaussi, T. Hsueh, and B. K. Casper, "A low-power bidirectional link with a direct data-sequencing blind oversampling CDR," *IEEE J. Solid-State Circuits*, vol. 54, no. 6, pp. 1669–1681, June 2019.
- [47] B. Razavi, Design of integrated circuits for optical communications. New York: McGraw-Hill, 2003.
- [48] A. Djemouai, M. Sawan, and M. Slamani, "New frequency-locked loop based on CMOS frequency-to-voltage converter: Design and implementation," *IEEE Trans. Circuits Syst. II.*, vol. 48, no. 5, pp. 441–449, 2001.
- [49] H.Bui and Y.Savaria, "Design of a high-speed differential frequency-tovoltage converter and its application in a 5-GHz frequency-locked loop," *IEEE Trans. Circuits Syst. I.*, vol. 55, no. 3, pp. 766–774, 2008.

- [50] J. Bousquet, S. Aouini, N. Ben-Hamida, and J. Wolczanski, "A 32nm CMOS frequency locked loop for 20-GHz synthesis with ± 7.6 ppm resolution," in *Proc. IEEE Compound Semiconductor Integrated Circuit* Symposium, Oct 2013, pp. 1–4.
- [51] Y. Park and F. Yuan, "Low-power all-digital delta-sigma TDC with bidirectional gated delay line time integrator," in *IEEE Mid-West Symp. Circuits Syst.*, 2017, pp. 679–682.
- [52] S. Lee, B. Kim, and K. Lee, "A novel high-speed ring oscillator for multiphase clock generation using negative skewed delay scheme," *IEEE J. Solid-State Circuits*, vol. 32, no. 2, pp. 289–291, Feb. 1997.
- [53] Y. Li and F. Yuan, "A pre-skewed bi-directional gated delay line bang-bang frequency detector with applications in 10 Gbps serial link frequency-locking," in *Proc. IEEE Mid-West Symp. Circuits Syst.*, Aug. 2019, pp. 263–266.
- [54] F. Yuan and P. Parekh, "Time-based all-digital delta-sigma time-todigital converter with pre-skewed bi-directional gated delay line time integrator," *IET Circuits Devices Systems*, in press 2019.