### Ryerson University Digital Commons @ Ryerson

Theses and dissertations

1-1-2008

## Inter-Signal Timing Skew Compensation Of Source-Synchronized Parallel Links With Incremental Signaling

An Hu Ryerson University


#### Recommended Citation

Hu, An, "Inter-Signal Timing Skew Compensation Of Source-Synchronized Parallel Links With Incremental Signaling" (2008). *Theses and dissertations*. Paper 1116.




## INTER-SIGNAL TIMING SKEW COMPENSATION OF SOURCE-SYNCHRONIZED PARALLEL LINKS WITH INCREMENTAL SIGNALING

by

## An Hu

Bachelor of Applied Science, University of Toronto, Toronto, 2005

## A thesis

presented to Ryerson University in partial fulfillment of the requirements for the degree of Master of Applied Science in the Program of Electrical and Computer Engineering

Toronto, Ontario, Canada, 2008

© An Hu 2008


## **Author's Declaration**

I hereby declare that I am the sole author of this thesis.

I authorize Ryerson University to lend this thesis to other institutions or individuals for the purpose of scholarly research.

Signature

I further authorize Ryerson University to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.

Signature


### Abstract

#### INTER-SIGNAL TIMING SKEW COMPENSATION OF SOURCE-SYNCHRONIZED PARALLEL LINKS WITH INCREMENTAL SIGNALING

© An Hu 2008

Master of Applied Science Electrical and Computer Engineering Ryerson University

This thesis deals with inter-signal timing skew compensation of source-synchronized multi-Gbyte/s parallel links with both voltage-mode and current-mode incremental signaling schemes. To compensate for the inter-signal timing skew of parallel links with voltage-mode incremental signaling, an early/late block detects the rising and falling edges of the pulses generated by inter-signal timing skews at the far end of the channels, and subsequently allocates the optimal sampling point of the sampler of each data bit to maximize the timing margins. Two cascaded delay-locked loops are employed to place the sampling clock at the optimal sampling position of each data bit. To compensate for the inter-signal timing skew of parallel links with current-mode incremental signaling, each current-mode receiver maps the direction of its channel current, which represents the logic state of the incoming data, to two voltages of different values. The feedback at the front-end of the receiver minimizes the dependence of the input impedance of the receiver on the channel current so that data-dependent impedance mismatch is minimized. Inter-signal timing skews are compensated by inserting a delay line in each channel.

### Acknowledgments

I am deeply indebted to Professor Fei Yuan for granting me the honor of being a member of his research group and for his enthusiastic support, constant guidance and inspiration throughout the course of this research. I have benefited from his vast technical expertise, insight, and high standards in research, teaching and publication.

My heartfelt gratitude goes to Professor Vadim Geurkov, Professor Lev Kirischian, and Professor Eddie Law of the Department of Electrical and Computer Engineering, Ryerson University, for their participation in the defense committee.

I thank all members of the Microelectronic Circuits and Systems Research Group at Ryerson University for technical assistance.

My thanks also go to the School of Graduate Studies of Ryerson University for the financial support.


## Contents

- 1 Introduction
  - 1.1 Motivation
  - 1.2 Original Contributions
  - 1.3 Organization
- 2 Fundamentals of Source-Synchronized Parallel Links
  - 2.1 Electrical Signaling Schemes for Source-Synchronized Parallel Links
    - 2.1.1 Single-Ended Signaling Scheme
    - 2.1.2 Fully-Differential Signaling Scheme
    - 2.1.3 Pseudo-Differential Signaling Scheme
    - 2.1.4 Incremental Signaling Schemes
    - 2.1.5 Design of Source-Synchronized Parallel Link Interfaces - A Review
  - 2.2 Design Challenges of Source-Synchronized Parallel Links
    - 2.2.1 Voltage Noise
    - 2.2.2 Inter-Signal Timing Skew
    - 2.2.3 Design of Deskew Buffers - A Review
  - 2.3 Chapter Summary
- 3 Inter-Signal Timing Skew Compensation of Source-Synchronized Parallel Links with Voltage-Mode Incremental Signaling
  - 3.1 Architecture
  - 3.2 Comparator
  - 3.3 Early/Late Block
  - 3.4 Delay-Locked Loop
  - 3.5 Deskew Block
  - 3.6 Simulation Results
  - 3.7 Chapter Summary
- 4 Inter-Signal Timing Skew Compensation of Source-Synchronized Parallel Links with Current-Mode Incremental Signaling
  - 4.1 Architecture
  - 4.2 Current-Mode Incremental Signaling
  - 4.3 Current-Mode Transmitter
  - 4.4 Current-Mode Receiver
    - 4.4.1 Current-to-Voltage Mapping
    - 4.4.2 Swing-Independent Input Impedance
    - 4.4.3 Common-Mode Voltage Stabilization
    - 4.4.4 Input Impedance Tuning
  - 4.5 Voltage Comparator
  - 4.6 Compensation of Inter-Signal Timing Skew
  - 4.7 Simulation Results
  - 4.8 Chapter Summary
- 5 Conclusions and Future Work
- Bibliography

## List of Tables

- 2.1 Logic state of parallel links with the voltage-mode incremental signaling scheme
- 2.2 Logic states of current-mode incremental signaling
- 2.3 Parallel link performance
- 2.4 Deskew buffer performance
- 3.1 Time delay range of VCDL at process corners
- 3.2 Timing schedule
- 3.3 Performance of parallel link interface
- 4.1 The minimum and maximum time delays of data delay blocks in nominal process conditions and at process corners
- 4.2 Performance of the current-mode parallel link interface

## List of Figures

- 2.1 (a) Parallel links with single-ended signaling. (b) Timing diagram.
- 2.2 (a) Single-ended signaling. (b) Fully-differential signaling.
- 2.3 Pseudo-differential signaling.
- 2.4 (a) Parallel links with the voltage-mode incremental signaling scheme. (b) Encoder and decoder.
- 2.5 (a) Parallel links with the current-mode incremental signaling scheme. (b) Schematic of the current-mode driver (Tx).
- 2.6 Schematic of the reviewed parallel link interface.
- 2.7 (a) Eye diagram with large eye opening. (b) Eye diagram with small eye opening.
- 2.8 Intersymbol interference. (a) Transmitted signals. (b) Pulse spreading due to ISI. (c) Actual transmitted signals.
- 2.9 Transfer function of RG-55U cables.
- 2.10 Data recovery in parallel links. (a) Without inter-signal timing skew. (b) With inter-signal timing skew.
- 2.11 Per-pin skew compensation of parallel links with a single-ended signaling scheme.
- 3.1 Parallel links with voltage-mode incremental signaling and per-pin skew compensation.
- 3.2 Comparator output $D''[n]$ for different $T_{skew}$.
- 3.3 Timing diagram of Early/Late block.

- 3.4 Schematic of comparator. Transistor sizes: $W_{1,2} = 5\,\mu\text{m}$, $W_{3-6} = 27\,\mu\text{m}$, $W_{7,8} = 50\,\mu\text{m}$, $W_{9,10} = 5\,\mu\text{m}$, $W_{11,12} = 10\,\mu\text{m}$, $W_{13} = 40\,\mu\text{m}$. $L = 0.13\,\mu\text{m}$ for all transistors. Biasing: $I_{SS} = 2\,\text{mA}$, $V_{b1} = 0.8\,\text{V}$, $V_{b2} = 0.6\,\text{V}$.
- 3.5 (a) Generic architecture of Gilbert cell. (b) Characteristic curve of $V_{out}$ versus $V_{cont}$ when $V_{in} > 0$ and $V_{in} < 0$.
- 3.6 Simulated transient response of comparator. Left: $T_{skew} = 60\,\text{ps}$. Right: $T_{skew} = 310\,\text{ps}$.
- 3.7 Simulated waveforms of $D''[n]$ with various signal skews.
- 3.8 Schematic of Early/Late block.
- 3.9 Timing diagram of control voltages. T1 is the time instant at which the Early/Late block is enabled to detect the rising and falling edges of $D''[n]$. T2 is the time instant at which DLL1 is enabled, and Vfb[n] starts to align with VE[n]. T3 is the time instant at which DLL2 is enabled where Vfb[n] is aligned with VE[n] and MCLK'[n] starts to align with VM[n]. T4 is the time instant at which MCLK'[n] and VM[n] are aligned and DLL2 is disabled. T5 marks the start of the data transmission phase and the end of the calibration phase.
- 3.10 Simulated waveforms of the voltages of the Early/Late block. Left: $T_{skew} = 60\,\text{ps}$. Right: $T_{skew} = 310\,\text{ps}$. $T_1 = 1.5\,\text{ns}$.
- 3.11 (a) Typical DLL architecture. (b) DLL model.
- 3.12 Schematic of Deskew block. When $V_{c2}$ = Logic-0, the output of the MUX is $V_{cal}$. When $V_{c2}$ = Logic-1, the output of the MUX is from the preceding charge pump. The same holds for $V_{c3}$. $V_{c4}$ is set to Logic-1 when VM[n] and MCLK'[n] are aligned. Transistor sizes: $W_{1,2} = 12\,\mu\text{m}$, $W_3 = 6\,\mu\text{m}$, $W_4 = 4\,\mu\text{m}$, $W_5 = 10\,\mu\text{m}$, $W_6 = 4.5\,\mu\text{m}$, $W_7 = 5\,\mu\text{m}$, $W_8 = 2.3\,\mu\text{m}$, $W_9 = 2.07\,\mu\text{m}$. $L = 0.13\,\mu\text{m}$ for all transistors. $C_1 = C_2 = 4\,\text{pF}$. $V_{cal} = 0.61\,\text{V}$.

- 3.13 (a) Schematic of PFD. (b) Schematic of charge pump. Transistor sizes: $W_{1,3,5} = 2.5\,\mu\text{m}$, $W_{2,4,6} = 1\,\mu\text{m}$, $W_{7,11} = 12.5\,\mu\text{m}$, $W_{8,12} = 7.5\,\mu\text{m}$, $W_{9,13} = 3\,\mu\text{m}$, $W_{10,14} = 5\,\mu\text{m}$. $L = 0.13\,\mu\text{m}$ for all transistors.
- 3.14 Dependence of the delay of VCDL on $V_{ctrl1}$ at process corners.
- 3.15 Maximum phase shift. (a) $2T_{bit}$ to be subtracted from Vfb[n]. (b) $2T_{bit}$ to be added to Vfb[n].
- 3.16 Simulated transient response of VL[n], VM[n] and Vfb[n] for various Vctrl1[n]. Top: Vctrl1[n] = 0. Middle: Vctrl1[n] = $V_{cal}$. Bottom: Vctrl1[n] = 1.2 V.
- 3.17 Four-channel parallel link with voltage-mode incremental signaling and per-pin skew compensation. $L_1 = 0.13\,\text{m}$, $L_2 = 0.11\,\text{m}$, $L_3 = 0.1\,\text{m}$, $L_4 = 0.15\,\text{m}$. $R = 50\,\Omega$.
- 3.18 (a) Schematic of driver. Transistor sizes: $W_1 = 1\,\mu\text{m}$, $W_2 = 2\,\mu\text{m}$, $W_3 = 4\,\mu\text{m}$, $W_4 = 8\,\mu\text{m}$, $W_5 = 16\,\mu\text{m}$, $W_6 = 32\,\mu\text{m}$, $W_7 = 64\,\mu\text{m}$, $W_8 = 128\,\mu\text{m}$. $L = 0.13\,\mu\text{m}$ for all transistors. (b) Microstrip line configuration. $H_1 = 200\,\mu\text{m}$, $H_2 = 500\,\mu\text{m}$, $W = 734\,\mu\text{m}$, $\epsilon_r = 5$.
- 3.19 Timing diagram of the input signals.
- 3.20 Simulated receiver signals before $T_5$.
- 3.21 Simulated receiver signals after $T_5$.
- 3.22 Simulation results of deskewing process for MCLK'[1] with respect to $D''[1]$ for 60 ps skew.
- 3.23 Simulation results of deskewing process for MCLK'[1] with respect to $D''[1]$ for 310 ps skew.
- 3.24 Waveforms of MCLK'[1], MCLK'[2], $D''[1]$, and $D''[2]$ after deskew.
- 4.1 Parallel links with current-mode incremental signaling and inter-signal timing skew compensation.
- 4.2 Timing diagram of comparator without inter-signal timing skew.
- 4.3 Timing diagram of the incremental signaling scheme when inter-signal timing skews exist.

- 4.4 Simplified schematic of current-mode transmitter.
- 4.5 Simplified schematic of receiver. Transistor sizes: $W_1 = 116\,\mu\text{m}$, $W_2 = 90\,\mu\text{m}$, $W_{3,5} = 60\,\mu\text{m}$, $W_{4,6} = 90\,\mu\text{m}$, $W_7 = 43\,\mu\text{m}$, $W_8 = 3\,\mu\text{m}$, $W_9 = 3.5\,\mu\text{m}$, $W_{10} = 40\,\mu\text{m}$, $W_{11} = 20\,\mu\text{m}$. $L = 0.13\,\mu\text{m}$ for all transistors. DC biasing: $V_{G9} = 0.73\,\text{V}$, $V_{G7} = 0.8\,\text{V}$, $V_{G3} = 0.94\,\text{V}$, $V_{G2} = 0.45\,\text{V}$, $V_{G4,6} = 0.65\,\text{V}$, $V_{G11} = 0.6\,\text{V}$.
- 4.6 Simulated waveforms of $v_L$, $v_S$ and the comparator output $D'[n]$ under TT.
- 4.7 Simulated frequency response of $v_L$ and $v_S$.
- 4.8 Top: Simulated receiver input impedance with the feedback amplifier when $i_{in} = 0, \pm 2\,\text{mA}$. Bottom: Dependence of the receiver input impedance on the input current with and without the feedback amplifier.
- 4.9 Simplified schematic of receiver with replica biasing.
- 4.10 Simulated waveform of $v_L$ and $v_S$ with replica-biasing. Top: $v_S$. Bottom: $v_L$.
- 4.11 Top: Simulated frequency dependence of the input impedance of receiver. Bottom: Dependence of the input impedance of the receiver on $V_{tune}$ at 1 GHz.
- 4.12 Simulated waveforms of $v_L$ and $v_S$ with replica-biasing. Top: $v_S$. Bottom: $v_L$.
- 4.13 Simulated input impedance of the receiver at different process corners.
- 4.14 Simplified schematic of voltage comparator. Transistor sizes: $W_{1,2} = 25\,\mu\text{m}$, $W_{3,4} = 12.5\,\mu\text{m}$, $W_{5,6} = 2.5\,\mu\text{m}$, $W_{7,8} = 10\,\mu\text{m}$, $W_{9,10} = 2\,\mu\text{m}$, $W_{11} = 35\,\mu\text{m}$. $L = 0.13\,\mu\text{m}$ for all transistors. DC biasing: $V_{G11} = 0.45\,\text{V}$.
- 4.15 Inter-signal timing skew compensation scheme. (a) MCLK' and all received training data are delayed by $T_{min}$ at $t = T_1$ where training data are conveyed to all channels. (b) MCLK' is further delayed by one bit time and all delayed training data bits are further delayed to align with MCLK''. The amount of the time delay of each data bit is determined by the respective delay-locked loop.

- 4.16 (a) Simplified schematic of sampling clock delay block. (b) Simplified schematic of data deskew block. Circuit parameters: $W_{1,2} = 12\,\mu\text{m}$, $W_3 = 6\,\mu\text{m}$, $W_4 = 4\,\mu\text{m}$, $W_5 = 10\,\mu\text{m}$, $W_6 = 4.5\,\mu\text{m}$, $W_7 = 5\,\mu\text{m}$, $W_8 = 2.3\,\mu\text{m}$, $W_9 = 2.07\,\mu\text{m}$. $L = 0.13\,\mu\text{m}$ for all transistors. $C = 500\,\text{fF}$. Circuit parameters of charge pump: $W_{1,3,5} = 2.5\,\mu\text{m}$, $W_{2,4,6} = 1\,\mu\text{m}$, $W_{7,11} = 12.5\,\mu\text{m}$, $W_{8,12} = 7.5\,\mu\text{m}$, $W_{9,13} = 3\,\mu\text{m}$, $W_{10,14} = 5\,\mu\text{m}$. $L = 0.13\,\mu\text{m}$ for all transistors. The transmission gate is used to balance the time delay of the inverter inserted in UP signal path.
- 4.17 Timing diagram of $V_{c1-c3}$. $T_1$ is the time instant at which MCLK and training data D[k], k=1,2,...,N, are conveyed to the channels. $T_2$ marks the start of the deskew process during which $D''[k]$, k=1,2,...,N, are phase-aligned with MCLK''. $T_3$ marks the end of deskew process and the start of generating the replica of the final deskewing control voltage. $T_4$ marks the onset of the data transmission phase. The total deskew time includes that lock time of the delay
- 4.18 Simplified schematic of voltage replication circuit. $W = 3\,\mu\text{m}$. $L = 0.13\,\mu\text{m}$ for all transistors.
- 4.19 Top: Simulated output voltage of counter. Middle: Simulated output voltage of comparator. Bottom: Simulated voltage of MUX connected to VCDL.
- 4.20 Simplified schematic of 4-channel parallel link with current-mode incremental signaling and inter-signal timing skew compensation. Channel length: $L_0 = 0.1\,\text{m}$, $L_1 = 0.06\,\text{m}$, $L_2 = 0.14\,\text{m}$, $L_3 = 0.16\,\text{m}$, $H_1 = 200\,\mu\text{m}$, $H_2 = 500\,\mu\text{m}$, $W = 734\,\mu\text{m}$, $\epsilon_r = 5$.
- 4.21 Design Layout.
- 4.22 Simulated waveforms of $I_{in}[0]$, $I_{in}[1]$, $I_{in}[2]$, MCLK', $D'[1]$ and $D'[2]$.
- 4.24 Simulated waveforms of $D''[1]$, $D''[2]$, and MCLK'' when the deskew process is completed.

## Abbreviations

- ADC - Analog to Digital Converter
- ASIC - Application-Specific Integrated Circuit
- BER - Bit Error Rate
- BSIM - Berkeley Short-Channel IGFET Model
- CP - Charge Pump
- CMOS - Complementary Metal-Oxide Semiconductor
- DFF - D Flip-Flop
- DRAM - Dynamic Random Access Memory
- DLL - Delay-Locked Loop
- FSM - Finite State Machine
- IC - Integrated Circuit
- I/O - Input/Output
- ISI - Inter-Symbol Interference
- LVDS - Low-Voltage Differential Signaling
- PCB - Printed Circuit Board
- PD - Phase Detector
- PFD - Phase Frequency Detector
- PLL - Phase-Locked Loop
- TG - Transmission Gate
- TIA - Transimpedance Amplifier
- TSPC - True Single Phase Clocking
- VCDL - Voltage-Controlled Delay Line
- VCO - Voltage-Controlled Oscillator



# Chapter 1 Introduction

## 1.1 Motivation

The exponential growth of the speed and integration level of digital integrated circuits (ICs) requires the communication bandwidth between ICs to increase accordingly. To maintain a balanced system, the communication input/output (I/O) bandwidth of ICs must scale with the integration level [1]. Traditionally, the bus structure has been employed as interconnections between systems. A global bus clock is distributed to all ICs to synchronize the transmission and reception of data. The demand for high I/O bandwidth has led to the use of point-to-point parallel links [2]. Compared to the bus paradigm, point-to-point parallel links offer the advantage of flexibility in physical architecture design and superiority in communication bandwidth for inter-chip data communications. Point-to-point parallel links have been widely used in short-distance communication applications such as multiprocessor interconnections, networking and communication switches, and consumer products with extensive multimedia applications [2].

Parallel links can adopt different signaling schemes including single-ended, fully-differential, pseudo-differential, and incremental signaling. Parallel links with single-ended signaling, although requiring only one physical conductor per channel, suffer from the worst signal integrity, as noise coupled to the channels directly reduces the timing margins and alters the amplitude of the received data. Parallel links with fully-differential signaling offer the best signal integrity, but at the cost of two physical conductors per channel. Pseudo-differential signaling is a compromise between performance and hardware cost: a common reference channel is shared among a group of channels. The signal integrity of each channel in the group differs because the physical distance between the channel and the common reference channel differs. The voltage-mode incremental signaling scheme proposed in [3] is an elegant signaling scheme specifically tailored to provide both superior signal integrity and a low hardware cost for parallel links. An N-bit parallel link with this signaling scheme requires only N + 1 physical conductors. Common-mode rejection is achieved by amplifying the signal difference of physically adjacent conductors with differentially configured receivers. To recover the transmitted data signals, encoders at the transmitter end and decoders at the receiver end are required.
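The conductor counts and the common-mode rejection described above can be illustrated with a minimal Python sketch. It is not the encoder or decoder of [3]: the function names, the conductor voltage levels (in millivolts), and the 50 mV common-mode disturbance are all hypothetical values chosen for illustration, and the pseudo-differential count assumes a single group sharing one reference channel.

```python
def conductors_required(n_bits, scheme):
    """Physical conductors needed by an n_bits-wide link for each signaling scheme."""
    counts = {
        "single-ended": n_bits,             # one conductor per channel
        "fully-differential": 2 * n_bits,   # two conductors per channel
        "pseudo-differential": n_bits + 1,  # assumption: one group, one shared reference
        "incremental": n_bits + 1,          # N + 1 conductors for N bits
    }
    return counts[scheme]


def adjacent_differences(conductor_mv):
    """Difference between each pair of physically adjacent conductors (in mV)."""
    return [conductor_mv[k + 1] - conductor_mv[k]
            for k in range(len(conductor_mv) - 1)]


# Hypothetical levels on the 5 conductors of a 4-bit link, in millivolts.
levels_mv = [600, 800, 600, 400, 600]
# The same levels with a 50 mV disturbance coupled equally onto every conductor.
noisy_mv = [v + 50 for v in levels_mv]

clean = adjacent_differences(levels_mv)
noisy = adjacent_differences(noisy_mv)
print(clean == noisy)  # the common-mode disturbance cancels in every difference
```

Because each receiver only sees the difference between neighbouring conductors, any disturbance that is identical on both conductors drops out, which is the property the differentially configured receivers rely on.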

A limiting factor of parallel link bandwidth is the inter-signal timing skew, which appears as a misalignment of the data at the receiver. Inter-signal timing skew is due to mismatches between the electrical lengths of the channels, switching noise, mismatches of termination impedance, and device mismatches of transmitters and receivers. Inter-signal timing skew reduces the timing margin at the receiver. It was shown in [2][4][5] that inter-signal timing skew of parallel links with single-ended signaling can be compensated effectively using per-pin skew compensation. In this approach, the phase difference between the received master clock and each data bit is measured individually. This phase difference is then used to adjust the sampling clock of that bit, which is a phase-shifted version of the received master clock, such that the data bit is sampled at the center of its data eye. The preceding per-pin skew compensation is carried out in a calibration phase, which is performed before data transmission takes place. Calibration signals sent along the channels with the master clock MCLK in the calibration phase are square waves. At the receiver end, a master clock is generated using a phase-locked loop (PLL) with the transmitted MCLK as the reference. Inter-signal timing skew leads to a phase difference between the master clock and each data bit. A phase detector is used to quantify this difference and controls a voltage-controlled delay line (VCDL), which adds a phase to or subtracts a phase from MCLK such that the master clock aligns with the data. The preceding per-pin skew compensation has been widely adopted in industry to combat pair-to-pair signal skew, where skews of up to one bit time can be compensated [6][7]. This per-pin deskewing approach, however, cannot be applied directly to parallel links with voltage-mode incremental signaling, as the logic state of each received data bit in this case is determined from the signals of both the channel itself and its neighboring channels. Thus, the objective of the thesis is to apply the per-pin skew compensation approach to source-synchronized parallel link interfaces with both voltage-mode and current-mode incremental signaling schemes.

## 1.2 Original Contributions

This work focused on the design of current-mode transmitters and receivers for parallel link interfaces with an incremental signaling scheme and inter-signal timing skew compensation. A new inter-signal timing skew compensation technique for parallel links with voltage-mode incremental signaling was proposed. The proposed technique employs an Early/Late block to detect the rising and falling edges of adjacent undesired pulses due to inter-signal timing skews, and subsequently allocates the optimal sampling point of the samplers to maximize the timing margins with a deskew block. Two delay-locked loops (DLLs) are employed to place the sampling clock of each data eye at its optimal sampling position. The skew compensation range is quantified from the delay range of the DLLs. The effectiveness of the proposed deskewing method was validated using a 1 Gbytes/s parallel link implemented in UMC 0.13 $\mu$m 1.2 V CMOS technology with four microstrip channels on an FR4 substrate [8]. The research papers related to the voltage-mode incremental signaling parallel link designs are as follows:

- A. Hu and F. Yuan, "Inter-signal timing skew compensation of parallel links with voltage-mode incremental signaling," IEEE Trans. Circuits and Syst. I - Regular Papers. Accepted for publication in July, 2008.
- A. Hu and F. Yuan, "Inter-signal timing skew compensation of parallel links with voltage-mode incremental signaling," Proc. IEEE Int'l Symp. Circuits and Syst., pp.1740-1743, Seattle, May 2008.

Also, a new inter-signal timing skew compensation technique for parallel links with current-mode incremental signaling is proposed. New current-mode transmitters and receivers are proposed to minimize the signal-dependent impedance mismatch and the effects of $V_{DD}$ fluctuations. Both the transmitters and receivers of the parallel links are current-mode configured such that the intrinsic advantages of current-mode signaling are preserved. We show that each receiver maps the direction of its channel current, which represents the logic state of the incoming data, to two voltages of different values. The logic states of the transmitted data are recovered by the voltage comparators. The use of feedback in the front-end of the receiver eliminates the dependence of the receiver input impedance on the direction of the channel current so that signal-dependent impedance mismatch is minimized. The use of replica-biasing techniques minimizes the effect of supply voltage fluctuation on the performance of the front-end. Inter-signal timing skew compensation circuitry has been added to combat signal timing skews. Timing skews are eliminated by inserting a VCDL for each channel whose time delay is determined by the phase difference between the transmitted master clock and the output of the recovering comparator. The research papers related to the current-mode incremental signaling parallel link designs are as follows:

- A. Hu and F. Yuan, "A new parallel link interface with current-mode incremental signaling and per-pin skew compensation," Analog Integrated Circuits and Signal Processing (MWSCAS/NEWCAS Special Issue). Accepted for publication in July, 2008.
- A. Hu and F. Yuan, "Inter-signal timing skew compensation of parallel links with current-mode incremental signaling," IET - Circuits, Devices, and Systems. Submitted in May 2008.
- A. Hu and F. Yuan, "Current-mode parallel link interface with an incremental signaling scheme and inter-signal timing skew compensation," Proc. IEEE Mid-West Symp. Circuits and Syst., Knoxville, TN. Accepted for publication in May 2008.

- A. Hu and F. Yuan, "Parallel links with current-mode incremental signaling and per-pin skew compensation," Microelectronics Journal (MWSCAS/NEWCAS Special Issue). Submitted in April 2008 (Invited).
- A. Hu and F. Yuan, "A new parallel link interface with current-mode incremental signaling and per-pin skew compensation," Proc. IEEE Mid-West Symp. Circuits and Syst., pp.1457-1460, Montreal, August 2007.

## 1.3 Thesis Organization

This thesis is organized as follows: Chapter 2 presents the typical architecture and different signaling schemes of source-synchronized parallel links, and examines the design challenges of the parallel link interface, including voltage noise and inter-signal timing skew. The causes of voltage noise and inter-signal timing skew are examined. Chapter 3 describes the inter-signal timing skew compensation technique designed for the voltage-mode incremental signaling parallel link interface. The designs of the receiver and the deskew circuitry are described in detail. The simulation results of a 2-bit parallel link interface are also presented. Chapter 4 describes the current-mode parallel link interface designed with the incremental signaling scheme and the per-pin deskew technique. The designs of the transmitter, receiver, voltage comparator, and deskew circuitry are introduced. The simulation results and the layout of the design are also presented. The conclusions and future work are presented in Chapter 5.

## Chapter 2

## Fundamentals of Source-Synchronized Parallel Links

A generic architecture of parallel link interfaces is shown in Fig. 2.1(a). The inputs consist of the master clock (MCLK) and N-bit parallel input data  $(D[1] \sim D[n])$ .  $D[1] \sim D[n]$ are required to be synchronized with MCLK as shown in Fig. 2.1(b).  $D[1] \sim D[n]$  drive the transmitters (Tx) and send  $D'[1] \sim D'[n]$  to the channels. The receiver (Rx) recovers  $D[1] \sim D[n]$  from  $D'[1] \sim D'[n]$  and provides a matching impedance to the channel. MCLK is recovered with a DLL/PLL and intentionally delayed by 90°, as shown in Fig. 2.1(b). As a result, MCLK' is relocated to the center of the data eye and the timing margins are maximized.



Figure 2.1: (a) Parallel links with single-ended signaling. (b) Timing diagram.

Another class of parallel links is the bidirectional interface, which offers high bandwidth and more pin savings [4][9]-[12]. In this architecture, two transceivers send and receive data between each other simultaneously. The signals transmitted through the channels are usually multilevel data streams. The transmitted signals at each transceiver are recovered with multiple local reference voltages.

## 2.1 Electrical Signaling Schemes for Source-Synchronized Parallel Links

The signaling schemes of a parallel link are the encoding and decoding schemes of the transmitted signals  $D'[1] \sim D'[n]$  at the transmitter and the receiver. Single-ended signaling scheme, fully-differential signaling scheme, pseudo-differential signaling scheme, and incremental signaling scheme have been developed for parallel links [3][13]. In this section, these signaling schemes are briefly examined. A more detailed description is available in [13]. An emphasis is given to the incremental signaling scheme.

### 2.1.1 Single-Ended Signaling Scheme

A typical structure of the single-ended signaling scheme is shown in Fig. 2.2(a). To transmit N-bit data, N physical conductors are required. The two logic states of D[n] can be easily mapped to two different voltage levels  $(V_{high} \text{ and } V_{low})$  for D'[n]. To recover the transmitted signals,  $D'[1] \sim D'[n]$  are compared with a local reference voltage  $V_{ref}$ .  $V_{ref}$  is usually set to the average of  $V_{high}$  and  $V_{low}$  in order to ensure a 50% duty cycle of the recovered signals  $D[1] \sim D[n]$  at the output of the voltage comparator. Since  $V_{ref}$  is provided by the receiver instead of the channel, noise added to  $D'[1] \sim D'[n]$  cannot be suppressed by the receiver.

### 2.1.2 Fully-Differential Signaling Scheme

Parallel links with a fully-differential signaling scheme are shown in Fig. 2.2(b). To transmit N-bit data, 2N physical conductors are required. The transmitter maps D[n] to two different voltage levels carried by D'[na] and D'[nb]. When D[n] is logic-0, the voltage level of D'[na] is lower than that of D'[nb] and vice versa. At the receiver end,  $D[1] \sim D[n]$  are recovered by evaluating the voltage difference between D'[na] and D'[nb]. The major advantage of this signaling scheme is that D[n] is obtained from the difference between D'[na] and D'[nb], so common-mode noise is effectively suppressed.

Figure 2.2: (a) Single-ended signaling. (b) Fully-differential signaling.

### 2.1.3 Pseudo-Differential Signaling Scheme

The pseudo-differential signaling scheme is derived from the single-ended signaling scheme, as shown in Fig. 2.3. To transmit N-bit data, N+1 physical conductors are required. The reference voltage comes from the channel instead of the receiver, so common-mode noise rejection is provided, but only to a certain extent because the physical distance between D'[n] and the reference signal varies. The number of conductors per group is typically limited to four.





### 2.1.4 Incremental Signaling Schemes

#### A. Voltage-Mode Incremental Signaling Scheme

The voltage-mode incremental signaling scheme is capable of common-mode noise rejection and requires a minimum number of physical conductors. The architecture of parallel link interfaces with voltage-mode incremental signaling is shown in Fig. 2.4(a) [3]. The N-bit data  $D[1] \sim D[n]$  are sent to the encoders and mapped into  $D'[1] \sim D'[n+1]$ , which are conveyed to the channels. To reject the common-mode noise, comparators at the receiver end evaluate the signal difference between adjacent channels (D'[n] and D'[n+1]) and output the result

$$D''[n] = D'[n+1] - D'[n].\tag{2.1}$$

To recover the logic state of the transmitted signals  $D[1] \sim D[n]$  from  $D''[1] \sim D''[n]$ , decoder circuitry is required. The voltage-mode incremental signaling scheme employs N+1 physical conductors to transmit N bits of data, and offers the advantage of common-mode noise rejection as the fully-differential signaling scheme does.



Figure 2.4: (a) Parallel links with the voltage-mode incremental signaling scheme. (b) Encoder and decoder.

The implementation of the encoder and decoder proposed in [3] is shown in Fig. 2.4(b). The encoder is implemented with cascaded XOR2 gates

$$D'[n+1] = D'[n] \oplus D[n].\tag{2.2}$$

As a result, D'[n] is mapped to either logic-1 or logic-0. A sample scenario of D[n] consisting of 10 bits along with the corresponding D'[n] and D''[n] is shown in Table 2.1. By using peak detectors for the decoder circuitry, -1 for D''[n] will be corrected to 1.

| n                                                      | 1 | 2 | 3  | 4 | 5  | 6 | 7 | 8 | 9  | 10 | 11 |
|--------------------------------------------------------|---|---|----|---|----|---|---|---|----|----|----|
| D[n]                                                   | 0 | 0 | 1  | 1 | 1  | 0 | 1 | 0 | 1  | 1  |    |
| D'[n]                                                  | 0 | 0 | 0  | 1 | 0  | 1 | 1 | 0 | 0  | 1  | 0  |
| D''[n]                                                 | 0 | 0 | -1 | 1 | -1 | 0 | 1 | 0 | -1 | 1  |    |

Table 2.1: Logic state of parallel links with the voltage-mode incremental signaling scheme
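The encoding of Eq. (2.2) and the differencing of Eq. (2.1) can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of [3]: the function names are hypothetical, the first conductor is assumed to start at logic-0, and taking the absolute value stands in for the peak detector that maps -1 to 1. Because the absolute value is taken, the recovered data do not depend on the polarity convention of the comparators.

```python
def encode(data, first=0):
    """Eq. (2.2): D'[n+1] = D'[n] XOR D[n]; D'[1] = first is an assumption."""
    line = [first]
    for bit in data:
        line.append(line[-1] ^ bit)
    return line


def differences(line):
    """Eq. (2.1): D''[n] = D'[n+1] - D'[n] for each adjacent conductor pair."""
    return [line[i + 1] - line[i] for i in range(len(line) - 1)]


def decode(diffs):
    """Stand-in for the peak detector: -1 and +1 are both read as logic-1."""
    return [abs(d) for d in diffs]


data = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]   # D[n] row of Table 2.1
line = encode(data)                      # N + 1 conductor states D'[n]
recovered = decode(differences(line))
print(line)
print(recovered == data)
```

With the ten-bit pattern of Table 2.1 as input, `encode` yields the eleven conductor states listed in the D'[n] row, and the rectified differences return the original data.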

#### B. Current-Mode Incremental Signaling Scheme

A current-mode incremental signaling scheme was proposed in [14]. As compared to the voltage-mode signaling scheme, current-mode signaling offers advantages including higher bandwidth and better signal integrity [14]. The architecture and the schematic of the current-mode driver are shown in Fig. 2.5(a) and Fig. 2.5(b), respectively. Each transmitter Tx converts the input signal D[n] to a pair of currents D'[na] and D'[nb]. When D[n] is logic-1, Tx supplies J to D'[na] and sinks J from D'[nb], and vice versa.



Figure 2.5: (a) Parallel links with the current-mode incremental signaling scheme. (b) Schematic of the current-mode driver (Tx).

As a result, except for the first and last channels, the net current D''[n] in each channel depends on the logic states of two adjacent data bits D[n] and D[n-1]. The relation is summarized as

$$D''[n] = \begin{cases} 0, & \text{if } D[n-1] \oplus D[n] = 0, \\ -2J, & \text{if } D[n-1] = 1 \text{ and } D[n] = 0, \\ 2J, & \text{if } D[n-1] = 0 \text{ and } D[n] = 1. \end{cases}\tag{2.3}$$

At the receiver end, the direction of the current through  $R_n$  determines the logic state of D[n]. For any three consecutive bits (i.e. D[n-1], D[n] and D[n+1]), eight possible logic combinations exist. The corresponding values of D''[n] and D''[n+1], and the current through  $R_n$ , are summarized in Table 2.2. Except for the first and last scenarios, it is evident that the logic state of D[n] matches the direction of the current through  $R_n$ . When the three consecutive bits D[n-1], D[n] and D[n+1] are either 000 or 111, the current through  $R_n$  is supplied with the channel current of D''[n-2].

| n                     | 1 | 2  | 3   | 4  | 5   | 6   | 7   | 8 |
|-----------------------|---|----|-----|----|-----|-----|-----|---|
| D[n-1]                | 0 | 0  | 0   | 0  | 1   | 1   | 1   | 1 |
| D[n]                  | 0 | 0  | 1   | 1  | 0   | 0   | 1   | 1 |
| D[n+1]                | 0 | 1  | 0   | 1  | 0   | 1   | 0   | 1 |
| D''[n]                | 0 | 0  | 2J  | 2J | -2J | -2J | 0   | 0 |
| D''[n+1]              | 0 | 2J | -2J | 0  | 0   | 2J  | -2J | 0 |
| Current through $R_n$ | 0 | -J | J   | J  | -J  | -J  | J   | 0 |

Table 2.2: Logic states of current-mode incremental signaling
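As a quick check of Eq. (2.3), the following Python sketch enumerates the eight combinations of three consecutive bits and evaluates the net channel currents D''[n] and D''[n+1]. The function name `net_current` and the normalization J = 1 are assumptions made for illustration; the current through $R_n$ is not modeled here because it depends on the receiver circuit.

```python
J = 1  # unit current; normalization assumed for illustration


def net_current(prev_bit, bit, j=J):
    """Eq. (2.3): net channel current as a function of D[n-1] and D[n]."""
    if prev_bit == bit:
        return 0
    return 2 * j if bit == 1 else -2 * j


# All combinations of (D[n-1], D[n], D[n+1]) in the column order of Table 2.2.
cases = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
d_n = [net_current(a, b) for a, b, _ in cases]        # D''[n]
d_n_plus_1 = [net_current(b, c) for _, b, c in cases]  # D''[n+1]

for case, x, y in zip(cases, d_n, d_n_plus_1):
    print(case, x, y)
```

The two lists correspond to the D''[n] and D''[n+1] rows of Table 2.2, expressed in units of J.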

### 2.1.5 Design of Source-Synchronized Parallel Link Interfaces - A Review

In this section, several published parallel link interfaces are reviewed. In [15], the author proposed a high-speed parallel link architecture with pseudo-differential signaling, as shown in Fig. 2.6(a). The signal from each channel is encoded into three different signal levels, and the transmitted signals are recovered by comparing the voltage difference between the channels and Vref. In [16], the transmitter of the proposed parallel link interface supplies current to or sinks current from the channel based on the logic states of the input signals, as shown in Fig. 2.6(b). At the receiver end, termination resistors convert the current signals into voltage signals and behave as transimpedance amplifiers (TIAs). The transmitted signals are recovered by comparing the voltage difference between the local reference signal and the transmitted signals. In [17], a four-channel parallel link interface capable of transmitting three data bits is proposed, as shown in Fig. 2.6(c). The signal level transmitted through each channel depends on two of the three transmitted data bits. To recover the first and last transmitted data bits at the receiver end, signals from two of the four channels are used. To recover the second data bit, signals from all four channels are required. In [18], the proposed architecture is also capable of transmitting three data bits using four channels, as shown in Fig. 2.6(d). The logic states of the three data bits lead to eight possible combinations. Different logic combinations result in four different signals transmitted through the channels, where each signal can be at the high, center, or low level. To recover the transmitted data, the channel signals are compared with each other using voltage comparators. The performance of these designs is summarized in Table 2.3.

| Ref. | Total Power | Technology    | Data Rate    | Total Area                        |
|------|-------------|---------------|--------------|-----------------------------------|
| [15] | N/A         | $0.25\ \mu$m  | 1.8 Gbytes/s | $3.1 \times 3.1\ \mathrm{mm}^2$   |
| [16] | 3.3 mW      | $0.25\ \mu$m  | 1.1 Gbytes/s | N/A                               |
| [17] | 450 mW      | $0.25\ \mu$m  | 4 Gbytes/s   | N/A                               |
| [18] | 17.1 mW     | $0.18\ \mu$m  | 4.2 Gbytes/s | $3 \times 1.3\ \mathrm{mm}^2$     |

Table 2.3: Parallel link performance

## 2.2 Design Challenges of Source-Synchronized Parallel Links

This section describes the nonidealities of the transmitted signals over wireline channels. Signals transmitted through the channels are affected by voltage noise and inter-signal timing skew. In section 2.2.1, the causes and effects of voltage noise are discussed. In section 2.2.2, the impact of inter-signal timing skew on wireline communications is discussed. Also, the per-pin skew compensation technique designed to combat inter-signal timing skew is described. In section 2.2.3, several published deskew designs are reviewed.

Figure 2.6: Schematic of the reviewed parallel link interface

### 2.2.1 Voltage Noise

Nonidealities in high-speed electrical signaling can be visualized with the eye diagram in Fig. 2.7. The voltage margin refers to the difference between the maximum/minimum signal level and the reference voltage. Sources that can reduce the voltage margins include channel attenuation induced inter-symbol interference (ISI), impedance mismatches, fabrication offsets, and power supply noise [2].

#### **Channel Attenuation and ISI**

Channel-attenuation-induced ISI greatly contributes to the reduced voltage margins. ISI refers to the distortion of the received signal where the distortion is manifested in the temporal spreading and consequent overlap of individual pulses to the degree that the receiver cannot reliably distinguish between the changes of state [19]. For a five-bit signal stream 10110, as illustrated in Fig. 2.8(a), the finite rise and fall times of the transmitted signals lead to undesired signal spreading such that the pulses overlap with each other, as shown in Fig. 2.8(b). Consequently, the signal levels at Bit two and Bit three are reduced, and so are their voltage margins, as shown in Fig. 2.8(c).

Figure 2.7: (a) Eye diagram with large eye opening. (b) Eye diagram with small eye opening.
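The pulse spreading described above can be illustrated with a minimal Python sketch in which the channel is approximated by a discrete first-order low-pass filter. The filter model, the smoothing factor `alpha`, and the number of samples per bit are assumptions chosen for illustration only and are not taken from a measured channel.

```python
def lowpass_response(bits, samples_per_bit=8, alpha=0.25):
    """Pass an ideal 0/1 bit stream through y += alpha * (x - y)."""
    y = 0.0
    wave = []
    for b in bits:
        for _ in range(samples_per_bit):
            y += alpha * (b - y)
            wave.append(y)
    return wave


def end_of_bit_levels(bits, samples_per_bit=8, alpha=0.25):
    """Signal level reached at the end of each bit period."""
    wave = lowpass_response(bits, samples_per_bit, alpha)
    return [wave[(i + 1) * samples_per_bit - 1] for i in range(len(bits))]


levels = end_of_bit_levels([1, 0, 1, 1, 0])  # the stream 10110 of Fig. 2.8
for i, level in enumerate(levels, start=1):
    print(f"bit {i}: {level:.3f}")
```

In this model a logic-1 that follows a logic-0 settles to a lower level than a logic-1 that follows another logic-1, and a logic-0 never returns fully to zero, which is the reduction of voltage margin attributed to ISI.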

ISI is affected by the low-pass and high-pass effects of channels [20]. The low-pass effects refer to the attenuation of the high-frequency components of the transmitted signals. On the other hand, the output dc levels for a long run of consecutive ones or zeros will drift due to the high-pass effects. Another source that contributes to ISI is the channel's series resistance caused by the skin effect [21]. The high-frequency current of a conductor does not flow uniformly throughout the cross-sectional area of the conductor. The magnetic field within the conductor forces the current to flow only in a shallow band just underneath the surface of the conductor. This redistribution of the current, known as the skin effect [8], reduces the area of current flow, so the resistance of the conductor increases with frequency. The signal attenuation for an RG-55U cable is shown in Fig. 2.9 [21]. It is evident that the high-frequency components suffer more attenuation than the lower-frequency components.

To combat wire and parasitic losses, linear equalizers have been adopted [22]-[25]. Linear equalizers are high-pass filters. The magnitude of high-frequency components is amplified to combat the low-pass effect of the channels in order to extend the usable bandwidth of the channels [26]. Another approach is to implement the filter in the digital domain by feeding the input to an analog-to-digital converter (ADC) and postprocessing the ADC output with a high-pass filter [21]. Advanced techniques such as decision feedback equalization, multilevel modulation and adaptive interference cancellation have also been adopted to combat signal attenuation [26].

Figure 2.8: Intersymbol interference. (a) Transmitted signals. (b) Pulse spreading due to ISI. (c) Actual transmitted signals.



Figure 2.9: Transfer function of RG-55U cables.

#### **Impedance Mismatch**

Near-end and far-end signal reflections also lead to decreased voltage margins. The reflection coefficient is given by

$$\Gamma = \frac{Z_L - Z_o}{Z_L + Z_o},\tag{2.4}$$

where  $Z_L$  and  $Z_o$  are the load impedance and the channel characteristic impedance, respectively. The load impedance must therefore match the channel characteristic impedance to minimize the signal reflections.
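A short numerical illustration of Eq. (2.4) follows. The 50 Ω characteristic impedance and the load values are assumed example numbers, not values from the designs discussed in this thesis, and the function name is hypothetical.

```python
def reflection_coefficient(z_load, z_0):
    """Eq. (2.4): Gamma = (Z_L - Z_o) / (Z_L + Z_o)."""
    return (z_load - z_0) / (z_load + z_0)


Z0 = 50.0  # assumed characteristic impedance in ohms
for z_load in (50.0, 75.0, 25.0):
    gamma = reflection_coefficient(z_load, Z0)
    print(f"Z_L = {z_load:5.1f} ohm -> Gamma = {gamma:+.3f}")
```

A matched load gives a reflection coefficient of zero, while a load above or below the characteristic impedance reflects a fraction of the incident wave with positive or negative polarity, respectively.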

#### Offsets

Transistor mismatches in the transmitter and receiver circuitry can induce fixed voltage offsets whose magnitudes are independent of the transmission signal swing and are instead determined by the transistor sizes and process parameters. Transmitter mismatches cause the actual output signal swing to deviate from the nominal swing. Receiver mismatches increase the minimum transmit signal swing required for accurate signal detection. Offset-cancellation techniques commonly used in opamp designs can be applied to reduce mismatch-induced effects [2].

#### **Power Supply Noise**

Power supply noise is induced by switching large currents in short durations across the parasitic inductance of the power distribution network, and is also called  $\frac{di}{dt}$  noise. Large output drivers that switch large currents are a major source of this noise. Techniques such as minimizing the inductance of power distribution networks, employing constant-current drivers, and keeping the total current drawn from the supply constant have been used to reduce power supply noise [2].

### 2.2.2 Inter-Signal Timing Skew

The timing margins refer to the available time duration allocated to the receiver for sampling. Data in parallel links are recovered by sampling each received data bit at the center of the data eye such that the timing margins for sampling in both directions are maximized. If all channels have the same electrical length, the optimal position of the sampling clock for each data bit is achieved by phase-shifting the master clock transmitted along with the data by half of the bit time, as shown in Fig. 2.10(a). However, mismatches in the transmitter and receiver circuitry, in the transmitter and receiver clock distributions, and in the interconnect wires (cables, printed-circuit board traces, package traces, and connectors) lead to different arrival times of the parallel data at the receiver end. This is known as the inter-signal timing skew [2][4][5][27], as illustrated graphically in Fig. 2.10(b). Inter-signal timing skew reduces the timing margins of parallel links and deteriorates the bit error rate.

Although timing skews encountered in clock distributions have been studied extensively and various compensation schemes have been proposed [28]-[32], only a few studies on inter-signal timing skew of parallel links are available. It was shown in [4][5] that the effect of the inter-signal timing skew on parallel links with single-ended voltage-mode signaling can be eliminated by using the per-pin skew compensation technique where each received data bit has its own sampling clock positioned at the center of the data eye. In this approach, the phase difference between the received master clock and each data bit is measured individually. The phase difference is then used to adjust the position of the sampling clock for each data bit. The readjusted sampling clocks are phase-shifted versions of the received master clock such that the data bit is sampled at the center of its data eye, as shown in Fig. 2.11. Per-pin skew compensation is typically carried out in a calibration phase prior to data transmission. At the receiver end, a master clock is generated using a PLL with MCLK as its reference. Inter-signal timing skew leads to a phase difference between the master clock and each data bit. A phase detector is used to quantify this phase difference and controls a VCDL, which adds time delays to or subtracts time delays from MCLK such that MCLK'[n] is shifted to the center of the data eye for D'[n]. Per-pin skew compensation has been widely used in industry to combat inter-signal timing skew of up to one bit time [6][7].

Figure 2.10: Data recovery in parallel links. (a) Without inter-signal timing skew. (b) With inter-signal timing skew.
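The idea of per-pin skew compensation can be summarized with a small behavioral sketch in Python. It assumes an idealized model in which the skew of each data bit relative to the received master clock is already known from the phase detector, and the sampling clock of each bit is simply the master clock delayed by that skew plus half a bit time. The function name, the bit time, and the skew values are hypothetical.

```python
def sampling_clock_delays(skews, t_bit):
    """Delay of MCLK'[n] relative to MCLK that centers each data eye.

    skews: arrival time of each data bit relative to MCLK (same unit as t_bit)
    t_bit: bit time
    """
    return [skew + t_bit / 2 for skew in skews]


T_BIT = 1000                   # assumed bit time in ps
skews = [0, 120, -80, 300]     # assumed per-channel skews in ps
for n, delay in enumerate(sampling_clock_delays(skews, T_BIT), start=1):
    print(f"MCLK'[{n}] delay = {delay} ps")
```

A channel without skew is sampled half a bit time after the clock edge, as in the skew-free case, while every other channel receives its own offset so that its sampling point stays at the center of its data eye.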

### 2.2.3 Design of Deskew Buffers - A Review

The details of several deskew architectures are presented in this section. In [4], a single-ended parallel link interface was designed with per-pin deskew capability. The deskew process is performed in the calibration mode, where the signal coming from the channel is compared with the receiver clock. A DLL is used to generate six differential clocks with 30° spacings. A phase interpolator takes two adjacent clocks and further divides the phase difference into 15 equally spaced signals, where 2° separates two adjacent signals. As a result, 90 different phase incremental steps spanning through the entire period are made available for the deskew purpose. The selection of the signal to be compared with the channel training clock is controlled by a finite state machine (FSM).

Figure 2.11: Per-pin skew compensation of parallel links with a single-ended signaling scheme.

In [5], the authors proposed a bit-to-bit skew control technique for inter-chip data communications. At the receiver side, the phases of adjacent channel signals are compared by a phase comparing circuit. The phase information between two adjacent signals is represented by the CUP and CDN signals. If one signal leads the other, CUP is set to logic one and vice versa. CUP/CDN are sent back to the receiver front-end through a separate channel and are used to control a delay counter that controls a variable delay line. The variable delay line applies a corresponding timing delay to the transmitted signal until the skew between adjacent signals is compensated. Both the transmitted and received signals are sampled with D flip-flops (DFFs).

In [27], the authors proposed a technique to deskew a pair of differential printed circuit board (PCB) traces. The timing information of one PCB trace is regarded as the reference, and the deskew circuitry tries to eliminate the phase difference between the other channel and the reference channel. The VCDL is implemented based on the inverter chain. The timing delay can be controlled by tuning the output time constant of each inverting stage.

In [33], the authors proposed a resistance-based distributed deskew technique. The reference clock is sent to a chain of transmission gates (TGs) where the propagation time of each TG results in time-delayed reference clock signals. Each time-delayed signal is selected by digital control signals and compared with the data. A phase detector continuously compares the phase difference between the selected signal and the data until the two signals are in phase. The selected signal is then used to sample the data.

In [34], the authors proposed a deskew buffer for low-voltage differential signaling (LVDS) applications implemented with a skew detector and a programmable delay block. The skew detector compares the phases of the two channels and generates either an Early, Good or Late signal to account for the three different scenarios where one signal leads, aligns with or lags the other signal. The programmable delay block can apply various timing delays to the input signal by changing the capacitive loading.

The power consumption, achievable deskew range, technology and data rate of the above designs are summarized in Table 2.4. Compensation accuracy refers to the timing skew that remains after the compensation.

| Ref. | Total Power | Deskew Range | Technology    | Data Rate   | Compensation Accuracy |
|------|-------------|--------------|---------------|-------------|-----------------------|
| [4]  | 85.8 mW     | 833 ps       | $0.35\ \mu$m  | 2.4 Gbits/s | N/A                   |
| [5]  | N/A         | N/A          | $0.2\ \mu$m   | 5 Gbits/s   | 100 ps                |
| [27] | N/A         | 1 ns         | $0.25\ \mu$m  | 500 Mbits/s | 12.5 ps               |
| [33] | N/A         | 60 ps        | $0.10\ \mu$m  | 2.4 Gbits/s | 10 ps                 |
| [34] | 300 mW      | 3 ns         | $0.35\ \mu$m  | 250 Mbits/s | 100 ps                |

Table 2.4: Deskew buffer performance

## 2.3 Chapter Summary

In this chapter, the basic architecture and different signaling schemes of source-synchronized parallel links were examined. The nonidealities in the parallel link design, namely voltage noise and inter-signal timing skew, were reviewed. Voltage noise is caused by ISI, impedance mismatches, offsets, and power supply noise. Inter-signal timing skew refers to the misalignment of the data at the receiver and reduces the sampling margin. Per-pin deskew has been adopted in industry to combat inter-signal timing skews of up to one bit time.


## Chapter 3

## Inter-Signal Timing Skew Compensation of Source-Synchronized Parallel Links with Voltage-Mode Incremental Signaling

In this chapter, the inter-signal timing skew compensation technique for parallel links with voltage-mode incremental signaling is described. The proposed technique employs an Early/Late block to detect the rising and falling edges of adjacent undesired pulses due to inter-signal timing skews, and subsequently allocates the optimal sampling point of the samplers to maximize the timing margins with a deskew block. Two DLLs are employed to place the sampling clock of each data eye at its optimal sampling position. The skew compensation range is quantified from the delay range of the DLLs. In section 3.1, the overall architecture of the parallel link interface is described. In section 3.2, the structure and the simulation results of the receiver are presented. In section 3.3, the design of the Early/Late block is introduced. Section 3.4 provides a brief overview of the delay-locked loop. Section 3.5 presents the design of the inter-signal timing skew compensation circuitry. In section 3.6, the effectiveness of the proposed deskewing method is validated using a 1 Gbytes/s parallel link implemented in UMC 0.13 $\mu$m 1.2 V CMOS technology with four microstrip channels on an FR4 substrate [8]. The chapter is concluded in section 3.7.

## 3.1 Architecture

The configuration of parallel links with voltage-mode incremental signaling and inter-signal timing skew compensation is shown in Fig. 3.1.  $V_{c1-c5}$  are used to control the Early/Late blocks and Deskew blocks. To transmit N-bit data, N+1 channels are required for data and clock. The operation of the system consists of two phases, namely the calibration phase and the data transmission phase, which are selected by  $V_{c5}$ . Inter-signal timing skew compensation is performed in the calibration phase by setting  $V_{c5}$  to logic-0. Square-wave signals with a 50% duty cycle are sent to all channels. Inverter drivers are used at the near end of the channels to ensure the rail-to-rail swing of the transmitted voltage signals. Shunt termination is employed at the far end of the channels.



Figure 3.1: Parallel links with voltage-mode incremental signaling and per-pin skew compensation.

Inter-signal timing skew between D'[n] and D'[n+1] leads to an undesired  $T_{skew}$  portion in the output of the comparator D''[n], as shown in Fig. 3.2, where  $T_{skew}$  is the width of the pulse in D''[n]. The timing margin is reduced from  $\frac{T_{bit}}{2}$  without skew to  $\frac{T_{bit} - T_{skew}}{2}$  with skew. As a result, the optimal sampling location of MCLK'[n] is at the middle of two adjacent  $T_{skew}$  pulses, as shown in Fig. 3.2. To move MCLK'[n] to this new optimal location, the Early/Late block in Fig. 3.1 first generates an early signal VE[n] and a late signal VL[n] from D''[n]. VE[n] is aligned with the rising edge of D''[n] while VL[n] is aligned with the falling edge of D''[n], as shown in Fig. 3.3. The phase difference between VE[n] and VL[n] is  $T_{bit} + T_{skew}$ . The Deskew block in Fig. 3.1 uses VE[n] as the reference signal, and adds a phase delay to VL[n] or subtracts a phase delay from VL[n] using a DLL until VL[n] is aligned with VE[n]. The mid-point of the phase shift applied to VL[n] marks the location that is equidistant from two adjacent  $T_{skew}$  pulses. This signal is denoted by VM[n]. Since  $T_{skew}$  varies for different channels, the location of VM[n] varies. MCLK needs to be aligned with VM[n] to create MCLK'[n] for each channel. D''[n] of each channel can then be sampled with MCLK'[n] to optimize the timing margins.
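The timing-margin expression above can be checked with a short Python sketch. It assumes an idealized picture in which one skew pulse occupies the interval from 0 to $T_{skew}$ and the next pulse begins at $T_{bit}$; the function names and the numerical values are illustrative assumptions.

```python
def timing_margin(t_bit, t_skew):
    """Margin on each side of the sampling point: (T_bit - T_skew) / 2."""
    return (t_bit - t_skew) / 2


def optimal_sampling_point(t_bit, t_skew):
    """Mid-point between the falling edge of one skew pulse (at t_skew)
    and the rising edge of the next pulse (assumed at t_bit)."""
    return t_skew + timing_margin(t_bit, t_skew)


T_BIT = 1000  # assumed bit time in ps
for t_skew in (0, 200, 400):
    print(t_skew, timing_margin(T_BIT, t_skew), optimal_sampling_point(T_BIT, t_skew))
```

Without skew the margin equals half a bit time; as the skew grows, the margin shrinks and the sampling point moves later so that it remains equidistant from the two neighboring pulses.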



Figure 3.2: Comparator output D''[n] for different  $T_{skew}$ .

Fig. 3.3 illustrates two different phase shifts applied to VL[n]. Due to the difference in the delay range of the VCDL caused by process variations and temperature drift, VL[n] could be aligned with any arbitrary rising edge of VE[n]. In Fig. 3.3(a), the rising edge of VL[n] aligns with the second rising edge of VE[n] while in Fig. 3.3(b), it aligns with the third rising edge of VE[n]. In both cases, VM[n] falls in the middle between two adjacent pulses of D"[n]. It can be verified that VM[n] will always fall in the middle of two adjacent pulses of D"[n] as long as VL[n] aligns with the N-th rising edge of VE[n].

## 3.2 Comparator

The differential comparators employ a Gilbert cell and a differential amplifier, as shown in Fig. 3.4. The generic architecture of the Gilbert cell is shown in Fig. 3.5.



Figure 3.3: Timing diagram of Early/Late block.

The first differential pair (DP1) consists of transistors M1, M3 and M6 with the voltage gain  $g_{m3,6}R_{out}$ . The second differential pair (DP2) is formed by M2, M4 and M5 with the voltage gain  $-g_{m4,5}R_{out}$ .  $V_{out}$  depends on the gains of both differential pairs

$$V_{out} = (g_{m3,6}R_{out} - g_{m4,5}R_{out})V_{in}.\tag{3.1}$$

The biasing current $I_{SS}$ is split between the two differential pairs, and $V_{cont}$ determines the current allocated to each pair. Since $g_m$ is proportional to the biasing current, which is controlled by $V_{cont}$, $V_{in}$ and $V_{cont}$ together determine the polarity and magnitude of $V_{out}$. A positive $V_{cont}$ makes DP1 dominate the polarity of $V_{out}$ because the current distribution favors DP1, so $V_{in}$ and $V_{out}$ have the same polarity, as illustrated by the right-half plane of Fig. 3.5(b). A negative $V_{cont}$ steers more biasing current to DP2, so $V_{in}$ and $V_{out}$ have opposite polarities, as shown by the left-half plane of Fig. 3.5(b). The flat portion in Fig. 3.5 arises when the magnitude of $V_{cont}$ is large enough to turn off M1 or M2, such that the entire $I_{SS}$ is steered to a single differential pair. Any further increase or decrease of $V_{cont}$ does not affect $V_{out}$. When $V_{cont}$ is zero, $I_{SS}$ is split equally between DP1 and DP2. The gains of DP1 and DP2 cancel each other, resulting in $V_{out} = 0$.

The differential comparator in Fig. 3.4 is a slight modification of Fig. 3.5, in which $V_{in+}$ is connected to $V_{cont+}$ and $V_{in-}$ is connected to $V_{cont-}$. This architecture ensures the same polarity for $V_{in}$ and $V_{cont}$, which always leads to a positive $V_{out}$, as shown by the upper-half plane of Fig. 3.5(b). Opposite polarities of D'[n] and D'[n + 1] therefore always lead to a positive $V_{out}$. When the signal level of D'[n] equals that of D'[n + 1], the biasing current is split equally between the two differential pairs.
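The polarity argument can be illustrated with a small behavioral sketch of Eq. (3.1). It is not the transistor-level circuit: the tanh current-steering law, the proportionality constant `k_gm`, the load `r_out` and the scale `v_steer` are all assumptions chosen for illustration, and only $I_{SS}$ = 2 mA is taken from Fig. 3.4.

```python
import math


def gilbert_vout(v_in, v_cont, i_ss=2e-3, k_gm=10.0, r_out=100.0, v_steer=0.1):
    """Behavioral V_out = (g_m,DP1 - g_m,DP2) * R_out * V_in, following Eq. (3.1).

    Assumptions (not from the thesis): I_SS is steered between DP1 and DP2 by a
    tanh law with scale v_steer, and each pair has g_m = k_gm * I_bias.
    """
    share_dp1 = 0.5 * (1.0 + math.tanh(v_cont / v_steer))  # fraction of I_SS in DP1
    gm_dp1 = k_gm * i_ss * share_dp1
    gm_dp2 = k_gm * i_ss * (1.0 - share_dp1)
    return (gm_dp1 - gm_dp2) * r_out * v_in


def comparator_vout(v_diff):
    """Comparator connection of Fig. 3.4: V_cont is tied to V_in."""
    return gilbert_vout(v_diff, v_diff)


if __name__ == "__main__":
    for v in (-0.3, 0.0, 0.3):
        print(f"V_in = V_cont = {v:+.1f} V -> V_out = {comparator_vout(v):+.3f} V")
```

In this model a differential input of either sign yields a non-negative $V_{out}$, and a zero input yields $V_{out} = 0$, which matches the behavior described for the upper-half plane of Fig. 3.5(b).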



Figure 3.4: Schematic of comparator. Transistor sizes:  $W_{1,2} = 5\mu m$ ,  $W_{3-6} = 27\mu m$ ,  $W_{7,8} = 50\mu m$ ,  $W_{9,10} = 5\mu m$ ,  $W_{11,12} = 10\mu m$ ,  $W_{13} = 40\mu m$ .  $L = 0.13\mu m$  for all transistors. Biasing:  $I_{SS} = 2mA$ ,  $V_{b1} = 0.8V$ ,  $V_{b2} = 0.6V$ .

The differential amplifier converts the differential signal $V_{out}$ into the single-ended signal D''[n]. Opposite signal polarities for D'[n] and D'[n + 1] lead to a positive $V_{out}$, which results in logic high for D''[n]. When D'[n] and D'[n + 1] have the same polarity, the common-mode voltage level of $V_{out}$ forces D''[n] to logic low. This is achieved through the sizing of the transistors and the choice of $I_{SS}$.

The simulated transient response of the comparator is shown in Fig. 3.6 for $T_{skew}$ of 60 ps and 310 ps. The minimum detectable skew is set by the sensitivity of the comparator. Simulated waveforms of D''[n] with the signal skew varied from 50 ps to 65 ps in steps of 5 ps are shown in Fig. 3.7. When the signal skew is reduced to 50 ps, the amplitude of D''[n] is reduced to less than half of $V_{DD}$. The minimum detectable signal skew is estimated to be 55 ps.

Figure 3.6: Simulated transient response of comparator. Left:  $T_{skew} = 60ps$ . Right:  $T_{skew} = 310ps$ .

Figure 3.7: Simulated waveforms of D''[n] with various signal skews.

## 3.3 Early/Late Block

The Early/Late block shown in Fig. 3.8 generates VE[n] and VL[n] from D''[n] when triggered by $V_{c1}$. Fig. 3.10 shows the simulation results of VE[n] and VL[n]. The logic-0 state of $V_{c1}$ sets VE[n] and VL[n] to logic-0 and logic-1 initially. $V_{c1}$ is set to logic-1 at $T_1$, as shown in Fig. 3.9. The first falling edge after the arrival of $V_{c1}$ alters the selection of the multiplexers. The two inverters force VE[n] and VL[n] to change their respective logic states upon the arrival of the positive and negative edges of D''[n]. It is evident that VE[n] and VL[n] carry the correct timing information of the rising and falling edges of D''[n].
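The toggling behavior described above can be sketched with a sampled-data model. This is a simplification under stated assumptions: the multiplexer arming on the first falling edge is ignored, D''[n] is represented as a list of 0/1 samples, and the names `early_late` and `enable_index` are hypothetical.

```python
def early_late(d2, enable_index):
    """Toggle VE on rising edges and VL on falling edges of D''[n] once enabled.

    d2 is a list of 0/1 samples of D''[n]; enable_index is the sample at which
    V_c1 goes to logic-1. Before that, VE = 0 and VL = 1, as stated in the text.
    Simplification: the multiplexer arming on the first falling edge is ignored.
    """
    ve, vl = 0, 1
    ve_out, vl_out = [], []
    prev = d2[0]
    for i, d in enumerate(d2):
        if i >= enable_index:
            if prev == 0 and d == 1:    # rising edge of D''[n]
                ve ^= 1
            elif prev == 1 and d == 0:  # falling edge of D''[n]
                vl ^= 1
        ve_out.append(ve)
        vl_out.append(vl)
        prev = d
    return ve_out, vl_out


if __name__ == "__main__":
    pulses = [0, 0, 1, 0, 0, 1, 0, 0]  # two narrow skew pulses
    print(early_late(pulses, enable_index=0))
```

Each output changes state only at one type of edge, so the edges of VE[n] and VL[n] preserve the timing of the rising and falling edges of D''[n], respectively.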



Figure 3.8: Schematic of Early/Late block.



Figure 3.9: Timing diagram of control voltages. T1 is the time instant at which the Early/Late block is enabled to detect the rising and falling edges of D''[n]. T2 is the time instant at which DLL1 is enabled, and Vfb[n] starts to align with VE[n]. T3 is the time instant at which DLL2 is enabled where Vfb[n] is aligned with VE[n] and MCLK'[n] starts to align with VM[n]. T4 is the time instant at which MCLK'[n] and VM[n] are aligned and DLL2 is disabled. T5 marks the start of the data transmission phase and the end of the calibration phase.

The DFFs in Fig. 3.8 are implemented using true single-phase clocking (TSPC) logic to take advantage of its simple configuration and consequently high speed [35]. Another advantage of using TSPC-based DFFs is that both rising-edge-triggered and falling-edge-triggered TSPC DFFs can be triggered by D''[n] without using its complement.

Figure 3.10: Simulated waveforms of the voltages of the Early/Late block. Left:  $T_{skew} = 60ps$ . Right:  $T_{skew} = 310ps$ .  $T_1 = 1.5ns$ .

## 3.4 Delay-Locked Loop

DLLs are widely used in clock generation, serial links, clock de-skew buffers, dynamic random access memory (DRAM) interfaces, high-speed microprocessors and application-specific integrated circuits (ASICs) [36][37]. A DLL can be regarded as a control system that adjusts phase rather than frequency. Therefore, it cannot perform frequency multiplication. On the other hand, DLLs do not suffer from phase-error accumulation, and their stability is easier to sustain. A generic DLL structure is shown in Fig. 3.11(a).



Figure 3.11: (a) Typical DLL architecture. (b) DLL model.

The open-loop response of the DLL can be expressed as

$$H(s) = \frac{I_{cp}F_{ref}K_{vcdl}}{sC},\tag{3.2}$$

where $I_{cp}$ is the charge-pump current, $C$ is the loop-filter capacitance, $K_{vcdl}$ is the gain of the VCDL and $F_{ref}$ is the frequency of the input signal. With $\Phi_{in}$ and $\Phi_{out}$ denoting the phases of the input and output signals, the closed-loop response can be expressed as

$$\frac{\Phi_{out}}{\Phi_{in}} = \frac{H(s)}{1 + H(s)}.\tag{3.3}$$

Thus,

$$\frac{\Phi_{out}}{\Phi_{in}} = \frac{1}{1 + \frac{sC}{I_{cp}F_{ref}K_{vcdl}}} = \frac{1}{1 + \frac{s}{\omega_N}},\tag{3.4}$$

where $\omega_N = \frac{I_{cp} F_{ref} K_{vcdl}}{C}$ is the loop bandwidth.
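To give a feel for the first-order response in Eq. (3.4), the sketch below computes the loop bandwidth as $I_{cp}F_{ref}K_{vcdl}/C$ and the corresponding exponential settling. It assumes that phase is expressed as a time delay and that $K_{vcdl}$ is given in s/V. In the usage example only $C$ = 4 pF comes from Fig. 3.12; the values of $I_{cp}$, $F_{ref}$ and $K_{vcdl}$ are hypothetical.

```python
import math


def loop_bandwidth(i_cp, f_ref, k_vcdl, c):
    """Loop bandwidth omega_N = I_cp * F_ref * K_vcdl / C in rad/s.

    Assumes phase is expressed as delay, so K_vcdl is in seconds per volt.
    """
    return i_cp * f_ref * k_vcdl / c


def step_response(t, omega_n):
    """Normalized closed-loop response of Eq. (3.4) to a unit phase step."""
    return 1.0 - math.exp(-omega_n * t)


def settling_time(omega_n, tolerance=0.01):
    """Time for the residual phase error to fall below `tolerance` (fraction)."""
    return -math.log(tolerance) / omega_n


if __name__ == "__main__":
    # Hypothetical values except C = 4 pF (Fig. 3.12).
    w_n = loop_bandwidth(i_cp=20e-6, f_ref=500e6, k_vcdl=5e-9, c=4e-12)
    print(f"omega_N = {w_n:.3e} rad/s")
    print(f"1% settling time = {settling_time(w_n) * 1e9:.1f} ns")
```

A first-order loop of this kind settles monotonically, so the time allotted to each DLL in the calibration schedule only needs to exceed a few time constants $1/\omega_N$.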

## 3.5 Deskew Block

The Deskew blocks in Fig. 3.1 shift MCLK to the middle point between two adjacent D''[n] pulses for each channel. The schematic of the Deskew block is shown in Fig. 3.12. It consists of two cascaded DLLs. The simplified schematic of the phase-frequency detector (PFD) and that of the charge pump are shown in Fig. 3.13. The charge pump and the current-starved VCDL were used in [38]. The function of DLL1 is to align VL[n] with VE[n], as shown in Fig. 3.3, and generate VM[n]. The output of VCDL1, denoted by VM[n], is fed to PFD2 of DLL2. VM[n], taken from the middle of VCDL1, will always have an equal phase difference to VL[n] and Vfb[n]. Two pairs of static inverters are inserted before and after VM[n] to sharpen the waveforms. The minimum and maximum phase delays are delivered by VCDL1 when Vctrl1[n] is set to 0 and $V_{DD}$, respectively. The dependence of the delay of VCDL1 on Vctrl1[n] at process corners is shown in Fig. 3.14. Table 3.1 tabulates $T_{min}$, $T_{max}$, $T_{cal}$, Range1, and Range2 of VCDL1 at process corners, where $T_{min}$ and $T_{max}$ are the minimum and maximum phase delays of VCDL1, $T_{cal}$ is the phase delay when Vctrl1[n] is set to $V_{cal}$, Range1 is the difference between $T_{cal}$ and $T_{min}$, and Range2 is the difference between $T_{max}$ and $T_{cal}$. It should be noted that the current-starved VCDLs implemented in TSMC-0.18$\mu$m CMOS in [38] have a cycle-to-cycle jitter of 10.42 ps. Similar VCDLs were used in [36][37][39] with measured cycle-to-cycle jitter of 8 ps, 14 ps, and less than 25 ps, respectively. These observations reveal that the minimum deskew range, or the sensitivity of the deskew network, is set by the sensitivity of the comparators, which was estimated to be 55 ps earlier, rather than by the jitter of the VCDLs.



Figure 3.12: Schematic of Deskew block. When  $V_{c2}$ =Logic-0, the output of the MUX is  $V_{cal}$ . When  $V_{c2}$ =Logic-1, the output of the MUX is from the preceding charge pump. The same holds for  $V_{c3}$ .  $V_{c4}$  is set to Logic-1 when VM[n] and MCLK'[n] are aligned. Transistor sizes:  $W_{1,2} = 12\mu m$ ,  $W_3 = 6\mu m$ ,  $W_4 = 4\mu m$ ,  $W_5 = 10\mu m$ ,  $W_6 = 4.5\mu m$ ,  $W_7 = 5\mu m$ ,  $W_8 = 2.3\mu m$ ,  $W_9 = 2.07\mu m$ .  $L = 0.13\mu m$  for all transistors.  $C_1 = C_2 = 4pF$ .  $V_{cal} = 0.61V$ .

Figure 3.13: (a) Schematic of PFD. (b) Schematic of charge pump. Transistor sizes:  $W_{1,3,5} = 2.5\mu \text{m}, W_{2,4,6} = 1\mu \text{m}, W_{7,11} = 12.5\mu \text{m}, W_{8,12} = 7.5\mu \text{m}, W_{9,13} = 3\mu \text{m}, W_{10,14} = 5\mu \text{m}. L = 0.13\mu \text{m}$  for all transistors.

Figure 3.14: Dependence of the delay of VCDL on  $V_{ctrl1}$  at process corners.

$V_{c2}$ is set to logic-0 initially to force Vctrl1[n] equal to $V_{cal}$. By setting Vctrl1[n] to $V_{cal}$, the phase delay of VCDL1 can be added to or subtracted from $T_{cal}$, and the maximum delay ranges in both directions, given by Range1 and Range2, can be obtained. When $V_{c2}$ is set to logic-1 at $T_2$, as shown in Fig. 3.9, the charge pump starts to take control of Vctrl1[n] and DLL1 aligns Vfb[n] with VE[n]. The maximum delay range of VCDL1 is set to be $2T_{bit}$ in both directions, as illustrated in Fig. 3.15, where Vfb[n] needs to be adjusted by $2T_{bit}$ in order to align with VE[n]. The delay ranges are set by the number of delay stages. From Table 3.1, both Range1 and Range2 exceed 2 ns at all process corners.

| Parameter (ns)   | SS     | FS    | TT    | SF    | FF    |
|------------------|--------|-------|-------|-------|-------|
| $T_{min}$        | 7.345  | 5.661 | 5.603 | 5.619 | 4.529 |
| $T_{cal}$        | 18.73  | 10.48 | 9.9   | 9.338 | 6.545 |
| $T_{max}$        | 26.76  | 13.52 | 13.61 | 13.58 | 8.587 |
| Range1           | 11.385 | 4.819 | 4.297 | 3.719 | 2.016 |
| Range2           | 8.03   | 3.04  | 3.71  | 4.242 | 2.042 |
| Stage Delay      | 0.324  | 0.131 | 0.133 | 0.133 | 0.068 |

Table 3.1: Time delay range of VCDL at process corners.

Figure 3.15: Maximum phase shift. (a)  $2T_{bit}$ to be subtracted from Vfb[n]. (b)  $2T_{bit}$ to be added to Vfb[n].

The location of VM[n] with different Vctrl1[n] is shown in Fig. 3.16. The phase difference between Vfb[n] and VL[n] is 5.603 ns, 9.9 ns, and 13.61 ns, as shown in the TT column of Table 3.1, when Vctrl1[n] is set to 0, $V_{cal}$, and $V_{DD}$, respectively. It is evident that the phase difference between VL[n] and VM[n], and that between VM[n] and Vfb[n], are identical.
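The range figures in Table 3.1 can be cross-checked with a few lines of code. The sketch below recomputes Range1 and Range2 from $T_{min}$, $T_{cal}$ and $T_{max}$ and tests them against the $2T_{bit}$ = 2 ns requirement; the delay values are copied from the table, while the function names are illustrative.

```python
# Delays in ns, copied from Table 3.1 as (T_min, T_cal, T_max) per corner.
VCDL1_DELAYS_NS = {
    "SS": (7.345, 18.73, 26.76),
    "FS": (5.661, 10.48, 13.52),
    "TT": (5.603, 9.9, 13.61),
    "SF": (5.619, 9.338, 13.58),
    "FF": (4.529, 6.545, 8.587),
}


def delay_ranges(t_min, t_cal, t_max):
    """Return (Range1, Range2) = (T_cal - T_min, T_max - T_cal)."""
    return t_cal - t_min, t_max - t_cal


def covers_required_range(required_ns=2.0):
    """True if every corner can shift by `required_ns` in both directions."""
    return all(
        min(delay_ranges(*delays)) >= required_ns
        for delays in VCDL1_DELAYS_NS.values()
    )


if __name__ == "__main__":
    for corner, delays in VCDL1_DELAYS_NS.items():
        r1, r2 = delay_ranges(*delays)
        print(f"{corner}: Range1 = {r1:.3f} ns, Range2 = {r2:.3f} ns")
    print("2*T_bit covered at all corners:", covers_required_range(2.0))
```

The FF corner is the limiting case, with both ranges only slightly above 2 ns.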

Figure 3.16: Simulated transient response of VL[n], VM[n] and Vfb[n] for various Vctrl1[n]. Top: Vctrl1[n]=0. Middle: Vctrl1[n]= $V_{cal}$ . Bottom: Vctrl1[n]=1.2V.

Since VM[n] is generated from VE[n] and VL[n], periodic D''[n] in Fig. 3.2 is available only in the calibration phase where training data are sent to all channels. When random data are transferred through the channels in the data transmission phase, D''[n], VE[n], VL[n] and VM[n] will all be affected. Therefore, the location of VM[n] needs to be preserved before the start of the data transmission phase. DLL2 of each channel will shift its MCLK until it is aligned with VM[n]. Vctrl2[n] is pre-set to $V_{cal}$ and MCLK is initially shifted by $T_{cal}$. By setting $V_{c3}$ to logic-1 at $T_3$, as shown in Fig. 3.9, the phase adjustment of MCLK is triggered where $V_{c4}$ is still set to logic-0 and the charge pump is connected to PFD2. When MCLK'[n] aligns with VM[n], $V_{c4}$ is set to logic-1 at $T_4$. The inputs to the charge pump are forced to ground. Vctrl2[n] will remain unchanged thereafter. As a result, the phase delay applied to MCLK in the calibration phase is carried over to the data transmission phase where MCLK'[n] is no longer affected by VM[n].

## 3.6 Simulation Results

To assess the performance of the proposed inter-signal timing skew compensation technique, a 2-bit four-channel parallel link with voltage-mode incremental signaling has been designed in UMC-0.13µm 1.2V CMOS technology, and the schematic is shown in Fig. 3.17. The inputs consist of two data bits D[1] and D[2] and the master clock MCLK. The schematic of the driver (Tx) is shown in Fig. 3.18(a). The channels are modeled with microstrip lines on an FR4 substrate, and the channel configuration is shown in Fig. 3.18(b). Inter-signal timing skews are introduced by varying the length of the microstrip lines. Since the signal propagation velocity is $\nu = \frac{c}{\sqrt{\epsilon_r}}$, where c is the speed of light [8], the signal propagation delay $T_D$ through the channel can be estimated as

$$T_D = \frac{L_n}{\nu},\tag{3.5}$$

where $L_n$ refers to the channel length. The $T_D$ values for $L_{1-4}$ are calculated to be 0.969 ns, 0.820 ns, 0.745 ns and 1.118 ns, respectively. Therefore, the calculated skew between $L_2$ and $L_3$ is 75 ps and that between $L_3$ and $L_4$ is 373 ps, while the measured values are 60 ps and 310 ps. The parallel link was analyzed using SpectreRF from Cadence Design Systems with BSIM3V3 device models.
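The delay and skew figures quoted above can be reproduced with a short script based on Eq. (3.5). The channel lengths and $\epsilon_r$ = 5 are taken from Figs. 3.17 and 3.18; the rounded value c = 3 x 10^8 m/s is an assumption that matches the rounding of the quoted delays, and the helper names are illustrative.

```python
import math

C_LIGHT = 3.0e8  # speed of light in m/s (rounded; assumed to match the quoted values)

# Channel lengths in metres, from Fig. 3.17.
LENGTHS_M = {"L1": 0.13, "L2": 0.11, "L3": 0.10, "L4": 0.15}


def propagation_delay_ns(length_m, eps_r=5.0):
    """T_D = L / v with v = c / sqrt(eps_r), returned in ns (Eq. 3.5)."""
    velocity = C_LIGHT / math.sqrt(eps_r)
    return length_m / velocity * 1e9


if __name__ == "__main__":
    delays = {name: propagation_delay_ns(l) for name, l in LENGTHS_M.items()}
    for name, t_d in delays.items():
        print(f"{name}: T_D = {t_d:.3f} ns")
    print(f"skew L2-L3 = {(delays['L2'] - delays['L3']) * 1e3:.0f} ps")
    print(f"skew L4-L3 = {(delays['L4'] - delays['L3']) * 1e3:.0f} ps")
```

This estimate uses the substrate permittivity directly, as in the text, rather than an effective permittivity for the microstrip geometry, which may partly explain why the simulated skews are smaller than the calculated ones.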



Figure 3.17: Four-channel parallel link with voltage-mode incremental signaling and per-pin skew compensation.  $L_1 = 0.13m$ ,  $L_2 = 0.11m$ ,  $L_3 = 0.1m$ ,  $L_4 = 0.15m$ .  $R = 50\Omega$ .



Figure 3.18: (a) Schematic of driver. Transistor sizes:  $W_1 = 1\mu m$ ,  $W_2 = 2\mu m$ ,  $W_3 = 4\mu m$ ,  $W_4 = 8\mu m$ ,  $W_5 = 16\mu m$ ,  $W_6 = 32\mu m$ ,  $W_7 = 64\mu m$ ,  $W_8 = 128\mu m$ .  $L = 0.13\mu m$  for all transistors. (b) Microstrip line configuration.  $H_1 = 200\mu m$ ,  $H_2 = 500\mu m$ ,  $W = 734\mu m$ ,  $\epsilon_r = 5$ .

The timing diagram of the input signals (D[1], D[2], MCLK and the training clocks) is shown in Fig. 3.19. Before $T_5$, the system operates in the calibration phase where $V_{c5}$ is set to logic-0, and training data are sent to the channels. Since channel three is shorter than channels two and four, D'[2] leads D'[1] and D'[3]. The dashed signals represent the phase-aligned signals that would be obtained without signal skew. The signal skews lead to the spikes in D''[1] and D''[2], which are shown as the gray areas. The simulated waveforms of the far-end signals before $T_5$ are shown in Fig. 3.20. After $T_5$, the system operates in the data transmission phase, and D[1] and D[2], both in phase with MCLK, are sent to the channels instead of the training clock signals. MCLK and D[1] run at 500 MHz ($T_{bit}$ = 1 ns) and D[2] runs at 250 MHz. The received D'[2] represents D[1], and D'[3] represents the XOR of D[1] and D[2]. When no signal skew is present, D'[2] and D'[3] are out of phase by one bit time. Inter-signal timing skews add an additional phase delay to D'[3]. D''[2] results from the signal difference between D'[2] and D'[3], and the signal skews create undesired spikes and dips, which are marked by the gray areas. The deskewed MCLK'[1] and MCLK'[2] fall in the middle between two adjacent spikes in the calibration phase. In the data transmission phase, MCLK'[1] is shifted to approximately the center of the data eye of D''[1], and MCLK'[2] is shifted to avoid sampling the spikes and the dips. The simulated waveforms of the received signals when the system operates in the data transmission phase are shown in Fig. 3.21. The spikes at 141 ns, 145 ns and 149 ns, and the dips at 143 ns and 147 ns, are due to inter-signal timing skews.

As the VCDL delay range varies at process corners,  $T_{1-5}$  in Fig. 3.9 are selected accordingly based on the required locking time of the DLL with the values shown in Table 3.2.
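The calibration schedule can be expressed as a simple lookup. The sketch below encodes the $T_{1-5}$ values of Table 3.2 and returns the logic levels of $V_{c1-c5}$ at a given time. It assumes that each control voltage switches from logic-0 to logic-1 at its corresponding instant; for $V_{c5}$ this is inferred from the statement that logic-0 selects the calibration phase. The names `SCHEDULE_NS` and `control_levels` are hypothetical.

```python
# Switching instants (T1, T2, T3, T4, T5) in ns per process corner, from Table 3.2.
SCHEDULE_NS = {
    "SS": (3.5, 35, 200, 400, 410),
    "FS": (3.5, 25, 75, 125, 135),
    "TT": (3.5, 15, 65, 115, 125),
    "SF": (3.5, 15, 65, 115, 125),
    "FF": (3.5, 15, 100, 200, 210),
}


def control_levels(corner, t_ns):
    """Return the logic levels (Vc1, Vc2, Vc3, Vc4, Vc5) at time t_ns.

    Assumption: Vck switches from logic-0 to logic-1 at Tk and stays there.
    """
    return tuple(int(t_ns >= t_k) for t_k in SCHEDULE_NS[corner])


if __name__ == "__main__":
    for t in (1, 10, 50, 100, 120, 130):
        print(f"TT corner, t = {t} ns: Vc1..Vc5 = {control_levels('TT', t)}")
```

For example, at the TT corner DLL2 is active between 65 ns and 115 ns, when $V_{c3}$ is logic-1 and $V_{c4}$ is still logic-0.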

| Corner | $T_1$  | $T_2$ | $T_3$  | $T_4$  | $T_5$  |
|--------|--------|-------|--------|--------|--------|
| SS     | 3.5 ns | 35 ns | 200 ns | 400 ns | 410 ns |
| FS     | 3.5 ns | 25 ns | 75 ns  | 125 ns | 135 ns |
| TT     | 3.5 ns | 15 ns | 65 ns  | 115 ns | 125 ns |
| SF     | 3.5 ns | 15 ns | 65 ns  | 115 ns | 125 ns |
| FF     | 3.5 ns | 15 ns | 100 ns | 200 ns | 210 ns |

Table 3.2: Timing schedule

Figure 3.21: Simulated receiver signals after  $T_5$.

The deskew process of MCLK'[1] with respect to D''[1] is shown in Fig. 3.22. The Early/Late block is triggered by $V_{c1}$ at 3.5 ns, and VE[n] and VL[n] are continuously supplied to the downstream Deskew block. Both Vctrl1[1] and Vctrl2[1] are initially set to $V_{cal}$, i.e. 0.61 V. DLL1 is turned on at 15 ns by setting $V_{c2}$ to logic-1, and Vfb[1] gradually aligns with VE[1]. At 65 ns, $V_{c3}$ is set to logic-1 to start up DLL2, and MCLK'[1] gradually aligns with VM[1] generated by DLL1. $T_4 = 115$ ns marks the time instant at which the charge pump is disconnected from the PFD. Vctrl2[1] is held constant thereafter. A similar deskew process of MCLK'[2] with respect to D''[2] is shown in Fig. 3.23. It is evident that both MCLK'[1] and MCLK'[2] fall approximately in the middle of two adjacent $T_{skew}$ pulses. After $T_5$, MCLK'[1] and MCLK'[2] are used to sample D''[1] and D''[2], respectively, where the relocated MCLK'[1] and MCLK'[2] have optimized the timing margins, as shown in Fig. 3.24. The rising and falling edges of MCLK'[1] and MCLK'[2] sample 10101010... and 11001100..., which correspond to the transmitted D[1] and D[2] signals.

Table 3.3 summarizes the performance of this parallel link. The power consumption is for the calibration phase. In the data transmission phase, the switching frequencies of D[1] and D[2] will be lower than that of MCLK, so the power consumption is expected to be lower.



Figure 3.22: Simulation results of deskewing process for MCLK'[1] with respect to D"[1] for 60ps skew.

Figure 3.23: Simulation results of deskewing process for MCLK'[2] with respect to D''[2] for 310ps skew.

Figure 3.24: Waveforms of MCLK'[1], MCLK'[2], D''[1], and D''[2] after deskew.

| Parameter                          | Value                      |
|------------------------------------|----------------------------|
| Technology                         | UMC-0.13µm 1.2V            |
| Data rate                          | 1 Gbytes/s                 |
| Comparator transistor area         | $318 \times 0.13 \mu m^2$  |
| Early/Late block transistor area   | $117 \times 0.13 \mu m^2$  |
| Deskew block transistor area       | $4272 \times 0.13 \mu m^2$ |
| Comparator power consumption       | 4.253 mW                   |
| Early/Late block power consumption | 0.682 mW                   |
| Deskew block power consumption     | 17.212 mW                  |

Table 3.3: Performance of parallel link interface

## 3.7 Chapter Summary

An inter-signal timing skew compensation technique for parallel links with voltage-mode incremental signaling has been proposed and its implementation details have been presented. Skew compensation is performed in the calibration phase where identical training data are sent to both the data and clock channels. Inter-signal timing skew compensation is attained by phase-shifting the master clock, which is transmitted in a dedicated clock channel in parallel with the data channels, by an amount that is determined by the inter-signal timing skew of adjacent channels. The proposed deskew technique employs Early/Late blocks to detect the rising and falling edges of the output of comparators whose inputs are the voltages of the adjacent channels at the far end. The D flip-flops of the Early/Late blocks are implemented using TSPC logic to take advantage of its simple configuration, its consequently high operation speed, and the fact that no inverted triggering signals are needed. Two cascaded delay-locked loops are employed to place the sampling clock of each channel at the optimal sampling position of each data bit such that the timing margins are maximized. The upper bound of the deskew range is set by the time delay of the delay-locked loops whereas the lower bound of the deskew range is set by the sensitivity of the comparators. The simulation results of a 1 Gbytes/s 4-channel parallel link with microstrip channels on an FR4 substrate have demonstrated that the proposed deskewing method is capable of compensating for up to $\pm 1$ ns inter-signal timing skew.

## Chapter 4

# Inter-Signal Timing Skew Compensation of Source-Synchronized Parallel Links with Current-Mode Incremental Signaling

In this chapter, a new inter-signal timing skew compensation technique for parallel links with current-mode incremental signaling is proposed. New current-mode transmitters and receivers are proposed to minimize the signal-dependent impedance mismatch and the effects of $V_{DD}$ fluctuations, and to provide a tunable matching input impedance. Both the transmitters and receivers of the parallel links are current-mode configured such that the intrinsic advantages of current-mode signaling are preserved. We show that each receiver maps the direction of its channel current, which represents the logic state of the incoming data, to two voltages of different values. The logic states of the transmitted data are recovered by the voltage comparators. The use of feedback in the front-end of the receiver minimizes the dependence of the receiver input impedance on the direction of the channel current so that signal-dependent impedance mismatch is minimized. The use of replica-biasing techniques minimizes the effect of supply voltage fluctuation on the performance of the front-end. Inter-signal timing skews are eliminated by inserting a VCDL for each channel whose time delay is determined by the phase difference between the transmitted master clock and the output of the recovering comparator. Section 4.1 introduces the overall architecture of the parallel link interface. Section 4.2 describes the current-mode incremental signaling scheme. Sections 4.3, 4.4, and 4.5 present the design of the transmitter, receiver, and the voltage comparator. Section 4.6 introduces the inter-signal timing skew compensation circuitry. Section 4.7 presents the simulation results of a 2-bit sample parallel link interface. The chapter is concluded in Section 4.8.

## 4.1 Architecture

The architecture of parallel links with inter-signal timing skew compensation and current-mode incremental signaling is shown in Fig. 4.1. The inputs of the parallel link consist of N-bit parallel data $D[1] \sim D[n]$ and a master clock MCLK. The system can operate in either the transmission mode or the deskew mode. Logic-1 and logic-0 of the control signal $V_{c3}$ are used to select either the transmission mode, in which data and clock are transferred through the channels, or the deskew mode, in which identical square-wave training signals are sent to both the clock channel and the data channels. The deskew operation is required before data transmission takes place, and is typically performed in a calibration phase where the time delay of each Deskew block is determined and carried over into the data transmission phase. The current-mode driver Tx converts the incoming parallel data into channel currents such that a one-to-one mapping between the logic state of the incoming data and the direction of the channel current exists. The receiver Rx converts the received channel current into a pair of voltage signals $v_S$ and $v_L$ that have different swing amplitudes, and provides a tunable matching impedance to the channel so as to minimize signal reflection at the interface between the channel and the receiver. The voltage comparator amplifies the signal difference between $v_L[n]$ and $v_S[n+1]$ to rail-to-rail, and recovers the logic states of the transmitted data. The inter-signal timing skew is compensated by delay-locked loop (DLL) based deskewing blocks that add or subtract an appropriate amount of time delay to $D'[1] \sim D'[n]$ such that $D''[1] \sim D''[n]$ are all phase aligned with MCLK''. As a result, all N-bit data can be sampled with a global sampling clock MCLK''.



Figure 4.1: Parallel links with current-mode incremental signaling and inter-signal timing skew compensation.

## 4.2 Current-Mode Incremental Signaling

A one-to-one mapping exists between the logic state of the input data D[n] and the direction of the corresponding channel current $i_{in}[n]$. Depending on the logic state of D[n], the current-mode transmitter Tx either supplies a 2 mA current to the channel or sinks a 2 mA current from it, as shown in Fig. 4.4. The magnitude of the channel current is set to 2 mA to comply with the LVDS standards [40][41]. The receiver Rx shown in Fig. 4.5 converts $i_{in}[n]$ into two different signals $v_S[n]$ and $v_L[n]$, as shown in Fig. 4.2. The swing of $v_L[n]$ is larger than that of $v_S[n]$. To recover D[n], $v_L[n]$ and $v_S[n + 1]$ are sent to the non-inverting and inverting inputs of the downstream voltage comparator, as shown in Fig. 4.14. It can be easily verified that D'[n] corresponds to the signal difference between $v_L[n]$ and $v_S[n + 1]$, and the logic state of D'[n] corresponds to the logic state of D[n].
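The recovery rule can be checked exhaustively with an idealized level model. In the sketch below, each receiver output is represented by a common level plus or minus a half-swing; the numerical values of `V_CM`, `SWING_L` and `SWING_S` are hypothetical, and the only property taken from the text is that the swing of $v_L$ is larger than that of $v_S$.

```python
from itertools import product

V_CM = 0.6     # common level in volts (hypothetical)
SWING_L = 0.3  # half-swing of v_L in volts (hypothetical, the larger swing)
SWING_S = 0.1  # half-swing of v_S in volts (hypothetical, the smaller swing)


def receiver_outputs(bit):
    """Map a bit (direction of the channel current) to the pair (v_S, v_L)."""
    sign = 1 if bit else -1
    return V_CM + sign * SWING_S, V_CM + sign * SWING_L


def recover(bits):
    """Recover D'[n] by comparing v_L[n] with v_S[n+1] of the adjacent channel."""
    levels = [receiver_outputs(b) for b in bits]
    return [int(levels[n][1] > levels[n + 1][0]) for n in range(len(bits) - 1)]


if __name__ == "__main__":
    # Every combination of three data bits plus one neighbouring channel.
    for bits in product((0, 1), repeat=4):
        assert recover(bits) == list(bits[:-1])
    print("D'[n] equals D[n] for every state of the adjacent channel")
```

Because $v_L[n]$ always lies outside the range spanned by $v_S[n+1]$ in this model, the comparator decision depends only on D[n], regardless of the state of the neighbouring channel.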

Figure 4.2: Timing diagram of comparator without inter-signal timing skew.

When inter-signal timing skews exist between $D[1] \sim D[3]$, as shown in Fig. 4.3, the phase difference between D[1] and D[2] is carried over into the phase difference between $v_L[1]$ and $v_S[2]$. Similarly, the phase difference between D[2] and D[3] is carried over to the phase difference between $v_L[2]$ and $v_S[3]$. The voltage comparators compare the signal difference between $v_L[1]$ and $v_S[2]$, and that between $v_L[2]$ and $v_S[3]$, and recover D'[1] and D'[2]. It is evident that the phase difference between D[1] and D[2] corresponds to the phase difference between D'[1] and D'[2] while the logic states of the recovered signals D'[1] and D'[2] are not distorted. An advantage of the proposed incremental signaling scheme is that inter-signal timing skews only lead to timing skews between the recovered signals $D'[1] \sim D'[n]$ and do not affect their logic states. Consequently, the design of the downstream deskew circuitry can be greatly simplified.

Figure 4.3: Timing diagram of the incremental signaling scheme when inter-signal timing skews exist.

## 4.3 Current-Mode Transmitter

The schematic of the current-mode transmitter Tx is shown in Fig. 4.4(a), where the logic state of D[n] determines whether the transmitter supplies current to or sinks current from the channel. If D[n] is Logic-1, $M_2$ is off. Therefore, $i_{D2} = i_{D3} = 0$, as shown in Fig. 4.4(b). The transmitter supplies $i_{in}[n]$ to the channel via $M_1$, and the magnitude of $i_{in}[n]$ is determined by the biasing voltage $V_b$ and the size of $M_1$. If D[n] is Logic-0, as shown in Fig. 4.4(c), $M_{1-3}$ are all turned on and $i_{D1} + i_{in}[n] = i_{D2}$. The width of $M_{2,3}$ is made larger than the width of $M_1$ such that the transmitter sinks $i_{in}[n]$ from the channel. When current is supplied to the channel, the drain voltage of $M_1$ increases, and vice versa. The magnitude of the swing depends on the characteristic impedance of the channel and the magnitude of $i_{in}[n]$. $V_b$ is set such that $M_1$ remains in the saturation region regardless of the swing at the drain of $M_1$. As a result, the current drawn by $M_1$ from $V_{DD}$ is relatively constant, and the switching noise caused by the switching of D[n] is minimized.
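As a rough worked example of the Logic-0 case, assume that $M_1$ keeps supplying approximately the same 2 mA in both states (in line with the statement that its current is relatively constant) and that the channel current magnitude is the 2 mA quoted in Section 4.2. Under these assumptions, the current that $M_{2,3}$ must carry follows from the node equation for the Logic-0 state:

$$i_{D2} = i_{D1} + i_{in}[n] \approx 2\,\text{mA} + 2\,\text{mA} = 4\,\text{mA}.$$

This is roughly twice the current of $M_1$, which is consistent with $M_{2,3}$ being made wider than $M_1$. The exact ratio in the actual design depends on the bias conditions and is not stated here.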



Figure 4.4: Simplified schematic of current-mode transmitter.

## 4.4 Current-Mode Receiver

The schematic of the receiver is shown in Fig. 4.5. The receiver Rx performs the following tasks: (1) it converts the received channel current $i_{in}$ into a pair of voltage signals $v_S$ and $v_L$ with different magnitudes, and (2) it provides a tunable matching impedance to the channel to minimize signal reflection at the interface between the channel and the receiver.



Figure 4.5: Simplified schematic of receiver. Transistor sizes:  $W_1 = 116\mu m$ ,  $W_2 = 90\mu m$ ,  $W_{3,5} = 60\mu m$ ,  $W_{4,6} = 90\mu m$ ,  $W_7 = 43\mu m$ ,  $W_8 = 3\mu m$ ,  $W_9 = 3.5\mu m$ ,  $W_{10} = 40\mu m$ ,  $W_{11} = 20\mu m$ .  $L = 0.13\mu m$  for all transistors. DC biasing:  $V_{G9} = 0.73V$ ,  $V_{G7} = 0.8V$ ,  $V_{G3} = 0.94V$ ,  $V_{G2} = 0.45V$ ,  $V_{G4,6} = 0.65V$ ,  $V_{G11} = 0.6V$ .

#### 4.4.1 Current-to-Voltage Mapping

When $i_{in}$ flows into the receiver, the drain voltages of $M_2$ and $M_7$ increase and thus $i_{D1}$ increases. Consequently, $i_{D3}$ and $i_{D5}$ decrease since $i_{D1} + i_{D3} + i_{D5} = i_{D7}$ and $i_{D7}$ is approximately constant. As a result, the voltage levels of $v_S$ and $v_L$ rise. When $i_{in}$ flows out of the receiver, the voltages at the drains of $M_2$ and $M_7$ decrease. The currents of $M_{3,4}$ and $M_{5,6}$ increase and thus the voltage levels of $v_S$ and $v_L$ decrease. The common-source amplifier formed by $M_{10,11}$ takes the drain voltage of $M_7$ as the input, and controls the gate voltage of $M_5$. As a result, the swing of $v_L$ is amplified and becomes much larger than that of $v_S$, such that the voltage difference between $v_S$ and $v_L$ can be easily picked up by the downstream voltage comparator. Fig. 4.6 plots the simulated waveforms of $v_L$ and $v_S$. The difference between $v_L$ and $v_S$ is approximately 0.2 V.

The dominant poles of the signal path from  $i_{in}$  to  $v_S$  and  $v_L$  are at the outputs  $v_S$  and  $v_L$ , and the frequency response is shown in Fig. 4.7. The bandwidth of the receiver exceeds



Figure 4.6: Simulated waveforms of  $v_L$ ,  $v_S$  and the comparator output D'[n] under TT.



Figure 4.7: Simulated frequency response of  $v_L$  and  $v_S$ .

#### 4.4.2 Swing-Independent Input Impedance

The feedback amplifier  $M_{8,9}$  minimizes the variation of the input impedance of the receiver due to the change of the direction of its input current. The source-follower configuration consisting of  $M_{8,9}$  ensures that  $v_{G1}$  tracks  $v_{in}$  closely. When  $i_{in}$  changes from flowing into the receiver to flowing out of the receiver,  $v_{D2}$  drops and  $v_{G1}$  drops by the same amount ideally. As a result, the change in the gate-source voltage of  $M_1$  is relatively small, and  $g_{m1}$ is approximately constant. Consequently, the input impedance of the receiver given by

$$R_{in} \approx \frac{1}{(1-\beta)g_{m1}} \tag{4.1}$$

at low frequencies remains unchanged where  $\beta$  is the voltage gain of the source follower  $M_{8,9}$ . A side effect of the feedback amplifier  $M_{8,9}$  is the increase of the input impedance since  $\beta < 1$ . The increase in the input impedance can be offset by increasing the width of  $M_{1,2}$ .
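To illustrate the trade-off expressed by Eq. (4.1), consider purely illustrative numbers: a source-follower gain of $\beta = 0.8$ and a target input impedance of 50 $\Omega$, neither of which is taken from the design. The required transconductance is then

$$g_{m1} \approx \frac{1}{(1-\beta)R_{in}} = \frac{1}{(1-0.8)\times 50\,\Omega} = 100\,\text{mS},$$

whereas without the feedback path ($\beta = 0$) the same 50 $\Omega$ would require only $g_{m1} = 20$ mS. Under these assumed values the feedback raises the required $g_{m1}$ by a factor of $1/(1-\beta) = 5$, which is the increase that the wider $M_{1,2}$ has to provide.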

Fig. 4.8(top) plots the input impedance of the receiver with the feedback amplifier when the input current of the receiver is 0 and  $\pm 2$  mA. Fig. 4.8(bottom) compares the dependence of the receiver input impedance on the input current with and without the feedback amplifier. It is evident that the input impedance of the receiver is reduced from over 80  $\Omega$  without the feedback amplifier to 60  $\Omega$  with the feedback amplifier when the input current flows away from the receiver. The effect of the feedback amplifier on the input impedance is relatively small when current flows into the receiver.



Figure 4.8: Top: Simulated receiver input impedance with the feedback amplifier when  $i_{in} = 0, \pm 2mA$ . Bottom: Dependence of the receiver input impedance on the input current with and without the feedback amplifier.

#### 4.4.3 Common-Mode Voltage Stabilization

Since the difference between $v_L$ and $v_S$ is sensed by the downstream voltage comparator and the output of the comparator represents the logic state of the received data, it becomes critical to ensure that the voltage difference between $v_L$ and $v_S$ is not affected by $V_{DD}$ fluctuation. Replica biasing is a well-known technique to suppress common-mode disturbances [42][43]. This technique is employed to stabilize $v_S$ and $v_L$. The simplified schematic of the receiver with replica-biasing is shown in Fig. 4.9.



Figure 4.9: Simplified schematic of receiver with replica biasing.

 $V_{DD}$  fluctuations directly affect  $V_{GS2,4,6,11}$ . As a result, the biasing currents in these four branches are altered. The drain voltages of  $M_{2,4,6,11}$  will change accordingly. The variation of  $V_{DD}$ , denoted as  $v_{dd}$ , is amplified by  $M_{4,6}$  which behave as common-gate amplifiers with the voltage gain of

$$\frac{v_{S,L}}{v_{dd}} \approx g_{m4,6}(r_{o4,6}||r_{o3,5}). \tag{4.2}$$

The width of transistors $M_{1'-11'}$ is only 10% of the width of $M_{1-11}$, thus the extra silicon area of the replica-biasing circuitry is insignificant. The feedback amplifiers $A_{1-4}$ are implemented as differential pairs with single-ended outputs, and the total area of $A_{1-4}$ is approximately 30% of the total area of $M_{1-11}$. The feedback amplifiers $A_{2,3}$ establish a negative feedback loop such that

$$\frac{v_{S,L}}{v_{dd}} \approx \frac{1}{A_{2,3}}.\tag{4.3}$$

As a result, the changes in $v_S$ and $v_L$ due to $V_{DD}$ fluctuations are greatly reduced. The waveforms of $v_S$ and $v_L$ with replica-biasing for $V_{DD}$ at 1.1 V, 1.2 V and 1.3 V are shown in Fig. 4.10. The range of $v_S$ and $v_L$ variations due to $V_{DD}$ fluctuations is controlled within $\pm 100$ mV. Also, the variations of $v_L$ are larger than those of $v_S$ since the gate voltage of $M_5$ is controlled by the feedback amplifier $M_{10,11}$ instead of a fixed biasing voltage.



Figure 4.10: Simulated waveform of  $v_L$  and  $v_S$  with replica-biasing. Top:  $v_S$ . Bottom:  $v_L$ .

#### 4.4.4 Input Impedance Tuning

The advantage of active impedance matching is that the input impedance can be continuously tuned by the external voltage $V_{tune}$. As $V_{tune}$ varies, the drain voltage of $M_2$ tracks $V_{tune}$ closely. An increase of $V_{tune}$ results in a decrease in $v_{G2'}$, which increases the biasing current in the first branch. $v_{GS1'}$ is forced to increase in order to accommodate the excess current. As a result, $g_{m1}$ increases and the input impedance drops. Similarly, a decrease of $V_{tune}$ leads to an increase of the input impedance. The tunability of the input impedance of the receiver by varying $V_{tune}$ is shown in Fig. 4.11, where input impedance variations of approximately $\pm 15$ $\Omega$ are achieved when $V_{tune}$ is changed from 0.7 V to 1.1 V.



Figure 4.11: Top: Simulated frequency dependence of the input impedance of receiver. Bottom: Dependence of the input impedance of the receiver on  $V_{tune}$  at 1 GHz.

When $V_{tune}$ varies, it also affects $v_S$ and $v_L$. As the current in the first branch increases, the currents in the $v_S$ and $v_L$ branches decrease. Without amplifiers $A_{2,3,4}$, $v_S$ and $v_L$ would rise to decrease $i_{D4,6}$. Amplifiers $A_{2,3,4}$ are employed to stabilize $v_S$, $v_L$ and $v_{G5}$. As the currents in the $v_S$ and $v_L$ branches decrease, $i_{D4,6}$ are decreased by increasing $v_{G4,6}$, and vice versa. The waveforms of $v_S$ and $v_L$ when $V_{tune}=0.7$ V, 0.9 V and 1.1 V with replica biasing are shown in Fig. 4.12.

The simulated input impedance at process corners FF (fast nMOS/fast pMOS), FS (fast nMOS/slow pMOS), SF (slow nMOS/fast pMOS), SS (slow nMOS/slow pMOS) and TT is shown in Fig. 4.13. $V_{tune}$ has been adjusted for each process corner to ensure that the input impedance is approximately 50 $\Omega$. The $V_{tune}$ values for SS, SF, FS, FF and TT were 1.2 V, 0.865 V, 0.865 V, 0.7 V and 0.9 V, respectively. Fig. 4.13 demonstrates that the proposed receiver is capable of providing a 50 $\Omega$ matching impedance at the process corners.



Figure 4.12: Simulated waveforms of  $v_L$  and  $v_S$  with replica-biasing. Top:  $v_S$ . Bottom:  $v_L$ .





#### 4.5 Voltage Comparator

The schematic of the voltage comparator used for LVDS in [41][44] is shown in Fig. 4.14. This voltage comparator has several advantages including good CMRR, high bandwidth and low sensitivity to $V_{DD}$ fluctuation. From Fig. 4.6, the common-mode voltage level of $v_S$ and $v_L$ varies from 0.6 V to 0.9 V. As a result, the input stage of the comparator is implemented with NMOS instead of PMOS transistors to accommodate the common-mode input voltage.



Figure 4.14: Simplified schematic of voltage comparator. Transistor sizes:  $W_{1,2} = 25\mu m$ ,  $W_{3,4} = 12.5\mu m$ ,  $W_{5,6} = 2.5\mu m$ ,  $W_{7,8} = 10\mu m$ ,  $W_{9,10} = 2\mu m$ ,  $W_{11} = 35\mu m$ .  $L = 0.13\mu m$  for all transistors. DC biasing:  $V_{G11} = 0.45V$ .

#### 4.6 Compensation of Inter-Signal Timing Skew

The inter-signal timing skew compensation scheme is depicted in Fig. 4.15. To phase-align D'[1], D'[2], ..., D'[N] with the global sampling clock MCLK", clock-like training data are sent to the channels by the transmitters. MCLK' and D'[k], k=1,2,...,N, are delayed by $T_{min}$, the minimum time delay of the data delay blocks when the control voltages of these blocks are set to 0, as shown in Fig. 4.15(a). MCLK' is then further delayed by one bit time $T_b$ (1 ns for 1 Gbytes/s data rate) to generate MCLK", as shown in Fig. 4.15(b). This guarantees that MCLK" is the most lagging signal as compared with D'[k], k=1,2,...,N, such that D'[k] can be phase-aligned with MCLK" by delaying them using delay-locked loops. To do so, MCLK" is used as the global reference of the data delay blocks shown in Fig. 4.16(b). Phase alignment is completed when D"[1], D"[2], ..., D"[N] are phase-aligned with MCLK". The maximum additional delay needed in order to phase-align MCLK" and D"[k], k=1,2,...,N, is two bit times.
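The delay budget described above can be sketched numerically. The snippet assumes that each data bit arrives within one bit time of MCLK', so that the skew magnitude never exceeds $T_b$. The function name and the 300 ps skews are illustrative assumptions.

```python
T_B = 1.0e-9  # bit time in seconds (1 ns, as stated in the text)


def required_variable_delay(skew: float, t_bit: float = T_B) -> float:
    """Extra delay, beyond T_min, that aligns a data bit with MCLK''.

    skew = (arrival time of D'[k]) - (arrival time of MCLK').
    MCLK'' lags MCLK' by T_min + T_b and the data path already adds T_min,
    so the data delay block has to add T_b - skew.
    """
    if abs(skew) > t_bit:
        raise ValueError("skew magnitude is assumed not to exceed one bit time")
    return t_bit - skew


# Illustrative skews: one bit leads the clock by 300 ps, the other lags by 300 ps.
for name, skew in (("D'[1]", -300e-12), ("D'[2]", +300e-12)):
    delay = required_variable_delay(skew)
    print(f"{name}: extra delay = {delay * 1e9:.2f} ns")
```

In this simplified model a bit that leads the clock by a full bit time needs an extra delay of $2T_b$, while a bit that lags by a full bit time needs none, which is consistent with a two-bit-time variable delay range.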

The voltage-controlled delay line of the data deskew blocks provides the minimum time delay of  $T_{min}$  when  $V_{ctrl} = 0$ . The maximum delay  $T_{max}$  of the delay lines occurs when  $V_{ctrl} = V_{DD}$ . The difference between  $T_{max}$  and  $T_{min}$  is the maximum variable time delay that the data delay blocks can provide. Clearly  $T_{max} - T_{min} > 2T_b$  is required in order to guarantee phase alignment.  $T_{max}$  and  $T_{min}$  in the nominal process conditions and at process



Figure 4.15: Inter-signal timing skew compensation scheme. (a) MCLK' and all received training data are delayed by  $T_{min}$  at  $t = T_1$  where training data are conveyed to all channels. (b) MCLK' is further delayed by one bit time and all delayed training data bits are further delayed to align up with MCLK". The amount of the time delay of each data bit is determined by respective delay-locked loop.

corners are tabulated in Table 4.1. It is evident that $T_{max} - T_{min}$ of the data delay blocks exceeds 2 ns at all process corners, guaranteeing that the data delay blocks can delay the incoming data by at least 2 bit times (2 ns). The clock delay block consists of two sections, namely Part I and Part II. Part I of the clock delay block delays MCLK' by $T_{min}$ (2.82 ns in the nominal process conditions) and Part II of the clock delay block further delays MCLK' by one bit time.

Table 4.1: The minimum and maximum time delays of data delay blocks in nominal process conditions and at process corners.

|                          | SS    | FS   | TT   | SF   | FF   |
|--------------------------|-------|------|------|------|------|
| $T_{min}$ (ns)           | 3.67  | 2.83 | 2.82 | 2.81 | 2.27 |
| $T_{max}$ (ns)           | 13.21 | 6.87 | 6.87 | 6.85 | 4.29 |
| $T_{max} - T_{min}$ (ns) | 9.54  | 4.04 | 4.05 | 4.04 | 2.02 |
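The requirement $T_{max} - T_{min} > 2T_b$ can be checked against the tabulated delays with a short script. The column order is assumed to be SS, FS, TT, SF, FF: the two unlabeled columns are taken as TT (its $T_{min}$ of 2.82 ns matches the nominal value quoted in the text) and FF. The variable names are illustrative.

```python
T_B_NS = 1.0  # bit time in ns for the 1 ns bit period used in the text

# (T_min, T_max) in ns, transcribed from Table 4.1.
# The labels TT and FF are assumptions for the two unlabeled columns.
DELAYS_NS = {
    "SS": (3.67, 13.21),
    "FS": (2.83, 6.87),
    "TT": (2.82, 6.87),
    "SF": (2.81, 6.85),
    "FF": (2.27, 4.29),
}


def variable_delay_range(t_min: float, t_max: float) -> float:
    """Variable delay available from a data delay block: T_max - T_min."""
    return t_max - t_min


ranges_ns = {c: variable_delay_range(*d) for c, d in DELAYS_NS.items()}
meets_requirement = {c: r > 2.0 * T_B_NS for c, r in ranges_ns.items()}
worst_corner = min(ranges_ns, key=ranges_ns.get)

for corner, rng in ranges_ns.items():
    margin = rng - 2.0 * T_B_NS
    print(f"{corner}: range = {rng:.2f} ns, margin over 2*T_b = {margin:.2f} ns")
print(f"smallest range at corner: {worst_corner}")
```

With these numbers every corner satisfies the requirement, although the corner with the shortest delays leaves a margin of only about 0.02 ns.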

The timing diagrams of the control voltages  $V_{c1-c3}$  are shown in Fig.4.17. At  $t = T_1$ ,  $V_{c1} = 1$  is set. Parallel data D[1], D[2], ..., D[N] are disconnected from the channels. 50% duty-cycle training data are applied to both the data channels and the clock channel.  $V_{ctrl} = 0$  is set and the minimum delay  $T_{min}$  is applied to D'[1], D'[2],...,D'[N]. At  $t = T_2$ ,  $V_{c2} = 1$  is







Figure 4.16: (a) Simplified schematic of sampling clock delay block. (b) Simplified schematic of data deskew block. Circuit parameters:  $W_{1,2} = 12\mu \text{m}$ ,  $W_3 = 6\mu \text{m}$ ,  $W_4 = 4\mu \text{m}$ ,  $W_5 = 10\mu \text{m}$ ,  $W_6 = 4.5\mu \text{m}$ ,  $W_7 = 5\mu \text{m}$ ,  $W_8 = 2.3\mu \text{m}$ ,  $W_9 = 2.07\mu \text{m}$ .  $L = 0.13\mu \text{m}$  for all transistors. C=500 fF. Circuit parameters of charge pump :  $W_{1,3,5} = 2.5\mu \text{m}$ ,  $W_{2,4,6} = 1\mu \text{m}$ ,  $W_{7,11} = 12.5\mu \text{m}$ ,  $W_{8,12} = 7.5\mu \text{m}$ ,  $W_{9,13} = 3\mu \text{m}$ ,  $W_{10,14} = 5\mu \text{m}$ .  $L = 0.13\mu \text{m}$  for all transistors. The transmission gate is used to balance the time delay of the inverter inserted in UP signal path.

set. $V_{ctrl}$ is routed to the charge pump and the deskew process starts. The deskew process continues until $t = T_3$, at which D"[1], D"[2],...,D"[N] are all phase-aligned with MCLK". At $t = T_3$, $V_{c3} = 1$ is set and the voltage replication block is enabled. The function of this block is to copy the obtained optimal control voltage $V_{ctrl}$ of each channel and hold it in the following data transmission phase. As shown in Fig. 4.18, at $t = T_3$, the counter of the voltage replication block starts. The driving clock of the counter is MCLK". The voltage across the resistor, $V_r$, increases in a piecewise constant fashion. Because $V_r = 0$ when $t \leq T_3$, the output of the comparator is 0 and the MUX selects $V_{ctrl}$ as its output, which controls the downstream voltage-controlled delay line (VCDL). When $V_r$ reaches $V_{ctrl}$, i.e. $V_r = V_{ctrl}$, the output of the comparator flips and the MUX routes $V_r$ to the downstream VCDL. The counter is disabled and its output remains unchanged, ensuring that $V_r = V_{ctrl}$ remains unchanged during data transmission. The power consumption of the voltage replication circuit can be lowered by increasing the resistance of R. As an example, assume $V_{ctrl} = 0.6$ V, a 5-bit voltage-replication circuit is used, and $R = 10$ k$\Omega$. At the maximum current $2^5I = 32I$, we have $32I \times R = 0.6$ V, from which we have $I \approx 2\mu$A. The power consumption in this case is $P = IV_{DD} = 2.4\mu$W. The control voltage replication time is determined by the value of the control voltage $V_{ctrl}$, the voltage incremental step RI, and the time delay of the comparator and multiplexer. The larger the value of $V_{ctrl}$, the longer the voltage replication time.
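The staircase behaviour of the voltage replication circuit can be sketched with an idealized model in which the comparator has no offset or delay and the counter adds one unit current $I$ per MCLK" cycle. The function name and the 0.5 V target are illustrative assumptions. The resistance and the full-scale condition $32I \times R = 0.6$ V are the ones used in the example above.

```python
def replicate_control_voltage(v_ctrl: float, i_unit: float, r: float,
                              n_bits: int = 5) -> tuple[int, float]:
    """Return (count, v_r) at which V_r = count * I * R first reaches v_ctrl."""
    max_count = 2 ** n_bits  # the text takes 2^5 * I as the maximum current
    for count in range(max_count + 1):
        v_r = count * i_unit * r
        if v_r >= v_ctrl:          # comparator flips, counter is frozen
            return count, v_r
    raise ValueError("v_ctrl exceeds the full scale of the staircase")


R = 10e3              # ohms, from the example in the text
V_FULL_SCALE = 0.6    # volts, from 32 * I * R = 0.6 V
N_BITS = 5

i_unit = V_FULL_SCALE / (2 ** N_BITS * R)   # unit current, about 1.9 uA
step = i_unit * R                           # voltage step R*I per count

# Illustrative control voltage (assumption) that is not a multiple of the step.
count, v_r = replicate_control_voltage(0.5, i_unit, R, N_BITS)

print(f"unit current = {i_unit * 1e6:.3f} uA, step = {step * 1e3:.2f} mV")
print(f"counter stops at {count}, V_r = {v_r:.5f} V")
```

In this model the held voltage exceeds $V_{ctrl}$ by less than one step RI, and the number of clock cycles needed grows in proportion to $V_{ctrl}$, in line with the observation that a larger $V_{ctrl}$ gives a longer replication time.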



Figure 4.17: Timing diagram of $V_{c1-c3}$. $T_1$ is the time instant at which MCLK and training data D[k], k=1,2,...,N, are conveyed to the channels. $T_2$ marks the start of the deskew process during which D"[k], k=1,2,...,N, are phase-aligned with MCLK". $T_3$ marks the end of the deskew process and the start of generating the replica of the final deskewing control voltage. $T_4$ marks the onset of the data transmission phase. The total deskew time includes the lock time of the delay-locked loops and the control voltage replication time.



Figure 4.18: Simplified schematic of voltage replication circuit.  $W = 3\mu m$ .  $L = 0.13\mu m$  for all transistors.

#### 4.7 Simulation Results

To validate the effectiveness of the proposed timing skew compensation technique, a 2-bit parallel link consisting of 3 channels as shown in Fig. 4.20 is implemented in UMC-0.13 $\mu$m 1.2 V CMOS technology. The channels are modeled as microstrip lines on an FR4 substrate. No mutual coupling between the channels is considered. Inter-signal timing skews are introduced by varying the length of the channels. The data rate is 1 Gbytes/s. The parallel link is analyzed using SpectreRF from Cadence Design Systems with BSIM3V3 device models. The layout of the design is shown in Fig. 4.21.

The waveforms of the input currents at the receiver end of the channel and those of MCLK', D'[1] and D'[2] are shown in Fig.4.22. The timing skew between MCLK' and D'[1],



Figure 4.19: Top : Simulated output voltage of counter. Middle : Simulated output voltage of comparator. Bottom : Simulated voltage of MUX connected to VCDL.

and that between MCLK' and D'[2] are approximately 300 ps. It is observed that the timing skews between $I_{in}[0]$, $I_{in}[1]$, and $I_{in}[2]$ are the same as those between MCLK', D'[1] and D'[2].
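The order of magnitude of these skews can be cross-checked from the channel lengths listed in the caption of Fig. 4.20. The sketch below is a rough estimate only: it assumes that $L_0$ is the clock channel and $L_1$, $L_2$ carry D[1] and D[2], and it uses $\epsilon_r = 5$ directly as the effective permittivity, which overestimates the delay of a real microstrip line because its effective permittivity is lower than $\epsilon_r$.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def propagation_delay(length_m: float, eps_eff: float) -> float:
    """Delay of a lossless line of given length and effective permittivity."""
    return length_m * math.sqrt(eps_eff) / C0


EPS_EFF = 5.0  # assumption: eps_r used as the effective permittivity

# Channel lengths in metres from the caption of Fig. 4.20.
# The mapping of L0, L1, L2 to MCLK, D[1], D[2] is an assumption.
LENGTHS_M = {"MCLK'": 0.10, "D'[1]": 0.06, "D'[2]": 0.14}

t_clk = propagation_delay(LENGTHS_M["MCLK'"], EPS_EFF)
skews_ps = {
    name: (propagation_delay(length, EPS_EFF) - t_clk) * 1e12
    for name, length in LENGTHS_M.items()
    if name != "MCLK'"
}

for name, skew in skews_ps.items():
    print(f"{name}: skew relative to MCLK' = {skew:+.0f} ps")
```

Under these assumptions a 4 cm length difference gives a skew just under 300 ps in either direction, which is of the same order as the simulated value.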

The deskew processes of D'[1] and D'[2] are shown in Fig. 4.23. $T_2$ and $T_3$ are set to 10 ns and 60 ns, respectively. When $t < T_2$, $V_{c1} = 1$ is set. Clock-like training data are conveyed to all channels. The control voltages of the voltage-controlled delay lines are set to zero. MCLK and data bits are all delayed by $T_{min}$. At $t = T_2$, $V_{c2} = 1$ is set. D"[1] and D"[2] start to align with MCLK" gradually. This process ends at $t = T_3$, at which D"[1] and D"[2] are aligned with MCLK". At $t = T_3$, $V_{c3} = 1$ is set and the obtained optimal control voltages of the two voltage-controlled delay lines are copied and held indefinitely by the voltage replication circuits. The waveforms of MCLK", D"[1], and D"[2] after the deskew process is completed are shown in Fig. 4.24.

The performance summary of the design is shown in Table 4.2. The power consumption



Figure 4.20: Simplified schematic of 4-channel parallel link with current-mode incremental signaling and inter-signal timing skew compensation. Channel length:  $L_0=0.1$ m,  $L_1=0.06$ m,  $L_2=0.14$ m,  $L_3=0.16$ m,  $H_1 = 200 \mu$ m,  $H_2 = 500 \mu$ m,  $W = 734 \mu$ m,  $\epsilon_r = 5$ .

is measured in the calibration phase only. In the data transmission phase, the switching frequencies of D[1] and D[2] will be smaller than that of MCLK thus the power consumption is expected to be lower.

Table 4.2: Performance of the current-mode parallel link interface.

| Parameter            | Value                          |
|----------------------|--------------------------------|
| Technology           | UMC-0.13 µm 1.2 V              |
| Data rate            | 1 Gbytes/s                     |
| Input current        | $\pm 2$ mA                     |
| Output voltage swing | 1.2 V peak-to-peak             |
| Deskew range         | 1 ns                           |
| Driver area          | $254~\mu m \times 0.13~\mu m$  |
| Receiver area        | $651~\mu m \times 0.13~\mu m$  |
| Comparator area      | $139~\mu m \times 0.13~\mu m$  |
| MCLK Delay area      | $647~\mu m \times 0.13~\mu m$  |
| Deskew Block area    | $495~\mu m \times 0.13~\mu m$  |
| Power Consumption    | 19.22 mW                       |





Figure 4.22: Simulated waveforms of  $I_{in}[0]$ ,  $I_{in}[1]$ ,  $I_{in}[2]$ , MCLK', D'[1] and D'[2].







Figure 4.24: Simulated waveforms of D"[1], D"[2], and MCLK" when the deskew process is completed.

#### 4.8 Chapter Summary

A new inter-signal timing skew compensation technique and its CMOS implementation for current-mode parallel links with incremental signaling have been proposed. It is shown that the logic state of each incoming data bit is represented by the direction of the output current of the corresponding current-mode driver. The current-mode receiver maps each received channel current to two voltages of different values and recovers the logic state using voltage comparators. The feedback introduced in the front-end of the receiver mitigates the common drawback of current-mode receivers that their input impedance varies with the direction of the received channel currents, minimizing signal reflection at the interface of the channel and the receiver. Simulation results have also demonstrated that the proposed receiver can provide 50 $\Omega$ input impedance over a large frequency range at all process corners. Inter-signal timing skews are compensated using DLLs so that data are aligned with the received master clock and only one master sampling clock is needed to sample all parallel data bits. Implemented in UMC-0.13 $\mu$m CMOS and analyzed using SpectreRF from Cadence Design Systems with BSIM3V3 device models, the simulation results of the 1 Gbytes/s 4-channel parallel link with microstrip channels on an FR4 substrate have demonstrated that the proposed deskew method is capable of compensating for up to 1 ns inter-signal timing skew.


# Chapter 5

### **Conclusions and Future Work**

Source-synchronized parallel links have the advantage of higher data bandwidth. A number of signaling schemes including single-ended, fully-differential, pseudo-differential and incremental are available. The advantages of the incremental signaling scheme over other signaling schemes include the low number of physical conductors and the capability to reject common-mode noise. A limiting factor of parallel link bandwidth is inter-signal timing skew. Inter-signal timing skew is caused by the mismatch between the electrical lengths of channels. To combat inter-signal timing skew, per-pin deskewing has been used in industry. In this approach, the sampling clock of each channel is phase-adjusted individually such that the sampling clock is shifted to the center of the data eye.

In this thesis, inter-signal timing skew compensation techniques for voltage-mode and current-mode incremental signaling schemes have been proposed and validated. The deskewing circuitry for parallel links with voltage-mode incremental signaling employs an Early/Late block to detect the rising and falling edges of adjacent pulses generated at the output of the comparator whose inputs are the voltages of adjacent channels, and subsequently to allocate the optimal sampling point. Two delay-locked loops are employed to place the sampling clock at the optimal position. Also, a new current-mode parallel link interface with an incremental signaling scheme and inter-signal timing skew compensation has been presented. The current-mode driver sources a current to or sinks a current from the channel according to the logic state of the input data. The current-mode receiver converts the channel current into two signals $V_S$ and $V_L$ whose swings differ substantially. In addition, it provides proper termination impedance. $V_S$ and $V_L$ are then fed to a differential voltage comparator to recover the data. Inter-signal timing skews are compensated with DLLs where data signals are forced to align with the sampling clock. To assess the effectiveness of the two designs, several parallel links consisting of the aforementioned building blocks have been implemented in UMC 0.13 $\mu$m, 1.2 V CMOS technology and analyzed using SpectreRF from Cadence Design Systems with BSIM3V3 device models. The effectiveness of the proposed inter-signal timing skew compensation has been validated.

The following can be carried out by future researchers to improve the quality of the designs presented in this thesis:


- For the current-mode parallel link interface, the input impedance variation is controlled within ±10 Ω, which is still around ±20% variation with respect to the nominal value. As a result, a more effective solution to stabilize the input impedance is to be developed.
- The channel models used in the simulation results do not consider cross-talk, the mutual inductance and the mutual capacitance between adjacent channels. More realistic channel models can be used to verify the robustness of the designs.
- A digital delay-locked loop can be used in place of the analog approach, as the digital approach eliminates the requirement of the loop filter capacitor. Also, new current-mode and voltage-mode incremental signaling schemes, and alternative inter-signal timing deskew compensation techniques can be developed.

# Bibliography

- [1] S. Sidiropoulos, *High performance inter-chip signaling*, Ph.D dissertation, Stanford University, 2002.
- [2] E. Yeung, Design of high-performance and low-cost parallel links, Ph.D dissertation, Stanford University, 2002.
- [3] A. Carusone, K. Farzan, and D. A. Johns, "Differential signaling with a reduced number of signal paths," *IEEE Trans. Circuits and Syst. II - Analog and Digital Signal Processing*, Vol. 48, No. 3, pp. 294-300, March 2001.
- [4] E. Yeung, and M. Horowitz, "A 2.4 Gb/s/pin simultaneous bidirectional parallel link with per-pin skew compensation," *IEEE J. Solid-State Circuits*, Vol. 35, No. 11, pp. 1619-1628, Nov. 2002.
- [5] T. Sato, Y. Nishio, T. Sugano, and Y. Nakagome, "A 5-GByte/s data-transfer scheme with bit-to-bit skew control for synchronous DRAM," *IEEE J. Solid-State Circuits*, Vol. 34, No. 5, pp. 653-660, May 1999.
- [6] National semiconductor channel link design guide, National Semiconductor Corp., Santa Clara, CA, June 2006.
- [7] Rambus XDR technology summary, Rambus Inc., Delaware, CA, 2005.
- [8] H. Johnson, and M. Graham, High-speed signal propagation advanced black magic, Upper Saddle River, N.J. : Prentice-Hall, 2002.

- [9] R. Mooney, C. Dike, and S. Borkar, "A 900 Mb/s bidirectional signaling scheme," *IEEE J. Solid-State Circuits*, Vol. 30, No. 12, pp. 1538-1542, Dec. 1995.
- [10] B. Casper, A. Martin, J. E. Jaussi, J. Kennedy, and R. Mooney, "An 8-Gb/s simultaneous bidirectional link with on-die waveform capture," *IEEE J. Solid-State Circuits*, Vol. 38, No. 12, pp. 2111- 2120, Dec. 2003.
- [11] Y. Fujimura, T. Takahashi, S. Toyoshima, K. Nagashma, J. Baba, and T. Matsumoto, "1.2Gbps/pin simultaneous bidirectional transceiver logic with bit deskew technique," *Proc. Symp. on VLSI Circuits, Digest of Technical Papers*, pp. 58-59, 2002.
- [12] Y. Kim, S. Shin, and S. Kang, "A 4-Gb/s/pin current mode 4-level simultaneous bidirectional I/O with current mismatch calibration," Proc. IEEE Int'l Symp. Circuits and Syst., pp. 1007-1010, Greece, 2006.
- [13] T. Wang, A low-voltage CMOS current-mode incremental signaling scheme for high-speed parallel links, MASc Thesis, Department of Electrical and Computer Engineering, Ryerson University, 2006.
- [14] T. Wang, and F. Yuan, "A new current-mode incremental signaling scheme with applications to Gb/s parallel links," *IEEE Trans. Circuits and Syst. I Regular Papers*, Vol. 54, No. 2, pp. 255-267, Feb. 2007.
- [15] J. Sim, and W. Namgoong, "Multilevel differential encoding with precentering for high-speed parallel link transceiver," *IEEE J. Solid-State Circuits*, Vol. 40, No. 8, pp. 1688-1694, Aug. 2005.
- [16] F. Hatori, S. Kousai, and Y. Unekawa, "Shared data line technique for doubling the data transfer rate per pin of differential interfaces," *Proc. IEEE Int'l Solid-State Circuits Conf.*, pp. 501-504, San Francisco, Apr. 2001.
- [17] S. Choi, H. Lee, and H. Park, "A three-data differential signaling over four conductors with pre-emphasis and equalization: A CMOS current-mode implementation," *IEEE J. Solid-State Circuits*, Vol. 41, No. 3, pp. 633-641, Mar. 2006.
- [18] S. Zogopoulos, and W. Namgoong, "A low power 4.2 Gb/s/pin parallel link using three-level differential encoding," Proc. Symp. on VLSI Circuits, Digest of Technical Papers, pp. 77-78, 2006.
- [19] H. Kim, F. Bien, Y. Hur, S. Chandramouli, J. Cha, E. Gebara, and J. Laskar, "A 0.25μm BiCMOS feed forward equalizer using active delay line for backplane communication," *Proc. IEEE Int'l Symp. Circuits and Syst.*, pp. 193-196, New Orleans, 2007.
- [20] B. Razavi, Design of integrated circuits for optical communications, McGraw-Hill, 2001.
- [21] C. Yang, and S. Sidiropoulos, "High-speed electrical signaling: overview and limitations," *IEEE Micro*, Vol. 18, No. 15, pp. 12-24, Jan. 1998.
- [22] P. Sun, and M. Hayat, "A Linear equalizer for high-speed APD-based integrate-and-dump receivers," *IEEE Communications Letters*, Vol. 9, No. 12, pp. 1073-1075, Dec. 2005.
- [23] R. Schober, W. Gerstacker, and J. Huber, "Adaptive linear equalization combined with noncoherent detection for MDPSK signals," *IEEE Trans. Communications*, Vol. 48, No. 5, pp. 733-738, May. 2000.
- [24] X. Ma, and W. Zhang, "Performance analysis for MIMO systems with lattice-reduction aided linear equalization," *IEEE Trans. Communications*, Vol. 56, No. 2, pp. 309-318, Feb. 2008.
- [25] R. Schober, W. Gerstacker, and J. Huber, "Adaptive linear equalization combined with noncoherent detection for MDPSK signals," *IEEE Trans. Communications*, Vol. 48, No. 5, pp. 733-738, May 2000.
- [26] W. Ellersick, Data converters for high speed CMOS links, Ph.D dissertation, Stanford University, 2001.

- [27] A. Atrash, and B. Butka, "A technique to de-skew differential PCB traces," Proc. Int'l Symp. Circuits Syst., Vol. 2, pp. 565-568, Vancouver, 2004
- [28] J. Rabaey, Digital Integrated Circuits: A Design Perspective, Pearson Education, New Jersey, 2nd edition, 2003.
- [29] S. Liu, J. Lee, and H. Tsao, "Low-power clock-deskew buffer for high-speed digital circuits," *IEEE J. Solid-State Circuits*, Vol. 34, No. 4, pp. 554-558, Apr. 1999.
- [30] C. Yang and S. Liu, "A one-wire approach for skew-compensating clock distribution based on bidirectional techniques," *IEEE J. Solid-State Circuits*, Vol. 36, No. 2, pp. 266-272, Feb. 2001.
- [31] G. Dehng, J. Hsu, C. Yang, and S. Liu, "Clock-deskew buffer using a SAR-controlled delay-locked loop," *IEEE J. Solid-State Circuits*, Vol. 35, No. 8, pp. 1128-1136, Aug. 2000.
- [32] C. Dike, N. Kurd, P. Patra, and J. Barkatuhh, "A design for digital, dynamic clock deskew," in *Proc. Symp. VLSI Circuits, Digest of Technical Papers*, pp. 21- 24, Jun. 2003.
- [33] L. Wang, "An energy-efficient skew compensation technique for high-speed skew-sensitive signaling," Proc. IEEE Int'l Symp. Circuits and Syst., Vol. 2, pp. 1658-1661, Kobe, Japan, 2005.
- [34] G. Torralba, V. Angelov, V. González, V. Lindenstruth, and E. Sanchis, "A VLSI for deskewing and fault tolerance in LVDS links," *IEEE Trans. Nuclear Science*, Vol. 53, No. 3, pp. 801-809, Jun. 2006.
- [35] N. Weste, and D. Harris, CMOS VLSI design: A circuits and systems perspective, 3rd ed., Boston : Pearson Education, 2005.

- [36] C. Chuang, and S. Liu, "A 0.5-5-GHz wide-range multiphase DLL with a calibrated charge pump," *IEEE Trans. Circuits and Syst. II - Express Briefs*, Vol. 54, No. 11, pp. 939-943, Nov. 2007.
- [37] R. Yang, and S. Liu, "A 2.5 GHz all-digital delay-locked loop in 0.13 μm CMOS technology," *IEEE J. Solid-State Circuits*, Vol. 42, No. 11, pp. 2338-2347, Feb. 2001.
- [38] R. Weng, T. Su, C. Liu, and Y. Kuo, "A CMOS delay-locked loop based frequency multiplier for wide-range operation," Proc. IEEE Conf. Electron Devices and Solid-State Circuits, pp. 419-422, Dec. 2005.
- [39] J. Kwak, C. Kwon, K. Kim, S. Lee, and J. Kih, "Low cost high performance register-controlled digital DLL for 1 Gbps × 32 DDR SDRAM," Proc. Symp. VLSI Circuits, Digest Technical Papers, pp. 283-284, 2003.
- [40] LVDS Owner's Manual, National Semiconductor, National Semiconductor Corporation, Santa Clara, CA, 2004.
- [41] C. Chen, J. Wang, and T. Sun, "A novel mini-LVDS receiver in 0.35-μm CMOS," Proc. IEEE Int'l SOC Conf., pp. 65-68, Sept. 2006.
- [42] G. Palmisano, and R. Salerno, "A replica biasing for constant-gain CMOS open-loop amplifiers," Proc. Int'l Symp. Circuits Syst., Vol. 2, pp. 363-366, Monterey, CA, June. 1998.
- [43] M. Sanduleanu, and J. Frambach, "1GHz tuning range, low phase noise, LC oscillator with replica biasing common-mode control and quadrature outputs," Proc. European Solid-State Circuits Conf., pp. 506-509, Sept. 2001.
- [44] A. Boni, A. Pierazzi, and D. Vecchi, "LVDS I/O interface for Gb/s-per-Pin operation in 0.35-μm CMOS," IEEE J. Solid-State Circuits, Vol. 36, No. 4, pp. 706-711, Apr. 2001.
