# Multi-Voltage Domain Power Distribution Network for Optimized Ultra-Low Voltage Clock Delivery

MD Shazzad Hossain and Ioannis Savidis

Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104

Abstract—In this paper, the co-design of the clock and power delivery networks is proposed for ultra-low power IoT applications operating in sub-threshold. A distributed, multi-voltage domain and hierarchical power distribution network is proposed to deliver current to the clock buffers, registers, and combinational circuits in local clock distribution networks. The variation of the clock skew, setup time, hold time, and clock-to-q delay are analyzed under process and supply voltage variation. The effect on timing due to supply and process variation is analyzed for a target operating voltage and frequency of, respectively, 250 mV and 2 MHz in a 130 nm CMOS technology. The minimum clock period, skew, and insertion delay are reduced to, respectively,  $0.74\times$ ,  $0.52\times$ , and  $0.79\times$  when optimized sub-threshold buffers are implemented, as compared and normalized to a clock network that includes non-optimized buffers. In addition, the co-designed clock and power networks were resilient to as much as 10% variation in the supply voltage when the proposed multi-voltage domain and distributed power distribution network is used with the optimized clock buffers.

*Index Terms*—sub-threshold computing, clock buffers, clock distribution, distributed power delivery, power supply variation, process variation, ultra-low power computation

## I. INTRODUCTION

Energy efficiency and ultra-low power consumption have become key requirements for many battery-operated Internet of Things (IoT) applications. For remote IoT and sensing applications with limited charging and battery capabilities, ultra-low power consumption and robust operation are highly desired, while operating speed is not as critical. Ultra-low power circuits therefore benefit IoT applications by reducing the power consumed for computation, which extends the battery lifetime and improves portability and cost.

One of the primary methods to reduce power consumption and increase energy efficiency is supply voltage scaling, where sub- and near-threshold circuits provide the optimum power-delay point [1-4] and the minimum energy point [1,5,6]. Prior research on sub-threshold (sub-$V_t$) logic and memory circuits has shown proper functionality and high energy efficiency at voltages equal to or less than 350 mV [1]. For example, an FIR filter has been designed to operate at 85 mV and 240 Hz in a 130 nm technology [7]. A sub-$V_t$ processor for sensor networks was implemented in a 130 nm technology with a 130 mV supply voltage, while consuming 11 nW [8]. A sub-$V_t$ SoC for wireless electrocardiogram (ECG) monitoring was implemented in a 130 nm technology with a 280 mV supply, while consuming 2.6 $\mu$W of power [9]. In addition, an adaptive 32b sub-$V_t$ processor was implemented that dissipates 27.2 pJ per instruction while operating at 325 mV [10]. In the work described in this paper, SPICE analysis of an inverter identifies 140 mV as the minimum functional supply voltage that permits 100 kHz operation in a 130 nm CMOS technology. Despite extensive research on developing cores and memories for sub-threshold computing, power and clock delivery, and the interaction between the power and clock networks, at voltages less than 350 mV have not been addressed by the research community.

In this paper, clock-power co-design is proposed for deep sub-threshold circuits operating at 250 mV. A distributed, multi-voltage domain, and hierarchical power distribution network (PDN) is proposed to deliver power to the components of the sub-threshold clock distribution network. The skew variation and timing conditions of the circuit are evaluated for both a conventional single domain power distribution network and the proposed distributed, multi-domain, and hierarchical power distribution network. In addition, the negative impacts of power supply noise on skew variation and timing are characterized for a sub-threshold voltage of 250 mV by using both conventional and optimized clock buffers.

The key contributions of this paper are 1) the use of separate power domains for the logic, register, and buffer PDNs to improve robustness to power supply noise in sub-threshold, 2) clock-power co-design for sub-threshold operation at 250 mV, and 3) distributed and variable voltage local clock distribution networks (for clock buffers) to improve noise immunity in sub-threshold, where buffers optimized for sub-threshold operation are used.

The rest of the paper is organized as follows: The challenges of sub-threshold power and clock distribution are discussed in Section II. The proposed clock-power co-design methodology is presented in Section III-A. The selection of the voltage regulator topology to optimize the energy efficiency of the circuit is discussed in Section III-B. Simulation results are described in Section IV. Some concluding remarks are provided in Section V.

## II. CHALLENGES OF SUB-THRESHOLD POWER AND CLOCK DISTRIBUTION

Sub-threshold operation imposes several fundamental challenges, including a reduced $I_{on}/I_{off}$ ratio of the transistors and increased sensitivity to random dopant fluctuation, which degrades robustness to noise and increases threshold voltage variation. Because the drain current in sub-threshold depends exponentially on the gate-to-source voltage $V_{GS}$, the drain-to-source voltage $V_{DS}$, and the threshold voltage $V_t$, a 5% variation in the supply voltage (assuming a 250 mV supply) can significantly degrade the delay. Therefore, the impact of *IR* drop, $L \cdot di/dt$ switching noise, and process variation is much more severe for circuits operating in sub-threshold than for circuits operating at a near-threshold or nominal voltage, even though the magnitude of the transient switching noise ($di/dt$) in sub-threshold circuits is smaller.

Figure 1: Effect of supply voltage scaling on buffer and interconnect delay.
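To give a sense of this sensitivity, the sketch below estimates how the delay of a gate scales when a 250 mV supply droops, using the textbook sub-threshold current model $I_D = I_0 e^{(V_{GS}-V_t)/(n v_T)}(1 - e^{-V_{DS}/v_T})$ with $V_{GS} = V_{DS} = V_{DD}$ and a delay proportional to $C V_{DD}/I_D$. The slope factor $n = 1.5$, the threshold voltage, and the neglect of DIBL and body effect are illustrative assumptions, not parameters of the 130 nm process used in this paper; the threshold voltage cancels in the normalized ratio.

```python
import math

# Thermal voltage kT/q at 300 K, about 25.85 mV.
BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C
V_THERMAL = BOLTZMANN * 300.0 / ELECTRON_CHARGE

N_SLOPE = 1.5  # assumed sub-threshold slope factor (illustrative)
V_TH = 0.35    # assumed threshold voltage in V (cancels in the ratio below)


def subthreshold_current(vdd, i0=1.0):
    """Simplified sub-threshold drain current with V_GS = V_DS = vdd.

    I_D = I0 * exp((V_GS - V_t) / (n * v_T)) * (1 - exp(-V_DS / v_T)),
    neglecting DIBL and body effect (assumption).
    """
    return (i0 * math.exp((vdd - V_TH) / (N_SLOPE * V_THERMAL))
            * (1.0 - math.exp(-vdd / V_THERMAL)))


def relative_gate_delay(vdd, vdd_nominal):
    """Gate delay ~ C * V_DD / I_D, normalized to the delay at vdd_nominal."""
    delay = vdd / subthreshold_current(vdd)
    delay_nominal = vdd_nominal / subthreshold_current(vdd_nominal)
    return delay / delay_nominal


if __name__ == "__main__":
    v_nom = 0.250
    for drop in (0.01, 0.05, 0.10):
        v = v_nom * (1.0 - drop)
        print(f"{drop:.0%} supply drop: delay x{relative_gate_delay(v, v_nom):.2f}")
```

Under these assumptions, a 5% droop (to 237.5 mV) increases the gate delay by roughly 30%, which illustrates why millivolt-level supply noise matters in sub-threshold.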

Conventionally, in super-threshold clock distribution networks (CDN), clock buffers are placed to mitigate clock skew and satisfy timing constraints, since the interconnect delay is the dominant component of the total delay of a clock signal [11,12]. However, in a sub-threshold clock distribution network, the logic or buffer delay dominates the total delay of a clock signal due to the significant increase in the delay of the transistors. The primary challenges of low voltage clock distribution networks are clock uncertainty and power overhead [13]. The various sources of clock uncertainty include clock generation circuits, device variation, interconnect variation, and power supply noise [13]. The exponential increase in transistor delay and the sensitivity to process, voltage, and temperature (PVT) variation in sub-threshold operation impose new challenges in mitigating clock skew, slew, and jitter, particularly for circuits operating at a voltage of less than 300 mV [14,15]. In addition, the drive strength of clock buffers is reduced by an order of magnitude in sub-threshold, which significantly affects the skew and increases the delay. The delays of a clock buffer and of a 200 µm interconnect are characterized with SPICE simulation in a 130 nm CMOS technology between 200 mV and 1.2 V, and the results are shown in Fig. 1. The propagation delays of the buffer and the interconnect at a 200 mV supply voltage are, respectively, 229 ns and 1.53 ps, which are used to normalize all other results at higher supply voltages. Unlike the buffer delay, the interconnect delay does not change with supply voltage, as shown in Fig. 1. Therefore, implementing a deeper clock network and meeting timing constraints becomes more challenging when the clock buffers operate in sub-threshold [15].

Prior work has explored clock distribution networks operating in sub-threshold. A clock network designed with multiple supply voltages was proposed in [15] to reduce clock skew, where the supply voltage was reduced to 340 mV. An unbuffered clock tree operating at 300 mV was proposed to minimize skew, slew, and energy consumption [14,16]. In addition, a slew-aware clock tree was developed that operates at 300 mV, where a larger and smaller slew were implemented in, respectively, the root and leaf nodes of the tree by dynamically controlling nodal capacitances [17,18]. A capacitive boosting based interconnect technique was proposed in [19] to reduce clock skew for a 400 mV supply voltage. The use of separate power distribution networks for buffers at different levels of the clock network was proposed to reduce the clock jitter of high performance systems [20]. In addition, a DC-DC converter was implemented for sub-threshold circuits to deliver power at 250 mV [21]. However, no prior work has addressed the impact of power supply noise on the skew and timing of the clock distribution network in sub-threshold. In addition, clock distribution below 300 mV has not been addressed in prior work.

#### IGSC 2018

## III. CLOCK-POWER CO-DESIGN FOR ULTRA-LOW VOLTAGE CIRCUITS

The timing of digital circuits is dependent on the implemented combinational logic and the clock distribution network of the circuit, which includes clock buffers, global interconnect, and flip-flops. A subsection of a clock distribution network is shown in Fig. 2, where all components are supplied current through a single power distribution network. The static and transient noise induced on the power distribution network affect all components of the clock distribution network. Prior research used clock data compensation techniques, where voltage noise on the clock network is allowed to match the peak voltage noise on the datapath [22–24]. However, the technique is not applicable to sub- $V_t$  clock networks due to the significant increase in the sensitivity of the circuit to variation.





In a sub-threshold CDN, the propagation delays of the clock buffers, combinational circuits, and flip-flops increase significantly due to even a minor voltage drop in the power network, which affects the timing profile of the sub-threshold CDN. Therefore, power supply noise has a greater and more direct effect on the performance and stability of the sub-threshold clock network. In addition, local and global process variation disproportionately affect the delay of the clock buffers, combinational logic, and flip-flops across the circuit. At sub-threshold voltages, process variation results in increased skew variation and slew degradation, which can cause timing failures. Therefore, robust circuit and system level techniques are required for sub-threshold clock distribution networks to meet the timing requirements of the circuit under the combined effects of power supply noise and process variation.

Figure 3: Circuit model of the co-designed clock-power networks.

### A. Multi-Voltage Domains and Distributed Power Delivery for Local Clock Distribution Networks

A clock-power co-design methodology is proposed for ultra-low voltage circuits operating in the range of 200 mV to 300 mV; only 250 mV operation is described in this paper. Instead of using a single power domain to deliver current to the combinational logic, registers, and clock buffers, a design-time reconfiguration of the sub-$V_t$ power distribution network is proposed to ensure that timing constraints are met at sub-threshold voltages. The sub-threshold PDN is implemented as a multi-voltage domain, distributed, and hierarchical network. The primary objectives of the co-design methodology are to 1) improve the stability and performance of the CDN in sub-threshold, and 2) mitigate the effects of supply noise on the clock network.

The circuit model that includes the co-designed clock and power networks developed to analyze the effect of power supply noise on skew variation is shown in Fig. 3. The clock and power networks are represented by, respectively, dashed and solid lines. An off-chip crystal oscillator is assumed, which drives an on-chip phase-locked loop (PLL). The PLL drives a global clock distribution network, which is then connected to local CDNs. The PLL is supplied current through either off- or on-chip voltage regulators. The global clock network is supplied current through a global PDN, which isolates it from the noise generated by the switching activity of the local clock domains, registers, and logic circuits. The synchronous circuit blocks (e.g., flip-flops), local clock buffers, and combinational logic circuits are supplied current through separate PDNs labeled as, respectively, Register PDN, Buffer PDN, and Logic PDN in Fig. 3.

A DC-DC buck converter is considered for the first-stage conversion, as a switching regulator drives higher current loads with higher efficiency than linear and switched-capacitor voltage regulators [11]. The three first-stage on-chip voltage regulators (OCVR) supplying current to the logic, register, and buffer circuits are labeled as, respectively, VR1 [logic], VR1 [reg], and VR1 [buffer] in Fig. 3, and convert a battery voltage (3.3 V to 20 V) to an intermediate voltage between 0.6 V and 1.2 V. Further details regarding the on-chip voltage regulators are provided in Section III-B.

High-to-low level shifters (LS) are used to produce output voltages between 0.2 V and 0.4 V [25]. The use of LS circuits reduces the area and power overhead, as fewer voltage regulators are required to generate output voltages between 0.2 V and 0.4 V. The input and output voltages of the regulators and the output supply voltage of the level shifters are controlled by the power control unit (PCU).

A distributed and multi-voltage domain power delivery system that delivers current to two local clock distribution networks and sub-circuits, labeled as *Local CDN 1* and *Local CDN 2*, is shown in Fig. 3. Separate power distribution networks are assumed for each local clock network. The local CDNs consist of only sub-$V_t$ clock buffers, while sub-circuit blocks 1 and 2 consist of combinational circuits, registers, gates used for clock gating, and level shifters. However, in this paper, the sub-circuits connected with each local CDN include only registers and combinational logic. Each local clock domain includes a dedicated PDN for buffers. The clock depth and the number of implemented buffers are limited when delivering the clock signal in sub-threshold due to more stringent timing constraints. In addition, the sub-circuit block connected with each local CDN includes one or two PDNs, each providing current to combinational logic and registers through level shifters (see Fig. 3). The output voltage of each LS is set individually to any value between 0.2 V and 0.4 V, which enables distributed and multi-voltage domain operation in sub-threshold. Note that when a power domain is isolated through a separate PDN, the respective ground network is also isolated to maximize noise isolation.

The proposed characteristics of the clock-power co-designed circuit are 1) distributed and multi-voltage domain power delivery to the circuits close to the clock leaves (local PDN), where current to each local PDN is individually supplied by an LS, 2) isolated logic, buffer, and register PDNs to reduce the power supply noise seen by the clock buffers and registers, where the isolated PDNs are operated with independent sub-threshold supply voltages, 3) operating the buffer, register, and logic PDNs within a local clock distribution network at a similar sub-threshold voltage if the timing constraints and target frequency requirements are satisfied for the maximum noise in the supply voltage rail, 4) operating the logic, buffer, and register PDNs associated with a local CDN at different sub-threshold voltages for fine-grained tuning between delay, power, and noise margins, and 5) merging any two of the three PDNs if the effect of power supply noise on the clock network is minimal for the given circuit block, which also reduces the design complexity.

Two scenarios are illustrated in Fig. 3: 1) local CDN 1 and sub-circuit block 1 with three separate PDNs for buffers, logic, and registers, and 2) local CDN 2 and sub-circuit block 2, where a single PDN is used for buffers and combinational logic and all registers are supplied current through a second PDN. The circuit model includes both global and local on-chip clock distribution networks. However, in this paper, the analysis is limited to the local clock distribution network.
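The PDN arrangements of Fig. 3 can be captured as a small data model that makes the design-time choices explicit: each local CDN assigns its buffer, register, and logic loads to one or more isolated PDNs, each fed by a level shifter in the 0.2 V to 0.4 V range. The sketch below is hypothetical (the class and function names are not from the paper), and the rule that two PDNs may only be merged at a common voltage is an assumption made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

LOAD_TYPES = ("buffer", "register", "logic")
LS_MIN_V, LS_MAX_V = 0.2, 0.4  # level-shifter output range stated in the text (V)


@dataclass(frozen=True)
class PowerDomain:
    """One isolated PDN (with its own ground network) fed by one level shifter."""
    name: str
    loads: frozenset
    vdd: float = 0.25


def validate_plan(domains):
    """True if each load type is powered by exactly one PDN and every PDN
    voltage lies within the level-shifter output range."""
    counts = Counter(load for d in domains for load in d.loads)
    if set(counts) != set(LOAD_TYPES) or any(c != 1 for c in counts.values()):
        return False
    return all(LS_MIN_V <= d.vdd <= LS_MAX_V for d in domains)


def merge(a, b, name):
    """Merge two PDNs into one (feature 5); assumed valid only at a common voltage."""
    if a.vdd != b.vdd:
        raise ValueError("PDNs at different voltages cannot be merged")
    return PowerDomain(name, a.loads | b.loads, a.vdd)


# Local CDN 1 in Fig. 3: separate buffer, logic, and register PDNs.
cdn1 = [PowerDomain("Buffer PDN", frozenset({"buffer"})),
        PowerDomain("Logic PDN", frozenset({"logic"})),
        PowerDomain("Register PDN", frozenset({"register"}))]

# Local CDN 2 in Fig. 3: buffers and logic share a PDN; registers are separate.
cdn2 = [merge(cdn1[0], cdn1[1], "Buffer+Logic PDN"), cdn1[2]]

print(validate_plan(cdn1), validate_plan(cdn2))  # True True
```

A single-PDN baseline would be one `PowerDomain` whose loads contain all three load types.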

### B. Voltage Regulator Selection for the Proposed Clock-Power Network

In the clock-power co-designed circuit shown in Fig. 3, three on-chip buck converters are used to generate a 0.6 V output for the isolated PDNs that supply current to the logic, registers, and buffers. The three regulators do not add power loss since, for a given input and output voltage, the dissipation of a switching regulator scales with the square of the output current, as described by (6).

The basic circuit diagram of a switching DC-DC buck converter is shown in Fig. 4, where MOSFETs are used as switches Q1 and Q2 [11]. The primary components of the power dissipation of a buck converter are the conduction loss of the inductor given by (1), the MOSFET conduction loss on the high-side MOSFET Q1 and low-side MOSFET Q2 given by, respectively, (2) and (3), and the MOSFET switching loss, which is not provided in this paper as switching losses are highly dependent on switching frequency [26,27]. $I_o$, $I_L$, $I_{Q1}$, and $I_{Q2}$ represent the current through, respectively, the output node of the buck converter, the inductor L, the Q1 switch, and the Q2 switch.



Figure 4: Basic circuit representation of a switching DC-DC buck converter.

$$P_L = I_L^2 \cdot R_L \approx I_o^2 \cdot R_L \tag{1}$$

$$P_{Q1} = I_{Q1}^2 \cdot R_{Q1} = \frac{V_o}{V_{in}} \cdot I_L^2 \cdot R_{Q1} \tag{2}$$

$$P_{Q2} = I_{Q2}^2 \cdot R_{Q2} = \left(1 - \frac{V_o}{V_{in}}\right) \cdot I_L^2 \cdot R_{Q2} \tag{3}$$

The total power dissipated by the MOSFET switches is given by

$$P_{MOS} = P_{Q1} + P_{Q2} = I_L^2 \cdot \left(\frac{V_o}{V_{in}} \cdot R_{Q1} + \left(1 - \frac{V_o}{V_{in}}\right) \cdot R_{Q2}\right)$$

$$P_{MOS} \approx I_o^2 \cdot M, \tag{4}$$

where $M = \frac{V_o}{V_{in}} \cdot R_{Q1} + \left(1 - \frac{V_o}{V_{in}}\right) \cdot R_{Q2}$.

The total power dissipation of the buck converter is, therefore, given by

$$P_{Buck} = P_L + P_{MOS} + P_{other} \tag{5}$$

$$P_{Buck} \approx I_o^2 \cdot R_L + I_o^2 \cdot M = I_o^2 \left(R_L + M\right), \tag{6}$$

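where $V_o/V_{in}$ is the output-to-input voltage ratio of the buck converter (i.e., the duty cycle of the high-side switch Q1 in an ideal buck converter), $R_L$ is the DC resistance of the inductor L, $I_o$ is the output load current of the regulator, $R_{Q1}$ and $R_{Q2}$ are the on-state drain-to-source resistances of, respectively, MOSFETs Q1 and Q2, and $P_{other}$ accounts for the remaining losses, such as the switching loss, which are neglected in (6).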
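Equation (6) implies that conduction loss grows with the square of the load current, so splitting one load across several regulators reduces the total conduction loss. The sketch below evaluates (1) to (6) for one 15 A regulator and for three 5 A regulators. The resistance values are hypothetical placeholders, not parameters of the ADP1851 or of this paper, and $P_{other}$ is ignored, as in (6).

```python
def buck_conduction_loss(i_out, v_in, v_out, r_l, r_q1, r_q2):
    """Conduction loss of a buck converter per (6): I_o^2 * (R_L + M)."""
    d = v_out / v_in                   # duty cycle of the high-side switch Q1
    m = d * r_q1 + (1.0 - d) * r_q2    # M from (4)
    return i_out ** 2 * (r_l + m)


# Hypothetical component values in ohms (not from the text).
R_L, R_Q1, R_Q2 = 2e-3, 5e-3, 3e-3
V_IN, V_OUT = 5.0, 0.6

single = buck_conduction_loss(15.0, V_IN, V_OUT, R_L, R_Q1, R_Q2)
split = 3 * buck_conduction_loss(5.0, V_IN, V_OUT, R_L, R_Q1, R_Q2)
print(f"one 15 A regulator: {single:.3f} W, three 5 A regulators: {split:.3f} W")
```

Under this purely conduction-loss model, the three regulators dissipate one third of the loss of the single regulator regardless of the resistance values, since $3 (I_o/3)^2 = I_o^2/3$. The simulated losses reported for the ADP1851 grow by less than 9× between 5 A and 15 A, presumably because losses lumped into $P_{other}$ do not scale with $I_o^2$.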



Figure 5: Conversion efficiency and regulator loss of a DC-DC buck converter that generates an output voltage of 0.6 V from input voltages of 5 V and 12 V for current loads of 5 A and 15 A.

To further analyze the regulator loss and efficiency under variation in current demand, an industry standard switching DC-DC buck converter (Analog Devices ADP1851) is simulated using ADIsimPE. The converter accepts an input voltage $V_{in}$ in the range of 2.75 V to 20 V and generates an output voltage between 0.6 V and 90% of $V_{in}$, for a maximum output load current of 25 A [28]. The buck converter is simulated with a 5 A output load current to emulate the current demand of each of the logic, register, and buffer circuits supplied current through isolated PDNs, while a 15 A output current is used to emulate a single bulk regulator delivering current through a single PDN to the logic, registers, and buffers. The converter efficiency and total power loss of the regulator are shown in Fig. 5 for output currents between 0.01 A and 15 A, while generating an output voltage of 0.6 V from input voltages of 5 V and 12 V. The total power loss of the buck converter for an input voltage of 5 V (12 V) is 0.37 W (0.26 W) and 1.4 W (1.15 W) for, respectively, a 5 A and 15 A output current load. Therefore, for similar input and output voltages, the regulator power loss $P_{Buck}$ given by (5) increases by $3.8\times$ ($4.4\times$) for a 5 V (12 V) input when the output current increases from 5 A to 15 A. Three 5 A regulators thus dissipate a combined 1.11 W (0.78 W), which is less than the 1.4 W (1.15 W) dissipated by a single 15 A regulator. In addition, no significant change in conversion efficiency is observed between a regulator designed to supply an output load current of 5 A and a regulator supplying 15 A, as the increase in power loss has a minimal impact on the efficiency. Therefore, using three separate buck converters for the logic, registers, and buffers does not incur additional power loss, while maintaining a high conversion efficiency.
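The reported losses can be cross-checked with a few lines of arithmetic. The sketch below compares three 5 A regulators against one 15 A regulator using the loss values given above, and computes an efficiency for each operating point assuming $\eta = P_{out}/(P_{out} + P_{loss})$ with $P_{out} = 0.6\,\text{V} \times I_o$. This efficiency definition is an assumption, so the values may differ slightly from the curves in Fig. 5.

```python
V_OUT = 0.6  # regulator output voltage (V)

# Total regulator loss in W from the text: {input voltage (V): {load current (A): loss}}.
REPORTED_LOSS = {5.0: {5.0: 0.37, 15.0: 1.4}, 12.0: {5.0: 0.26, 15.0: 1.15}}


def efficiency(i_out, p_loss, v_out=V_OUT):
    """Conversion efficiency, assuming eta = P_out / (P_out + P_loss)."""
    p_out = v_out * i_out
    return p_out / (p_out + p_loss)


for v_in, losses in REPORTED_LOSS.items():
    three_small = 3 * losses[5.0]   # three regulators, 5 A each
    one_large = losses[15.0]        # one bulk regulator, 15 A
    print(f"Vin = {v_in:g} V: 3 x 5 A loss = {three_small:.2f} W, "
          f"1 x 15 A loss = {one_large:.2f} W, "
          f"eff(5 A) = {efficiency(5.0, losses[5.0]):.3f}, "
          f"eff(15 A) = {efficiency(15.0, one_large):.3f}")
```

With these numbers, the efficiency at 5 A and at 15 A differs by only a few percentage points for either input voltage, while the three smaller regulators dissipate less in total.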

## IV. SIMULATION RESULTS

All SPICE simulations are performed in a 130 nm CMOS technology. The characterization of the variation in the delay of flip-flops due to process variation at different supply voltages is described in Section IV-A. The timing characteristics of a flip-flop operating in sub-threshold are discussed in Section IV-B. The effect of power supply noise on the clock skew and circuit timing is analyzed in Section IV-C.

### A. Delay Variation of Flip-Flops Under Supply Variation

The variation in flip-flop (FF) delay at sub-threshold voltages is analyzed for three process corners (tt, ss, and ff). A flip-flop topology based on transmission gates is implemented and simulated in SPICE. The variation in flip-flop parameters, including the setup time ($t_s$), hold time ($t_h$), and clock-to-q delay ($t_{clk-to-q}$), is analyzed for supply voltages between 200 mV and 1.2 V, although simulation results are provided in Fig. 6 only up to 450 mV. The average variation of the flip-flop parameters across process corners in deep sub-threshold (250 mV) is more than 500× greater than the variation in super-threshold operation (1.2 V). The average variation is determined by 1) taking the difference in delay between the *tt* and *ss* cases and setting the value to x, 2) taking the difference between the *tt* and *ff* cases and setting the value to y, and 3) calculating the average variation as (x + y)/2. Given this sensitivity, power supply noise of a few millivolts drastically changes the timing characteristics of flip-flops operating in sub-threshold, which requires a power delivery system that is robust to noise.
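The averaging procedure above can be stated directly as code. The sketch assumes that x and y are taken as absolute differences, since the text does not specify a sign convention, and the corner delays in the example are hypothetical.

```python
def average_corner_variation(tt, ss, ff):
    """Average process-corner variation of one flip-flop parameter:
    x = |ss - tt|, y = |tt - ff|, result = (x + y) / 2 (absolute values assumed)."""
    x = abs(ss - tt)
    y = abs(tt - ff)
    return (x + y) / 2.0


# Hypothetical clock-to-q delays (ns) at a single supply voltage.
print(average_corner_variation(tt=100.0, ss=180.0, ff=60.0))  # 60.0
```

Applying the same function to the results at 250 mV and at 1.2 V and taking the ratio yields the kind of comparison quoted above.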



### B. Timing Constraints and Skew Variation in Sub-Threshold Circuits

A local clock branch that includes two sequentially adjacent flip-flops is used to model a single node of a clock distribution network for SPICE simulation performed at 250 mV, as shown in Fig. 7. The circuit is used to characterize the clock skew and the effects of power supply voltage variation on the timing of sequential circuits.

The clock insertion point is labeled as M. The interconnect (*wire* in Fig. 7) impedance is represented as an equivalent $\pi$-model to accurately represent the clock network. A 200 $\mu$m long interconnect is used in each path. Two series-connected clock buffers (*B* in Fig. 7) that operate at 250 mV are inserted midway along each path, which splits the interconnect into two 100 µm long segments, one before and the other after the buffers. Four $\pi$-segments, each representing 25 µm of length, are used to model each 100 µm wire segment. The flip-flops are optimally resized to operate at 250 mV. The sheet resistance $R_S$ of an M3 interconnect in a 130 nm technology is 0.0584 ± 0.0217 $\Omega$/square. In addition, the capacitance per unit length *C* of the interconnect is 1.8 to 2.2 pF/cm, as given in the 2013 ITRS [29]. The wire width and *C* considered for the analysis are, respectively, 0.2 µm and 2 pF/cm, giving a resistance $R_{\pi}$ and capacitance $C_{\pi}$ of, respectively, 7.3 $\Omega/\pi$-segment and 4.125 fF/$\pi$-segment.
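The per-segment resistance follows from the sheet resistance and the number of squares in a 25 µm by 0.2 µm segment, and the resulting ladder can be used to estimate the wire delay. The sketch below reproduces $R_{\pi} = 7.3\,\Omega$ and computes the Elmore delay at the far end of one 100 µm wire segment built from four $\pi$-segments. It takes $C_{\pi} = 4.125$ fF as stated above, and assumes that each $\pi$-segment places $C_{\pi}/2$ at each end, with an ideal driver and no load; these are modeling assumptions, not details given in the text.

```python
R_SHEET = 0.0584    # ohm/square, M3 nominal value from the text
WIDTH_UM = 0.2      # wire width (um)
SEG_LEN_UM = 25.0   # length represented by one pi-segment (um)
C_PI = 4.125e-15    # F per pi-segment, as stated in the text
N_SEG = 4           # pi-segments per 100 um wire segment

# 25 um / 0.2 um = 125 squares per segment.
r_pi = R_SHEET * SEG_LEN_UM / WIDTH_UM


def elmore_delay_pi_ladder(r, c, n):
    """Elmore delay at the far end of n identical pi-segments driven by an
    ideal source with no load; each segment places c/2 at each of its ends."""
    node_caps = [c] * (n - 1) + [c / 2.0]  # capacitance at nodes 1..n
    delay = 0.0
    for k in range(n):
        # Resistor k (node k -> node k+1) charges every capacitance downstream of it.
        delay += r * sum(node_caps[k:])
    return delay


tau = elmore_delay_pi_ladder(r_pi, C_PI, N_SEG)
print(f"R_pi = {r_pi:.2f} ohm, Elmore delay of 100 um = {tau * 1e12:.3f} ps")
```

The result, about 0.24 ps, is several orders of magnitude smaller than buffer delays of tens to hundreds of nanoseconds at 250 mV, which is consistent with the buffers, rather than the wires, dominating the sub-threshold clock path.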

The effects of power supply noise on the clock skew and the overall timing margins are analyzed for a supply voltage of 250 mV. If the launching and capturing registers are defined as, respectively, *m* and *s*, then the insertion delays of the *m* ($t_{mi}$) and *s* ($t_{si}$) registers are used to calculate the clock skew ($\delta$), as given by (7) and (8). For the analysis in this paper, $\delta_{adjusted}$ is used, as $\delta_{worst}$ is overly pessimistic.

$$\delta_{worst} = t_{si}(ss) - t_{mi}(ff) \tag{7}$$

$$\delta_{adjusted} = \frac{\left[t_{si}(ss) - t_{mi}(ff)\right] + \left[t_{si}(ss) - t_{mi}(tt)\right]}{2} \tag{8}$$
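Equations (7) and (8) can be evaluated directly from corner-specific insertion delays. The delays below are hypothetical and serve only to show how $\delta_{adjusted}$ relaxes the worst-case ss/ff pairing by averaging it with the ss/tt pairing.

```python
def skew_worst(t_si, t_mi):
    """(7): capturing register at ss, launching register at ff."""
    return t_si["ss"] - t_mi["ff"]


def skew_adjusted(t_si, t_mi):
    """(8): average of the ss/ff and ss/tt corner pairings."""
    return ((t_si["ss"] - t_mi["ff"]) + (t_si["ss"] - t_mi["tt"])) / 2.0


# Hypothetical insertion delays (ns) of the launching (m) and capturing (s) registers.
t_mi = {"tt": 400.0, "ss": 520.0, "ff": 320.0}
t_si = {"tt": 410.0, "ss": 540.0, "ff": 325.0}
print(skew_worst(t_si, t_mi), skew_adjusted(t_si, t_mi))  # 220.0 180.0
```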

The longest (or minimum clock period) and shortest (or race condition) path delay constraints must be met to assure the proper timing of the circuit. The constraints to determine the minimum clock period and to avoid a race condition are given by, respectively, [13]

$$T \ge t_{clk-to-q,max} + t_{logic,max} + t_s - \delta, \tag{9}$$

and

$$\delta < t_{clk-to-q,min} + t_{logic,min} - t_h. \tag{10}$$



Figure 7: Circuit model used to analyze the effect of supply noise on timing.

A PDN with a single voltage domain and the proposed PDN with up to three voltage domains are simulated to analyze the effects of supply voltage variation on the delay of the components of the sub-threshold clock distribution network [13]. In addition, the typical corner is used to characterize the setup time $t_{s,tt}$, hold time $t_{h,tt}$, and clock-to-q delay $t_{clk-to-q,tt}$ of the register, as the primary objectives of this work are to analyze 1) the effects of power supply noise on the skew, and 2) the effect on timing of noise-induced skew variation. The buffers are sized to produce symmetric rise and fall times in the operating mode that each is optimized for. Therefore, two different sizing ratios are used for the nominal and sub-threshold buffers: 1) nominal buffers include two inverters with a PMOS width $W_p$ of 3.6 µm and an NMOS width $W_n$ of 1.2 $\mu$m, and 2) sub-$V_t$ buffers include two inverters resized to optimize the sub-threshold delay, with a PMOS width $W_p$ of 2.4 $\mu$m and an NMOS width $W_n$ of 2.4 $\mu$m. The total transistor width of each inverter (4.8 µm) is the same for the nominal and sub-threshold buffers, which enables an iso-area comparison. Note that the P/N ratio of the sub-$V_t$ buffers is not the ideal ratio for every process corner and voltage, but is kept constant across all simulations, providing results and intuition on the effect that non-optimized buffers have on the timing characteristics of the circuit. The DC behavior of the nominal and sub-$V_t$ inverters is shown in Fig. 8. The sub-$V_t$ inverter exhibits symmetric behavior at a supply voltage of 250 mV for the typical process corner, while the voltage transfer curve (VTC) of the nominal inverter (not optimized for sub-threshold operation) is shifted to the right, as a stronger PMOS response is exhibited. The 25 mV ($V_{dd}/10$) shift from the sub-$V_t$ to the nominal VTC implies an increase in the fall time of the inverter. A chain of four inverters is simulated with nominal and sub-$V_t$ inverters to further characterize the delay variation at a sub-threshold voltage of 250 mV, with results indicating a delay of, respectively, 290 ns and 204 ns.



Figure 8: DC behavior of the nominal and sub- $V_t$  inverters.

The optimal logic depth for circuits operating at a nominal supply voltage is 8 FO4 delays at 3.6 GHz in a 100 nm technology [30]. However, the logic paths of sub-threshold circuits have much higher delay and operate at much lower maximum frequencies. An analysis of the minimum clock period, skew, and the timing properties of the register is used to determine the upper bound (critical path) and lower bound (shortest path) of the allowed delay. Therefore, the timing requirements given by (9) and (10) are rewritten as, respectively, (11) and (12), where $T_{min}$ is the minimum allowed clock period and $t_{logic}$ is the propagation delay through the combinational logic. The longest and shortest delays using the nominal buffers operating in sub-threshold are determined as, respectively, 12 FO4 and 6 FO4 for supply voltages ranging from 200 mV to 250 mV, and, for the optimized sub-$V_t$ buffers, as 5 FO4 and 2 FO4 for the same voltage range. For the 130 nm CMOS process used in this paper, the FO4 delay of the optimized inverter at sub-threshold voltages of 200 mV and 250 mV is, respectively, 103.2 ns and 36.25 ns.

$$t_{logic,critical} \le T_{min} + \delta_{adjusted} - t_{clk-to-q,tt} - t_{s,tt} \tag{11}$$

$$t_{logic,shortest} > \delta_{adjusted} - t_{clk-to-q,tt} + t_{h,tt} \tag{12}$$
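Constraints (11) and (12) define a window for the combinational delay, which can be expressed as a number of FO4 stages. The sketch below uses $T_{min} = 500$ ns from Section IV-C and the 36.25 ns FO4 delay of the optimized inverter at 250 mV; the skew and register timing values in the example are hypothetical. Because (12) is a strict inequality, the minimum stage count is the smallest integer whose delay exceeds the lower bound, and a negative lower bound means no minimum-delay constraint applies.

```python
import math

FO4_250MV = 36.25  # ns, optimized sub-Vt inverter at 250 mV (from the text)
T_MIN = 500.0      # ns, minimum clock period at 250 mV (from Section IV-C)


def logic_delay_bounds(t_min, skew, t_cq, t_s, t_h):
    """Return (lower, upper) bounds on the combinational delay."""
    upper = t_min + skew - t_cq - t_s  # (11): critical path, '<='
    lower = skew - t_cq + t_h          # (12): shortest path, strict '>'
    return lower, upper


def fo4_depth_range(lower, upper, fo4):
    """Smallest and largest whole number of FO4 stages within the bounds."""
    min_stages = math.floor(lower / fo4) + 1 if lower >= 0 else 0
    max_stages = math.floor(upper / fo4)
    return min_stages, max_stages


# Hypothetical skew and register timing (ns).
lo, hi = logic_delay_bounds(T_MIN, skew=100.0, t_cq=80.0, t_s=60.0, t_h=40.0)
print(lo, hi, fo4_depth_range(lo, hi, FO4_250MV))
```

If `max_stages` falls below `min_stages`, no logic depth satisfies both constraints at that clock period.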

### C. Sensitivity of Clock Network to Power Supply Variation

The effect of up to $\pm 10\%$ power supply variation on the clock network (schematic shown in Fig. 7) is analyzed for a sub-threshold supply voltage of 250 mV, with results provided in Fig. 9. The variation in insertion delay, FO4 delay, skew, and minimum clock period is characterized using unoptimized nominal and optimized sub-$V_t$ buffers, both operating in sub-threshold. The insertion delay, clock skew, and clock period improve when sub-$V_t$ buffers are used as compared to a clock distribution network using buffers designed for nominal operation. The insertion delay is reduced to $0.79\times$ when sub-$V_t$ buffers are used (for the typical corner), as compared and normalized to the insertion delay of nominal buffers operating at 250 mV. When the supply voltage is reduced to 225 mV, the insertion delay using nominal and sub-$V_t$ buffers increases to, respectively, $1.72\times$ and $1.34\times$, normalized to the insertion delay of a nominal buffer operating at 250 mV. In addition, the minimum clock period is reduced to $0.74\times$ when optimized sub-$V_t$ buffers are used as compared to non-optimized nominal buffers for a 250 mV supply. The use of optimized buffers reduces the skew to $0.52\times$ of the skew with non-optimized buffers. At a supply voltage of 250 mV, the skew of the clock path with nominal and sub-$V_t$ buffers is, respectively, 5.6 FO4 and 2.9 FO4.

A minimum clock period of 500 ns is required to meet the timing constraints of the flip-flops at a 250 mV supply voltage. The requirements of 1) a minimum permitted clock period and 2) preventing race conditions are used to determine the maximum allowed variation in the power supply voltage. Power supply variation is analyzed for three PDN configurations: Case 1) a single power distribution network that delivers current to the clock buffers, registers, and combinational logic circuits (baseline), Case 2) two power domains, where one power network is dedicated to the combinational logic circuits and the other supplies current to the clock buffers and registers, and Case 3) two power domains, where one power network supplies current to the combinational logic circuits and clock buffers and the other supplies current to the registers. The separate PDNs in Cases 2 and 3 can either be operated at two different sub-threshold voltages with a minimum voltage of 250 mV or, if the circuit blocks within the PDNs are sufficiently robust to noise at 250 mV, be kept at the same voltage, as is done for the analysis in this section.
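Finding the maximum tolerable noise amounts to a sweep: the supply is lowered in 1% steps until a timing check fails. The toy sketch below checks only the setup constraint (9) and assumes that every delay on the path slows down by the same sub-threshold factor; the delay model, slope factor, and path delays are illustrative assumptions, and the sketch does not reproduce the values in Table I.

```python
import math

V_NOM = 0.250        # nominal sub-threshold supply (V)
N_SLOPE = 1.5        # assumed sub-threshold slope factor
V_THERMAL = 0.02585  # thermal voltage kT/q at about 300 K (V)


def delay_scale(v):
    """Toy delay multiplier at supply v relative to V_NOM (sub-threshold model
    with V_GS = V_DS = V_DD, DIBL neglected; an assumption)."""
    return (v / V_NOM) * math.exp((V_NOM - v) / (N_SLOPE * V_THERMAL))


def setup_ok(noise, period=500.0, t_cq=80.0, t_logic=230.0, t_s=60.0, skew=0.0):
    """Constraint (9) with every delay scaled by the same supply droop.
    All delays in ns; t_cq, t_logic, t_s, and skew are hypothetical."""
    s = delay_scale(V_NOM * (1.0 - noise))
    return period >= s * (t_cq + t_logic + t_s) - skew


def max_tolerable_noise(meets_timing, step=0.01, limit=0.20):
    """Increase the droop in multiples of `step` (fraction of V_NOM) and return
    the largest droop for which `meets_timing` still holds."""
    tolerated = 0.0
    k = 1
    while k * step <= limit + 1e-12:
        if not meets_timing(k * step):
            break
        tolerated = k * step
        k += 1
    return tolerated


print(f"toy maximum tolerable droop: {max_tolerable_noise(setup_ok):.0%}")
```

A fuller model would also check the race condition (10) and, for Cases 2 and 3, apply the droop only to the components supplied by the noisy PDN, which is where separate power domains can raise the tolerable noise.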

The simulation results are provided in Table I, which lists the maximum tolerable noise for the three PDN configurations with both nominal and sub-$V_t$ buffers. The power overhead of the VR is not considered, since the primary objective is to analyze the effect of power supply noise on the clock network. Based on the simulation results, a single PDN with nominal buffers tolerates up to 2% supply noise, while the use of sub-$V_t$ buffers with a single PDN increases the noise tolerance to 7%. When nominal buffers are used, the noise tolerance increases to 4% when a separate PDN is used either for the combinational logic only (Case 2) or for the clock buffers together with the combinational logic (Case 3). The use of sub-$V_t$ buffers with a shared PDN for the combinational logic and clock buffers and a separate PDN for the registers (Case 3) provides the maximum noise tolerance of 10%, which indicates a benefit of including multiple power domains for sub-threshold clock delivery. Therefore, the robustness and performance of the clock distribution network are improved when buffers optimized for sub-$V_t$ operation are used at a supply voltage of 250 mV. In addition, for ultra-low voltage circuits, a multi-domain power distribution network is beneficial to meet the timing constraints of the circuit in the presence of supply and process variation. The proposed technique is also applicable to multicore systems, where each core includes dedicated OCVRs.

Figure 9: Variation in (a) insertion delay, and (b) skew, minimum clock period, and FO4 delay. Results for the clock network with nominal and sub-$V_t$ buffers are shown as, respectively, solid and dashed lines.

Table I: Maximum tolerable noise (percent of 250 mV) of the clock network for three PDN configurations using nominal and sub-$V_t$ buffers at 250 mV

| Buffer type       | Case 1 | Case 2 | Case 3 |
|-------------------|--------|--------|--------|
| Nominal buffer    | 2%     | 4%     | 4%     |
| Sub-$V_t$ buffer  | 7%     | 8%     | 10%    |
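Because Table I reports tolerance as a percentage of the 250 mV supply, it can help to restate the entries as absolute noise amplitudes. The short sketch below transcribes Table I into a Python dictionary (the key names are illustrative labels) and converts each entry to millivolts; for example, the 10% tolerance of Case 3 with sub-$V_t$ buffers corresponds to 25 mV.

```python
# Table I entries: maximum tolerable noise as a percent of the 250 mV supply
V_SUPPLY_MV = 250
tolerable_pct = {
    "Nominal buffer": {"Case 1": 2, "Case 2": 4, "Case 3": 4},
    "Sub-Vt buffer": {"Case 1": 7, "Case 2": 8, "Case 3": 10},
}

# Convert each percentage to an absolute noise amplitude in millivolts
tolerable_mv = {
    buf: {case: pct * V_SUPPLY_MV / 100 for case, pct in cases.items()}
    for buf, cases in tolerable_pct.items()
}

# Configuration (buffer type, PDN case) with the largest tolerable noise
best = max(
    ((buf, case) for buf in tolerable_mv for case in tolerable_mv[buf]),
    key=lambda bc: tolerable_mv[bc[0]][bc[1]],
)

for buf, cases in tolerable_mv.items():
    row = ", ".join(f"{case}: {mv:.1f} mV" for case, mv in cases.items())
    print(f"{buf:15s} {row}")
print("best:", best)
```

The nominal-buffer row converts to 5.0, 10.0, and 10.0 mV, and the sub-$V_t$ row to 17.5, 20.0, and 25.0 mV, with the sub-$V_t$ buffers in Case 3 as the most noise-tolerant configuration.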

#### V. CONCLUSIONS

A methodology for clock-power co-design is proposed for sub-threshold computing. A multi-voltage domain power distribution network is used to deliver current to the components of a clock distribution network operating in sub-threshold. The effect of process and supply voltage variation on the clock network at an operating voltage of 250 mV is analyzed. The use of optimized sub-$V_t$ buffers reduces the minimum clock period, skew, and insertion delay to, respectively, $0.74\times$, $0.52\times$, and $0.79\times$ when normalized to a clock network that includes non-optimized nominal buffers operating at 250 mV. In addition, the clock network that includes sub-$V_t$ buffers tolerates up to 8% and 10% supply noise when a separate power domain is used for, respectively, 1) the combinational logic, and 2) the combinational logic and clock buffers.

#### References

- [1] H. Kaul, M. Anders, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, "Near-threshold Voltage (NTV) Design: Opportunities and Challenges," *Proceedings of the IEEE/ACM Annual Design Automation Conference*, pp. 1153–1158, June 2012.
- [2] M. S. Hossain and I. Savidis, "Robust Near-threshold Inverter with Improved Performance for Ultra-Low Power Applications," *Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS)*, pp. 738–741, May 2016.
- [3] M. S. Hossain and I. Savidis, "Bi-directional Input/Output Circuits with Integrated Level Shifters for Near-threshold Computing," *Proceedings of the IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)*, pp. 1240–1243, August 2017.
- [4] M. S. Hossain and I. Savidis, "Noise Constrained Optimum Selection of Supply Voltage for IoT Applications," *Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS)*, pp. 1–5, May 2018.
- [5] J. Myers, A. Savanth, D. Howard, R. Gaddh, P. Prabhat, and D. Flynn, "8.1 An 80nW retention 11.7 pJ/cycle active subthreshold ARM Cortex-M0+ subsystem in 65nm CMOS for WSN applications," *Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 1–3, February 2015.
- [6] A. P. Chandrakasan, D. C. Daly, D. F. Finchelstein, J. Kwong, Y. K. Ramadass, M. E. Sinangil, V. Sze, and N. Verma, "Technologies for Ultradynamic Voltage Scaling," *Proceedings of the IEEE*, Vol. 98, No. 2, pp. 191–214, January 2010.
- [7] M. Hwang, A. Raychowdhury, K. Kim, and K. Roy, "A 85mV 40nW Process-Tolerant Subthreshold 8×8 FIR Filter in 130nm Technology," *Proceedings of the IEEE Symposium on VLSI Circuits*, pp. 154–155, October 2007.
- [8] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, *et al.*, "Performance and Variability Optimization Strategies in a Sub-200mV, 3.5 pJ/Inst, 11nW Subthreshold Processor," *Proceedings of the IEEE Symposium on VLSI Circuits*, pp. 152–153, June 2007.
- [9] S. C. Jocke, J. F. Bolus, S. N. Wooters, A. D. Jurik, A. C. Weaver, T. N. Blalock, and B. H. Calhoun, "A 2.6-µW Sub-Threshold Mixed-Signal ECG SoC," *Proceedings of the IEEE Symposium on VLSI Circuits*, pp. 60–61, June 2009.
- [10] S. Luetkemeier, T. Jungeblut, M. Porrmann, and U. Rueckert, "A 200mV 32b Subthreshold Processor with Adaptive Supply Voltage Control," *Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 484–486, February 2012.
- [11] E. Salman and E. G. Friedman, *High Performance Integrated Circuit Design*, McGraw Hill Professional, 2012.
- [12] K. Han, J. Li, A. B. Kahng, S. Nath, and J. Lee, "A global-local optimization framework for simultaneous multi-mode multi-corner clock skew variation reduction," *Proceedings of the 52nd Annual Design Automation Conference*, p. 26, June 2015.
- [13] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits*, Prentice Hall, Englewood Cliffs, 2nd Edition, 2002.
- [14] M. Seok, D. Blaauw, and D. Sylvester, "Robust Clock Network Design Methodology for Ultra-Low Voltage Operations," *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, Vol. 1, No. 2, pp. 120–130, August 2011.
- [15] L. Lin, S. Jain, and M. Alioto, "Reconfigurable Clock Networks for Random Skew Mitigation from Subthreshold to Nominal Voltage," *Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 440–441, February 2017.
- [16] M. Seok, D. Blaauw, and D. Sylvester, "Clock Network Design for Ultra-Low Power Applications," *Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design*, pp. 271–276, August 2010.
- [17] J. R. Tolbert, X. Zhao, S. K. Lim, and S. Mukhopadhyay, "Analysis and Design of Energy and Slew Aware Subthreshold Clock Systems," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 30, No. 9, pp. 1349–1358, August 2011.
- [18] J. R. Tolbert, X. Zhao, S. K. Lim, and S. Mukhopadhyay, "Slew-Aware Clock Tree Design for Reliable Subthreshold Circuits," *Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design*, pp. 15–20, August 2009.
- [19] J. Kil, J. Gu, and C. H. Kim, "A High-Speed Variation-Tolerant Interconnect Technique for Sub-Threshold Circuits using Capacitive Boosting," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 16, No. 4, pp. 456–465, April 2008.
- [20] R. Kumar, "Power Source for Clock Distribution Network," August 2016, US Patent 9419589.
- [21] Y. K. Ramadass and A. P. Chandrakasan, "Minimum Energy Tracking Loop with Embedded DC–DC Converter Enabling Ultra-Low-Voltage Operation Down to 250 mV in 65 nm CMOS," *IEEE Journal of Solid-State Circuits*, Vol. 43, No. 1, pp. 256–265, January 2008.
- [22] T. Na, J. H. Ko, and S. Mukhopadhyay, "Clock Data Compensation Aware Digital Circuits Design for Voltage Margin Reduction," *IEEE Transactions on Circuits and Systems I: Regular Papers*, Vol. 64, No. 9, pp. 2401–2413, June 2017.
- [23] K. A. Bowman, S. Raina, J. T. Bridges, D. J. Yingling, H. H. Nguyen, B. R. Appel, Y. N. Kolla, J. Jeong, F. I. Atallah, and D. W. Hansquine, "A 16 nm all-digital auto-calibrating adaptive clock distribution for supply voltage droop tolerance across a wide operating range," *IEEE Journal of Solid-State Circuits*, Vol. 51, No. 1, pp. 8–17, January 2016.
- [24] K. L. Wong, T. Rahal-Arabi, M. Ma, and G. Taylor, "Enhancing Microprocessor Immunity to Power Supply Noise with Clock-Data Compensation," *IEEE Journal of Solid-State Circuits*, Vol. 41, No. 4, pp. 749–758, March 2006.
- [25] P. Chiu and M. Chen, "High-to-Low Level Shifter," September 6, 2005, US Patent 6,940,333.
- [26] R. Nowakowski and N. Tang, "Efficiency of Synchronous versus Nonsynchronous Buck Converters," *Texas Instruments Incorporated*, 2009. [Online]. Available: http://www.ti.com/lit/an/slyt358/slyt358.pdf.
- [27] D. Jauregui, B. Wang, and R. Chen, "Power Loss Calculation with Common Source Inductance Consideration for Synchronous Buck Converters," *Texas Instruments Incorporated*, June 2011.
- [28] ADP1851 Data Sheet, "Analog Devices," 2018, Available from http://www.analog.com/media/en/technical-documentation/datasheets/ADP1851.pdf.
- [29] ITRS Team, "International Technology Roadmap for Semiconductors," 2013, Available from http://www.itrs.net/ITRS
- [30] M. S. Hrishikesh, D. Burger, N. P. Jouppi, S. W. Keckler, K. I. Farkas, and P. Shivakumar, "The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays," *Proceedings of the IEEE International Symposium on Computer Architecture (ISCA)*, pp. 14–24, May 2002.