

## Circuit Characterization and Performance Estimation II



## Outline

- 1. Delay Estimation
- 2. Logical Effort and Transistor Sizing
- **3. Power Dissipation**
- 4. Interconnect
- 5. Wire Engineering
- 6. Design Margin
- 7. Reliability
- 8. Scaling

## Power and Energy

 Power is drawn from a voltage source attached to the V<sub>DD</sub> pin(s) of a chip.

- Instantaneous Power:  $P(t) = i_{DD}(t)V_{DD}$
- Energy:  $E = \int_{0}^{T} P(t)\,dt = \int_{0}^{T} i_{DD}(t)V_{DD}\,dt$
- Average Power:

$$P_{\text{avg}} = \frac{E}{T} = \frac{1}{T} \int_{0}^{T} i_{DD}(t) V_{DD} dt$$

## Static and Dynamic Dissipation

$$P_{\text{total}} = P_{\text{static}} + P_{\text{dynamic}}$$

- Static dissipation
  - Subthreshold conduction through OFF transistors
  - Tunneling current through gate oxide
  - Leakage through reverse-biased diodes
  - Contention current in ratioed circuits
- Dynamic dissipation
  - Charging and discharging of load capacitance
  - "Short-circuit" current while both pMOS and nMOS networks are partially ON

## **Dynamic Power**

- Dynamic power is required to charge and discharge load capacitances when transistors switch.
- One cycle involves a rising and falling output.
- On rising output, charge Q = CV<sub>DD</sub> is required
- On falling output, charge is dumped to GND
- This repeats T·f<sub>sw</sub> times over an interval T



#### Dynamic Power Cont.
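Filling in the step between the bullets above and the activity-factor formula: each rising output draws charge $CV_{DD}$ from the supply, and this happens $Tf_{sw}$ times in an interval $T$. Substituting into the average-power definition from the Power and Energy slide gives:

$$P_{\text{dynamic}} = \frac{1}{T}\int_{0}^{T} i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_{0}^{T} i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] = C V_{DD}^{2} f_{sw}$$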



#### **Chih-Cheng Hsieh**

#### **VLSI** Design

## Activity Factor

- Suppose the system clock frequency = f
- Let  $f_{sw} = \alpha f$ , where  $\alpha$  = activity factor
  - If the signal is a clock,  $\alpha$  = 1
  - If the signal switches once per cycle,  $\alpha$  = ½
  - Dynamic gates:
    - Switch either 0 or 2 times per cycle,  $\alpha$  = ½
  - Static gates:
    - Depends on design, but typically  $\alpha$  = 0.1
- Dynamic power:

$$P_{\rm dynamic} = \alpha C V_{DD}^2 f$$
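The definition $f_{sw} = \alpha f$ counts one rising (charging) transition per switching event, so α can be estimated by counting 0→1 transitions in a cycle-by-cycle trace. This is a minimal sketch, assuming the signal is sampled once per clock cycle; the function name `activity_factor` is illustrative. A clock (α = 1) rises every cycle and cannot be represented with one sample per cycle, so it is not modeled here. For independent random bits, P(0→1) = ½ × ½ = ¼, so α ≈ 0.25.

```python
import random

def activity_factor(samples):
    """Estimate alpha from a trace sampled once per clock cycle.

    Each 0->1 output transition charges C from VDD (Q = C*VDD), so
    alpha = (number of rising transitions) / (number of cycles observed).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    rises = sum(1 for prev, cur in zip(samples, samples[1:]) if prev == 0 and cur == 1)
    return rises / (len(samples) - 1)

toggling = [cycle % 2 for cycle in range(1001)]   # switches once every cycle
idle = [0] * 1001                                 # never switches
rng = random.Random(1)
random_data = [rng.randint(0, 1) for _ in range(100_001)]  # independent random bits

print(activity_factor(toggling))
print(activity_factor(idle))
print(activity_factor(random_data))  # expected to be close to 0.25
```

A signal that switches once per cycle rises only every other cycle, which is why it gets α = ½ in the list above.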

## Short Circuit Current

- When transistors switch, both nMOS and pMOS networks may be momentarily ON at once
- Leads to a blip of "short circuit" current.
- < 10% of dynamic power if rise/fall times are comparable for input and output

## Example

- 200M transistor chip
  - 20M logic transistors
    - Average width: 12  $\lambda$
  - 180M memory transistors
    - Average width: 4  $\lambda$
  - 1.2 V, 100 nm process ($\lambda$ = 0.5 × feature size = 50 nm)
  - $C_g = 2\ \text{fF}/\mu m$

## Dynamic Example

- Static CMOS logic gates: activity factor = 0.1
- Memory arrays: activity factor = 0.05 (many banks, only some of which are active at a time)
- Estimate dynamic power consumption per MHz.
  - Neglect wire capacitance and short-circuit current.


$$C_{\text{logic}} = (20 \times 10^{6})(12\lambda)(0.05\,\mu m/\lambda)(2\,\text{fF}/\mu m) = 24\,\text{nF}$$
$$C_{\text{mem}} = (180 \times 10^{6})(4\lambda)(0.05\,\mu m/\lambda)(2\,\text{fF}/\mu m) = 72\,\text{nF}$$
$$P_{\text{dynamic}} = \left[0.1\,C_{\text{logic}} + 0.05\,C_{\text{mem}}\right](1.2)^{2} f = 8.6\ \text{mW/MHz}$$
$$= 8.6\ \text{W @ 1 GHz}$$
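The arithmetic above can be checked with a short script. This is a sketch that simply re-evaluates $P = \alpha C V_{DD}^2 f$ with the example's numbers; the helper names `gate_cap` and `p_dynamic` are hypothetical, and wire capacitance and short-circuit current are neglected as stated.

```python
# Parameters from the example: 100 nm process, lambda = 0.05 um, Cg = 2 fF/um
LAMBDA_UM = 0.05
CG_F_PER_UM = 2e-15
VDD = 1.2

def gate_cap(n_transistors, width_lambda):
    """Total gate capacitance (F) of n transistors of the given width in lambda."""
    return n_transistors * width_lambda * LAMBDA_UM * CG_F_PER_UM

c_logic = gate_cap(20e6, 12)    # logic transistors, 12 lambda average width
c_mem = gate_cap(180e6, 4)      # memory transistors, 4 lambda average width

def p_dynamic(f_hz, alpha_logic=0.1, alpha_mem=0.05):
    """Dynamic power (W), neglecting wire capacitance and short-circuit current."""
    return (alpha_logic * c_logic + alpha_mem * c_mem) * VDD ** 2 * f_hz

print(f"C_logic = {c_logic * 1e9:.0f} nF, C_mem = {c_mem * 1e9:.0f} nF")
print(f"{p_dynamic(1e6) * 1e3:.2f} mW/MHz")
print(f"{p_dynamic(1e9):.2f} W at 1 GHz")
```

Unrounded, the result is 8.64 mW/MHz, which the slide rounds to 8.6.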

#### **Static Power**

- Static power is consumed even when chip is quiescent.
  - Ratioed circuits burn power in fight between ON transistors
  - Leakage draws power from nominally OFF devices

$$I_{ds} = I_{ds0} e^{\frac{V_{gs} - V_t}{nv_T}} \left[ 1 - e^{\frac{-V_{ds}}{v_T}} \right]$$

$$V_{t} = V_{t0} - \eta V_{ds} + \gamma \left(\sqrt{\phi_{s} + V_{sb}} - \sqrt{\phi_{s}}\right)$$
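A short numerical sketch of the two equations above for an OFF nMOS ($V_{gs} = 0$, $V_{ds} = V_{DD}$). The device parameters ($n = 1.5$, $\eta = 0.1$, $\gamma = 0.2\ \text{V}^{1/2}$, $\phi_s = 0.8$ V, $V_{DD} = 1$ V, $v_T = 26$ mV, $I_{ds0}$ normalized to 1) are hypothetical values chosen for illustration, not process data.

```python
import math

V_T = 0.026   # thermal voltage kT/q near room temperature (V)
# Hypothetical device parameters, for illustration only
N, ETA, GAMMA, PHI_S, VDD = 1.5, 0.1, 0.2, 0.8, 1.0

def threshold(vt0, vds, vsb):
    """Vt with DIBL (-eta*Vds) and body effect (gamma term)."""
    return vt0 - ETA * vds + GAMMA * (math.sqrt(PHI_S + vsb) - math.sqrt(PHI_S))

def subthreshold_current(vgs, vds, vt, ids0=1.0):
    """Ids = Ids0 * exp((Vgs - Vt)/(n vT)) * (1 - exp(-Vds/vT))."""
    return ids0 * math.exp((vgs - vt) / (N * V_T)) * (1 - math.exp(-vds / V_T))

def off_leakage(vt0, vsb=0.0):
    """Leakage of an OFF nMOS: gate at source potential, full VDD across it."""
    return subthreshold_current(0.0, VDD, threshold(vt0, VDD, vsb))

low_vt = off_leakage(0.3)
high_vt = off_leakage(0.4)
reverse_bias = off_leakage(0.3, vsb=0.5)   # body held 0.5 V below the source

swing = N * V_T * math.log(10)             # subthreshold slope, V/decade
print(f"slope = {swing * 1e3:.0f} mV/decade")
print(f"low/high Vt leakage ratio = {low_vt / high_vt:.1f}")
print(f"reverse body bias cuts leakage by {low_vt / reverse_bias:.1f}x")
```

With these assumed numbers, raising $V_t$ by 100 mV divides leakage by $e^{0.1/(n v_T)} \approx 13$, and reverse body bias helps through the γ term.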

## Ratio Example

- The chip contains a 32 word x 48 bit ROM
  - Uses 1:32 pseudo-nMOS decoder and bitline pullups
  - On average, one wordline and 24 bitlines are high
- Find static power drawn by the ROM

- $\beta = 75\ \mu A/V^2$, $V_{DD} = 1.8$ V
- $V_{tp} = -0.4$ V

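Solution:

$$I_{\text{pull-up}} = \beta \frac{\left(V_{DD} - \left|V_{tp}\right|\right)^2}{2} = 73\,\mu\text{A}$$
$$P_{\text{pull-up}} = V_{DD}I_{\text{pull-up}} = 130\,\mu\text{W}$$

On average, 31 of the 32 wordlines and 24 of the 48 bitlines are low. Each low line draws static current because its pull-up fights an ON pulldown:

$$P_{\text{static}} = (31 + 24)P_{\text{pull-up}} = 7.2 \text{ mW}$$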
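A quick check of the ROM numbers, as a sketch using only the values given in the example; the variable names are illustrative. It assumes, as the solution does, that the pull-up pMOS (gate grounded) is saturated whenever its line is held low.

```python
# Values from the ratio example above
BETA = 75e-6      # A/V^2, pull-up pMOS
VDD = 1.8         # V
VTP = -0.4        # V

# Pull-up pMOS has its gate grounded; with the output low it is in saturation.
i_pullup = BETA * (VDD - abs(VTP)) ** 2 / 2
p_pullup = VDD * i_pullup

WORDS, BITS = 32, 48
HIGH_WORDLINES, HIGH_BITLINES = 1, 24        # averages given in the example
low_lines = (WORDS - HIGH_WORDLINES) + (BITS - HIGH_BITLINES)   # 31 + 24
p_static = low_lines * p_pullup

print(f"I_pull-up = {i_pullup * 1e6:.1f} uA, P_pull-up = {p_pullup * 1e6:.1f} uW")
print(f"{low_lines} low lines -> P_static = {p_static * 1e3:.2f} mW")
```

Carrying full precision gives about 7.28 mW; the slide's 7.2 mW follows from rounding $P_{\text{pull-up}}$ to 130 µW first.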

## Leakage Example

- The process has two threshold voltages and two oxide thicknesses.
- Subthreshold leakage:
  - 20 nA/μm for low V<sub>t</sub>
  - 0.02 nA/μm for high V<sub>t</sub>
- Gate leakage:
  - 3 nA/μm for thin oxide
  - 0.002 nA/μm for thick oxide
- Memories use low-leakage transistors everywhere
- Logic gates use low-leakage transistors on 80% of the logic

## Leakage Example Cont.

- Estimate static power:
  - High leakage (low V<sub>t</sub>, thin oxide): $(20 \times 10^{6})(0.2)(12\lambda)(0.05\,\mu m/\lambda) = 2.4 \times 10^{6}\,\mu m$
  - Low leakage (high V<sub>t</sub>, thick oxide): $(20 \times 10^{6})(0.8)(12\lambda)(0.05\,\mu m/\lambda) + (180 \times 10^{6})(4\lambda)(0.05\,\mu m/\lambda) = 45.6 \times 10^{6}\,\mu m$
  - Only OFF transistors conduct subthreshold current; roughly half are OFF at any time, hence the factor of 1/2 on the subthreshold terms

$$I_{static} = (2.4 \times 10^{6}\,\mu m)\left[(20\,\text{nA}/\mu m)/2 + (3\,\text{nA}/\mu m)\right] + (45.6 \times 10^{6}\,\mu m)\left[(0.02\,\text{nA}/\mu m)/2 + (0.002\,\text{nA}/\mu m)\right] = 32\,\text{mA}$$
$$P_{static} = I_{static} V_{DD} = 38\,\text{mW}$$

With no low-leakage devices, P<sub>static</sub> = 749 mW (!)
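The leakage arithmetic can be reproduced in a few lines. This sketch uses the example's leakage densities and widths; the halving of the subthreshold terms is the same "about half the devices are OFF" assumption used in the calculation above, and the helper name `static_current` is hypothetical.

```python
# Values from the leakage example (100 nm, 1.2 V process)
LAMBDA_UM = 0.05
VDD = 1.2
SUB_LOW_VT, SUB_HIGH_VT = 20e-9, 0.02e-9     # subthreshold leakage, A/um
GATE_THIN, GATE_THICK = 3e-9, 0.002e-9       # gate leakage, A/um

logic_width = 20e6 * 12 * LAMBDA_UM          # um of logic transistor width
memory_width = 180e6 * 4 * LAMBDA_UM         # um of memory transistor width

def static_current(high_leak_um, low_leak_um):
    """Leakage current (A). Subthreshold terms are halved on the assumption,
    as in the example, that roughly half of the devices are OFF."""
    high = high_leak_um * (SUB_LOW_VT / 2 + GATE_THIN)     # low Vt, thin oxide
    low = low_leak_um * (SUB_HIGH_VT / 2 + GATE_THICK)     # high Vt, thick oxide
    return high + low

high_w = 0.2 * logic_width                   # 2.4e6 um
low_w = 0.8 * logic_width + memory_width     # 45.6e6 um
p_static = static_current(high_w, low_w) * VDD
p_no_low_leak = static_current(logic_width + memory_width, 0.0) * VDD

print(f"I_static = {static_current(high_w, low_w) * 1e3:.1f} mA, P_static = {p_static * 1e3:.1f} mW")
print(f"Without low-leakage devices: {p_no_low_leak * 1e3:.0f} mW")
```

Nearly all of the 32 mA comes from the 2.4 × 10⁶ µm of high-leakage logic, which is why assigning low-leakage devices to most of the chip pays off so strongly.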

## Low Power Design

- Reduce dynamic power
  - $\alpha$: clock gating, sleep mode
  - C: small transistors (esp. on clock), short wires
  - $V_{DD}$: lowest suitable voltage
  - f: lowest suitable frequency
- Reduce static power
  - Selectively use ratioed circuits
  - Selectively use low V<sub>t</sub> devices
  - Leakage reduction: stacked devices, body bias, low temperature

## **Reduce Static Power**

- Leakage stack effect
- MTCMOS: Multiple Threshold CMOS
- Body bias
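The stack effect can be illustrated with the subthreshold and threshold equations from the Static Power slide. Two series OFF nMOS settle at a small positive intermediate voltage $V_x$: the top device then sees a negative $V_{gs}$, a body-effect increase in $V_t$ and less DIBL, while the bottom device sees only a small $V_{ds}$. This sketch solves for $V_x$ by bisection; all device parameters are hypothetical illustration values, not data for any real process.

```python
import math

V_T = 0.026   # thermal voltage near room temperature (V)
# Hypothetical nMOS parameters, for illustration only
N, VT0, ETA, GAMMA, PHI_S, VDD = 1.5, 0.3, 0.1, 0.2, 0.8, 1.0

def i_sub(vgs, vds, vsb):
    """Normalized subthreshold current (Ids0 = 1) including DIBL and body effect."""
    vt = VT0 - ETA * vds + GAMMA * (math.sqrt(PHI_S + vsb) - math.sqrt(PHI_S))
    return math.exp((vgs - vt) / (N * V_T)) * (1 - math.exp(-vds / V_T))

def stacked_leakage():
    """Two OFF nMOS in series (both gates at 0 V). Bisect for the intermediate
    node voltage Vx at which the top and bottom currents are equal."""
    def excess(vx):
        top = i_sub(vgs=-vx, vds=VDD - vx, vsb=vx)    # source sits at Vx
        bottom = i_sub(vgs=0.0, vds=vx, vsb=0.0)
        return top - bottom                           # > 0 at Vx = 0, < 0 at Vx = VDD
    lo, hi = 0.0, VDD
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    vx = 0.5 * (lo + hi)
    return vx, i_sub(vgs=0.0, vds=vx, vsb=0.0)

single = i_sub(vgs=0.0, vds=VDD, vsb=0.0)
vx, stacked = stacked_leakage()
print(f"Vx = {vx * 1e3:.0f} mV, stack leaks {single / stacked:.1f}x less than one OFF device")
```

Bisection is safe here because the top current falls and the bottom current rises monotonically as $V_x$ increases.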






## Outline

- 1. Delay Estimation
- 2. Logical Effort and Transistor Sizing
- 3. Power Dissipation

- **4. Interconnect**

- 5. Wire Engineering
- 6. Design Margin
- 7. Reliability
- 8. Scaling

### Interconnect

- Chips are mostly made of wires called interconnect
  - In stick diagrams, wires set the size
  - Transistors are little things under the wires
  - Many layers of wires
- Wires are as important as transistors
  - Speed
  - Power
  - Noise
- Alternating layers run orthogonally

### Wire Geometry

- Pitch = w + s
- Aspect ratio: AR = t/w
  - Old processes had AR << 1
  - Modern processes have AR  $\approx 2$ 
    - Pack in many skinny wires



## Layer Stack

- AMI 0.6 µm process has 3 metal layers
- Modern processes use 6-10+ metal layers
- Example: Intel 180 nm process
  - M1: thin, narrow (< 3λ)
    - High density cells
  - M2-M4: thicker
    - For longer wires
  - M5-M6: thickest
    - For V<sub>DD</sub>, GND, clk

| Layer | T (nm) | W (nm) | S (nm) | AR  | H (nm), dielectric below |
|-------|--------|--------|--------|-----|--------------------------|
| 6     | 1720   | 860    | 860    | 2.0 | 1000                     |
| 5     | 1600   | 800    | 800    | 2.0 | 1000                     |
| 4     | 1080   | 540    | 540    | 2.0 | 700                      |
| 3     | 700    | 320    | 320    | 2.2 | 700                      |
| 2     | 700    | 320    | 320    | 2.2 | 700                      |
| 1     | 480    | 250    | 250    | 1.9 | 800                      |

Layer 1 sits directly above the substrate.

## Wire Resistance

- $\rho$ = resistivity (Ω·m)

$$R = \frac{\rho}{t} \frac{l}{w} = R_{\Box} \frac{l}{w}$$

- $R_{\Box}$ = sheet resistance (Ω/□)
  - □ is a dimensionless unit(!)
- Count number of squares
  - $R = R_{\Box} \times (\text{number of squares})$, where the number of squares is $l/w$



## Choice of Metals

- Until 180 nm, most wires were aluminum
- Modern processes often use copper
  - Cu atoms diffuse into silicon and damage FETs
  - Must be surrounded by a diffusion barrier

| Metal           | Bulk resistivity (μΩ·cm) |
|-----------------|--------------------------|
| Silver (Ag)     | 1.6                      |
| Copper (Cu)     | 1.7                      |
| Gold (Au)       | 2.2                      |
| Aluminum (Al)   | 2.8                      |
| Tungsten (W)    | 5.3                      |
| Molybdenum (Mo) | 5.3                      |

### Sheet Resistance

- Typical sheet resistances in 180 nm process

| Layer                     | Sheet Resistance ( $\Omega/\Box$ ) |
|---------------------------|------------------------------------|
| Diffusion (silicided)     | 3-10                               |
| Diffusion (no silicide)   | 50-200                             |
| Polysilicon (silicided)   | 3-10                               |
| Polysilicon (no silicide) | 50-400                             |
| Metal1                    | 0.08                               |
| Metal2                    | 0.05                               |
| Metal3                    | 0.05                               |
| Metal4                    | 0.03                               |
| Metal5                    | 0.02                               |
| Metal6                    | 0.02                               |

#### **Contact Resistance**

- Contacts and vias also have 2-20 Ω of resistance each
- Use many contacts for lower R
  - Prefer many small contacts, because current crowds around the periphery of each contact



## Wire Capacitance

- Wire has capacitance per unit length
  - To neighbors
  - To layers above and below



## Capacitance Trends

- Parallel plate equation:  $C = \varepsilon A/d$ 
  - Wires are not parallel plates, but obey trends
  - Increasing area (w, t) increases capacitance
  - Increasing distance (s, h) decreases capacitance
- Dielectric constant
  - $\varepsilon = k\varepsilon_0$
  - $\varepsilon_0 = 8.85 \times 10^{-14}$ F/cm
  - $k = 3.9$ for SiO<sub>2</sub>
- Processes are starting to use low-k dielectrics
  - $k \approx 3$, or less when the dielectric incorporates air pockets
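To see how the parallel-plate trends play out, this sketch sums plate capacitances to the layers above and below and to two neighbors, ignoring fringing fields. It assumes Intel 180 nm Metal2 dimensions from the layer-stack values (w = s = 320 nm, t = 700 nm, 700 nm of dielectric above and below); the helper names are hypothetical.

```python
EPS0 = 8.85e-14 / 1e4   # F/um (8.85e-14 F/cm, 1e4 um per cm)

def plate_cap(k, facing_um, gap_um):
    """Parallel-plate capacitance per um of wire length: k*eps0*(facing width)/gap."""
    return k * EPS0 * facing_um / gap_um

def wire_cap_per_um(k, w, t, s, h_below, h_above):
    """Return (C_gnd, C_adj per neighbor, total with two neighbors) in F/um."""
    c_gnd = plate_cap(k, w, h_below) + plate_cap(k, w, h_above)   # bottom and top plates
    c_adj = plate_cap(k, t, s)                                    # sidewall to one neighbor
    return c_gnd, c_adj, c_gnd + 2 * c_adj

# Metal2, Intel 180 nm (um): w = s = 0.32, t = 0.7, 0.7 um dielectric above and below
m2 = dict(w=0.32, t=0.7, s=0.32, h_below=0.7, h_above=0.7)
for k in (3.9, 3.0):
    c_gnd, c_adj, total = wire_cap_per_um(k, **m2)
    print(f"k = {k}: C_gnd = {c_gnd * 1e15:.3f}, C_adj = {c_adj * 1e15:.3f}, "
          f"total = {total * 1e15:.3f} fF/um")
```

For SiO<sub>2</sub> this gives about 0.18 fF/µm, near the ~0.2 fF/µm typical wire value in these notes; fringing fields, which the model ignores, account for the rest. Because the wires have AR ≈ 2, most of the total is sidewall coupling to neighbors, and the low-k case scales every term by 3/3.9.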

## M2 Capacitance Data

- Typical wires have  $\sim 0.2 \text{ fF}/\mu m$ 
  - Compare to 2 fF/ $\mu$ m for gate capacitance




## Diffusion & Polysilicon

- Diffusion capacitance is very high (about 2 fF/ $\mu$ m)
  - Comparable to gate capacitance
  - Diffusion also has high resistance
  - Avoid using diffusion *runners* for wires!
- Polysilicon has lower C but high R
  - Use for transistor gates
  - Occasionally for very short wires between gates

## Lumped Element Models


- Wires are a distributed system
  - Approximate with lumped element models



- 3-segment π-model is accurate to 3% in simulation
- L-model needs 100 segments for same accuracy!
- Use single segment  $\pi$ -model for Elmore delay


## Example

- Metal2 wire in 180 nm process
  - 5 mm long
  - 0.32 μm wide
- Construct a 3-segment π-model
  - $R_{\Box} = 0.05\ \Omega/\Box \Rightarrow R = (0.05\ \Omega/\Box)(5000\ \mu m / 0.32\ \mu m) = 781\ \Omega$
  - $C_{\text{permicron}} = 0.2\ \text{fF}/\mu m \Rightarrow C = (0.2\ \text{fF}/\mu m)(5000\ \mu m) = 1\ \text{pF}$
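A sketch of the 3-segment π-model construction, assuming each segment is C/(2N), R/N, C/(2N) so that interior nodes merge two half-capacitors. The function names are hypothetical. The Elmore delay to the open far end comes out as RC/2 for any number of segments, which is consistent with using a single-segment π-model for Elmore delay.

```python
R_SHEET = 0.05        # ohm/square, Metal2
C_PER_UM = 0.2e-15    # F/um
LENGTH_UM, WIDTH_UM = 5000.0, 0.32

r_wire = R_SHEET * LENGTH_UM / WIDTH_UM   # 15,625 squares
c_wire = C_PER_UM * LENGTH_UM

def pi_ladder(r_total, c_total, segments):
    """N-segment pi-model: each segment is C/(2N) - R/N - C/(2N).
    Returns (series resistances, node capacitances); inner nodes merge two halves."""
    r_seg = r_total / segments
    c_half = c_total / (2 * segments)
    caps = [c_half] + [2 * c_half] * (segments - 1) + [c_half]
    return [r_seg] * segments, caps

def elmore_far_end(resistors, caps):
    """Elmore delay to the last node of an RC ladder driven by an ideal source."""
    delay, r_path = 0.0, 0.0
    for r, c in zip([0.0] + resistors, caps):
        r_path += r          # resistance from the source to this node
        delay += r_path * c
    return delay

rs, cs = pi_ladder(r_wire, c_wire, 3)
print(f"R = {r_wire:.0f} ohm, C = {c_wire * 1e12:.1f} pF")
print("segment R (ohm):", [round(r) for r in rs])
print("node C (fF):", [round(c * 1e15) for c in cs])
```

For this wire the ladder is three 260 Ω segments with 167, 333, 333 and 167 fF at the nodes.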



# Wire RC Delay

- Estimate the delay of a 10x inverter driving a 2x inverter at the end of the 5mm wire from the previous example.
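  - Effective R = 2.5 kΩ·μm for gates, C = 2 fF/μm
  - Unit inverter: $4\lambda$ = 0.36 μm nMOS, $8\lambda$ = 0.72 μm pMOS
    - $R(10x) = 2.5\ \text{k}\Omega \cdot \mu m / (0.36\ \mu m \times 10) = 690\ \Omega$, $C(2x) = (0.36 + 0.72)(2)\ \mu m \times 2\ \text{fF}/\mu m \approx 4\ \text{fF}$
    - Using a single-segment π-model (500 fF of wire capacitance at each end): $t_{pd} = (690\ \Omega)(500\ \text{fF}) + (690\ \Omega + 781\ \Omega)(500\ \text{fF} + 4\ \text{fF}) = 1.1\ \text{ns}$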
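The same Elmore sum as a short script, useful for trying other driver sizes or wire lengths. This is a sketch with the example's values; like the slide, it ignores the driver's own diffusion capacitance.

```python
R_GATE = 2500.0               # ohm*um, effective resistance x width
C_GATE = 2e-15                # F/um of gate width
UNIT_N, UNIT_P = 0.36, 0.72   # um, unit inverter widths (4 lambda, 8 lambda)

r_drv = R_GATE / (10 * UNIT_N)               # 10x driver; pMOS sized for equal R
c_load = 2 * (UNIT_N + UNIT_P) * C_GATE      # input cap of the 2x receiver
r_wire, c_wire = 781.25, 1e-12               # 5 mm Metal2 wire from the previous example

# Single-segment pi-model: C/2 at the driver, then R_wire, then C/2 + load.
# Elmore: each capacitor times the total resistance between it and the source.
t_pd = r_drv * (c_wire / 2) + (r_drv + r_wire) * (c_wire / 2 + c_load)
print(f"R(10x) = {r_drv:.0f} ohm, C(2x) = {c_load * 1e15:.2f} fF, t_pd = {t_pd * 1e9:.2f} ns")
```

The 4 fF receiver is negligible next to the 1 pF wire; the delay is set almost entirely by the driver and wire RC.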



## Crosstalk

- A capacitor does not like to change its voltage instantaneously.
- A wire has high capacitance to its neighbor.
  - When the neighbor switches from 1→0 or 0→1, the wire tends to switch too.
  - Called capacitive *coupling* or *crosstalk*.
- Crosstalk effects
  - Noise on nonswitching wires
  - Increased delay on switching wires

## Crosstalk Delay

- Assume layers above and below on average are quiet
  - Second terminal of capacitor can be ignored
  - Model as  $C_{gnd} = C_{top} + C_{bot}$
- Effective C<sub>adj</sub> depends on behavior of neighbors
  - Miller Coupling Factor (MCF)

| B (neighbor)         | ΔV across C<sub>adj</sub> | C<sub>eff(A)</sub>      | MCF |
|----------------------|---------------------------|-------------------------|-----|
| Constant             | V<sub>DD</sub>            | $C_{gnd} + C_{adj}$     | 1   |
| Switching with A     | 0                         | $C_{gnd}$               | 0   |
| Switching opposite A | $2V_{DD}$                 | $C_{gnd} + 2C_{adj}$    | 2   |
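The table can be applied to each neighbor in turn. This sketch assumes a wire A with two neighbors that each couple through the same $C_{adj}$, and treats each coupling capacitor independently with its own MCF; the names are illustrative.

```python
# MCF per neighbor behavior, from the table above
MCF = {"constant": 1, "same": 0, "opposite": 2}

def effective_cap(c_gnd, c_adj, neighbors):
    """C_eff seen by wire A: C_gnd plus MCF * C_adj for each adjacent wire."""
    return c_gnd + sum(MCF[b] * c_adj for b in neighbors)

c_gnd = c_adj = 1.0   # normalized, C_adj = C_gnd
best = effective_cap(c_gnd, c_adj, ["same", "same"])
worst = effective_cap(c_gnd, c_adj, ["opposite", "opposite"])
print(best, worst, worst / best)
```

With $C_{adj} = C_{gnd}$ on both sides, the effective load (and hence the RC delay) varies by 5× between the best and worst neighbor activity.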



## **Crosstalk Noise**

- Crosstalk causes noise on nonswitching wires
- If victim is floating:
  - model as capacitive voltage divider
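For a floating victim, the capacitive divider gives $\Delta V_{victim} = \frac{C_{adj}}{C_{gnd-v} + C_{adj}}\,\Delta V_{aggressor}$. A one-function sketch; the function name and the 1.8 V swing are illustrative.

```python
def floating_victim_noise(c_adj, c_gnd_victim, dv_aggressor):
    """Capacitive divider: a floating victim sees C_adj/(C_gnd + C_adj) of the aggressor swing."""
    return c_adj / (c_gnd_victim + c_adj) * dv_aggressor

# C_adj = C_gnd: half of a 1.8 V aggressor swing couples onto the victim
print(floating_victim_noise(1.0, 1.0, 1.8))
```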



## **Driven Victims**

- Usually victim is driven by a gate that fights noise
  - Noise depends on relative resistances
  - Victim driver is in linear region, and aggressor driver is in saturation. (p3-53)
  - If sizes are same,  $R_{aggressor} = 2-4 \times R_{victim}$
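A common first-order model for a driven victim (assumed here, not derived in these notes) scales the floating-victim divider by $1/(1+k)$, where $k = \tau_{aggressor}/\tau_{victim} = \frac{R_{aggressor}(C_{gnd-a} + C_{adj})}{R_{victim}(C_{gnd-v} + C_{adj})}$. This sketch evaluates it for $C_{adj} = C_{gnd}$ and the 2-4× resistance ratio above; the function name is hypothetical.

```python
def driven_victim_noise(c_adj, c_gnd_v, c_gnd_a, r_victim, r_aggressor, dv_aggressor):
    """Peak victim noise: floating-victim divider scaled by 1/(1 + k),
    with k = tau_aggressor / tau_victim (assumed first-order model)."""
    k = (r_aggressor * (c_gnd_a + c_adj)) / (r_victim * (c_gnd_v + c_adj))
    return c_adj / (c_gnd_v + c_adj) / (1 + k) * dv_aggressor

for ratio in (2, 3, 4):   # R_aggressor / R_victim for equal-size drivers
    dv = driven_victim_noise(1.0, 1.0, 1.0, 1.0, float(ratio), 1.8)
    print(ratio, round(dv, 3))
```

Under this model, the noise falls from 0.3 V to 0.18 V as the resistance ratio goes from 2 to 4, that is, to between 1/3 and 1/5 of the floating-victim value.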



## **Coupling Waveforms**

- Simulated coupling for C<sub>adj</sub> = C<sub>gnd</sub>




## Noise Implications

- So what if we have noise?
- If the noise is less than the noise margin, nothing happens
- Static CMOS logic will eventually settle to correct output even if disturbed by large noise spikes
  - But glitches cause extra delay
  - Also cause extra power from false transitions
- Dynamic logic never recovers from glitches
- Memories and other sensitive circuits also can produce the wrong answer


## Outline

- 1. Delay Estimation
- 2. Logical Effort and Transistor Sizing
- 3. Power Dissipation
- 4. Interconnect
- 5. Wire Engineering
- 6. Design Margin
- 7. Reliability
- 8. Scaling

## Wire Engineering

- Goal: achieve delay, area, power goals with acceptable noise
- Degrees of freedom:


## Repeaters

- R and C are proportional to *l* (length)
- RC delay is proportional to *l*<sup>2</sup>
  - Unacceptably great for long wires
- Break long wires into N shorter segments
  - Drive each one with an inverter or buffer
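The quadratic growth, and why splitting helps, can be seen from the distributed-wire Elmore delay $\frac{1}{2}R_wC_w l^2$. This sketch uses the Metal2 values from the earlier example and deliberately ignores repeater delay, which a real design must add back; the helper names are hypothetical.

```python
R_W = 0.05 / 0.32    # ohm/um, Metal2 (0.05 ohm/sq, 0.32 um wide)
C_W = 0.2e-15        # F/um

def wire_delay(length_um):
    """Elmore delay of a distributed RC wire, 0.5 * Rw * Cw * l^2 (no driver or load)."""
    return 0.5 * R_W * C_W * length_um ** 2

def segmented_wire_delay(length_um, n):
    """Wire-only delay when split into n equal segments, each driven by an
    ideal repeater (repeater delay is ignored in this sketch)."""
    return n * wire_delay(length_um / n)

for length in (5_000, 10_000):
    print(f"{length / 1000:.0f} mm: unrepeated {wire_delay(length) * 1e12:.0f} ps, "
          f"4 segments {segmented_wire_delay(length, 4) * 1e12:.0f} ps")
```

Doubling the length quadruples the unrepeated delay, while N segments cut the wire term by N.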




## Repeater Design

- How many repeaters should we use?
- How large should each one be?
- Equivalent Circuit
  - Wire length *l*
    - Wire capacitance C<sub>w</sub>·*l*, resistance R<sub>w</sub>·*l*
  - Inverter width W (nMOS = W, pMOS = 2W)
    - Gate capacitance C'·W, resistance R/W

## Repeater Design

- How many repeaters should we use?
- How large should each one be?
- Equivalent Circuit
  - Wire length *l*/N
    - Wire capacitance C<sub>w</sub>·*l*/N, resistance R<sub>w</sub>·*l*/N
  - Inverter width W (nMOS = W, pMOS = 2W)
    - Gate capacitance C'·W, resistance R/W



#### **Repeater Results**

- Write the equation for Elmore delay

$$t_{pd} = N \left[ \frac{R}{W} \left( C_w \frac{l}{N} + C'W \right) + R_w \frac{l}{N} \left( \frac{C_w}{2} \frac{l}{N} + C'W \right) \right]$$

- Differentiate with respect to W and N
- Set equal to 0, solve

$$\frac{l}{N} = \sqrt{\frac{2RC'}{R_w C_w}}$$
$$\frac{t_{pd}}{l} = \left(2 + \sqrt{2}\right)\sqrt{RC'R_w C_w}$$
$$W = \sqrt{\frac{RC_w}{R_w C'}}$$

- Typical result: ~60-80 ps/mm in 180 nm process
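Plugging numbers into the closed-form results, as a sketch under stated assumptions: R = 2.5 kΩ·µm and 2 fF/µm gate capacitance from the wire RC delay example, C' = 6 fF/µm because each repeater of width W has nMOS W plus pMOS 2W of gate, and the Metal2 wire from earlier. Diffusion capacitance is ignored. The script also evaluates the Elmore equation at the optimum to confirm it matches the closed form.

```python
import math

# Assumed 180 nm values taken from the earlier examples in these notes
R = 2500.0          # ohm*um: effective resistance x nMOS width
C_PRIME = 6e-15     # F/um of nMOS width: (W + 2W) of gate at 2 fF/um
R_W = 0.05 / 0.32   # ohm/um: Metal2, 0.05 ohm/sq, 0.32 um wide
C_W = 0.2e-15       # F/um

def t_pd(length, n, w):
    """Elmore delay (s) of n repeated segments, per the equation above."""
    seg = length / n
    return n * (R / w * (C_W * seg + C_PRIME * w)
                + R_W * seg * (C_W / 2 * seg + C_PRIME * w))

seg_opt = math.sqrt(2 * R * C_PRIME / (R_W * C_W))     # optimal l/N, um
w_opt = math.sqrt(R * C_W / (R_W * C_PRIME))            # optimal repeater width, um
delay_per_um = (2 + math.sqrt(2)) * math.sqrt(R * C_PRIME * R_W * C_W)   # s/um

print(f"segment = {seg_opt:.0f} um, W = {w_opt:.1f} um, "
      f"delay = {delay_per_um * 1e15:.0f} ps/mm")   # s/um * 1e15 = ps/mm
```

With these assumptions the optimum is roughly 1 mm between repeaters, W ≈ 23 µm, and about 74 ps/mm, inside the quoted 60-80 ps/mm range.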