## Design Tools for Reliability Analysis<sup>1</sup>

Aamir Ahmed Khan, Institute for Electronic Design Automation, Technische Universität München

Presented at Moscow-Bavarian Joint Advanced Student School 2009, Zelenograd

March 4, 2009

<sup>1</sup>Adapted mainly from [1]

# Contents

- 1 Introduction
  - 1.1 Definition of Reliability
  - 1.2 Reliability Paradigm
    - 1.2.1 Infant Mortality
    - 1.2.2 Normal Operating Life
    - 1.2.3 Wear Out
- 2 Physical Phenomena Affecting Reliability
  - 2.1 Hot Carrier Injection
    - 2.1.1 HCI Degradation Model
  - 2.2 Negative Bias Temperature Instability
    - 2.2.1 NBTI Degradation Model
  - 2.3 Electromigration
    - 2.3.1 EM Degradation Model
- 3 Reliability Simulation
  - 3.1 Reliability Simulation Flow
  - 3.2 Reliability Model Extraction
  - 3.3 Some Reliability Simulators
    - 3.3.1 BERT
    - 3.3.2 GLACIER
- 4 Conclusion

# List of Figures

- 1.1 Reliability bathtub curve
- 2.1 Transistor undergoing HCI
- 2.2 Reliability effects in CMOS inverter at various operating points
- 2.3 $I_D$ degradation over time due to HCI
- 2.4 Transistor undergoing NBTI
- 2.5 $I_D$ degradation over time due to NBTI
- 2.6 Voids in an IC wire due to electromigration
- 3.1 Typical reliability simulation flow
- 3.2 BERT simulation flow

# Introduction

Reliability has recently become one of the foremost concerns in semiconductor design. Aggressive miniaturization puts designers under fierce pressure to deliver ever more performance from shrinking design margins, while secondary effects and integration density play a growing role. In the past, designers secured reliability with generous safety margins, sized for the anticipated deterioration of the device over its intended lifetime. This has changed: design margins are now so small that a designer can no longer afford to trade them off recklessly. Accurate understanding and modeling of reliability effects is becoming increasingly necessary for the viability of the semiconductor industry [1].

This paper explores reliability from a designer's perspective. It explains how failures occur over the lifetime of a semiconductor device and how they relate to the age of the device. Some important aging effects, namely HCI (Hot Carrier Injection), NBTI (Negative Bias Temperature Instability) and EM (Electromigration), are then explained at the physical level. Steps for estimating these phenomena and modeling them in device models are also described. Finally, the reliability simulation setup is discussed, along with some modeling and simulation tools available to the designer.

### 1.1 Definition of Reliability

Reliability is defined as the ability of a system to maintain its desired functionality under all circumstances during its lifetime. A system is regarded as more reliable if its operation is maintained during unexpected and hostile circumstances as well. Hostile circumstances can be anything from environmental disturbances to failure of a component within the system.

In the context of semiconductors, reliability is mostly concerned with the aging of semiconductor materials and its adverse effects on the normal operation of the devices in an integrated circuit.

Figure 1.1: Reliability bathtub curve [3]

### 1.2 Reliability Paradigm

To better understand reliability, we need to look closely at the failure rate versus lifetime profile of a system. Fig. 1.1 shows a typical reliability bathtub curve, which divides the lifetime into three distinct phases [2].

#### 1.2.1 Infant Mortality

In this phase, devices manufactured with parameter values at the extremes of the tolerance region tend to fail very early in their operation. One way to reduce the number of such devices is to improve the manufacturing process, but this is well beyond the scope of reliability from a designer's point of view. What can be done is to identify such poorly manufactured devices and discard them. This is done by the so-called *burn-in testing* process: all manufactured devices are subjected to elevated operating conditions for a short time to induce accelerated stress. Devices close to the periphery of the tolerance region then fail and are discarded. This reduces yield, but it saves the manufacturer from the embarrassment of field failures during the promised operating life.

#### 1.2.2 Normal Operating Life

This is the phase during which the manufacturer claims that the system works according to its specifications. During this time, devices incur stress and do degrade, but still work within their performance specifications. The failure rate is low and fairly constant, and is mainly due to corner-case operations not taken care of during design, soft errors due to radiation, and operation beyond the allowed operating conditions. The last issue is rarely a matter of concern, since the operating conditions are stipulated and it is the responsibility of the user to observe them. Failures in this phase can be minimized by stringent design and verification methods, by providing immunity to soft errors, and by employing some safety margins to extend the operating conditions.

#### 1.2.3 Wear Out

This is the final phase of a device's lifetime. Once it is reached, the device has aged and deteriorated so much that the manufacturer can no longer guarantee its successful operation. As time passes during this phase, the failure rate increases as more and more devices fail due to aging deterioration. This is a natural phenomenon in all semiconductor materials under stress, and the designer has no control over it except to employ safety margins in the design, which extend the normal operating life of the device.

As will be explained later, the purpose of designing for reliability is to salvage every last second of the normal operating life of the device. This is achieved by better understanding the aging effects responsible for device deterioration and modeling them to obtain an accurate estimate of their impact, thus avoiding overestimated safety margins.

# Physical Phenomena Affecting Reliability

Semiconductor devices are subject to several reliability effects, originating from material fatigue and quantum phenomena. Accurately modeling these phenomena is of paramount importance to any reliability simulator, so a deeper understanding of the underlying physical processes is necessary. Most reliability effects concern the degradation of device performance due to aging deterioration, the amount of which depends not only on operating conditions (stress) but also on signal patterns. Under specific operating conditions, some effects are pronounced while others are negligible, and vice versa [1], [2]. The following list summarizes the major reliability effects.

- Transistor degradation
  - Hot Carrier Injection (HCI)
  - Negative Bias Temperature Instability (NBTI)
- Transistor abrupt failure
  - Field oxide breakdown
- Interconnect degradation
  - Electromigration (EM)
  - Self heating

In this paper, three important effects (HCI, NBTI and EM) are discussed.

### 2.1 Hot Carrier Injection

HCI is the most common and important type of aging effect, and it degrades both nMOS and pMOS transistors [1]. High-energy carriers moving in the MOSFET channel (in the form of drain current) can create new electron-hole pairs (EHPs) through impact ionization with the atoms in the channel. The newly generated minority carriers are attracted towards the gate electrode and thus become trapped inside the gate oxide layer. The trapping of these foreign carriers creates interface states and alters the threshold voltage of the transistor, which in turn produces several other effects.

Figure 2.1: Transistor undergoing HCI [1]

Consider, for example, the case of an nMOS transistor as shown in Fig. 2.1. Under a very high lateral electric field in the channel (which implies a large $V_{DS}$), the moving electrons that constitute the drain current gain enough kinetic energy to knock EHPs out of the channel atoms near the drain end. The holes of these EHPs are attracted towards the substrate and constitute the leakage current, while the electrons are attracted towards the gate oxide and become trapped there, creating interface states. This increases the threshold voltage of the nMOS transistor, which in turn decreases the drain current for a given gate voltage. As a result, the transistor becomes slower in charging or discharging its load, which causes serious timing problems in digital circuits. For analog circuits, HCI effects manifest as a degradation in the transconductance of the transistors [1], [2].

#### 2.1.1 HCI Degradation Model

From the description above, it is evident that a transistor only incurs HCI while it is conducting drain current (for CMOS gates, this means transition current). HCI therefore occurs only when a CMOS gate switches its state, as indicated in Fig. 2.2. Also, the more current a transistor conducts, the more EHPs it generates, enhancing HCI. Furthermore, HCI is severe for short-channel transistors due to their high lateral electric field, so HCI will play an increasing role in device degradation in future technologies. Another important observation is that HCI is temperature dependent and increases at low temperatures because of the increase in the mean free path of carriers, which allows them to gain more kinetic energy [1].

Figure 2.2: Reliability effects in CMOS inverter at various operating points. HCI at A and B. NBTI at C. [1]

For modeling HCI, drain current $(I_D)$ and substrate current $(I_B)$ are good monitors. An important fact to remember is that all reliability effects are cumulative over the time for which the device is under stress. Hence a unified parameter called AGE, dependent upon stress time, is defined as [4],

$$AGE(\tau) = \int_0^\tau \frac{I_B}{H \cdot W} \left(\frac{I_B}{I_D}\right)^m dt \tag{2.1}$$

where $\tau$ is the time for which the device is under stress, W is the transistor width, and H and m are fitting constants. With this AGE parameter, degradation in $I_D$ is modeled as,

$$I_{Dsat}(\tau) = I_{Dsat}(0) - (AGE(\tau))^n \tag{2.2}$$

where n is another fitting parameter determined experimentally. This HCI degradation model also requires accurate modeling of  $I_B$ , decoupled from other leakage components.

$$I_B = \frac{A_i}{B_i} (V_{DS} - V_{DSsat}) \cdot I_D \cdot \exp\left(-\frac{B_i \cdot L}{V_{DS} - V_{DSsat}}\right) \tag{2.3}$$

where  $A_i$ ,  $B_i$  are again some fitting parameters and L is the transistor length. Fig. 2.3 shows the result of degradation in  $I_D$  due to HCI effects.
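As a rough illustration of how Eqs. (2.1) to (2.3) fit together, the following Python sketch evaluates them on sampled waveforms. The function names, the trapezoidal integration and the choice to return zero substrate current when $V_{DS} \le V_{DSsat}$ are assumptions made for this example and are not taken from [4]; the fitting constants must come from measurement.

```python
import math


def substrate_current(i_d, v_ds, v_dsat, length, a_i, b_i):
    """Substrate current I_B from Eq. (2.3)."""
    v_excess = v_ds - v_dsat
    if v_excess <= 0.0:
        # Assumption: no impact ionization is modeled outside saturation.
        return 0.0
    return (a_i / b_i) * v_excess * i_d * math.exp(-b_i * length / v_excess)


def hci_age(times, i_d_wave, i_b_wave, width, h, m):
    """AGE from Eq. (2.1), integrated with the trapezoidal rule."""

    def integrand(i_d, i_b):
        if i_d <= 0.0:
            return 0.0  # a transistor that conducts no current incurs no HCI
        return (i_b / (h * width)) * (i_b / i_d) ** m

    age = 0.0
    for k in range(1, len(times)):
        f_prev = integrand(i_d_wave[k - 1], i_b_wave[k - 1])
        f_curr = integrand(i_d_wave[k], i_b_wave[k])
        age += 0.5 * (f_prev + f_curr) * (times[k] - times[k - 1])
    return age


def degraded_idsat(idsat_fresh, age, n):
    """Aged saturation current from Eq. (2.2)."""
    return idsat_fresh - age ** n
```

In a flow of this kind, the waveforms would typically come from a circuit simulation with fresh device models, so that AGE only accumulates during the switching intervals in which the transistor actually conducts.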



Figure 2.3:  $I_D$  degradation over time due to HCI [4]

### 2.2 Negative Bias Temperature Instability

Unlike HCI, NBTI is an aging effect of pMOS transistors only [1]. It is a complex electro-chemical phenomenon which occurs at high temperatures under a high vertical electric field in the channel. It is widely believed that NBTI degradation is due to the generation of interface traps, which are unsaturated silicon dangling bonds. One of the most successful models for explaining NBTI is the reaction-diffusion model. This model proposes that interface traps are generated by a hole-induced electro-chemical reaction at the $\mathrm{Si}$-$\mathrm{SiO_2}$ interface. The effect of trapped holes in the oxide is the same as that of HCI, i.e., an increase in threshold voltage and a decrease in drain current [5].

#### 2.2.1 NBTI Degradation Model

From the description above, it is evident that a pMOS transistor only incurs NBTI while it is turned on. The transistor need not be conducting current, so NBTI also occurs while the input of a CMOS gate is held low, in contrast to HCI. Furthermore, NBTI is severe for thin-oxide transistors due to their high vertical electric field, so NBTI will also play an increasing role in device degradation in future technologies. Also, since oxide thickness is not a design parameter, NBTI occurs in all pMOS transistors regardless of their lengths. This is again in contrast to HCI, which is not significant for long-channel transistors [1].

Figure 2.4: Transistor undergoing NBTI [1]

For modeling NBTI, $V_{GS}$ is a good monitor. An AGE parameter is similarly defined as [4],

$$AGE(\tau) = \int_0^\tau \sqrt[n]{A \cdot \exp\left(-\frac{\Delta H}{kT}\right) \cdot \exp\left(-\gamma V_{GS}\right)} dt \tag{2.4}$$

where $\tau$ is the time for which the device is under stress, n, A and $\Delta H$ are fitting parameters determined experimentally, $\gamma$ is the body effect parameter, k is Boltzmann's constant and T is the absolute temperature. Degradation in $I_D$ is modeled in a similar way,

$$I_{Dsat}(\tau) = I_{Dsat}(0) - (AGE(\tau))^n \tag{2.5}$$

Fig. 2.5 shows the result of degradation in  $I_D$  due to NBTI effects. One important fact is that NBTI degradation is recoverable under AC stress. Thus a more elaborate model must take this recovery into account to avoid overestimation.
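The NBTI AGE integral of Eq. (2.4) can be evaluated numerically in the same spirit. The sketch below is only illustrative: the function names are hypothetical, $\Delta H$ is assumed to be given in eV so that Boltzmann's constant can be used in eV/K, and the integrand is applied exactly as written at every sample, without any recovery term.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def nbti_age_rate(v_gs, temp_k, a, delta_h_ev, gamma, n):
    """Integrand of Eq. (2.4) for one operating point."""
    thermal = math.exp(-delta_h_ev / (K_BOLTZMANN_EV * temp_k))
    field = math.exp(-gamma * v_gs)  # v_gs is negative for a pMOS that is on
    return (a * thermal * field) ** (1.0 / n)


def nbti_age(times, v_gs_wave, temp_k, a, delta_h_ev, gamma, n):
    """AGE from Eq. (2.4), trapezoidal rule over a sampled V_GS waveform."""
    age = 0.0
    for k in range(1, len(times)):
        r_prev = nbti_age_rate(v_gs_wave[k - 1], temp_k, a, delta_h_ev, gamma, n)
        r_curr = nbti_age_rate(v_gs_wave[k], temp_k, a, delta_h_ev, gamma, n)
        age += 0.5 * (r_prev + r_curr) * (times[k] - times[k - 1])
    return age
```

Because the sketch has no recovery term, it would overestimate degradation under AC stress, which is the overestimation that a more elaborate model is meant to avoid.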

### 2.3 Electromigration

EM refers to the formation of voids in the metal interconnect of an IC. High current densities exert a force on the metal atoms, which slowly migrate over time and permanently form voids in the metal wire, as shown in Fig. 2.6. The problem is more pronounced for aluminum interconnect than for copper. Also, a DC current causes more damage to a wire than an AC current of the same density, because DC exerts a continual force in a constant direction whereas AC does not. To avoid EM issues, safe limits are imposed on the current densities that can be carried through the IC interconnect wires. Due to the nature of the EM phenomenon, there are different limits for AC, DC and peak current densities [2].



Figure 2.5:  $I_D$  degradation over time due to NBTI [4]



Figure 2.6: Voids in an IC wire due to electromigration [6]

#### 2.3.1 EM Degradation Model

Electromigration is modeled by Black's equation, which relates the mean time to failure (MTTF) of a wire due to EM effects to the current density and metal properties [1].

$$MTTF = \frac{A}{J^n} \exp\left(\frac{E_a}{kT}\right) \tag{2.6}$$

where A is a constant dependent upon structural and geometrical properties of the metal, $E_a$ is the activation energy of the metal, J is the current density, n is an empirically determined exponent, k is the Boltzmann constant and T is the absolute temperature.
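To make the dependence on current density and temperature concrete, Black's equation can be evaluated directly. The following is a minimal sketch; the function names are hypothetical and $E_a$ is assumed to be given in eV. Since A is rarely known in absolute terms, the second helper compares two operating points so that A cancels.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mttf(j, temp_k, a, n, e_a_ev):
    """Mean time to failure from Eq. (2.6)."""
    return (a / j ** n) * math.exp(e_a_ev / (K_BOLTZMANN_EV * temp_k))


def mttf_ratio(j_ref, t_ref_k, j_new, t_new_k, n, e_a_ev):
    """MTTF(new) / MTTF(ref); the constant A cancels in the ratio."""
    new = black_mttf(j_new, t_new_k, 1.0, n, e_a_ev)
    ref = black_mttf(j_ref, t_ref_k, 1.0, n, e_a_ev)
    return new / ref
```

For instance, with an exponent of n = 2, doubling the current density at a fixed temperature reduces the predicted MTTF to a quarter, which is why current density limits are the main design handle against EM.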

Black's equation requires the determination of worst-case current density limits. Both static and dynamic techniques exist for determining these limits. Static methods, which are applicable to digital circuits only, determine worst-case current densities not by applying stimuli but by utilizing switching probabilities and the current equation of logic gates [1].

$$I = \frac{C_L \ V_{DD}}{t} \tag{2.7}$$

where $C_L$ is the capacitive load of the gate, $V_{DD}$ is the supply voltage and t is a time interval. For the peak current limit, t is taken as half of the rise or fall time; for the average current limit, it is taken as the average time between two switching events according to the switching probabilities.
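The two uses of Eq. (2.7) can be sketched as follows. The function names are hypothetical, and the average time between switching events is assumed here to be the clock period divided by the per-cycle switching probability, which is one plausible reading of the static method rather than a definition from [1].

```python
def peak_current(c_load, v_dd, t_edge):
    """Eq. (2.7) with t taken as half of the rise or fall time."""
    return c_load * v_dd / (0.5 * t_edge)


def average_current(c_load, v_dd, clock_period, switching_prob):
    """Eq. (2.7) with t taken as the average time between switching events."""
    if switching_prob <= 0.0:
        return 0.0  # a net that never switches carries no average current
    # Assumption: one switching event every clock_period / switching_prob.
    t_between = clock_period / switching_prob
    return c_load * v_dd / t_between
```

Dividing either current by the cross-sectional area of the wire would give the current density to compare against the corresponding peak or average limit.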

Dynamic methods, on the other hand, rely on simulation; hence they are applicable to all circuits and are more accurate. The challenge in these methods is to determine stimuli which trigger worst-case currents in the wires [1].

# Reliability Simulation

The purpose of reliability simulation is to verify whether the device will perform within its specifications during the whole of its lifetime. This is achieved by running several circuit simulations using fresh and aged device models. As mentioned earlier, and emphasized again here, the generation of aged device models is the key to accurate reliability simulation [1].

### 3.1 Reliability Simulation Flow

Fig. 3.1 shows a typical reliability simulation flow. At first, circuit simulation is run using fresh device models as usual. From the results of this simulation, impact of reliability effects is estimated and is used to select degraded models from a library of aged device models. These degraded models are then used for a second pass of simulation, to predict the behaviour of aged circuit [1].
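The two-pass flow described above can be outlined in a few lines of Python. This is a schematic sketch only: `simulate` and `estimate_age` are hypothetical callables standing in for the circuit simulator and the stress estimation step, and picking the first library entry whose age is at least the estimated age is an assumed (conservative) selection rule.

```python
from bisect import bisect_left


def select_aged_model(aged_library, age):
    """Pick the first library entry whose age is >= the estimated age.

    aged_library is a list of (age, model) pairs sorted by age.
    """
    ages = [entry_age for entry_age, _ in aged_library]
    index = min(bisect_left(ages, age), len(aged_library) - 1)
    return aged_library[index][1]


def reliability_simulation(simulate, estimate_age, fresh_model, aged_library):
    """Run the fresh pass, estimate stress, then run the aged pass."""
    fresh_result = simulate(fresh_model)      # first pass, fresh models
    age = estimate_age(fresh_result)          # impact of reliability effects
    aged_model = select_aged_model(aged_library, age)
    aged_result = simulate(aged_model)        # second pass, aged models
    return fresh_result, aged_result
```

Comparing the two returned results against the specification shows whether the circuit still meets it at the end of the assumed lifetime.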

### 3.2 Reliability Model Extraction

Aging deterioration of a device is a very slow process under normal operating conditions, and it may take years for a device to degrade appreciably. To extract a model of the reliability effects, devices are exposed to elevated voltage and temperature conditions to induce accelerated stress. Under accelerated stress, devices undergo a measurable amount of degradation in a short period of time. Reliability model parameters are then extracted and used to generate a library of aged device models at various points in time, as indicated in Fig. 3.1 [1].

An important point to note is that the extracted models come from accelerated stress conditions, whereas degradation in the actual circuit takes place under normal operating conditions. This method of model extraction therefore relies on the fundamental assumption that results obtained under accelerated stress can be extrapolated back to normal operating conditions. Thus an intelligent accelerated stress scheme is necessary, one which quickly degrades the device by an appreciable amount while still allowing extrapolation with reasonable accuracy [1].
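One way to picture this extrapolation, taking the NBTI AGE integrand of Eq. (2.4) as an example, is to compare the aging rate at stress and at nominal conditions. The sketch below is an illustration under strong assumptions: the function names are hypothetical, $\Delta H$ is assumed to be in eV, conditions are held constant, and the same fitting parameters are assumed to remain valid at both operating points, which is precisely the assumption the extraction method rests on.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def nbti_acceleration_factor(v_stress, t_stress_k, v_nominal, t_nominal_k,
                             delta_h_ev, gamma, n):
    """Ratio of the Eq. (2.4) integrand at stress and nominal conditions."""
    exponent = (
        -delta_h_ev / (K_BOLTZMANN_EV * t_stress_k)
        + delta_h_ev / (K_BOLTZMANN_EV * t_nominal_k)
        - gamma * (v_stress - v_nominal)
    )
    return math.exp(exponent / n)  # the constant A cancels in the ratio


def equivalent_nominal_time(stress_time, acceleration_factor):
    """Time at nominal conditions that accumulates the same AGE."""
    return stress_time * acceleration_factor
```

A large acceleration factor shortens the experiment but stretches the extrapolation, which is the trade-off an accelerated stress scheme has to balance.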



Figure 3.1: Typical reliability simulation flow [1]

Another concern in model extraction is that several reliability effects occur simultaneously. It helps that certain effects are dominant in a certain region of operation while others are not, and vice versa. The burn-in method used to induce accelerated stress must use this information and stress the device only in a certain region of operation. In this way, reliability models can be extracted for each effect in turn [1].

### 3.3 Some Reliability Simulators

As mentioned earlier, reliability has become a major concern in semiconductor design, so several tools are now available for analyzing the reliability of circuits. A few well-known tools are briefly discussed in this section.

#### 3.3.1 BERT [7]

Berkeley Reliability Tools or BERT is a set of tools to simulate several reliability effects. It is modular in nature with each module dedicated to simulate one particular effect. The BERT modules are summarized below.

1. **CAS:** Circuit Aging Simulator simulates the hot-carrier (HCI) degradation in MOS transistors.
2. **CORS:** Circuit Oxide Reliability Simulator simulates the time-dependent oxide dielectric breakdown.
3. **EM:** Electromigration Simulator simulates the IC interconnect degradation due to electromigration.
4. **BiCAS:** Bipolar Circuit Aging Simulator simulates the hot-carrier (HCI) degradation in bipolar transistors.

Figure 3.2: BERT simulation flow [7]

Fig. 3.2 shows the simulation flow of BERT. BERT is used in conjunction with a circuit simulator, usually SPICE. A SPICE netlist (also called a deck for historical reasons) containing extra BERT commands, together with the device and reliability models, is given to the BERT preprocessor. From this data, the preprocessor creates some intermediate files needed by the postprocessor and passes the rest of the data to SPICE. Using the node waveforms obtained from the SPICE simulation, the postprocessor then performs the reliability simulation and generates results along with the AGE of each device in the circuit.

#### 3.3.2 GLACIER [8]

GLACIER is a gate-level HCI reliability simulator, so it is much faster than BERT and other transistor-level simulators, at the expense of accuracy. GLACIER models HCI effects in gate models as a degradation of their timing performance. The degraded timing ratio is defined as,

$$\alpha(T_{slew}, C_L, N_{SW}) = \frac{T_{aged}}{T_{fresh}} \tag{3.1}$$

where $T_{fresh}$ and $T_{aged}$ are the gate delays of the fresh and aged gate, $T_{slew}$ is the input slew rate, $C_L$ is the load capacitance and $N_{SW}$ is the number of switching events that occur at the input.

A degraded timing library is generated for all the standard cells. For reliability simulation, the amount of stress is determined by the input switching frequency and then the corresponding degraded standard cell is used in the simulation.
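A degraded timing library of this kind can be pictured as a lookup table of $\alpha$ values. The sketch below is not GLACIER's actual data format: the table layout, the nearest-grid-point lookup and the function names are assumptions chosen to keep the example short.

```python
def switching_events(frequency_hz, lifetime_s):
    """N_SW accumulated at a gate input over the given lifetime."""
    return frequency_hz * lifetime_s


def lookup_alpha(table, t_slew, c_load, n_sw):
    """Return alpha at the grid point nearest to (t_slew, c_load, n_sw).

    table maps (t_slew, c_load, n_sw) grid points to alpha and is assumed
    to cover the full grid.
    """
    def nearest(values, x):
        return min(values, key=lambda v: abs(v - x))

    slews = sorted({key[0] for key in table})
    loads = sorted({key[1] for key in table})
    counts = sorted({key[2] for key in table})
    key = (nearest(slews, t_slew), nearest(loads, c_load), nearest(counts, n_sw))
    return table[key]


def aged_delay(fresh_delay, alpha):
    """Eq. (3.1) rearranged: T_aged = alpha * T_fresh."""
    return alpha * fresh_delay
```

A gate-level timing analysis can then replace each fresh cell delay with its aged counterpart, which is where the speed advantage over transistor-level simulation comes from.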

# Conclusion

In this paper, reliability concerns in semiconductor design have been discussed from a designer's viewpoint. Three important reliability effects (HCI, NBTI and EM) have been covered in detail, and steps to model them have been described. HCI affects both nMOS and pMOS transistors and is active only while transistors are conducting current (for CMOS gates, this means switching activity). NBTI affects only pMOS transistors and is active while the transistor is on. Both of these effects manifest as a reduction in drain current as the device ages, which causes timing violations in digital circuits and transconductance reduction in analog circuits. Electromigration, on the other hand, is an interconnect degradation phenomenon and can be avoided by limiting the current density in the wires to safe limits.

Reliability simulation requires aged device models, which can be extracted while operating the circuit under elevated operating conditions (also called accelerated stress or burn-in). Under accelerated stress, devices age very quickly and incur a measurable amount of degradation, from which aged models can be extracted. These aged models are then extrapolated back to normal operating conditions when running reliability simulations.

Two reliability simulation tools have also been discussed briefly. BERT is a set of tools to simulate several reliability effects and is used in conjunction with a device-level simulator, usually SPICE. GLACIER, on the other hand, is a gate-level reliability simulator which only takes into account the degraded gate delays of the aged circuit.

Increased interest in semiconductor reliability has led to the development of several tools for circuit design and analysis. Reliability-aware design saves the designer from wasting precious design margins on conservative guard bands. As future technologies will have even tighter design margins and increased parameter variations, reliability-aware design will surely become a necessity instead of an option.

# Bibliography

- [1] Zhihong Liu, Bruce W. McGaughy, and James Z. Ma. Design tools for reliability analysis. In DAC '06: Proceedings of the 43rd annual conference on Design automation, pages 182–187, New York, NY, USA, 2006. ACM.
- [2] Neil H. E. Weste, David Harris, and Ayan Banerjee. CMOS VLSI Design: A Circuits and Systems Perspective. Pearson Education, 3rd edition, 2005.
- [3] Wikipedia. Bathtub curve. Wikipedia, the free encyclopedia, 2009. [Online; accessed 26-February-2009].
- [4] Keith Green, Fuchen Mu, Gautam Kapila, and Vijay Reddy. Simulation of circuit reliability with RelXpert. In *Cadence Designer Network User Group*. September 2005. [Online; accessed 1-March-2009].
- [5] Wikipedia. Negative bias temperature instability. Wikipedia, the free encyclopedia, 2009. [Online; accessed 26-February-2009].
- [6] Wikipedia. Electromigration. Wikipedia, the free encyclopedia, 2009. [Online; accessed 26-February-2009].
- [7] R.H. Tu, E. Rosenbaum, et al. Berkeley reliability tools—BERT. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 12(10):1524–1534, 1993.
- [8] Lifeng Wu, Jingkun Fang, et al. Glacier: A hot carrier gate level circuit characterization and simulation system for VLSI design. In *ISQED 2000: Proceedings of the IEEE First International Symposium on Quality Electronic Design, 2000*, pages 73–79, New York, NY, USA, 2000. IEEE.