

## Methods for Analysis of Robustness and Reliability of ICs

Daniel Müller-Gritschneder, Institute for Electronic Design Automation, TU München



#### Agenda

- Institute for Electronic Design Automation at TUM
- Reliability Analysis (D. Lorenz)
  - Aging Effects on Digital Circuits
  - Aging Analysis on Gate and Register-Transfer Level
  - Aging Monitors
- Robustness Validation (Martin Barke, Martin Radetzki)
  - Measuring robustness
  - Robustness as a probability
  - Robustness of Digital Circuits











#### TU München

- Total Students ~25000
- Students starting each year ~6500
- Students graduating each year ~3500
- Professors ~400
- Researchers ~5000
- Non-scientific staff ~2900









#### **TU München - Departments**





#### **Electrical Engineering and Information Technology**

- 2250 Students
- 37 Professors
- 329 Researchers (56% paid from 3rd party funding)
- 134 Non-scientific staff
- German study programs
  - B.Sc. in Electrical Engineering and Information Technology
  - M.Sc. in Electrical Engineering and Information Technology
- English study programs
  - M.Sc. in Communications Engineering (MSCE)
  - M.Sc. in Power Engineering (MSPE)
- Research fields
  - Electrical Power Engineering
  - Information and Communication Technology
  - Microelectronics
  - Circuits and Systems
  - Automation and Autonomous Systems






#### Institute for Electronic Design Automation

- 1975: first EDA institute at a German university
- 21 members (15 PhD candidates)
- EDA tools and design methodology
- 2002: spin-off MunEDA (www.muneda.com) (WiCkeD)
- Third-party funding: BMBF, DFG, industry
- Industry partners: Infineon, Intel Mobile, Bosch, Atmel, Freescale
- University partners: Berkeley, Bogazici Istanbul, Carnegie Mellon, KU Leuven, Sevilla



Institute for EDA Prof. U. Schlichtmann, PD H. Graeb http://eda.ei.tum.de



#### **Research Areas**





- Physical design
- Statistical timing analysis
- Reliability analysis
- High-level modeling/Virtual prototyping
- Analog EDA












#### Layout Synthesis

- Fast placement of a circuit on a chip with minimal total netlength and no cell overlap
- Kraftwerk:
  - Circuit is modeled as a spring system: cells are attracted by free space and pushed apart by each other
  - Minimizes quadratic netlength
  - Won Placement Contest of ISPD 2006
- Future task: placement of 3D-ICs

P. Spindler, U. Schlichtmann, F. M. Johannes, "Kraftwerk2 - A Fast Force-Directed Quadratic Placement Approach Using an Accurate Net Model", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 8, pp. 1398-1411, August 2008
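The quadratic-netlength objective can be illustrated with a minimal one-dimensional sketch (quadratic netlength separates into independent x and y problems). It assumes weighted two-pin nets and fixed pads, and it omits Kraftwerk's spreading forces entirely, so it only shows why minimizing quadratic netlength reduces to solving a linear system; cell overlap is not removed. All helper names, pads and weights are hypothetical.

```python
def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def quadratic_place_1d(num_cells, nets, pads):
    """Minimize sum(w * (x_u - x_v)**2) over all nets.

    nets: list of (u, v, w); u/v are movable cell indices (int) or pad names (str).
    pads: dict pad name -> fixed x coordinate.
    Setting the gradient to zero gives the linear system A·x = b.
    """
    a = [[0.0] * num_cells for _ in range(num_cells)]
    b = [0.0] * num_cells
    for u, v, w in nets:
        for p, q in ((u, v), (v, u)):
            if isinstance(p, int):          # equation of movable cell p
                a[p][p] += w
                if isinstance(q, int):      # movable neighbour
                    a[p][q] -= w
                else:                       # fixed pad goes to the right-hand side
                    b[p] += w * pads[q]
    return solve_linear(a, b)


# Two cells chained between pads at x = 0 and x = 10
x = quadratic_place_1d(2, [("L", 0, 1.0), (0, 1, 1.0), (1, "R", 1.0)], {"L": 0.0, "R": 10.0})
print(x)  # approximately [3.333, 6.667]
```

With equal weights the cells settle at equal spacing between the pads, which is the "spring system" equilibrium without any spreading force.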









#### Structural Analysis of Analog Circuits







Figure: recognized building blocks, e.g. cascode current mirror bank and differential stage



#### Analog Yield Optimization



CPU time: 25min (equal to 1 Monte-Carlo analysis)

| Performance             | Specification | Initial value | Initial margin | Optimized value | Optimized margin |
|-------------------------|---------------|---------------|----------------|-----------------|------------------|
| gain [dB]               | ≥ 65          | 76            | 2.5σ           | 76              | 4.2σ             |
| transit frequency [MHz] | ≥ 30          | 67            | 7.7σ           | 58              | 4.5σ             |
| phase margin [°]        | ≥ 60          | 68            | **1.8σ**       | 71              | 3.9σ             |
| slew rate [V/µs]        | ≥ 32          | 67            | 6.3σ           | 58              | 3.9σ             |
| DC power [mW]           | ≤ 3.5         | 2.6           | 1.1σ           | 2.3             | 4.2σ             |
| overall yield           |               | 82.9%         |                | 99.99%          |                  |
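One way to read the σ columns, assuming each value is a worst-case distance β of a linearized, Gaussian-distributed performance, is as a partial yield Φ(β) per specification. The sketch below (hypothetical helper names) converts the margins from the table and reports the smallest partial yield. That value is only an upper bound on the overall yield, because all specifications must hold at the same time; it does not reproduce the Monte Carlo yield values.

```python
import math


def partial_yield(margin_sigma):
    """Standard normal CDF Φ(β): partial yield of one specification with a
    margin of β sigma, assuming a linearized Gaussian performance model."""
    return 0.5 * (1.0 + math.erf(margin_sigma / math.sqrt(2.0)))


# Safety margins (in σ) from the table
initial = {"gain": 2.5, "f_T": 7.7, "phase margin": 1.8, "slew rate": 6.3, "DC power": 1.1}
optimized = {"gain": 4.2, "f_T": 4.5, "phase margin": 3.9, "slew rate": 3.9, "DC power": 4.2}

for name, margins in (("initial", initial), ("optimized", optimized)):
    yields = {spec: partial_yield(beta) for spec, beta in margins.items()}
    # The overall yield cannot exceed the smallest partial yield.
    print(name, f"upper bound on overall yield: {min(yields.values()):.4%}")
```

Both reported yields in the table (82.9% and 99.99%) lie below the respective bounds, as they must.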







#### Virtual Prototyping

- Virtual prototypes based on Transaction Level Models (TLM):
  - Models of complete embedded systems (SystemC/TLM)
  - Platforms for early software development and design space exploration



- Ongoing work: Non-functional performance estimation
  - Prediction of execution times of software on embedded processors.
  - Prediction of communication delays for large buffer transfers (TLM+ modelling style, BMBF SANITAS project)
  - Prediction of power consumption on task level.



#### Error-Resilient System Level Design



The goal shouldn't be to eliminate failure; it should be to build a system resilient enough to withstand it. (Megan McArdle: In Defense of Failure. Time Magazine 11/2010)








#### Silicon Technology [Channel Length]

Source: Reliable Systems on Unreliable Fabrics, Todd Austin et al, IEEE Design & Test of Computers, 2008



#### **Causes for Circuit Failure**

| Permanent errors | Transient errors | Process variations | Parameter drift |
|------------------|------------------|--------------------|-----------------|
| Time-Dependent Dielectric Breakdown (TDDB), Electromigration (EM) | Single Event Upsets (SEU) due to radiation, crosstalk | Variations in doping concentrations, oxide thicknesses, diffusion depths | Negative Bias Temperature Instability (NBTI), Hot Carrier Injection (HCI) |
| Statistical treatment (What is the risk that an IC will fail?) | Statistical treatment (What is the risk that an error occurs at the output?) | Statistical treatment (SSTA: What are mean and sigma?) | Deterministic treatment (How large is the degradation?) |



#### Dominant drift-related aging effects

|                  | Negative Bias Temperature Instability (NBTI) | Hot Carrier Injection (HCI) |
|------------------|--------------------------------------------------------|---------------------------------------------------------|
| Device           | PMOS                                                   | PMOS & NMOS                                             |
| Modeled by       | Threshold voltage drift $\Delta V_{th}$                | Degradation of drain saturation current $\Delta I_{on}$ |
| Stress condition | Transistor in inversion                                | Transistor switches                                     |





#### Traditional timing sign-off





#### Timing sign-off considers aging (Aging analysis)





#### Aging analysis not yet part of (digital) design flow

- Today: reliability concerns handled by safety margins
  - Overestimate: Performance wasted
  - Underestimate: Redesign / customer returns
- Available: Aging analysis on transistor level (e.g. RelXpert)
  - Up to several thousand transistors
  - Not applicable for timing sign-off
- Goal: Aging analysis on higher abstraction levels
  - Aging-aware gate model necessary
  - Aging-aware RTL models to analyze circuit in early design stages



#### Output arrival times over lifetime



- 90nm industrial cell library
- Outputs 866 and 874 change order: a non-critical path can become critical due to aging

- Operating conditions: P<sub>nom</sub>, 27°C, 0.9V
- Use profile: 125°C, 1.32V, 10y
- Workload determined by probabilistic method



# Gate performance degradation given by parameter drift & gate sensitivity



Degradation of gate performance (e.g. gate delay)
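A first-order way to combine parameter drift and gate sensitivity, assuming a linear (Taylor) model around the fresh operating point, is d_age ≈ d_0 + Σ_m S_m·Δp_m, where S_m = ∂d/∂p_m is the sensitivity of the gate delay to the drift Δp_m of transistor m. This is a sketch, not the gate model of the slides; the transistor names, sensitivities and drifts are hypothetical.

```python
def aged_delay(d0, sensitivities, drifts):
    """First-order estimate d_age ≈ d0 + sum over m of S_m * Δp_m.

    sensitivities: dict transistor -> S_m = ∂d/∂p_m (e.g. ps per mV of ΔV_th)
    drifts:        dict transistor -> Δp_m (e.g. ΔV_th in mV)
    Transistors without a drift entry are treated as unaged.
    """
    return d0 + sum(s * drifts.get(m, 0.0) for m, s in sensitivities.items())


# Hypothetical 2-input gate: only the PMOS devices drift (NBTI)
d = aged_delay(
    d0=50.0,                                               # fresh delay in ps
    sensitivities={"MP_A": 0.12, "MP_B": 0.08, "MN_A": 0.05, "MN_B": 0.05},
    drifts={"MP_A": 30.0, "MP_B": 20.0},                   # ΔV_th in mV
)
print(d)
```

Keeping a separate drift per transistor lets differently stressed devices age by different amounts, instead of applying one drift to the whole gate.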




#### Logarithmic lifetime dependence

Drift over lifetime (DC stress)



Aging ⇒ drift increases with time

- Modeling the degradation at end of lifetime (EOL) is sufficient
- The circuit partially recovers in unstressed condition
- It never recovers to its state before the stress period





#### Supply voltage and temperature affect drift & sensitivity

Figure: drift over supply voltage and temperature (90nm, regular V<sub>t</sub>, t = 10 years, SP = 0% (worst case), W = 10 µm, L<sub>min</sub>)





- Effective (lifetime) value of V<sub>DD</sub> and temperature ⇒ drift
- Current value of V<sub>DD</sub> and temperature ⇒ sensitivity
- Worst-case scenario: stressed in V<sub>DD,HIGH</sub> mode & operated in V<sub>DD,LOW</sub> mode ⇒ high drift and high sensitivity



Workload: Signal Probability and Transition Density

- Signal probability (SP): probability that the signal has logic value '1'
- Transition density (TD): average number of signal transitions per time unit
- Example: SP = 0.5, TD = 0.5
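Both workload measures can be computed from a clock-sampled waveform. This sketch assumes the sample list is one period of a periodic signal, so the wrap-around edge is counted, and it measures TD in transitions per clock cycle; the function names are hypothetical.

```python
def signal_probability(period):
    """Fraction of clock cycles in which the signal is logic '1'."""
    return sum(period) / len(period)


def transition_density(period):
    """Average number of transitions per clock cycle, treating `period` as one
    period of a periodic signal (the wrap-around edge is counted)."""
    n = len(period)
    return sum(period[i] != period[(i + 1) % n] for i in range(n)) / n


wave = [0, 0, 1, 1]   # one period of a clock-sampled signal
print(signal_probability(wave), transition_density(wave))  # 0.5 0.5
```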





Drift is dependent on workload

- Workload = input signals over lifetime
- Input signals define the stress condition
- Real signal ⇒ modeled as a periodic signal with the same signal probability SP & transition density TD



Drift over signal probability (SP) due to NBTI

- SP & TD for reliability analysis obtained by:
  - Simulation (requires input vectors)
  - Probabilistic methods
  - If not available ⇒ worst-case values
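A common probabilistic method propagates signal probabilities gate by gate, assuming that the inputs of every gate are statistically independent. The sketch below (hypothetical helper names and netlist) shows this propagation for basic gates; reconvergent fan-out violates the independence assumption, which is why a worst-case bound is used for dependent signals.

```python
import math


def sp_not(a):
    return 1.0 - a


def sp_and(*inputs):
    return math.prod(inputs)                              # P(all inputs are '1')


def sp_or(*inputs):
    return 1.0 - math.prod(1.0 - a for a in inputs)       # 1 - P(all inputs are '0')


def sp_nand(*inputs):
    return sp_not(sp_and(*inputs))


def sp_nor(*inputs):
    return sp_not(sp_or(*inputs))


# Hypothetical netlist: n1 = NAND(a, b), out = NOR(n1, c); primary inputs with SP = 0.5
sp_a = sp_b = sp_c = 0.5
n1 = sp_nand(sp_a, sp_b)
out = sp_nor(n1, sp_c)
print(n1, out)  # 0.75 0.125
```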



#### Gate sensitivity for different technologies



 Gates are more sensitive to a transistor parameter drift in newer technologies (because of lower supply voltage)



#### Technology trend:

Performance degradation for different stress profiles



\*C65LP netlist with exchanged transistor models








Static Timing Analysis (STA) – Ramp Signal





#### Static Timing Analysis – Delay Models



Gate delay d depends on input slope slope<sub>IN</sub>





#### Static Timing Analysis – Delay Models



Gate delay different for rising/falling edges



#### Static Timing Analysis – Delay Models



- Gate delay different for different input pins
- Modeled by different edges in timing graph (TG)
- Delay from A to C is different to delay from B to C for same input signal
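The three delay-model properties above (input-slope dependence, different rising/falling delays, per-pin arcs) can be captured by keying each timing-graph edge by input pin, output pin and output transition, with a delay table over the input slope. This sketch uses one-dimensional piecewise-linear tables for brevity, whereas cell libraries typically index delay by input slope and output load; the NAND2 arc values are hypothetical.

```python
from bisect import bisect_right


def interp(xs, ys, x):
    """Piecewise-linear interpolation; clamps outside the table range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical NAND2 arcs: (input pin, output pin, output edge) -> (input slopes, delays), ps
ARCS = {
    ("A", "Z", "rise"): ([10.0, 50.0, 200.0], [20.0, 28.0, 55.0]),
    ("A", "Z", "fall"): ([10.0, 50.0, 200.0], [15.0, 22.0, 45.0]),
    ("B", "Z", "rise"): ([10.0, 50.0, 200.0], [24.0, 33.0, 62.0]),
    ("B", "Z", "fall"): ([10.0, 50.0, 200.0], [17.0, 25.0, 50.0]),
}


def arc_delay(pin_in, pin_out, edge, slope_in):
    """Delay of one timing-graph edge for a given input slope."""
    slopes, delays = ARCS[(pin_in, pin_out, edge)]
    return interp(slopes, delays, slope_in)


# Same input slope, different pins: different delays (A -> Z vs. B -> Z)
print(arc_delay("A", "Z", "rise", 30.0), arc_delay("B", "Z", "rise", 30.0))
```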






Aging-aware gate model required for aging analysis






# State of the Art – Table-based approach



- (−) New use profile → library recharacterization
- Characterize gate for varying workload conditions
- Existing analysis flow can be reused





State of the Art – Paul et al. (Alpha-power Law)

Estimation of gate delay with the alpha-power-law model



- (−) Only delay, no slope
- (−) Only threshold voltage drift  $\Delta V_{th}$
- (−) **One**  $\Delta V_{th}$  for **all** transistors
- (+) Use-profile-independent model

\*"Temporal Performance Degradation under NBTI: Estimation and Design for Improved Reliability of Nanoscale Circuits", Paul et al., DATE'06
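The delay equation itself is not reproduced above. A sketch of how an alpha-power-law estimate can be formed, assuming the Sakurai-Newton form d ∝ C_L·V_DD/(V_DD − V_th)^α: with C_L and V_DD unchanged, the aged-to-fresh delay ratio depends only on the reduced gate overdrive. The exponent α = 1.3 and the voltages are assumed values, and the exact expression used by Paul et al. may differ.

```python
def alpha_power_delay_ratio(vdd, vth0, dvth, alpha=1.3):
    """d_age / d_0 under the alpha-power law d ∝ C_L * V_DD / (V_DD - V_th)**alpha.

    C_L and V_DD cancel; only the overdrive V_DD - V_th changes with aging.
    alpha = 1.3 is an assumed value for a short-channel device.
    """
    return ((vdd - vth0) / (vdd - vth0 - dvth)) ** alpha


# Hypothetical values: V_DD = 0.9 V, fresh V_th = 0.3 V, NBTI drift 30 mV
print(alpha_power_delay_ratio(vdd=0.9, vth0=0.3, dvth=0.03))
```

Because a single ΔV<sub>th</sub> enters the formula, this form naturally leads to the "one ΔV<sub>th</sub> for all transistors" limitation listed above.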



# State of the Art – Sapatnekar & Cao

Dependence on ΔV<sub>th</sub> obtained during characterization:

$$d_{age} = d_0 + f\left(\Delta V_{th}\right)$$

- Random signal represented as an equivalent periodic signal described by its signal probability (SP) (Sapatnekar)
- Long-term prediction model for  $\Delta V_{th}$: closed form for the upper bound of the reaction-diffusion model (Cao)
- (−) Only delay, no slope
- (−) One  $\Delta V_{th}$  for all transistors
- (+) Use-profile-independent model

\*"An Analytical Model for Negative Bias Temperature Instability", Sapatnekar et al., ICCAD'06; "The Impact of NBTI on the Performance of Combinational and Sequential Circuits", Cao et al., DAC'07



#### Comparison of approaches

|                               | GLACIER | Paul | Sapatnekar | Cao   | AgeGate* |
|-------------------------------|---------|------|------------|-------|----------|
| NBTI (with recovery)          |         | (✓)  | ✓ (✓)      | ✓ (✓) | ✓ (×)    |
| HCI                           | ✓       |      |            |       | ✓        |
| Individual transistor drifts  |         |      |            |       | ✓        |
| Aged output slope             |         |      |            |       | ✓        |
| Use-profile-independent model |         | ✓    | ✓          | ✓     | ✓        |

\*"Aging Analysis of Circuit Timing Considering NBTI and HCI", Lorenz et al., IEEE International On-Line Testing Symposium (IOLTS) 2009



#### Aging-aware AgeGate gate model consists of 3 parts

- Canonical gate model
- Degradation equations
- Structural information



# Gate model provides aged gate performance for drifts





# Drifts calculated by degradation equations

Canonical gate model

Degradation equations

- NBTI: $\Delta V_{th,m} = f(\mathbf{UP}, t_{\mathrm{stress,NBTI},m}, W_m, L_m)$ with $t_{\mathrm{stress,NBTI},m} = P_{\mathrm{NBTI},m} \cdot t_{\mathrm{life}}$
- HCI: $\Delta I_{on,m} = f(\mathbf{UP}, t_{\mathrm{stress,HCI},m}, W_m, L_m)$ with $t_{\mathrm{stress,HCI},m} = P_{\mathrm{HCI},m} \cdot t_{\mathrm{life}}$
- $P_{\mathrm{NBTI(HCI)},m}$: probability that transistor $m$ is in the NBTI (HCI) stress condition
- $\mathbf{UP}$: use profile (temperature, $V_{dd}$)
- $W_m, L_m$: transistor sizes
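The stress-time relation t<sub>stress,m</sub> = P<sub>m</sub>·t<sub>life</sub> is simple arithmetic, but the drift function f is not given here. The sketch below computes stress times and, purely as a placeholder for f, uses a power law A·t^n with n = 1/6 (a form often associated with reaction-diffusion NBTI models). A, n and the stress probabilities are hypothetical; in practice they depend on the use profile and on W<sub>m</sub>, L<sub>m</sub>.

```python
T_LIFE_YEARS = 10.0   # assumed lifetime from the use profile


def stress_time(p_stress, t_life=T_LIFE_YEARS):
    """t_stress,m = P_m * t_life: effective time transistor m spends in stress."""
    return p_stress * t_life


def hypothetical_dvth(t_stress, a_mv=12.0, n=1 / 6):
    """Placeholder for f(UP, t_stress, W, L): power law A * t**n in mV.
    A and n are hypothetical fitting parameters, not values from the slides."""
    return a_mv * t_stress ** n


probabilities = {"M_A": 0.5, "M_B": 0.25}   # hypothetical P_NBTI,m
for m, p in probabilities.items():
    t = stress_time(p)
    print(m, t, round(hypothetical_dvth(t), 2))
```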

**Structural Information** 



# Structural information needed for drift calculation





# Calculating stress probability for NBTI

NBTI stress condition for transistor M:

- 0V applied to gate contact of M
- Source contact of *M* has to be at Vdd

#### NOR3



Example:

- $M_A$  stressed  $\Leftrightarrow$  A at "0"
- M<sub>B</sub> stressed ⇔ Signals B & A at "0"



# Calculating stress probability for NBTI

Stress condition for transistor  $M_B$ :

- $1 - SP_B$: probability that logic "0" is applied to the gate contact of $M_B$
- $1 - SP_A$: probability that the source contact of $M_B$ is at logic "1" ($M_A$ conducting)

$$1 - SP_B = P(M_B \text{ in inversion})$$

$$1 - SP_A = P(M_A \text{ in inversion})$$

For independent signals:

$$P_{\mathrm{NBTI},B} = P(A = 0 \land B = 0) = (1 - SP_A) \cdot (1 - SP_B)$$

For dependent signals, worst-case assumption (upper bound of $P(A = 0 \land B = 0)$):

$$P_{\mathrm{NBTI},B} = \min\left(1 - SP_A,\ 1 - SP_B\right)$$
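Both formulas generalize to a series PMOS stack as in the NOR3 example: transistor k is stressed only if its own input and every input above it in the stack is "0". The sketch assumes the stack order A, B, C from V<sub>dd</sub> downward; the function name is hypothetical.

```python
import math


def nbti_stress_probs(sp_stack, independent=True):
    """P_NBTI for each PMOS in a series stack, ordered from V_dd downward.

    Transistor k is stressed iff its own input and all inputs above it are '0'
    (gate at 0 V, source pulled to V_dd). sp_stack[i] is the SP of input i.
    independent=True : product of (1 - SP_i)          (independent signals)
    independent=False: min of (1 - SP_i)              (worst-case upper bound)
    """
    probs = []
    for k in range(1, len(sp_stack) + 1):
        p_zero = [1.0 - sp for sp in sp_stack[:k]]
        probs.append(math.prod(p_zero) if independent else min(p_zero))
    return probs


# NOR3 with SP_A = SP_B = SP_C = 0.5
print(nbti_stress_probs([0.5, 0.5, 0.5]))         # [0.5, 0.25, 0.125]
print(nbti_stress_probs([0.5, 0.5, 0.5], False))  # [0.5, 0.5, 0.5]
```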



#### Aging analysis flow

Use profile specification

#### Workload determination

- Logic simulation
- Probabilistic method
- Specification of worst-case values

#### Timing analysis

- Compute stress probability
- Compute parameter drift
- Compute gate performances



# Degradation of critical path delay for ISCAS'85 benchmark circuits



- Industrial 90nm cell library
- Use profile: 125°C, 1.32V, 10y
- Measurement conditions: 27°C, 0.9V
- Worst-case analysis (SP=0 and TD=2 for all nets)
- Runtime: 35s for c7552
- Both effects (NBTI and HCI) are relevant
- Not considering aged output slope → 24% underestimation



# Comparison: with and without individual transistor drifts



Workload estimation with probabilistic method (*SP*=0.5 and *TD*=0.4 at all inputs)

Without individual transistor drifts → degradation overestimated by 20%



# Timing Model on Register Transfer Level (RTL)



- RTL: logical/arithmetic operations are mapped to time slots of one clock period T<sub>0</sub>
- Example: Adder operation must finish in one clock period
- Aged circuit *might* fail because operation takes longer than T<sub>0</sub>

Aging-aware timing model (TM) at RTL enables:

- Considering impact of aging on system early in design process
- Quick performance determination at system level
- Design space exploration



# Timing models on RTL



- Functional unit delay determined by critical path delay
- w/o aging: one critical path
- with aging: multiple possible critical timing paths



# Proposed aging-aware RTL Timing Model\*

Idea: reduced timing graph (TG) that contains only **possible critical paths (PCP)**

A possible critical path (PCP) is the critical path of a degraded circuit for a defined combination of temperature T, supply voltage V, workload of the input signals and lifetime t.



Figure: schematic → timing graph → reduced timing graph

- As accurate as timing analysis on gate level
- Speed-up due to reduced TG

\*"Aging model for timing analysis at register-transfer-level", Lorenz et al., TAU 2010



Aging-aware gate model needed for generation/evaluation of RTL Timing Model

operating conditions over lifetime & workload





- Edge $AO \rightarrow OZ$ annotated with delay interval $[d_0, d_{age,max}]$
- Interval for the gate delay, because the actual gate delay is unknown during characterization
- Approach independent of the gate model used

$d_0$: fresh gate delay, $d_{age}$: aged gate delay



# Reduction of the Timing Graph

- Identification of elements that are not part of a possible critical path:
  1. Block-based reduction step
  2. Path-based reduction step
  3. Reconvergent fan-out reduction step
- Validity region has to be specified:
  - max lifetime
  - max temperature
  - max supply voltage





## 1. Block-based reduction step

"Sum" and "max":

$$\mathrm{sum}([A_l, A_u], [B_l, B_u]) = [A_l + B_l,\ A_u + B_u]$$

$$\max([A_l, A_u], [B_l, B_u]) = [\max(A_l, B_l),\ \max(A_u, B_u)]$$

"Lesser than":

$$[A_l, A_u] < [B_l, B_u] \Leftrightarrow A_u < B_l$$



 Static timing analysis with intervals
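The interval operations can be sketched as a small class, together with one plausible form of the block-based step: at a node, a fan-in whose arrival interval is definitely less than that of another fan-in cannot lie on a possible critical path. Names and numbers are hypothetical, and the published procedure may apply the comparison differently.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Interval:
    lo: float  # fresh value (e.g. d_0 or fresh arrival time)
    hi: float  # maximum aged value (e.g. d_age,max)

    def __add__(self, other):
        """'sum': [A_l + B_l, A_u + B_u]"""
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def maximum(self, other):
        """'max': [max(A_l, B_l), max(A_u, B_u)]"""
        return Interval(max(self.lo, other.lo), max(self.hi, other.hi))

    def definitely_less(self, other):
        """'lesser than': A < B  <=>  A_u < B_l"""
        return self.hi < other.lo


def prune_fanins(arrivals):
    """Fan-in edges of one node that cannot be on a possible critical path,
    because another fan-in arrives later for every drift in the validity region."""
    return {e for e, a in arrivals.items()
            if any(a.definitely_less(b) for f, b in arrivals.items() if f != e)}


# Hypothetical arrival intervals (ps) at the inputs of one gate
arrivals = {"e1": Interval(10, 12), "e2": Interval(15, 20), "e3": Interval(11, 16)}
pruned = prune_fanins(arrivals)                                  # {'e1'}
out = reduce(Interval.maximum, arrivals.values()) + Interval(3, 4)  # gate delay [3, 4]
print(pruned, out)
```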



# 2. Path-based reduction step



 A path can only be a possible critical path if its maximum aged delay exceeds the critical path delay of the fresh circuit.
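The path-based criterion can be illustrated on explicitly enumerated paths, assuming each edge carries the interval [d<sub>0</sub>, d<sub>age,max</sub>]. Enumerating paths is exponential in general, so this only demonstrates the test itself; path and edge names are hypothetical.

```python
def path_based_prune(paths, edge_intervals):
    """Keep paths whose maximum aged delay reaches the fresh critical path delay.

    paths:          dict path name -> list of edge names
    edge_intervals: dict edge name -> (d_0, d_age_max)
    Uses >= so that the fresh critical path itself is always kept.
    """
    fresh = {p: sum(edge_intervals[e][0] for e in es) for p, es in paths.items()}
    aged_max = {p: sum(edge_intervals[e][1] for e in es) for p, es in paths.items()}
    d_crit_fresh = max(fresh.values())
    return {p for p in paths if aged_max[p] >= d_crit_fresh}


edges = {"a": (10, 12), "b": (20, 25), "c": (8, 9), "d": (15, 22)}
paths = {"P1": ["a", "b"], "P2": ["c", "d"], "P3": ["a", "c"]}
print(sorted(path_based_prune(paths, edges)))  # ['P1', 'P2']
```

Here P2 is not critical in the fresh circuit (23 vs. 30) but can overtake P1 after aging (up to 31), whereas P3 (at most 21) can never become critical.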



#### 3. Reconvergent fan-out reduction step



 Checks whether the faster of two sub-paths is also a possible critical path



#### Results

- RC-adder, CLA-adder, ISCAS'85 benchmarks
- 90 nm industrial cell library
- Reduction (ratio) =  $\frac{\text{removed nodes (edges)}}{\text{nodes (edges) of original TG}}$
- Two validity regions specified:
  - Region 1: 100°C; 1.2V; 5y → max ΔV<sub>th</sub> = 8%
  - Region 2: 125°C; 1.32V; 10y → max ΔV<sub>th</sub> = 17%





# Achieved reduction ratios



- Mean reduction: 88% for nodes, 92% for edges
- Speed-up compared to timing analysis on gate level:
  up to 56x (mean 25x)
- Characterization time: < 1min (except c432: 7min)








Impact of workload on degradation (c880)



Monte Carlo analysis (MCA) over the signal probabilities (workload) at the inputs

Workload and operating conditions are unknown:

- Worst-case design
- Better: Monitor and react



# Ways to monitor aging

- Single device monitors
  - Measure threshold voltage drift
  - Hard to correlate to circuit performance
- Generic test structure (e.g. inverter ring oscillator)
  - Measure oscillating frequency
  - Neglects workload impact
- Circuit replica (e.g. critical path replica)
  - Measure delay/oscillating frequency
  - Neglects workload impact
- Delay fault test
  - Measure delay of current critical path
  - Only safe way: workload impact is considered
  - Operation must be paused



Requirements for delay fault monitor

- Built-In Self Test (BIST)
- (Enhanced) scan design
- Test vectors for all possible critical paths




#### Enhanced scan design





Ways the system can react

- Disable degraded circuit (e.g. one core of multi-core processor)
- Reduce clock frequency
- Increase supply voltage (Caution: accelerates aging)
- Replace degraded circuit by redundant one (degraded circuit can recover)
- Use degraded circuit for uncritical tasks (probabilistic CMOS)



State of the art: Path selection approaches (Test vectors for all possible critical paths)

- Nominal case:
  - "On Path Selection in Combinational Logic Circuits", Li, TCAD'89
  - "Finding a small set of longest testable paths that cover every gate", Sharma, ITC'02
- Process variation:
  - "Longest-path selection for delay test under process variation", Lu, TCADICS'05
  - "Statistical Path Selection for At-Speed Test", Zolotov, TCADICS'05
- Aging:
  - "Testing for transistor aging", Baba, VTS'09



# Identifying PCPs



Basic idea: Reduced timing graph (TG)

- Gate delays modeled as intervals
- Remove edges & nodes not part of a possible critical path



# Results

- ISCAS benchmark circuits
- Inverter, NAND and NOR gates from 90nm industrial cell library
- Operating conditions over lifetime:
  - Supply voltage: 1.32V
  - Temperature: 125°C
  - Lifetime: 10y
- Only NBTI



# Possible Critical Paths (PCPs)

| Circuit  | #Paths  | #PCPs<br>[Baba] | Factor<br>[Baba] | #PCPs<br>[Ours] | Factor<br>[Ours] |
|----------|---------|-----------------|------------------|-----------------|------------------|
| c17.v    | 18      | 3               | 6 x              | 3               | 6 x              |
| c432.v   | 123652  | 157             | 788 x            | 157             | 788 x            |
| c499.v   | 452608  | 1487            | 304 x            | 375             | 1207 x           |
| c880a.v  | 16956   | 98              | 173 x            | 74              | 229 x            |
| c1355.v  | 522368  | 3376            | 155 x            | 2224            | 235 x            |
| c1908.v  | 1536434 | 4596            | 334 x            | 2091            | 735 x            |
| c2670a.v | 31286   | 21              | 1490 x           | 21              | 1490 x           |
| c3540a.v | 4248254 | 15276           | 278 x            | 1345            | 3159 x           |
| c5315a.v | 738816  | 1568            | 471 x            | 899             | 822 x            |
| c7552.v  | 448564  | 3173            | 141 x            | 522             | 859 x            |
| MEAN     |         |                 | 414 x            |                 | 953 x            |

 Mean reduction of paths: 953x (414x [Baba])







**Reliability and Robustness** 

#### Reliability:

- A circuit is reliable if it has a high probability of operating correctly according to its specification during its lifetime.
- The specification covers the customer's use cases.

#### Robustness:

- A circuit is robust if it has a high probability of operating correctly even under conditions outside the specification during its lifetime.
- The specification does not cover all possible use cases.
- Safety-critical applications in hard-to-predict environments.
- Investigate the complete range of operating conditions in which circuits operate correctly -> allows comparison of different implementations.



What's the use of measuring robustness?

- Designers can choose between different implementations of a circuit depending on the specification and the mission profile, e.g., temperature over lifetime.
- Possible optimization techniques





System view



- Functionality: system transforms inputs
  → outputs
- Additional "input": operating conditions
  - nominal value,  $\pi^{nom}$
  - deviation around nominal conditions, perturbation  $\boldsymbol{\Pi}$
- Additional "output": generation / guarantee of properties,  $\phi^{nom}$  and  $\Phi$



Specification of required properties



- Application A requires  $\Phi_A$
- Application B requires  $\Phi_{B}$
- A chip that supports A and B must meet both,  $\Phi_A$  and  $\Phi_B$
- The chip specification may further narrow down the properties Φ (robustness margin)



### Specification of tolerated operation conditions



- Application A must tolerate  $\Pi_A$
- Application B must tolerate  $\Pi_B$
- A chip that supports A and B must tolerate both, i.e.  $\Pi_A \cup \Pi_B$
- The chip specification may further extend the operating range  $\Pi$  (robustness margin)

$\Pi$ specifies *against what* the properties  $\Phi$  of *f* must be robust (perturbation of the operating conditions)



## A non-robust design



- There exists  $\pi \in \Pi$  such that  $f(\pi) \notin \Phi$
- Robustness  $\rho_f(\Pi, \Phi) = 0$



# A robust design



- Robust closure  $\Pi^*$  so that  $\Pi \subseteq \Pi^*$
- For all  $\pi \in \Pi^*$ :  $f(\pi) \in \Phi$
- Robustness  $\rho_f(\Pi, \Phi) > 0$

Whereas  $\Pi$  is specified,  $\Pi^*$  is characteristic of the design



# Quantifying robustness



Figure: operating-condition space with axes $\pi_1$, $\pi_2$, specified region $\Pi$, robust closure $\Pi^*$ with border $C(\Pi^*)$, and robustness $\rho_f(\Pi, \Phi)$

- Robustness against $\pi_i$ = minimum distance (1-dim) between points of $C(\Pi^*)$ and $\Pi$ that have identical coordinates except on axis $i$
- Global robustness = minimum distance (m-dim) between any two points of $C(\Pi^*)$ and $\Pi$; it may be smaller than every robustness against a single $\pi_i$
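Both robustness measures can be approximated numerically for a 2-D example. The sketch assumes a square specified region Π, describes the robust closure Π* by a predicate (here a hypothetical disk), and finds the distance to C(Π*) by bisection along sampled rays, which is only valid if each ray leaves Π* once (e.g. for a convex Π*). All function names are hypothetical.

```python
import math


def margin_along(point, direction, ok, max_dist=10.0, tol=1e-9):
    """Largest step s with ok(point + s*direction), found by bisection.
    Assumes ok() holds at s = 0, fails at s = max_dist, and the ray crosses
    the border C(Pi*) only once (true for a convex Pi*)."""
    lo, hi = 0.0, max_dist
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ok([x + mid * d for x, d in zip(point, direction)]):
            lo = mid
        else:
            hi = mid
    return lo


def _samples(lo, hi, n):
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]


def axis_robustness(lo, hi, ok, axis, n=21):
    """Robustness against pi_axis for Pi = [lo, hi]^2: shortest move along that
    axis only, starting from the faces of Pi perpendicular to the axis."""
    best = math.inf
    for t in _samples(lo, hi, n):                  # the other coordinate stays fixed
        for start, sign in ((hi, 1.0), (lo, -1.0)):
            point, direction = [t, t], [0.0, 0.0]
            point[axis], direction[axis] = start, sign
            best = min(best, margin_along(point, direction, ok))
    return best


def global_robustness(lo, hi, ok, n=21, n_dirs=72):
    """Global robustness: shortest move in any sampled direction from the
    border of Pi to the border of Pi*."""
    ts = _samples(lo, hi, n)
    border = ([(lo, t) for t in ts] + [(hi, t) for t in ts]
              + [(t, lo) for t in ts] + [(t, hi) for t in ts])
    dirs = [(math.cos(2 * math.pi * k / n_dirs), math.sin(2 * math.pi * k / n_dirs))
            for k in range(n_dirs)]
    return min(margin_along(p, d, ok) for p in border for d in dirs)


def inside_disk(p):
    """Hypothetical robust closure Pi*: disk of radius 2 around the origin."""
    return p[0] ** 2 + p[1] ** 2 <= 4.0


rho_1 = axis_robustness(-1.0, 1.0, inside_disk, axis=0)
rho_global = global_robustness(-1.0, 1.0, inside_disk)
print(round(rho_1, 4), round(rho_global, 4))
```

In this example the global robustness (2 − √2 ≈ 0.586, reached at a corner of Π along the diagonal) is smaller than the robustness against π<sub>1</sub> (√3 − 1 ≈ 0.732), which illustrates why the global value may be smaller than every axis-wise value.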



## Comparison of designs








### Robustness as a probability



Radetzki, M.; Bringmann, O.; Nebel, W.; Olbrich, M.; Salfelder, F.; Schlichtmann, U.: Robustheit nanoelektronischer Schaltungen und Systeme [Robustness of nanoelectronic circuits and systems], 4. GMM/GI/ITG-Fachtagung Zuverlässigkeit und Entwurf [4th GMM/GI/ITG conference on reliability and design], Wildbad Kreuth, September 2010.








Robustness validation in the context of aging analysis

- Properties  $\Phi$ : frequency **f**
- Operating conditions Π: temperature T, supply voltage V<sub>dd</sub>





# Conclusion

- Sensitivity of ICs to parameter drifts increases with shrinking technologies -> this causes a reliability problem.
- Aging-aware gate and RTL models:
  - Several possible critical paths must be considered during timing sign-off
  - High dependence on workload/operating conditions
  - Aging monitors are one way to observe workload-dependent aging effects
- Robustness
  - Evaluate circuits based on a mission profile that covers cases outside the specified environmental conditions.
  - Extended to include aging degradation effects.