

# Particle Detectors for Self-Adaptive Fault-Tolerant Systems

Marko Andjelkovic, Junchao Chen, Milos Krstic



innovations for high performance microelectronics

2<sup>nd</sup> ELICSIR Training School, 21<sup>st</sup> and 22<sup>nd</sup> April 2021





- 1 Background
- 2 State-of-the-Art Semiconductor Particle Detectors
- 3 Particle Detectors Developed at IHP
  - 3.1 Embedded SRAM as a Particle Detector
  - 3.2 Pulse Stretching Inverter Chain as a Particle Detector
- 4 Comparison of Particle Detectors
- 5 Application in a Self-Adaptive Multiprocessing System
- 6 Summary


## 1. Background

### Space ionizing radiation

- Complex (wide range of sources and energies)
- Dynamic (variable radiation intensity)

### Radiation sources in space

- Radiation trapped in Earth's magnetic field (Van Allen belts)
- Galactic Cosmic Rays (GCRs)
  - From deep space
- Solar Particle Events (SPEs)
  - Solar flares and coronal mass ejections from the Sun



[Illustration from https://www.nasa.gov]




Due to Solar Particle Events, the particle flux in space may increase by 2 to 6 orders of magnitude over a period of several hours or days




### Single Event Effects (SEEs)

- Major reliability threat for Integrated Circuits (ICs) used in space applications
- Caused by a single energetic particle (e.g. proton, neutron, heavy ion)
- Soft SEEs: temporary impact (data loss)
- Hard SEEs: permanent physical damage
- Soft SEEs are critical for nano-scale ICs:
  - Single Event Transients (SETs): voltage glitches in combinational logic
  - Single Event Upsets (SEUs): bit flips in memory and sequential logic







Soft Error Rate (SER) – number of soft errors in a system, induced by SETs and SEUs in a given time interval

$$SER = \sum_{i=1}^{N} SER_{NOMINAL}(i) \times SER_{DERATING}(i)$$

N = number of components in the system

SER is strongly influenced by particle flux and LET

$$SER_{NOMINAL} = k \cdot Flux \cdot Area \cdot e^{-Q_{CRIT}/Q_S}$$

$$SER_{DERATING} = Logical_{DER} \times Electrical_{DER} \times Timing_{DER}$$

depend on LET
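As a rough numerical illustration of the two expressions above, the following Python sketch evaluates the total SER for a list of components. Only the structure of the formulas is taken from the slides; the function names and every parameter value (`k`, flux, area, charges, derating factors) are hypothetical placeholders.

```python
import math


def ser_nominal(k, flux, area, q_crit, q_s):
    """Nominal SER of one component: k * Flux * Area * exp(-Q_CRIT / Q_S)."""
    return k * flux * area * math.exp(-q_crit / q_s)


def ser_derating(logical, electrical, timing):
    """Combined derating factor: product of the three derating terms."""
    return logical * electrical * timing


def total_ser(components):
    """Sum of nominal SER times derating over all N components."""
    return sum(
        ser_nominal(**c["nominal"]) * ser_derating(**c["derating"])
        for c in components
    )


def example_system(flux):
    """Two made-up components; every number here is an illustrative assumption."""
    return [
        {"nominal": dict(k=1e-3, flux=flux, area=2.0, q_crit=10.0, q_s=5.0),
         "derating": dict(logical=0.5, electrical=0.8, timing=0.25)},
        {"nominal": dict(k=1e-3, flux=flux, area=1.0, q_crit=4.0, q_s=5.0),
         "derating": dict(logical=1.0, electrical=1.0, timing=1.0)},
    ]


quiet = total_ser(example_system(flux=1.0))
storm = total_ser(example_system(flux=1e3))  # flux raised by 3 orders of magnitude
print(f"SER ratio storm/quiet: {storm / quiet:.1f}")
```

Because flux enters the nominal SER as a multiplicative factor, a flux increase during a Solar Particle Event scales the total SER by the same factor in this sketch, as long as all other parameters are held fixed.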




### Fault-tolerance techniques (to reduce the total SER):

- Static fault-tolerance: cannot be changed once it is implemented
- <u>Dynamic (adaptive) fault-tolerance</u>: can be adjusted during the runtime

### Self-adaptive fault-tolerance

- Trade-off between performance, power consumption and radiation hardness
- Activation of fault-tolerant mechanisms only under critical radiation levels

### Key requirement for self-adaptive fault-tolerance in space

Measurement of radiation intensity (particle flux and LET)

### Requirements for particle detectors for self-adaptive systems

- Ability to monitor particle flux and LET
- Possibility of integration on the same chip with the target system
- Low cost (low hardware and power overhead)
- Low detection latency (fast response)
- Immunity to false alarms, multiple errors and error accumulation

## 2. State-of-the-Art Semiconductor Particle Detectors

### Most common semiconductor particle detectors

- Diode-based detectors
- SRAM-based detectors
- Bulk built-in current detectors
- Acoustic wave detectors
- 3D NAND flash detectors

None of these detectors satisfies all requirements


### Diode Detectors

- Reverse-biased *pn* junction as a particle sensor
- Pulsed-current or direct-current response
- Various implementations (e.g. strips, pixels)
- Commercial components or custom-designed detectors


#### Strengths:

- Flux and LET detection
- High detection efficiency

#### Weaknesses:

- Mixed-signal readout
- Difficult on-chip integration





### SRAM Detectors

- Flux is measured in terms of SEU rate
- SEU detection with scrubbing and error detection and correction (EDAC) logic


- Stand-alone chip implementation
- Commercial data storage medium

#### Strengths:

- Fully digital processing
- Low cost implementation

#### Weaknesses:

- Cannot detect particle LET
- Prone to multiple errors
- Large area overhead
- High latency






Two transistors in a 6T cell are sensitive to SEUs







### Bulk Built-in Current Sensor (BBICS)

- Connected to transistors' bulk terminal
- Detection of particle-induced current pulse
- Detected current pulse is transformed into transient voltage pulse (alarm signal)






G. Wirth et al., Microelectronics Reliability, 2008

Two BBICS are required for each circuit, for PMOS bulk and NMOS bulk

Each BBICS can monitor 1000s of transistors




[Figure: acoustic wave detector on the silicon surface, showing the particle strike location and the resulting acoustic wave]

### Acoustic Wave Detectors

- Detects acoustic waves generated in the substrate by an incident energetic particle
- A cantilever-like structure is used as the detector
- A particle strike changes the capacitance of the cantilever


#### Strengths:

- Detection of strike location
- Several on-chip sensors are sufficient

#### Weaknesses:

- Mixed-signal readout
- Cannot detect particle LET



G. Upasani et al., IEEE Trans. on Computers, 2016





### 3D NAND Flash Detectors

- Ionizing radiation changes the transistors' threshold voltage
- Commercially available for data storage





#### Strengths:

- Detection of flux, LET and angle of incidence
- Low sensitivity to multi-bit errors

#### Weaknesses:

- Difficult on-chip integration
- Complex processing logic



## 3. Particle Detectors Developed at IHP

### Two alternatives to existing particle detectors:

- Embedded SRAM as a particle detector
  - J. Chen *et al.*, Electronic Circuit with Integrated SEU Monitor, European Patent Application EP 3 748 637 A1, Bulletin 2020/50.
  - J. Chen *et al.*, Prediction of Solar Particle Events with SRAM-Based Soft Error Rate Monitor and Supervised Machine Learning, Microelectronics Reliability, 2020.
- Pulse stretching inverter chain as a particle detector
  - M. Andjelkovic *et al.*, A Particle Detector Based on Pulse Stretching Inverter Chain, in Proc. International Conference on Electronic Circuits and Systems (ICECS), 2019.
  - M. Andjelkovic *et al.*, Monitoring of Particle Count Rate and LET Variations with Pulse Stretching Inverters, IEEE Transactions on Nuclear Science, 2021. (Accepted paper)

## 3.1 Embedded SRAM as a Particle Detector

### On-chip data storage (SRAM) is also used as a particle detector

Negligible area and power overhead compared to stand-alone SRAM detectors

#### Same operating principles as stand-alone SRAM detectors

- Number of SEUs in a given time interval (once per hour) is measured
- Standard <u>scrubbing</u> and <u>Single-Error-Correction Double-Error-Detection (SEC-DED)</u> procedures are used to correct single errors

### Additional function

Detection of permanent errors in SRAM

[Figure: 20 Mbit embedded SRAM as a particle detector]

### Error detection and correction flow

- Scrubbing procedure reads all memory words to detect errors
- Re-scrubbing the memory word when a new error is detected
- Single bit error is corrected in the 1<sup>st</sup> scrubbing round
- Error type is determined in the 2<sup>nd</sup> scrubbing round
- Error address is logged in the register file
- Detection latency = scrubbing period ≈ 50 ms
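The flow above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the IHP implementation: it uses a toy 8-bit SEC-DED code (Hamming(7,4) plus an overall parity bit), it interprets the "error type" decided in the second round as transient versus permanent, and the names `encode`, `decode`, `read`, `scrub` and the `stuck` fault map are all hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SEC-DED word (Hamming(7,4) + overall parity)."""
    code = [0] * 8                       # index 0: overall parity, 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, word); status is 'ok', 'corrected' or 'double'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_error = sum(code) % 2 == 1
    if syndrome == 0 and not parity_error:
        return "ok", list(code)
    if parity_error:                     # odd number of flips: treat as a single error
        fixed = list(code)
        fixed[syndrome] ^= 1             # syndrome 0 means the overall parity bit itself
        return "corrected", fixed
    return "double", list(code)          # even number of flips: detect only


def read(memory, stuck, addr):
    """Read a word; `stuck` maps address -> (bit, value) to emulate a permanent fault."""
    word = list(memory[addr])
    if addr in stuck:
        bit, value = stuck[addr]
        word[bit] = value
    return word


def scrub(memory, stuck):
    """One scrubbing pass; returns the error log as (address, error type) pairs."""
    log = []
    for addr in range(len(memory)):
        status, fixed = decode(read(memory, stuck, addr))
        if status == "ok":
            continue
        if status == "double":
            log.append((addr, "double"))
            continue
        memory[addr] = fixed             # round 1: write back the corrected word
        status2, _ = decode(read(memory, stuck, addr))   # round 2: re-read the word
        log.append((addr, "transient" if status2 == "ok" else "permanent"))
    return log


memory = [encode(v) for v in (0x3, 0xA, 0x5)]
memory[1][5] ^= 1                        # inject a transient SEU at address 1
stuck = {2: (6, 1 - memory[2][6])}       # emulate a stuck-at bit at address 2
print(scrub(memory, stuck))              # → [(1, 'transient'), (2, 'permanent')]
```

In this sketch an error that disappears after the write-back is logged as transient, while one that reappears on the re-read is logged as permanent, which is one plausible way to realize the permanent-error detection mentioned earlier.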







### Synthesis results for 20 Mbit SRAM

| Parameter                                | Value  |
|------------------------------------------|--------|
| Technology (µm)                          | 0.13   |
| Supply voltage (V)                       | 1.2    |
| Frequency (MHz)                          | 50     |
| Total area (mm<sup>2</sup>)              | 14     |
| Total power dissipation (mW)             | 384    |
| 'Non-SRAM' part* area (mm<sup>2</sup>)   | 0.0957 |
| 'Non-SRAM' part* power (mW)              | 0.211  |

#### Area and power dissipation comparison between 20 Mbit SRAM and 'Non-SRAM' part

|                        | 20 Mbit SRAM  | 'Non-SRAM' part* |
|------------------------|---------------|------------------|
| Area (mm<sup>2</sup>)  | 13.9 (99.3 %) | 0.0957 (0.7 %)   |
| Power consumption (mW) | 383 (99.9 %)  | 0.211 (0.1 %)    |

#### Area and power dissipation comparison in the 'Non-SRAM' part

|                        | SEU Monitor  | EDAC + Scrubbing + Control Unit |
|------------------------|--------------|---------------------------------|
| Area (µm<sup>2</sup>)  | 95739 (84 %) | 18706 (16 %)                    |
| Power consumption (mW) | 0.211 (80 %) | 0.054 (20 %)                    |

\*: 'Non-SRAM' part contains **control unit**, **SEU monitor**, **EDAC** and **scrubbing module** 
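The percentage shares in the tables above can be reproduced with a few lines of Python. The numbers are copied from the tables; the helper name `share_percent` is an illustrative assumption.

```python
def share_percent(part, rest):
    """Share of `part` in the total (part + rest), in percent."""
    return 100.0 * part / (part + rest)


# 'Non-SRAM' part versus the 20 Mbit SRAM array (values from the tables above)
area_overhead = share_percent(0.0957, 13.9)    # mm^2
power_overhead = share_percent(0.211, 383.0)   # mW

# SEU monitor versus EDAC + scrubbing + control unit
monitor_area = share_percent(95739, 18706)     # um^2
monitor_power = share_percent(0.211, 0.054)    # mW

print(f"area overhead:  {area_overhead:.1f} %")
print(f"power overhead: {power_overhead:.1f} %")
print(f"SEU monitor share: {monitor_area:.0f} % area, {monitor_power:.0f} % power")
```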



## 3.2 Pulse Stretching Inverter Chain as a Particle Detector

#### Two skew-sized CMOS inverters act as a sensor

- Off-state transistors are sensitive to particles
- On-state transistors are restoring elements
- Pulse stretching allows low-LET particles to be detected

### Basic operating principles

- SET count rate is proportional to flux
- SET pulse width variation is proportional to LET variation

### Transistor sizing guidelines

- Large off-state transistors (larger sensitive area)
- Small on-state transistors (reduced restoring current)

#### Pulse Stretching Cell (PSC)




### For a sufficiently large sensing area, a multi-PSC configuration is needed

- Serial PSC configuration
  - Cannot detect LET variation due to variable sensitivity across the chain
  - High latency (tens or hundreds of ns)
- Parallel PSC configuration
  - Can detect LET variation in terms of SET pulse width change
  - Low latency (several ns)





### Simulated SET pulse width dependence on LET (for parallel PSC configuration)

- Increasing the number of PSCs in parallel decreases the SET pulse width
- Up to 12 PSCs in parallel ensure detectability of SET pulses
- SET pulse width increases by 550 ps as LET increases from 1 to 100 MeV·cm<sup>2</sup>·mg<sup>-1</sup>



### Digital readout circuit

- Multiple parallel PSC arrays are connected to an OR-tree
- Detected SETs are filtered and grouped in several pulse width ranges
- Counters store the number of detected SETs in each width range
- Each SET pulse width range corresponds to a unique LET range







## 4. Comparison of Particle Detectors

| Type of detector           | Readout method | Hardware overhead | Detection latency | Prob. of false alarms | Multiple errors | LET detection |
|----------------------------|----------------|-------------------|-------------------|-----------------------|-----------------|---------------|
| Current sensor             | Digital        | < 30 %            | < 10 CC           | Medium                | No              | No            |
| Acoustic wave              | Mixed-mode     | < 20 %            | < 100 CC          | Medium                | No              | No            |
| Diode                      | Mixed-mode     | > 100 %           | 100s CC           | Low                   | No              | Yes           |
| Stand-alone SRAM           | Digital        | > 100 %           | > 1000 CC         | Low                   | Yes             | No            |
| 3D NAND flash              | Mixed-mode     | > 50 %            | 100s CC           | Low                   | No              | Yes           |
| Embedded SRAM              | Digital        | < 1 %             | > 1000 CC         | Low                   | Yes             | No            |
| Pulse stretching inverters | Digital        | < 20 %            | < 10 CC           | Low                   | No              | Yes           |

CC = clock cycles
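To make the comparison easier to query, the table can be encoded as data and filtered against a set of criteria. The sketch below is an illustration, not part of the original analysis: the selection rule in `meets_all` (digital readout, low false-alarm probability, no multiple errors, LET detection) is an assumed reading of the requirements listed earlier, and the field names are hypothetical.

```python
from collections import namedtuple

Detector = namedtuple(
    "Detector", "name readout false_alarms multiple_errors let_detection prior_art"
)

# Rows copied from the comparison table; `prior_art` marks state-of-the-art detectors.
DETECTORS = [
    Detector("Current sensor", "Digital", "Medium", False, False, True),
    Detector("Acoustic wave", "Mixed-mode", "Medium", False, False, True),
    Detector("Diode", "Mixed-mode", "Low", False, True, True),
    Detector("Stand-alone SRAM", "Digital", "Low", True, False, True),
    Detector("3D NAND flash", "Mixed-mode", "Low", False, True, True),
    Detector("Embedded SRAM", "Digital", "Low", True, False, False),
    Detector("Pulse stretching inverters", "Digital", "Low", False, True, False),
]


def meets_all(d):
    """Assumed selection rule: digital, low false alarms, no multiple errors, LET."""
    return (
        d.readout == "Digital"
        and d.false_alarms == "Low"
        and not d.multiple_errors
        and d.let_detection
    )


prior_art_matches = [d.name for d in DETECTORS if d.prior_art and meets_all(d)]
all_matches = [d.name for d in DETECTORS if meets_all(d)]
print(prior_art_matches)   # → []
print(all_matches)         # → ['Pulse stretching inverters']
```

Hardware overhead and detection latency are left out of the rule because the table gives them only as bounds; they could be added as further conditions once numeric targets are fixed.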


## 5. Application in a Self-Adaptive Multiprocessing System

### Multiprocessing (multi-core) system

- Inherent hardware redundancy
- Operating modes are selected by reconfiguring the cores
- On-chip sensors enable real-time SER monitoring
- Three operating modes:
  - De-stress (low-power) mode
  - High performance mode
  - Fault-tolerant (rad-hard) mode





- De-stress mode
  - One or more cores operate, while others are switched off
- High performance mode
  - All cores operate in parallel, i.e. execute different tasks
- Core-level fault tolerant mode
  - Dual Modular Redundancy (DMR)
  - Triple Modular Redundancy (TMR)
  - Quadruple Modular Redundancy (QMR)
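The core-level redundancy schemes can be illustrated with a generic voter over the outputs of the cores that execute the same task. This is a simplified sketch: the function `vote` and its status labels are hypothetical, and the strict-majority rule used for QMR is an assumption, since the voting scheme is not specified here.

```python
from collections import Counter


def vote(outputs):
    """Vote over redundant core outputs.

    Returns (value, status):
      'agree'    - all cores produced the same result
      'masked'   - a strict majority agrees, minority errors are outvoted
      'detected' - no strict majority, the error is detected but not corrected
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count == len(outputs):
        return value, "agree"
    if count > len(outputs) / 2:
        return value, "masked"
    return None, "detected"


print(vote([7, 7]))          # DMR, no error      → (7, 'agree')
print(vote([7, 9]))          # DMR, one error     → (None, 'detected')
print(vote([7, 7, 9]))       # TMR, one error     → (7, 'masked')
print(vote([7, 7, 7, 9]))    # QMR, one error     → (7, 'masked')
print(vote([7, 7, 9, 9]))    # QMR, two errors    → (None, 'detected')
```

Under these assumptions DMR can only detect a mismatch, TMR masks a single faulty core, and QMR masks a single faulty core while still detecting a two-against-two split.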


### Case study: Self-adaptive quad-core system






- De-stress mode
  - One or more cores operate, while others are switched off
- High performance mode
  - All cores operate in parallel, i.e. execute different tasks
- Fault tolerant mode
  - Dual Modular Redundancy (DMR)
  - Triple Modular Redundancy (TMR)
  - Quadruple Modular Redundancy (QMR)






- Self-adaptive mode switching: the system configures the lowest level of core-level redundancy that is sufficient for the SER currently measured by the detector
- Power consumption over one year is lower than with a fixed DMR, TMR or QMR configuration

| SER (upsets/(bit·day))                   | Operating mode               | Duration per year* (hours) |
|------------------------------------------|------------------------------|----------------------------|
| < 10<sup>-8</sup>                        | High-Performance / De-Stress | 5460                       |
| 10<sup>-8</sup> to 10<sup>-7</sup>       | DMR                          | 3120                       |
| 10<sup>-7</sup> to 10<sup>-6</sup>       | TMR                          | 162                        |
| > 10<sup>-6</sup>                        | QMR                          | 18                         |

\*: SERs under different solar conditions merged into a one-year average
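The threshold table above maps directly onto a small mode-selection routine. The Python sketch below is illustrative only: the function name `select_mode` is hypothetical, and assigning a value that falls exactly on a threshold to the more protective mode is an assumption, since the table does not define the boundary cases.

```python
# (upper SER bound in upsets/(bit*day), operating mode), taken from the table above
MODE_THRESHOLDS = [
    (1e-8, "High-Performance / De-Stress"),
    (1e-7, "DMR"),
    (1e-6, "TMR"),
]

# Duration per year in hours, taken from the table above
HOURS_PER_YEAR = {
    "High-Performance / De-Stress": 5460,
    "DMR": 3120,
    "TMR": 162,
    "QMR": 18,
}


def select_mode(ser):
    """Pick the lowest redundancy level that covers the measured SER."""
    for upper_bound, mode in MODE_THRESHOLDS:
        if ser < upper_bound:     # a value on the boundary moves to the next mode
            return mode
    return "QMR"


for ser in (5e-9, 5e-8, 5e-7, 5e-6):
    print(f"SER = {ser:.0e} -> {select_mode(ser)}")

# The four durations cover a full year of 365 days
print(sum(HOURS_PER_YEAR.values()))   # → 8760
```

With these durations the system spends about 62 % of the year without core-level redundancy and only 18 hours in QMR, which is consistent with the statement that the self-adaptive configuration consumes less power than a fixed redundant one.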





[Figure: power consumption in one year]



## 6. Summary

- Self-adaptive multiprocessing provides a trade-off between performance, power consumption and fault tolerance
- Particle detection is a key requirement for self-adaptive fault tolerance in space
- Existing particle detectors cannot provide optimal performance for on-chip particle detection
- Two alternative solutions are proposed
  - Embedded SRAM detector:
    - Negligible hardware and power overhead due to the use of existing on-chip resources
  - <u>Pulse stretching detector</u>:
    - Detection of particle flux and LET variations with a purely digital readout circuit



# Thank you for your attention!

Marko Andjelkovic

IHP – Innovations for High Performance Microelectronics Im Technologiepark 25 15236 Frankfurt (Oder) Germany Phone: +49 (0) 335 5625 527 Fax: +49 (0) 335 5625 413 Email: andjelkovic@ihp-microelectronics.com

www.ihp-microelectronics.com



https://elicsir.elfak.rs/


