CLASSY-FI: Cross-Layer Application-Specific Synthesis and Analysis of Fault Injection

Since the physical causes of soft errors were identified in the 1970s, the sensitivity of circuits to soft errors has been increasing as supply voltages and structure sizes shrink. Functional safety standards, such as ISO 26262-9, demand explicit measures to assess (and, if necessary, mitigate) the effect of soft errors on safety and robustness. This is commonly done through extensive fault injection (FI) experiments on the target system: the experiments mimic the effects of transient faults by changing logic signals and then observe the system’s behavior with respect to its functional specification.

Logic faults can be injected on the pin, flip-flop, ISA, or even program level, and it is an open question which level is “best” to assess a system’s robustness. Higher levels provide higher fault-injection efficiency, but lower levels are closer to the physical reality, and various researchers have shown that the level at which the commonly assumed single event upsets (SEUs) are injected can considerably affect the results. In general, the lower the injection level (e.g., flip-flop level vs. ISA level), the more precisely we mimic the effects of real SEUs in the hardware.
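
To make the notion of an ISA-level injection concrete, the following minimal sketch runs an exhaustive single-bit-flip campaign on a toy register machine. The machine, its three-instruction program, and the names `PROGRAM`, `INITIAL`, `run`, and `campaign` are all assumptions made for illustration; none of this is part of the CLASSY-FI tooling. Each experiment flips one register bit before one instruction and compares the output with the fault-free ("golden") run.

```python
from collections import Counter
from itertools import product

# Hypothetical toy machine: four 8-bit registers, r0 holds the program output.
PROGRAM = [
    ("mov", 2, 0, 0),   # r2 = r0
    ("add", 0, 1, 2),   # r0 = r1 + r2
    ("and", 0, 0, 3),   # r0 = r0 & r3
]
INITIAL = [3, 4, 0, 0x0F]
WIDTH = 8


def run(fault=None):
    """Execute PROGRAM; fault = (step, reg, bit) flips one bit before `step`."""
    regs = list(INITIAL)
    for step, (op, dst, a, b) in enumerate(PROGRAM):
        if fault is not None and fault[0] == step:
            regs[fault[1]] ^= 1 << fault[2]   # the single event upset
        if op == "mov":
            regs[dst] = regs[a]
        elif op == "add":
            regs[dst] = (regs[a] + regs[b]) & 0xFF
        elif op == "and":
            regs[dst] = regs[a] & regs[b]
    return regs[0]


def campaign():
    """Inject every (step, register, bit) fault and classify the outcome."""
    golden = run()
    outcomes = Counter()
    for fault in product(range(len(PROGRAM)), range(len(INITIAL)), range(WIDTH)):
        # "sdc" = silent data corruption: the output differs from the golden run.
        outcomes["benign" if run(fault) == golden else "sdc"] += 1
    return golden, outcomes


golden, outcomes = campaign()
print(golden, dict(outcomes))
```

Even in this tiny example, the fault space grows as steps × registers × bits, and many faults are benign because the affected register is overwritten or masked before it influences the output. A flip-flop-level campaign would additionally have to cover pipeline and control state that this register-only model cannot express.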

The goal of the CLASSY-FI project is to derive constructive methods and techniques for scalable, yet precise and complete FI to experimentally assess the robustness of safety-critical embedded control systems against soft errors. The key idea behind the CLASSY-FI method is an application-specific cross-layer data-flow analysis, by which we consider the program–hardware-specific fault-propagation structure systematically on different levels. This fosters a hybrid approach for FI on multiple levels: virtually cover all faults in the lower-level fault space (→ precision and completeness), but do the actual injections in the higher-level fault space (→ scalability) whenever possible and semantically equivalent. Otherwise inject on the lower level, supported by problem-specific hardware-assisted fault injection (HAFI) techniques.

Technically, CLASSY-FI aims at a methodology for automatically generating application–hardware-specific fault-space (FS) constructions and FI implementations that are highly tailored to the actual system under test. Scientifically, we thereby provide new insights into the questions: (1) How well (quantitatively) does precise ISA-level FI cover precise lower-level FI, also with respect to single flip-flop (FF) faults that evolve to ISA-level multi-bit errors? (2) Is it feasible to reach full fault-space coverage on FF level (for reasonably sized systems) by an application-specific multi-layer fault-space analysis and automatic derivation of campaign-tailored hybrid FI platforms? (3) What is the influence of the μ-architecture on ISA-level FI coverage, and to what degree can ISA-level FI results be reused in case of incremental evolution of the μ-architecture?

People

Oskar Pusz (M.Sc.)

Tim-Marek Thomas (M.Sc.)

Prof. Dr.-Ing. Christian Dietrich

Apl. Prof. Dr.-Ing. Guillermo Payá Vayá

Prof. Dr.-Ing. habil. Daniel Lohmann

Latest News

2023-10-31 Checkpoint Placement for Systematic Fault-Injection Campaigns at ICCAD

Tim-Marek Thomas presents Checkpoint Placement for Systematic Fault-Injection Campaigns at the 42nd International Conference on Computer-Aided Design (ICCAD '23) in San Francisco, CA, USA. In the paper we present a new approach to reduce the forwarding phase in fault-injection campaigns by the clever placement of checkpoints. Compared to the classical static placement of checkpoints, this reduces the forwarding time by 88–99 percent. The paper is related to our CLASSY-FI project.
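
The forwarding phase mentioned above can be illustrated with a simple cost model: every experiment restores the nearest checkpoint at or before its injection time and simulates forward from there. The sketch below only shows this cost model. The function names, the example injection times, and in particular the quantile-based placement heuristic are assumptions chosen for illustration and are not the placement algorithm from the paper.

```python
from bisect import bisect_right


def forwarding_cost(checkpoints, injection_times):
    """Total cycles simulated from the nearest earlier checkpoint to each injection."""
    cps = sorted(set(checkpoints) | {0})      # the reset state at cycle 0 is always available
    total = 0
    for t in injection_times:
        start = cps[bisect_right(cps, t) - 1]  # latest checkpoint with time <= t
        total += t - start
    return total


def equidistant(n, end):
    """Classical static placement: n checkpoints at a fixed interval."""
    return [i * end // n for i in range(n)]


def at_quantiles(n, injection_times):
    """Illustrative stand-in heuristic: checkpoints at quantiles of the injection times."""
    ts = sorted(injection_times)
    return [ts[i * len(ts) // n] for i in range(n)]


# Hypothetical campaign: injections cluster in two short program phases.
times = [900, 905, 910, 915, 5000, 5005, 5010, 5015]
static_cost = forwarding_cost(equidistant(4, 10_000), times)
placed_cost = forwarding_cost(at_quantiles(4, times), times)
print(static_cost, placed_cost)
```

The model makes the underlying intuition visible: when injection times are not uniformly distributed over the run, checkpoints that follow the injection plan waste far fewer forwarded cycles than checkpoints at a fixed interval.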

2021-06-22 Data-Flow–Sensitive Fault-Space Pruning for the Injection of Transient Hardware Faults at LCTES '21

Oskar Pusz presents Data-Flow–Sensitive Fault-Space Pruning for the Injection of Transient Hardware Faults at the Conference on Languages, Compilers and Tools for Embedded Systems (LCTES '21).

In the paper, we describe Data-Flow–Sensitive Fault-Space Pruning (DFP), a new pruning method that is precise and fault-space–complete and that extends def/use pruning by also considering the instructions’ semantics when deriving fault-equivalence sets. In our experimental evaluation, this reduces the number of necessary injections by up to 18 percent compared to def/use pruning.
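
For readers unfamiliar with the baseline, the following sketch shows plain def/use pruning for a single fault location (for example, one register bit). It is not an implementation of DFP. The trace format, the time model (a fault at time t is injected before instruction t executes), and the function name `def_use_prune` are assumptions made for this example. All faults between one access and the next read are first observed at that read, so they form one equivalence class that needs a single experiment; faults before a write, or after the last access, are overwritten or never observed and need no experiment.

```python
def def_use_prune(trace, end):
    """Baseline def/use pruning for one fault location.

    trace: list of (time, kind) accesses sorted by time, kind is "R" or "W".
    end:   number of time steps in the fault-free run.
    Returns (classes, benign): classes is a list of (pilot_time, weight),
    one experiment each; benign counts points that need no experiment.
    """
    classes = []
    benign = 0
    prev = -1                         # time of the previous access
    for t, kind in trace:
        weight = t - prev             # injection times prev+1 .. t
        if kind == "R":
            classes.append((t, weight))   # inject once, right before the read
        else:
            benign += weight              # overwritten before any read
        prev = t
    benign += end - 1 - prev          # after the last access: never observed
    return classes, benign


# Hypothetical access trace of one register over a 12-step run.
trace = [(2, "W"), (5, "R"), (6, "R"), (9, "W")]
classes, benign = def_use_prune(trace, end=12)
print(classes, benign)
```

The weights record how many raw fault-space points each experiment stands for, so results can still be reported relative to the full fault space. A data-flow-sensitive method such as DFP goes further by merging or discarding classes based on what the reading instruction does with the value, which this sketch does not attempt.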

DFP is the core element on the ISA level of our research project CLASSY-FI.

The source code and evaluation artifacts are available here: Source Code and Evaluation Data for the Paper: Data-Flow–Sensitive Fault-Space Pruning for the Injection of Transient Hardware Faults.

Publications

SAFECOMP Conference B ACTOR: Accelerating Fault Injection Campaigns using Timeout Detection based on Autocorrelation

Tim-Marek Thomas, Christian Dietrich, Oskar Pusz, Daniel Lohmann. 41st International Conference on Computer Safety, Reliability and Security (SAFECOMP 2022). Springer-Verlag, 2022.
DOI: 10.1007/978-3-031-14835-4_17

SAFECOMP Conference B SailFAIL: Model-Derived Simulation-Assisted ISA-Level Fault-Injection Platforms

Christian Dietrich, Malte Bargholz, Yannick Loeck, Marcel Budoj, Luca Nedaskowskij, Daniel Lohmann. 41st International Conference on Computer Safety, Reliability and Security (SAFECOMP 2022). Springer-Verlag, 2022.
DOI: 10.1007/978-3-031-14835-4_14

LCTES Conference A Data-Flow–Sensitive Fault-Space Pruning for the Injection of Transient Hardware Faults

Oskar Pusz, Christian Dietrich, Daniel Lohmann. Proceedings of the 2021 ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES '21). ACM Press, 2021.
DOI: 10.1145/3461648.3463851

LCTES Artifact A Source Code and Evaluation Data for the Paper: Data-Flow–Sensitive Fault-Space Pruning for the Injection of Transient Hardware Faults

PRDC Conference B Program-Structure–Guided Approximation of Large Fault Spaces

Oskar Pusz, Daniel Kiechle, Christian Dietrich, Daniel Lohmann. 2019 24th Pacific Rim International Symposium on Dependable Computing (PRDC'19). IEEE Computer Society Press, 2019.
DOI: 10.1109/PRDC47002.2019.00044

DAC Conference A Cross-Layer Fault-Space Pruning for Hardware-Assisted Fault Injection

Christian Dietrich, Achim Schmider, Oskar Pusz, Guillermo Payá-Vayá, Daniel Lohmann. Proceedings of the 55th Annual Design Automation Conference 2018 (DAC '18). ACM Press, 2018.
DOI: 10.1145/3195970.3196019

Theses

Finished Theses

Data-Flow Analysis for Fault-Equivalence Set Forming on the ISA Layer

Type: Bachelor’s thesis
Status: finished
Supervisors: Oskar Pusz, Christian Dietrich, Daniel Lohmann
Author: Zena Obeidi (submitted: 01 Mar 2019)

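Schotbruch: Automatisierte Ableitung von Injektionsplattformen für transiente Hardwarefehler aus formalen Prozessormodellen (Automated Derivation of Injection Platforms for Transient Hardware Faults from Formal Processor Models)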

Use the SAIL language to integrate ISA implementations into a fault-injection framework. Different CPU architectures shall be evaluated for reliability.

Type: Master’s thesis
Status: finished
Supervisors: Christian Dietrich, Daniel Lohmann
Author: Marcel Budoj (submitted: 08 May 2019)

Acceleration of Fault-Injection Campaigns through Early Timeout Detection

Developing methods to avoid unnecessary fault-injection campaign run time

Type: Master’s thesis
Status: finished
Supervisors: Oskar Pusz, Daniel Lohmann
Author: Felix Siegel (submitted: 22 May 2020)

Formalizing the Execution Semantics of the AVR Instruction Set with the Description Language SAIL

Implementing the AVR-processor instruction-set architecture in SAIL for generating emulators automatically.

Type: Bachelor’s thesis
Status: finished
Supervisors: Christian Dietrich, Oskar Pusz, Daniel Lohmann
Author: Luca Nedaskovskij (submitted: 16 Oct 2020)

Transient-Fault Resilience of a Capability-enabled Processor Platform

Integration of SAIL-based MIPS and CHERI emulators into the FAIL* fault-injection tool and quantitative fault-resilience comparison.

Type: Master’s thesis
Status: finished
Supervisors: Christian Dietrich, Daniel Lohmann
Author: Malte Bargholz (submitted: 01 Nov 2020)

Design and Implementation of Benchmarks for Systematic Fault Injection

Awesome benchmarks for awesome fault injection methods.

Type: Bachelor’s thesis
Status: finished
Supervisors: Oskar Pusz, Daniel Lohmann
Author: Jannis Bujak (submitted: 02 Mar 2021)

Pruning of Soft-Error Fault Spaces by Dynamic Register-Usage Tracing in a Formal Instruction-Set Model

In this thesis, the SAIL compiler should be extended so that the C emulator records all dynamic reads and writes of the state registers. This information should then be integrated into the FAIL* toolchain so that faults are injected only into those state registers that are actually used by a given executed instruction.

Type: Master’s thesis
Status: finished
Supervisors: Christian Dietrich, Daniel Lohmann
Author: Yannick Loeck (submitted: 26 May 2021)

Design and Implementation of an Early Timeout-Detection Mechanism for Systematic Fault-Injection Campaigns

Avoiding unnecessary fault-injection campaign run time

Type: Master’s thesis
Status: finished
Supervisors: Oskar Pusz, Daniel Lohmann
Author: Tim-Marek Thomas (submitted: 22 Oct 2021)

Leveraging Application-Specific Knowledge to Guide Statistical Fault Injection

Awesome sampling methods for fast fault injection campaigns.

Type: Bachelor’s/Master’s thesis
Status: finished
Supervisors: Tim-Marek Thomas, Daniel Lohmann