CLASSY-FI: Cross-Layer Application-Specific Synthesis and Analysis of Fault Injection
Since the physical causes of soft errors were identified in the 1970s, the sensitivity of circuits to soft errors has been increasing due to shrinking supply voltages and structure sizes. Functional safety standards, such as ISO 26262-9, demand explicit measures to assess (and, if necessary, mitigate) the effect of soft errors on safety and robustness. This is commonly done by performing extensive fault injection (FI) experiments on the target system: The experiments mimic the effects of transient faults by changing logic signals, and the system’s behavior is then observed with respect to its functional specification.
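As a minimal illustration of such an FI experiment, the following sketch flips every single bit of a toy program's input data and compares each faulty run against a fault-free ("golden") run. The workload `program`, the helper names, and the two outcome classes are assumptions made for this example only; they are not part of the CLASSY-FI tooling.

```python
from collections import Counter

def program(data):
    """Toy workload standing in for the system under test (hypothetical)."""
    return max(data)

def inject_and_run(data, index, bit):
    """Flip one bit of one data word, then run the workload on the faulty copy."""
    faulty = list(data)
    faulty[index] ^= 1 << bit
    return program(faulty)

def campaign(data, bits_per_word=8):
    """Inject every single-bit fault once and classify the outcome."""
    golden = program(data)  # fault-free reference result
    outcomes = Counter()
    for index in range(len(data)):
        for bit in range(bits_per_word):
            result = inject_and_run(data, index, bit)
            # "benign": the fault was masked; "sdc": silent data corruption
            outcomes["benign" if result == golden else "sdc"] += 1
    return outcomes

outcomes = campaign([3, 200, 7])
print(outcomes)
```

Even in this tiny example, the number of experiments grows with the product of state size and injection points, which is why real campaigns need fault-space pruning.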
Logic faults can be injected on the pin, flip-flop, ISA, or even program level, and it is an open question which level is “best” for assessing a system’s robustness: Higher levels provide higher fault-injection efficiency, but lower levels are closer to the physical reality, and various researchers have shown that the level at which the commonly assumed single event upsets (SEUs) are injected can have a considerable impact on the results. In general, the lower the injection level (e.g., flip-flop level vs. ISA level), the more precisely we mimic the effects of real SEUs in the hardware.
The goal of the CLASSY-FI project is to derive constructive methods and techniques for scalable, yet precise and complete FI to experimentally assess the robustness of safety-critical embedded control systems against soft errors. The key idea behind the CLASSY-FI method is an application-specific cross-layer data-flow analysis, by which we consider the program–hardware-specific fault-propagation structure systematically on different levels. This fosters a hybrid approach for FI on multiple levels: Virtually cover all faults in the lower-level fault space (→ precision and completeness), but do the actual injections in the higher-level fault space (→ scalability) whenever possible and semantically equivalent. Otherwise inject on the lower level, supported by problem-specific hardware-assisted fault injection (HAFI) techniques.
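The hybrid idea can be sketched as a simple campaign planner: every flip-flop-level fault is either mapped to a semantically equivalent ISA-level fault (and injected there) or scheduled for lower-level injection. This is only a sketch under strong assumptions. The types `FFFault` and `ISAFault`, the function `plan_campaign`, and in particular the equivalence rule `toy_mapping` are hypothetical; in the project, the equivalence would come from the cross-layer data-flow analysis, not from a name-based rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FFFault:
    flip_flop: str   # name of the flip-flop in the netlist (hypothetical naming)
    cycle: int       # clock cycle of the injection

@dataclass(frozen=True)
class ISAFault:
    register: str
    bit: int
    instruction_index: int

def plan_campaign(ff_faults, to_isa_equivalent):
    """Split the low-level fault space into ISA-level and HAFI injections."""
    isa_plan = {}    # ISA-level fault -> flip-flop faults it covers
    hafi_plan = []   # flip-flop faults without an ISA-level equivalent
    for fault in ff_faults:
        equivalent = to_isa_equivalent(fault)
        if equivalent is None:
            hafi_plan.append(fault)
        else:
            isa_plan.setdefault(equivalent, []).append(fault)
    return isa_plan, hafi_plan

def toy_mapping(fault):
    """Assumed rule: only register-file flip-flops are visible on ISA level.
    Also assumes one instruction per cycle, which real pipelines violate."""
    if fault.flip_flop.startswith("rf."):
        reg, bit = fault.flip_flop[3:].split("[")
        return ISAFault(reg, int(bit.rstrip("]")), fault.cycle)
    return None

faults = [FFFault("rf.x5[3]", 10), FFFault("pipe.ex_valid", 10), FFFault("rf.x6[0]", 11)]
isa_plan, hafi_plan = plan_campaign(faults, toy_mapping)
print(len(isa_plan), "ISA-level injections,", len(hafi_plan), "HAFI injections")
```

All three flip-flop faults remain covered, but only the one without an ISA-level equivalent has to be injected on the slower, lower level.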
Technically, CLASSY-FI aims at a methodology for the automatic generation of application–hardware-specific fault space (FS) construction and FI implementations that are highly tailored towards the actual system under test. Scientifically, we thereby provide new insights into the questions: (1) How well (quantitatively) does precise ISA-level FI cover precise lower-level FI, also with respect to single flip-flop (FF) faults that evolve to ISA-level multi-bit errors? (2) Is it feasible to reach full fault-space coverage on FF level (for reasonably sized systems) by an application-specific multi-layer fault-space analysis and automatic derivation of campaign-tailored hybrid FI platforms? (3) What is the influence of the μ-architecture on ISA-level FI coverage, and to what degree can FI results be reused when the μ-architecture evolves incrementally?
Oskar Pusz presents our paper Program-Structure–Guided Approximation of Large Fault Spaces at the 24th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC '19) in Kyoto, Japan. In the paper, we describe an approach that uses program-structure information to reduce the number of required fault injections while still aiming for full fault-space coverage. Results show that injections can be reduced by up to 76 percent with a deviation of less than 2.7 percent, and that the locality of the results regarding silent data corruptions is preserved with a low deviation.
Program-Structure–Guided Approximation of Large Fault Spaces
2019 24th Pacific Rim International Symposium on Dependable Computing (PRDC '19), IEEE Computer Society Press, 2019.
Cross-Layer Fault-Space Pruning for Hardware-Assisted Fault Injection
Proceedings of the 55th Annual Design Automation Conference 2018 (DAC '18), ACM Press, 2018.
Supervisors: Christian Dietrich, Daniel Lohmann
Student: Yannick Loeck
In this thesis, the SAIL compiler should be extended so that the generated C emulator records all dynamic reads and writes to these state registers. This information should then be integrated into the FAIL* toolchain so that faults are injected only into those state registers that are actually used by a given executed instruction.
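The intended pruning can be sketched as follows, assuming the emulator produced a trace with one read set and one write set per executed instruction. The trace format, the `Access` type, the register names and widths, and the choice to treat both read and written registers as "used" are assumptions for illustration; they do not reflect the actual SAIL or FAIL* interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Access:
    """State registers touched by one executed instruction (assumed trace format)."""
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

def full_fault_space(trace, bit_width):
    """Every bit of every state register at every executed instruction."""
    return [(step, reg, bit)
            for step in range(len(trace))
            for reg in sorted(bit_width)
            for bit in range(bit_width[reg])]

def pruned_fault_space(trace, bit_width):
    """Only bits of registers that the instruction at this step actually uses."""
    return [(step, reg, bit)
            for step, access in enumerate(trace)
            for reg in sorted(access.reads | access.writes)
            for bit in range(bit_width[reg])]

# Illustrative register names and widths, not taken from a specific SAIL model
bit_width = {"mstatus": 32, "mepc": 32, "mcause": 32}
trace = [
    Access(reads={"mstatus"}),
    Access(),                                   # touches no state register
    Access(reads={"mepc"}, writes={"mcause"}),
]

full = full_fault_space(trace, bit_width)
pruned = pruned_fault_space(trace, bit_width)
print(len(full), "->", len(pruned), "injection points")
```

In this toy trace, the instruction that touches no state register contributes no injection points at all, and the other two contribute only the registers they access.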