Towards Reliable Embedded Systems: Accelerating Systematic Fault-Injection Campaigns using Checkpoint State Comparison

Fault-tolerance mechanisms are commonly tested with extensive fault-injection experiments. Faults that mimic the effects of physical causes such as radiation, e.g. soft errors (bit flips), are injected into the system, and its behaviour is then observed. The number of possible injections is huge, since in principle every bit can be flipped in every cycle. Together, these injection points span the so-called fault space. To reduce the number of injections needed to assess the functional reliability of the system, one of the first steps is to determine sets of equivalent injection points, i.e. points that lead to the same system behaviour.
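One well-known way to form such equivalence sets is def/use pruning over a fault-free memory-access trace: all injections into a location between two accesses behave the same, because the fault is only observed at the next access. If that access is a read, one experiment covers the whole interval; if it is a write (or there is no further access), the fault is overwritten or never read and is known to be benign. The sketch below assumes a hypothetical trace format (`Access` with cycle, byte address and access kind) and that a fault injected at cycle t is present for any access at or after t; it is an illustration, not FAIL*'s implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Access:
    """Hypothetical trace entry of a fault-free run."""
    cycle: int   # cycle of the access (1-based)
    addr: int    # byte address
    kind: str    # "R" = read, "W" = write


def def_use_classes(trace: List[Access], total_cycles: int,
                    addr: int) -> Tuple[List[Tuple[int, int]], int]:
    """Partition the injection cycles 1..total_cycles for one byte address.

    Returns (classes, benign): each class (first, last) is an interval of
    injection cycles ending in a read, so one experiment per bit covers it;
    `benign` counts cycles whose fault is overwritten or never read again.
    """
    classes: List[Tuple[int, int]] = []
    benign = 0
    prev = 0  # cycle of the previous access to this address
    accesses = sorted((a for a in trace if a.addr == addr), key=lambda a: a.cycle)
    for acc in accesses:
        if acc.kind == "R":
            classes.append((prev + 1, acc.cycle))  # fault is read here
        else:
            benign += acc.cycle - prev             # fault is overwritten
        prev = acc.cycle
    benign += total_cycles - prev                  # no further access
    return classes, benign


trace = [Access(2, 0x10, "W"), Access(5, 0x10, "R"),
         Access(7, 0x10, "R"), Access(9, 0x10, "W"), Access(4, 0x20, "R")]
print(def_use_classes(trace, 10, 0x10))  # ([(3, 5), (6, 7)], 5)
```

For this byte, the 10 × 8 = 80 single-bit injection points collapse to 2 × 8 = 16 experiments, while the 5 × 8 = 40 points in benign intervals need no experiment at all.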

In general, every single injection requires running the program to completion and comparing its behaviour with that of a fault-free run. Doing this for the whole fault space is expensive, which slows down the development of robust and safe systems. Checkpoints are widely used to speed up the fast-forward phase, i.e. reaching the injection point, and thereby accelerate every single experiment. However, checkpoints can also be used to decide whether an injected fault is benign, i.e. whether it leads to no negative outcome. This is done by comparing the state of the current experiment run with the saved fault-free checkpoint: assuming deterministic execution, if both states are identical, the rest of the run will behave exactly like the fault-free run.
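The following sketch illustrates the idea of aborting an experiment early on a checkpoint match. It assumes a deterministic, hypothetical simulator whose complete state is a register tuple plus a memory image, a `step` function that advances it by one cycle, and fault-free checkpoints stored per cycle in a dictionary; none of these names come from FAIL*.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical simulator state: register file plus memory image."""
    registers: Tuple[int, ...]
    memory: bytes


def inject_bit_flip(state: State, addr: int, bit: int) -> State:
    """Return a copy of `state` with one memory bit inverted (a soft error)."""
    mem = bytearray(state.memory)
    mem[addr] ^= 1 << bit
    return State(state.registers, bytes(mem))


def run_experiment(state: State, start: int, end: int,
                   step: Callable[[State], State],
                   golden: Dict[int, State]) -> Tuple[str, int]:
    """Continue a faulty run from cycle `start` up to cycle `end`.

    `golden` maps cycle numbers to fault-free checkpoints. Since execution is
    assumed deterministic, a faulty state equal to a checkpoint behaves like
    the fault-free run from then on, so the fault is benign.
    """
    for cycle in range(start, end):
        if cycle in golden and state == golden[cycle]:
            return "benign", cycle   # stop early instead of running to completion
        state = step(state)
    return "completed", end          # outcome still has to be compared as usual


# Toy program: r0 += mem[1]; mem[0] = 0 (the old value of mem[0] is dead).
def step(s: State) -> State:
    return State((s.registers[0] + s.memory[1],), bytes([0, s.memory[1]]))


init = State((0,), bytes([5, 1]))
golden: Dict[int, State] = {}
s = init
for c in range(6):  # fault-free run, checkpoint at every cycle
    golden[c] = s
    s = step(s)

print(run_experiment(inject_bit_flip(init, 0, 0), 0, 5, step, golden))  # ('benign', 1)
print(run_experiment(inject_bit_flip(init, 1, 1), 0, 5, step, golden))  # ('completed', 5)
```

The flip in the dead byte `mem[0]` is detected as benign after one cycle, whereas the flip in `mem[1]` propagates into `r0` and the run has to complete. Real campaigns take checkpoints far more sparsely, and each comparison covers the entire state, which is what makes the naive approach costly.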


The goal of this thesis is to implement a checkpoint comparison mechanism in the existing fault-injection tool FAIL*. The simplest implementation would compare the whole state of the simulator (e.g. registers and memory) with the state saved in the checkpoint. Since this is expensive, the challenge is to reduce the number of required comparisons by using information collected beforehand. A possible starting point is the data-flow pruner, which already tracks which injected faults can affect which memory areas. Another part of the thesis is a thorough evaluation of the implemented mechanism using the MiBench benchmark suite.
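A minimal sketch of how previously collected data-flow information could shrink a checkpoint comparison: only the memory addresses that the injected fault may have reached are compared, while registers (a small set) are always compared. It assumes a hypothetical `may_differ` set that is a sound over-approximation, for example derived from the data-flow pruner; if that set missed an address, a real deviation could go unnoticed. The `State` type is likewise hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical simulator state: register file plus memory image."""
    registers: Tuple[int, ...]
    memory: bytes


def matches_checkpoint(current: State, golden: State,
                       may_differ: FrozenSet[int]) -> bool:
    """Restricted checkpoint comparison.

    `may_differ` is assumed to contain every memory address whose content
    can deviate from the fault-free run because of the injected fault.
    All other addresses are equal by assumption and are skipped.
    """
    if current.registers != golden.registers:
        return False
    return all(current.memory[a] == golden.memory[a] for a in may_differ)


golden = State((1, 2), bytes(16))
faulty = State((1, 2), bytes(9) + b"\x01" + bytes(6))  # deviation at address 9
print(matches_checkpoint(faulty, golden, frozenset({3, 9})))  # False
print(matches_checkpoint(golden, golden, frozenset({3, 9})))  # True
```

Here only 2 of the 16 memory bytes are inspected; for a realistic memory image the saving grows with the ratio between total memory size and the size of the fault's possible footprint.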


LCTES Conference A
Data-Flow–Sensitive Fault-Space Pruning for the Injection of Transient Hardware Faults
Oskar Pusz, Christian Dietrich, Daniel Lohmann. Proceedings of the 2021 ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES '21), ACM Press, 2021.
DOI: 10.1145/3461648.3463851

Further Information