Budget-Driven Fault Injection Campaigns for Continuous Integration Testing
- Type of thesis: Master's thesis
- Status: open
- Projects: CLASSY-FI
- Supervisors: Tim-Marek Thomas, Daniel Lohmann
Context
Testing fault tolerance mechanisms, and by extension safety-critical systems, is commonly done by performing extensive fault injection (FI) experiments: the injections mimic physical causes of soft errors (bit flips), and the system's behaviour is then observed. There are many possible injection points, in the extreme every bit in every cycle, and the size of the fault space also depends heavily on the layer of the system stack at which faults are injected. In our case, we perform ISA-level fault injection, and FAIL* [1] is the fault injection framework of our choice.
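To get a feeling for these numbers, a back-of-the-envelope calculation (all workload figures below are made-up assumptions for illustration, not measurements):

```python
# Illustrative size of the single-bit, ISA-level fault space:
# one candidate injection per (cycle, register bit) pair.
cycles = 10_000_000      # dynamic cycles of one benchmark run (assumed)
register_bits = 16 * 32  # e.g. 16 general-purpose 32-bit registers (assumed)

fault_space = cycles * register_bits
print(f"{fault_space:,} single-bit injection candidates")
```

Even after pruning, a fault space of this order cannot be covered exhaustively within the turnaround time of a CI pipeline.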
Currently, fault injection campaigns are performed on a by-need basis. However, as advocated by the community [2], performing tests early and often leads to a more reliable system. In an ideal world, testing is done continuously and from the project's start, for instance, in a continuous integration (CI) setting.
As an FI campaign that covers the whole fault space takes a long time [3], we need to shorten the campaign runtime to enable FI in a CI setting. One solution is to inject only a sample of the fault space [4]. However, as the samples are typically chosen uniformly at random, the sample might paint a misleading picture of the dependability of the program under test (PUT), especially if we want to use the results to guide developers.
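This baseline, uniform random sampling of pilots, can be sketched in a few lines (the fault-space size and the flat pilot indexing are simplifying assumptions):

```python
import random

def sample_pilots(fault_space_size: int, budget: int, seed: int = 0) -> list[int]:
    """Draw a uniform random sample of pilot indices from the fault space.
    A fixed seed keeps the campaign reproducible across runs."""
    rng = random.Random(seed)
    return rng.sample(range(fault_space_size), budget)

pilots = sample_pilots(fault_space_size=1_000_000, budget=500)
```

The sample is unbiased on average, but any single draw of 500 out of a million pilots can easily miss the few program parts that are actually SDC-prone.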
Problem
Given an injection budget, we envision a mechanism that selects the most promising candidates for silent data corruptions (SDCs), the worst-case outcome of a fault. However, predicting the probability of an SDC is a chicken-and-egg problem: if we could predict whether an injection leads to an SDC, we would not need the FI campaign in the first place.
Thus, we need another solution. The idea is based on Ebrahimi et al. [5], who build a k-ary tree to bundle FI pilots into larger chunks (multi-bit injections) to save overall campaign runtime. We could adapt this approach and combine it with knowledge about the program under test.

Goal
The main goal of this thesis is to provide a way to enable meaningful results under a tight budget of injections.
Ebrahimi et al. [5] propose to bundle FI pilots into multi-bit injections, where each multi-bit injection is a node in a k-ary tree. The leaves correspond to single-bit injections, and each node above the leaves bundles the injections of its children. In their implementation, the bundled leaves (single injections) are chosen randomly.
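A minimal sketch of this grouping scheme as we understand it; `inject` stands in for a real FAIL* campaign backend and is a hypothetical callback that returns True iff the (multi-bit) injection of the given pilots ends in an SDC:

```python
def run_campaign(pilots, inject, k=4):
    """Group-test the pilots: inject each node of a k-ary tree as one
    multi-bit experiment and split a node into k children only if its
    combined injection caused an SDC."""
    sdc_pilots, experiments = [], 0
    stack = [list(pilots)]
    while stack:
        group = stack.pop()
        experiments += 1
        if not inject(group):
            continue                        # whole group benign: subtree pruned
        if len(group) == 1:
            sdc_pilots.append(group[0])     # leaf reached: single-bit SDC found
            continue
        step = max(1, -(-len(group) // k))  # ceil(len(group) / k)
        stack.extend(group[i:i + step] for i in range(0, len(group), step))
    return sdc_pilots, experiments
```

With, say, 2 SDC-causing pilots hidden among 64 candidates and k = 4, this needs 21 experiments instead of 64, which is where the runtime savings come from.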
However, in our case, we could try to bundle them in a semantically meaningful way (based, e.g., on static analysis of the program under test). Each layer of the tree could represent a finer granularity: module (translation unit) boundaries, function boundaries, basic blocks, etc. We could then calculate the budget needed to determine, at a certain level of granularity, whether a part of the PUT is susceptible to SDCs. Based on these test results, the developer could decide whether the results for the deeper parts of a subtree are needed.
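Whether a chosen granularity fits into the budget can be estimated up front. As a sketch, under the simplifying assumption of a complete k-ary tree in which, in the worst case, every node down to the target depth must be injected:

```python
def worst_case_budget(k: int, depth: int) -> int:
    """Upper bound on the number of multi-bit injections needed for a
    verdict on every node of a complete k-ary tree down to `depth`
    (root = depth 0): sum of k**d for d in 0..depth."""
    return sum(k ** d for d in range(depth + 1))
```

For example, with k = 4 a verdict down to depth 2 (say, translation units, then functions, then basic blocks) costs at most 1 + 4 + 16 = 21 experiments; in practice, benign subtrees are pruned early and the actual cost is lower.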
In the CI setting, we could also utilize knowledge from injections for past commits/patches: if a certain subtree was not affected by SDCs in the past, we could combine its injections on the next run and only re-split them if the combined multi-bit injection leads to an SDC.
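One possible heuristic for this (our own illustration, not part of the original scheme): merge neighbouring leaf groups whose subtrees were SDC-free in the previous campaign, and keep previously affected groups split:

```python
def merge_for_next_run(groups, was_benign):
    """Fuse consecutive groups that `was_benign` (a predicate over the
    previous run's results, hypothetical here) marks as SDC-free into
    one larger multi-bit injection; affected groups stay separate so
    they are re-examined at fine granularity."""
    merged, acc = [], []
    for group in groups:
        if was_benign(group):
            acc.extend(group)           # keep growing the fused benign chunk
        else:
            if acc:
                merged.append(acc)
                acc = []
            merged.append(list(group))  # previously affected: stays split
    if acc:
        merged.append(acc)
    return merged
```

A run over the merged groups then starts with far fewer experiments, at the cost of one extra re-split whenever a patch introduces an SDC into a previously benign region.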
However, the mechanism of Ebrahimi et al. is based on the finding that single-bit injections leading to SDCs are independent of each other and, except with very low probability, do not cancel each other out. If we switch from randomly bundled injections to injections that are somehow related, we need to re-verify this property and need a model of how probable it actually is that injections mask each other.
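Such a model could start from a simple independence assumption. The toy formula below is our own illustration, not a result from the paper: it treats each of the C(n, 2) fault pairs in a bundle as masking each other independently with probability q:

```python
from math import comb

def masking_probability(n: int, q: float) -> float:
    """Probability that a bundle of n faults contains at least one
    mutually masking pair, if every pair masks independently with
    probability q: 1 - (1 - q) ** C(n, 2)."""
    return 1.0 - (1.0 - q) ** comb(n, 2)
```

For random bundles q is tiny, so the introduced error stays negligible; semantically related injections may exhibit a much larger q, which is exactly what needs to be re-verified.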
For the evaluation, several targets come to mind:
- MiBench [3] benchmark suite
  - For the first steps, a setup already exists.
- Zephyr unit tests [8]
  - Zephyr, a modern RTOS, provides a large number of tests, some of which might be suitable.
  - A benchmark framework for this will exist by the time of the evaluation.
- LibrePilot [6]
  - Quadcopter firmware with a few small unit tests.
- FSFW [7]
  - Flight Software Framework from the University of Stuttgart, used for CubeSat satellites. It also includes many unit tests, some of which might be suitable.
To summarize, the following is needed:
- Re-evaluation of the hypothesis of Ebrahimi et al.
  - With randomly grouped injections
  - For this, the original algorithm needs to be reimplemented in FAIL*
- Evaluate the design space: how can the fault space be cut into meaningful pieces?
  - On the time axis: translation units, (static) functions, (dynamic) function executions, basic blocks, ...
  - On the space axis: global variables, structs, local variables, ...
- Customize FAIL* to allow bundled injections
- Evaluate the hypothesis, i.e., measure the error introduced by the bundling
- Evaluate the gains; compare against sampling
- Utilize the CI context
  - For recurring FI campaigns, adjust the tree depending on knowledge from past runs
  - Remeasure
Topics: C++ (for FAIL*), Python (for data visualization and benchmark automation; I am open to other tooling), Docker/Podman (environment to build/run FAIL*), dependability analysis
References
- [3] Even simple benchmarks, e.g. from the MiBench benchmark suite, take several hours on a server with dozens of cores for full (def/use-pruned) fault space coverage.
- [5] FI Acceleration by Simultaneous Injection of Non-interacting Faults
- [8] https://github.com/zephyrproject-rtos/zephyr/tree/main/tests
