Acceleration of Fault-Injection Campaigns through Early Timeout Detection

Fault injection is a common approach to systematically assess the resilience of a system and the effectiveness of software-based countermeasures. It mimics either the physical causes of single-event upsets (e.g., by exposing the system to heat or radiation) or their effects (by changing logic signals). For fault injection, we use the simulation-based framework FAIL*, which extracts program traces and simulates the representative faults.

In general, every single injection consists of a run-to-completion execution whose behavior is compared with that of a fault-free execution. One possible unexpected behavior is an execution that does not terminate and therefore ends in a timeout. The time spent until the fixed timeout limit is reached is wasted, and it adds up over all possible injection points (every cycle and every single bit).
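To give a feeling for how this lost time accumulates, the following minimal sketch estimates the simulated cycles spent waiting for a fixed timeout. It is only an illustration: the function name, its parameters, and the simplification that every timed-out run costs the full timeout limit are assumptions, not part of FAIL*.

```python
def wasted_timeout_cycles(golden_cycles, bits, timeout_factor, timeout_fraction):
    """Estimate simulated cycles lost to runs that end in a fixed timeout.

    golden_cycles:    length of the fault-free run in cycles (assumed known)
    bits:             number of bits that can be flipped in each cycle
    timeout_factor:   fixed timeout limit as a multiple of golden_cycles (assumption)
    timeout_fraction: assumed share of injections that end in a timeout
    """
    # Every cycle combined with every single bit is one injection point.
    injection_points = golden_cycles * bits
    # Hard-coded threshold after which a run is classified as a timeout.
    timeout_limit = timeout_factor * golden_cycles
    # Simplification: each timed-out run is charged the full timeout limit.
    timeout_runs = round(injection_points * timeout_fraction)
    return timeout_runs * timeout_limit


if __name__ == "__main__":
    # Purely illustrative numbers, not measurements.
    print(wasted_timeout_cycles(golden_cycles=1000, bits=8,
                                timeout_factor=2, timeout_fraction=0.05))
```

Even in this toy setting, the lost cycles grow with the product of the number of injection points and the timeout limit, which is why detecting a timeout earlier pays off across a whole campaign.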

The first goal of this thesis is to develop methods that stop an execution early as soon as a timeout can be expected, instead of waiting for a hard-coded threshold. The next step is to integrate these methods into the fault-injection framework FAIL*, which should accelerate fault-injection campaigns in general.
Finally, the effects of the developed methods are to be evaluated with a focus on different application types, their resulting behaviors, and possibly the origins of those behaviors.
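
The concrete methods are left open here. As one conceivable illustration only, the sketch below stops a run as soon as a deterministic program revisits a state it has already been in, because without external input it would then repeat that cycle forever. The toy `step` functions, the state representation, and the function name are hypothetical and do not reflect the FAIL* API; a real simulator would need a compact state digest rather than a set of full states.

```python
def run_with_early_timeout(step, state, max_steps):
    """Run a deterministic step function and stop early if a state repeats.

    step(state) returns the next state, or None when the program terminates.
    Returns a tuple (outcome, steps_executed).
    """
    seen = set()
    for executed in range(max_steps):
        if state in seen:
            # Same state again: a deterministic run can never leave this cycle.
            return "timeout (early)", executed
        seen.add(state)
        state = step(state)
        if state is None:
            return "finished", executed + 1
    # Fallback: the fixed limit still applies if no repetition was observed.
    return "timeout (limit)", max_steps


def countdown(state):
    """Hypothetical fault-free program: counts down to zero, then terminates."""
    return None if state == 0 else state - 1


def stuck(state):
    """Hypothetical faulty program: oscillates between 1 and 2 forever."""
    return 2 if state == 1 else 1


if __name__ == "__main__":
    print(run_with_early_timeout(countdown, 3, 1000))
    print(run_with_early_timeout(stuck, 3, 1000))
```

In this toy setup the faulty run is classified after a handful of steps instead of after the full limit of 1000. The fixed limit is kept as a fallback for non-terminating runs that never repeat a state exactly.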

Further Reading