Fault-Space Approximation Using Return-Value Fault Injection

Due to shrinking transistor sizes and operating voltages, transient hardware faults causes by single event upsets (SEU), also called soft errors, become an emerging challenge for safety-critical systems. SEUs could appear by radiation e.g. and can be mitigated by fault tolerance mechanisms. Testing such mechanisms is nearly impossible under realistic influences (like radiation canons) because radiation is non-deterministic in general.

Testing fault tolerance mechanisms is commonly done by performing extensive fault injection (FI) experiments on a system that try to mimic either the physical causes for SEUs or their effects and then observing the system’s behaviour. The biggest advantage doing such FIs are repeatable experiment results while testing fault tolerance mechanisms.

There are two dimensions of possible fault injections: Every bit in every cycle. When and where a FI is useful for testing safety and robustness of a system is one of the main questions of FI. Evaluating all possible injections in this huge fault space is effectively impossible.

During an execution of a function, registers could be injected in its function space of the fault space. There are many opportunities for FIs again and the number of possible FIs rises the more instructions would be executed. One idea for an approximation is to generalise the register injections and inject the return value of a function only. The return value is handled as a single register holding a return value at a specific time. It should be compared if both ways (register and return-value injection) lead to the same behaviour and if this concept of approximation is useful or not while reducing the number of needed FIs. One another focus will be developing and evaluating different, sufficient, and statistically correct metrics for return-value FI.

FIs could be done by using the C++ application FAIL*. This fault injection tool is able to simulate fault injections in x86 processors and should be used/extended by the idea above.

  • H. Schirmeier, M. Hoffmann, C. Dietrich, M. Lenz, D. Lohmann, and O. Spinczyk. FAIL*: An open and versatile fault-injection framework for the assessment of software-implemented hardware fault tolerance. In Proceedings of the 11th European Dependable Computing Conference (EDCC '15), pages 245–255. IEEE Computer Society Press, Sept. 2015. PDF
  • H. Schirmeier, M. Hoffmann, R. Kapitza, D. Lohmann, and O. Spinczyk. FAIL*: Towards a versatile fault-injection experiment framework. In G. Mühl, J. Richling, and A. Herkersdorf, editors, 25th International Conference on Architecture of Computing Systems (ARCS '12), Workshop Proceedings, volume 200 of Lecture Notes in Informatics, pages 201–210. German Society of Informatics, Mar. 2012. PDF

Further Reading