Fault-Tolerant Systems

Fault Mitigation

Fault prevention and fault tolerance aim to provide the ability to deliver a service that can be trusted. Fault removal and fault forecasting aim to build confidence in that ability by justifying that the functional, dependability, and security specifications are adequate and that the system is likely to meet them.

The STF group designs, implements, and deploys approaches to meet the desired dependability attributes of computing systems.

The approaches STF uses most often include, among others, spatial redundancy (N-Modular Redundancy), error-correcting codes (ECC), functional diversification (N-Version Programming, Recovery Blocks), modelling (Markov chains), and fault injection.
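
As an illustration of spatial redundancy, the following minimal Python sketch shows the core idea of N-Modular Redundancy: the same computation runs on several replicas and a voter keeps the value returned by a strict majority. It is not taken from STF's work; the names `majority_vote`, `run_nmr`, `square`, and `faulty_square` are hypothetical, and the sketch assumes replica outputs are hashable and compared for exact equality.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def majority_vote(outputs: Sequence[T]) -> T:
    """Return the value produced by a strict majority of replicas.

    Assumption: outputs are hashable and exact equality is a valid
    agreement test (no tolerance window for floating-point results).
    """
    if not outputs:
        raise ValueError("no replica outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority needs more than half of the votes.
    if count * 2 <= len(outputs):
        raise RuntimeError("no strict majority among replica outputs")
    return value


def run_nmr(replicas: Sequence[Callable[[int], T]], x: int) -> T:
    """Run every replica on the same input and vote on the results."""
    return majority_vote([replica(x) for replica in replicas])


def square(x: int) -> int:
    """Hypothetical fault-free replica."""
    return x * x


def faulty_square(x: int) -> int:
    """Hypothetical replica with an injected fault (off-by-one result)."""
    return x * x + 1


# Triple modular redundancy (N = 3): one faulty replica is outvoted.
result = run_nmr([square, faulty_square, square], 4)
print(result)
```

With three replicas, the voter masks a single faulty output. If no value reaches a strict majority, the sketch raises an error rather than guessing, which leaves the recovery policy to the caller.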