Authors: Jeremy Boone, Sultan Qasim Khan
In the previous blog post we described a set of software-based fault injection countermeasures. However, we recognize that software-based mitigations are not a silver bullet and have several drawbacks. Though they can frustrate an attacker and reduce the reliability of an exploit attempt, a persistent attacker with sufficient resources and motivation may develop a more advanced fault injection exploit that bypasses them. In general, software-based mitigations have the following drawbacks:
- Implementation Cost: Software-based mitigations are difficult to apply manually to a code base without considerable refactoring effort. Invasive refactoring may also introduce regressions.
- Maintenance Cost: These countermeasures tend to complicate the source code and can severely reduce its maintainability.
- Incomplete Coverage: Because the addition of mitigating macros is a manual process, it is possible to overlook a critical area and unintentionally leave a gap in the coverage.
- Fragility: The mitigations are sensitive to compiler optimization settings, and an optimizing compiler may eliminate redundant comparisons, undermining the protection offered by the macros or inline functions (see the sketch after this list).
- Performance Impact: The mitigations slow code execution and increase binary size, both undesirable side effects for low-level firmware such as a boot ROM.
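To make the fragility point concrete, the following is a minimal C sketch of the kind of redundant comparison such macros perform. The macro, the check_signature function, and the fail-closed handler are hypothetical names of our own, not the mitigations from the previous post:

```c
#include <stdint.h>

extern int check_signature(const uint8_t *sig);   /* hypothetical verifier */

static void fault_detected(void) { for (;;) { } } /* fail closed: hang/reset */

/* Without 'volatile' on the value being tested, an optimizing compiler
 * may fold these two identical comparisons into one, silently removing
 * the redundancy the macro was meant to provide. */
#define REDUNDANT_CHECK(cond)                   \
    do {                                        \
        if (!(cond)) fault_detected();          \
        if (!(cond)) fault_detected();          \
    } while (0)

int verify_image(const uint8_t *sig)
{
    /* 'volatile' forces each comparison to re-load the value, so the
     * compiler must emit both checks. */
    volatile int ok = check_signature(sig);
    REDUNDANT_CHECK(ok == 1);
    return 1;
}
```

Note that volatile only addresses the compiler-elimination problem; two well-timed glitches can still skip both branches.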
For these reasons, in this final blog post we describe several alternative solutions.
Alternative Software Mitigations
Automated approaches exist to insert certain software-based fault injection mitigations at build time across an entire code base or binary. These approaches include instruction duplication, memory store verification, and some forms of control flow integrity intended to provide fault detection.
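Of these, memory store verification is the simplest to illustrate. The following is a minimal sketch of the read-back pattern an automated pass might emit for each protected store; the helper name and fail-closed handler are our own illustrations, not taken from any particular tool:

```c
#include <stdint.h>

static void fault_detected(void) { for (;;) { } } /* fail closed: hang/reset */

/* Every store is followed by a load and comparison, so a fault that
 * corrupts or suppresses the write is detected before execution continues. */
static inline void verified_store32(volatile uint32_t *addr, uint32_t val)
{
    *addr = val;
    if (*addr != val)
        fault_detected();
}
```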
Such approaches provide greater coverage than manual insertion of macros, while also potentially reducing implementation and maintenance costs. However, these automated mitigations also carry several drawbacks compared to more targeted manually applied mitigations as described in the previous post.
- Global application of fault mitigations across an entire binary tends to greatly increase binary size and reduce performance. A halving of performance and doubling of binary size is typical.
- Existing mitigations tend to defend a narrow subset of possible faults such as single instruction skips, rather than the broader range of achievable faults described in part 2 of this blog series.
- Automated mitigations cannot fix fail-open design patterns, since doing so requires understanding the intent of the code.
- Most existing automated mitigation tooling is experimental and academic in nature and may lack maturity or robustness.
Instruction Duplication
Instruction duplication involves performing each assembly operation twice in order to protect against single instruction skip faults. Naively duplicating every assembly instruction would change the behavior of the code. However, through transformations such as register remapping and instruction substitution, redundancy against single instruction skips can be achieved without changing code behavior, as sketched below.
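As a rough illustration, consider an add whose destination is also a source. The following is a hand-written sketch in C with GCC-style inline assembly, assuming a 32-bit Arm target; in practice the transformation is applied by a compiler pass, not written by hand:

```c
#include <stdint.h>

uint32_t add_duplicated(uint32_t a, uint32_t b)
{
    uint32_t result;
    /* The original 'add r1, r1, r2' is not idempotent because the
     * destination is also a source. Remapping the result to a fresh
     * register makes the add idempotent, so it can safely execute
     * twice: skipping either copy still leaves the correct value. */
    __asm__ volatile (
        "add %0, %1, %2\n\t"
        "add %0, %1, %2\n\t"
        : "=&r"(result)          /* early-clobber: distinct from inputs */
        : "r"(a), "r"(b));
    return result;
}
```

The compiler-generated code surrounding this snippet remains unprotected, which is why a real build-time pass must apply the transformation uniformly.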
The idea of instruction duplication for fault tolerance has existed for decades, and forms of it have been formally proven to resist single instruction skips. Several academic projects have automated the transformations necessary for instruction duplication at build time (1, 2, 3). However, NCC Group is not aware of any commercial or production-ready automated implementations of instruction duplication.
Instruction duplication can also sometimes defend against other faults, such as incorrect instruction evaluation, if the subsequent instruction corrects the result. However, a targeted fault that causes incorrect evaluation of the second instance of a duplicated instruction defeats the countermeasure. Instruction duplication should therefore be considered largely ineffective against fault types other than single instruction skips, and realistic fault models must account for those other fault types.
Control Flow Integrity
Control flow integrity (CFI) implemented for the purpose of fault detection aims to detect incorrect branching induced by faults. This differs from the more commonly known CFI implementations, which are intended primarily as mitigations against software vulnerabilities such as return-oriented programming or memory corruption that overwrites function pointers. Fault-detecting CFI implementations add checks at the basic block level to detect invalid control flow transfers from unexpected origins. Some implementations can also detect fault-induced invalid conditional branching through redundant checking of conditions. Both counter-based and signature-based fault-detecting CFI implementations exist; a minimal signature-based sketch follows.
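The following sketch is loosely modeled on classic software signature monitoring schemes (e.g. CFCSS). All names and signature values are illustrative, not taken from a specific implementation:

```c
#include <stdint.h>

static void fault_detected(void) { for (;;) { } } /* fail closed: hang/reset */

#define SIG_BLOCK_A 0x5AA5F00Fu   /* per-basic-block signatures, */
#define SIG_BLOCK_B 0xC33C0FF0u   /* assigned at compile time    */

/* Runtime signature; initialized to the entry block's signature. */
static volatile uint32_t g_sig = SIG_BLOCK_A;

/* On entry to a basic block, verify control arrived via a legal edge. */
static inline void cfi_enter(uint32_t expected)
{
    if (g_sig != expected)
        fault_detected();
}

/* Before a legal transfer from block 'from' to block 'to', fold the
 * XOR difference of their signatures into the runtime signature. */
static inline void cfi_transfer(uint32_t from, uint32_t to)
{
    g_sig ^= (from ^ to);
}

/* Usage: inside block A, before branching to B:
 *     cfi_transfer(SIG_BLOCK_A, SIG_BLOCK_B);
 * and at the top of block B:
 *     cfi_enter(SIG_BLOCK_B);
 */
```

If a fault redirects execution into block B without passing through a legal predecessor's cfi_transfer, g_sig holds the wrong value and the entry check fails closed.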
The performance and code size overhead of CFI depends on the size of basic blocks and the frequency of branching in the protected code. A runtime performance penalty of up to 60% is typical for fault-detecting CFI implementations, depending on the nature of the code.
CFI can detect entry into basic blocks from unexpected locations, and some forms of CFI can also detect entry into basic blocks due to faulted conditional checks. However, CFI is not a complete defense against fault injection attacks. While CFI protects branching, it does not protect operations performed within basic blocks. Faults such as invalid fetches, instruction skips, and failed writebacks can cause incorrect evaluation of instructions within basic blocks that CFI will not detect. That said, CFI can be used together with manual countermeasures that protect critical operations and calculations.
While several automated build-time fault-detecting CFI implementations have been developed by academics, NCC Group is not aware of any currently available production-ready implementations. Low-overhead CFI implementations designed to protect against software attacks are well known and already integrated into compilers such as Clang, but such implementations provide little protection against hardware fault attacks.
Hardware Mitigations
Hardware countermeasures against fault injection are designed to detect and react to glitching attempts. When a glitch is detected, it is best for security to trigger a CPU reset rather than forcing a repeat of the faulted instruction, as resetting the CPU greatly slows down an attacker's repeated attempts to glitch the same instruction (a minimal sketch of such a fail-closed response follows the list below). Noteworthy examples of hardware mechanisms that can be used for glitch detection include:
- Fast-reacting voltage monitoring or Brown-out Detection (BOD) circuitry can detect voltage glitches.
- Tunable Replica Circuits (TRCs) can detect both voltage and clock glitches, including very fast ones.
- Phase Locked Loops (PLLs) can be used to detect electromagnetic fault injection (EMFI) since EMFI tends to momentarily break a PLL out of its “locked” state.
- Comparing external clocks against a high-frequency, ring-oscillator-based internal reference clock can detect clock glitches.
- Optical sensors and wire meshes embedded within the semiconductor package can sense tamper events such as decapsulation, which may mitigate optical fault injection.
- Shadow registers can improve fault resiliency through data redundancy.
- Hardware-based pointer authentication can mitigate certain classes of faults that influence software’s control-flow.
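To illustrate the "reset, don't retry" response mentioned above, the following sketch shows a glitch-detector interrupt handler that immediately forces a full CPU reset. The register names and addresses are entirely hypothetical; real detector and reset-controller interfaces are device specific:

```c
#include <stdint.h>

/* Hypothetical memory-mapped registers for a glitch detector (e.g. a
 * BOD or TRC) and a system reset controller. */
#define GLITCH_STATUS  (*(volatile uint32_t *)0x40001000u)
#define SYS_RESET_REQ  (*(volatile uint32_t *)0x40002000u)
#define RESET_KEY      0xDEAD0001u  /* hypothetical unlock value */

/* Wired to the glitch detector's interrupt: rather than retrying the
 * faulted instruction, force a full reset so that every glitch attempt
 * costs the attacker an entire boot cycle. */
void glitch_detect_isr(void)
{
    (void)GLITCH_STATUS;        /* latch/acknowledge the detection event */
    SYS_RESET_REQ = RESET_KEY;  /* request an immediate CPU reset */
    for (;;) { }                /* never resume the faulted code path */
}
```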
Closing Remarks
At this point, we hope our readers have acquired a better understanding of fundamental fault injection concepts, how fault injection attacks can target low level firmware, and how to mitigate such attacks at both the software and hardware level.
Although software-based countermeasures have their drawbacks, they do offer valuable fault protection in limited circumstances. This is mainly achieved by increasing the difficulty of exploiting a fault injection weakness: forcing an attacker to perform multiple well-timed successful glitches, or reducing the reliability of an exploit by eliminating the predictable trigger conditions an attacker would synchronize against. However, skilled and persistent adversaries can ultimately circumvent such software-only defenses.
That said, one key advantage of software-based mitigations is that they can be applied to firmware relatively quickly, offering some degree of protection while more robust hardware-based countermeasures are developed for future product generations. This is often necessary because, while software patches can be engineered in weeks or months, hardware-based mitigations might take years to appear in a product due to lengthy semiconductor engineering timelines.
Whenever glitch mitigations are added to a product, it is critically important that the engineering effort is performed in conjunction with testing and characterization to ensure that the countermeasures have the desired effect. For example, a hardware-based BOD or TRC must be carefully tuned for all power or clock domains to ensure it cannot be circumvented. Likewise, software-based mitigations must be meticulously evaluated to ensure that all security-sensitive operations are protected by redundancy and that all externally observable events are masked by random delays.
NCC Group generally advises our clients to pursue software-based mitigations as a near-term solution, while working on hardware-based mitigations for the next generation of product.