
LeaPFRogging PFR Implementations

August 23, 2023

by NCC Group

Back in October 2022, an announcement by AMI caught my eye: AMI had contributed a product named “Tektagon Open Edition” to the Open Compute Project (OCP).

Tektagon OpenEdition is an open-source Platform Root of Trust (PRoT) solution with foundational firmware security features that detect platform firmware corruption, recover the firmware and protect firmware integrity. With its open-source code, Tektagon OpenEdition™ augments transparency, resulting in high-quality code […] 

I decided to dig in and audit the recently open sourced code. But first, some background: Tektagon is a hardware root-of-trust (HRoT) that implements Intel PFR 2.0. So… What exactly is PFR? 

Platform Firmware Resiliency 

PFR, or Platform Firmware Resiliency, is a standard defined by everyone’s favorite standards body, NIST, in SP 800-193. The specification describes guidelines that support the resiliency of platform firmware and data against destructive attacks or unauthorized changes. These security properties are upheld by a new HRoT device that implements the PFR logic. 

At its core, PFR acknowledges that in addition to the boot firmware (e.g., the BIOS), a platform contains numerous other peripheral devices which execute firmware and therefore also require integrity verification. Examples of these peripherals typically include GPUs, network cards, storage controllers, display controllers, and so on. Many of these peripherals are highly privileged (e.g., DMA capable), and so they are attractive targets for an attacker. It is important that their firmware images are protected from tampering. That is, if an attacker could compromise one of these peripherals by tampering with its firmware, they might be able to: 

  1. Achieve persistence on the platform across reboots.
  2. Pivot towards compromising other more highly privileged firmware components.  
  3. Violate multi-tenant isolation and confidentiality expectations in cloud environments. 

Although these motivations sound like they are centered around only protecting the integrity of the platform firmware and its data assets, the SP 800-193 specification also describes how PFR is crucial for protecting firmware availability. Here, availability refers to the ability to recover from corrupted flash storage, which might occur due to a failed firmware update, or perhaps, cosmic rays that cause bit flips in flash.

In the PFR specification, these security requirements appear as three guiding principles:  

  1. Protection: How authenticity and integrity of firmware and data should be upheld. 
  2. Detection: How to detect when firmware or data integrity has been violated.  
  3. Recovery: How to restore the platform to a known good state.  

This is a somewhat crowded technology space. In addition to AMI’s Tektagon product, many other vendors have created their own PFR (or PFR-like) solutions whose purpose is to help assure device firmware authenticity and availability, further complicating the already complex x86 system boot process. Examples include Microsoft’s Project Cerberus which is used in Azure, Intel PFR, Google Titan, Lattice’s Root of Trust FPGA solution, and more. 

PFR Attack Surfaces 

PFR introduces a new device, a microcontroller or FPGA, that positions itself as the man-in-the-middle on the flash memory SPI bus. By sitting on the bus, PFR chipsets can interpose all bus transactions. Whenever a device (such as the Baseboard Management Controller (BMC) or Platform Controller Hub (PCH)) reads or writes SPI flash, the PFR chipset proxies that request. This grants PFR the crucial responsibility of verifying the authenticity and integrity of all code and data that resides in the persistent storage media.

A simplified block diagram of a typical PFR solution

However, by interposing buses in this manner, PFR exposes itself to a rather large attack surface. It must read, parse, and verify various binary blobs (firmware and data) that exist in flash. Such parsing can be a tedious and delicate process. If the code is not written defensively (a challenge for even the best C programmers) then memory safety violations may arise. Another concern is race conditions such as time-of-check-time-of-use (TOCTOU) or double fetch problems. 
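To make the double-fetch concern concrete, here is a minimal sketch in C. It is not taken from Tektagon: the `fake_flash` model, `flash_read_length()`, and both `length_used_*` functions are hypothetical names invented for illustration. The model lets the "flash" contents change between reads, which is what an attacker racing the verifier would try to arrange.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_MAX 64u

static_assert(PAYLOAD_MAX <= UINT8_MAX, "length field is a single byte");

/* Hypothetical model of external flash: byte 0 holds a payload length.
 * The contents change after `tamper_after` reads to mimic an attacker
 * racing the verifier (0 disables tampering). */
typedef struct {
    uint8_t bytes[1 + PAYLOAD_MAX];
    unsigned reads;
    unsigned tamper_after;
} fake_flash;

static uint8_t flash_read_length(fake_flash *f)
{
    uint8_t len = f->bytes[0];
    f->reads++;
    if (f->tamper_after != 0 && f->reads == f->tamper_after) {
        f->bytes[0] = 0xFF; /* attacker swaps in an oversized length */
    }
    return len;
}

/* Double fetch: validates one read, then uses a second read.
 * Returns the length that would reach the copy routine (0 if rejected). */
static size_t length_used_double_fetch(fake_flash *f)
{
    if (flash_read_length(f) > PAYLOAD_MAX) {
        return 0;
    }
    return flash_read_length(f); /* unchecked second fetch */
}

/* Single fetch: the validated local copy is the value that gets used. */
static size_t length_used_single_fetch(fake_flash *f)
{
    uint8_t len = flash_read_length(f);
    return (len > PAYLOAD_MAX) ? 0 : len;
}
```

In this model the double-fetch variant can pass a length to the copy step that was never checked, while the single-fetch variant only ever uses the value it validated. Real firmware would apply the same idea to whole headers: copy them into internal RAM once, then verify and parse only that copy.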

The PFR attack surface is also expanded by the fact that it communicates with other devices via I2C or SMBus. The bus typically carries the MCTP and SPDM protocols. Without going into too much detail about these specifications, these protocols are used to:

  1. Establish a secure messaging channel between devices and IP blocks.
  2. Perform device firmware attestation.
  3. Detect and recover from TCB (Trusted Computing Base) failures.

Within the HRoT, these command handlers may accept variable length arguments, and so memory safety is again required when managing the message queues. 

So, with that in mind, I decided to jump into the recently open-sourced AMI Tektagon project and hunt for bugs. 

Vulnerability #1: I2C Command Handler 

This first vulnerability occurs in the PCH/BMC command handler. This is the same I2C communication interface that was mentioned above. Two of the command handlers violate memory safety.  

uint8_t gUfmFifoData[64]; 
uint8_t gReadFifoData[64]; 
... 
uint8_t gFifoData; 
... 
static unsigned int mailBox_index; 

uint8_t PchBmcCommands(unsigned char *CipherText, uint8_t ReadFlag) 
{ 
    byte DataToSend = 0; 
    uint8_t i = 0; 

    switch (CipherText[0]) { 
        ... 
        case UfmCmdTriggerValue: 
            if (ReadFlag == TRUE) { 
                DataToSend = get_provision_commandTrigger(); 
            } else { 
                if (CipherText[1] == EXECUTE_UFM_COMMAND) { 
                    ... 
                } else if (CipherText[1] == FLUSH_WRITE_FIFO) { 
                    memset( gUfmFifoData, 0, sizeof(gUfmFifoData)); 
                    gFifoData = 0; 
                } else if (CipherText[1] == FLUSH_READ_FIFO) { 
                    memset( gReadFifoData, 0, sizeof(gReadFifoData)); 
                    gFifoData = 0; 
                    mailBox_index = 0; 
                } 
            } 
            break; 

        case UfmWriteFIFO: 
            gUfmFifoData[gFifoData++] = CipherText[1]; 
            break; 

        case UfmReadFIFO: 
            DataToSend = gReadFifoData[mailBox_index]; 
            mailBox_index++; 
            break; 
        ...

Above, the UfmWriteFIFO command can eventually write data past the end of the gUfmFifoData[] array. This may occur if the attacker issues more than 64 UfmWriteFIFO commands in sequence without flushing the FIFO in between (a flush is requested via the UfmCmdTriggerValue command). Because gFifoData is a uint8_t, the index only wraps after 256 writes, which enables an attacker to overwrite up to 192 bytes (256 minus the 64-byte buffer) past the end of the FIFO buffer.

Similarly, the UfmReadFIFO command can read data out-of-bounds by repeated invocations of the command between FIFO flushes. This OOB data appears to be eventually disclosed in the I2C response message in DataToSend. Because mailBox_index is an unsigned int type, this would enable an attacker to disclose a significant quantity of PFR SRAM, albeit relatively slowly due to only 1 byte being exposed at a time.
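For illustration, a bounds-checked version of the two FIFO accesses might look like the sketch below. This is an assumption about what a defensive handler could do, not the change that the vendor actually shipped. The global names and the 64-byte size come from the excerpt above; `UFM_FIFO_SIZE`, `ufm_fifo_write()`, and `ufm_fifo_read()` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UFM_FIFO_SIZE 64u /* hypothetical name for the 64-byte FIFO size */

static_assert(UFM_FIFO_SIZE <= UINT8_MAX,
              "uint8_t write index must be able to represent the FIFO size");

static uint8_t gUfmFifoData[UFM_FIFO_SIZE];
static uint8_t gReadFifoData[UFM_FIFO_SIZE];
static uint8_t gFifoData;          /* write index, as in the excerpt */
static unsigned int mailBox_index; /* read index, as in the excerpt */

/* Hypothetical helper: store one byte, refusing writes once the FIFO is full. */
static bool ufm_fifo_write(uint8_t value)
{
    if (gFifoData >= UFM_FIFO_SIZE) {
        return false; /* caller must flush before writing more */
    }
    gUfmFifoData[gFifoData++] = value;
    return true;
}

/* Hypothetical helper: fetch the next byte, refusing reads past the end. */
static bool ufm_fifo_read(uint8_t *out)
{
    if (mailBox_index >= UFM_FIFO_SIZE) {
        return false; /* nothing beyond the buffer is ever disclosed */
    }
    *out = gReadFifoData[mailBox_index++];
    return true;
}
```

The UfmWriteFIFO and UfmReadFIFO cases would then call these helpers and decide how to report a `false` return on the bus. Because the check happens before the index is used or incremented, neither index can advance past the end of its buffer, whatever sequence of commands arrives.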

I estimate that these command processing vulnerabilities can be triggered in three different scenarios: 

  1. A physical attacker who tampers with the I2C bus traffic and injects PCH/BMC commands to the Tektagon device. Physical attacks can often be discounted for cloud platforms, where data centers are expected to be secured facilities. However, thought should be given to whether a given deployment is vulnerable to supply chain attacks and hardware implants, as well as malicious or compelled insiders (especially where servers are deployed in third-party data centers whose physical security is harder to monitor). 
  2. Given the prevalence of BMC vulnerabilities that have been discovered over the last several years, a more likely attack scenario is that a compromised BMC is aiming to pivot towards compromising the Tektagon device in order to undermine the platform’s PFR capabilities or to achieve persistence. 
  3. If the I2C bus happened to be a shared bus with multiple other peripherals of lesser privilege, then one could imagine a scenario where the host kernel (in the CPU) could access this bus and communicate directly with the PFR device, even if that was never the intention. 

Vulnerability #2: SPI Flash Parsing 

The next vulnerability occurs when the Tektagon firmware reads a public key from SPI flash. In the linked GitHub issue, I found and reported five instances where this same bug appears throughout the Tektagon source code, but for the sake of brevity, I will focus on just one simple example here. 

int get_rsa_public_key(uint8_t flash_id, uint32_t address, struct rsa_public_key *public_key) 
{ 
    int status = Success; 
    uint16_t key_length; 
    uint8_t  exponent_length; 
    uint32_t modules_address, exponent_address; 

    // Key Length 
    status = pfr_spi_read(flash_id, address, sizeof(key_length), &key_length); 
    if (status != Success){ 
        return Failure; 
    } 
 
    modules_address = address + sizeof(key_length); 
    // rsa_key_module 
    status = pfr_spi_read(flash_id, modules_address, key_length, public_key->modulus); 
    ... 

The code above performs two SPI flash reads. The first read operation obtains a size value (key_length) from a public key structure in flash, and the second read operation uses this key_length to obtain the RSA public key modulus.  

The bug arises due to lack of input validation. If the contents of external SPI flash were tampered with by an attacker, then key_length may be larger than expected. This length value is not validated before being passed as the size argument to the second pfr_spi_read() call, which can lead to out-of-bounds memory writes of public_key->modulus[].  

The modulus buffer is RSA_MAX_KEY_LENGTH (512) bytes in length, and in all locations where get_rsa_public_key() is called, the public_key structure is declared on the stack. Because the Zephyr build config used by Tektagon does not define CONFIG_STACK_CANARIES, such a stack-based memory corruption vulnerability would be highly exploitable. 
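The missing check is small. The sketch below shows one way to validate the length before the second read. It makes several assumptions: the `rsa_public_key` layout is a simplified stand-in, `pfr_spi_read()` is replaced by a simulated version that only mirrors the argument order seen in the excerpt, and `fake_flash_contents` and `get_rsa_modulus_checked()` are hypothetical names. It is not the fix that landed upstream.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RSA_MAX_KEY_LENGTH 512u

enum { Success = 0, Failure = 1 };

/* Simplified stand-in for the real structure. */
struct rsa_public_key {
    uint8_t  modulus[RSA_MAX_KEY_LENGTH];
    uint16_t mod_length;
};

/* Simulated flash contents; stands in for the external SPI part. */
static uint8_t fake_flash_contents[1024];

/* Stand-in that mirrors the argument order of pfr_spi_read() in the excerpt. */
static int pfr_spi_read(uint8_t flash_id, uint32_t address, uint32_t length, void *out)
{
    (void)flash_id;
    if (address > sizeof(fake_flash_contents) ||
        length > sizeof(fake_flash_contents) - address) {
        return Failure;
    }
    memcpy(out, &fake_flash_contents[address], length);
    return Success;
}

static int get_rsa_modulus_checked(uint8_t flash_id, uint32_t address,
                                   struct rsa_public_key *public_key)
{
    uint16_t key_length;

    if (pfr_spi_read(flash_id, address, sizeof(key_length), &key_length) != Success) {
        return Failure;
    }
    /* Reject attacker-controlled lengths that do not fit the destination. */
    if (key_length == 0 || key_length > sizeof(public_key->modulus)) {
        return Failure;
    }
    if (pfr_spi_read(flash_id, address + (uint32_t)sizeof(key_length), key_length,
                     public_key->modulus) != Success) {
        return Failure;
    }
    public_key->mod_length = key_length;
    return Success;
}
```

Rejecting a zero length is a design choice of this sketch rather than something the original code requires. The essential part is the upper bound against the size of the destination buffer, checked on the same local copy of `key_length` that is later passed to the read.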

Conclusion

These two vulnerabilities were extremely shallow, and I discovered them both in the same afternoon after first pulling the source code from GitHub. I am fairly certain that other vulnerabilities exist in this code.  

(As an aside, you might also be interested to know that Tektagon is based on the Zephyr RTOS, for which we published a research report a few years back, highlighting numerous vulnerabilities in both its implementation and design.) 

These bugs are great illustrations of how a “security feature” is not always a “secure feature”. Although PFR aims to improve platform security, it does so at the cost of introducing new attack surfaces. Bugs in these attack surfaces can be abused to achieve privilege escalation by the very same adversaries and threats that PFR is designed to defend against – that is, threats involving maliciously tampered SPI flash contents, and adversaries who have compromised a peripheral device and are seeking to pivot laterally to attack another device firmware. 

Think carefully about the threat model of your products, and how adding new features and attack surfaces might affect your overall security posture. As always, we recommend you perform a full assessment of any third-party firmware components before they make it into your product. This is just as true for open source as it is for proprietary code bases, and in particular, new and untested components and technologies. 

As of April 6th 2023, these vulnerabilities were fixed in commit d6d935e. No CVEs were issued by AMI. 

Disclosure Timeline 

  • Oct 25, 2022 – Initial disclosure on GitHub. 
  • Nov 3, 2022 – Response from vendor indicating that fixes are in progress. 
  • Jan 6, 2023 – NCC Group requests an update. 
  • Jan 13, 2023 – Vendor communicated a plan to release fixes by end of January. 
  • Feb 10, 2023 – NCC Group requests an update. 
  • Feb 13, 2023 – Vendor revised plan to release fixes by end of February or early March. 
  • Mar 31, 2023 – NCC Group requests another update.
  • Apr 4, 2023 – Vendor indicates the next release is planned on or before the 2nd week of April.
  • Apr 6, 2023 – Commit d6d935e reorganizes the repo. It fixes vulnerability #1 but only partially fixes vulnerability #2.
  • May 2, 2023 – NCC Group reviewed above commit and provided detailed analysis of the unfixed issues.
  • May 5, 2023 – Vendor communicated that the remaining fixes will land by May 12th.
  • May 31, 2023 – NCC Group queried the status of the fixes.
  • Jul 25, 2023 – Vendor indicated that remaining unfixed functions are dead/unused code.
  • Aug 18, 2023 – NCC Group reviewed the code to confirm that the functions are unused.
  • Aug 23, 2023 – Publication of this advisory.