earlyremoval, in the Conservatory, with the Wrench: Exploring Ghidra’s decompiler internals to make automatic P-Code analysis scripts

20 May 2022

(The version of Ghidra used in this article is 10.1.2. For the Go string recovery tool release, skip ahead to Ghostrings Release.)

Introduction

A well-known issue with reverse engineering Go programs is that the lack of null terminators in Go strings makes recovering string definitions from compiled binaries difficult. Within a compiled Go program, many of the constant string values are stored together in one giant blob, without any terminator characters built into the string data to mark where one string ends and another begins. Even a simple program that just prints “Hello world!” has over 1,500 strings in it related to the Go runtime system and other standard libraries. This can cause typical ASCII string discovery implementations, such as the one provided by Ghidra, to create false positive string definitions that are tens of thousands of characters long.

Instead of null terminated strings, Go uses a string structure that consists of a pointer and length value. Many of these string structures are created on the program’s stack at runtime, so recovering individual string start locations and length values requires analyzing the compiled machine code. There are a few existing scripts that perform this analysis by checking for certain patterns of x86-64 instructions,¹ but they miss structures created with unhandled variations of instructions that ultimately have the same effect on the stack, and they’re also restricted to a specific ISA.
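To make the pointer-plus-length layout concrete, here is a small stand-alone Java sketch. The blob contents, the offsets, and the GoStringBlob class are all made up for illustration and are not taken from a real binary.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative model only: a string blob with no terminators, sliced by (offset, length). */
final class GoStringBlob {
    // Hypothetical blob: three strings stored back to back, with no separators.
    static final byte[] BLOB =
            "Hello world!runtime errorfmt".getBytes(StandardCharsets.US_ASCII);

    /** Recover one string the way a string header describes it: start offset plus length. */
    static String slice(byte[] blob, int offset, int length) {
        return new String(blob, offset, length, StandardCharsets.US_ASCII);
    }

    /** What a scanner that only follows a run of printable bytes reports: the whole run. */
    static String naiveScan(byte[] blob, int offset) {
        int end = offset;
        while (end < blob.length && blob[end] >= 0x20 && blob[end] < 0x7f) {
            end++;
        }
        return new String(blob, offset, end - offset, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(slice(BLOB, 0, 12));   // header (0, 12)
        System.out.println(slice(BLOB, 12, 13));  // header (12, 13)
        System.out.println(naiveScan(BLOB, 0));   // all three strings fused into one
    }
}
```

Without the length values, nothing in the blob itself says where "Hello world!" stops, which is why the lengths have to be recovered from the code that builds the string structures.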

I thought this problem presented a good opportunity to leverage the Ghidra decompiler for program analysis. Ideally, the decompiler’s lifted P-Code output would simplify analysis by translating different variations of instructions that write a value to the stack into a single copy operation. With a convenient way to scan for operations that write potential string address and length values to adjacent locations on the stack, the remaining analysis step would be to check if a chunk of memory defined by those address and length values contains only printable ASCII characters. As a bonus, the analysis wouldn’t be restricted to a single ISA, thanks to the many SLEIGH specifications that translate machine code into the initial raw form of P-Code.
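The final validation step described above could look roughly like the following stand-alone sketch. The byte array stands in for a memory segment of the program, and accepting tab, newline, and carriage return alongside printable characters is my own assumption (the example string later in this article ends with a newline). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch: does a candidate (address, length) pair cover only text bytes? */
final class CandidateStringCheck {
    /** True if [addr, addr + len) lies inside the segment and holds only ASCII text. */
    static boolean looksLikeString(byte[] segment, long segmentBase, long addr, long len) {
        if (len <= 0 || addr < segmentBase) {
            return false;
        }
        long start = addr - segmentBase;
        if (start >= segment.length || len > segment.length - start) {
            return false;
        }
        for (long i = start; i < start + len; i++) {
            int b = segment[(int) i] & 0xff;
            // Assumption: common whitespace counts as text in addition to 0x20..0x7e.
            boolean ok = (b >= 0x20 && b < 0x7f) || b == '\t' || b == '\n' || b == '\r';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    /** Toy segment: a 21-byte string followed by one non-text byte. */
    static byte[] demoSegment() {
        byte[] text = "Who killed Mr. COPY?\n".getBytes(StandardCharsets.US_ASCII);
        byte[] seg = Arrays.copyOf(text, text.length + 1);
        seg[text.length] = 0x01;
        return seg;
    }

    public static void main(String[] args) {
        byte[] seg = demoSegment();
        System.out.println(looksLikeString(seg, 0x4c584aL, 0x4c584aL, 0x15)); // true
        System.out.println(looksLikeString(seg, 0x4c584aL, 0x4c584aL, 0x16)); // false
    }
}
```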

However, when I first attempted to implement this analysis, the machine instructions in question didn’t appear to produce any P-Code output at all…

The Case of the Missing P-Code Operations

Consider the following example Go program:

package main
import "fmt"

func main() {
    fmt.Printf("Who killed Mr. COPY?\n")
}

In an x86-64 ELF build of this program, the instructions that set up the string structure on the stack in main.main can be found at addresses 0x48e7c4 through 0x48e7d0:

0048e7c4        LEA        RAX,[DAT_004c584a]
0048e7cb        MOV        qword ptr [RSP + local_48],RAX=>DAT_004c584a
0048e7d0        MOV        qword ptr [RSP + local_40],0x15
0048e7d9        MOV        qword ptr [RSP + local_38],0x0
0048e7e2        XORPS      XMM0,XMM0
0048e7e5        MOVUPS     xmmword ptr [RSP + local_30[0]],XMM0
0048e7ea        CALL       fmt.Fprintf

The address of the string content, 0x4c584a, is written to the stack by the MOV instruction at 0x48e7cb. The string’s length, 0x15, is written to the stack by the MOV instruction at 0x48e7d0.

We can display the raw P-Code for each machine instruction by enabling the P-Code listing field in Ghidra’s code listing:

0048e7c4        LEA        RAX,[DAT_004c584a]
                                     RAX = COPY 0x4c584a:8
0048e7cb        MOV        qword ptr [RSP + local_48],RAX=>DAT_004c584a
                                     $U3800:8 = INT_ADD 16:8, RSP
                                     $Uc000:8 = COPY RAX
                                     STORE ram($U3800:8), $Uc000:8
0048e7d0        MOV        qword ptr [RSP + local_40],0x15
                                     $U3800:8 = INT_ADD 24:8, RSP
                                     $Uc080:8 = COPY 21:8
                                     STORE ram($U3800:8), $Uc080:8

Note that both MOV instructions are translated into an INT_ADD, COPY, and STORE operation sequence in the raw P-Code.

To get the high (analyzed) P-Code output from the decompiler, we can write a script to use the decompiler API. This script decompiles the currently selected function and prints the resulting P-Code operations to the console:

PrintHighPCode.java> Running...
PCode for function main.main @ 0048e790
...
0x48e799:0xb2   (unique, 0xc000, 8) CAST (unique, 0x1000007a, 8)
0x48e799:0xb3   (unique, 0x10000082, 8) CAST (register, 0x20, 8)
0x48e79d:0x11    ---  CBRANCH (ram, 0x48e7f9, 1) , (unique, 0x1000005f, 1)
0x48e79d:0xac   (unique, 0x1000005f, 1) BOOL_AND (register, 0x200, 1) , (register, 0x206, 1)
0x48e7ea:0x3a    ---  CALL (ram, 0x4866f0, 8)
0x48e7ea:0x8c   (ram, 0x5601b0, 8) INDIRECT (ram, 0x5601b0, 8) , (const, 0x3a, 4)
0x48e7f8:0x49    ---  RETURN (const, 0x0, 8)
0x48e7f8:0x8d   (ram, 0x5601b0, 8) COPY (ram, 0x5601b0, 8)
...
PrintHighPCode.java> Finished!

The instructions for setting up the string address and length on the stack are completely missing! In fact, all thirteen instructions leading up to the function call at 0x48e7ea are missing from the P-Code output.

Running the “Decompiler Parameter ID” analyzer provides a small hint as to what’s happening with these instructions. For reference, here’s the analyzer’s description:

Creates parameter and local variables for a Function using Decompiler.
WARNING: This can take a SIGNIFICANT Amount of Time!
         Turned off by default for large programs
You can run this later using "Analysis->Decompiler Parameter ID"

(This analyzer won’t be enabled by default unless the program is a PE binary under 2 MB in size.² A Windows build of a Go “hello world” program is just over 2 MB, so it’s unlikely to run automatically for a Go binary.)

After the analyzer finishes, some parameters are associated with the fmt.Fprintf function call in the decompiled C code view. Although the call setup is incorrect, some of the missing stack data now appears:

fmt.Fprintf(param_1,param_2,param_3,go.itab.*os.File,io.Writer,param_5,param_6,
            (long)go.itab.*os.File,io.Writer,os.Stdout,(sigaction *) DAT_004c584a,
            (sigaction *)0x15,0,(sigaction *)0x0,0);

Looking at the decompiler P-Code output again, the CALL op now has those same parameters associated with it, but the P-Code ops for the MOV instructions are still missing:

PrintHighPCode.java> Running...
PCode for function main.main @ 0048e790
...
0x48e79d:0x11    ---  CBRANCH (ram, 0x48e7f9, 1) , (unique, 0x100000d5, 1)
0x48e79d:0xcb   (unique, 0x100000d5, 1) BOOL_AND (register, 0x200, 1) , (register, 0x206, 1)
0x48e7ea:0x3a    ---  CALL (ram, 0x4866f0, 8) , (register, 0x38, 8) , (register, 0x30, 8) , (register, 0x10, 8) , (unique, 0x100000e0, 8) , (register, 0x80, 8) , (register, 0x88, 8) , (unique, 0x10000130, 8) , (ram, 0x5601b0, 8) , (unique, 0x100000d8, 8) , (const, 0x15, 8) , (const, 0x0, 8) , (const, 0x0, 8) , (const, 0x0, 8)
0x48e7ea:0xa7   (ram, 0x5601b0, 8) INDIRECT (ram, 0x5601b0, 8) , (const, 0x3a, 4)
0x48e7ea:0xce   (unique, 0x10000128, 8) PTRSUB (const, 0x0, 8) , (const, 0x4c584a, 8)
0x48e7ea:0xcf   (unique, 0x100000e0, 8) PTRSUB (const, 0x0, 8) , (const, 0x4dc7c0, 8)
0x48e7ea:0xd0   (unique, 0x100000e8, 8) PTRSUB (const, 0x0, 8) , (const, 0x4dc7c0, 8)
0x48e7ea:0xd8   (unique, 0x100000d8, 8) CAST (unique, 0x10000128, 8)
0x48e7ea:0xd9   (unique, 0x10000130, 8) CAST (unique, 0x100000e8, 8)
...
PrintHighPCode.java> Finished!

This suggests that the stack write operations are eaten up by analysis that associates stack values with function calls. This may be due to incorrect function signatures or calling convention definitions, but in the interest of performing automatic analysis, I’d prefer not to have to fix up these definitions when encountering a new Go binary, or any other type of binary that I might want to use P-Code analysis on.

So, what’s actually happening to the raw P-Code ops that represent the machine instructions we’re interested in? Is there any way to get them to appear in the decompiler output?

Simplification Styles

Looking over the DecompInterface class documentation, one method stands out:

public boolean setSimplificationStyle(java.lang.String actionstring)

This allows the application to [select] the type of analysis performed by the decompiler, by giving the name of an analysis class. Right now, there are a few predefined classes. But there soon may be support for applications to define their own class and tailoring the decompiler’s behaviour for that class.

The current predefined analysis class[es] are:

“decompile” – this is the default, and performs all analysis steps suitable for producing C code.

“normalize” – omits type recovery from the analysis and some of the final clean-up steps involved in making valid C code. It is suitable for creating normalized pcode syntax trees of the dataflow.

“firstpass” – does no analysis, but produces an unmodified syntax tree of the dataflow from the raw pcode.

“register” – does ???.

“paramid” – does required amount of decompilation followed by analysis steps that send parameter measure information for parameter id analysis.

The analysis class descriptions here are limited, but I experimented with the different simplification style options and found that the register and firstpass styles produce P-Code ops for the target MOV instructions. However, the output for these styles is also more cumbersome: the decompile style outputs 24 ops, whereas the firstpass and register styles output over 150.

Here’s how the P-Code ops for the LEA and MOV instructions that produce the string structure look with the register simplification style:

PrintHighPCode.java> Running...
PCode for function main.main @ 0048e790 (simplification style: register)
...
0x48e7c4:0x27   (register, 0x0, 8) COPY (const, 0x4c584a, 8)
0x48e7cb:0x28   (unique, 0x3800, 8) INT_ADD (register, 0x20, 8) , (const, 0xffffffffffffffb8, 8)
0x48e7cb:0x29   (unique, 0xc000, 8) COPY (const, 0x4c584a, 8)
0x48e7cb:0x2a    ---  STORE (const, 0x1b1, 4) , (unique, 0x3800, 8) , (const, 0x4c584a, 8)
0x48e7cb:0x8c   (ram, 0x5601b0, 8) INDIRECT (ram, 0x5601b0, 8) , (const, 0x2a, 4)
0x48e7d0:0x2b   (unique, 0x3800, 8) INT_ADD (register, 0x20, 8) , (const, 0xffffffffffffffc0, 8)
0x48e7d0:0x2c   (unique, 0xc080, 8) COPY (const, 0x15, 8)
0x48e7d0:0x2d    ---  STORE (const, 0x1b1, 4) , (unique, 0x3800, 8) , (const, 0x15, 8)
0x48e7d0:0x8d   (ram, 0x5601b0, 8) INDIRECT (ram, 0x5601b0, 8) , (const, 0x2d, 4)
...
PrintHighPCode.java> Finished!

This output is very similar to the raw P-Code we saw earlier in the code listing. For example, compare the raw P-Code for 0x48e7cb:

MOV    qword ptr [RSP + local_48],RAX=>DAT_004c584a
    $U3800:8 = INT_ADD 16:8, RSP
    $Uc000:8 = COPY RAX
    STORE ram($U3800:8), $Uc000:8

Although this less processed P-Code output isn’t ideal, it appears to be the only readily available option. Let’s see how difficult it is to implement an analysis script based on this simplification style.

Register Style Analysis

The STORE operations appear to be the best targets for analyzing stack writes. The string address value conveniently propagates from the LEA instruction, through RAX, to the input of the P-Code STORE op for the MOV instruction. However, it’s not immediately obvious how to map the STORE output destination “(const, 0x1b1, 4), (unique, 0x3800, 8)” to its stack pointer offset without tracking all of these “unique” space varnodes.³

Digging through the API documentation some more, we can see the VarnodeAST class has a getDef() method that links to the P-Code operation that defined the varnode (if available). Using the Eclipse debugger on the P-Code listing script, we can interactively explore the decompiler output in the Eclipse debug console. Here’s the STORE op that copies the string address to the stack again:

pcodeOpAST.toString()
     (java.lang.String)  ---  STORE (const, 0x1b1, 4) , (unique, 0x3800, 8) , (const, 0x4c584a, 8)

The STORE operation setup is a little weird; instead of having an output, it has two inputs that define the output location. Input 0 is the constant ID of the destination address space, and input 1 is the varnode containing the destination pointer offset. Here’s input 1:

pcodeOpAST.getInput(1)
     (ghidra.program.model.pcode.VarnodeAST) (unique, 0x3800, 8)

Finally, input 1 has a reference to its defining operation – the INT_ADD that calculates an offset from RSP:

pcodeOpAST.getInput(1).getDef()
     (ghidra.program.model.pcode.PcodeOpAST) (unique, 0x3800, 8) INT_ADD (register, 0x20, 8) , (const, 0xffffffffffffffb8, 8)
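The same walk from a STORE to the op that computed its pointer can be sketched without Ghidra. The Node and Op classes below are simplified stand-ins I made up for VarnodeAST and PcodeOpAST (only the def link and the input list are modeled), so this shows the traversal pattern rather than real API usage.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of following a STORE's pointer input back to its defining INT_ADD. */
final class DefChainSketch {
    /** Stand-in for a varnode: (space, offset, size) plus an optional defining op. */
    static final class Node {
        final String space;
        final long offset;
        final int size;
        Op def; // stays null for constants and function inputs

        Node(String space, long offset, int size) {
            this.space = space;
            this.offset = offset;
            this.size = size;
        }

        @Override
        public String toString() {
            return "(" + space + ", 0x" + Long.toHexString(offset) + ", " + size + ")";
        }
    }

    /** Stand-in for a P-Code op; constructing it links the output back to the op. */
    static final class Op {
        final String mnemonic;
        final Node output;
        final List<Node> inputs;

        Op(String mnemonic, Node output, Node... inputs) {
            this.mnemonic = mnemonic;
            this.output = output;
            this.inputs = Arrays.asList(inputs);
            if (output != null) {
                output.def = this;
            }
        }
    }

    /** Input 1 of a STORE is the pointer; return the INT_ADD that computed it, or null. */
    static Op pointerDefinition(Op store) {
        if (!"STORE".equals(store.mnemonic) || store.inputs.size() < 3) {
            return null;
        }
        Op def = store.inputs.get(1).def;
        return (def != null && "INT_ADD".equals(def.mnemonic)) ? def : null;
    }

    /** Rebuilds the two ops shown in the debugger session above. */
    static Op exampleStore() {
        Node rsp = new Node("register", 0x20, 8);
        Node off = new Node("const", 0xffffffffffffffb8L, 8);
        Node ptr = new Node("unique", 0x3800, 8);
        new Op("INT_ADD", ptr, rsp, off);
        return new Op("STORE", null,
                new Node("const", 0x1b1, 4), ptr, new Node("const", 0x4c584a, 8));
    }

    public static void main(String[] args) {
        Op add = pointerDefinition(exampleStore());
        System.out.println(add.mnemonic + " " + add.inputs);
    }
}
```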

Since we’d like to know that the constant is written to a location on the stack, the next question is how to map “(register, 0x20, 8)” to the stack pointer register. The Program class has a getRegister(Varnode varnode) method that returns a varnode’s corresponding Register object:

pcodeOpAST.getInput(1).getDef().getInput(0)
 (ghidra.program.model.pcode.VarnodeAST) (register, 0x20, 8)

currentProgram.getRegister(pcodeOpAST.getInput(1).getDef().getInput(0))
 (ghidra.program.model.lang.Register) RSP

The Register class has a getTypeFlags method and TYPE_SP flag constant that would apparently indicate a stack pointer type register. Unfortunately, we’ll have to rely on checking the register name string, as no type flags are set on the RSP register object:

currentProgram.getRegister(pcodeOpAST.getInput(1).getDef().getInput(0)).getTypeFlags()
 (int) 0
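A name-based fallback might be as simple as the sketch below. The list of names is an assumption on my part and is not exhaustive: RSP and sp appear in the listings in this article, and ESP is the 32-bit x86 counterpart. Other processors may use names that this check misses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Sketch: recognize a stack pointer register by name when no type flag is available. */
final class StackPointerNames {
    // Assumed, non-exhaustive set of stack pointer register names (compared in lowercase).
    private static final Set<String> NAMES =
            new HashSet<>(Arrays.asList("sp", "esp", "rsp"));

    static boolean isStackPointerName(String registerName) {
        return registerName != null
                && NAMES.contains(registerName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isStackPointerName("RSP")); // true
        System.out.println(isStackPointerName("RAX")); // false
    }
}
```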

To see if a potential address and length value are adjacent to each other on the stack, we also want to determine what the stack offset value is. The STORE destination’s defining op has always been an INT_ADD in the binaries I tested, usually with a stack pointer register as the first input and a constant offset value as the second input. However, the offset input varnode is sometimes a register instead of a constant value, which requires traversing more defining P-Code ops to attempt to recover a constant offset value that the register would hold.
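The offset constants in the register style output are printed as unsigned values, so they need to be reinterpreted as signed before comparing slots. A minimal sketch of that arithmetic follows; the class and method names are hypothetical. Decoded this way, 0xffffffffffffffb8 and 0xffffffffffffffc0 become -0x48 and -0x40, which line up with the local_48 and local_40 names in the listing and are exactly one pointer width apart.

```java
/** Sketch: turn P-Code offset constants into signed stack offsets and compare slots. */
final class StackOffsets {
    /** Interpret the low sizeBytes bytes (1 to 8) of value as a two's-complement number. */
    static long signExtend(long value, int sizeBytes) {
        int shift = 64 - 8 * sizeBytes;
        return (value << shift) >> shift;
    }

    /** True if the length slot sits exactly one pointer width above the address slot. */
    static boolean isAddressLengthPair(long addrSlot, long lenSlot, int pointerSize) {
        return lenSlot - addrSlot == pointerSize;
    }

    public static void main(String[] args) {
        long addrSlot = signExtend(0xffffffffffffffb8L, 8); // -0x48
        long lenSlot = signExtend(0xffffffffffffffc0L, 8);  // -0x40
        System.out.println(isAddressLengthPair(addrSlot, lenSlot, 8)); // true
        // A 4-byte constant, as on 32-bit ARM: 0xffffffe0 is -0x20.
        System.out.println(signExtend(0xffffffe0L, 4)); // -32
    }
}
```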

Similarly, the string address constant isn’t always propagated to the input of the STORE operation. For example, the STORE’s input varnode could be a register that the address was loaded into. This happens in a 32-bit ARM binary when the string data address is loaded into a register from the constant pool, and then copied from the register to the stack:

0009af44     ldr    r0,[PTR_DAT_0009af88]                    = 000c580b
                              r0 = LOAD ram(0x9af88:4)
0009af48     str    r0=>DAT_000c580b,[sp,#local_20]
                              $U8280:4 = INT_ADD sp, 12:4
                              STORE ram($U8280:4), r0

Here’s how the register style P-Code output looks for the above ARM instructions:

0x9af44:0x19    (register, 0x20, 4) LOAD (const, 0x1a1, 4) , (const, 0x9af88, 4)
0x9af48:0x1a    (unique, 0x8280, 4) INT_ADD (register, 0x54, 4) , (const, 0xffffffe0, 4)
0x9af48:0x1b     ---  STORE (const, 0x1a1, 4) , (unique, 0x8280, 4) , (register, 0x20, 4)

To handle a register value defined by a LOAD operation we’ll have to check if the data at the load address is recognized as a pointer:

// Register may hold an address
PcodeOp def = dataToStore.getDef();
// Check for LOAD op that loaded an address into the register,
// e.g. getting address from constant pool in ARM 32
if (def != null && def.getOpcode() == PcodeOp.LOAD) {
    int spaceId = (int) def.getInput(0).getOffset();
    long offset = def.getInput(1).getOffset();
    Address loadFrom = program.getAddressFactory().getAddress(spaceId, offset);

    Data dataLoaded = getDataAt(loadFrom);
    if (dataLoaded != null && dataLoaded.isPointer()) {
        candidateAddr = (Address) dataLoaded.getValue();
    }
}

One more API quirk to mention is that constant value varnodes store their value as an address offset in the “constant” address space.

if (varnode.isConstant()) {
    // The constant value is stored as its "address" offset
    long constVal = varnode.getAddress().getOffset();
    // ...
}

Combining this information is enough to make a working Go string analysis script based on scanning STORE operations from the register style P-Code output, but this is much clunkier than the ideal scenario I imagined. Preferably, the P-Code operations would just have a constant length or address value as the input and an address in the stack space as the output.
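As a rough outline of the pairing step only (not the actual script), the sketch below assumes the STORE scan has already been reduced to a map from signed stack offset to the constant written there. The address range and maximum length used as filters are hypothetical parameters, and each surviving candidate would still need the printable-character check on its memory contents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: pair up constants written to adjacent stack slots as (address, length). */
final class StringPairScan {
    static final class Candidate {
        final long address;
        final long length;

        Candidate(long address, long length) {
            this.address = address;
            this.length = length;
        }
    }

    /** writes maps a signed stack offset to the constant stored at that offset. */
    static List<Candidate> findPairs(Map<Long, Long> writes, int pointerSize,
                                     long minAddr, long maxAddr, long maxLen) {
        List<Candidate> out = new ArrayList<>();
        for (Map.Entry<Long, Long> e : new TreeMap<>(writes).entrySet()) {
            long addr = e.getValue();
            // The length is expected in the slot directly above the address.
            Long len = writes.get(e.getKey() + pointerSize);
            if (len == null) {
                continue;
            }
            boolean addrOk = addr >= minAddr && addr < maxAddr;
            if (addrOk && len > 0 && len <= maxLen) {
                out.add(new Candidate(addr, len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The three constant stack writes from the main.main listing.
        Map<Long, Long> writes = new TreeMap<>();
        writes.put(-0x48L, 0x4c584aL);
        writes.put(-0x40L, 0x15L);
        writes.put(-0x38L, 0x0L);
        // Hypothetical data range and length limit.
        for (Candidate c : findPairs(writes, 8, 0x4c0000L, 0x500000L, 0x1000L)) {
            System.out.println(Long.toHexString(c.address) + " len " + c.length);
        }
    }
}
```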

To figure out why P-Code ops for the target machine instructions disappear completely in the higher-level analysis styles, we’ll have to look into the Ghidra decompiler internals.

Decompiler Internals

The decompiler C++ source code is included with Ghidra in the Ghidra/Features/Decompiler/ directory. After installing the necessary build dependencies, run make doc in Ghidra/Features/Decompiler/src/decompile/cpp to generate the decompiler documentation.

The SetAction class documentation provides slightly better info on the simplification styles than “???”:

decompile – The main decompiler action

normalize – Decompilation tuned for normalization

jumptable – Simplify just enough to recover a jump-table

paramid – Simplify enough to recover function parameters

register – Perform one analysis pass on registers, without stack variables

firstpass – Construct the initial raw syntax tree, with no simplification

Searching for the simplification style names in the source code turns up two of the most important pieces of code to look at. The ActionDatabase::buildDefaultGroups method shows that these simplification styles are defined as groups of analysis rules:

/// (Re)build the default \e root Actions: decompile, jumptable, normalize, paramid, register, firstpass
void ActionDatabase::buildDefaultGroups(void)

{
  if (isDefaultGroups) return;
  groupmap.clear();
  const char *members[] = { "base", "protorecovery", "protorecovery_a", "deindirect", "localrecovery",
                "deadcode", "typerecovery", "stackptrflow",
                "blockrecovery", "stackvars", "deadcontrolflow", "switchnorm",
                "cleanup", "merge", "dynamic", "casts", "analysis",
                "fixateglobals", "fixateproto",
                "segment", "returnsplit", "nodejoin", "doubleload", "doubleprecis",
                "unreachable", "subvar", "floatprecision",
                "conditionalexe", "" };
  setGroup("decompile",members);

  const char *jumptab[] = { "base", "noproto", "localrecovery", "deadcode", "stackptrflow",
                "stackvars", "analysis", "segment", "subvar", "conditionalexe", "" };
  setGroup("jumptable",jumptab);

  const char *normali[] = { "base", "protorecovery", "protorecovery_b", "deindirect", "localrecovery",
                "deadcode", "stackptrflow", "normalanalysis",
                "stackvars", "deadcontrolflow", "analysis", "fixateproto", "nodejoin",
                "unreachable", "subvar", "floatprecision", "normalizebranches",
                "conditionalexe", "" };
  setGroup("normalize",normali);

  const char *paramid[] = { "base", "protorecovery", "protorecovery_b", "deindirect", "localrecovery",
                             "deadcode", "typerecovery", "stackptrflow", "siganalysis",
                             "stackvars", "deadcontrolflow", "analysis", "fixateproto",
                             "unreachable", "subvar", "floatprecision",
                             "conditionalexe", "" };
  setGroup("paramid",paramid);

  const char *regmemb[] = { "base", "analysis", "subvar", "" };
  setGroup("register",regmemb);

  const char *firstmem[] = { "base", "" };
  setGroup("firstpass",firstmem);
  isDefaultGroups = true;
}

To clear up some of the terminology: “simplification styles” are also referred to as “root actions” or “groups” in the decompiler source code. Each one is a collection of “base groups” such as “stackvars” or “typerecovery”, which are more fine-grained groups of specific analysis operations.

Just below buildDefaultGroups is the universal action list, which lists all analysis operations (“actions” and “rules”) in the order they should run. Each operation is associated with a base group. Operations are enabled when their base group is included in the current root action.

/// Construct the \b universal Action that contains all possible components
/// \param conf is the Architecture that will use the Action
void ActionDatabase::universalAction(Architecture *conf)

{
  vector<Rule *>::iterator iter;
  ActionGroup *act;
  ActionGroup *actmainloop;
  ActionGroup *actfullloop;
  ActionPool *actprop,*actprop2;
  ActionPool *actcleanup;
  ActionGroup *actstackstall;
  AddrSpace *stackspace = conf->getStackSpace();

  act = new ActionRestartGroup(Action::rule_onceperfunc,"universal",1);
  registerAction(universalname,act);

  act->addAction( new ActionStart("base"));
  act->addAction( new ActionConstbase("base"));
  act->addAction( new ActionNormalizeSetup("normalanalysis"));
  act->addAction( new ActionDefaultParams("base"));
  //  act->addAction( new ActionParamShiftStart("paramshift") );
  act->addAction( new ActionExtraPopSetup("base",stackspace) );
  act->addAction( new ActionPrototypeTypes("protorecovery"));
  act->addAction( new ActionFuncLink("protorecovery") );
  act->addAction( new ActionFuncLinkOutOnly("noproto") );
  //... snip ...
  act->addAction( new ActionDynamicSymbols("dynamic") );
  act->addAction( new ActionNameVars("merge") );
  act->addAction( new ActionSetCasts("casts") );
  act->addAction( new ActionFinalStructure("blockrecovery") );
  act->addAction( new ActionPrototypeWarnings("protorecovery") );
  act->addAction( new ActionStop("base") );
}
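The relationship between root actions and base groups can be modeled with a simple filter. The sketch below copies four of the group lists and only the actions visible in the excerpt above. It is a simplified model of the enabling logic as described here, not a port of the decompiler's own implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: which universal actions are enabled for a given root action. */
final class RootActionSketch {
    static final Map<String, Set<String>> ROOT_ACTIONS = new HashMap<>();
    static {
        ROOT_ACTIONS.put("firstpass", set("base"));
        ROOT_ACTIONS.put("register", set("base", "analysis", "subvar"));
        ROOT_ACTIONS.put("jumptable", set("base", "noproto", "localrecovery", "deadcode",
                "stackptrflow", "stackvars", "analysis", "segment", "subvar",
                "conditionalexe"));
        ROOT_ACTIONS.put("decompile", set("base", "protorecovery", "protorecovery_a",
                "deindirect", "localrecovery", "deadcode", "typerecovery", "stackptrflow",
                "blockrecovery", "stackvars", "deadcontrolflow", "switchnorm", "cleanup",
                "merge", "dynamic", "casts", "analysis", "fixateglobals", "fixateproto",
                "segment", "returnsplit", "nodejoin", "doubleload", "doubleprecis",
                "unreachable", "subvar", "floatprecision", "conditionalexe"));
    }

    // { action class, base group }, in the order shown in the excerpt (snipped part omitted).
    static final String[][] UNIVERSAL = {
        {"ActionStart", "base"}, {"ActionConstbase", "base"},
        {"ActionNormalizeSetup", "normalanalysis"}, {"ActionDefaultParams", "base"},
        {"ActionExtraPopSetup", "base"}, {"ActionPrototypeTypes", "protorecovery"},
        {"ActionFuncLink", "protorecovery"}, {"ActionFuncLinkOutOnly", "noproto"},
        {"ActionDynamicSymbols", "dynamic"}, {"ActionNameVars", "merge"},
        {"ActionSetCasts", "casts"}, {"ActionFinalStructure", "blockrecovery"},
        {"ActionPrototypeWarnings", "protorecovery"}, {"ActionStop", "base"},
    };

    private static Set<String> set(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Keep the actions whose base group belongs to the chosen root action. */
    static List<String> enabled(String rootAction) {
        Set<String> groups = ROOT_ACTIONS.get(rootAction);
        List<String> out = new ArrayList<>();
        for (String[] entry : UNIVERSAL) {
            if (groups != null && groups.contains(entry[1])) {
                out.add(entry[0]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("register:  " + enabled("register"));
        System.out.println("decompile: " + enabled("decompile"));
    }
}
```

In this model the register style leaves out everything in the “protorecovery” group, including ActionFuncLink, while the decompile style keeps it.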

There are about 220 different actions in the universal list, so instead of going through all of them, I looked for a way to dynamically inspect the decompilation process. There is a way to build the decompiler with a debugging CLI, described in this GitHub issue. In short, once the necessary build dependencies are installed, go to Ghidra/Features/Decompiler/src/decompile/cpp and run make decomp_dbg.

The issue page describes how to save the function data XML payload that Ghidra generates for IPC with the decompiler and load it in the debugging CLI to reproduce a function decompilation. I couldn’t find any overall documentation on the CLI itself, though some of the individual command handler classes have documentation that describes the arguments they take (see the IfaceCommand class documentation for a list of handler classes). Beyond the initial XML loading steps, I had to browse through the code a bit to find my way around the CLI and understand what the debugging trace output meant.

To save a copy of the function data XML from the Ghidra GUI, we can use the “Debug Function Decompilation” option in the Decompile window drop-down menu. From a script we can achieve the same thing with the DecompInterface.enableDebug method:

File debugDump = askFile("Decompiler IPC XML Dump", "Save");
decompIfc.enableDebug(debugDump);

Start decomp_dbg with the SLEIGHHOME environment variable set to the Ghidra install directory. Then load the XML file with the restore command:

$ SLEIGHHOME=/opt/ghidra_10.1.2 ./decomp_dbg
[decomp]> restore /tmp/mystery_printf_main.xml
/tmp/mystery_printf_main.xml successfully loaded: Intel/AMD 64-bit x86

Next we can specify the function to decompile and set trace points on the individual instructions we want to observe analysis actions for. Here I’m including the addresses of the LEA and MOV instructions that set up the string structure on the stack, as well as the call to fmt.Fprintf:

[decomp]> load function main.main
Function main.main: 0x0048e790
[decomp]> trace address 0x48e7c4
OK (1 ranges)
[decomp]> trace address 0x48e7cb
OK (2 ranges)
[decomp]> trace address 0x48e7d0
OK (3 ranges)
[decomp]> trace address 0x48e7ea
OK (4 ranges)

Now we can use the decompile command to decompile the main.main function and trace the analysis process:

[decomp]> decompile
Decompiling main.main
DEBUG 0: extrapopsetup
0x0048e7ea:52: **
   0x0048e7ea:52: RSP(0x0048e7ea:52) = RSP(free) + #0x8

DEBUG 1: funclink
0x0048e7ea:57: **
   0x0048e7ea:57: u0x10000012(0x0048e7ea:57) = RSP(free) + #0x0
0x0048e7ea:58: **
   0x0048e7ea:58: u0x1000001a:1(0x0048e7ea:58) = *(ram,u0x10000012(0x0048e7ea:57))
0x0048e7ea:3a: call ffmt.Fprintf(free)
   0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58))

DEBUG 2: heritage
0x0048e7ea:5b: **
   0x0048e7ea:5b: RAX(0x0048e7ea:5b) = [create] i0x0048e7ea:3a(free)
0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58))
   0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58),RCX(0x0048e7b4:21),XMM0_Da(0x0048e7e2:31))
0x0048e7ea:5e: **
   0x0048e7ea:5e: RCX(0x0048e7ea:5e) = RCX(0x0048e7b4:21) [] i0x0048e7ea:3a(free)
...

Each trace entry shows the name of the rule that executed (after “DEBUG #”), followed by the list of P-Code transformations it performed. P-Code operations are identified by their target address and “time” value, which distinguishes between multiple P-Code operations for a single native machine instruction (e.g., 0x0048e7ea:3a vs 0x0048e7ea:5b). Double asterisks (“**”) after an address listing mean the op is dead or has no parent basic block (see PcodeOp::printDebug).
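Long traces get easier to skim once they are grouped by instruction address. The following stand-alone sketch parses the two line shapes described above (a “DEBUG n: name” header and an “address:time:” op prefix) and records which rules touched each address. The regular expressions are my own reading of the output format, not something documented by the CLI.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: summarize a decomp_dbg trace as address -> rules that changed ops there. */
final class TraceSummary {
    private static final Pattern HEADER = Pattern.compile("^DEBUG (\\d+): (\\w+)$");
    private static final Pattern OP_ID =
            Pattern.compile("^\\s*0x([0-9a-fA-F]+):([0-9a-fA-F]+):");

    static Map<Long, Set<String>> rulesByAddress(List<String> lines) {
        Map<Long, Set<String>> out = new TreeMap<>();
        String rule = null;
        for (String line : lines) {
            Matcher header = HEADER.matcher(line.trim());
            if (header.matches()) {
                rule = header.group(2); // name of the rule or action that ran
                continue;
            }
            Matcher op = OP_ID.matcher(line);
            if (rule != null && op.find()) {
                long address = Long.parseUnsignedLong(op.group(1), 16);
                out.computeIfAbsent(address, k -> new LinkedHashSet<>()).add(rule);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = java.util.Arrays.asList(
                "DEBUG 0: extrapopsetup",
                "0x0048e7ea:52: **",
                "   0x0048e7ea:52: RSP(0x0048e7ea:52) = RSP(free) + #0x8",
                "",
                "DEBUG 1: funclink",
                "0x0048e7ea:57: **");
        rulesByAddress(lines).forEach((addr, rules) ->
                System.out.println("0x" + Long.toHexString(addr) + " " + rules));
    }
}
```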

For example, this trace shows how the “Propagate Copy” rule replaces a reference to the RAX register with the constant value that was written to it by the COPY op at 0x0048e7c4:27:

DEBUG 5: propagatecopy
0x0048e7cb:29: u0x0000c000(0x0048e7cb:29) = RAX(0x0048e7c4:27)
   0x0048e7cb:29: u0x0000c000(0x0048e7cb:29) = #0x4c584a
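The effect of this transformation can be imitated with a toy model. The sketch below is a heavily simplified stand-in for the idea behind the rule, with none of the conditions the real implementation checks. The Node and Op types are hypothetical, and the STORE's address space input is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of copy propagation on the ops from the trace. */
final class CopyPropagationSketch {
    static final class Node {
        final String name;
        Op def; // op that wrote this value, or null

        Node(String name) {
            this.name = name;
        }
    }

    static final class Op {
        final String mnemonic;
        final Node output;
        final List<Node> inputs;

        Op(String mnemonic, Node output, Node... inputs) {
            this.mnemonic = mnemonic;
            this.output = output;
            this.inputs = new ArrayList<>(Arrays.asList(inputs));
            if (output != null) {
                output.def = this;
            }
        }
    }

    /** One step: replace each input written by a COPY with that COPY's source. */
    static boolean propagateCopy(Op op) {
        boolean changed = false;
        for (int i = 0; i < op.inputs.size(); i++) {
            Op def = op.inputs.get(i).def;
            if (def != null && def.mnemonic.equals("COPY")) {
                op.inputs.set(i, def.inputs.get(0));
                changed = true;
            }
        }
        return changed;
    }

    /** Returns { COPY at 0x48e7cb:29, STORE at 0x48e7cb:2a }. */
    static Op[] exampleOps() {
        Node constant = new Node("#0x4c584a");
        Node rax = new Node("RAX");
        Node tmp = new Node("u0x0000c000");
        Node ptr = new Node("u0x00003800");
        new Op("COPY", rax, constant);              // 0x48e7c4:27
        Op copy = new Op("COPY", tmp, rax);         // 0x48e7cb:29
        Op store = new Op("STORE", null, ptr, tmp); // 0x48e7cb:2a
        return new Op[] { copy, store };
    }

    public static void main(String[] args) {
        Op[] ops = exampleOps();
        propagateCopy(ops[0]);
        propagateCopy(ops[1]);
        System.out.println(ops[0].inputs.get(0).name);
        System.out.println(ops[1].inputs.get(1).name);
    }
}
```

Applying the step to the COPY and then to the STORE leaves both reading the constant directly, which mirrors the pair of propagatecopy entries for 0x48e7cb in the full trace.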

To investigate these analysis steps in more detail via the source code, search for the name that appears after “DEBUG”. This name comes from the rule or action’s constructor:

class RulePropagateCopy : public Rule {
public:
  RulePropagateCopy(const string &g) : Rule(g, 0, "propagatecopy") {}   ///< Constructor

The implementation of a rule is found in its class’s applyOp method, e.g. RulePropagateCopy::applyOp.

Some of the trace output is a little cryptic, such as these lines with empty square brackets (“[]”):

0x0048e7ea:95: **
   0x0048e7ea:95: s0xffffffffffffffa0(0x0048e7ea:95) = s0xffffffffffffffa0(0x0048e7ea:39) [] i0x0048e7ea:3a(free)

Looking for the relevant printRaw method that creates the output line can help clarify its meaning. The square brackets happen to refer to INDIRECT operations. The “i”-prefixed address after the brackets shows an iop space address. This address space is used to store pointers to internal P-Code ops. Documentation for the IopSpace class and INDIRECT P-Code op both discuss the meaning of this kind of address:

The varnode input1 is not part of the machine state but is really an internal reference to a specific p-code operator that may be affecting the value of the output varnode. A special address space indicates input1’s use as an internal reference encoding.

The decompiler CLI print spaces command shows the mapping of address prefix characters to address spaces:

[decomp]> print spaces
0 : '#' const constant  small addrsize=8 wordsize=1 delay=0
1 : 'o' OTHER processor small addrsize=8 wordsize=1 delay=0
2 : 'u' unique internal  small addrsize=4 wordsize=1 delay=0
3 : 'r' ram processor small addrsize=8 wordsize=1 delay=1
4 : '%' register processor small addrsize=4 wordsize=1 delay=0
5 : 'f' fspec special   small addrsize=8 wordsize=1 delay=1
6 : 'i' iop special   small addrsize=8 wordsize=1 delay=1
7 : 'j' join special   small addrsize=4 wordsize=1 delay=0
8 : 's' stack spacebase small addrsize=8 wordsize=1 delay=1
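With that table, the prefixed tokens in the trace can be decoded mechanically. The sketch below hard-codes the prefix characters from the output above and expects a bare token such as s0xffffffffffffffa0, without the trailing “(address:time)” annotation. Printing stack offsets as signed values is my own choice for readability. Registers show up by name in the trace, so they are not handled here.

```java
/** Sketch: decode a prefixed trace token using the "print spaces" table. */
final class SpacePrefix {
    static String spaceName(char prefix) {
        switch (prefix) {
            case '#': return "const";
            case 'o': return "OTHER";
            case 'u': return "unique";
            case 'r': return "ram";
            case '%': return "register";
            case 'f': return "fspec";
            case 'i': return "iop";
            case 'j': return "join";
            case 's': return "stack";
            default:  return null;
        }
    }

    /** Returns "space:offset" for tokens like "#0x4c584a", or null if not recognized. */
    static String describe(String token) {
        if (token == null || token.length() < 4 || !token.startsWith("0x", 1)) {
            return null;
        }
        String space = spaceName(token.charAt(0));
        if (space == null) {
            return null;
        }
        long offset;
        try {
            offset = Long.parseUnsignedLong(token.substring(3), 16);
        } catch (NumberFormatException e) {
            return null;
        }
        if (space.equals("stack") && offset < 0) {
            // Show stack offsets relative to the stack base as signed values.
            return space + ":-0x" + Long.toHexString(-offset);
        }
        return space + ":0x" + Long.toHexString(offset);
    }

    public static void main(String[] args) {
        System.out.println(describe("s0xffffffffffffffa0")); // stack:-0x60
        System.out.println(describe("#0x4c584a"));           // const:0x4c584a
        System.out.println(describe("u0x0000c000"));         // unique:0xc000
    }
}
```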

Tracing the Decompiler Analysis

Now that we know a bit more about how to interpret the decompiler trace output, we can use it to understand what happens when we decompile the main.main function with a higher-level analysis style. For reference, the full trace of the main.main decompilation follows:

[decomp]> decompile
Decompiling main.main
DEBUG 0: extrapopsetup
0x0048e7ea:52: **
   0x0048e7ea:52: RSP(0x0048e7ea:52) = RSP(free) + #0x8

DEBUG 1: funclink
0x0048e7ea:57: **
   0x0048e7ea:57: u0x10000012(0x0048e7ea:57) = RSP(free) + #0x0
0x0048e7ea:58: **
   0x0048e7ea:58: u0x1000001a:1(0x0048e7ea:58) = *(ram,u0x10000012(0x0048e7ea:57))
0x0048e7ea:3a: call ffmt.Fprintf(free)
   0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58))

DEBUG 2: heritage
0x0048e7ea:5b: **
   0x0048e7ea:5b: RAX(0x0048e7ea:5b) = [create] i0x0048e7ea:3a(free)
0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58))
   0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58),RCX(0x0048e7b4:21),XMM0_Da(0x0048e7e2:31))
0x0048e7ea:5e: **
   0x0048e7ea:5e: RCX(0x0048e7ea:5e) = RCX(0x0048e7b4:21) [] i0x0048e7ea:3a(free)
0x0048e7ea:61: **
   0x0048e7ea:61: FS_OFFSET(0x0048e7ea:61) = FS_OFFSET(i) [] i0x0048e7ea:3a(free)
0x0048e7ea:64: **
   0x0048e7ea:64: CF(0x0048e7ea:64) = CF(0x0048e79f:12) [] i0x0048e7ea:3a(free)
0x0048e7ea:67: **
   0x0048e7ea:67: PF(0x0048e7ea:67) = PF(0x0048e79f:1a) [] i0x0048e7ea:3a(free)
0x0048e7ea:6a: **
   0x0048e7ea:6a: ZF(0x0048e7ea:6a) = ZF(0x0048e79f:16) [] i0x0048e7ea:3a(free)
0x0048e7ea:6d: **
   0x0048e7ea:6d: SF(0x0048e7ea:6d) = SF(0x0048e79f:15) [] i0x0048e7ea:3a(free)
0x0048e7ea:70: **
   0x0048e7ea:70: DF(0x0048e7ea:70) = DF(0x0048e790:4f) [] i0x0048e7ea:3a(free)
0x0048e7ea:73: **
   0x0048e7ea:73: OF(0x0048e7ea:73) = OF(0x0048e79f:13) [] i0x0048e7ea:3a(free)
0x0048e7ea:76: **
   0x0048e7ea:76: RIP(0x0048e7ea:76) = RIP(i) [] i0x0048e7ea:3a(free)
0x0048e7ea:7c: **
   0x0048e7ea:7c: XMM0_Da(0x0048e7ea:7c) = [create] i0x0048e7ea:3a(free)
0x0048e7ea:7f: **
   0x0048e7ea:7f: XMM0_Db(0x0048e7ea:7f) = [create] i0x0048e7ea:3a(free)
0x0048e7ea:82: **
   0x0048e7ea:82: XMM0_Dc(0x0048e7ea:82) = [create] i0x0048e7ea:3a(free)
0x0048e7ea:85: **
   0x0048e7ea:85: XMM0_Dd(0x0048e7ea:85) = [create] i0x0048e7ea:3a(free)
0x0048e7cb:28: u0x00003800(0x0048e7cb:28) = #0x10 + RSP(free)
   0x0048e7cb:28: u0x00003800(0x0048e7cb:28) = #0x10 + RSP(0x0048e79f:14)
0x0048e7cb:29: u0x0000c000(0x0048e7cb:29) = RAX(free)
   0x0048e7cb:29: u0x0000c000(0x0048e7cb:29) = RAX(0x0048e7c4:27)
0x0048e7cb:2a: *(ram,u0x00003800(free)) = u0x0000c000(free)
   0x0048e7cb:2a: *(ram,u0x00003800(0x0048e7cb:28)) = u0x0000c000(0x0048e7cb:29)
0x0048e7d0:2b: u0x00003800(0x0048e7d0:2b) = #0x18 + RSP(free)
   0x0048e7d0:2b: u0x00003800(0x0048e7d0:2b) = #0x18 + RSP(0x0048e79f:14)
0x0048e7d0:2d: *(ram,u0x00003800(free)) = u0x0000c080(free)
   0x0048e7d0:2d: *(ram,u0x00003800(0x0048e7d0:2b)) = u0x0000c080(0x0048e7d0:2c)
0x0048e7ea:38: RSP(0x0048e7ea:38) = RSP(free) - #0x8
   0x0048e7ea:38: RSP(0x0048e7ea:38) = RSP(0x0048e79f:14) - #0x8
0x0048e7ea:39: *(ram,RSP(free)) = #0x48e7ef
   0x0048e7ea:39: *(ram,RSP(0x0048e7ea:38)) = #0x48e7ef
0x0048e7ea:57: u0x10000012(0x0048e7ea:57) = RSP(free) + #0x0
   0x0048e7ea:57: u0x10000012(0x0048e7ea:57) = RSP(0x0048e7ea:38) + #0x0
0x0048e7ea:52: RSP(0x0048e7ea:52) = RSP(free) + #0x8
   0x0048e7ea:52: RSP(0x0048e7ea:52) = RSP(0x0048e7ea:38) + #0x8

DEBUG 3: deadcode
0x0048e7ea:5b: RAX(0x0048e7ea:5b) = [create] i0x0048e7ea:3a(free)
   0x0048e7ea:5b: **
0x0048e7ea:5e: RCX(0x0048e7ea:5e) = RCX(0x0048e7b4:21) [] i0x0048e7ea:3a(free)
   0x0048e7ea:5e: **
0x0048e7ea:52: RSP(0x0048e7ea:52) = RSP(0x0048e7ea:38) + #0x8
   0x0048e7ea:52: **
0x0048e7ea:61: FS_OFFSET(0x0048e7ea:61) = FS_OFFSET(i) [] i0x0048e7ea:3a(free)
   0x0048e7ea:61: **
0x0048e7ea:64: CF(0x0048e7ea:64) = CF(0x0048e79f:12) [] i0x0048e7ea:3a(free)
   0x0048e7ea:64: **
0x0048e7ea:67: PF(0x0048e7ea:67) = PF(0x0048e79f:1a) [] i0x0048e7ea:3a(free)
   0x0048e7ea:67: **
0x0048e7ea:6a: ZF(0x0048e7ea:6a) = ZF(0x0048e79f:16) [] i0x0048e7ea:3a(free)
   0x0048e7ea:6a: **
0x0048e7ea:6d: SF(0x0048e7ea:6d) = SF(0x0048e79f:15) [] i0x0048e7ea:3a(free)
   0x0048e7ea:6d: **
0x0048e7ea:70: DF(0x0048e7ea:70) = DF(0x0048e790:4f) [] i0x0048e7ea:3a(free)
   0x0048e7ea:70: **
0x0048e7ea:73: OF(0x0048e7ea:73) = OF(0x0048e79f:13) [] i0x0048e7ea:3a(free)
   0x0048e7ea:73: **
0x0048e7ea:76: RIP(0x0048e7ea:76) = RIP(i) [] i0x0048e7ea:3a(free)
   0x0048e7ea:76: **
0x0048e7ea:7c: XMM0_Da(0x0048e7ea:7c) = [create] i0x0048e7ea:3a(free)
   0x0048e7ea:7c: **
0x0048e7ea:7f: XMM0_Db(0x0048e7ea:7f) = [create] i0x0048e7ea:3a(free)
   0x0048e7ea:7f: **
0x0048e7ea:82: XMM0_Dc(0x0048e7ea:82) = [create] i0x0048e7ea:3a(free)
   0x0048e7ea:82: **
0x0048e7ea:85: XMM0_Dd(0x0048e7ea:85) = [create] i0x0048e7ea:3a(free)
   0x0048e7ea:85: **

DEBUG 4: termorder
0x0048e7cb:28: u0x00003800(0x0048e7cb:28) = #0x10 + RSP(0x0048e79f:14)
   0x0048e7cb:28: u0x00003800(0x0048e7cb:28) = RSP(0x0048e79f:14) + #0x10

DEBUG 5: propagatecopy
0x0048e7cb:29: u0x0000c000(0x0048e7cb:29) = RAX(0x0048e7c4:27)
   0x0048e7cb:29: u0x0000c000(0x0048e7cb:29) = #0x4c584a

DEBUG 6: propagatecopy
0x0048e7cb:2a: *(ram,u0x00003800(0x0048e7cb:28)) = u0x0000c000(0x0048e7cb:29)
   0x0048e7cb:2a: *(ram,u0x00003800(0x0048e7cb:28)) = #0x4c584a

DEBUG 7: termorder
0x0048e7d0:2b: u0x00003800(0x0048e7d0:2b) = #0x18 + RSP(0x0048e79f:14)
   0x0048e7d0:2b: u0x00003800(0x0048e7d0:2b) = RSP(0x0048e79f:14) + #0x18

DEBUG 8: propagatecopy
0x0048e7d0:2d: *(ram,u0x00003800(0x0048e7d0:2b)) = u0x0000c080(0x0048e7d0:2c)
   0x0048e7d0:2d: *(ram,u0x00003800(0x0048e7d0:2b)) = #0x15

DEBUG 9: sub2add
0x0048e7ea:88: **
   0x0048e7ea:88: u0x1000004f(0x0048e7ea:88) = #0x8 * #0xffffffffffffffff
0x0048e7ea:38: RSP(0x0048e7ea:38) = RSP(0x0048e79f:14) - #0x8
   0x0048e7ea:38: RSP(0x0048e7ea:38) = RSP(0x0048e79f:14) + u0x1000004f(0x0048e7ea:88)

DEBUG 10: propagatecopy
0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58),RCX(0x0048e7b4:21),XMM0_Da(0x0048e7e2:31))
   0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58),#0x4dc7c0,XMM0_Da(0x0048e7e2:31))

DEBUG 11: identityel
0x0048e7ea:57: u0x10000012(0x0048e7ea:57) = RSP(0x0048e7ea:38) + #0x0
   0x0048e7ea:57: u0x10000012(0x0048e7ea:57) = RSP(0x0048e7ea:38)

DEBUG 12: propagatecopy
0x0048e7ea:58: u0x1000001a:1(0x0048e7ea:58) = *(ram,u0x10000012(0x0048e7ea:57))
   0x0048e7ea:58: u0x1000001a:1(0x0048e7ea:58) = *(ram,RSP(0x0048e7ea:38))

DEBUG 13: collapseconstants
0x0048e7ea:88: u0x1000004f(0x0048e7ea:88) = #0x8 * #0xffffffffffffffff
   0x0048e7ea:88: u0x1000004f(0x0048e7ea:88) = #0xfffffffffffffff8

DEBUG 14: earlyremoval
0x0048e7c4:27: RAX(0x0048e7c4:27) = #0x4c584a
   0x0048e7c4:27: **

DEBUG 15: addmultcollapse
0x0048e7cb:28: u0x00003800(0x0048e7cb:28) = RSP(0x0048e79f:14) + #0x10
   0x0048e7cb:28: u0x00003800(0x0048e7cb:28) = RSP(i) + #0xffffffffffffffb8

DEBUG 16: earlyremoval
0x0048e7cb:29: u0x0000c000(0x0048e7cb:29) = #0x4c584a
   0x0048e7cb:29: **

DEBUG 17: addmultcollapse
0x0048e7d0:2b: u0x00003800(0x0048e7d0:2b) = RSP(0x0048e79f:14) + #0x18
   0x0048e7d0:2b: u0x00003800(0x0048e7d0:2b) = RSP(i) + #0xffffffffffffffc0

DEBUG 18: earlyremoval
0x0048e7d0:2c: u0x0000c080(0x0048e7d0:2c) = #0x15
   0x0048e7d0:2c: **

DEBUG 19: propagatecopy
0x0048e7ea:38: RSP(0x0048e7ea:38) = RSP(0x0048e79f:14) + u0x1000004f(0x0048e7ea:88)
   0x0048e7ea:38: RSP(0x0048e7ea:38) = RSP(0x0048e79f:14) + #0xfffffffffffffff8

DEBUG 20: propagatecopy
0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58),#0x4dc7c0,XMM0_Da(0x0048e7e2:31))
   0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58),#0x4dc7c0,#0x0:4)

DEBUG 21: earlyremoval
0x0048e7ea:57: u0x10000012(0x0048e7ea:57) = RSP(0x0048e7ea:38)
   0x0048e7ea:57: **

DEBUG 22: earlyremoval
0x0048e7ea:88: u0x1000004f(0x0048e7ea:88) = #0xfffffffffffffff8
   0x0048e7ea:88: **

DEBUG 23: addmultcollapse
0x0048e7ea:38: RSP(0x0048e7ea:38) = RSP(0x0048e79f:14) + #0xfffffffffffffff8
   0x0048e7ea:38: RSP(0x0048e7ea:38) = RSP(i) + #0xffffffffffffffa0

DEBUG 24: storevarnode
0x0048e7cb:2a: *(ram,u0x00003800(0x0048e7cb:28)) = #0x4c584a
   0x0048e7cb:2a: s0xffffffffffffffb8(0x0048e7cb:2a) = #0x4c584a

DEBUG 25: storevarnode
0x0048e7d0:2d: *(ram,u0x00003800(0x0048e7d0:2b)) = #0x15
   0x0048e7d0:2d: s0xffffffffffffffc0(0x0048e7d0:2d) = #0x15

DEBUG 26: storevarnode
0x0048e7ea:39: *(ram,RSP(0x0048e7ea:38)) = #0x48e7ef
   0x0048e7ea:39: s0xffffffffffffffa0(0x0048e7ea:39) = #0x48e7ef

DEBUG 27: loadvarnode
0x0048e7ea:58: u0x1000001a:1(0x0048e7ea:58) = *(ram,RSP(0x0048e7ea:38))
   0x0048e7ea:58: u0x1000001a:1(0x0048e7ea:58) = s0xffffffffffffffa0:1(free)
0x0048e7ea:3a: call ffmt.Fprintf(free)(u0x1000001a:1(0x0048e7ea:58),#0x4dc7c0,#0x0:4)
   0x0048e7ea:3a: call ffmt.Fprintf(free)(#0x4dc7c0,#0x0:4)

DEBUG 28: heritage
0x0048e7ea:8c: **
   0x0048e7ea:8c: r0x005601b0(0x0048e7ea:8c) = r0x005601b0(i) [] i0x0048e7ea:3a(free)
0x0048e7ea:3a: call ffmt.Fprintf(free)(#0x4dc7c0,#0x0:4)
   0x0048e7ea:3a: call ffmt.Fprintf(free)(#0x4dc7c0,#0x0:4,s0x00000000:1(i),s0xffffffffffffffa8(0x0048e7bb:23),s0xffffffffffffffb0(0x0048e7bf:26),s0xffffffffffffffb8(0x0048e7cb:2a),s0xffffffffffffffc0(0x0048e7d0:2d),s0xffffffffffffffc8(0x0048e7d9:30),s0xffffffffffffffd0:10(0x0048e7e5:37),s0xfffffffffffffff8(0x0048e7a3:1d))
0x0048e7ea:91: **
   0x0048e7ea:91: s0x00000000:1(0x0048e7ea:91) = s0x00000000:1(i) [] i0x0048e7ea:3a(free)
0x0048e7ea:92: **
   0x0048e7ea:92: s0xffffffffffffffa0:1(0x0048e7ea:92) = SUB81(s0xffffffffffffffa0(0x0048e7ea:39),#0x0)
0x0048e7ea:95: **
   0x0048e7ea:95: s0xffffffffffffffa0(0x0048e7ea:95) = s0xffffffffffffffa0(0x0048e7ea:39) [] i0x0048e7ea:3a(free)
0x0048e7ea:98: **
   0x0048e7ea:98: s0xffffffffffffffa8(0x0048e7ea:98) = s0xffffffffffffffa8(0x0048e7bb:23) [] i0x0048e7ea:3a(free)
0x0048e7ea:9b: **
   0x0048e7ea:9b: s0xffffffffffffffb0(0x0048e7ea:9b) = s0xffffffffffffffb0(0x0048e7bf:26) [] i0x0048e7ea:3a(free)
0x0048e7ea:9e: **
   0x0048e7ea:9e: s0xffffffffffffffb8(0x0048e7ea:9e) = s0xffffffffffffffb8(0x0048e7cb:2a) [] i0x0048e7ea:3a(free)
0x0048e7ea:a1: **
   0x0048e7ea:a1: s0xffffffffffffffc0(0x0048e7ea:a1) = s0xffffffffffffffc0(0x0048e7d0:2d) [] i0x0048e7ea:3a(free)
0x0048e7ea:a4: **
   0x0048e7ea:a4: s0xffffffffffffffc8(0x0048e7ea:a4) = s0xffffffffffffffc8(0x0048e7d9:30) [] i0x0048e7ea:3a(free)
0x0048e7ea:a7: **
   0x0048e7ea:a7: s0xffffffffffffffd0:10(0x0048e7ea:a7) = s0xffffffffffffffd0:10(0x0048e7e5:37) [] i0x0048e7ea:3a(free)
0x0048e7ea:ab: **
   0x0048e7ea:ab: s0xfffffffffffffff8(0x0048e7ea:ab) = s0xfffffffffffffff8(0x0048e7a3:1d) [] i0x0048e7ea:3a(free)

DEBUG 29: activeparam
0x0048e7ea:3a: call ffmt.Fprintf(free)(#0x4dc7c0,#0x0:4,s0x00000000:1(i),s0xffffffffffffffa8(0x0048e7bb:23),s0xffffffffffffffb0(0x0048e7bf:26),s0xffffffffffffffb8(0x0048e7cb:2a),s0xffffffffffffffc0(0x0048e7d0:2d),s0xffffffffffffffc8(0x0048e7d9:30),s0xffffffffffffffd0:10(0x0048e7e5:37),s0xfffffffffffffff8(0x0048e7a3:1d))
   0x0048e7ea:3a: call ffmt.Fprintf(free)

DEBUG 30: deadcode
0x0048e7cb:28: u0x00003800(0x0048e7cb:28) = RSP(i) + #0xffffffffffffffb8
   0x0048e7cb:28: **
0x0048e7d0:2b: u0x00003800(0x0048e7d0:2b) = RSP(i) + #0xffffffffffffffc0
   0x0048e7d0:2b: **
0x0048e7ea:58: u0x1000001a:1(0x0048e7ea:58) = s0xffffffffffffffa0:1(0x0048e7ea:92)
   0x0048e7ea:58: **
0x0048e7ea:38: RSP(0x0048e7ea:38) = RSP(i) + #0xffffffffffffffa0
   0x0048e7ea:38: **
0x0048e7ea:91: s0x00000000:1(0x0048e7ea:91) = s0x00000000:1(i) [] i0x0048e7ea:3a(free)
   0x0048e7ea:91: **
0x0048e7ea:92: s0xffffffffffffffa0:1(0x0048e7ea:92) = SUB81(s0xffffffffffffffa0(0x0048e7ea:39),#0x0)
   0x0048e7ea:92: **
0x0048e7ea:ab: s0xfffffffffffffff8(0x0048e7ea:ab) = s0xfffffffffffffff8(0x0048e7a3:1d) [] i0x0048e7ea:3a(free)
   0x0048e7ea:ab: **

DEBUG 31: earlyremoval
0x0048e7ea:95: s0xffffffffffffffa0(0x0048e7ea:95) = s0xffffffffffffffa0(0x0048e7ea:39) [] i0x0048e7ea:3a(free)
   0x0048e7ea:95: **

DEBUG 32: earlyremoval
0x0048e7ea:98: s0xffffffffffffffa8(0x0048e7ea:98) = s0xffffffffffffffa8(0x0048e7bb:23) [] i0x0048e7ea:3a(free)
   0x0048e7ea:98: **

DEBUG 33: earlyremoval
0x0048e7ea:9b: s0xffffffffffffffb0(0x0048e7ea:9b) = s0xffffffffffffffb0(0x0048e7bf:26) [] i0x0048e7ea:3a(free)
   0x0048e7ea:9b: **

DEBUG 34: earlyremoval
0x0048e7ea:9e: s0xffffffffffffffb8(0x0048e7ea:9e) = s0xffffffffffffffb8(0x0048e7cb:2a) [] i0x0048e7ea:3a(free)
   0x0048e7ea:9e: **

DEBUG 35: earlyremoval
0x0048e7ea:a1: s0xffffffffffffffc0(0x0048e7ea:a1) = s0xffffffffffffffc0(0x0048e7d0:2d) [] i0x0048e7ea:3a(free)
   0x0048e7ea:a1: **

DEBUG 36: earlyremoval
0x0048e7ea:a4: s0xffffffffffffffc8(0x0048e7ea:a4) = s0xffffffffffffffc8(0x0048e7d9:30) [] i0x0048e7ea:3a(free)
   0x0048e7ea:a4: **

DEBUG 37: earlyremoval
0x0048e7ea:a7: s0xffffffffffffffd0:10(0x0048e7ea:a7) = s0xffffffffffffffd0:10(0x0048e7e5:37) [] i0x0048e7ea:3a(free)
   0x0048e7ea:a7: **

DEBUG 38: earlyremoval
0x0048e7cb:2a: s0xffffffffffffffb8(0x0048e7cb:2a) = #0x4c584a
   0x0048e7cb:2a: **

DEBUG 39: earlyremoval
0x0048e7d0:2d: s0xffffffffffffffc0(0x0048e7d0:2d) = #0x15
   0x0048e7d0:2d: **

DEBUG 40: earlyremoval
0x0048e7ea:39: s0xffffffffffffffa0(0x0048e7ea:39) = #0x48e7ef
   0x0048e7ea:39: **

Decompilation complete

Right near the end of the trace output, at steps 38 and 39, we can see exactly the type of P-Code ops that would be ideal for our analysis:

DEBUG 38: earlyremoval
0x0048e7cb:2a: s0xffffffffffffffb8(0x0048e7cb:2a) = #0x4c584a
   0x0048e7cb:2a: **

DEBUG 39: earlyremoval
0x0048e7d0:2d: s0xffffffffffffffc0(0x0048e7d0:2d) = #0x15
   0x0048e7d0:2d: **

These copy the string address and length values to offsets in the stack address space. This would be great, except earlyremoval is deleting them!

Earlier, at steps 24 and 25, these ops are created by the storevarnode rule:

DEBUG 24: storevarnode
0x0048e7cb:2a: *(ram,u0x00003800(0x0048e7cb:28)) = #0x4c584a
   0x0048e7cb:2a: s0xffffffffffffffb8(0x0048e7cb:2a) = #0x4c584a

DEBUG 25: storevarnode
0x0048e7d0:2d: *(ram,u0x00003800(0x0048e7d0:2b)) = #0x15
   0x0048e7d0:2d: s0xffffffffffffffc0(0x0048e7d0:2d) = #0x15
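
The stack offsets in these trace lines are printed as unsigned 64-bit values, which hides the fact that the two writes land in adjacent 8-byte slots. As a rough sketch, here is one way to pull the offset and constant out of such a line and view the offset as a signed value. The `TraceStackCopy` class and its regular expression are hypothetical helpers written for this illustration. They are not part of Ghidra, and they only handle the exact line shape shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceStackCopy {
    // Matches lines shaped like: <addr>:<seq>: s0x<offset>(<def>) = #0x<const>
    // (assumption: only this simple "stack varnode = constant" form is handled)
    private static final Pattern STACK_CONST_COPY = Pattern.compile(
            "^\\s*0x[0-9a-f]+:[0-9a-f]+:\\s+s0x([0-9a-f]+)\\([^)]*\\)\\s+=\\s+#0x([0-9a-f]+)\\s*$");

    /** Returns {signed stack offset, constant}, or null if the line has another shape. */
    static long[] parse(String line) {
        Matcher m = STACK_CONST_COPY.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // parseUnsignedLong keeps the 64-bit pattern, so 0xff...b8 reads back as -0x48
        long offset = Long.parseUnsignedLong(m.group(1), 16);
        long constant = Long.parseUnsignedLong(m.group(2), 16);
        return new long[] { offset, constant };
    }
}
```

Read this way, the address constant goes to stack offset -0x48 and the length constant to -0x40, exactly one pointer width apart.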

At step 28, those destination stack varnodes are associated with the fmt.Fprintf function call by the heritage action. In the same step, they’re also assigned to what appears to be a different Static Single Assignment (SSA) form version of the same stack location:

0x0048e7ea:9e: **
   0x0048e7ea:9e: s0xffffffffffffffb8(0x0048e7ea:9e) = s0xffffffffffffffb8(0x0048e7cb:2a) [] i0x0048e7ea:3a(free)
0x0048e7ea:a1: **
   0x0048e7ea:a1: s0xffffffffffffffc0(0x0048e7ea:a1) = s0xffffffffffffffc0(0x0048e7d0:2d) [] i0x0048e7ea:3a(free)

Then at step 29, the activeparam action removes all parameters from the fmt.Fprintf call (note that the “Decompile Parameter ID” analysis wasn’t performed on the program before generating this decompilation request XML payload). Even if the function call parameters were retained, once the stack parameters can be associated with the function call, the original ops that set them up are no longer needed and get marked as dead code.

I looked into manipulating the root action configurations and manually editing the fmt.Fprintf function signature, but I couldn’t find an obvious way to prevent the stack write operations from being associated with the call operation. A lower-level analysis style that preserves stack operations and doesn’t depend on correct function call or data structure definitions would be great to have, but I think it would require more involved modification of the decompiler source code.

Instead, I decided to try disabling the earlyremoval rule itself to preserve the stack copy operations. The decompiler has some code for parsing root action configurations from XML (the setSimplificationStyle documentation seems to refer to this when it mentions that applications might eventually be able to define their own analysis classes), but Ghidra doesn’t have any implementation to generate this XML.

I found that the DecompileOptions class in the Ghidra API is susceptible to XML injection,⁴ so I was able to directly inject the necessary XML elements from my script anyway. The injected currentaction element below disables the deadcode base group in the decompiler:

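String protoEvalModel = options.getProtoEvalModel();
options.setProtoEvalModel(protoEvalModel + "</protoeval>\n" +
        "<currentaction>\n"
        + "   <param1>" + SIMPLIFICATION_STYLE + "</param1>\n"
        + "   <param2>deadcode</param2>\n"
        + "   <param3>off</param3>\n"
        + " </currentaction>\n"
        + "<protoeval>" + protoEvalModel);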

I don’t consider this XML injection issue to be a vulnerability because the injection is triggered through arbitrary script code the user is voluntarily running; I didn’t see a way to abuse this via a malicious binary. This trick mainly saves me the trouble of customizing Ghidra itself to extend the decompiler interface.
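
To illustrate why unescaped string concatenation allows this kind of injection, here is a minimal sketch that doesn't use Ghidra at all. The `serialize` method is a hypothetical stand-in for an options serializer. The `optionslist`, `protoeval`, and `param1` to `param3` element names are assumptions made for the demonstration; only the `currentaction` element is named in the text above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XmlInjectionSketch {
    /** Hypothetical serializer that pastes the option value into XML without escaping it. */
    static String serialize(String protoEvalModel) {
        return "<optionslist>\n"
                + " <protoeval>" + protoEvalModel + "</protoeval>\n"
                + "</optionslist>\n";
    }

    /** Closes the enclosing element early, adds a sibling element, then reopens the original. */
    static String inject(String protoEvalModel, String style) {
        return protoEvalModel + "</protoeval>\n"
                + " <currentaction>\n"
                + "  <param1>" + style + "</param1>\n"
                + "  <param2>deadcode</param2>\n"
                + "  <param3>off</param3>\n"
                + " </currentaction>\n"
                + " <protoeval>" + protoEvalModel;
    }

    /** Parses the XML and counts the elements with the given tag name. */
    static int count(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XML did not parse", e);
        }
    }
}
```

Serializing a plain value yields no `currentaction` element, while serializing the injected value yields a well-formed document containing one. The trick depends on the closing and reopening tags balancing around the injected sibling.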

With dead code elimination disabled, the normalize style output now produces the ideal COPY operations for analysis:

0x48e7c4:0x27   (register, 0x0, 8) COPY (const, 0x4c584a, 8)

0x48e7cb:0x28   (unique, 0x3800, 8) INT_ADD (register, 0x20, 8) , (const, 0xffffffffffffffb8, 8)
0x48e7cb:0x29   (unique, 0xc000, 8) COPY (const, 0x4c584a, 8)
0x48e7cb:0x2a   (stack, 0xffffffffffffffb8, 8) COPY (const, 0x4c584a, 8)

0x48e7d0:0x2b   (unique, 0x3800, 8) INT_ADD (register, 0x20, 8) , (const, 0xffffffffffffffc0, 8)
0x48e7d0:0x2c   (unique, 0xc080, 8) COPY (const, 0x15, 8)
0x48e7d0:0x2d   (stack, 0xffffffffffffffc0, 8) COPY (const, 0x15, 8)

An unintended side effect of disabling dead code elimination is that redundant intermediate versions of some operations won’t be removed anymore. This hinders analysis based on finding certain patterns of COPY operations. For example, an analysis loop that looks for a length copy operation directly adjacent to an address copy operation would not catch the above sequence due to the redundant intermediate versions of the same COPY (one to the unique space, one to the stack space). Despite that, the lifted COPY operation output is much easier to work with and still helps recover a significant number of strings.
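
The adjacency problem can be sketched with a simplified model of the COPY ops. The `Copy` class, the `MAX_STR_LEN` bound, and both scan methods below are hypothetical stand-ins made up for this illustration. They aren't Ghidra API types or the actual Ghostrings code, and filtering to stack-space outputs is just one possible workaround.

```java
import java.util.ArrayList;
import java.util.List;

final class StackCopyPairing {
    /** Simplified stand-in for a COPY op: output space, output offset, constant input. */
    static final class Copy {
        final String space;
        final long offset;
        final long value;

        Copy(String space, long offset, long value) {
            this.space = space;
            this.offset = offset;
            this.value = value;
        }
    }

    static final int POINTER_SIZE = 8;
    static final long MAX_STR_LEN = 0x4000; // hypothetical upper bound on string length

    /** Naive scan: only pairs ops that sit directly next to each other in the list. */
    static int countAdjacentPairs(List<Copy> ops) {
        int pairs = 0;
        for (int i = 0; i + 1 < ops.size(); i++) {
            if (isPair(ops.get(i), ops.get(i + 1))) {
                pairs++;
            }
        }
        return pairs;
    }

    /** Same scan over stack-space copies only, so leftover temporaries cannot split a pair. */
    static int countStackPairs(List<Copy> ops) {
        List<Copy> stackOnly = new ArrayList<>();
        for (Copy op : ops) {
            if (op.space.equals("stack")) {
                stackOnly.add(op);
            }
        }
        return countAdjacentPairs(stackOnly);
    }

    /** An address write followed by a plausible length write one pointer width higher. */
    private static boolean isPair(Copy addr, Copy len) {
        return addr.space.equals("stack") && len.space.equals("stack")
                && len.offset == addr.offset + POINTER_SIZE
                && len.value > 0 && len.value <= MAX_STR_LEN;
    }
}
```

Fed the five COPY ops from the listing above, the naive scan finds no pair because a unique-space COPY sits between the two stack writes, while the filtered scan finds the one address and length pair.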

Normalize Style Analysis

The COPY op’s input will now just be a constant value, so there’s no need to traverse its defining operations. We can just check if the constant value corresponds to an address in the memory block where string content is stored (e.g., .rodata for ELF binaries or .rdata for PE), or if it looks like a reasonable string length. For the output varnode, we just check if its address is in the stack address space and then get its offset.

To demonstrate how this eases analysis, here’s how the string length value check looks with the simplified COPY ops:

protected LengthCandidate storeLenCheck(Program program, PcodeOpAST pcodeOpAST) {
    if (pcodeOpAST.getOpcode() != PcodeOp.COPY)
        return null;

    // Get input, make sure it's a constant
    Varnode dataToStore = pcodeOpAST.getInput(0);
    if (!dataToStore.isConstant())
        return null;
    long constantValue = dataToStore.getAddress().getOffset();

    // Simple string length bounds check
    if (constantValue < MIN_STR_LEN || constantValue > MAX_STR_LEN) {
        return null;
    }

    // If output is a stack address, get the offset
    Varnode storeLoc = pcodeOpAST.getOutput();
    if (!storeLoc.getAddress().isStackAddress()) {
        return null;
    }
    Long stackOffset = storeLoc.getAddress().getOffset();

    LengthCandidate result = new LengthCandidate((int) constantValue, stackOffset, pcodeOpAST);
    return result;
}
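
Once an address candidate and a length candidate have been paired up, the remaining step is to confirm that the bytes they describe look like a string. The following is a minimal sketch of that check, under the assumption that the string data block is available as a plain byte array with a known start address. `StringCandidateCheck` and its methods are hypothetical names. A real script would read the bytes through the Ghidra memory API instead, and might accept a wider character set than strictly printable ASCII.

```java
final class StringCandidateCheck {
    /** True if addr falls inside the block [blockStart, blockStart + blockSize). */
    static boolean inBlock(long addr, long blockStart, long blockSize) {
        return Long.compareUnsigned(addr, blockStart) >= 0
                && Long.compareUnsigned(addr - blockStart, blockSize) < 0;
    }

    /** True if the len bytes at addr lie inside the block and are all printable ASCII. */
    static boolean isPrintableString(byte[] block, long blockStart, long addr, int len) {
        if (len <= 0 || !inBlock(addr, blockStart, block.length)) {
            return false;
        }
        long start = addr - blockStart;
        if (start + len > block.length) {
            return false; // candidate would run past the end of the block
        }
        for (int i = 0; i < len; i++) {
            int b = block[(int) start + i] & 0xff;
            if (b < 0x20 || b > 0x7e) {
                return false; // control character or non-ASCII byte
            }
        }
        return true;
    }
}
```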

Due to the deadcode disable hack, we’ll miss some strings that the register style analysis script finds, but there’s a property of the string data blob we can use to help fill in the remaining gaps: the strings are stored in ascending length order. Using the known lengths of two identified strings and the size of the gap between them, we can sometimes uniquely determine which string lengths could exist in the gap. The end result for the register and hacked normalize analysis styles is generally the same after performing the gap filler step.
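
One way to reason about the gap filler idea is as a small search problem: find every non-decreasing sequence of lengths, each between the lengths of the two known neighbors, that sums exactly to the gap size, and accept the result only if there is a single such sequence. The sketch below is a hypothetical illustration of that reasoning and is not the actual Ghostrings implementation. The `GapFillSketch` name and the exhaustive search are assumptions, and the search is only practical for small gaps.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

final class GapFillSketch {
    /**
     * Returns the only ascending list of string lengths in [minLen, maxLen] that
     * exactly fills gapSize bytes, or empty if there is no fill or more than one.
     */
    static Optional<List<Integer>> uniqueFill(int gapSize, int minLen, int maxLen) {
        if (minLen < 1) {
            throw new IllegalArgumentException("minLen must be at least 1");
        }
        List<List<Integer>> found = new ArrayList<>();
        search(gapSize, minLen, maxLen, new ArrayDeque<>(), found);
        return found.size() == 1 ? Optional.of(found.get(0)) : Optional.empty();
    }

    /** Depth-first search that stops as soon as a second solution shows up. */
    private static void search(int remaining, int nextMin, int maxLen,
            Deque<Integer> current, List<List<Integer>> found) {
        if (found.size() > 1) {
            return; // already ambiguous, no need to keep looking
        }
        if (remaining == 0) {
            found.add(new ArrayList<>(current));
            return;
        }
        // Lengths never decrease, matching the ascending order of the blob
        for (int len = nextMin; len <= maxLen && len <= remaining; len++) {
            current.addLast(len);
            search(remaining - len, len, maxLen, current, found);
            current.removeLast();
        }
    }
}
```

For example, a 10-byte gap between two 5-byte strings can only hold two more 5-byte strings, but a 30-byte gap between a 5-byte and a 6-byte string is ambiguous (six strings of length 5 or five of length 6), so nothing can be inferred from sizes alone.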

Of course, these hacks are not ideal: the XML injection vector could be patched in a future release of Ghidra, and disabling all dead code removal leaves unwanted operations around. Assuming the decompiler analysis configuration interface might later be exposed through the Ghidra API, I checked which rules are necessary to produce the stack COPY operations. Adding the stackvars base group to the register root action is enough to produce the simplified COPY ops, but without adding the deadcode base group as well, the problem of the leftover intermediate transformations remains.

I noticed an interesting command in the decompiler CLI called deadcode delay. The class documentation for the command handler shows that it delays dead code elimination in a specific address space:

/// \class IfcDeadcodedelay
/// \brief Change when dead code elimination starts: `deadcode delay <name> <delay>`
///
/// An address space is selected by name, along with a pass number.
/// Dead code elimination for Varnodes in that address space is changed to start
/// during that pass.  If there is a \e current function, the delay is altered only for
/// that function, otherwise the delay is set globally for all functions.

Instead of disabling all dead code elimination, I could use this to disable dead code elimination for varnodes specifically in the stack space.

[decomp]> deadcode delay stack 40
Successfully overrided deadcode delay for single function
[decomp]> decompile
Clearing old decompilation
Decompiling main.main
...

I haven’t included the trace output here because storevarnode still creates the same COPY ops shown before; the ops are just never deleted by the end of the trace. This seemed promising, but the only way to use this feature outside of the debug CLI was to edit the current architecture’s compiler specification file, such as Ghidra/Processors/ARM/data/languages/ARM.cspec for ARM. The decompiler will check the compiler_spec element for a deadcodedelay element with this format:

<deadcodedelay space="stack" delay="40"/>

This is yet another kludge, but it eliminates the problem with the intermediate COPY op forms. Here is the P-Code output with dead code delay enabled:

PrintHighPCode.java> Running...
PCode for function main.main @ 0048e790 (simplification style: normalize)
...
0x48e7bf:0x26   (stack, 0xffffffffffffffb0, 8) COPY (ram, 0x5601b0, 8)
0x48e7cb:0x2a   (stack, 0xffffffffffffffb8, 8) COPY (const, 0x4c584a, 8)
0x48e7d0:0x2d   (stack, 0xffffffffffffffc0, 8) COPY (const, 0x15, 8)
0x48e7d9:0x30   (stack, 0xffffffffffffffc8, 8) COPY (const, 0x0, 8)
0x48e7e5:0x37   (stack, 0xffffffffffffffd0, 16) COPY (unique, 0x1000001b, 16)
0x48e7e5:0x79   (unique, 0x1000001b, 16) INT_ZEXT (const, 0x0, 8)
0x48e7ea:0x39   (stack, 0xffffffffffffffa0, 8) COPY (const, 0x48e7ef, 8)
0x48e7ea:0x3a    ---  CALL (ram, 0x4866f0, 8)
...
PrintHighPCode.java> Finished!

The normalize style analysis using deadcodedelay performs at least as well as the original register style analysis – in some cases better – and the implementation is much simpler. I can’t use the XML injection trick to enable this only when running an analysis script, however, so for now this is just information to aid future work.

Concluding Thoughts

Thinking beyond Go binary analysis or this specific string recovery problem, for automated program analysis I would generally like to have a medium-level form of lifted P-Code output that doesn’t depend on correct function signature, calling convention, or data structure definitions (along the lines of the Low Level and Medium Level IL concept in Binary Ninja). The ideal solution for analyzing data structure creation on the stack would be a normalize-like simplification style that preserves stack operations that would otherwise be consumed by function call analysis. This is the main opportunity for future work I’m interested in.

For now, when implementing new P-Code analysis scripts where I have no need for source code output, I’d favor the decompiler’s normalize simplification style. When the normalize P-Code output is unsuitable due to issues like the one I explored in this blog post, I’d fall back to the register style, at the cost of needing to reimplement some decompiler analysis passes in my Ghidra script.

Ghostrings Release

The Go string definition recovery scripts described in this article have been released as “Ghostrings”. See the tool release announcement post at https://research.nccgroup.com/2022/05/20/tool-release-ghostrings/. This release also includes the PrintHighPCode.java script for decompiling a function with a certain simplification style and inspecting the P-Code output.

References

Golang Reverse Engineering

Ghidra Decompiler and P-Code Analysis

¹ For example, see the x86-64 string defining patterns listed in https://github.com/SentineLabs/AlphaGolang/blob/main/4.string_cast.py#L86
² See DecompilerFunctionAnalyzer.java
³ The “unique” space is “a pool of temporary registers that can hold data but that aren’t a formal part of the state of the processor”.
⁴ The raw protoEvalModel option string is copied directly into an XML string here.

James Chambers

James is a Senior Security Consultant in the NCC Group Hardware & Embedded Systems practice.